Introduction
I recently wanted to explore areas of machine learning that I do not usually interact with as part of my DPhil research on antibody drug discovery. This post explores how to train and deploy a speech generation model for parliamentary speeches in the style of Jeremy Corbyn and Boris Johnson. You can play around with the resulting model at https://con-schneider.github.io/theytalktoyou.html.
Transformers
The current state of the art in text generation is transformer models, with OpenAI's GPT-2 model, published in February 2019, able to generate long coherent blocks of text with impressive accuracy (so impressive, in fact, that OpenAI initially refused to release the full model to the public for fear that it would be abused to generate fake news campaigns). For an explanation of transformers, I refer to the OpenAI paper and this blog post by Peter Bloem, which gives a nice overview of the underlying theory along with practical code samples, including a straightforward explanation of the defining feature of transformer architectures: the self-attention layer.
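To give a flavour of that defining feature, here is a deliberately minimal, single-head self-attention operation in numpy (no learned parameters): each output vector is a weighted average of all input vectors, with weights from a softmax over scaled dot-product similarities. A real transformer layer adds learned query/key/value projections and multiple attention heads on top of this.

```python
import numpy as np

def self_attention(x):
    """Minimal, unparameterised single-head self-attention.

    x: array of shape (tokens, embedding_dim).
    Returns one output vector per token, each a softmax-weighted
    average of all input vectors.
    """
    scores = x @ x.T / np.sqrt(x.shape[1])          # scaled pairwise similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ x                              # mix the input vectors

x = np.random.randn(5, 8)   # 5 tokens, 8-dimensional embeddings
y = self_attention(x)
print(y.shape)              # one 8-dimensional output vector per token
```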
Collecting a speaker corpus
Since OpenAI released several smaller versions of their GPT-2 model (with 124M, 355M and 774M parameters respectively, rather than the 1.5B parameters of the model showcased in the paper), I aimed to finetune one of these models to generate speeches in the style of MPs. I chose this example for three reasons. For one, it is rather topical, with the general election in the UK approaching. Parliamentary speeches in the UK also tend to have a quite distinct style, which would hopefully be captured in the finetuned model and make for interesting output. Lastly, speech contributions in the House of Commons are catalogued in detail in the Hansard database, which TheyWorkForYou makes available through an API in easily parseable XML format. After parsing, this provided me with 3 MB (Corbyn) and 1 MB (Johnson) of parliamentary speeches, respectively.
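The parsing step amounts to walking the debate XML and collecting every paragraph attributed to one speaker into a single plain-text file. The sketch below uses Python's standard-library ElementTree on a trimmed sample; the element and attribute names (`speech`, `speakername`, `p`) mirror the TheyWorkForYou Hansard dumps, but treat them as illustrative rather than authoritative:

```python
import xml.etree.ElementTree as ET

# A trimmed, hand-written example of the kind of XML the debates feed
# contains (illustrative, not a verbatim excerpt from the real feed).
SAMPLE = """
<publicwhip>
  <speech speakername="Jeremy Corbyn">
    <p>Mr Speaker, I rise to ask a question.</p>
    <p>Will the Prime Minister answer it?</p>
  </speech>
  <speech speakername="Boris Johnson">
    <p>Mr Speaker, I will indeed.</p>
  </speech>
</publicwhip>
"""

def speeches_by(xml_text, speaker):
    """Collect all paragraphs spoken by one speaker into plain text."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for speech in root.iter("speech"):
        if speech.get("speakername") == speaker:
            paragraphs.extend(p.text for p in speech.findall("p") if p.text)
    return "\n".join(paragraphs)

corpus = speeches_by(SAMPLE, "Jeremy Corbyn")
print(corpus)
```

Running this over every debate file and concatenating the results yields the per-speaker training corpora.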
Training the model
Finetuning the pre-trained GPT-2 model is made straightforward by the Python package gpt-2-simple by Max Woolf, which provides an easy-to-use API to download pre-trained GPT-2 models and finetune them with TensorFlow (I used the 124M-parameter model). Using Google's Colaboratory notebooks (essentially Jupyter notebooks with a free GPU attached), finetuning the models is reasonably quick (~2-4 hours), and the resulting output captures style and content specific to the two speakers:
Sample output
Prompt: Mr Speaker, this is what I think about Christmas.
Corbyn model: Mr Speaker this is what I think about Christmas. Is it not the case that we should all celebrate Christmas with aplomb—with peace, music, laughter, reading, movies and lots of activities?
Johnson model: Mr Speaker this is what I think about Christmas. I bet they can do it in December. I see Christmas in the Conservatives’s future.
Prompt: Mr Speaker, our NHS is in a crisis.
Corbyn model: Mr Speaker, our NHS is in a crisis. The numbers of people who are suffering are the same for every other public sector worker. Part-time workers are also suffering, as is the case with some nurses. Will the Secretary of State tell us how many nurses are being let go, the numbers of teaching assistants, and the morale of those in our NHS because of that situation?
Johnson model: Mr Speaker, our NHS is in a crisis. As my right hon. Friend the Secretary of State for Health has just pointed out, the number of operations cancelled by the Department for International Development has hit all-time highs, and the latest figures—[Interruption.] Well, there they are, shrieking in protest. The figures are horrifying.
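The finetuning and generation workflow behind these samples can be sketched with gpt-2-simple's API. This is a minimal outline based on the package's README rather than my exact notebook; the step count, run name and sampling parameters are illustrative, and the whole thing is meant to be run in a GPU-backed Colaboratory session:

```python
import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="124M")          # fetch the pre-trained weights

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="corbyn.txt",            # one plain-text file of speeches
              model_name="124M",
              steps=1000,                      # illustrative; tune to taste
              run_name="corbyn")

gpt2.generate(sess,
              run_name="corbyn",
              prefix="Mr Speaker, our NHS is in a crisis.",
              length=200,
              temperature=0.7)
```

A second run with `dataset="johnson.txt"` and its own `run_name` produces the Johnson model.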
Deploying the model
Again following a tutorial by Max Woolf, the two models were packaged into Docker images that can be queried via POST requests. These are then deployed via Google Cloud Run (which hosts Docker apps for free as long as monthly usage remains within certain bounds), making it possible to query the two apps (one for each speaker) over the internet.
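Querying one of the deployed apps then amounts to a single POST request with a JSON body. The sketch below builds such a request with the standard library; the URL is a placeholder, and the field names (`prefix`, `length`, `temperature`) follow the conventions of Max Woolf's Cloud Run example app, so a differently written app may expect different parameters:

```python
import json
from urllib import request

def build_speech_request(url, prompt, length=200, temperature=0.7):
    """Build (but do not send) a POST request for the Cloud Run app.

    Field names are illustrative, following Max Woolf's example app.
    """
    payload = json.dumps(
        {"prefix": prompt, "length": length, "temperature": temperature}
    ).encode("utf-8")
    return request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_speech_request("https://example-corbyn-app.a.run.app",
                           "Mr Speaker,")
# Sending it and reading the generated speech would then look like:
#   with request.urlopen(req) as resp:
#       speech = resp.read().decode("utf-8")
```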
By modifying an example website from the above tutorial (and learning some JavaScript and jQuery in the process), I then used my GitHub Pages site (con-schneider.github.io/theytalktoyou.html) to host a GUI for requesting speeches in the style of either speaker.