Neural Network-Generated Celebrity PSAs

"Fake news spreads faster and more easily than this virus, and is just as dangerous." -Tedros Adhanom Ghebreyesus, Director General of the WHO

Infodemic is a neural network-generated video that questions the mediated narratives created by social media influencers and celebrities about the coronavirus. The speakers featured in the video are an amalgam of celebrities, influencers, politicians, and tech moguls who have contributed to the spread of misinformation about the coronavirus, either by repeating false narratives or by developing technologies that amplify untrue content. The talking heads are generated using a conditional generative adversarial network (cGAN), a type of model used in some deepfake technologies. Unlike deepfake videos, where a neural network is trained on images of a single person to produce a convincing likeness of that person saying things they did not say, we trained our algorithms on a corpus of multiple individuals simultaneously. The result is a talking head that morphs between different speakers or becomes a glitchy Frankensteinian hybrid of the people who contributed to the current infodemic. These hybrids speak the words of academics, medical experts, and journalists who are correcting false narratives or explaining how misinformation is created and spread. The plastic, evolving, and unstable speakers in the video evoke the mutation of the coronavirus, the instability of truth, and the limits of knowledge. This project is a collaboration with Jennifer Gradecki.

Screen shots

Video generated from the facial landmarks of an input speaker.


Infodemic was created using an experimental Pix2Pix model that was trained on a corpus of multiple individuals simultaneously. Pix2Pix is a conditional generative adversarial network (cGAN) that is trained on pairs of images, where one image serves as a map from which the second image is produced. In Infodemic, the model was trained on frames from multiple videos of different individuals, each paired with its corresponding facial landmarks. Video frames were then generated from the facial landmarks of a new speaker: a news anchor, academic, or health professional talking about the uncertainty of the coronavirus or the role of predictive content recognition algorithms in the spread of misinformation. Because the new speaker was not in the training corpus, the generated frames are often a hybrid of multiple speakers simultaneously. Finally, the new frames were assembled into a video with the audio track of the new speaker.
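
The pairing step described above can be illustrated with a minimal sketch. This is not the project's actual code: the function names (`draw_landmarks`, `make_pair`, `build_corpus`), the toy 4x4 frames, and the side-by-side layout (landmark map on the left, video frame on the right, a layout commonly used for Pix2Pix training data) are all assumptions made for illustration. Landmark detection and the network itself are left out. The sketch only shows how pairs from several speakers might be pooled into a single training set.

```python
import random

def draw_landmarks(points, width, height):
    """Rasterize (x, y) landmark points into a binary map (rows of 0/1)."""
    canvas = [[0] * width for _ in range(height)]
    for x, y in points:
        # Ignore points that fall outside the frame.
        if 0 <= x < width and 0 <= y < height:
            canvas[y][x] = 1
    return canvas

def make_pair(landmark_map, frame):
    """Join the landmark map (input) and the frame (target) side by side."""
    if len(landmark_map) != len(frame):
        raise ValueError("landmark map and frame must have the same height")
    return [a_row + b_row for a_row, b_row in zip(landmark_map, frame)]

def build_corpus(videos, width, height, seed=0):
    """Pool pairs from every speaker into one shuffled training set.

    `videos` maps a (hypothetical) speaker name to a list of
    (landmark_points, frame) tuples.
    """
    pairs = []
    for samples in videos.values():
        for points, frame in samples:
            landmark_map = draw_landmarks(points, width, height)
            pairs.append(make_pair(landmark_map, frame))
    # Shuffling mixes speakers, so no single identity dominates a batch.
    random.Random(seed).shuffle(pairs)
    return pairs

# Toy data: two hypothetical speakers, one 4x4 grayscale frame each.
toy_videos = {
    "speaker_a": [([(1, 1), (2, 2)], [[10] * 4 for _ in range(4)])],
    "speaker_b": [([(0, 3)], [[200] * 4 for _ in range(4)])],
}
corpus = build_corpus(toy_videos, width=4, height=4)
print(len(corpus), len(corpus[0]), len(corpus[0][0]))  # prints: 2 4 8
```

At generation time, only the landmark half would be supplied, drawn from a speaker who is absent from `toy_videos`. That is consistent with the hybrid faces described above, since the model has no single identity to fall back on.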