If Twitter goes down in flames, what happens to its huge and historically important collection of tweets?

This blog has just written about the likely loss of a very particular kind of culture – K-pop live streams. Culture is culture, and a loss is a loss. But we are potentially facing the disappearance of a cultural resource that is indisputably more important. I’m talking about Twitter, and the vast store of tweets written over the 16 years of its existence.

We have rather taken Twitter and its key role in modern culture and public discourse for granted. But the recent purchase of the company by Elon Musk, and his idiosyncratic decisions since then, have (a) raised the possibility that Twitter will go bankrupt, as Musk himself has reportedly said, and (b) made people realise how much of value would be lost if that happens.

There is no ongoing independent backup of Twitter. There was one to begin with: the US Library of Congress (LoC) signed an agreement allowing it to create a complete Twitter Archive. That ran for 12 years, during which time billions of tweets were collected. As an update on the Twitter Archive explained in 2017, the decision not to collect everything thereafter was taken for three reasons: the dramatic increase in the number of tweets; the fact that the Library of Congress only received text, while many tweets were more visual than textual; and the increase in potential tweet length from 140 to 280 characters. The LoC also noted that its partial collection already “documents the rise of an important social media platform”, and that in any case, it does not aim to “collect comprehensively”. As a result, it started adding tweets on a more selective basis. It concluded:

The Twitter Archive may prove to be one of this generation’s most significant legacies to future generations. Future generations will learn much about this rich period in our history, the information flows, and social and political forces that help define the current generation.

I would argue that this was still true after the archive was halted; whether it will be in the future remains to be seen. Nonetheless, at the very least we are faced with losing many, perhaps most, of the tweets from the years 2017 until 2022. That’s because, as far as I am aware, no one else is receiving a full feed of tweets in the way the Library of Congress was. The indispensable Internet Archive holds snapshots, but there is no guarantee it has any particular tweet.

Downloading and storing all tweets directly from the public Twitter service is not possible. That’s not so much for technical reasons – it would be a challenge but surely not beyond today’s advanced systems – but because of copyright. Twitter’s Terms of Service state:

You retain your rights to any Content you submit, post or display on or through the Services. What’s yours is yours — you own your Content (and your incorporated audio, photos and videos are considered part of the Content).

Making copies of billions of tweets without permission would be too risky for any organisation to contemplate, given the huge costs involved in such a project. Obtaining permission from hundreds of millions of Twitter users to make copies of their tweets would be a licensing nightmare. Whatever happens as a result of Elon Musk’s changes to the service, that copyright problem is not going to disappear. As a result, what the Library of Congress rightly called “one of this generation’s most significant legacies to future generations” will always be at risk of disappearing forever, leaving only the valuable but incomplete archive that the LoC holds but does not make publicly available.

Featured image generated using Stable Diffusion.

Follow me @glynmoody on Twitter, or Mastodon.