Making Twitter research reproducible through archiving
In an article now published in Big Data & Society, Katharina Kinder-Kurlanda, Katrin Weller, Wolfgang Zenk-Möltgen, Jürgen Pfeffer and Fred Morstatter present a solution for sharing social media data with the help of a social science data archive.
Researchers working with data gathered from social media platforms have long struggled to share such data. The obstacles are many, not the least of which are restrictions imposed by social media companies and issues of privacy and consent. However, sharing social media data used in research for the purposes of reproducibility and transparency currently seems more urgent than ever, and this article shows how collaborations with established archives may be able to help.
With research data repositories and other research infrastructure institutions starting to target social media researchers, some important steps are being taken to improve social media data sharing for the sake of research transparency. Such efforts contribute to enhancing comparability and reproducibility in social media research by taking some first steps towards setting standards for sustainable data archiving. The article showcases the example of a large dataset of geotagged tweets that was archived at the GESIS Data Archive for the Social Sciences, a publicly funded German data archive for secure and long-term archiving of social science data. Tweet IDs and additional information were archived to improve reproducibility of the initial research while also attending to ethical and legal considerations, and taking into account Twitter’s terms of service in particular. The authors also provide some general background on the process of long-term archiving of research data and consider current obstacles to sharing and archiving social media data.
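To make the idea of archiving tweet IDs rather than full tweets more concrete, here is a minimal sketch in Python. It is not the authors' actual procedure: the function names, the sample records, and the choice of extra fields (`created_at`, `lang`) are assumptions made purely for illustration. The point is only that tweet text and user details are left out of the file that gets shared.

```python
import csv
import io


def dehydrate(tweets, extra_fields=("created_at", "lang")):
    """Reduce full tweet records to tweet IDs plus a few chosen fields.

    `extra_fields` is a hypothetical selection; which additional
    information can be shared depends on the terms of service and on
    ethical review for the dataset in question.
    """
    rows = []
    for tweet in tweets:
        # Store the ID as a string so large IDs survive spreadsheet tools.
        row = {"tweet_id": str(tweet["id"])}
        for field in extra_fields:
            row[field] = tweet.get(field, "")
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialise the reduced rows as CSV text for deposit."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample records, for illustration only.
sample = [
    {"id": 1001, "created_at": "2014-06-01", "lang": "en",
     "text": "hello", "user": {"screen_name": "someone"}},
    {"id": 1002, "created_at": "2014-06-02", "lang": "de",
     "text": "hallo", "user": {"screen_name": "jemand"}},
]

shared_rows = dehydrate(sample)
shared_csv = to_csv(shared_rows)
```

Other researchers would then need to retrieve the full tweets again from the platform using the IDs, which means that tweets deleted in the meantime would no longer be available to them.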
The solution for archiving the geotagged tweets balances three requirements: sharing legally and ethically, sharing to allow for reproducibility (e.g., precise documentation) and sharing to allow for novel questions and reuse (i.e., researcher-friendly data provision).
The sharing solution balances privacy requirements against the (ethical) obligation to make research reproducible and comparable. The authors suggest that, to advance ethically reflective social media data sharing, it needs to become best practice to establish a carefully considered balance between protecting user interests and ensuring research transparency, while also adhering to the data provider’s terms of service. They advocate facilitating the sharing of information and materials in addition to the data to make it easier to find such a balance. However, every individual social media dataset shared in the future will require deliberation and careful consideration in order to find similar (and never ideal) compromises between these conflicting demands.