Finding Text Data Sets

Social Media Data

Many social media platforms offer APIs that provide access to various kinds of textual content. Using an API generally requires some technical know-how and an access key from the data provider. The APIs listed below are approved by their respective platforms, so using them does not violate the platforms' Terms of Service.

X API 

  • X has several paid API access tiers available for researchers. Please see their website for more information about the different API access levels, their cost, and what they allow researchers to do and access. 

Meta Content Library and API (Facebook and Instagram)

  • The Meta Content Library and API provides access to Instagram’s and Facebook’s content archives. Researchers can apply for access to the API through the University of Michigan. 

TikTok API

  • TikTok’s API provides researchers access to public data on user accounts and content. Researchers must create a research account and submit an application to access the API. 

Reddit API

  • Reddit provides a Data API to access user content. Researchers must read their Developer Terms and Data API Terms, then register an account to use their API. Please check the Data API Terms for more information about costs associated with the Reddit API. 

YouTube Data API

  • YouTube’s API allows researchers to access video and channel metadata. Researchers can sign up on the API page and should review the API Terms of Service before starting a new project.
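As a rough illustration of what a metadata request can look like, the sketch below builds a request to the `videos` endpoint of the YouTube Data API v3 using only the Python standard library. The function names, the placeholder video IDs, and the choice of the `snippet` and `statistics` parts are assumptions made for this example, and an API key from your own Google account is required before the request will succeed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint of the YouTube Data API v3 "videos" resource.
YOUTUBE_VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def build_video_metadata_url(video_ids, api_key, parts=("snippet", "statistics")):
    """Return a request URL asking for metadata about the given video IDs.

    The helper name and default `parts` are illustrative choices, not part
    of the API itself.
    """
    query = {
        "part": ",".join(parts),      # which metadata sections to return
        "id": ",".join(video_ids),    # comma-separated list of video IDs
        "key": api_key,               # your own API key
    }
    return f"{YOUTUBE_VIDEOS_ENDPOINT}?{urlencode(query)}"


def fetch_video_metadata(video_ids, api_key):
    """Send the request and return the parsed JSON response.

    Requires network access and a valid API key.
    """
    with urlopen(build_video_metadata_url(video_ids, api_key)) as response:
        return json.load(response)
```

Calling `fetch_video_metadata(["VIDEO_ID"], "YOUR_API_KEY")` with real values should return a JSON document describing each video. Requests count against the quota attached to your key, so it may be worth testing with a small number of IDs first.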

Other tools use web scraping to gather content from social media platforms, but scraping violates the platforms' terms of service. If you plan to publish your research, check the policies of your target journals, as attitudes toward this technique vary by discipline and journal.

Other social media analysis tools:

Documenting the Now 

  • Documenting the Now collects tweet data (tweet IDs) and publishes it as open-access data sets. The project also maintains a tool called Hydrator that turns tweet IDs into full tweets.

Social Media Macroscope

  • Developed at Illinois, the Social Media Macroscope has the goal of providing social media analytics tools and data to students and researchers. Check out tools like the Brand Analytics Environment to see how the public interacts with brands, or download a dataset.

SMILE

  • SMILE, or the Social Media Intelligence & Learning Environment, collects social media data from Reddit and YouTube and provides a variety of tools for computational analysis, including phrase mining, text classification, network analysis, and more. U of I researchers can access SMILE using their U of I credentials.

Each tool above has its own limitations and strengths; visit the links for more information.

Song Lyrics

Genius

  • Genius, formerly Rap Genius, is a reliable web source of song lyrics from all genres. They also publish news, interviews with artists, and other content related to popular music.
  • Data Mining Instructions: Access the Genius API. An API key and a Genius account are required. The Genius API does not have downloading functionality; you will need to use another method to download the data (try this Python package!).
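To give a sense of how the API key is used, here is a minimal sketch of a search request against the Genius API, written with the Python standard library. The helper names are hypothetical, and the response fields read at the end (`response`, `hits`, `result`, `full_title`, `url`) are assumptions about the JSON layout that you should confirm against the API documentation. In line with the note above, the search returns song metadata and page URLs, not the lyric text itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Search endpoint of the Genius API.
GENIUS_SEARCH_ENDPOINT = "https://api.genius.com/search"


def build_search_request(query, access_token):
    """Return a Request for a Genius search, authenticated with a bearer token.

    `access_token` is the token generated for your own Genius API client.
    """
    url = f"{GENIUS_SEARCH_ENDPOINT}?{urlencode({'q': query})}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


def search_songs(query, access_token):
    """Run the search and return (title, page URL) pairs.

    Requires network access and a valid token. The field names below are
    assumed from the API's JSON responses and may need adjusting.
    """
    with urlopen(build_search_request(query, access_token)) as response:
        payload = json.load(response)
    return [
        (hit["result"]["full_title"], hit["result"]["url"])
        for hit in payload["response"]["hits"]
    ]
```

A call such as `search_songs("artist or song name", "YOUR_ACCESS_TOKEN")` should produce a list of matching songs with links to their Genius pages, which can then be passed to whichever lyric-retrieval method you choose.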

Case Law Documents

CourtListener

  • CourtListener is an archive of legal opinions, court filings, information about judges, and judicial financial records maintained by the Free Law Project. CourtListener maintains five API services that are available according to means-based pricing. Visit the site for more information about the specific services and how to access the APIs.