Data & Datasets

The data is available for research purposes only.

Labeled Data

The following links lead to labeled datasets that members of the research community may download. Send questions about the content and structure of the data to April Edwards.


Additional Data

The following datasets are also available from the authors upon request. To request data or additional information, email April Edwards:

  • A large manually labeled dataset (1.6 MB, archived size) covering 170,019 posts from the perverted-justice.com dataset
  • Additional labeled cyberbullying data from Formspring
  • A dataset for the study of Internet Identity, in which posts and profiles from the same user have been collected from different platforms (seeded from 81 unique individuals)
  • A large unlabeled dataset of MySpace data (1.43 GB, archived size) from a Summer 2010 crawl, including profiles and user "wall" posts for 127,974 MySpace users
  • A large unlabeled Formspring dataset (187 MB, archived size) from a Summer 2010 crawl, containing all of the questions and answers for 18,554 Formspring users

This material is based upon work supported by the National Science Foundation under Grant Nos. 0916152, 1421896, and 1812380. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

ChatCoder is proudly supported by E2 Unlimited Technologies