Social media corpora


SoNaR New Media Corpus

The SoNaR New Media Corpus 1.0 contains new media texts collected within the STEVIN project SoNaR. The corpus contains text messages, tweets and chat messages. The texts were tokenized, POS-tagged and lemmatized.

WhatsApp corpus Verheijen

WhatsApp data collected for the PhD research of Lieke Verheijen (Radboud University). Informed consent was obtained only from the contributor, not from the conversational partner. Consequently, the subcorpus contains only the submitter's contributions.