JASMIN-spraakcorpus

== Properties ==
* 115 hours of spoken Dutch
* speech of children, elderly people and non-natives, and human-machine interaction
* tagged, lemmatized, parsed, available in several file formats
* verbatim transcription, a transcription of the human-machine interaction (HMI) phenomena, POS tagging of the words, and an automatic phonetic transcription
* version 1.2
* version 1.0 (2008)
* [https://limo.libis.be/primo-explore/fulldisplay?docid=LIRIAS2859003&context=L&vid=Lirias&search_scope=Lirias&tab=default_tab&lang=en_US&fromSitemap=1 Vincent Vandeghinste, Bram Bulté & Liesbeth Augustinus (2019). Wablieft: An Easy-to-Read Newspaper corpus for Dutch. In ''CLARIN Annual Conference 2019 Proceedings'', pp. 188-191. Leipzig, Germany.]
* [https://taalmaterialen.ivdnt.org/wp-content/uploads/documentatie/jasmin_lrec2008_en.pdf Recording Speech of Children, Non-Natives and Elderly People for HLT Applications: the JASMIN-CGN Corpus (LREC Proceedings 2008)]

== Description ==
The Wablieft corpus contains the digital archive of the Wablieft newspaper (2011-2017), which is also available at http://www.wablieft.be/krant/archief.

It contains 2 million words of newspaper material in easy-to-read Dutch. Metadata are available for the newspaper section (interior, sport, ...) and the publication date. The corpus covers all material published from 2011, when the newspaper became fully available digitally and online, to December 2017.
The data is available in several formats: the original text files; text files with one sentence per line; annotated with Frog (POS tagging, lemmatisation, morphology, named entity recognition, chunking, dependency relations) in FoLiA or CoNLL; and parsed syntactically with Alpino, in Alpino-XML.
An agreement with Wablieft allows this material to be distributed for non-commercial purposes. Commercial parties can contact Wablieft to obtain a license for the material.
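
The Frog annotations in CoNLL format are plain tab-separated text, so they can be read without special tooling. The following is a minimal sketch, not an official reader: it assumes Frog's default ten-column layout (index, word, lemma, morphology, POS tag, POS confidence, named entity, chunk, dependency head, dependency relation) and blank lines between sentences. Check the column order against the actual distributed files before relying on it. The names <code>Token</code> and <code>read_frog_conll</code> and the sample lines are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    """One token line of Frog's tab-separated (CoNLL-style) output.

    ASSUMED column layout (Frog default): index, word, lemma, morphology,
    POS tag, POS confidence, named entity, chunk, dependency head,
    dependency relation.
    """
    index: int
    word: str
    lemma: str
    pos: str
    head: str
    deprel: str


def read_frog_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of Tokens per sentence; blank lines separate sentences."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        # Pad short rows (e.g. output without dependency columns) with "".
        cols += [""] * (10 - len(cols))
        sentence.append(
            Token(
                index=int(cols[0]),
                word=cols[1],
                lemma=cols[2],
                pos=cols[4],
                head=cols[8],
                deprel=cols[9],
            )
        )
    # The last sentence may not be followed by a blank line.
    if sentence:
        yield sentence


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = [
        "1\tDe\tde\t[de]\tLID\t0.99\tO\tB-NP\t2\tdet\n",
        "2\tkrant\tkrant\t[krant]\tN\t0.98\tO\tI-NP\t0\tROOT\n",
        "\n",
    ]
    for sent in read_frog_conll(sample):
        print([t.lemma for t in sent])  # prints ['de', 'krant']
```

Any iterable of lines works, so an open file object (e.g. one opened with <code>encoding="utf-8"</code>) can be passed directly instead of the sample list.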

== Download page ==
[http://hdl.handle.net/10032/tm-a2-j7 http://hdl.handle.net/10032/tm-a2-j7]
[https://taalmaterialen.ivdnt.org/download/tstc-wablieft-corpus-1-2/ https://taalmaterialen.ivdnt.org/download/tstc-wablieft-corpus-1-2/]
