Leiden Weibo Corpus

The Leiden Weibo Corpus (LWC) was created by Daan van Esch, a graduate student in Chinese linguistics at Leiden University in the Netherlands. It contains 5.1 million messages posted on Sina Weibo in January 2012, along with linguistic annotations. You can read more about how this corpus was built here.

Many thanks to my supervisor, Jeroen Wiedenhof, for his enthusiastic guidance on this project. Thanks, too, to Rint Sybesma and Lisa Cheng for getting me interested in linguistic databases and natural-language processing while I worked as an assistant on their Sino-Kwa project.

Citing
See here for some information on citing messages from the LWC.