Disputed Claims Corpus

As part of the Confrontational Computing project, we automatically search the web for pages that suggest that a particular claim is disputed.

We plan to continually update and improve this data set, and to provide tools for querying it in interesting ways.

For the moment, we are making our data available as a collection of simple text files, each of which represents a single day, and each line of which contains a claim that we believe is disputed.
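The layout described above (one text file per day, one claim per line) can be read with a few lines of code. The following is only a minimal sketch: the directory name, the `.txt` extension, the UTF-8 encoding, and the idea that each file name identifies its day are all assumptions, and `iter_claims` is a hypothetical helper rather than a tool we provide.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_claims(corpus_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (day_label, claim) pairs from a directory of per-day text files.

    Assumptions (not confirmed by the data set itself): files end in
    ".txt", are UTF-8 encoded, and the file name identifies the day.
    """
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        day_label = path.stem  # file name without its extension
        # errors="replace" keeps reading even if a line is not valid UTF-8
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                claim = line.strip()
                if claim:  # skip blank lines
                    yield day_label, claim
```

For example, `for day, claim in iter_claims("claims"): print(day, claim)` would print every claim alongside the file it came from, assuming the download has been unpacked into a directory called `claims`.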

Download a list of good claims (55Mb)

Download a smaller, more selective list of disputed claims (9.3Mb)

Download good claims with metadata (315Mb)

This data is available free of charge under a Creative Commons Attribution License. You are free to use the data however you like, as long as you say that you got the data from us.

This is a small subset of the information we have available about disputed claims. For example, we also have URLs, titles, text context, etc. for each of the claims listed here.

Please email Rob Ennals if you want a different slice of our data set, or want help using our data. More generally, we would also be interested in hearing how you would like to use our data.