WikiHarass

From Algolit

Revision as of 14:19, 25 October 2017

Type: Algoliterary dataset
Source(s):
Collectively developed by: & Algolit

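The Detox dataset (project page: https://meta.wikimedia.org/wiki/Research:Detox; data: https://figshare.com/projects/Wikipedia_Talk/16731) was used by Wikimedia and Perspective API to train a neural network that scores how toxic a comment is.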

The dataset consists of:

  • A corpus of all 95 million user and article talk page diffs made between 2001 and 2015, each scored by the personal attack model.
  • A human-annotated dataset of 1 million crowd-sourced annotations covering 100,000 talk page diffs (10 judgements per diff).
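
Because each diff carries 10 separate judgements, the annotations have to be combined into one value per diff before they can serve as training labels. The sketch below shows one possible way to do that: the fraction of annotators who flagged a diff, followed by a majority threshold. The tuple layout (diff id, worker id, attack flag), the function names and the 0.5 threshold are assumptions made for illustration, not the method documented for Detox.

```python
from collections import defaultdict


def aggregate_judgements(annotations):
    """Map each diff id to the fraction of annotators who flagged it as an attack.

    `annotations` is an iterable of (diff_id, worker_id, is_attack) tuples.
    This layout is hypothetical, not the dataset's documented schema.
    """
    votes = defaultdict(list)
    for diff_id, _worker_id, is_attack in annotations:
        votes[diff_id].append(1 if is_attack else 0)
    return {diff_id: sum(v) / len(v) for diff_id, v in votes.items()}


def majority_label(scores, threshold=0.5):
    """Label a diff as an attack when more than `threshold` of annotators agree."""
    return {diff_id: score > threshold for diff_id, score in scores.items()}


# Toy data: two diffs with 10 judgements each, mirroring the 10-per-diff design.
toy_annotations = (
    [(101, worker, worker < 7) for worker in range(10)]    # 7 of 10 flag diff 101
    + [(102, worker, worker < 2) for worker in range(10)]  # 2 of 10 flag diff 102
)

scores = aggregate_judgements(toy_annotations)
labels = majority_label(scores)

# Size check taken from the text: 100,000 diffs x 10 judgements each.
total_annotations = 100_000 * 10

print(scores)
print(labels)
print(total_annotations)
```

Keeping the fraction alongside the thresholded label preserves how strongly annotators agreed, which a single yes/no label would discard.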