WikiHarass

From Algolit

Revision as of 17:55, 25 October 2017 by Cristina
Type: Dataset
Source: English Wikipedia
Developed by: Wikimedia Foundation

The Detox dataset comes out of a project by Wikimedia and Perspective API to train a neural network that detects the level of toxicity of a comment.

The dataset consists of:

  • A corpus of all 95 million user and article talk diffs made between 2001 and 2015, each scored by the personal attack model.
  • A human-annotated dataset of 1 million crowd-sourced annotations covering 100,000 talk page diffs (10 judgements per diff).
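
Because each annotated diff carries 10 separate judgements, anyone using the dataset has to decide how to collapse them into a single label. The sketch below shows one possible approach, a simple majority vote. It does not describe how the Detox project itself aggregates annotations. The function name `aggregate_judgements`, the `(diff_id, is_attack)` record layout, the 0.5 threshold and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_judgements(annotations, threshold=0.5):
    """Collapse per-annotator judgements into one label per diff.

    `annotations` is an iterable of (diff_id, is_attack) pairs, where
    is_attack is 0 or 1 (hypothetical layout, not the official schema).
    Returns {diff_id: (attack_fraction, label)}.
    """
    votes = defaultdict(list)
    for diff_id, is_attack in annotations:
        votes[diff_id].append(is_attack)

    result = {}
    for diff_id, judgements in votes.items():
        # Share of annotators who flagged this diff as an attack.
        fraction = sum(judgements) / len(judgements)
        result[diff_id] = (fraction, fraction > threshold)
    return result


# Toy data invented for illustration: two diffs, 10 judgements each.
toy = [(1, v) for v in [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]]
toy += [(2, v) for v in [0] * 9 + [1]]

labels = aggregate_judgements(toy)
for diff_id, (fraction, label) in sorted(labels.items()):
    print(diff_id, fraction, label)
```

Keeping the fraction alongside the boolean label preserves how strongly the annotators agreed, which a hard majority vote alone would discard.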