WikiHarass: Difference between revisions

From Algolit

Revision as of 15:02, 25 October 2017

Type: Dataset
Developed by: English Wikipedia

The Detox dataset was used by Wikimedia and Perspective API to train a neural network that estimates how toxic a comment is.

The dataset consists of:

  • A corpus of all 95 million user and article talk diffs made from 2001 to 2015, each scored by the personal attack model.
  • A human-annotated dataset of 1 million crowd-sourced annotations covering 100,000 talk page diffs (10 judgements per diff).
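
Because each diff carries several independent judgements, the annotations usually need to be collapsed into one value per diff before they can serve as training labels. The following is a minimal sketch of one way to do that, using a simple majority vote. The record layout (`diff_id`, `worker_id`, `is_attack`), the sample values, and the 0.5 threshold are assumptions made for illustration; they are not taken from the dataset's actual schema or from the method used to train the model.

```python
from collections import defaultdict

# Hypothetical annotation records: (diff_id, worker_id, is_attack).
# Field names and values are illustrative, not the dataset's real schema.
annotations = [
    (101, "w1", 1), (101, "w2", 0), (101, "w3", 1),
    (102, "w1", 0), (102, "w4", 0), (102, "w5", 1),
]

def aggregate(records):
    """Group judgements by diff and return the fraction that flagged an attack."""
    votes = defaultdict(list)
    for diff_id, _worker, is_attack in records:
        votes[diff_id].append(is_attack)
    return {diff_id: sum(v) / len(v) for diff_id, v in votes.items()}

# Fraction of annotators who flagged each diff.
scores = aggregate(annotations)

# Majority vote: a diff is labelled an attack if more than half agreed.
labels = {diff_id: score > 0.5 for diff_id, score in scores.items()}
```

Keeping the fraction in `scores` rather than only the boolean in `labels` preserves how strongly the annotators agreed, which can be useful when the judgements on a diff are split.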