WikiHarass
From Algolit
Revision as of 14:19, 25 October 2017
Type: | Algoliterary dataset |
Source(s): | |
Collectively developed by: | & Algolit |
Wikimedia and Perspective API used the Detox dataset (https://meta.wikimedia.org/wiki/Research:Detox, data at https://figshare.com/projects/Wikipedia_Talk/16731) to train a neural network that estimates how toxic a comment is.
The dataset consists of:
- A corpus of all 95 million user and article talk diffs made between 2001 and 2015, each scored by the personal attack model.
- A human-annotated dataset of 1 million crowd-sourced annotations covering 100,000 talk page diffs (10 judgements per diff).
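
With 10 judgements per diff, the individual annotations have to be combined into one value per diff before they can serve as training labels. The sketch below shows one possible way to do that: it takes the fraction of annotators who flagged a diff and applies a majority threshold. The function names, the `(diff_id, is_attack)` tuple layout, the 0.5 threshold and the toy data are all assumptions made for illustration. They are not taken from the Detox release, and this is not necessarily how Wikimedia or Perspective API aggregated the judgements.

```python
from collections import defaultdict


def aggregate_judgements(annotations):
    """Return {diff_id: fraction of annotators who flagged the diff}.

    `annotations` is an iterable of (diff_id, is_attack) pairs.
    This layout is hypothetical, not the dataset's actual schema.
    """
    votes = defaultdict(list)
    for diff_id, is_attack in annotations:
        votes[diff_id].append(1 if is_attack else 0)
    return {diff_id: sum(v) / len(v) for diff_id, v in votes.items()}


def majority_labels(scores, threshold=0.5):
    """Label a diff as an attack when more than `threshold` of annotators agree.

    The 0.5 threshold is an assumption chosen for this sketch.
    """
    return {diff_id: score > threshold for diff_id, score in scores.items()}


# Toy data: two diffs with 10 judgements each (invented for the example).
sample = [(1, True)] * 7 + [(1, False)] * 3 + [(2, False)] * 9 + [(2, True)]

scores = aggregate_judgements(sample)
labels = majority_labels(scores)
```

Keeping the fraction as well as the thresholded label preserves how strongly the annotators agreed, which a plain majority vote would discard.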