WikiHarass

From Algolit

Revision as of 18:03, 25 October 2017 by Cristina (talk | contribs)

Type:	Dataset
Source:	English Wikipedia
Developed by:	Wikimedia Foundation

The Detox dataset comes from a project by Wikimedia and Perspective API to train a neural network that detects the level of toxicity of a comment.

The original dataset consists of:

A corpus of all 95 million user and article talk diffs made between 2001 and 2015, scored by the personal attack model.
A human-annotated dataset of 1 million crowd-sourced annotations that cover 100,000 talk page diffs (with 10 judgements per diff).
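
The figures for the annotated part fit together as 100,000 diffs × 10 judgements = 1 million annotations. As a minimal sketch of how several judgements on one diff could be combined into a single label, the Python below groups annotations by diff and takes a majority vote. The function name, the `(diff_id, is_attack)` pair layout and the toy values are assumptions made for illustration; they are not taken from the dataset files, and the project itself may aggregate judgements differently.

```python
from collections import defaultdict


def aggregate_judgements(annotations):
    """Summarise crowd judgements per diff.

    `annotations` is an iterable of (diff_id, is_attack) pairs, where
    is_attack is 1 or 0. This layout is hypothetical, chosen for the sketch.
    """
    votes_by_diff = defaultdict(list)
    for diff_id, is_attack in annotations:
        votes_by_diff[diff_id].append(int(is_attack))

    summary = {}
    for diff_id, votes in votes_by_diff.items():
        fraction = sum(votes) / len(votes)
        summary[diff_id] = {
            "n_judgements": len(votes),
            "attack_fraction": fraction,
            # Majority vote: more than half of the annotators flagged the diff.
            "majority_attack": fraction > 0.5,
        }
    return summary


# Toy data (invented): two diffs with 10 judgements each.
toy = [(1, v) for v in [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]]
toy += [(2, v) for v in [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]]

result = aggregate_judgements(toy)
for diff_id, info in sorted(result.items()):
    print(diff_id, info)
```

Keeping the fraction alongside the majority label preserves how strongly the annotators agreed, which a plain yes/no label would hide.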

For Algolit, a smaller section of the Detox dataset was used, taken from Jigsaw's GitHub. This section contains both constructive and vandalizing edits.

