WikiHarass: Difference between revisions

Latest revision as of 13:55, 2 November 2017

Type:	Dataset
Number of words:	1.039.789
Unique words:	64.136
Source:	English Wikipedia
Developed by:	Wikimedia Foundation
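"Number of words" counts every running token in the text, while "Unique words" counts each distinct word once (the vocabulary size). The sketch below illustrates the difference. It assumes that a word is a run of letters, digits, underscores or apostrophes, compared case-insensitively. The tokenizer used to produce the figures above is not documented, so these numbers would not match exactly. The function name `corpus_stats` is hypothetical.

```python
import re


def corpus_stats(text: str) -> tuple[int, int]:
    """Return (number of words, unique words) for a text.

    Assumption: a "word" is a run of word characters or apostrophes,
    lowercased before counting. This is an illustrative tokenizer,
    not the one used for the dataset statistics.
    """
    tokens = re.findall(r"[\w']+", text.lower())
    return len(tokens), len(set(tokens))


total, unique = corpus_stats("You are wrong. You are not welcome here!")
print(total, unique)  # 8 tokens, 6 distinct words
```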

The Detox dataset comes from a project by Wikimedia and Perspective API to train a neural network that detects the level of toxicity in a comment.

The original dataset consists of:

A corpus of all 95 million user and article talk diffs made between 2001 and 2015, scored by the personal attack model.
A human-annotated dataset of 1 million crowd-sourced annotations covering 100k talk page diffs (10 judgements per diff).
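Because every diff received about 10 independent judgements, these have to be collapsed into a single label before they can be used for training. One common approach, sketched here, is to take the share of annotators who flagged a diff as an attack and threshold it at a majority. The `(rev_id, attack)` record shape, the function name `aggregate_judgements` and the 0.5 threshold are assumptions for illustration, not necessarily the procedure Wikimedia used.

```python
from collections import defaultdict


def aggregate_judgements(annotations, threshold=0.5):
    """Collapse per-annotator judgements into one label per diff.

    annotations: iterable of (rev_id, attack) pairs, where attack is
    0 or 1 (hypothetical record shape).
    Returns {rev_id: (share_flagging_attack, is_attack)}.
    """
    votes = defaultdict(list)
    for rev_id, attack in annotations:
        votes[rev_id].append(attack)

    result = {}
    for rev_id, labels in votes.items():
        share = sum(labels) / len(labels)  # fraction of annotators saying "attack"
        result[rev_id] = (share, share > threshold)  # strict majority
    return result


# Two diffs with 10 judgements each: 7 of 10 and 2 of 10 flag an attack.
example = [(101, 1)] * 7 + [(101, 0)] * 3 + [(102, 1)] * 2 + [(102, 0)] * 8
print(aggregate_judgements(example))
```

Keeping the share alongside the boolean label preserves the level of annotator agreement, which matches the project's aim of scoring a degree of toxicity rather than only a yes/no verdict.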

For Algolit, a smaller section of the Detox dataset was used, taken from Jigsaw's GitHub; it contains both constructive edits and vandalism.

@@ Line 3: / Line 3: @@
 | Type: || Dataset
 |-
-| Developed by: || English Wikipedia
+|Number of words: || 1.039.789
+|-
+|Unique words: || 64.136
+|-
+| Source: || English Wikipedia
+|-
+| Developed by: || Wikimedia Foundation
 |}
-The [https://meta.wikimedia.org/wiki/Research:Detox Detox dataset] was used by Wikimedia and [[Crowd Embeddings| Perspective API]] to train a neural network that would detect the level of toxicity of a comment.
+The [https://meta.wikimedia.org/wiki/Research:Detox Detox dataset] is a project by Wikimedia and [[Crowd Embeddings| Perspective API]] to train a neural network that would detect the level of toxicity of a comment.
-The [https://figshare.com/projects/Wikipedia_Talk/16731 dataset] consists of:
+The [https://figshare.com/projects/Wikipedia_Talk/16731 original dataset] consists of:
-*A corpus of all 95 million user and article talk diffs made between 2001–2015 scored by the personal attack model.
+*A corpus of all 95 million user and article talk diffs made between 2001 and 2015 scored by the personal attack model.
 *A human annotated dataset of 1m crowd-sourced annotations that cover 100k talk page diffs (with 10 judgements per diff).
+For Algolit, a smaller section of the Detox dataset was used, taken from [https://conversationai.github.io/wikidetox/testdata/tox-sorted/Wikipedia%20Toxicity%20Sorted%20%28Toxicity%405%5BAlpha%5D%29.html Jigsaw's Github], which contains both constructive and vandalist edits.
 [[Category:Algoliterary-Encounters]]

From Algolit