Interactive Quality Analytics of User-generated Content: An Integrated Toolkit for the Case of Wikipedia

Cecilia Di Sciascio, David Strohmaier, Marcelo Errecalde, Eduardo Veas

Research output: Contribution to journal > Article > Research > peer-review

Abstract

Digital libraries and services enable users to access large amounts of data on demand. Yet, quality assessment of information encountered on the Internet remains an elusive open issue. For example, Wikipedia, one of the most visited platforms on the Web, hosts thousands of user-generated articles and undergoes 12 million edits/contributions per month. User-generated content is undoubtedly one of the keys to its success but also a hindrance to good quality. Although Wikipedia has established guidelines for the “perfect article,” authors find it difficult to assert whether their contributions comply with them and reviewers cannot cope with the ever-growing amount of articles pending review. Great efforts have been invested in algorithmic methods for automatic classification of Wikipedia articles (as featured or non-featured) and for quality flaw detection. Instead, our contribution is an interactive tool that combines automatic classification methods and human interaction in a toolkit, whereby experts can experiment with new quality metrics and share them with authors that need to identify weaknesses to improve a particular article. A design study shows that experts are able to effectively create complex quality metrics in a visual analytics environment. In turn, a user study evidences that regular users can identify flaws, as well as high-quality content based on the inspection of automatic quality scores
Original language: English
Article number: 13
Pages (from-to): 1-42
Number of pages: 42
Journal: ACM Transactions on Interactive Intelligent Systems
Volume: 9
Issue number: 2-3
DOIs: 10.1145/3150973
Publication status: Published - Apr 2019

Fingerprint

Defects
Digital libraries
Inspection
Internet
Experiments

Keywords

  • Information quality assessment
  • Text analytics
  • User-generated content
  • Visual analytics
  • Wikipedia

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction

Cite this

Interactive Quality Analytics of User-generated Content: An Integrated Toolkit for the Case of Wikipedia. / Di Sciascio, Cecilia; Strohmaier, David; Errecalde, Marcelo; Veas, Eduardo.

In: ACM Transactions on Interactive Intelligent Systems, Vol. 9, No. 2-3, 13, 04.2019, p. 1-42.

@article{f3d142f4f31b4fbc8a7b55647f3d1c61,
title = "Interactive Quality Analytics of User-generated Content: An Integrated Toolkit for the Case of Wikipedia",
abstract = "Digital libraries and services enable users to access large amounts of data on demand. Yet, quality assessment of information encountered on the Internet remains an elusive open issue. For example, Wikipedia, one of the most visited platforms on the Web, hosts thousands of user-generated articles and undergoes 12 million edits/contributions per month. User-generated content is undoubtedly one of the keys to its success but also a hindrance to good quality. Although Wikipedia has established guidelines for the “perfect article,” authors find it difficult to assert whether their contributions comply with them and reviewers cannot cope with the ever-growing amount of articles pending review. Great efforts have been invested in algorithmic methods for automatic classification of Wikipedia articles (as featured or non-featured) and for quality flaw detection. Instead, our contribution is an interactive tool that combines automatic classification methods and human interaction in a toolkit, whereby experts can experiment with new quality metrics and share them with authors that need to identify weaknesses to improve a particular article. A design study shows that experts are able to effectively create complex quality metrics in a visual analytics environment. In turn, a user study evidences that regular users can identify flaws, as well as high-quality content based on the inspection of automatic quality scores",
keywords = "Information quality assessment, Text analytics, User-generated content, Visual analytics, Wikipedia",
author = "{Di Sciascio}, Cecilia and David Strohmaier and Marcelo Errecalde and Eduardo Veas",
year = "2019",
month = "4",
doi = "10.1145/3150973",
language = "English",
volume = "9",
pages = "1--42",
journal = "ACM Transactions on Interactive Intelligent Systems",
issn = "2160-6455",
publisher = "Association for Computing Machinery",
number = "2-3",

}

TY - JOUR
T1 - Interactive Quality Analytics of User-generated Content: An Integrated Toolkit for the Case of Wikipedia
AU - Di Sciascio, Cecilia
AU - Strohmaier, David
AU - Errecalde, Marcelo
AU - Veas, Eduardo
PY - 2019/4
Y1 - 2019/4
N2 - Digital libraries and services enable users to access large amounts of data on demand. Yet, quality assessment of information encountered on the Internet remains an elusive open issue. For example, Wikipedia, one of the most visited platforms on the Web, hosts thousands of user-generated articles and undergoes 12 million edits/contributions per month. User-generated content is undoubtedly one of the keys to its success but also a hindrance to good quality. Although Wikipedia has established guidelines for the “perfect article,” authors find it difficult to assert whether their contributions comply with them and reviewers cannot cope with the ever-growing amount of articles pending review. Great efforts have been invested in algorithmic methods for automatic classification of Wikipedia articles (as featured or non-featured) and for quality flaw detection. Instead, our contribution is an interactive tool that combines automatic classification methods and human interaction in a toolkit, whereby experts can experiment with new quality metrics and share them with authors that need to identify weaknesses to improve a particular article. A design study shows that experts are able to effectively create complex quality metrics in a visual analytics environment. In turn, a user study evidences that regular users can identify flaws, as well as high-quality content based on the inspection of automatic quality scores
AB - Digital libraries and services enable users to access large amounts of data on demand. Yet, quality assessment of information encountered on the Internet remains an elusive open issue. For example, Wikipedia, one of the most visited platforms on the Web, hosts thousands of user-generated articles and undergoes 12 million edits/contributions per month. User-generated content is undoubtedly one of the keys to its success but also a hindrance to good quality. Although Wikipedia has established guidelines for the “perfect article,” authors find it difficult to assert whether their contributions comply with them and reviewers cannot cope with the ever-growing amount of articles pending review. Great efforts have been invested in algorithmic methods for automatic classification of Wikipedia articles (as featured or non-featured) and for quality flaw detection. Instead, our contribution is an interactive tool that combines automatic classification methods and human interaction in a toolkit, whereby experts can experiment with new quality metrics and share them with authors that need to identify weaknesses to improve a particular article. A design study shows that experts are able to effectively create complex quality metrics in a visual analytics environment. In turn, a user study evidences that regular users can identify flaws, as well as high-quality content based on the inspection of automatic quality scores
KW - Information quality assessment
KW - Text analytics
KW - User-generated content
KW - Visual analytics
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=85065191460&partnerID=8YFLogxK
U2 - 10.1145/3150973
DO - 10.1145/3150973
M3 - Article
VL - 9
SP - 1
EP - 42
JO - ACM Transactions on Interactive Intelligent Systems
JF - ACM Transactions on Interactive Intelligent Systems
SN - 2160-6455
IS - 2-3
M1 - 13
ER -