Assessing the Quality of Web Content

Elisabeth Lex, Inayat Khan, Horst Bischof, Michael Granitzer

Publication: Contribution to journal › Article › Research › Peer-reviewed

Abstract

This paper describes our approach to the ECML/PKDD Discovery Challenge 2010. The challenge consists of three tasks: (1) a Web genre and facet classification task for English hosts, (2) an English quality task, and (3) a multilingual quality task (German and French). In our approach, we create an ensemble of three classifiers to predict unseen Web hosts, where each classifier is trained on a different feature set. Our final NDCG on the whole test set is 0.575 for Task 1, 0.852 for Task 2, and 0.81 (French) and 0.77 (German) for Task 3, which ranks second in the ECML/PKDD Discovery Challenge 2010.
Original language: English
Journal: arXiv.org e-Print archive
Publication status: Published - 12 Jun 2014

Fingerprint

Classifiers

Keywords

    ASJC Scopus subject areas

    • Computer Science (all)

    Fields of Expertise

    • Information, Communication & Computing

    Cite this

    Assessing the Quality of Web Content. / Lex, Elisabeth; Khan, Inayat; Bischof, Horst; Granitzer, Michael.

    In: arXiv.org e-Print archive, 12 Jun 2014.

    @article{69c98d954f794f96bf4ac5ec18dfb1c6,
    title = "Assessing the Quality of Web Content",
    abstract = "This paper describes our approach towards the ECML/PKDD Discovery Challenge 2010. The challenge consists of three tasks: (1) a Web genre and facet classification task for English hosts, (2) an English quality task, and (3) a multilingual quality task (German and French). In our approach, we create an ensemble of three classifiers to predict unseen Web hosts whereas each classifier is trained on a different feature set. Our final NDCG on the whole test set is 0.575 for Task 1, 0.852 for Task 2, and 0.81 (French) and 0.77 (German) for Task 3, which ranks second place in the ECML/PKDD Discovery Challenge 2010.",
    keywords = "information quality, classification, machine learning, big data",
    author = "Elisabeth Lex and Inayat Khan and Horst Bischof and Michael Granitzer",
    note = "4 pages, ECML/PKDD 2010 Discovery Challenge Workshop",
    year = "2014",
    month = "6",
    day = "12",
    language = "English",
    journal = "arXiv.org e-Print archive",
    publisher = "Cornell University Library",

    }

    TY - JOUR

    T1 - Assessing the Quality of Web Content

    AU - Lex, Elisabeth

    AU - Khan, Inayat

    AU - Bischof, Horst

    AU - Granitzer, Michael

    N1 - 4 pages, ECML/PKDD 2010 Discovery Challenge Workshop

    PY - 2014/6/12

    Y1 - 2014/6/12

    N2 - This paper describes our approach towards the ECML/PKDD Discovery Challenge 2010. The challenge consists of three tasks: (1) a Web genre and facet classification task for English hosts, (2) an English quality task, and (3) a multilingual quality task (German and French). In our approach, we create an ensemble of three classifiers to predict unseen Web hosts whereas each classifier is trained on a different feature set. Our final NDCG on the whole test set is 0.575 for Task 1, 0.852 for Task 2, and 0.81 (French) and 0.77 (German) for Task 3, which ranks second place in the ECML/PKDD Discovery Challenge 2010.

    AB - This paper describes our approach towards the ECML/PKDD Discovery Challenge 2010. The challenge consists of three tasks: (1) a Web genre and facet classification task for English hosts, (2) an English quality task, and (3) a multilingual quality task (German and French). In our approach, we create an ensemble of three classifiers to predict unseen Web hosts whereas each classifier is trained on a different feature set. Our final NDCG on the whole test set is 0.575 for Task 1, 0.852 for Task 2, and 0.81 (French) and 0.77 (German) for Task 3, which ranks second place in the ECML/PKDD Discovery Challenge 2010.

    KW - information quality

    KW - classification

    KW - machine learning

    KW - big data

    M3 - Article

    JO - arXiv.org e-Print archive

    JF - arXiv.org e-Print archive

    ER -