Abstract
This paper describes our approach to the ECML/PKDD Discovery Challenge 2010. The challenge consists of three tasks: (1) a Web genre and facet classification task for English hosts, (2) an English quality task, and (3) a multilingual quality task (German and French). In our approach, we build an ensemble of three classifiers to make predictions for unseen Web hosts, where each classifier is trained on a different feature set. Our final NDCG on the whole test set is 0.575 for Task 1, 0.852 for Task 2, and 0.81 (French) and 0.77 (German) for Task 3, which placed second in the ECML/PKDD Discovery Challenge 2010.
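The abstract does not say how the three classifiers' outputs are combined, nor which NDCG variant the challenge used. The Python sketch below is illustrative only. It assumes unweighted averaging of per-host scores and the common NDCG formulation, which uses linear gains and a log2(rank + 1) discount. The function names `average_scores`, `dcg`, and `ndcg` are hypothetical, and the scores and relevance labels in the usage example are made up.

```python
import math
from typing import Sequence


def average_scores(per_classifier: Sequence[Sequence[float]]) -> list[float]:
    """Combine per-host scores from several classifiers by unweighted averaging.

    Assumption: plain averaging; the paper's actual combination rule may differ.
    """
    n_hosts = len(per_classifier[0])
    return [
        sum(scores[i] for scores in per_classifier) / len(per_classifier)
        for i in range(n_hosts)
    ]


def dcg(relevances: Sequence[float]) -> float:
    # 0-based position i is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(scores: Sequence[float], relevances: Sequence[float]) -> float:
    # Rank hosts by predicted score (highest first) and read off their true relevance.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    # The ideal ranking sorts hosts by true relevance.
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical scores from three classifiers, each trained on its own feature set.
    clf_a = [0.9, 0.2, 0.6, 0.1]
    clf_b = [0.8, 0.3, 0.4, 0.2]
    clf_c = [0.7, 0.1, 0.5, 0.4]
    combined = average_scores([clf_a, clf_b, clf_c])
    # Hypothetical graded relevance labels for the same four hosts.
    print(ndcg(combined, [2, 0, 1, 1]))
```

Averaging treats the three feature-set classifiers as equally reliable. A weighted average or a stacked meta-classifier are common alternatives when one feature set is known to be stronger.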
Original language | English |
---|---|
Journal | arXiv.org e-Print archive |
Publication status | Published - 12 Jun 2014 |
Keywords
- information quality
- classification
- machine learning
- big data
ASJC Scopus subject areas
- Computer Science(all)
Fields of Expertise
- Information, Communication & Computing