A comparison of stylometric and lexical features for Web genre classification and emotion classification in blogs

Elisabeth Lex, Andreas Juffinger, Michael Granitzer

Research output: Contribution to conference › Paper › Research › peer-review

Abstract

The amount of digital content in the blogosphere is growing, which poses new challenges for search engines. As information needs change, automatic methods are needed to help blog search users filter information by different facets. In our work, we aim to support blog search with genre and facet information. Because we focus on the news genre, we classify blogs into news versus rest. We also assess the emotionality facet in news-related blogs so that users can identify people's feelings towards specific events. We evaluate the performance of text classifiers with lexical and stylometric features to determine the best-performing combination for our tasks. Our experiments on a subset of the TREC Blogs08 dataset reveal that classifiers trained on lexical features perform consistently better than classifiers trained on the best stylometric features.
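To make the distinction between the two feature families concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not list the features used in the paper, so the function names and the specific stylometric measurements here are assumptions chosen for illustration only.

```python
import re
import string
from collections import Counter


def lexical_features(text):
    """Lexical view: bag-of-words counts, one feature per lowercased token."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def stylometric_features(text):
    """Stylometric view: content-independent style measurements.

    The four measurements below are hypothetical examples, not the
    feature set evaluated in the paper.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_chars = max(len(text), 1)
    return {
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "punctuation_ratio": sum(c in string.punctuation for c in text) / n_chars,
        "exclamation_count": text.count("!"),
    }


if __name__ == "__main__":
    post = "I love this! Great news today."
    print(lexical_features(post))
    print(stylometric_features(post))
```

Either feature dictionary could then be passed to a text classifier. The lexical representation grows with the vocabulary, while the stylometric one stays at a small fixed size regardless of topic.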

Original language: English
Publication status: Published - 1 Jan 2010
Event: 7th International Workshop on Text-Based Information Retrieval, TIR 2010 - In Conjunction with DEXA 2010 - Bilbao, Spain
Duration: 30 Aug 2010 - 3 Sep 2010

Conference

Conference: 7th International Workshop on Text-Based Information Retrieval, TIR 2010 - In Conjunction with DEXA 2010
Country: Spain
City: Bilbao
Period: 30/08/10 - 3/09/10

Fingerprint

Blogs
Classifiers
Search engines
Experiments

Keywords

  • Data mining
  • Document classification
  • Features

ASJC Scopus subject areas

  • Information Systems

Cite this

Lex, E., Juffinger, A., & Granitzer, M. (2010). A comparison of stylometric and lexical features for Web genre classification and emotion classification in blogs. Paper presented at 7th International Workshop on Text-Based Information Retrieval, TIR 2010 - In Conjunction with DEXA 2010, Bilbao, Spain.

@conference{510d0a984b0a4dd7b6b04fc3be6a3255,
title = "A comparison of stylometric and lexical features for Web genre classification and emotion classification in blogs",
abstract = "In the blogosphere, the amount of digital content is expanding and for search engines, new challenges have been imposed. Due to the changing information need, automatic methods are needed to support blog search users to filter information by different facets. In our work, we aim to support blog search with genre and facet information. Since we focus on the news genre, our approach is to classify blogs into news versus rest. Also, we assess the emotionality facet in news related blogs to enable users to identify people's feelings towards specific events. Our approach is to evaluate the performance of text classifiers with lexical and stylometric features to determine the best performing combination for our tasks. Our experiments on a subset of the TREC Blogs08 dataset reveal that classifiers trained on lexical features perform consistently better than classifiers trained on the best stylometric features.",
keywords = "Data mining, Document classification, Features",
author = "Elisabeth Lex and Andreas Juffinger and Michael Granitzer",
year = "2010",
month = "1",
day = "1",
language = "English",
note = "7th International Workshop on Text-Based Information Retrieval, TIR 2010 - In Conjunction with DEXA 2010 ; Conference date: 30-08-2010 Through 03-09-2010",

}

TY - CONF

T1 - A comparison of stylometric and lexical features for Web genre classification and emotion classification in blogs

AU - Lex, Elisabeth

AU - Juffinger, Andreas

AU - Granitzer, Michael

PY - 2010/1/1

Y1 - 2010/1/1

AB - In the blogosphere, the amount of digital content is expanding and for search engines, new challenges have been imposed. Due to the changing information need, automatic methods are needed to support blog search users to filter information by different facets. In our work, we aim to support blog search with genre and facet information. Since we focus on the news genre, our approach is to classify blogs into news versus rest. Also, we assess the emotionality facet in news related blogs to enable users to identify people's feelings towards specific events. Our approach is to evaluate the performance of text classifiers with lexical and stylometric features to determine the best performing combination for our tasks. Our experiments on a subset of the TREC Blogs08 dataset reveal that classifiers trained on lexical features perform consistently better than classifiers trained on the best stylometric features.

KW - Data mining

KW - Document classification

KW - Features

UR - http://www.scopus.com/inward/record.url?scp=84903189240&partnerID=8YFLogxK

M3 - Paper

ER -