Word Length (Distributions) in Slavic Texts. QUANTA. The Graz Project on Quantitative Text-Analysis. Research Project P15485

Word length as a theoretical category in its own right has been largely neglected in linguistics and text-oriented disciplines. Only recently has the question of how frequently words of specific lengths occur in texts ("word length frequencies") been integrated into a systematic theoretical context, whether for a given language, a given author, a given genre, etc., and only recently has a specific theory of word length distribution(s) been developed. The empirical results available thus far indeed show that the frequency with which one-, two-, three-, etc. syllable words occur in texts is not chaotic, but follows specific laws.
Thus far, no systematic studies are available on word length frequencies in Slavic texts. Likewise, the question of how specific "peripheral" factors influence word length frequency (distributions) has never been studied in detail.
In this research project, these questions shall be approached systematically, using approximately 1,000 texts in three Slavic languages (Russian, Croatian, Slovenian).
Because the regularities to be observed can be understood to be relevant to information processing in general (i.e., not only to language processing), and because studying them necessarily requires statistical methods, the present project represents an interdisciplinary attempt to bridge the "two cultures" of the natural and cultural sciences.
Effective start/end date: 1/04/02 - 31/03/05