An Information-Theoretic Characterization of Morphological Fusion (bibtex)
by Neil Rathi, Michael Hahn, Richard Futrell
Abstract:
Linguistic typology generally divides synthetic languages into groups based on their morphological fusion. However, this measure has long been thought to be best considered a matter of degree. We present an information-theoretic measure, called informational fusion, to quantify the degree of fusion of a given set of morphological features in a surface form, which naturally provides such a graded scale. Informational fusion is able to encapsulate not only concatenative, but also nonconcatenative morphological systems (e.g. Arabic), abstracting away from any notions of morpheme segmentation. We then show, on a sample of twenty-one languages, that our measure recapitulates the usual linguistic classifications for concatenative systems, and provides new measures for nonconcatenative ones. We also evaluate the long-standing hypotheses that more frequent forms are more fusional, and that paradigm size anticorrelates with degree of fusion. We do not find evidence for the idea that languages have characteristic levels of fusion; rather, the degree of fusion varies across part-of-speech within languages.
Reference:
An Information-Theoretic Characterization of Morphological FusionNeil Rathi, Michael Hahn, Richard FutrellProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021.
Bibtex Entry:
@inproceedings{rathi-information-2021,
    title = "An Information-Theoretic Characterization of Morphological Fusion",
    author = "Rathi, Neil  and
      Michael Hahn  and
      Futrell, Richard",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    github = {https://github.com/neilrathi/morphological-fusion},
    month = nov,
    png = {figs/rathi2021.png},
    year = "2021",
    url = "https://aclanthology.org/2021.emnlp-main.793",
    pages = "10115--10120",
    abstract = "Linguistic typology generally divides synthetic languages into groups based on their morphological fusion. However, this measure has long been thought to be best considered a matter of degree. We present an information-theoretic measure, called informational fusion, to quantify the degree of fusion of a given set of morphological features in a surface form, which naturally provides such a graded scale. Informational fusion is able to encapsulate not only concatenative, but also nonconcatenative morphological systems (e.g. Arabic), abstracting away from any notions of morpheme segmentation. We then show, on a sample of twenty-one languages, that our measure recapitulates the usual linguistic classifications for concatenative systems, and provides new measures for nonconcatenative ones. We also evaluate the long-standing hypotheses that more frequent forms are more fusional, and that paradigm size anticorrelates with degree of fusion. We do not find evidence for the idea that languages have characteristic levels of fusion; rather, the degree of fusion varies across part-of-speech within languages.",
}
Powered by bibtexbrowser