Build a Japanese extensive reading tool with Python.

Xuetong Qing
Computational Linguist & Chinese, JavaScript & Japanese, Pythonista & Polyglot

Abstract

Unlike English, Japanese does not separate words with spaces. Combined with its complex conjugations, this means beginners often spend a lot of time working out the base form of a word during extensive reading. Morphological analyzers such as Sudachi can extract base forms automatically, but they often run into significant out-of-vocabulary (OOV) problems on colloquial text such as subtitles, manga, and galgames, which hurts learning efficiency. This talk introduces how to call Sudachi from Python and then focuses on how to deal with the OOV problem.
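As a rough illustration of the first step, the sketch below extracts base forms with SudachiPy and collects the words the analyzer flags as out-of-vocabulary. It assumes the `sudachipy` and `sudachidict_core` packages are installed. The helper names (`Lemma`, `make_sudachi_tokenizer`, `extract_lemmas`, `oov_surfaces`) are hypothetical and not taken from the talk, and this is not necessarily the approach the speaker uses.

```python
from typing import Callable, Iterable, List, NamedTuple


class Lemma(NamedTuple):
    surface: str    # the word as it appears in the text
    base_form: str  # dictionary (base) form reported by the analyzer
    is_oov: bool    # True when the analyzer flags the word as out-of-vocabulary


def make_sudachi_tokenizer() -> Callable[[str], Iterable]:
    """Return a function that tokenizes text with SudachiPy.

    SudachiPy is imported lazily so the other helpers can be used (and tested)
    without it. Assumes: pip install sudachipy sudachidict_core
    """
    from sudachipy import dictionary, tokenizer

    sudachi = dictionary.Dictionary().create()
    mode = tokenizer.Tokenizer.SplitMode.C  # mode C yields the longest units
    return lambda text: sudachi.tokenize(text, mode)


def extract_lemmas(text: str, tokenize: Callable[[str], Iterable]) -> List[Lemma]:
    """Map every morpheme in `text` to its surface form, base form, and OOV flag."""
    return [
        Lemma(m.surface(), m.dictionary_form(), m.is_oov())
        for m in tokenize(text)
    ]


def oov_surfaces(lemmas: Iterable[Lemma]) -> List[str]:
    """Collect the surface forms the analyzer could not find in its dictionary."""
    return [lemma.surface for lemma in lemmas if lemma.is_oov]
```

With SudachiPy installed, `extract_lemmas("食べちゃった", make_sudachi_tokenizer())` returns one `Lemma` per morpheme. How the text is segmented, and which words are flagged as OOV, depends on the dictionary edition in use. Passing the tokenizer in as a parameter keeps the extraction logic independent of any one analyzer.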

Details