Sarkozy, Gabor N
Decades of work in natural language processing have gone into the automated construction of parallel corpora and automatically generated dictionaries. However, few studies have addressed pairs of high-density character-based languages and medium-density word-based languages, owing to the lack of resources and to fundamental linguistic differences. In this paper, we describe a methodology for creating a sentence-level parallel corpus and an automatically generated bilingual dictionary between Chinese (a high-density character-based language) and Hungarian (a medium-density word-based language). This method may be applied to create a Chinese-Hungarian bilingual dictionary for the Sztaki Dictionary project [http://szotar.sztaki.hu/].
Worcester Polytechnic Institute
Major Qualifying Project
All authors have granted to WPI a nonexclusive royalty-free license to distribute copies of the work, subject to other agreements. Copyright is held by the author or authors, with all rights reserved, unless otherwise noted.