Faculty Advisor

Sarkozy, Gabor N

Center

Budapest, Hungary

Abstract

Decades of work in natural language processing have gone into automatically building parallel corpora and bilingual dictionaries. However, few studies have paired a high-density character-based language with a medium-density word-based language. This gap reflects both the lack of resources and the fundamental linguistic differences between such languages. In this paper, we describe a methodology for creating a sentence-level parallel corpus and an automatically generated bilingual dictionary for Chinese (a high-density character-based language) and Hungarian (a medium-density word-based language). This method may be applied to create a Chinese-Hungarian bilingual dictionary for the Sztaki Dictionary project [http://szotar.sztaki.hu/].

Publisher

Worcester Polytechnic Institute

Date Accepted

April 2013

Major

Computer Science

Project Type

Major Qualifying Project

Accessibility

Unrestricted

Advisor Department

Computer Science