Combining Information Extraction and Web Mining
for Cross Language Information Retrieval
General Information
This project extends dissertation research and aims to thoroughly explore the usefulness of the LKB model [1] for Cross Language Information Retrieval (CLIR), an important research field for global information access and knowledge sharing. The LKB model suggests that a lexical knowledge base (LKB) can be constructed by first applying natural language processing (NLP) to the document collection and then conducting flexible translation knowledge mining to augment the translation knowledge available for query translation in CLIR. The effectiveness of this model has been partially evaluated [1]. However, the methods and effects of translation knowledge mining based on information extraction have not yet been investigated. This project will examine the complete LKB model for CLIR. English/Chinese information extraction and Web mining techniques will be combined to construct the lexical knowledge base, and the performance of the LKB will be evaluated through participation in NTCIR-5 CLIR. We plan to participate in three tasks: Chinese->English Bilingual CLIR, Japanese->English->Chinese Pivot Bilingual CLIR (Japanese->Chinese using English as the pivot language), and Chinese->Chinese Monolingual Information Retrieval. The research is funded by a UNT Junior Faculty Research Grant.
Current Project Members:
Jiangping Chen (jpchen@unt.edu)
Rowena Li (rowenali@yahoo.com)
Bing Jing (bjing@syr.edu)
Shikun Jiang (sj0071@unt.edu)
Interested in joining our team? Please contact Dr. Jiangping Chen (jpchen@unt.edu).
[1] Chen, Jiangping. (2003). The construction, use, and evaluation of a lexical knowledge base for English-Chinese cross language information retrieval. Dissertation,