HisTrace: Building a Search Engine of Historical Events
- Lian'en Huang (Peking University)
- Jonathan J. H. Zhu (City University of Hong Kong)
- Xiaoming Li (Peking University)
In this paper, we describe an experimental search engine built on our Chinese web archive, which has been collected since 2001. The original data set contains nearly 3 billion Chinese web pages crawled over the past five years. From this collection, 430 million "article-like" pages are selected and partitioned into 68 million sets of similar pages. A title and a publication date are determined for each page, and an index is built over the result. At query time, the system returns related pages in chronological order. As a result, a user interested in news reports or commentaries on a past event can conveniently find a rich set of highly related pages.
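The pipeline described above (group similar pages, index them, return hits in date order) can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the `Page` record, the `fingerprint`, `build_index`, and `search` functions, and the sample data are all hypothetical. Two simplifying assumptions are made: similarity is approximated by hashing the whitespace- and case-normalized body, and titles are tokenized on whitespace, whereas real Chinese text would require word segmentation.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Page:
    """Hypothetical record for one article-like page."""
    url: str
    title: str
    published: date
    body: str


def fingerprint(body: str) -> str:
    # Assumption: a hash of the normalized body stands in for real
    # similar-page detection, so only near-identical copies are grouped.
    normalized = " ".join(body.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def build_index(pages):
    # Partition pages into sets that share a fingerprint.
    groups = defaultdict(list)
    for page in pages:
        groups[fingerprint(page.body)].append(page)

    # Represent each set by its earliest page and index its title words.
    index = defaultdict(set)
    representatives = {}
    for key, members in groups.items():
        rep = min(members, key=lambda p: p.published)
        representatives[key] = rep
        for word in rep.title.lower().split():
            index[word].add(key)
    return index, representatives


def search(query, index, representatives):
    # Return one page per matching set, oldest first.
    words = query.lower().split()
    if not words:
        return []
    keys = set.intersection(*(index.get(w, set()) for w in words))
    return sorted((representatives[k] for k in keys), key=lambda p: p.published)


pages = [
    Page("a", "Flood hits city", date(2003, 7, 1), "Heavy rain  flooded the city."),
    Page("b", "Flood hits city", date(2003, 7, 2), "heavy rain flooded the city."),
    Page("c", "City flood relief begins", date(2003, 7, 5), "Relief work started."),
    Page("d", "Flood warning issued", date(2003, 6, 28), "Officials warned of floods."),
]
index, representatives = build_index(pages)
for page in search("flood", index, representatives):
    print(page.published, page.url, page.title)
```

In this toy data, pages "a" and "b" collapse into one set, so a query for "flood" yields three results ordered by publication date rather than by a relevance score.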
Inquiries can be sent to: