TA3 - Learning to rank for information retrieval
Time: Monday, April 21 (half-day, morning, 8:30 am to 12:00 noon)
Location: Room 201B (Level 2)
This tutorial gives an introduction to the new research area of learning to rank for information retrieval. For learning, a training set of queries and their associated documents (with relevance judgments) is provided, and a ranking model is trained on it in a supervised fashion by minimizing a loss function. For ranking, the trained model is applied to new queries and sorts their associated documents.
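As a minimal illustration of this train-then-rank cycle, the Python sketch below assumes a linear scoring function f(x) = w . x over hypothetical feature vectors and the simplest possible loss, a pointwise squared error on the relevance labels, minimized by stochastic gradient descent. The names score, train_pointwise, and rank, and the hyperparameters, are illustrative assumptions, not taken from any particular system.

```python
def score(w, x):
    """Linear ranking function f(x) = w . x (an assumed model family)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pointwise(data, dim, lr=0.05, epochs=200):
    """Fit w by stochastic gradient descent on the pointwise squared loss
    (f(x) - y)^2, treating every judged document as an independent example.

    data: {query_id: [(feature_vector, relevance_label), ...]}
    """
    w = [0.0] * dim
    examples = [ex for docs in data.values() for ex in docs]
    for _ in range(epochs):
        for x, y in examples:
            err = score(w, x) - y
            # d/dw (f(x) - y)^2 = 2 * (f(x) - y) * x
            w = [wi - lr * 2 * err * xi for wi, xi in zip(w, x)]
    return w


def rank(w, docs):
    """Apply the trained model to a new query: return its document ids
    sorted by descending score. docs: {doc_id: feature_vector}."""
    return sorted(docs, key=lambda d: score(w, docs[d]), reverse=True)
```

For example, training on {"q1": [([1.0, 0.0], 2.0), ([0.0, 1.0], 0.0), ([0.5, 0.5], 1.0)]} drives w toward [2, 0], after which rank(w, {"a": [0.2, 0.9], "b": [0.9, 0.1], "c": [0.5, 0.5]}) returns ["b", "c", "a"].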
With the rapid development of this research area, three approaches to learning to rank have emerged: the pointwise, pairwise, and listwise approaches. The pointwise approach solves the ranking problem by means of regression or classification on single documents. Representative work includes the discriminative model for IR (Nallapati, 2004) and McRank (Li et al., 2007). The pairwise approach transforms ranking into classification on document pairs. Representative work includes Ranking SVM (Herbrich et al., 1999), RankBoost (Freund et al., 1998), RankNet (Burges et al., 2005), and GBRank and QBRank (Zheng et al., 2007). The advantage of these two approaches is that they can leverage existing theory and practice in regression and classification. However, ranking is a different problem from regression and classification, with many distinct characteristics; for example, only the relative order of the documents associated with the same query matters, and errors at the top of a ranked list are more costly than errors further down. To bridge this gap, several methods modify the existing approaches, such as IR-SVM (Cao et al., 2006), FRank (Tsai et al., 2007), and MHR (Qin et al., 2007).
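The pairwise transformation can be sketched as follows, assuming a linear model and hypothetical dense feature vectors: every pair of documents for the same query with different relevance labels becomes a difference vector x_i - x_j, and w is fit by subgradient descent on an L2-regularized hinge loss over those pairs. This is a simplified Ranking-SVM-style objective for illustration only, not the solver used by Herbrich et al.; the function names and the hyperparameters lr, lam, and epochs are assumptions.

```python
def score(w, x):
    """Linear ranking function f(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def make_pairs(docs):
    """For one query's [(features, label), ...], return difference vectors
    x_i - x_j for every ordered pair where doc i is more relevant than doc j."""
    pairs = []
    for xi, yi in docs:
        for xj, yj in docs:
            if yi > yj:
                pairs.append([a - b for a, b in zip(xi, xj)])
    return pairs


def train_pairwise(data, dim, lr=0.1, lam=0.01, epochs=100):
    """Minimize lam/2 * ||w||^2 + mean over pairs of max(0, 1 - w . (x_i - x_j))
    by full-batch subgradient descent."""
    # Pairs are built per query: labels are only comparable within a query.
    pairs = [d for docs in data.values() for d in make_pairs(docs)]
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [lam * wk for wk in w]  # gradient of the L2 regularizer
        for d in pairs:
            if score(w, d) < 1.0:  # hinge active: subgradient is -d
                for k in range(dim):
                    grad[k] -= d[k] / len(pairs)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w
```

Documents with equal labels produce no pair, and documents from different queries are never paired, which is what makes this a classification problem on document pairs rather than on single documents.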
A more fundamental approach, which we call the listwise approach, has also been proposed. It tackles the ranking problem directly, either by adopting listwise loss functions or by directly optimizing IR evaluation measures. Representative work includes ListNet (Cao et al., 2007), RankCosine (Qin et al., 2007), SVM-MAP (Yue et al., 2007), AdaRank (Xu and Li, 2007), SoftRank (Taylor et al., 2007), and LambdaRank (Burges et al., 2006). In addition to focusing on the objective function of learning, Qin et al. (2008) proposed using a multivariate function (referred to as a relational ranking function) to perform listwise ranking, instead of a ranking function defined on single documents.
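To make the idea of a listwise loss concrete, the sketch below illustrates a ListNet-style loss based on the top-one probability model: the scores of one query's documents and their relevance labels are each turned into a probability distribution by a softmax, and the loss is the cross-entropy between the two distributions. Its gradient with respect to a score s_j is P_z(j) - P_y(j). The linear scoring function, the helper names, and the learning rate are our assumptions for illustration, not details of Cao et al.'s implementation.

```python
import math


def log_softmax(values):
    """Log of the top-one probabilities P(j) = exp(v_j) / sum_k exp(v_k),
    computed with the max-shift trick for numerical stability."""
    m = max(values)
    log_total = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_total for v in values]


def listnet_loss_and_grad(w, docs):
    """Cross-entropy between the top-one distributions induced by the
    relevance labels (P_y) and by linear scores s_j = w . x_j (P_z),
    for one query's [(features, label), ...]. Returns (loss, dloss/dw)."""
    xs = [x for x, _ in docs]
    p_y = [math.exp(v) for v in log_softmax([y for _, y in docs])]
    log_p_z = log_softmax([sum(wk * xk for wk, xk in zip(w, x)) for x in xs])
    loss = -sum(py * lpz for py, lpz in zip(p_y, log_p_z))
    # dL/ds_j = P_z(j) - P_y(j); chain rule through s_j = w . x_j
    coeffs = [math.exp(lpz) - py for py, lpz in zip(p_y, log_p_z)]
    grad = [sum(c * x[k] for c, x in zip(coeffs, xs)) for k in range(len(w))]
    return loss, grad


def train_listnet(data, dim, lr=0.5, epochs=300):
    """Gradient descent on the per-query ListNet-style loss."""
    w = [0.0] * dim
    for _ in range(epochs):
        for docs in data.values():
            _, grad = listnet_loss_and_grad(w, docs)
            w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w
```

Because the loss is defined on a whole query's document list, it cannot be decomposed into independent per-document or per-pair terms. For a query with labels 2, 1, 0, the loss equals log 3 at w = 0 (a uniform predicted distribution) and, being a cross-entropy, can never fall below the entropy of the label distribution.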
The tutorial covers not only the algorithms behind these approaches but also related theoretical issues (e.g., statistical consistency and generalization ability). It will also introduce LETOR (Liu et al., 2007), a benchmark dataset that has been widely used by learning-to-rank researchers. Finally, future research directions and open questions in learning to rank for IR will be discussed.
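As we understand it, LETOR distributes its data as text files with one query-document pair per line in an SVMlight-style layout: a relevance label, a qid:<id> field, feature index:value pairs, and a trailing # comment (for example, a document id). The parser below is a minimal sketch assuming that layout; since the comment fields may differ between LETOR releases, it keeps the comment as a raw string. The function names parse_letor_line and group_by_query are hypothetical.

```python
def parse_letor_line(line):
    """Parse '<label> qid:<id> <fid>:<value> ... #<comment>'
    into (label, qid, features, comment)."""
    body, _, comment = line.partition("#")
    tokens = body.split()
    label = int(tokens[0])
    if not tokens[1].startswith("qid:"):
        raise ValueError(f"expected qid:<id>, got {tokens[1]!r}")
    qid = tokens[1][len("qid:"):]
    features = {}
    for tok in tokens[2:]:
        fid, value = tok.split(":", 1)
        features[int(fid)] = float(value)
    return label, qid, features, comment.strip()


def group_by_query(lines):
    """Collect parsed lines into {qid: [(features, label), ...]},
    preserving file order and skipping blank lines."""
    data = {}
    for line in lines:
        if line.strip():
            label, qid, feats, _ = parse_letor_line(line)
            data.setdefault(qid, []).append((feats, label))
    return data
```

Features are returned as a sparse {index: value} dict; converting them to a dense list (e.g., [feats.get(i, 0.0) for i in range(1, n + 1)] for n features) makes them usable with a linear scoring function.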
Tie-Yan Liu, Microsoft Research Asia (China)
Dr. Tie-Yan Liu is a lead researcher at Microsoft Research Asia. His current research interests include learning to rank for information retrieval, as well as infrastructure and algorithms for large-scale machine learning. Dr. Liu has published more than 60 papers in refereed international conferences and journals, including SIGIR (9), WWW (3), KDD (2), and ICML. He has over 30 US and international patents, filed or pending. He won the Most Cited Paper Award of the Journal of Visual Communication and Image Representation. He has served as a program committee member for about 30 international conferences, including WWW, SIGIR, ICML, ACL, and ICIP. He is a Senior Program Committee member (formerly called Area Coordinator) of SIGIR 2008, and was co-chair of the SIGIR 2007 Workshop on Learning to Rank for Information Retrieval (LR4IR 2007). He has given or will give tutorials at several conferences, including AIRS 2008 and SIGIR 2008.
Inquiries can be sent to: