Details of CS4201 (Spring 2023)

Level: 4 | Type: Theory | Credits: 4.0

Course Code  Course Name                           Instructor(s)
CS4201       Information Retrieval and Web Search  Dwaipayan Roy

Preamble
Information Retrieval (IR) forms the foundation of modern search engines and is often called the science behind search. Although IR systems are mostly associated with Web search engines (e.g., Bing, Google, Yandex), IR also has significant applications in digital library search, patent search, and automatic question answering, to name a few. Likewise, IR models (the algorithms underlying retrieval systems) are used to solve a wide range of problems, such as organizing documents into an ontology, recommending news stories to users, detecting spam, and efficiently addressing users' information needs. This course will provide an overview of the theory, implementation, and evaluation of IR techniques. In particular, we will explore how search engines work, how they interpret human language, what different users expect from them, how they are evaluated, why they sometimes fail, and how they might be improved in the future. For hands-on experience, we will use PyLucene, a Python wrapper around Lucene, a robust, industry-standard search engine library.

Syllabus
Basic idea of Information Retrieval (IR)
Index structures
Retrieval Models
Probabilistic model for IR
Language modeling for IR
IR model evaluation
Relevance feedback
Web search
Discussion of different corpora and evaluation forums
Practicals with Lucene (via its Python wrapper, PyLucene)
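To give a concrete flavour of the "Index structures" and "Probabilistic model for IR" topics, here is a minimal sketch (an illustration, not course material) of an inverted index and Okapi BM25 ranking in plain Python. It assumes a naive lowercase-and-whitespace tokenizer with no stemming or stopword removal; the function names `tokenize`, `build_index`, and `bm25` and the toy documents are hypothetical, and `k1=1.2`, `b=0.75` are simply commonly used default values.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; no stemming or stopword removal.
    return text.lower().split()


def build_index(docs):
    """Build an inverted index mapping term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        terms = tokenize(text)
        doc_len[doc_id] = len(terms)
        # Each posting records how often the term occurs in this document.
        for term, tf in Counter(terms).items():
            index[term][doc_id] = tf
    return index, doc_len


def bm25(query, index, doc_len, k1=1.2, b=0.75):
    """Rank documents for `query` with Okapi BM25; returns [(doc_id, score), ...]."""
    n_docs = len(doc_len)
    avgdl = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        df = len(postings)
        if df == 0:
            continue  # a term unseen in the collection contributes nothing
        # IDF variant with "1 +" inside the log so it never goes negative.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalisation: longer-than-average documents are penalised.
            norm = k1 * (1 - b + b * doc_len[doc_id] / avgdl)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    # Highest score first; only documents matching at least one query term appear.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "information retrieval is the science behind search",
        "d2": "web search engines rank documents",
        "d3": "patent search and digital library search",
    }
    index, doc_len = build_index(docs)
    for doc_id, score in bm25("web search", index, doc_len):
        print(doc_id, round(score, 3))
```

With these three toy documents, `d2` ranks first for the query "web search" because it is the only one containing the rarer term "web"; `search` occurs in every document and therefore receives a small IDF weight. The inverted index means only documents that share a term with the query are ever scored.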

Prerequisite
Basic concepts of Computer Science and Data Structures (CS3101, CS3201).
Basic probability (conditional probability, Bayes' theorem, etc.).
Programming knowledge for the practicals (Python: packages, modules, functions, etc.).

References
Introduction to Information Retrieval
C. D. Manning, P. Raghavan, and H. Schütze
ISBN: 978-0-521-86571-5
https://nlp.stanford.edu/IR-book/information-retrieval-book.html
Information Retrieval: Implementing and Evaluating Search Engines
S. Büttcher, C. L. A. Clarke, and G. Cormack
ISBN: 978-0-262-02651-2
http://www.ir.uwaterloo.ca/book/

Course Credit Options

Sl. No.  Programme  Semester No.  Course Choice
1        IP         2             Elective
2        IP         4             Elective
3        IP         6             Not Allowed
4        MP         2             Not Allowed
5        MP         4             Not Allowed
6        MR         2             Not Allowed
7        MR         4             Not Allowed
8        MS         10            Elective
9        MS         4             Not Allowed
10       MS         6             Not Allowed
11       MS         8             Elective
12       RS         1             Elective
13       RS         2             Elective