Computer Science Module Information System


Web Information Retrieval

Module code: WInf-WebIR
English title: Web Information Retrieval
Module coordinator: Prof. Dr. Ansgar Scherp
Frequency: irregular (SS17, SS18)
Contact hours: 2 hours/week lecture (V), 2 hours/week exercise (Ü)
Workload: 30 hours lecture, 30 hours in-class exercise, 120 hours self-study
Duration: one semester
Module categories: WI (BSc Inf (15)) WI (BSc Inf) WI (MSc Inf (15)) WWi (MSc WInf (15))
Language of instruction: English
Prerequisites: Info


The ability to find information on the web is an essential skill in our digital age. To help students understand today’s search engines and retrieval systems, the course offers an introduction to basic as well as advanced techniques. The topics cover the crawling and processing of large document corpora, different retrieval models, and the evaluation of information retrieval systems.


Students will be able to understand, reflect on, and apply different methods and techniques in web information retrieval.


This course gives an introduction to basic and advanced methods of information retrieval, with a particular focus on dealing with Web data. The course introduces the topic by briefly looking into the process of information retrieval and information seeking. Subsequently, the evaluation of information retrieval systems is discussed. This includes the classical Cranfield paradigm, set-based metrics, ranking-aware metrics, and significance tests. Furthermore, different tasks in the pre-processing of the data are presented, such as tokenization and filtering. The core part of the course covers different information retrieval models such as the Boolean Retrieval Model, the Vector Space Model, and Probabilistic Retrieval Models. Further topics of the course include the crawling of web documents and understanding the web as a graph. The latter notion is used for authority ranking with algorithms such as PageRank and HITS. Finally, techniques from machine learning for information retrieval are considered, such as Learning to Rank and Language Models.
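To illustrate the authority-ranking topic named in the course content, the following is a minimal sketch of PageRank computed by power iteration over a web graph. It is not taken from the course material: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the three-page toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [pages it links to]}.

    Illustrative sketch only; parameter values are assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Every page passes an equal share of its damped rank to each page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web graph: A and C both link to B, and B links back to A.
toy_graph = {"A": ["B"], "B": ["A"], "C": ["B"]}
scores = pagerank(toy_graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, B receives links from two pages and therefore ends up with the highest score, while C, which no page links to, keeps only the baseline share. The scores sum to 1 because rank from pages without outgoing links is redistributed rather than lost.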

Further prerequisites:

Knowledge of algorithms and data structures as well as programming, as acquired in a Bachelor's program in computer science or business informatics.


The exam will be written. Active participation in the tutorials is a prerequisite for admission to the exam.

Teaching and learning methods:

Learning material will be provided in the form of presentation slides.


The acquired knowledge and skills can be applied in a Bachelor's or Master's thesis.


  • Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008.
  • Christopher D. Manning and Hinrich Schütze: Foundations of Statistical Natural Language Processing, MIT Press, 1999.

