Module Information System, Computer Science (Modulinformationssystem Informatik)

 

Data Science für die Naturwissenschaften

Module code: infDaSciNat-01a
English title: Data Science for Natural Science
Module coordinator: Prof. Dr. Matthias Renz
Frequency: every year
Contact hours: 3V 1Ü 2P (weekly hours of lecture, exercise and practical)
ECTS: 8
Workload: 45 h lectures, 15 h exercises, 30 h practical exercises, 150 h self-study
Duration: one semester
Module categories: INF-Phy (computer science as a minor subject)
Language of instruction: German
Prerequisites: Info

Summary:

The lecture conveys the basics of representing, processing and using data to gain (new) knowledge and derive recommendations for action. It addresses the most important aspects of the data life cycle: data formats and structures, which play an important role in collecting and managing data; methods for processing and using the data; and the presentation and communication of the data and of the knowledge gained from it.

Learning objectives:

The students

  • understand the term "data science" and its meaning (context)
  • know the common data formats, data structures and data models
  • know the most important models (from statistics) for describing data sets (data collections), their data quality and metadata
  • know the most basic methods of data (pre)processing and have a basic introduction to methods for knowledge acquisition (machine learning, data mining, knowledge discovery, ...)
  • understand fundamental aspects of interpreting and presenting the results of data processing
  • can apply the learned techniques to simple practical data science applications

Course content:

  1. Data acquisition, data collection procedures and data models
  2. Statistical data description and data exploration (frequencies, graphical representation of data, description of distributions, concentration measures, univariate and multivariate data descriptors, box plots, correlation analysis, hypothesis testing)
  3. Data cleaning (handling of missing / noisy data, interpolation, extrapolation, regression analysis, kriging, smoothing)
  4. Data integration and transformation (redundancy analysis, correlation analysis, chi-square test, smoothing, dimension reduction, feature extraction (spatial, temporal, multimedia), data cubes, index structures)
  5. Introduction to methods for searching in data (exact match, similarity search, kNN, ...)
  6. Introduction to methods for analyzing data (data mining, machine learning, etc.)
  7. Data visualization (basics)

The module concludes with a small practical project in which the learned techniques are applied to practical problems, possibly drawn from the natural sciences.
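As a rough illustration of the kind of small task that topics 2, 3 and 5 cover, the following Python sketch fills missing readings by linear interpolation, computes quartiles and a Pearson correlation, and runs a k-nearest-neighbor search by linear scan. It is not taken from the course material: the data values, the function names (`interpolate_missing`, `pearson`, `knn`) and the choice of Python are assumptions made purely for illustration.

```python
import math
import statistics


def interpolate_missing(values):
    """Fill interior None gaps by linear interpolation between known neighbors."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled  # leading/trailing gaps stay None (no extrapolation)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def knn(points, query, k):
    """Return the k points closest to query (Euclidean distance, linear scan)."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]


# Hypothetical measurement series with two missing readings (invented data)
temperature = [10.0, None, 14.0, 16.0, None, 20.0]
pressure = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

clean = interpolate_missing(temperature)            # data cleaning
quartiles = statistics.quantiles(clean, n=4)        # basis for a box plot
r = pearson(clean, pressure)                        # correlation analysis
neighbors = knn(list(zip(clean, pressure)), (15.0, 3.5), k=2)  # similarity search

print(clean, quartiles, r, neighbors)
```

The linear scan in `knn` is the simplest possible approach; the index structures listed under topic 4 are what would typically replace it for larger data sets.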

Further prerequisites:

Examination:

Prerequisites for the exam: homework, written text. The exam consists of a presentation of the results of the practical exercises.

Teaching and learning methods:

In the lecture, the material is presented in different forms (blackboard, projector), chosen according to the content. For most of the lecture there are slides, which are made available together with other documents on the course website. The exercises cover theoretical and practical tasks related to the lecture material, and the solutions are discussed.

The learned techniques will be applied to practical problems in the final project.

Applicability:

Literature:

  1. Statistik: Der Weg zur Datenanalyse (Fahrmeir, Künstler, Pigeot, Tutz)
  2. Data Mining: Concepts and Techniques (Jiawei Han, Micheline Kamber)
  3. Interactive Data Visualization: Foundations, Techniques, and Applications (Ward, Grinstein, Keim)

References:

Comment:

The lecture will start in the winter term 2022/23.