# Dr. Steffen Liebscher

## Contact

via the Chair of Statistics (sek.statistik@wiwi.uni-halle.de)

## Curriculum Vitae

- 09/2016 teaching and research stay at the University of Cassino and Southern Lazio, Italy
- 02/2016 teaching and research stay at the University of Cassino and Southern Lazio, Italy
- 09/2015 teaching and research stay at the University of Cassino and Southern Lazio, Italy
- 03/2008 - 04/2017 research assistant at the Chair of Statistics, Martin-Luther-University Halle-Wittenberg, Germany
- born in 1982

## Main research

### Robust statistical methods

### Scope and Objectives

The object of this research is the development of new statistical methods for outlier detection and the construction of multivariate, non-parametric location and scatter estimators with a high breakdown point.

### Short Description

A good estimate of location and scatter is important not only in explanatory statistics but is also fundamental to advanced statistical methods such as analysis of variance or factor analysis. A large number of location and scatter estimators already exist, including well-known ones such as the arithmetic mean (univariate/multivariate), the variance (univariate) and the covariance (multivariate). These estimators share one drawback: a few extreme points in the sample (so-called outliers) can distort the estimate arbitrarily. More formally, they have a finite-sample breakdown point of 1/n, i.e. altering a single observation suffices for the estimator to break down. This problem is addressed by so-called robust estimators, which can handle data containing a certain fraction of outliers without breaking down. The best-known and most popular robust estimators of location and scatter are the minimum covariance determinant estimator (MCD), the minimum volume ellipsoid estimator (MVE) and the minimum volume zonoid estimator (MZE).

The general idea behind these estimators is almost the same: separate the "good" points from the "bad" ones and estimate the parameters of interest from a subsample with all outliers removed. But how is this separation carried out? How can one distinguish between "good" and "bad" points? The major challenge is to identify the outliers appropriately. The usual definition of an outlier underlines this task: "An outlier is a point far away from the majority of the other points." But what does "far away" actually mean? Between which objects is the distance measured? Which metric is used? These questions can be answered in different ways, leading to different decisions on whether a point is an outlier. In other words, which points are outliers is not defined a priori; it depends on the identification procedure itself.

This is where our research comes in. The aim of the project is the development of new statistical methods for outlier detection (outlier definition) and, subsequently, the construction of multivariate, non-parametric location and scatter estimators with a high breakdown point. We will focus on transferring methods and concepts already well known in other research fields:

- artificial neural networks, especially self-organizing-maps (computer science)
- graph- and tree-based approaches (operations research)
- dimensionality reduction (statistics)
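
The breakdown behaviour described in the short description can be made concrete with a small numerical sketch. The sample values and the helper `corrupt_one` below are purely illustrative assumptions, and the median serves only as a simple, well-known robust alternative; it is not one of the estimators developed in this project. Replacing a single observation moves the arithmetic mean arbitrarily far, while the median stays put.

```python
from statistics import mean, median


def corrupt_one(sample, value):
    """Return a copy of `sample` with its last observation replaced by `value`."""
    return sample[:-1] + [value]


# Hypothetical clean measurements scattered around 10 (illustrative data only).
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7]

print(f"clean sample: mean={mean(clean):.2f}, median={median(clean):.2f}")

# Push one observation further and further away and watch both estimates.
for bad in (1e2, 1e4, 1e6):
    contaminated = corrupt_one(clean, bad)
    print(
        f"one outlier at {bad:>9.0f}: "
        f"mean={mean(contaminated):>12.2f}, median={median(contaminated):.2f}"
    )
```

The mean follows the single corrupted value without bound, which is what a breakdown point of 1/n expresses. The median only breaks down once about half of the observations are replaced.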

Beyond the problem of defining and identifying outliers, another problem arises in practice: in most cases a successive approach to identifying outliers yields worse results than a global one-step optimization (e.g. the determination of a whole subsample). The MCD, MVE and MZE estimators in particular use the latter approach. But regardless of the optimization criterion actually used, this kind of optimization is computationally very hard, because the number of possible subsamples grows exponentially with the sample size. Problems of this kind are typically NP-hard, so large, practical instances can only be solved approximately by heuristics. The second research objective therefore deals with the algorithmic implementation and the algorithmic properties (running time and memory requirements) of the methods to be developed.
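
To illustrate the global one-step optimization and its cost, the following sketch searches all subsamples of size h for the one with the smallest covariance determinant, which is the criterion behind the MCD estimator. It is an exhaustive toy version for two-dimensional data, not the heuristics used in practice and not a method from this project. The data, the choice h = 6 and the function names `cov_det_2d` and `exhaustive_mcd` are assumptions made for the example.

```python
from itertools import combinations
from math import comb


def cov_det_2d(points):
    """Determinant of the 2x2 sample covariance matrix of (x, y) pairs."""
    m = len(points)
    mx = sum(p[0] for p in points) / m
    my = sum(p[1] for p in points) / m
    sxx = sum((p[0] - mx) ** 2 for p in points) / (m - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (m - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (m - 1)
    return sxx * syy - sxy ** 2


def exhaustive_mcd(points, h):
    """Try every h-subset and return the one with minimal covariance determinant."""
    return min(combinations(points, h), key=cov_det_2d)


# Illustrative data: six points near the unit square plus two planted outliers.
data = [(0.0, 0.1), (1.0, 0.9), (0.2, 1.1), (1.1, 0.0), (0.5, 0.6), (0.9, 1.0),
        (25.0, 30.0), (28.0, -26.0)]
h = 6  # subsample size, chosen for this example

subset = exhaustive_mcd(data, h)
location = (sum(p[0] for p in subset) / h, sum(p[1] for p in subset) / h)
print("subsets examined:", comb(len(data), h))
print("selected subset:", subset)
print("location estimate:", location)

# Number of candidate subsamples of size n/2 for growing sample size n.
for n in (10, 20, 50, 100):
    print(f"n={n:>3}: {comb(n, n // 2)} subsets")
```

For eight points the search is trivial, but the final loop shows how quickly the number of candidate subsamples grows, which is why exhaustive search is not an option for realistic sample sizes.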

## Scientific Work

### Contributions

- Kloss, M./Kirschstein, T./Liebscher, S./Petrick, M. (2019): "Robust Productivity Analysis: An application to German FADN data", *arXiv*, 1902.00678.
- Kirschstein, T./Liebscher, S. (2018): "Assessing the market values of soccer players - A robust analysis of data from German 1. and 2. Bundesliga", *Journal of Applied Statistics*, DOI: 10.1080/02664763.2018.1540689.
- Kirschstein, T./Liebscher, S./Pandolfo, G./Porzio, G./Ragozini, G. (2018): "On finite-sample robustness of directional location estimators", *Computational Statistics & Data Analysis*, DOI: 10.1016/j.csda.2018.08.028.
- Liebscher, S./Kirschstein, T. (2017): "Predicting the outcome of professional darts tournaments", *International Journal of Performance Analysis in Sport*, DOI: 10.1080/24748668.2017.1372162.
- Kirschstein, T./Liebscher, S./Pandolfo, G./Porzio, G./Ragozini, G. (2016): "A robust estimator for the mean direction of the von Mises-Fisher distribution", *Proceedings of the 48th scientific meeting of the Italian Statistical Society*, ISBN: 9788861970618.
- Kirschstein, T./Liebscher, S./Porzio, G./Ragozini, G. (2016): "Minimum volume peeling: A robust nonparametric estimator of the multivariate mode", *Computational Statistics & Data Analysis*, 93, 456-468, DOI: 10.1016/j.csda.2015.04.012.
- Liebscher, S./Kirschstein, T. (2015): "Knot deletion for robust penalized spline regression", *Proceedings of the 60th ISI World Statistics Congress*, 2452-2457, Rio de Janeiro.
- Liebscher, S./Kirschstein, T. (2015): "Efficiency of the pMST and RDELA Location and Scatter Estimators", *Advances in Statistical Analysis*, 99(1), 63-82, DOI: 10.1007/s10182-014-0231-7.
- Liebscher, S./Kirschstein, T./Becker, C. (2013): "RDELA - A Delaunay-Triangulation-based, Location and Covariance Estimator with High Breakdown Point", *Statistics and Computing*, 23(6), 677-688, DOI: 10.1007/s11222-012-9337-5. Erratum.
- Kirschstein, T./Liebscher, S./Becker, C. (2013): "Robust estimation of location and scatter by pruning the minimum spanning tree", *Journal of Multivariate Analysis*, 120, 173-184, DOI: 10.1016/j.jmva.2013.05.004.
- Becker, C./Liebscher, S./Kirschstein, T. (2013): "Multivariate outlier identification based on robust estimators of location and scatter", In Becker, C./Fried, R./Kuhnt, S. (Eds.), Robustness and Complex Data Structures - Festschrift in Honour of Ursula Gather, 103-115, Springer, DOI: 10.1007/978-3-642-35494-6_7.
- Liebscher, S./Kirschstein, T. (2012): "Identification of unbalanced warship designs using multivariate outlier detection procedures", *Military Operations Research*, 17(1), 31-43, DOI: 10.5711/1082598317131.
- Liebscher, S./Kirschstein, T./Becker, C. (2012): "The Flood Algorithm - A Multivariate, Self-Organizing-Map-based, Robust Location and Covariance Estimator", *Statistics and Computing*, 22(1), 325-336, DOI: 10.1007/s11222-011-9250-3.

### Talks (selection)

- Liebscher, S./Kirschstein, T./Becker, C.: Finding the bad ones - Identifying unsuccessful warship designs; JSM2011 - Joint Statistical Meetings, Miami, 04.08.2011.
- Liebscher, S./Kirschstein, T./Becker, C.: RDELA - A Delaunay-Triangulation-based, Location and Covariance Estimator with High Breakdown Point; ICORS2011 - International Conference on Robust Statistics, Valladolid, 30.06.2011.
- Liebscher, S./Kirschstein, T./Becker, C.: The Flood Algorithm - A Multivariate, Self-Organizing-Map-based, Robust Location and Covariance Estimator; ICORS2010 - International Conference on Robust Statistics, Prague, 29.06.2010.

### R packages

- Liebscher, S./Kirschstein, T. (2015): restlos: Robust Estimation of Location and Scatter, R package version 0.2-2.

### Conference organization

- Pfingsttagung 2009 der Deutschen Statistischen Gesellschaft, Merseburg
- International Conference on Robust Statistics 2014, Halle (Saale)