Selected Publications

Machine-learning algorithms have gained popularity in ecological modeling in recent years due to their promising predictive performance in classification problems. While applying such algorithms has become much simpler thanks to their well-documented integration into commonly used statistical programming languages such as R, several practical challenges remain in ecological modeling: unbiased performance estimation, optimization of algorithms through hyperparameter tuning, and spatial autocorrelation. We address these issues by comparing several widely used machine-learning algorithms, namely Boosted Regression Trees (BRT), Weighted k-Nearest Neighbor (WKNN), Random Forest (RF) and Support Vector Machine (SVM), with traditional parametric algorithms such as logistic regression (GLM) and semi-parametric ones such as generalized additive models (GAM). Different nested cross-validation methods, including hyperparameter tuning, are used to evaluate model performance with the aim of obtaining bias-reduced performance estimates. As a case study, the spatial distribution of the forest disease Diplodia sapinea in the Basque Country in Spain is investigated using common environmental variables such as temperature, precipitation, soil and lithology as predictors. Results show that GAM and RF (mean AUROC estimates of 0.708 and 0.699) outperform all other methods in predictive performance. The effect of hyperparameter tuning saturates at around 50 iterations for this data set. The AUROC differences between the bias-reduced (spatial cross-validation) and overoptimistic (non-spatial cross-validation) performance estimates of the GAM and RF are 0.167 (24%) and 0.213 (30%), respectively. It is recommended to also use spatial partitioning for the cross-validation used in hyperparameter tuning of spatial data.
Ecological Modelling (submitted), 2018
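The central recommendation, spatial partitioning in both the outer (performance estimation) and inner (hyperparameter tuning) cross-validation loops, can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes spatial folds are formed by k-means clustering of the observation coordinates (one common way to build spatial folds), and the helper names `kmeans_folds`, `spatial_cv_splits`, `random_cv_splits` and `nested_spatial_cv` are hypothetical. Only NumPy is required, and the coordinates are synthetic.

```python
import numpy as np


def kmeans_folds(coords, k, n_iter=100, seed=0):
    """Label each observation with one of k spatial clusters (plain Lloyd's k-means).

    Assumption: k-means on coordinates defines the spatial folds (hypothetical helper).
    """
    coords = np.asarray(coords, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct observations as initial centroids.
    centroids = coords[rng.choice(len(coords), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every centroid.
        d2 = ((coords[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments no longer change: converged
        labels = new_labels
        for j in range(k):
            members = coords[labels == j]
            if len(members):  # an emptied cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return labels


def spatial_cv_splits(coords, k, seed=0):
    """Yield (train_idx, test_idx); each test fold is one spatial cluster."""
    labels = kmeans_folds(coords, k, seed=seed)
    for fold in np.unique(labels):
        yield np.flatnonzero(labels != fold), np.flatnonzero(labels == fold)


def random_cv_splits(n, k, seed=0):
    """Non-spatial baseline: random folds that ignore location."""
    perm = np.random.default_rng(seed).permutation(n)
    for test in np.array_split(perm, k):
        yield np.setdiff1d(perm, test), np.sort(test)


def nested_spatial_cv(coords, outer_k=5, inner_k=5, seed=0):
    """Outer spatial folds for performance estimation; inner spatial folds,
    built on the outer training set only, for hyperparameter tuning."""
    coords = np.asarray(coords, dtype=float)
    for train, test in spatial_cv_splits(coords, outer_k, seed):
        # Inner indices are mapped back to positions in the full data set.
        inner = [(train[tr], train[te])
                 for tr, te in spatial_cv_splits(coords[train], inner_k, seed)]
        yield train, test, inner


# Usage on made-up data: 90 points in three separated groups.
rng = np.random.default_rng(42)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
coords = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centres])

for train, test, inner in nested_spatial_cv(coords, outer_k=3, inner_k=2):
    print(f"outer: {len(train)} train / {len(test)} test, {len(inner)} inner folds")
```

Because each test fold is a spatially contiguous cluster, test points are kept away from training points. Random folds, by contrast, place test points right next to training points, which under spatial autocorrelation is what makes non-spatial estimates overoptimistic. Model fitting is omitted here: in a full setup, each hyperparameter candidate would be scored on the `inner` folds and the best one refit on the outer training set before being evaluated on the outer test fold.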

Integrating R with Geographic Information Systems (GIS) extends R’s statistical capabilities with the numerous geoprocessing and data-handling tools available in a GIS. QGIS is one of the most popular open-source GIS, and it furthermore integrates other GIS programs such as the System for Automated Geoscientific Analyses (SAGA) GIS and the Geographic Resources Analysis Support System (GRASS) GIS within a single software environment. This, together with its Python API, makes QGIS a perfect candidate for console-based geoprocessing. By establishing an interface, the R package RQGIS makes it possible to use QGIS as a geoprocessing workhorse from within R. Compared to other packages building a bridge to a GIS (e.g., rgrass7, RSAGA, RPyGeo), RQGIS offers a wider range of geoalgorithms and is often easier to use thanks to various convenience functions. Finally, RQGIS supports the seamless integration of Python code via reticulate from within R for improved extensibility.
The R Journal, 2017

Recent Publications

(2018). Performance evaluation and hyperparameter tuning of statistical and machine-learning models using spatial data. Ecological Modelling (submitted).

(2017). RQGIS: Integrating R with QGIS for Statistical Geocomputing. The R Journal.

(2016). Forest DRAGON-3: Decadal trends of Northeastern Forests in China from Earth Observation Synergy. In Proc. ‘Dragon 3 Final Results & Dragon 4 Kick-Off Symposium’, Wuhan, PR China.

Recent blog posts

This guide reflects my view on how to set up a working Arch Linux system tailored towards data science, R and spatial analysis. If you …

Maybe you know that for some packages in R there is an entry ‘Package NEWS’ in the help pane of RStudio. However, it is a …

At work I usually have to connect to several servers. Some are Windows Servers, some are Linux …

Teaching

University of Munich

I teach within the Münchner R Kurse hosted by the Computational Stats group of LMU Munich.

University of Jena

  • Geo 404: Applied Geoinformatics (WS 2017/18, M.Sc.)
  • Geo 311: Geoinformatics III (WS 2016/17, B.Sc.)