Selected Publications

Machine-learning algorithms have gained popularity in recent years in the field of ecological modeling due to their promising predictive performance in classification problems. While the application of such algorithms has become much simpler thanks to their well-documented integration in commonly used statistical programming languages such as R, there are several practical challenges in the field of ecological modeling related to unbiased performance estimation, optimization of algorithms using hyperparameter tuning, and spatial autocorrelation. We address these issues in a comparison of several widely used machine-learning algorithms such as Boosted Regression Trees (BRT), weighted k-Nearest Neighbor (WKNN), Random Forest (RF) and Support Vector Machine (SVM) with traditional parametric algorithms such as logistic regression (GLM) and semi-parametric ones like generalized additive models (GAM). Different nested cross-validation methods, including hyperparameter tuning methods, are used to evaluate model performances with the aim of obtaining bias-reduced performance estimates. As a case study, the spatial distribution of the forest disease Diplodia sapinea in the Basque Country in Spain is investigated using common environmental variables such as temperature, precipitation, soil or lithology as predictors. Results show that GAM and RF (mean AUROC estimates 0.708 and 0.699) outperform all other methods in predictive accuracy. The effect of hyperparameter tuning saturates at around 50 iterations for this data set. The AUROC differences between the bias-reduced (spatial cross-validation) and overoptimistic (non-spatial cross-validation) performance estimates of the GAM and RF are 0.167 (24%) and 0.213 (30%), respectively. It is recommended to also use spatial partitioning for cross-validation in hyperparameter tuning of spatial data.
Ecological Modelling (submitted), 2018
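
To illustrate the difference between non-spatial and spatial partitioning mentioned in the abstract, here is a minimal Python sketch. It is not the method used in the paper: the rectangular tile scheme, the synthetic 10 x 10 point grid, and the function names `random_folds`, `tile_folds` and `mean_nearest_train_distance` are all assumptions made for illustration. It only shows that test points in spatial folds lie farther from their training data than test points in random folds.

```python
import math
import random
from collections import defaultdict


def random_folds(n_points, k, seed=0):
    """Non-spatial CV: shuffle the indices and deal them into k folds."""
    rng = random.Random(seed)
    indices = list(range(n_points))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def tile_folds(points, n_cols, n_rows):
    """Spatial CV (illustrative assumption): one fold per rectangular tile."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def tile_index(value, lo, hi, n_tiles):
        # Map a coordinate to a tile number in 0 .. n_tiles - 1.
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * n_tiles), n_tiles - 1)

    tiles = defaultdict(list)
    for i, (x, y) in enumerate(points):
        col = tile_index(x, x_min, x_max, n_cols)
        row = tile_index(y, y_min, y_max, n_rows)
        tiles[(col, row)].append(i)
    return [tiles[key] for key in sorted(tiles)]


def mean_nearest_train_distance(points, folds):
    """Average distance from each test point to its closest training point."""
    distances = []
    for test in folds:
        held_out = set(test)
        train = [p for i, p in enumerate(points) if i not in held_out]
        for i in test:
            distances.append(min(math.dist(points[i], q) for q in train))
    return sum(distances) / len(distances)


# Synthetic sample locations on a regular grid (hypothetical data).
points = [(float(x), float(y)) for x in range(10) for y in range(10)]
random_cv = random_folds(len(points), k=4)
spatial_cv = tile_folds(points, n_cols=2, n_rows=2)

print("random folds :", mean_nearest_train_distance(points, random_cv))
print("spatial folds:", mean_nearest_train_distance(points, spatial_cv))
```

With random folds, almost every test point has a training point right next to it, so spatially autocorrelated data can make a model look better than it is. Tile-based folds push test points away from the training data, which is one plausible reason why spatial estimates tend to be lower.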

Integrating R with Geographic Information Systems (GIS) extends R’s statistical capabilities with numerous geoprocessing and data handling tools available in a GIS. QGIS is one of the most popular open-source GIS, and it furthermore integrates other GIS programs such as the System for Automated Geoscientific Analyses (SAGA) GIS and the Geographic Resources Analysis Support System (GRASS) GIS within a single software environment. This, together with its Python API, makes QGIS a perfect candidate for console-based geoprocessing. By establishing an interface, the R package RQGIS makes it possible to use QGIS as a geoprocessing workhorse from within R. Compared to other packages building a bridge to GIS (e.g., rgrass7, RSAGA, RPyGeo), RQGIS offers a wider range of geoalgorithms, and is often easier to use due to various convenience functions. Finally, RQGIS supports the seamless integration of Python code using reticulate from within R for improved extensibility.
The R Journal, 2017

Recent Publications

(2018). Performance evaluation and hyperparameter tuning of statistical and machine-learning models using spatial data. Ecological Modelling (submitted).

Preprint

(2017). RQGIS: Integrating R with QGIS for Statistical Geocomputing. The R Journal.

PDF Project

(2016). Forest DRAGON-3: Decadal trends of Northeastern Forests in China from Earth Observation Synergy. In Proc. ‘Dragon 3 Final Results & Dragon 4 Kick-Off Symposium’, Wuhan, PR China.

PDF

Recent blog posts


This guide reflects my view on how to set up a working Arch Linux system tailored towards data science, R and spatial analysis. If you have suggestions for modifications, please open an issue at https://github.com/pat-s/antergos_setup_guide. Enjoy the power of Linux!


Maybe you know that for some R packages there is an entry ‘Package NEWS’ in the help pane of RStudio. However, it is a bit of a mystery for maintainers how to provide this NEWS entry, especially given the recent widespread use of NEWS.md in R package development.


At work I usually have to connect to several servers. Some are Windows servers, some are Linux servers. On my local Linux machines (running Kubuntu 17.10 at the time of writing) I usually used /etc/fstab entries. However, the fstab approach does not mount the shares on boot and always requires manual re-mounting. I was told that automatic mounting during boot via fstab used to work, but I never managed to get it working, although I tried several mount options such as _netdev.


Projects

R package 'mlr' (Contributor)

Machine Learning in R

Forest-DRAGON 3 (2013 - 2016)

ESA-MOST Dragon 3 cooperation program

LIFE Healthy Forest (2015 - 2019)

Early detection and advanced management systems to reduce forest decline caused by invasive and pathogenic agents

R package 'RQGIS' (Author)

Integrating R with QGIS

R package 'oddsratio' (Creator)

Simplified odds ratio calculation of binomial GAM/GLM models

R package 'sperrorest' (Author)

Spatial error estimation and variable importance assessment

Teaching

Munich R Courses (University of Munich)

I contribute to various courses of the Münchner R Kurse hosted by the Computational Stats group of LMU Munich. Currently, I teach the ggplot2 course.

M.Sc. Geoinformatics (University of Jena)

Winter term 2017/2018

  • Geo 404: Applied Geoinformatics
    • R Introduction
    • Handling of raster data in R
    • Literate Programming in R
    • Big data handling & parallelization in R

B.Sc. Geography (University of Jena)

Winter term 2016/2017

  • Geo 311 (3rd year B.Sc.): Geoinformatics III
    • Introduction to Python
    • Python for geoprocessing -> arcpy
    • Introduction to scientific reporting using Rmarkdown
    • Tools: R, Python, ArcGIS, arcpy