Antergos/Arch Linux setup guide tailored towards data science, R and spatial analysis

This guide does not claim to be complete. It reflects my view on how to set up a working Arch Linux system tailored towards data science, R and spatial analysis. If you have suggestions for modifications, please open an issue. Enjoy the power of Linux!

I recommend using Antergos. Officially it's a distribution, but most people refer to it as a graphical installer for Arch Linux. It comes with a choice of 6 different desktop environments. Choose a desktop that suits you. The desktop environment is responsible for the look, feel and standard applications of your installation. See this comparison for some inspiration.
But don’t worry: You can seamlessly switch between the desktops at the login screen of Antergos. This way you can try out all options and choose the one that suits you best (my favorite is KDE). What makes Antergos a distribution rather than an “installer only” is the fact that it also comes with its own libraries maintained by the Antergos developers.

First, create an installer by following this guide. If you want to set up a dual boot, this guide is a good resource. Make sure to check out the ArchWiki FAQs and Arch compared to other distributions - ArchWiki to get a better understanding of Arch.

1. Installation

1.1 Install options

During installation you have several options to choose from. Some are up to personal liking (e.g. Browser choice), others are important for a solid system:

Whether you want to use the LTS Linux kernel or the most recent one is up to you. I never faced any problems with the most recent one but the LTS one is theoretically the safer option.

1.2 Setting up the partitions

Several valid concepts exist on how to partition a Linux system. The following reflects my current view:

  1. Select “Manual” partitioning when prompted
  2. Create a partition of 1 GB. Mount point: /boot/efi. Format: fat32
  3. Within the extended partition, create a sub-partition from the remaining space. It should be slightly larger than your RAM size (e.g. for 16 GB RAM use a 16.5 GB partition). Format: Linux Swap
  4. Create a 50 - 100 GB partition for “root”. Mount point: /. Format: ext4
  5. With the remaining space create “home”. Mount point: /home. Format: ext4
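
The swap rule from step 3 can be checked quickly on a running Linux system. This is only a sketch: the helper name suggest_swap_gb and the fixed 0.5 GB margin are my assumptions, taken from the 16 GB -> 16.5 GB example above. Keep in mind that /proc/meminfo reports slightly less than the physically installed RAM.

```shell
# Suggest a swap size (in GB) that is slightly larger than the given RAM size.
# The 0.5 GB margin is an assumption based on the 16 GB -> 16.5 GB example.
suggest_swap_gb() {
  awk -v ram="$1" 'BEGIN { printf "%.1f\n", ram + 0.5 }'
}

# Usable RAM in GB, read from /proc/meminfo (the value there is in kB)
ram_gb=$(awk '/^MemTotal:/ { printf "%.1f", $2 / 1024 / 1024 }' /proc/meminfo)
echo "RAM: ${ram_gb} GB, suggested swap: $(suggest_swap_gb "$ram_gb") GB"
```

Round the result up to whatever your partitioning tool lets you enter.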

2. Installing the package manager

(Sidenote: If you discover that your input source is missing (e.g. German keyboard layout), then do sudo gedit /etc/locale.gen and remove the comment sign from your desired locale. Afterwards run sudo locale-gen and the locale becomes selectable in “Region & Language”. This happened to me in GNOME Desktop.)

The package manager meant here is an AUR helper. In my opinion the best one currently is trizen. Here is a list (AUR helpers - ArchWiki) comparing alternatives (scroll to the bottom).

Install trizen:

git clone https://aur.archlinux.org/trizen-git.git
cd trizen-git
makepkg -si

(Whether to install the git version or the latest release is up to you).

In ~/.config/trizen/trizen.conf set “noedit” to “1” to avoid being prompted to edit source code on every install. (Optional) Install cyclon, a wrapper around trizen that also covers other tasks (system maintenance, etc.).

2.1 (Optional) Install and configure zsh

The zsh (Z-shell) is an alternative to the default installed bash (Bourne-again Shell). It has several advantages (file globbing, visual appearance, etc.). You can of course also stick with bash. However, then you need to adapt the following script to make it work with bash if you want to use all the defined helper functions.

To set it up, do the following (see GitHub - sorin-ionescu/prezto: The configuration framework for Zsh). First, install the “Z-shell”: trizen zsh and use it: zsh. Within the z-shell, execute the following:

git clone --recursive https://github.com/sorin-ionescu/prezto.git "${ZDOTDIR:-$HOME}/.zprezto"

setopt EXTENDED_GLOB
for rcfile in "${ZDOTDIR:-$HOME}"/.zprezto/runcoms/^README.md(.N); do
  ln -s "$rcfile" "${ZDOTDIR:-$HOME}/.${rcfile:t}"
done

chsh -s /bin/zsh

Log out and log back in. I prefer using the agnoster theme. Simply set theme: agnoster in line 116 of ~/.zpreztorc. Afterwards, set up some custom wrapper functions (aliases) around trizen to simplify usage:

In ~/.zshrc, append the following line:

source "${ZDOTDIR:-$HOME}/.zprezto/pac.zsh"

Next, create the following script ~/.zprezto/pac.zsh.
KDE: kate ~/.zprezto/pac.zsh
GNOME: gedit ~/.zprezto/pac.zsh
(With kate (KDE) or gedit (GNOME) you can also handle all following “file opening/creation” tasks. Files inside your home directory do not need sudo.)

pac () {
  case $* in
    install* ) shift 1; cd ~ && trizen -S "$@";;
    get* ) shift 1; cd ~ && trizen -G --aur "$@" ;;
    remove* ) shift 1; cd ~ && trizen -R --aur "$@" ;;
    search* ) shift 1; cd ~ && trizen -s "$@";;
    update-git* ) shift 1; cd ~ && trizen -Syu --devel --show-ood;;
    update* ) shift 1; cd ~ && trizen -Syu --needed --show-ood;;
    * ) echo "Invalid choice, see ~/.zprezto/pac.zsh for available commands." ;;
  esac
}

Open a new terminal window and the function pac should be available now. You can now call pac with all arguments listed above (install, search, etc.). Check GitHub - trizen/trizen: Lightweight AUR Package Manager for an explanation of the created aliases.

  • pac install <pkg>: Install the specified package (if it exists).
  • pac search <pkg>: Executes a search with the specified <pkg> returning all matches. You can then type a number of the package you want to install. Package will be moved to ~/pkgs.
  • pac update: Update all installed packages (from both Arch repos and AUR). Shows packages which are marked as “out-of-date” by the community.
  • pac update-git: Updates all packages installed from git.
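
To see how the dispatch in pac works without installing anything, here is a dry-run sketch. The name pac_dry is hypothetical and not part of trizen. It only prints the trizen call that the matching branch would run, and it matches the first word exactly instead of using a prefix pattern.

```shell
# Dry-run variant of the pac dispatcher (hypothetical helper):
# prints the trizen call instead of executing it.
pac_dry() {
  local sub=$1
  # Drop the subcommand so that "$@" holds only the package names
  [ $# -gt 0 ] && shift
  case $sub in
    install)    echo trizen -S "$@" ;;
    search)     echo trizen -s "$@" ;;
    update-git) echo trizen -Syu --devel --show-ood ;;
    update)     echo trizen -Syu --needed --show-ood ;;
    *)          echo "Invalid choice" ;;
  esac
}

pac_dry install gdal postgis
```

Because the subcommand is compared as a whole word, the order of the update-git and update branches no longer matters here.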

Note: Git packages are always built from source and certain packages may take some time to install. Don’t call that command daily. On the other hand, git packages will never update automatically as they are just a snapshot build of the (at the time of installation) most recent state of the respective repository. So think twice if you need a git package, as it is your responsibility to update it.

A helpful additional argument that you could add to the wrapper functions is --movepkg-dir=pkgs. It will move all built packages (<package.tar.xz>) into ~/pkgs. This has the advantage that you do not need to rebuild a package that took a long time to install if you want to re-install it: just do a pacman -U ~/pkgs/<package.tar.xz>. However, it will store all packages and upgrades that you do, so it can quickly become a very large folder that clutters your disk space. Think twice if you need it!
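
If you do keep built packages in ~/pkgs, a small helper makes it easier to spot the old ones before the folder grows too large. This is a sketch under assumptions: old_pkgs is a made-up name, and the '*.pkg.tar.*' pattern assumes the usual file naming of built Arch packages.

```shell
# List built packages in a directory that are older than a number of days.
# $1 = package directory (e.g. ~/pkgs), $2 = minimum age in days
# The file name pattern is an assumption about how built packages are named.
old_pkgs() {
  find "$1" -maxdepth 1 -type f -name '*.pkg.tar.*' -mtime +"$2"
}

# Example: show everything older than 90 days
# old_pkgs ~/pkgs 90
```

Running du -sh ~/pkgs now and then shows how much space the folder takes. Review the list before deleting anything.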

2.2 Enabling parallel compiling

Compiling packages from source can take time. To speed this up with parallel compilation, set the MAKEFLAGS variable in /etc/makepkg.conf: MAKEFLAGS="-j$(nproc)". This will use all available cores on your machine for compiling.
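
To see what that setting resolves to on your machine, you can print the line with the core count filled in. A minimal sketch; makeflags_line is just an illustrative name.

```shell
# Print the MAKEFLAGS line with the core count of this machine filled in.
# nproc (part of coreutils) reports the number of available processing units.
makeflags_line() {
  printf 'MAKEFLAGS="-j%s"\n' "$(nproc)"
}

makeflags_line
```

Afterwards, grep '^MAKEFLAGS' /etc/makepkg.conf shows whether an uncommented MAKEFLAGS line is actually active.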

3. System related

3.1 Installing system libraries

For the following install calls, you can use either trizen or (if you added the zsh wrapper functions above) pac. Calling trizen <package> first searches the AUR and then installs the selected package; the pac equivalent is pac search <package>. Calling pac install <package> installs the given package directly.

Never install python libraries via pip! All AUR packages try to install required python packages from AUR and if these have been installed via pip you will face conflicts.

Always install them via your package manager, e.g. for numpy: pac install python-numpy.
Python Modules for QGIS: QGIS needs some external python libraries to not throw errors on startup:

pac install python-gdal python-yaml python-jinja python-psycopg2 python-owslib python-numpy python-pygments

Other important system libraries:

  • pac install gdal
  • pac install udunits
  • pac install postgis
  • pac install jdk8-openjdk openjdk8-src (JDK9 still has problems with some R packages)
  • pac install texlive-most (this is a wrapper installation that installs the most important tex libraries. Similar to texlive-full on other Linux distributions.)
  • pac install pandoc-bin pandoc-citeproc-bin (for all kinds of Rmarkdown stuff. Make sure to install this version, as the one in the community repository comes with 1 GB of Haskell dependencies!)
  • pac install hugo (if you are a blogger using the R package blogdown)

3.2 Apps

Opinionated applications:

Messaging: pac install franz
Mail: pac install mailspring
Notes: pac install boostnote
Reference Manager: pac install jabref
Google Drive: pac install insync
Dropbox: pac install nautilus-dropbox
GIS: pac install qgis
SAGA: pac install saga-gis
Skype: pac install skypeforlinux-preview-bin
Screenshot tool: pac install shutter
Image viewer: pac install xnviewmp
Virtualbox: VirtualBox – wiki.archlinux.de
Terminal: pac install tilix
Browser: pac install vivaldi-snapshot
Dock: pac install latte-dock ([KDE only] If you prefer a dock layout over the default layout)
Twitter client: pac install corebird

A note on Mailspring: If you are on battery, quit Mailspring because it consumes a lot of battery as of v1.2.2.

3.3 Editors

Editors are an important topic, so I devote an extra section to them. You can use editors to only edit text files, but they can also be used as an IDE for coding. There are many editors out there, each with its own loyal following.

Here’s a list of the most common ones (this list does not claim to be complete):

  • Vim
  • Emacs
  • Sublime Text
  • Visual Studio Code
  • Atom
  • Kate (KDE default)
  • Nano
  • etc.

Some are tailored more towards programmers and command-line work, others aim to be a full coding IDE.

3.3.1 Sublime Text 3

Recently I started to use Sublime Text 3 as my system editor. This means all files that are not bound to a different application are opened by Sublime Text 3. Although Sublime Text is not free anymore since v3, you can find free license keys on the web. But it is also usable without a license key. You can find my user settings here. A downside is that it is not capable of displaying HTML or PDF files and does not come with an auto-compile feature if you are editing markdown files. The biggest advantage is probably its speed. Compared to Atom it is much faster in opening larger text files (e.g. logs).

3.3.2 Atom

I use Atom for all text editing + compiling. It is completely free and also relies heavily on user packages. However, finding properly maintained packages is a bit tedious. Once configured correctly, it does a great job for all kinds of writing (LaTeX, Markdown) as it has some neat packages for auto-reload of HTML and PDF files. I usually have two panes open: On the left I edit my document and on the right I have the live preview. You can check my settings and discover some packages that you may find useful. (See Visual Studio Code)

3.3.3 Visual Studio Code

I usually try to pass on Microsoft products. But this IDE is by far the best I’ve ever come across. The plugins are awesome AND maintained! It's fast, has a nice look and does everything the way I always hoped that Atom would do it. So give it a try!

3.4 Office

pac install libreoffice-fresh

If you are on a KDE Desktop, LibreOffice may flicker black/white. This is caused by OpenGL.

To solve it, set the value item in the following two lines of ~/.config/libreoffice/4/user/registrymodifications.xcu to false:

<item oor:path="/org.openoffice.Office.Common/VCL"><prop oor:name="ForceOpenGL" oor:op="fuse"><value>false</value></prop></item>
<item oor:path="/org.openoffice.Office.Common/VCL"><prop oor:name="UseOpenGL" oor:op="fuse"><value>false</value></prop></item>
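
If you prefer not to edit the file by hand, the same change can be scripted. This is a sketch that assumes both lines already exist in the file with the value true and in exactly the single-line form shown above; disable_lo_opengl is a hypothetical helper name. Close LibreOffice and make a backup copy of the file first.

```shell
# Set ForceOpenGL and UseOpenGL to false in a registrymodifications.xcu file.
# $1 = path to the file. Only lines carrying one of the two property names
# are touched; all other settings stay as they are.
disable_lo_opengl() {
  sed -i -E '/oor:name="(ForceOpenGL|UseOpenGL)"/ s|<value>true</value>|<value>false</value>|' "$1"
}

# Example:
# disable_lo_opengl ~/.config/libreoffice/4/user/registrymodifications.xcu
```

If the two lines are missing entirely, the function changes nothing and you need to add them manually.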

Additionally, I recommend installing the “Papirus Icon theme” for LibreOffice: pac install papirus-libreoffice-theme.

4. R

4.1 General

ccache

For fast package (re-)installation using ccache, put the following into ~/.R/Makevars:

VER=
CCACHE=ccache
CC=$(CCACHE) gcc$(VER)
CXX=$(CCACHE) g++$(VER)
CXX11=$(CCACHE) g++$(VER)
CXX14=$(CCACHE) g++$(VER)
FC=$(CCACHE) gfortran$(VER)
F77=$(CCACHE) gfortran$(VER)

Additionally, install ccache on your system: pac install ccache. See this blog post by Dirk Eddelbuettel as a reference.

To use R from the shell without a prior defined mirror, you need the system libraries tcl and tk to launch the mirror selection popup (pac install tcl tk).

4.2 R & RStudio

R with optimized Openblas / LAPACK

Next, install either the “Intel MKL” or libopenblas to be used in favor of the standard “libRlapack/libRblas” libraries that are shipped with the default R installation. These libraries are responsible for numerical computations and offer impressive speedups compared to the default ones used by R. Thanks @marcosci for the hint. While the “Intel MKL” library is the fastest according to the benchmarks, it's also much more complicated to install.

libopenblas will automatically be used if it's installed since the default R installation on Arch is configured with the --with-blas option (see section A.3.1 in https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Installation). I recommend installing the AUR package openblas-lapack as it is a package combining multiple libraries: pac install openblas-lapack.

To verify your installation in R, simply run sessionInfo() and check the printed information:

sessionInfo()

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblas_haswellp-r0.2.20.so
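
The same check can be done from the shell without opening an R session. A small sketch: uses_openblas is a hypothetical helper that reads sessionInfo() output on standard input and looks for “openblas” in the BLAS/LAPACK lines, as in the output above.

```shell
# Read sessionInfo() output on stdin and succeed if the BLAS/LAPACK lines
# point to an OpenBLAS library (judged by the file name only).
uses_openblas() {
  grep -E '^(BLAS|LAPACK|BLAS/LAPACK):' | grep -qi 'openblas'
}

# Example (requires R to be installed):
# Rscript -e 'sessionInfo()' | uses_openblas && echo "OpenBLAS is active"
```

If the check fails, the BLAS line most likely still points to the libRblas library that ships with R.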

Nevertheless, if you want to try out the “Intel-MKL” library, follow these instructions:

There is an AUR package that provides R compiled with intel-mkl named r-mkl. Note: The download size of intel-mkl is around 4 GB and the installation takes a lot of memory. Most of it will be stored in the swap (around 10 GB), so make sure your swap space is > 10 GB.

Also, to successfully install intel-mkl, you need to temporarily increase the size of the /tmp directory as intel-mkl requires quite some space: sudo mount -o remount,size=20G,noatime /tmp.
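
Before starting the long build it is worth checking that the remount worked. A minimal sketch, with has_free_gb as a hypothetical helper name and the 20 GB threshold taken from the remount command above:

```shell
# Succeed if the filesystem holding $1 has at least $2 GB available.
# df -Pk prints one line per filesystem with sizes in 1024-byte blocks;
# column 4 is the available space.
has_free_gb() {
  local avail_kb
  avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Example:
# has_free_gb /tmp 20 && echo "/tmp is large enough" || echo "/tmp is too small"
```

The swap size can be read in a similar way from the SwapTotal line in /proc/meminfo.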

RStudio

Use pac search rstudio and pick your favorite release channel. During installation R will get installed as a dependency (if you have not already done so).

4.3 Packages

Open RStudio and install the R package usethis (it will install quite a few dependencies, get a coffee :D) and then call usethis::browse_github_pat(). Follow the instructions to set up a valid GITHUB_PAT environment variable that will be used for installing packages from Github.

4.3.1 Task view “Spatial”

Of course it is not required to install all packages of a task view. You will never use all of them. In my opinion, however, it is pretty neat to have one command that installs (almost) all packages I use in a certain field. I do not care about the additional packages installed.

Required system libraries:

  • jq (pac install jq)
  • fortran (pac install gcc-fortran)
  • v8-3.14 (pac install v8-3.14)
  • tk (pac install tk)
  • nlopt (pac install nlopt)
  • gsl (pac install gsl)

Some R packages (geojsonlite, etc.) require the V8 package which depends on the outdated v8-3.14 library.

For rJava we need to do sudo R CMD javareconf.

Now you can install the ctv package and then do ctv::install.views("Spatial"). This will install all packages listed in the spatial task view.

Packages that error during installation (Please report back if you have a working solution):

  • ProbitSpatial
  • spaMM
  • RPyGeo (Windows only)

4.3.2 Task view “Machine Learning”

Required system libraries:

  • pac install nlopt

Packages that error during installation (Please report back if you have a working solution):

  • interval (requires Icens from Bioconductor)
  • LTRCtrees (requires Icens from Bioconductor)

4.4 Github repos

I like the command line way of creating a repo from Github using the usethis package. To make this work, a few preparatory steps are needed:

  1. Make sure that the local directory in which you want to store your Github repos has 777 permissions. This usually is not the case if you create the directory. If the permissions are wrong, usethis::create_from_github() will not be able to write files there. Example: sudo chmod 777 ~/git
  2. Make sure your ssh keys have the right permissions: sudo chmod 600 ~/.ssh/id_rsa, sudo chmod 644 ~/.ssh/id_rsa.pub
  3. Add your ssh-key to the ssh agent: ssh-add ~/.ssh/id_rsa
  4. For me, the sshaskpass does not work even though everything seems to be set up correctly. That’s why I always have to hand over the information manually when using create_from_github(). To simplify this process, I have the following information in my ~/.Rprofile:

cred <- git2r::cred_ssh_key(publickey = "~/.ssh/id_rsa.pub", privatekey = "~/.ssh/id_rsa")

This object is then passed to the credentials argument in create_from_github().

Now clone all your repos from Github, e.g. create_from_github(repo = "pat-s/oddsratio", destdir = "~/git", credentials = cred).

The little overhead is really worth it: You have a working ssh setup and by reusing the command and just replacing the repo name, the cloning of all your repos is done within minutes!
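
The key permissions from step 2 can be applied and verified in one go. A sketch, assuming the key pair uses the default names id_rsa and id_rsa.pub; fix_key_perms is a hypothetical helper. For files you own, chmod works without sudo.

```shell
# Apply the permissions from step 2 to a key pair and print the result.
# $1 = directory holding id_rsa and id_rsa.pub (normally ~/.ssh)
fix_key_perms() {
  chmod 600 "$1/id_rsa"       # private key: readable by the owner only
  chmod 644 "$1/id_rsa.pub"   # public key: world-readable is fine
  stat -c '%a %n' "$1/id_rsa" "$1/id_rsa.pub"
}

# Example:
# fix_key_perms ~/.ssh
```

The printed lines should start with 600 for the private key and 644 for the public key.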

5. Accessing remote servers

5.1 File access (file manager)

There are multiple ways to do so (Auto-mount network shares (cifs, sshfs, nfs) on-demand using autofs | Patrick Schratz, fstab - ArchWiki).

Here is an example of an fstab setup for an sshfs (to Linux server) and a cifs (to Windows server) mount. Append those lines to /etc/fstab; don’t overwrite the existing content, as this will result in boot errors!

# sshfs
sshfs#<username>@<ip>:<remote mount point> <local mount point> fuse        reconnect,idmap=user,transform_symlinks,identityFile=~/.ssh/id_rsa,allow_other,cache=yes,kernel_cache,compression=no,default_permissions,uid=1000,gid=100,umask=0,_netdev,x-systemd.after=network-online.target   0 0

# cifs
//<ip>/<remote mount point> <local mount point> cifs        credentials=/etc/.smbcredentials.txt,uid=1000,file_mode=0775,dir_mode=0775,gid=100,sec=ntlm,vers=1.0,dom=ads.uni-jena.de,forcegid,_netdev,x-systemd.after=network-online.target 0 0

Notes:

  • (cifs) Depending how new the Windows server is, you do not need vers=1.0.
  • (cifs) Store your login credentials for the windows server in a file, e.g. /etc/.smbcredentials.txt with contents being username = <username> and password = <password>.
  • (sshfs) Copy ~/.ssh/id_rsa to /root/.ssh/ as the mount will be executed by the root user.
  • (cifs) Install the Arch Linux kernel headers for the cifs package to work (and later on for Virtualbox): pac install linux-headers
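
Since the credentials file holds a plain-text password, it should only be readable by its owner. A minimal sketch for creating it; write_smb_credentials is a hypothetical helper, and writing the keys without spaces around the equals sign is my assumption about the safest form of the file.

```shell
# Write a cifs credentials file that only its owner can read.
# $1 = target file, $2 = username, $3 = password
write_smb_credentials() {
  # The subshell keeps the restrictive umask from leaking into the session
  ( umask 077; printf 'username=%s\npassword=%s\n' "$2" "$3" > "$1" )
  chmod 600 "$1"
}

# Example (run as root for a file below /etc):
# write_smb_credentials /etc/.smbcredentials.txt <username> <password>
```

Note that a password typed on the command line ends up in your shell history, so you may prefer to create the file in an editor and only run the chmod 600 part.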

Reboot.

5.2 Command-line access (Terminal)

You can easily connect to all servers you have access to with a little one-time effort.

Terminal applications are capable of storing “Profiles” that save the configuration to connect to a specific server. In this example I refer to the application tilix.

  1. Create a profile for each server
  2. Under <profile name> -> Command -> [x] Run a custom command instead of my shell put your ssh command in, e.g. ssh <username>@<servername>.
  3. Open all profiles in tabs: Select the desired profile and click “New session” (the left one of the three buttons at the top).
  4. Save each open profile with save as in a folder of your liking.
  5. Create a wrapper script that loads exactly this configuration:

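#!/bin/bash

# Folder that contains all saved tilix session files
TILIX_SESSIONS_FOLDER=<path to folder with all saved configuration files>

# Collect one "-s <file>" pair per saved session.
# An array keeps file names with spaces intact.
TILIX_OPTS=()

for session in "$TILIX_SESSIONS_FOLDER"/*; do
  TILIX_OPTS+=(-s "$session")
done

# Open tilix with all saved sessions
tilix "${TILIX_OPTS[@]}"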

For convenience, set the execution of this script as an alias in ~/.zpreztorc:

alias servers='bash ~/tilix_all.sh'

Now all you need to do is type servers to load a terminal configuration with all your server connections.

6. Desktop related

KDE

If you want to use an automatic login to a VPN and the networkmanager-daemon (e.g. Openconnect) does not store your password, try the network-manager-applet package. It is the GNOME network-manager and, for some reason, has no problems with storing the password.

7. Laptop battery life optimization

Although the Linux kernel has a lot of power saving options, they are not all enabled by default.

There are two main power optimization tools:

  • Powertop
  • TLP

I prefer tlp as powertop often causes trouble with USB devices going into sleep mode. Also, applying the changes on boot is easier with tlp.

Do pac install tlp and then follow the instructions on TLP - ArchWiki to configure it correctly. powertop though is useful to check the applied settings. Do sudo powertop and go to the “tunables” section and check if most settings are “GOOD” (most are “BAD” before applying tlp).

8. Additional stuff

8.1 arara

arara is a TeX automation tool based on rules and directives (GitHub - cereda/arara). Install it with pac install arara-git

8.2 latexindent.pl: Required perl modules

latexindent is a library which automatically indents your LaTeX document during compilation: GitHub - cmhughes/latexindent.pl

pac install perl-log-dispatch perl-dbix-log4perl perl-file-homedir perl-unicode-linebreak

8.3 Editor schemes

I use the Dracula scheme in almost all applications. While it comes integrated into RStudio, here are installation instructions for Kate and Tilix.

8.4 Fonts

I enjoy using Fira Code. I use it as a coding font in all editors (monospace ftw) but also as a system wide font (the “medium” variant) with size 10. Another great monospace coding font is Iosevka.

pac search fira-code

In browsers I like to use Asana math.

8.5 Icon themes

There are two awesome icon themes: Papirus and numix.

Try them and choose for yourself. You will see what a tremendous impact good icons can have on your daily work.

8.6 Desktop Themes

I am using KDE. My overall desktop theme favorite is “Adapta”. Set it via “System Settings -> Workspace theme -> Desktop Theme”. For “Look and Feel” I prefer “Arc Dark”.

To install these, simply click on “Get new looks” on the bottom right when you are in “System Settings -> Workspace theme”.

8.7 Presentations

To create presentations I use the R package xaringan. Usually I convert the resulting HTML slides to PDF using decktape (install with pac install nodejs-decktape) and present the talk using impressive (install with pac install impressive).
