
Automating My Workstation Setup

As much as I love daily-driving Linux on my main workstation for both work and personal use, it became a double-edged sword: the tremendous level of customization that made my day-to-day more ergonomic and powerful also became an asset I had to manage, with all its accumulated complexity. Wiping and reinstalling my machine, for instance, became a dreaded task, as I'd have to carefully piece back together the many configuration files and dependencies of my increasingly unstable digital house of cards. It was error-prone, it took a lot of time, and forget about replicating the configuration on other machines or sharing it with other people.

As the adage goes: don't repeat yourself (DRY). I had identified a task I was executing manually that could be automated. What I wanted was a single source of truth - one point of control that documents the entire process of configuring my machine, makes it easy to synchronize between devices, and can be shared with others. More importantly, it would give me something to write about in this blog.

I'll mention dotfiles a few times. In case you don't know, these are files prefixed with a dot (such as ".gitignore"), traditionally used as user configuration files that control how a program behaves - though nowadays not all of them actually begin with a dot. They are a big part of the challenge of configuring a machine after a fresh install: over the years I've collected files that define keybindings, aliases, fonts, visual themes, and so on.

Through the years, I've iterated through, or at least considered, different solutions. I'll briefly go over some of them; they're all worth your consideration.

Home partition

This is the simplest of the bunch - when installing a new distro, always set up the home directory as a separate partition, and keep a copy of it with you at all times. This makes it easy to aggregate your dotfiles, and it creates a single point of control. The separate partition strategy also allows you to do something neat, as mentioned in this Reddit post:

I used to have the problem of switching between PC’s and Laptop’s whilst changing my setup/rice almost daily.

Now I just mount an USB stick to my home directory on every device.

(…)

With USB 3.0 there are no noticeable performance differences. Though keep in mind that I don’t have the biggest home directory, its about 2G.

To implement this setup I’ve just added the following to the fstab of every install and gave the USB the label SHARED_HOME.

LABEL=SHARED_HOME /home/user/ ext4 rw,relatime,nofail 0 2

Certainly interesting. This has some nuances, though. First of all, consider backups! It's pretty dangerous to centralize your work on a feeble external device that will be subject to wear and tear. On top of that, your performance will take a hit, as you'll be downgrading from the SSD you'd normally be using - though this can be softened with an external NVMe drive. Separate-home-partition apologists might say that this gives you the flexibility to swap your home partition between different Linux distributions, but in practice your mileage may vary, as different distributions may set up programs differently, so don't expect a "plug and play" experience. If you keep the same distribution between devices, this is a cute way to keep one point of work between, say, a work desktop and a work laptop.
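If you do go this route, even a crude scheduled backup softens the risk. As a sketch - the paths and schedule here are my own assumptions - a crontab entry could mirror the stick to the internal disk every hour:

    # crontab entry: hourly one-way mirror of the shared home to the internal disk
    0 * * * * rsync -a --delete /home/user/ /var/backups/shared_home/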

I considered this for a while, but it's ultimately too limited. It's a good way to keep your own files in sync between multiple places, but not so much to configure a freshly installed machine. While it can mount program configuration files, it won't install the programs for you, set up systemd services, configure new users, and so on.

GNU Stow

The first tool I used. It's mentioned a lot by people with the same problem as mine, and with good reason. It works as a symlink mapper, and it's extremely simple in nature - a shell tool that takes a directory as input and recreates its structure at a destination path using symbolic links. This is very convenient for managing your configuration files - for example, consider this file structure:

    my_dotfiles/
    ├─ git/
    │  ├─ .gitconfig
    ├─ tmux/
    │  ├─ .tmux.conf
    ├─ htop/
    │  ├─ .config/
    │  │  ├─ htop/
    │  │  │  ├─ htoprc
    ├─ dunst/
    │  ├─ .config/
    │  │  ├─ dunst/
    │  │  │  ├─ dunstrc
    │  ├─ bin/
    │  │  ├─ dunst_pause
    │  │  ├─ dunst_resume
    ├─ i3/
    │  ├─ .i3/
    │  │  ├─ config

There's a clear separation between programs - in this case git, tmux, htop, dunst, and i3. For each, there's a directory structure that stow will map directly onto the target directory - in this case, the home directory. Say we want to configure dunst; we'd run the following command from within my_dotfiles on the target machine:

stow --target="$HOME" dunst

This recursively symlinks the content of the dunst/ directory into our home directory - in this case, creating symlinks at ~/.config/dunst/dunstrc, ~/bin/dunst_pause, and ~/bin/dunst_resume.
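A few more invocations worth knowing - the package names follow the example layout above, and the repo location is an assumption:

    cd ~/my_dotfiles                              # assuming the repo was cloned here
    stow --target="$HOME" git tmux htop dunst i3  # stow several packages at once
    stow --target="$HOME" -R dunst                # -R restows (handy after moving files around)
    stow --target="$HOME" -D dunst                # -D deletes the package's symlinks again

Note that when a target directory doesn't exist yet, stow may link the directory itself rather than its individual files - a space-saving behavior it calls tree folding.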

I like this approach because, as mentioned, it's extremely simple. I explained how it works in a few paragraphs, and it doesn't get much more complicated than that. It handles all your symlink needs, which is useful for synchronizing both user configurations and system files - such as Synaptics touchpad settings.

Its simplicity is also a weakness. Much like the separate home partition solution, it's good for managing configuration files, but not much else. At least in this case you can control which programs are configured at any given time, which gives you some more flexibility. It also lends itself to being hosted in a git repository, so you can track all your changes.

Chezmoi

An honorable mention that sits between Stow and the heavier options below. Chezmoi is a dedicated dotfile manager: it keeps your files in a source directory (typically backed by a git repository) and applies them to your home directory, adding templating to handle per-machine differences and integrations for pulling secrets from password managers. If your needs stop at dotfiles, it's a more capable Stow - but like Stow, its focus is your configuration files rather than the system as a whole.
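As a minimal sketch of the basic workflow, assuming the default source directory and a placeholder repository URL:

    chezmoi init                      # create the local source directory
    chezmoi add ~/.gitconfig          # start managing an existing dotfile
    chezmoi diff                      # preview what would change
    chezmoi apply                     # write the managed files into your home
    chezmoi init --apply https://github.com/user/dotfiles.git  # bootstrap a fresh machine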

NixOS

The most ambitious of the bunch. You'd have to tie yourself to the NixOS distribution, where a lot of your existing Linux knowledge may not apply. It definitely does things differently - for example, in how it installs software and other system files, not following the Filesystem Hierarchy Standard you'd be used to. It does so with good reason, however: it's a total paradigm shift in how you set up and use your machine.

Its central design philosophy is a declarative configuration system: configuration files are the main input to the build system, and they expose a huge range of options - installed packages, enabled services, users, kernel parameters, and much more.
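To give a feel for it, here is a minimal sketch of such a configuration file - the option names are real NixOS options, but the specific user, packages, and service are my own illustration:

    { pkgs, ... }:
    {
      # declare a user; NixOS creates and manages it on rebuild
      users.users.my_user = {
        isNormalUser = true;
        shell = pkgs.bash;
      };

      # packages installed system-wide
      environment.systemPackages = with pkgs; [ git tmux htop ];

      # enable a systemd-managed service declaratively
      services.openssh.enable = true;
    }

Running nixos-rebuild switch then makes the running system converge to whatever this file declares.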

Its declarative and checksum-obsessive approach to system configuration gives you plenty of neat features: your fresh install is guaranteed to be exactly as you'd expect, and consistently so across different devices; system upgrades are atomic and thus resilient to failures; and you can easily roll back to previous generations. Its package manager, Nix, is also gaining serious traction, already offering more packages than the Arch User Repository (AUR), which is itself known for its size.

I personally did not choose this way to configure my machine, because it's a very big time and knowledge investment into one very specific way of doing things. However, I am tremendously impressed by how it flips the way we configure our devices on its head. It's certainly a solution I'll keep an eye on with great interest.

Ansible

Hopefully you've read long enough to reach the solution I ended up using. Ansible is pretty neat. It also takes a declarative approach to system configuration, but instead of building an operating system image from your configuration, it turns your configuration files into the instructions required to transform a target machine, with an already-installed OS, into the desired state. Let's start with the traditional approach - here's how one could write a Bash script to create a user and a directory:

#!/bin/bash
useradd -m my_user

mkdir -m 0700 /home/my_user/.local/state/bash

Well, it may run the first time, under some prerequisites - namely that neither the user nor the folder already exists. Are you sure that will be the case? And even if so, will it run a second time? What if the parent directories leading up to the destination don't exist? Our script isn't idempotent, i.e., it won't produce the same result on subsequent runs, nor does it handle, or at the very least report, errors.

Okay, let's change the script a little, with some checks and a useful mkdir flag that creates missing parent directories.

#!/bin/bash

if id my_user >/dev/null 2>&1; then
    echo 'user already exists'
else
    useradd -m my_user || { echo "Error creating user" >&2; exit 1; }
fi

mkdir -p -m 0700 /home/my_user/.local/state/bash || { echo "Error creating directories" >&2; exit 1; }

A little too verbose already. One would have to do this for every action of the same type, or build a bunch of wrapper functions to do it. Never mind the hard-to-digest Bash syntax. This will only get more complicated as you parametrize inputs, add functionality, make additional checks, and so on.

What about the same use case in Ansible? Rather pleasant for my long-term sanity, I would say:

- name: User exists
  become: true
  user:
    name: my_user
    shell: /bin/bash

- name: Bash history directory location exists
  become: true
  file:
    path: /home/my_user/.local/state/bash/
    state: directory
    mode: "0700"

With this approach, you use a YAML file to detail what you want, not how you want it done. Ansible translates your desired state into idempotent instructions and reports back which steps made changes, which were already satisfied, and which failed.

Given that the system configuration of my workstation mostly boils down to installing packages, configuring files, enabling services, creating users, and so on, Ansible is well suited to my use case, and the resulting playbook is easy to parse, share, and maintain.
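For completeness, here's a minimal sketch of how tasks like these could be wrapped into a playbook targeting the local machine - the file name setup.yml and the play header are my own assumptions:

    # setup.yml - a hypothetical minimal playbook for the local machine
    - hosts: localhost
      connection: local
      tasks:
        # ... the two tasks shown above go here ...

You'd then run it with ansible-playbook --ask-become-pass setup.yml, where the flag prompts for the sudo password that become: true requires.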

I've also used it in my DevOps ventures for similar tasks on development and production machines, which additionally required networking and firewall configuration, as well as coordinating with Docker to pull images and run containers for applications and databases. Ansible does all this very well, and the orchestration requires only one real dependency on the target machines: SSH (plus a Python interpreter, which virtually every distribution already ships). It also has a built-in way to reference encrypted secrets within the playbook, for when you need to, say, download content from a password-protected private source. What a treat!
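To illustrate that secrets workflow - the variable name, URL, and credentials below are hypothetical - you encrypt a value once with Ansible Vault and then reference it like any other variable:

    # encrypt the secret once; this prints a !vault-tagged block
    # you can paste into your variables file:
    #   ansible-vault encrypt_string 'hunter2' --name 'repo_password'

    - name: Download a password-protected file
      get_url:
        url: https://example.com/private/tool.tar.gz
        dest: /tmp/tool.tar.gz
        url_username: my_user
        url_password: "{{ repo_password }}"

At run time, ansible-playbook --ask-vault-pass prompts for the vault password and decrypts the variable transparently.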