Feuerfest

Just the private blog of a Linux sysadmin

Using and configuring unattended-upgrades under Debian Bookworm - Part 2: Practice

Photo by Markus Winkler: https://www.pexels.com/photo/the-word-update-is-spelled-out-in-scrabble-tiles-18524143/

Preface

At the time of this writing, Debian Bookworm is the stable release of Debian. It utilizes Systemd timers for all automation tasks, such as updating the package lists and executing the actual apt-get upgrade. Therefore we won't need to configure APT parameters in files like /etc/apt/apt.conf.d/02periodic. In fact, some of these files don't even exist on my systems. Keep that in mind if you read this article along with others who might do things differently - or for older/newer releases of Debian.

Part 1, where I talk about the basics and prerequisites of unattended-upgrades along with many questions you should have answered prior to using it, is here: Using and configuring unattended-upgrades under Debian Bookworm - Part 1: Preparations

Note: As it took me considerably longer than expected to write this post, please ignore discrepancies in timestamps and versions.

Installing unattended-upgrades - however it's (most likely) not active yet

Enough with theory, let's switch to the shell. The installation is rather easy: a simple apt-get install unattended-upgrades is enough. However, if you run the installation interactively like this, unattended-upgrades isn't configured to run automatically - the file /etc/apt/apt.conf.d/20auto-upgrades is missing. So check if it is present!

Note: That's one reason why you want to set the environment variable export DEBIAN_FRONTEND=noninteractive prior to the installation in your Ansible Playbooks/Puppet Manifests/Runbooks/Scripts, etc. or execute a dpkg-reconfigure -f noninteractive unattended-upgrades after the installation. Of course, placing the file yourself also solves the problem.😉
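For example, a shell-based provisioning script could pre-seed the debconf answer and install in one go - a minimal sketch, assuming the debconf question is still named unattended-upgrades/enable_auto_updates:

root@host:~# echo 'unattended-upgrades unattended-upgrades/enable_auto_updates boolean true' | debconf-set-selections
root@host:~# DEBIAN_FRONTEND=noninteractive apt-get install -y unattended-upgrades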

If you want to re-configure unattended-upgrades manually, execute dpkg-reconfigure unattended-upgrades and select yes at the prompt. But I advise you not to do that just yet.

root@host:~# apt-get install unattended-upgrades
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  gir1.2-glib-2.0 libgirepository-1.0-1 python3-dbus python3-distro-info python3-gi
Suggested packages:
  python-dbus-doc bsd-mailx default-mta | mail-transport-agent needrestart powermgmt-base
The following NEW packages will be installed:
  gir1.2-glib-2.0 libgirepository-1.0-1 python3-dbus python3-distro-info python3-gi unattended-upgrades
0 upgraded, 6 newly installed, 0 to remove and 50 not upgraded.
Need to get 645 kB of archives.
After this operation, 2,544 kB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://debian.tu-bs.de/debian bookworm/main amd64 libgirepository-1.0-1 amd64 1.74.0-3 [101 kB]
Get:2 http://debian.tu-bs.de/debian bookworm/main amd64 gir1.2-glib-2.0 amd64 1.74.0-3 [159 kB]
Get:3 http://debian.tu-bs.de/debian bookworm/main amd64 python3-dbus amd64 1.3.2-4+b1 [95.1 kB]
Get:4 http://debian.tu-bs.de/debian bookworm/main amd64 python3-distro-info all 1.5+deb12u1 [6,772 B]
Get:5 http://debian.tu-bs.de/debian bookworm/main amd64 python3-gi amd64 3.42.2-3+b1 [219 kB]
Get:6 http://debian.tu-bs.de/debian bookworm/main amd64 unattended-upgrades all 2.9.1+nmu3 [63.3 kB]
Fetched 645 kB in 0s (1,618 kB/s)
Preconfiguring packages ...
Selecting previously unselected package libgirepository-1.0-1:amd64.
(Reading database ... 33397 files and directories currently installed.)
Preparing to unpack .../0-libgirepository-1.0-1_1.74.0-3_amd64.deb ...
Unpacking libgirepository-1.0-1:amd64 (1.74.0-3) ...
Selecting previously unselected package gir1.2-glib-2.0:amd64.
Preparing to unpack .../1-gir1.2-glib-2.0_1.74.0-3_amd64.deb ...
Unpacking gir1.2-glib-2.0:amd64 (1.74.0-3) ...
Selecting previously unselected package python3-dbus.
Preparing to unpack .../2-python3-dbus_1.3.2-4+b1_amd64.deb ...
Unpacking python3-dbus (1.3.2-4+b1) ...
Selecting previously unselected package python3-distro-info.
Preparing to unpack .../3-python3-distro-info_1.5+deb12u1_all.deb ...
Unpacking python3-distro-info (1.5+deb12u1) ...
Selecting previously unselected package python3-gi.
Preparing to unpack .../4-python3-gi_3.42.2-3+b1_amd64.deb ...
Unpacking python3-gi (3.42.2-3+b1) ...
Selecting previously unselected package unattended-upgrades.
Preparing to unpack .../5-unattended-upgrades_2.9.1+nmu3_all.deb ...
Unpacking unattended-upgrades (2.9.1+nmu3) ...
Setting up python3-dbus (1.3.2-4+b1) ...
Setting up libgirepository-1.0-1:amd64 (1.74.0-3) ...
Setting up python3-distro-info (1.5+deb12u1) ...
Setting up unattended-upgrades (2.9.1+nmu3) ...

Creating config file /etc/apt/apt.conf.d/50unattended-upgrades with new version
Created symlink /etc/systemd/system/multi-user.target.wants/unattended-upgrades.service → /lib/systemd/system/unattended-upgrades.service.
Synchronizing state of unattended-upgrades.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable unattended-upgrades
Setting up gir1.2-glib-2.0:amd64 (1.74.0-3) ...
Setting up python3-gi (3.42.2-3+b1) ...
Processing triggers for man-db (2.11.2-2) ...
Processing triggers for libc-bin (2.36-9+deb12u3) ...
root@host:~# 

After the installation we have a new Systemd service called unattended-upgrades.service. However, if you think that stopping this service will disable unattended-upgrades, you are mistaken. This unit-file solely exists to check if an unattended-upgrades run is in progress and to ensure it isn't killed mid-process, for example during a shutdown.

To disable unattended-upgrades we need to change the values in /etc/apt/apt.conf.d/20auto-upgrades to zero. But as written above: this file isn't present on our system yet.

root@host:~# ls -lach /etc/apt/apt.conf.d/20auto-upgrades
ls: cannot access '/etc/apt/apt.conf.d/20auto-upgrades': No such file or directory

However, as we want to have a look at the internals first, we do not fix this yet. Instead, let us have a look at the relevant Systemd unit and timer-files.

The Systemd unit and timer files unattended-upgrades relies upon

A systemctl list-timers --all will show you all active and inactive timers on our system, along with the unit-file triggered by each timer. On every Debian system utilizing Systemd you will most likely have the following unit and timer files by default - even when unattended-upgrades is not installed.

root@host:~# systemctl list-timers --all
NEXT                         LEFT          LAST                         PASSED       UNIT                         ACTIVATES
Sat 2024-10-26 00:00:00 CEST 15min left    Fri 2024-10-25 00:00:00 CEST 23h ago      dpkg-db-backup.timer         dpkg-db-backup.service
Sat 2024-10-26 00:00:00 CEST 15min left    Fri 2024-10-25 00:00:00 CEST 23h ago      logrotate.timer              logrotate.service
Sat 2024-10-26 06:39:59 CEST 6h left       Fri 2024-10-25 06:56:26 CEST 16h ago      apt-daily-upgrade.timer      apt-daily-upgrade.service
Sat 2024-10-26 10:27:42 CEST 10h left      Fri 2024-10-25 08:18:00 CEST 15h ago      man-db.timer                 man-db.service
Sat 2024-10-26 11:13:30 CEST 11h left      Fri 2024-10-25 21:06:00 CEST 2h 38min ago apt-daily.timer              apt-daily.service
Sat 2024-10-26 22:43:26 CEST 22h left      Fri 2024-10-25 22:43:26 CEST 1h 0min ago  systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Sun 2024-10-27 03:10:37 CET  1 day 4h left Sun 2024-10-20 03:11:00 CEST 5 days ago   e2scrub_all.timer            e2scrub_all.service
Mon 2024-10-28 01:12:51 CET  2 days left   Mon 2024-10-21 01:40:06 CEST 4 days ago   fstrim.timer                 fstrim.service

9 timers listed.

Relevant for us is apt-daily.timer, which activates apt-daily.service. This unit-file will execute /usr/lib/apt/apt.systemd.daily update. The script takes care of reading the necessary APT parameters and performing an apt-get update to refresh the package lists.

The timer apt-daily-upgrade.timer triggers the service apt-daily-upgrade.service. This unit-file will perform a /usr/lib/apt/apt.systemd.daily install, and this is where the actual "apt-get upgrade magic" happens. If the appropriate APT values are set, outstanding updates will be downloaded and installed.

Notice the absence of a specific unattended-upgrades unit-file or timer: unattended-upgrades is really just a script to automate the APT package system.

This effectively means: you can configure when package-list updates are done and when updates are installed by modifying the OnCalendar= parameter via drop-in files for the apt-daily.timer and apt-daily-upgrade.timer files.
The Systemd.timer documentation and man 7 systemd.time (systemd.time documentation) have all the glorious details.
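For example, a drop-in for apt-daily-upgrade.timer could look like this (a minimal sketch; the chosen time and delay are arbitrary example values):

root@host:~# systemctl edit apt-daily-upgrade.timer
# Opens an editor; the result is saved as
# /etc/systemd/system/apt-daily-upgrade.timer.d/override.conf
[Timer]
# The empty OnCalendar= clears the default schedule before setting a new one
OnCalendar=
OnCalendar=*-*-* 02:00
RandomizedDelaySec=30m

Afterwards, systemctl list-timers apt-daily-upgrade.timer should show the new schedule.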

I won't go into further detail regarding the /usr/lib/apt/apt.systemd.daily script. If you want to know more, I recommend executing the script with bash's -x parameter (also called debug mode). This way commands and values are printed out as the script runs.

root@host:~# bash -x /usr/lib/apt/apt.systemd.daily update
# Read the output, view the script, trace the parameters and then execute:
root@host:~# bash -x /usr/lib/apt/apt.systemd.daily lock_is_held update

Or if you are more interested in the actual "How are the updates installed?"-part, perform the following:

root@host:~# bash -x /usr/lib/apt/apt.systemd.daily install
# Read the output, view the script, trace the parameters and then execute:
root@host:~# bash -x /usr/lib/apt/apt.systemd.daily lock_is_held install

It's a good lesson in understanding APT internals and what Debian runs "under the hood".

But again: I advise you to make sure that no actual updates are installed when you do so. Read on to learn how. 😇

How does everything work together? What makes unattended-upgrades unattended?

Let us recapitulate. We installed the unattended-upgrades package, checked the relevant Systemd unit & timer files and had a brief look at the /usr/lib/apt/apt.systemd.daily script which is responsible for triggering the appropriate APT and unattended-upgrades commands.

APT itself is configured via the files in /etc/apt/apt.conf.d/. How can APT (or any human) know the setting of a specific parameter? Sure, you can grep through all the files - but that would most likely also include files with syntax errors, etc.

Luckily there is apt-config, which allows us to query APT and read specific values. It also takes care of validating everything. If you've configured an APT parameter in a file but apt-config doesn't reflect it, it's simply not applied and you must start searching for the error.

Armed with this knowledge we can execute the following two commands to check whether our package lists will be updated automatically via the apt-daily.service and whether unattended-upgrades will install packages. If there is no associated value, apt-config won't print anything.

The two settings which are set inside /etc/apt/apt.conf.d/20auto-upgrades are APT::Periodic::Update-Package-Lists and APT::Periodic::Unattended-Upgrade. The first activates the automatic package-list updates while the latter enables the automatic installation of updates. If we check them on our system with the manually installed unattended-upgrades package, we get the following:

root@host:~# apt-config dump APT::Periodic::Update-Package-Lists
root@host:~# apt-config dump APT::Periodic::Unattended-Upgrade

This means no package-list updates and no installation of updates. And currently this is what we want, so we can keep experimenting a little bit before we are ready to hit production.

A working unattended-upgrades setup will give the following values:

root@host:~# apt-config dump APT::Periodic::Update-Package-Lists
APT::Periodic::Update-Package-Lists "1";
root@host:~# apt-config dump APT::Periodic::Unattended-Upgrade
APT::Periodic::Unattended-Upgrade "1";

First dry-run

Time to start our first unattended-upgrades dry-run. This way we can watch what would be done without actually modifying anything. I recommend utilizing the -v parameter in addition to --dry-run, as otherwise the first 8 lines of unattended-upgrades' own output will be omitted - despite them being the most valuable ones for most novice users.

root@host:~# unattended-upgrades --dry-run -v
Checking if system is running on battery is skipped. Please install powermgmt-base package to check power status and skip installing updates when the system is running on battery.
Starting unattended upgrades script
Allowed origins are: origin=Debian,codename=bookworm,label=Debian, origin=Debian,codename=bookworm,label=Debian-Security, origin=Debian,codename=bookworm-security,label=Debian-Security
Initial blacklist:
Initial whitelist (not strict):
Option --dry-run given, *not* performing real actions
Packages that will be upgraded: base-files bind9-dnsutils bind9-host bind9-libs dnsutils git git-man initramfs-tools initramfs-tools-core intel-microcode libc-bin libc-l10n libc6 libc6-i386 libcurl3-gnutls libexpat1 libnss-systemd libpam-systemd libpython3.11-minimal libpython3.11-stdlib libssl3 libsystemd-shared libsystemd0 libudev1 linux-image-amd64 locales openssl python3.11 python3.11-minimal qemu-guest-agent systemd systemd-sysv systemd-timesyncd udev
Writing dpkg log to /var/log/unattended-upgrades/unattended-upgrades-dpkg.log
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure --recursive /tmp/apt-dpkg-install-o5g9u2
/usr/bin/dpkg --status-fd 10 --no-triggers --configure libsystemd0:amd64 libsystemd-shared:amd64 systemd:amd64
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/systemd-sysv_252.30-1~deb12u2_amd64.deb /var/cache/apt/archives/udev_252.30-1~deb12u2_amd64.deb /var/cache/apt/archives/libudev1_252.30-1~deb12u2_amd64.deb
/usr/bin/dpkg --status-fd 10 --no-triggers --configure libudev1:amd64
/usr/bin/dpkg --status-fd 10 --configure --pending
Preconfiguring packages ...
Preconfiguring packages ...
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/libc6-i386_2.36-9+deb12u8_amd64.deb /var/cache/apt/archives/libc6_2.36-9+deb12u8_amd64.deb
/usr/bin/dpkg --status-fd 10 --no-triggers --configure libc6:amd64
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/libc-bin_2.36-9+deb12u8_amd64.deb
/usr/bin/dpkg --status-fd 10 --no-triggers --configure libc-bin:amd64
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/libc-l10n_2.36-9+deb12u8_all.deb /var/cache/apt/archives/locales_2.36-9+deb12u8_all.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/intel-microcode_3.20240813.1~deb12u1_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/python3.11_3.11.2-6+deb12u3_amd64.deb /var/cache/apt/archives/libpython3.11-stdlib_3.11.2-6+deb12u3_amd64.deb /var/cache/apt/archives/python3.11-minimal_3.11.2-6+deb12u3_amd64.deb /var/cache/apt/archives/libpython3.11-minimal_3.11.2-6+deb12u3_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/git_1%3a2.39.5-0+deb12u1_amd64.deb /var/cache/apt/archives/git-man_1%3a2.39.5-0+deb12u1_all.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/base-files_12.4+deb12u7_amd64.deb
/usr/bin/dpkg --status-fd 10 --no-triggers --configure base-files:amd64
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/initramfs-tools_0.142+deb12u1_all.deb /var/cache/apt/archives/initramfs-tools-core_0.142+deb12u1_all.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/bind9-dnsutils_1%3a9.18.28-1~deb12u2_amd64.deb /var/cache/apt/archives/bind9-host_1%3a9.18.28-1~deb12u2_amd64.deb /var/cache/apt/archives/bind9-libs_1%3a9.18.28-1~deb12u2_amd64.deb /var/cache/apt/archives/dnsutils_1%3a9.18.28-1~deb12u2_all.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/libexpat1_2.5.0-1+deb12u1_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/qemu-guest-agent_1%3a7.2+dfsg-7+deb12u7_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/libssl3_3.0.14-1~deb12u2_amd64.deb /var/cache/apt/archives/openssl_3.0.14-1~deb12u2_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/libcurl3-gnutls_7.88.1-10+deb12u7_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/linux-image-6.1.0-26-amd64_6.1.112-1_amd64.deb /var/cache/apt/archives/linux-image-amd64_6.1.112-1_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
All upgrades installed
The list of kept packages can't be calculated in dry-run mode.
root@host:~#

We get told which configured origins will be used (see part 1), all packages on the black- and whitelist and which packages will actually be upgraded. Just as if we were executing apt-get upgrade manually.

Also there is the neat line Writing dpkg log to /var/log/unattended-upgrades/unattended-upgrades-dpkg.log informing us of a logfile being written. How nice!

The dpkg commands are what normally happens in the background to install and configure the packages. In 99% of all cases this is irrelevant. Nevertheless, they become invaluable when an update goes sideways or dpkg/apt can't properly configure a package.

Where do we find this information afterwards?

Logfiles

There are two logfiles being written:

  1. /var/log/unattended-upgrades/unattended-upgrades-dpkg.log
  2. /var/log/unattended-upgrades/unattended-upgrades.log

And how do they differ?

/var/log/unattended-upgrades/unattended-upgrades-dpkg.log has all output from dpkg/apt itself. You remember all these Preparing to unpack... packagename, Unpacking packagename, Setting up packagename lines? The lines you encounter when you execute apt-get manually? Which weren't present in the output from our dry-run? They get logged into this file. So if you are wondering when a certain package was installed or what went wrong, this is the logfile to look into.

/var/log/unattended-upgrades/unattended-upgrades.log contains the output from unattended-upgrades itself. This makes it possible to check what allowed origins/blacklists/whitelists, etc. were used in a run. Conveniently the output also includes the packages which are suitable for upgrading.
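A quick way to retrace past runs from both files - a simple sketch; the exact log wording may differ between versions:

root@host:~# grep 'Packages that will be upgraded' /var/log/unattended-upgrades/unattended-upgrades.log
root@host:~# grep 'Setting up' /var/log/unattended-upgrades/unattended-upgrades-dpkg.log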

Options! Give me options!

Now that we know how we can execute a dry-run and retrace what happened it's time to have a look at the various options unattended-upgrades offers.

I recommend reading the config file /etc/apt/apt.conf.d/50unattended-upgrades once in full, as the various options to fine-tune the behaviour are listed at the end. If you want unattended-upgrades to send mails, or to only install updates on shutdown/reboot, this is where to look.

Or do you want an automatic reboot after unattended-upgrades has done its job (see: Unattended-Upgrade::Automatic-Reboot), ensuring the new kernel and system libraries are used right away? That's covered there as well.
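To illustrate, here are a few of those options with example values. All of them ship commented out in the config file; treat the values below as examples, not recommendations:

Unattended-Upgrade::Mail "root";
Unattended-Upgrade::MailReport "only-on-error";
Unattended-Upgrade::InstallOnShutdown "true";
Unattended-Upgrade::Automatic-Reboot "true";
Unattended-Upgrade::Automatic-Reboot-Time "02:00";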

Adding the Proxmox Repository to unattended-upgrades

Enabling updates for packages in different repositories means we have to add a new repository to unattended-upgrades first. Using our knowledge from Part 1 and looking at the Release file for the Proxmox pve-no-subscription Debian Bookworm repository, we can build the following origins-pattern:

"origin=Proxmox,codename=${distro_codename},label=Proxmox Debian Repository,a=stable";

A new dry-run will show us that the packages proxmox-default-kernel and proxmox-kernel-6.5 will be upgraded.

root@host:~# unattended-upgrades -v --dry-run
Checking if system is running on battery is skipped. Please install powermgmt-base package to check power status and skip installing updates when the system is running on battery.
Checking if connection is metered is skipped. Please install python3-gi package to detect metered connections and skip downloading updates.
Starting unattended upgrades script
Allowed origins are: origin=Debian,codename=bookworm,label=Debian, origin=Debian,codename=bookworm,label=Debian-Security, origin=Debian,codename=bookworm-security,label=Debian-Security, origin=Proxmox,codename=bookworm,label=Proxmox Debian Repository,a=stable
Initial blacklist:
Initial whitelist (not strict):
Option --dry-run given, *not* performing real actions
Packages that will be upgraded: proxmox-default-kernel proxmox-kernel-6.5
Writing dpkg log to /var/log/unattended-upgrades/unattended-upgrades-dpkg.log
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/proxmox-kernel-6.8.8-2-pve-signed_6.8.8-2_amd64.deb /var/cache/apt/archives/proxmox-kernel-6.8_6.8.8-2_all.deb /var/cache/apt/archives/proxmox-default-kernel_1.1.0_all.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/proxmox-kernel-6.5.13-5-pve-signed_6.5.13-5_amd64.deb /var/cache/apt/archives/proxmox-kernel-6.5_6.5.13-5_all.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
All upgrades installed
The list of kept packages can't be calculated in dry-run mode.

In the next step we will use this to see how blacklisting works.

Note: Enabling unattended-upgrades for the Proxmox packages on the Proxmox host itself is a controversial topic. In clustered and production setups I wouldn't recommend it either. But given this is my single-node, homelab Proxmox, I have no problem with it.

However, I have no problem with enabling OS updates, as this is no different from what Proxmox themselves recommend: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#system_software_updates

Blacklisting packages

Excluding a certain package from the automatic updates can be done in two different ways. Either you don't include the repository in the Unattended-Upgrade::Origins-Pattern block - effectively excluding all packages in that repository. Or you exclude single packages by listing them inside the Unattended-Upgrade::Package-Blacklist block. The configuration file has examples for the various patterns needed (exact match, name beginning with a certain string, escaping, etc.).

The following line will blacklist all packages starting with proxmox- in their name.

Unattended-Upgrade::Package-Blacklist {
    // The following matches all packages starting with linux-
    //  "linux-";

    // Blacklist all proxmox- packages
    "proxmox-";

    [...]
};

Executing a dry-run again will give the following result:

root@host:~# unattended-upgrades -v --dry-run
Checking if system is running on battery is skipped. Please install powermgmt-base package to check power status and skip installing updates when the system is running on battery.
Checking if connection is metered is skipped. Please install python3-gi package to detect metered connections and skip downloading updates.
Starting unattended upgrades script
Allowed origins are: origin=Debian,codename=bookworm,label=Debian, origin=Debian,codename=bookworm,label=Debian-Security, origin=Debian,codename=bookworm-security,label=Debian-Security, origin=Proxmox,codename=bookworm,label=Proxmox Debian Repository,a=stable
Initial blacklist: proxmox-
Initial whitelist (not strict):
No packages found that can be upgraded unattended and no pending auto-removals
The list of kept packages can't be calculated in dry-run mode.

"The following packages have been kept back" - What does this mean? Will unattended-upgrades help me?

Consider the following output from a manual apt-get upgrade run.

root@host:~# apt-get upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages have been kept back:
  proxmox-default-kernel proxmox-kernel-6.5
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.

The message means that the packages are part of so-called phased updates. Sadly, documentation regarding this in Debian is a bit sparse. The manpage apt_preferences(5) has a paragraph about it, but that's pretty much it.

Ubuntu has a separate Wiki article about it, as their update manager honors the setting: https://wiki.ubuntu.com/PhasedUpdates

What it means is that some updates won't be rolled out to all servers immediately. Instead, a random number is generated which determines whether the update is applied or not.

Quote from apt_preferences(5):
"A system's eligibility to a phased update is determined by seeding random number generator with the package source name, the version number, and /etc/machine-id, and then calculating an integer in the range [0, 100]. If this integer is larger than the Phased-Update-Percentage, the version is pinned to 1, and thus held back. Otherwise, normal policy rules apply."

unattended-upgrades, however, will ignore this and install the updates. Whether that is good or bad depends on your situation. Luckily there was some recent activity in the GitHub issue regarding this topic, so it may be resolved soon-ish: unattended-upgrades on GitHub, issue 259: Please honor Phased-Update-Percentage for package versions
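Side note: if you want your manual apt runs to handle phasing consistently, apt knows two related options - documented in apt_preferences(5); verify they exist in your apt version before relying on them:

// Treat phased updates like regular updates
APT::Get::Always-Include-Phased-Updates "true";
// Or the opposite: always hold phased updates back
APT::Get::Never-Include-Phased-Updates "true";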

Enabling unattended-upgrades

Now it's finally time to enable unattended-upgrades on our system. Execute dpkg-reconfigure -f noninteractive unattended-upgrades and check that the /etc/apt/apt.conf.d/20auto-upgrades file is present with the following content:

root@host:~# cat /etc/apt/apt.conf.d/20auto-upgrades
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
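Alternatively - as mentioned earlier - placing the file yourself works just as well, for example from a provisioning script (a minimal sketch):

root@host:~# cat > /etc/apt/apt.conf.d/20auto-upgrades <<'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF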

After that, verify that APT indeed has the correct values for those parameters:

root@host:~# apt-config dump APT::Periodic::Update-Package-Lists
APT::Periodic::Update-Package-Lists "1";
root@host:~# apt-config dump APT::Periodic::Unattended-Upgrade
APT::Periodic::Unattended-Upgrade "1";

You can then determine when the first run will happen by checking via systemctl list-timers apt-daily.timer apt-daily-upgrade.timer.

root@host:~# systemctl list-timers apt-daily.timer apt-daily-upgrade.timer
NEXT                        LEFT          LAST                        PASSED  UNIT                    ACTIVATES
Sat 2024-10-26 03:39:58 UTC 16min left    Fri 2024-10-25 09:12:58 UTC 18h ago apt-daily.timer         apt-daily.service
Sat 2024-10-26 06:21:30 UTC 2h 58min left Fri 2024-10-25 06:24:40 UTC 20h ago apt-daily-upgrade.timer apt-daily-upgrade.service

Set yourself a reminder in your calendar to check the logfiles and enjoy your automatic updates!

How to check the health of a Debian mirror

Remember how I mentioned in part 1 of this series that Debian mirrors are rarely out-of-sync? Yep, and now it happened while I was typing this post. The perfect opportunity to show you how to check your Debian mirror.

The error I got was the following:

root@host:~# apt-get update
Hit:1 http://security.debian.org/debian-security bookworm-security InRelease
Hit:2 http://debian.tu-bs.de/debian bookworm InRelease
Get:3 http://debian.tu-bs.de/debian bookworm-updates InRelease [55.4 kB]
Reading package lists... Done
E: Release file for http://debian.tu-bs.de/debian/dists/bookworm-updates/InRelease is expired (invalid since 16d 0h 59min 33s). Updates for this repository will not be applied.

The reason is that the InRelease file contains a Valid-Until timestamp. And currently it has the following value:
Valid-Until: Sat, 22 Jun 2024 20:12:12 UTC
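That value comes straight from the mirror; you can check it yourself by fetching the InRelease file and filtering for the field (assuming curl is installed):

root@host:~# curl -s http://debian.tu-bs.de/debian/dists/bookworm-updates/InRelease | grep -m1 'Valid-Until'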

As this error originates on a remote host there is nothing I can do apart from switching to another Debian repository.

But how can you actually check that? https://mirror-master.debian.org/status/mirror-status.html lists the status of all Debian mirrors. If we search for debian.tu-bs.de we can see that the mirror indeed has problems - and since when.

Side note: Checking the mirror hierarchy can also be relevant sometimes: https://mirror-master.debian.org/status/mirror-hierarchy.html

Also.. 22 days without noticing that the Debian mirror I use is broken? Yeah.. Let's define a monitoring check for that. Something I didn't do for my setup at home.

As this would make this article unnecessarily long, I wrote a separate blog post about it. So feel free to read that next: How to monitor your APT-repositories with Icinga


Icinga2 error "check command does not exist" because of missing constant

Photo by Christina Morillo: https://www.pexels.com/photo/software-engineer-standing-beside-server-racks-1181354/

Apparently this problem kept me busy far too long, as I kept looking into the Icinga2 Master logfiles only. Mainly due to the service definition for my icinga CheckCommand still being from a time when there was only one Master without any Agents. This led to it being executed on the Master, and hence I never saw the problems on the agent..

Additionally, the cluster and cluster-health checks only verify that all endpoints are connected. Which was the case all the time. Therefore I got no error there either.

But what happened?

I defined a new CheckCommand. It worked fine on the master. Then I rewrote the service apply-rule so that it matches all Linux hosts being monitored. And then I got Check command not found for all these new service checks on all agent hosts.

I deleted the API config sync directories and restarted Icinga2 on the agents to trigger a new sync:

root@agent:/etc/icinga2# rm /var/lib/icinga2/api/zones-stage/* -rf && rm /var/lib/icinga2/api/zones/* -rf
root@agent:/etc/icinga2# systemctl restart icinga2.service

And suddenly all CheckCommands which are not part of the Icinga Template Library stopped working on the agents.

Uhm, ok. At this point I suspected I had somehow messed up my /etc/icinga2/zones.conf file some time ago. Turns out, this wasn't the case.

The root cause

Some weeks ago I defined a service check which is only executed on my Icinga2 master. However, I stored the CheckCommand and service configuration under /etc/icinga2/zones.d/master anyway, as you never know when this comes in handy. (This has since been corrected in the article.) But the Telegram API requires a token. And I defined that in /etc/icinga2/constants.conf - but this file isn't synced, as it is outside of /etc/icinga2/zones.d/master. Something I did on purpose, as I didn't want to sync the token to all agents.

This apparently caused the config file sync to run into a syntax error, as the constant for the token couldn't be resolved.
But again.. this was only logged in the logfiles on the agents..

root@agent:/etc/icinga2# cat /var/log/icinga2/icinga2.log
[...]
[2024-07-17 22:39:04 +0200] information/ApiListener: Received configuration for zone 'global-templates' from endpoint 'master.domain.tld'. Comparing the timestamp and checksums.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/eventcommands.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/groups.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/host-templates.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/notifications.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/service-templates.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/telegrambot-notifications.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/templates.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/timeperiods.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/users.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/global-templates' (6688 Bytes).
[2024-07-17 22:39:04 +0200] information/ApiListener: Received configuration updates (2) from endpoint 'master.domain.tld' are different to production, triggering validation and reload.
[2024-07-17 22:39:04 +0200] critical/ApiListener: Config validation failed for staged cluster config sync in '/var/lib/icinga2/api/zones-stage/'. Aborting. Logs: '/var/lib/icinga2/api/zones-stage//startup.log'
[...]

The /var/lib/icinga2/api/zones-stage/startup.log has the details:

root@agent:/etc/icinga2# cat /var/lib/icinga2/api/zones-stage/startup.log
[2024-07-17 23:36:19 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2024-07-17 23:36:19 +0200] information/cli: Loading configuration file(s).
[2024-07-17 23:36:19 +0200] information/ConfigItem: Committing config item(s).
[2024-07-17 23:36:19 +0200] critical/config: Error: Error while evaluating expression: Tried to access undefined script variable 'TelegramBotToken'
Location: in /var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf: 46:26-46:41
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(44):     HOSTDISPLAYNAME = "$host.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(45):     SERVICEDISPLAYNAME = "$service.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(46):     TELEGRAM_BOT_TOKEN = TelegramBotToken
                                                                                                               ^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(47):     TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(48):

[2024-07-17 23:36:19 +0200] critical/config: Error: Error while evaluating expression: Tried to access undefined script variable 'TelegramBotToken'
Location: in /var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf: 20:26-20:41
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(18):     NOTIFICATIONCOMMENT = "$notification.comment$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(19):     HOSTDISPLAYNAME = "$host.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(20):     TELEGRAM_BOT_TOKEN = TelegramBotToken
                                                                                                               ^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(21):     TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(22):

[2024-07-17 23:36:19 +0200] critical/config: 2 errors
[2024-07-17 23:36:19 +0200] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.

However... The tricky part is that a config validation will succeed!

root@agent:/etc/icinga2# icinga2 daemon -C
[2024-07-18 00:00:16 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2024-07-18 00:00:16 +0200] information/cli: Loading configuration file(s).
[2024-07-18 00:00:16 +0200] information/ConfigItem: Committing config item(s).
[2024-07-18 00:00:16 +0200] information/ApiListener: My API identity: agent.domain.tld
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 5 Zones.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 2 Endpoints.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 235 CheckCommands.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2024-07-18 00:00:16 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2024-07-18 00:00:16 +0200] information/cli: Finished validating the configuration file(s).

And this was the reason why I was too focused on the master..

What I learned later is that you can utilize the following command to validate the configuration from the stage-dir.
Documentation for the Config Sync: Receive Config is here.

root@agent:/var/log/icinga2# icinga2 daemon -C --define System.ZonesStageVarDir=/var/lib/icinga2/api/zones-stage/
[2024-07-21 16:28:51 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2024-07-21 16:28:51 +0200] information/cli: Loading configuration file(s).
[2024-07-21 16:28:51 +0200] information/ConfigItem: Committing config item(s).
[2024-07-21 16:28:51 +0200] critical/config: Error: Error while evaluating expression: Tried to access undefined script variable 'TelegramBotToken'
Location: in /var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf: 20:26-20:41
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(18):     NOTIFICATIONCOMMENT = "$notification.comment$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(19):     HOSTDISPLAYNAME = "$host.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(20):     TELEGRAM_BOT_TOKEN = TelegramBotToken
                                                                                                               ^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(21):     TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(22):

[2024-07-21 16:28:51 +0200] critical/config: Error: Error while evaluating expression: Tried to access undefined script variable 'TelegramBotToken'
Location: in /var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf: 46:26-46:41
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(44):     HOSTDISPLAYNAME = "$host.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(45):     SERVICEDISPLAYNAME = "$service.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(46):     TELEGRAM_BOT_TOKEN = TelegramBotToken
                                                                                                               ^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(47):     TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(48):

[2024-07-21 16:28:51 +0200] critical/config: 2 errors
[2024-07-21 16:28:51 +0200] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.

The solution

On the master I moved the two files from /etc/icinga2/zones.d to /etc/icinga2/conf.d and restarted the service.

root@master:/etc/icinga2# mv /etc/icinga2/zones.d/global-commands/telegrambot-commands.conf /etc/icinga2/conf.d/
root@master:/etc/icinga2# mv /etc/icinga2/zones.d/global-templates/telegrambot-notifications.conf /etc/icinga2/conf.d/
root@master:/etc/icinga2# systemctl restart icinga2.service

On the agent a simple restart is enough:

root@agent:/etc/icinga2# systemctl restart icinga2.service

And after that everything worked again.
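To verify, you can re-run the stage-dir validation shown above; it should now end with the familiar Finished validating the configuration file(s). line:

root@agent:/etc/icinga2# icinga2 daemon -C --define System.ZonesStageVarDir=/var/lib/icinga2/api/zones-stage/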

Another problem detected - an even deeper-rooted cause

In the aftermath I was curious why & how Icinga didn't notify me that the config in the stage-dir couldn't be validated. Shouldn't there be some kind of included check for this?

Yes, turns out the built-in Icinga CheckCommand does exactly this. But it was never executed on my agents, as I still had a service definition from a time when I didn't have any agents. Initially the configuration was the following:

// Checks the agent health
apply Service "icinga" {
  import "generic-service"

  check_command = "icinga"

  assign where (host.address || host.address6) && host.vars.os == "Linux"
}

This was still a remnant of having only the Icinga Master and no agents. But it led to the check being executed on the Master. Which is... not smart if you want to validate the configuration on the Agent.

After changing it to the following:

// Checks the agent health - must be executed on the agent
apply Service "icinga" {
  import "generic-service"

  check_command = "icinga"

  command_endpoint = host.vars.agent_endpoint

  assign where host.vars.agent_endpoint
}

The check worked as intended.

Oh, and I opened a pull request to enhance Icinga's documentation regarding the config sync: https://github.com/Icinga/icinga2/pull/10101. Let's see if it gets accepted.


Using and configuring unattended-upgrades under Debian Bookworm - Part 1: Preparations

Photo by Markus Winkler: https://www.pexels.com/photo/wood-writing-mathematics-typography-19915915/

Preface

At the time of this writing, Debian Bookworm is the stable release of Debian. It utilizes Systemd timers for all automation tasks, such as updating the package lists and executing the actual apt-get upgrade. Therefore we won't need to configure APT parameters in files like /etc/apt/apt.conf.d/02periodic. In fact, some of these files don't even exist on my systems. Keep that in mind if you read this article along with others who might do things differently - or for older/newer releases of Debian.

Part 2 can be found here: Using and configuring unattended-upgrades under Debian Bookworm - Part 2: Practice

unattended-upgrades is not your enemy

I've read and heard my fair share of people condemning unattended-upgrades as it:

  • Made things unnecessarily complicated
  • Broke the system
  • Ignored APT-Pinnings and caused problems because of that
  • Just didn't work
  • Caused problems with High Availability (HA) services
  • Made the database fail
  • Caused split-brains solely because unattended-upgrades restarted services
  • Restarted services during work hours causing disruptions and annoyances

You should get the gist. And then there is my experience: running unattended-upgrades on over 6000 Debian servers running all sorts of services. From Apache, Tomcat, JBoss and HAProxy to DRBD, GlusterFS, Corosync, Pacemaker, some NoSQL DBs and services like keepalived and ipvsadm for Layer 2/3 load balancing or Quagga/FRR for BGP route announcements. And in all those years unattended-upgrades wasn't the root cause of a single problem, outage or incident.

Therefore I suspect that many of the people whose complaints I've read just installed it and never cared about it again. Or, at least, didn't care enough. Presumably they just wanted a quick solution to help them get their systems updated. Well yes, unattended-upgrades is the right tool for this. But as always:

Know your system and plan accordingly!

Things to consider beforehand

Let me give you a list of things to consider before you even install unattended-upgrades.

  1. Are all of your APT-Repositories in order?
    • GPG-keys valid and present?
    • Correct repository for your Debian release?
    • Do you use only official Debian repositories?
    • Do you use Debian repositories from other projects/vendors?
    • Are your repositories in sync on all your servers?
      • Preferably you use the same Debian Mirror on all of your systems. At least make sure you don't use different repositories for security and system updates across all your Debian systems.
      • This prevents problems from outdated mirrors. Happens rarely, but can happen.
        • Trust me: When 2 machines out of a cluster of 5 behave differently, your first thought or troubleshooting step won't be to check if the different APT repositories have the same content
      • Remember: Those are often provided as-is by volunteers. They are NOT a commercial service. And yet most of these volunteers do an exceptional & awesome job despite not being paid for it.
      • Best case scenario: Your company has an internal Debian Mirror. Saving you money on bandwidth usage.
      • Have a look at https://www.debian.org/mirror/ftpmirror.en.html or Aptly if you plan on setting up a mirror.
    • Basically: If an apt-get update prints out anything other than your configured repositories, followed by the line Reading package lists... Done: fix your repositories!
    • Yes, you can have vendors with broken Debian repositories. Most often the Release file will be buggy or missing altogether. If you can't get your vendor to fix it, well, then the best option is to not specify the repository at all and ensure you have another form of automation to update those packages.
      • Sadly a wget http://some-company.tld/some.deb && dpkg -i some.deb is still considered valuable quality work at far too many enterprise software companies out there..
        • Or you just work around that nuisance, create an internal repository yourself and upload the packages there.
      • A good time to remind them that you pay them and that you need a fully working Debian repository, following the Debian guidelines so you can automatically patch all your systems.
    • Do NOT continue otherwise. You have been warned.
  2. What kind of services does your system provide?
    • Here it is essential to know what the system does. Technically and organizationally.
    • What services are provided, why, to whom?
    • Are there any Service Level Agreements (SLAs) in place?
      • Do downtimes of a service need to be scheduled first?
      • Or can service restarts happen at any time?
      • Also automatically? Or is human supervision required by law/standards? (Looking at you PCI DSS (Wikipedia) 😉)
      • Or is there already an agreed upon maintenance window during which the service is allowed to be unresponsive?
    • Do all services have High Availability (HA)?
      • Do the service(s) survive if the primary/master/main system is shut down?
      • How many systems can be unavailable at the same time?
      • Or do other tasks need to be done (manually) on secondary/slave/standby systems?
      • Does your failover work?
      • Is the failover tested regularly?
    • How are you informed if the service(s) stop responding?
      • Is your monitoring set to check right after updates did happen? (You can specify the time when unattended-upgrades does install the updates.)
  3. This will enable you to classify all your installed packages into the following categories for unattended-upgrades:
    • Can be updated anytime
    • Can be updated at certain times / Only a certain number of systems is allowed to be unavailable at any time
      • See: systemctl list-timers especially apt-daily-upgrade.timer and maybe apt-daily.timer
    • Only manual updates (Blacklisting)
      • Unattended-Upgrade::Package-Blacklist in /etc/apt/apt.conf.d/50unattended-upgrades is your friend
      • Then execute apt-get update && apt-get upgrade manually to update the blacklisted packages
    • Never update this package / We need a specific version
      • This will need blacklisting AND APT-Pinnings to be in place
      • As an apt-get update && apt-get upgrade can still be executed manually
  4. In certain situations (HA nodes, manually triggered fail-over, etc.) you will need to run specific commands before or after a package update.
    • This is something unattended-upgrades can't help you with
    • This is stuff that belongs in the Debian package itself. Namely in the preinst, prerm, postinst and postrm files. The so-called package maintainer scripts (Official Debian documentation).
    • A feasible workaround is the utilization of a drop-in file if your service is started via a Systemd unit-file, see https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html and search for "drop-in". The ArchWiki also has a good article: https://wiki.archlinux.org/title/systemd#Drop-in_files - but keep in mind that Arch may use different paths than Debian
    • If the correct procedure is missing from the package, or isn't suitable for your environment.. then it's best to exclude the packages from unattended-upgrades and get yourself a tool like Ansible, Puppet or Rundeck to automate the execution of manual tasks. (God, how I love Rundeck. 🥰) There you are able to ensure everything is valid and verified before switching your primary cluster node and running package updates.
  5. If you have some kind of ITIL Change Management process in place, this, of course, also can't be checked by unattended-upgrades
    • The only viable solution would be to have different repositories based on your environment classifications (development, QA testing, production) or approval classifications (untested, testing, approved) and push packages to the corresponding repository once the ChangeManagement process is completed
    • Then you can install the updates automatically from there
  6. Unattended-Upgrades is great! But you can't automate every single scenario by just using it!
    • Sometimes this even means to go back to the drawing board and optimize your internal company processes first. Especially in situations where approvals have to be given manually by real humans
  7. When you start rolling out unattended-upgrades it's better to first get an overview of how many patches are missing on each server. The more updates you install, the more likely it is that things will change or even break. Or maybe it's even just those informational log messages that some parameter or option will be deprecated in the next major release.
    • I advise to start with non-critical systems first
    • Ideally systems which are somewhat up-to-date so you can identify & troubleshoot problems easier
    • It's perfectly fine to update all systems manually first and only then enable unattended-upgrades
      • As then you will have a common ground for all your systems and bugs are easier to identify as the same bug will affect all systems at somewhat the same time
  8. You need to have a quick & easy, bureaucracy-free process to add packages to the blacklist
    • There once were broken corosync and keepalived packages in Debian in the early 2010s
    • Once we saw that, we immediately added them to the blacklist and downgraded the other systems where the update was already installed
    • You don't want to spend an hour on the phone frantically trying to reach a member of your Change Advisory Board to greenlight the change which lets you modify the blacklist while unattended-upgrades happily wrecks one system after another
  9. Point 8 sets a requirement: How do you roll out a new blacklist/config for unattended-upgrades to 6000 servers in under 15 minutes?
    • You are too slow to do that manually, no matter how fast you can type
    • for HOST in $(cat host-list.txt); do ssh $HOST some-command; done? Yeah... No. It works, sure... But.. Come on, really? And this does only one host at a time.. Still taking too long.
    • You do want something like Puppet, Ansible or Rundeck for this very reason (see the sketch after this list).
      • Change the configuration in hiera
      • Commit it to git
      • Log in to Rundeck and execute the "Do a manually triggered Puppet/Ansible run" job
      • Then drink a coffee while you watch Rundeck doing its job and care for the 10 or so servers that will fail for various other reasons.
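As an illustration of how fast such a rollout can be with proper tooling, here is a hypothetical Ansible ad-hoc push of a prepared config file (inventory and file names are made up for this sketch):

root@admin:~# ansible all -i production-inventory --become --forks 50 \
    -m copy -a 'src=files/50unattended-upgrades dest=/etc/apt/apt.conf.d/50unattended-upgrades owner=root group=root mode=0644'

With 50 parallel forks even thousands of hosts are done within minutes - and you get a per-host success/failure report for free.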

Understanding apt-cache policy

The next thing that will help you vastly in providing a smoothly running unattended-upgrades service is understanding and using the apt-cache tool, especially with the policy argument: apt-cache policy. Even more so if you use Debian repositories from other projects or vendors.

Why? unattended-upgrades comes with some default configured Origins-Pattern in /etc/apt/apt.conf.d/50unattended-upgrades. These are usable for the official Debian security updates and for when new point releases of Debian are published (12.5, 12.6, etc.), which incorporate all updates since the last point release. Non-security updates for installed packages between point releases won't be installed with the default configuration.

If you want these when they are ready, you have to uncomment the line for "origin=Debian,codename=${distro_codename}-updates";.

${distro_codename}-proposed-updates are updates which may not be stable yet. Read https://wiki.debian.org/StableProposedUpdates for the details. Personally, I keep it disabled - especially on production systems.

For Debian Bookworm the default Origins-Pattern are:

Unattended-Upgrade::Origins-Pattern {
[...]
//      "origin=Debian,codename=${distro_codename}-updates";
//      "origin=Debian,codename=${distro_codename}-proposed-updates";
        "origin=Debian,codename=${distro_codename},label=Debian";
        "origin=Debian,codename=${distro_codename},label=Debian-Security";
        "origin=Debian,codename=${distro_codename}-security,label=Debian-Security";
[...]
};

This means only packages matching these Origins-Pattern will be updated. Therefore you have to verify that these patterns include all packages you want to update and, additionally, that packages you do not wish to update are excluded. Although using the blacklist might be easier here, as it simply works on the name of the package.

Matching Origin-Patterns to apt-cache policy's output

How do you translate these Origins-Patterns into the lines apt-cache policy gives us?

Short side-note: apt-cache uses the metadata (repository information) obtained via apt-get update. Therefore it also works if the configured repositories are offline, but can also show outdated data if you haven't updated the package list information via apt-get update recently.

If executed you will see output like the following:

root@lanadmin:~# apt-cache policy
Package files:
 100 /var/lib/dpkg/status
     release a=now
 500 http://debian.tu-bs.de/debian bookworm-updates/non-free-firmware amd64 Packages
     release v=12-updates,o=Debian,a=stable-updates,n=bookworm-updates,l=Debian,c=non-free-firmware,b=amd64
     origin debian.tu-bs.de
 500 http://debian.tu-bs.de/debian bookworm-updates/main amd64 Packages
     release v=12-updates,o=Debian,a=stable-updates,n=bookworm-updates,l=Debian,c=main,b=amd64
     origin debian.tu-bs.de
 500 http://security.debian.org/debian-security bookworm-security/non-free-firmware amd64 Packages
     release v=12,o=Debian,a=stable-security,n=bookworm-security,l=Debian-Security,c=non-free-firmware,b=amd64
     origin security.debian.org
 500 http://security.debian.org/debian-security bookworm-security/main amd64 Packages
     release v=12,o=Debian,a=stable-security,n=bookworm-security,l=Debian-Security,c=main,b=amd64
     origin security.debian.org
 500 http://debian.tu-bs.de/debian bookworm/non-free-firmware amd64 Packages
     release v=12.5,o=Debian,a=stable,n=bookworm,l=Debian,c=non-free-firmware,b=amd64
     origin debian.tu-bs.de
 500 http://debian.tu-bs.de/debian bookworm/main amd64 Packages
     release v=12.5,o=Debian,a=stable,n=bookworm,l=Debian,c=main,b=amd64
     origin debian.tu-bs.de
Pinned packages:
root@lanadmin:~#

For the sake of simplicity, let us look at just this single entry:

 500 http://security.debian.org/debian-security bookworm-security/non-free-firmware amd64 Packages
     release v=12,o=Debian,a=stable-security,n=bookworm-security,l=Debian-Security,c=non-free-firmware,b=amd64
     origin security.debian.org

I won't go over each and every listed value. If you want to know more: man apt_preferences has all the details, and apt-cache policy packagename lists the policies for a single package.

We want to pay attention to the second line, starting with release. Here we have the relevant values. These are defined in the Release file of each Debian release/repository. Documentation can be found here: https://wiki.debian.org/DebianRepository/Format#A.22Release.22_files

But what do they mean? man apt_preferences also explains them, but as they are relevant, let's make a short table.

Field            | Alias | unattended-upgrades variable | Description
-----------------|-------|------------------------------|------------
Version          | v     | N/A                          | Contains the release version
Origin           | o     | ${distro_id}                 | Originator of the packages. Debian for packages from the Debian project; for commercial packages most likely the company or product name will show up here.
Archive or Suite | a     | N/A                          | Names the archive to which all the packages in the directory tree belong (on the repository server)
Codename         | n     | ${distro_codename}           | Codename of the Debian release. In our case: bookworm
Label            | l     | N/A                          | Names the label of the packages in the directory tree of the Release file
Component        | c     | N/A                          | The licensing component associated with the packages in the directory tree. You may know this as: main, contrib, non-free-firmware, etc.
Architecture     | b     | N/A                          | The processor architecture for which a package is compiled (amd64, i386, arm, etc.)

All this information is usually listed in the first 30 lines of /etc/apt/apt.conf.d/50unattended-upgrades. Therefore: read it to make sure you have the currently valid information.

Understanding this makes it easy to match the configured Origins-Patterns to your configured Debian repositories.

If you are in doubt: Browse your Debian repository via HTTP/HTTPS and have a look at the Release file, for example: http://debian.tu-bs.de/debian/dists/bookworm/Release. The first 5 lines are:

Origin: Debian
Label: Debian
Suite: stable
Version: 12.5
Codename: bookworm
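
If you prefer the shell over the browser, the same check works with curl; it prints exactly the five lines quoted above:

user@host:~$ curl --silent http://debian.tu-bs.de/debian/dists/bookworm/Release | head -n 5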

Filling in the variables we see that only the following Origin-Pattern matches:

"origin=Debian,codename=${distro_codename},label=Debian";

All others either have a different label and/or codename.

Looking at http://debian.tu-bs.de/debian/dists/bookworm-updates/Release we can see that this matches the following commented out Origin-Pattern:

"origin=Debian,codename=${distro_codename}-updates";

This means:

http://debian.tu-bs.de/debian/dists/bookworm/ holds all packages for the current point release of Debian Bookworm.
http://debian.tu-bs.de/debian/dists/bookworm-updates/ holds all updates published before the next point release is made. That is the reason why I always enable this repository too. But depending on your operating strategy, only taking updates when a new point release is published is also fine.

Security updates are always published via the http://security.debian.org/debian-security/ repository and will be installed when available.
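
Before trusting your configuration, it is worth simulating a run. The unattended-upgrade binary (note: singular, unlike the package name) has a dry-run mode which prints the allowed origins and every package it would upgrade, without changing anything:

root@host:~# unattended-upgrade --dry-run --debug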

Key points / Story time

What you should have understood is:

  1. Each repository must be uniquely identifiable.
    • This means: Each Origins-Pattern should only match one of all your configured repositories
    • Of course you can have multiple repositories matching the same Origins-Pattern, but keep the possible implications in mind!
  2. Packages that share the same name must have the same content
    • And by content I mean: Their hashsums must be identical for each given version

If these two rules are violated, you are potentially in for a wild ride.

At a former company we used many internal repositories, and this was fine for a long time. Then suddenly a developer started pushing Debian packages with the exact same names as official Debian packages to those internal repositories. We immediately complained, laying out how that could wreak havoc on our infrastructure: we had to use those repositories, and at the same time we couldn't blacklist the packages, as blacklisting works solely on the name of a package.

You can't do things like: "Blacklist package test-foo, but only if it's coming from the repository on repo.coolhost.tld"

We urged him to simply rename those packages or upload them to another repository - he argued the packages had to keep the same name, as they contained a not-yet-included fix for a bug the company had encountered. He wouldn't budge, as he saw no problem with his approach. "Just don't use them." (Yeah thanks.. That's not how it works with automation.. Especially not if you - sort of - hijack repositories which are used for something entirely different..)

Our workaround was to utilize APT priorities (pinning) for each repository, so that packages from our internal mirror of the official Debian repositories took priority - see the sketch below. That worked, but it was still annoying.
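
For illustration, such a pin could look like this; the hostname and priority are made up for this sketch (see man apt_preferences for the exact semantics):

# /etc/apt/preferences.d/internal-mirror
Package: *
Pin: origin "debian-mirror.internal.example"
Pin-Priority: 900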

Some weeks later those packages caused an incident. During the root cause analysis the problem was identified, and the packages were moved to a separate repository.

Lesson learned. Care for your repositories.

And this ends this part of the post. The next part will continue with the practical side: we will look at the Systemd unit and timer files, how to add new repositories to unattended-upgrades - for example to also upgrade our Proxmox installation using the Proxmox Debian repositories - how to blacklist packages, and more.


Creating a systemd timer to regularly pull Git-Repositories and getting to know an uncomfortable systemd/journalctl bug along the way

Photo by Christina Morillo: https://www.pexels.com/photo/white-dry-erase-board-with-red-diagram-1181311/

I have a VM in my LAN which serves as a central admin host. There I wanted to create a systemd unit and timer to automatically update my Git repositories. The reason: it happens far too often that I push changes from other machines but forget to pull the repos before committing changes made on the admin host.

Sure, stash the changes. Fix any conflicts. No big deal. But still annoying. Therefore: a systemd unit file with a timer that updates the repos every hour, and be done with it.

Encountering the bug

While reading the systemd documentation I learned that you can list multiple ExecStart parameters if the service is of Type=oneshot. Then all commands will be executed in sequential order.

Unless Type= is oneshot, exactly one command must be given. When Type=oneshot is used, zero or more commands may be specified. [...] If more than one command is specified, the commands are invoked sequentially in the order they appear in the unit file. If one of the commands fails (and is not prefixed with "-"), other lines are not executed, and the unit is considered failed.

From: https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#ExecStart=

This seemed to be the easiest way to achieve my goal: just add one ExecStart= line for every Git repo, done. For testing I left out the timer file at first, wanting to verify that the unit file works on its own. And the following unit file works flawlessly.

user@lanadmin:~$ systemctl --user cat git-update
# /home/user/.config/systemd/user/git-update.service
[Unit]
Description=Update git-Repositories
After=network-online.target
Wants=network-online.target

[Service]
# Allows the execution of multiple ExecStart parameters in sequential order
Type=oneshot
# Show status "dead" after the commands have exited (there is no long-running process)
RemainAfterExit=no
# git pull = git fetch + git merge
ExecStart=/usr/bin/git -C %h/git/github/chrlau/dotfiles pull
ExecStart=/usr/bin/git -C %h/git/github/chrlau/scripts pull

[Install]
WantedBy=default.target

However, while initially writing it and looking at the output of systemctl --user status git-update I noticed something.

user@lanadmin:~$ systemctl --user status git-update
○ git-update.service - Update git-Repositories
     Loaded: loaded (/home/user/.config/systemd/user/git-update.service; enabled; preset: enabled)
     Active: inactive (dead) since Fri 2024-04-26 22:16:21 CEST; 15s ago
    Process: 39510 ExecStart=/usr/bin/git -C /home/user/git/github/chrlau/dotfiles pull (code=exited, status=0/SUCCESS)
    Process: 39516 ExecStart=/usr/bin/git -C /home/user/git/github/chrlau/scripts pull (code=exited, status=0/SUCCESS)
   Main PID: 39516 (code=exited, status=0/SUCCESS)
        CPU: 40ms

Apr 26 22:16:18 lanadmin systemd[9234]: Starting git-update.service - Update git-Repositories...
Apr 26 22:16:21 lanadmin git[39521]: Already up to date.
Apr 26 22:16:21 lanadmin systemd[9234]: Finished git-update.service - Update git-Repositories.

There should be two log lines reading git[pid]: Already up to date. - after all, we call git twice. But there is only one. Why!?

At first I considered something like rate-limiting or de-duplication of identical log messages, but I found nothing. Only some old Red Hat bug reports from 2013 and 2017 about how journald can't always gather the necessary process information (cgroup, etc.) from /proc before the process is gone (read: Bug 963620 - journald: we need a way to get audit, cgroup, ... information attached to log messages instead of asynchronously reading them in and Bug 1426152 - Journalctl miss to show logs from unit). Especially with short-running processes this happens regularly. This can't be the reason, can it?

I checked the journal for the unit. The line was still missing.

user@lanadmin:~$ journalctl --user -u git-update
Apr 26 22:16:18 lanadmin systemd[9234]: Starting git-update.service - Update git-Repositories...
Apr 26 22:16:21 lanadmin git[39521]: Already up to date.
Apr 26 22:16:21 lanadmin systemd[9234]: Finished git-update.service - Update git-Repositories.

Then I accidentally executed journalctl without any parameters and...

user@lanadmin:~$ journalctl
[...]
Apr 26 22:16:18 lanadmin systemd[9234]: Starting git-update.service - Update git-Repositories...
Apr 26 22:16:20 lanadmin git[39515]: Already up to date.
Apr 26 22:16:21 lanadmin git[39521]: Already up to date.
Apr 26 22:16:21 lanadmin systemd[9234]: Finished git-update.service - Update git-Repositories.
[...]

There it is. So why does a simple journalctl display both lines, while a systemctl --user status git-update doesn't?

Remember the bug reports we just read? journalctl has a verbose mode which displays all fields for every log line. This should show us the difference between those two log messages.

First we have the entry for the Starting git-update.service - Update git-Repositories... message. Nothing suspicious here.

user@lanadmin:~$ journalctl -o verbose
Fri 2024-04-26 22:16:18.724396 CEST [s=78cb2a728dda4d579b41ba58b655d4c2;i=6a32;b=859f56a381394260854aeac3b77d87a3;m=1a58bdb7ae0;t=6170593979632;x=ed56438f3a913535]
    PRIORITY=6
    SYSLOG_FACILITY=3
    TID=9234
    SYSLOG_IDENTIFIER=systemd
    _TRANSPORT=journal
    _PID=9234
    _UID=1000
    _GID=1000
    _COMM=systemd
    _EXE=/usr/lib/systemd/systemd
    _CMDLINE=/lib/systemd/systemd --user
    _CAP_EFFECTIVE=0
    _SELINUX_CONTEXT=unconfined
    _AUDIT_SESSION=393
    _AUDIT_LOGINUID=1000
    _SYSTEMD_CGROUP=/user.slice/user-1000.slice/user@1000.service/init.scope
    _SYSTEMD_OWNER_UID=1000
    _SYSTEMD_UNIT=user@1000.service
    _SYSTEMD_USER_UNIT=init.scope
    _SYSTEMD_SLICE=user-1000.slice
    _SYSTEMD_USER_SLICE=-.slice
    _BOOT_ID=859f56a381394260854aeac3b77d87a3
    _MACHINE_ID=e83bb1062b594b79817a5c8a5605f9fd
    _HOSTNAME=lanadmin
    _RUNTIME_SCOPE=system
    CODE_FILE=src/core/job.c
    JOB_TYPE=start
    CODE_LINE=581
    CODE_FUNC=job_emit_start_message
    MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5
    MESSAGE=Starting git-update.service - Update git-Repositories...
    JOB_ID=10
    USER_INVOCATION_ID=8f476f2ef43245ba89a9cb69a26f8577
    USER_UNIT=git-update.service
    _SOURCE_REALTIME_TIMESTAMP=1714162578724396

Then comes the entry for the first Already up to date. log message. It is considerably shorter than the previous one: no systemd-related fields are attached.

Fri 2024-04-26 22:16:20.137988 CEST [s=78cb2a728dda4d579b41ba58b655d4c2;i=6a33;b=859f56a381394260854aeac3b77d87a3;m=1a58bf10cb2;t=6170593ad2804;x=730951cbebf4e84a]
    PRIORITY=6
    SYSLOG_FACILITY=3
    _UID=1000
    _GID=1000
    _BOOT_ID=859f56a381394260854aeac3b77d87a3
    _MACHINE_ID=e83bb1062b594b79817a5c8a5605f9fd
    _HOSTNAME=lanadmin
    _RUNTIME_SCOPE=system
    _TRANSPORT=stdout
    _STREAM_ID=36f2542db1e249da8c5c5b1342d065e8
    SYSLOG_IDENTIFIER=git
    MESSAGE=Already up to date.
    _PID=39515
    _COMM=git

And yep, here is the second Already up to date. log message. It contains all fields, and it is the message we see when we display the journal entries for our git-update.service unit.

Fri 2024-04-26 22:16:21.471040 CEST [s=78cb2a728dda4d579b41ba58b655d4c2;i=6a34;b=859f56a381394260854aeac3b77d87a3;m=1a58c0563ee;t=6170593c17f40;x=2574d8467f36d20]
    PRIORITY=6
    SYSLOG_FACILITY=3
    _UID=1000
    _GID=1000
    _CAP_EFFECTIVE=0
    _SELINUX_CONTEXT=unconfined
    _AUDIT_SESSION=393
    _AUDIT_LOGINUID=1000
    _SYSTEMD_OWNER_UID=1000
    _SYSTEMD_UNIT=user@1000.service
    _SYSTEMD_SLICE=user-1000.slice
    _BOOT_ID=859f56a381394260854aeac3b77d87a3
    _MACHINE_ID=e83bb1062b594b79817a5c8a5605f9fd
    _HOSTNAME=lanadmin
    _RUNTIME_SCOPE=system
    _TRANSPORT=stdout
    SYSLOG_IDENTIFIER=git
    MESSAGE=Already up to date.
    _COMM=git
    _STREAM_ID=cfc8932e3cf9431aa59873d163d624a8
    _PID=39521
    _SYSTEMD_CGROUP=/user.slice/user-1000.slice/user@1000.service/app.slice/git-update.service
    _SYSTEMD_USER_UNIT=git-update.service
    _SYSTEMD_USER_SLICE=app.slice
    _SYSTEMD_INVOCATION_ID=8f476f2ef43245ba89a9cb69a26f8577

Great. So how do we fix this? Yeah, I can't. Unless I can make the git process run longer, there is no real solution. I tried adding ExecStart=/usr/bin/sleep 1 after each git command, but that, of course, didn't change anything - sleep is a different process.

Now I'm left with the following situation: sometimes both log entries are logged correctly with all fields. Sometimes just one (either the first or the second). And rarely none is logged at all. Then all I have are the standard Starting git-update.service - Update git-Repositories... and Finished git-update.service - Update git-Repositories. log messages which systemd emits when a unit starts and when it finishes.

Beautiful. Just beautiful. I mean.. The syslog facility, identifier and priority are logged each time. So yeah, that's actually a reason for good old rsyslog.

Somewhat of a solution?

The best advice I can currently give is: if you have short-lived processes started via systemd and it's important that you can reliably see all their log messages:

  1. Make sure ForwardToSyslog=yes is set in /etc/systemd/journald.conf. Note that the default values are usually listed as comments. So if the line #ForwardToSyslog=yes is present, you should be fine
  2. Install rsyslog or any other traditional syslog service
  3. Configure it to store your log messages in a separate logfile or let it go to /var/log/messages (see the sketch after this list)
  4. Don't forget to configure logrotate (or some other sort of logfile rotating) for all logfiles created by rsyslog 😉
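
For step 3, a minimal sketch of what such an rsyslog rule could look like - the filename and the tag match are assumptions based on the git-update unit from above:

# /etc/rsyslog.d/30-git-update.conf
# Write every message whose syslog tag starts with "git" into its own
# logfile, then stop processing it so it doesn't land in other files too
:syslogtag, startswith, "git" /var/log/git-update.log
& stop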

I have also learned to always execute a plain journalctl during troubleshooting sessions, just to make sure I spot messages from short-running processes.

And what about the timer?

This is the timer file I use. It runs once every hour.

user@lanadmin:~$ systemctl --user cat git-update.timer
# /home/user/.config/systemd/user/git-update.timer
[Unit]
Description=Update git-repositories every hour

[Timer]
# Documentation: https://www.freedesktop.org/software/systemd/man/latest/systemd.time.html#Calendar%20Events
OnCalendar=*-*-* *:00:00
Unit=git-update.service

[Install]
WantedBy=default.target
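
Side note: OnCalendar=hourly is a shorthand for exactly this expression. And if you are ever unsure when an OnCalendar expression fires, systemd-analyze can tell you the next elapse time:

user@lanadmin:~$ systemd-analyze calendar "*-*-* *:00:00"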

After creating the file you need to enable and start it.

user@lanadmin:~$ systemctl --user enable git-update.timer
Created symlink /home/user/.config/systemd/user/default.target.wants/git-update.timer → /home/user/.config/systemd/user/git-update.timer.

user@lanadmin:~$ systemctl --user start git-update.timer
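
Alternatively, systemctl --user enable --now git-update.timer does both steps in one command.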

Using systemctl --user list-timers we can verify that the timer is scheduled to run.

user@lanadmin:~$ systemctl --user list-timers
NEXT                         LEFT       LAST PASSED UNIT                   ACTIVATES
Sun 2024-04-28 16:00:00 CEST 49min left -    -      git-update.timer git-update.service

1 timers listed.
Pass --all to see loaded but inactive timers, too.

Icinga2 Monitoring notifications via Telegram Bot (Part 1)

Photo by Kindel Media: https://www.pexels.com/photo/low-angle-shot-of-robot-8566526/

One thing I had wanted to set up for a long time was getting my Icinga2 notifications via one of the instant messaging apps I have on my mobile. There were Threema, Telegram and WhatsApp to choose from.

Well.. Threema wants money for this kind of service. WhatsApp requires a business account which must be connected with the bot. And WhatsApp Business means I have to pay again? - Don't know - didn't pursue that path any further, as it either meant converting my private account into a business account or getting a second account. No, sorry. Not something I want.

Telegram on the other hand? "Yeah, well, message the @botfather account, type /start, type /newbot, set a display name and username, get your Bot-Token (for API usage), that's it. The only requirement we have? The username must end in bot." From here it was an easy decision which app I chose. (Telegram documentation here.)

Creating your telegram bot

  1. Search for the @botfather account on Telegram; doesn't matter if you use the mobile app or do it via Telegram Web.
  2. Type /start and a help message will be displayed.
  3. To create a new bot, type: /newbot
  4. Via the question "Alright, a new bot. How are we going to call it? Please choose a name for your bot." you are asked for the display name of your bot.
    • I chose something generic like "Icinga Monitoring Notifications".
  5. Likewise the question "Good. Now let's choose a username for your bot. It must end in `bot`. Like this, for example: TetrisBot or tetris_bot." asks for the username.
    • Choose whatever you like.
  6. If the username is already taken Telegram will state this and simply ask for a new username until you find one which is available.
  7. In the final message you will get your token to access the HTTP API. Note this down and save it in your password manager. We will need this later for Icinga.
  8. To test everything, send a message to your bot in Telegram.

That's the Telegram part. Pretty easy, right?

Testing our bot from the command line

We are now able to receive messages (via /getUpdates) and send messages (via /sendMessage) from/to our bot. Define the token as a shell variable and execute the following curl command to fetch the messages that were sent to your bot. Note: only new messages are returned. If you have already viewed them in Telegram Web the response will be empty - as seen in the first executed curl command.

Just close Telegram Web and the app on your phone, then send a new message to your bot; it will then show up in the /getUpdates response. Later we will define our API token as a constant in the Icinga2 configuration.

For better readability I pipe the output through jq.

When there is a new message from your Telegram account to your bot, you will see a field named id. Note this number down. This is the chat ID of your account, and we need it so that your bot can actually send you messages.

Relevant documentation links are: https://core.telegram.org/bots/api#getupdates and https://core.telegram.org/bots/api#sendmessage

user@host:~$ TOKEN="YOUR-TOKEN"
user@host:~$ curl --silent "https://api.telegram.org/bot${TOKEN}/getUpdates" | jq
{
  "ok": true,
  "result": []
}
user@host:~$ curl --silent "https://api.telegram.org/bot${TOKEN}/getUpdates" | jq
{
  "ok": true,
  "result": [
    {
      "update_id": NUMBER,
      "message": {
        "message_id": 3,
        "from": {
          "id": CHAT-ID,
          "is_bot": false,
          "first_name": "John Doe Example",
          "username": "JohnDoeExample",
          "language_code": "de"
        },
        "chat": {
          "id": CHAT-ID,
          "first_name": "John Doe Example",
          "username": "JohnDoeExample",
          "type": "private"
        },
        "date": 1694637798,
        "text": "This is a test message"
      }
    }
  ]
}
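
To test the sending direction as well, a /sendMessage call works the same way; CHATID is the id value we just noted down:

user@host:~$ CHATID="YOUR-CHAT-ID"
user@host:~$ curl --silent "https://api.telegram.org/bot${TOKEN}/sendMessage" --data-urlencode "chat_id=${CHATID}" --data-urlencode "text=Test message from curl" | jq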

Configuring Icinga2

Now we need to integrate our bot into the Icinga2 notification process. Luckily there were many people before us doing this, so there are already some notification scripts and example configuration files on GitHub.

I choose the scripts found here: https://github.com/lazyfrosch/icinga2-telegram

As I use the distributed monitoring, I store some configuration files beneath /etc/icinga2/zones.d/. If you don't, feel free to store those files somewhere else. However, as I define the token in /etc/icinga2/constants.conf, which isn't synced via the config file sync, I have to make sure that the notification configuration is also stored outside of /etc/icinga2/zones.d/. Otherwise the distributed setup will fail, as the config file sync throws a syntax error on all other machines due to the missing TelegramBotToken constant.

First we define the API-Token in the /etc/icinga2/constants.conf file:

user@host:/etc/icinga2$ grep -B1 TelegramBotToken constants.conf
/* Telegram Bot Token */
const TelegramBotToken = "YOUR-TOKEN-HERE"

Afterwards we download the host and service notification scripts into /etc/icinga2/scripts and set the executable bit.

user@host:/etc/icinga2/scripts$ wget https://raw.githubusercontent.com/lazyfrosch/icinga2-telegram/master/telegram-host-notification.sh
user@host:/etc/icinga2/scripts$ wget https://raw.githubusercontent.com/lazyfrosch/icinga2-telegram/master/telegram-service-notification.sh
user@host:/etc/icinga2/scripts$ chmod +x telegram-host-notification.sh telegram-service-notification.sh

Based on the notifications we want to receive, we need to define the variable vars.telegram_chat_id in the appropriate user/group object(s). An example for the icingaadmin user is shown below and can be found in the icinga2-example.conf on GitHub: https://github.com/lazyfrosch/icinga2-telegram/blob/master/icinga2-example.conf - along with the notification commands which we will set up after this.

user@host:~$ cat /etc/icinga2/zones.d/global-templates/users.conf
object User "icingaadmin" {
  import "generic-user"

  display_name = "Icinga 2 Admin"
  groups = [ "icingaadmins" ]

  email = "root@localhost"
  vars.telegram_chat_id = "YOUR-CHAT-ID-HERE"
}

Notifications for Host & Service

We need to define 2 new NotificationCommand objects which trigger the telegram-(host|service)-notification.sh scripts. These are stored in /etc/icinga2/conf.d/telegrambot-commands.conf.

Note: We store the NotificationCommands and Notifications in /etc/icinga2/conf.d and NOT in /etc/icinga2/zones.d/master. This is because I have only one master in my setup which sends out notifications, and because we defined the constant TelegramBotToken in /etc/icinga2/constants.conf - which is not synced via the zone config sync. Otherwise we would run into a syntax error on all Icinga2 agents.

See https://admin.brennt.net/icinga2-error-check-command-does-not-exist-because-of-missing-constant for details.

Of course we could also define the constant in a file under /etc/icinga2/zones.d/master, but I chose not to do so for security reasons.

user@host:/etc/icinga2$ cat /etc/icinga2/conf.d/telegrambot-commands.conf
/*
 * Notification Commands for Telegram Bot
 */
object NotificationCommand "telegram-host-notification" {
  import "plugin-notification-command"

  command = [ SysconfDir + "/icinga2/scripts/telegram-host-notification.sh" ]

  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    HOSTNAME = "$host.name$"
    HOSTALIAS = "$host.display_name$"
    HOSTADDRESS = "$address$"
    HOSTSTATE = "$host.state$"
    LONGDATETIME = "$icinga.long_date_time$"
    HOSTOUTPUT = "$host.output$"
    NOTIFICATIONAUTHORNAME = "$notification.author$"
    NOTIFICATIONCOMMENT = "$notification.comment$"
    HOSTDISPLAYNAME = "$host.display_name$"
    TELEGRAM_BOT_TOKEN = TelegramBotToken
    TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"

    // optional
    ICINGAWEB2_URL = "https://host.domain.tld/icingaweb2"
  }
}

object NotificationCommand "telegram-service-notification" {
  import "plugin-notification-command"

  command = [ SysconfDir + "/icinga2/scripts/telegram-service-notification.sh" ]

  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    SERVICEDESC = "$service.name$"
    HOSTNAME = "$host.name$"
    HOSTALIAS = "$host.display_name$"
    HOSTADDRESS = "$address$"
    SERVICESTATE = "$service.state$"
    LONGDATETIME = "$icinga.long_date_time$"
    SERVICEOUTPUT = "$service.output$"
    NOTIFICATIONAUTHORNAME = "$notification.author$"
    NOTIFICATIONCOMMENT = "$notification.comment$"
    HOSTDISPLAYNAME = "$host.display_name$"
    SERVICEDISPLAYNAME = "$service.display_name$"
    TELEGRAM_BOT_TOKEN = TelegramBotToken
    TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"

    // optional
    ICINGAWEB2_URL = "https://host.domain.tld/icingaweb2"
  }
}

As I want to get all notifications for all hosts and services, I simply apply the notification objects to all hosts and services which have host.name set - same as in the example.

user@host:/etc/icinga2$ cat /etc/icinga2/conf.d/telegrambot-notifications.conf
/*
 * Notifications for alerting via Telegram Bot
 */
apply Notification "telegram-icingaadmin" to Host {
  import "mail-host-notification"
  command = "telegram-host-notification"

  users = [ "icingaadmin" ]

  assign where host.name
}

apply Notification "telegram-icingaadmin" to Service {
  import "mail-service-notification"
  command = "telegram-service-notification"

  users = [ "icingaadmin" ]

  assign where host.name
}

Checking configuration

Now we check if our Icinga2 config has no errors and reload the service:

root@host:~ # icinga2 daemon -C
[2023-09-14 22:19:37 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2023-09-14 22:19:37 +0200] information/cli: Loading configuration file(s).
[2023-09-14 22:19:37 +0200] information/ConfigItem: Committing config item(s).
[2023-09-14 22:19:37 +0200] information/ApiListener: My API identity: hostname.domain.tld
[2023-09-14 22:19:37 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[...]
[2023-09-14 22:19:37 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2023-09-14 22:19:37 +0200] information/cli: Finished validating the configuration file(s).
root@host:~ # systemctl reload icinga2.service

Verify it works

Log into your Icingaweb2 frontend, click the Notification link for a host or service and trigger a custom notification.

And we have a message in Telegram.

What about CheckMK?

If you use CheckMK see this blogpost: https://www.srcbox.net/posts/monitoring-notifications-via-telegram/

Lessons learned

And if you use this setup you will encounter one situation/question rather quickly: Do I really need to be woken up at 3am for every notification?

No. No, you don't. Not in a professional context, and even less in a private one.

Therefore: in part 2 we will register a second bot account and use the two bots to differentiate between important and unimportant notifications. The unimportant ones will be muted in the Telegram client on our smartphone. The important ones won't be - and are therefore able to wake us at 3am.


Snap packages and SSL-certificates.. A neverending story

Photo by Magda Ehlers: https://www.pexels.com/photo/a-broken-bicycle-leaning-on-the-wall-5342343/

Alternative title: Why I hate developers who reinvent the wheel - but forget the spokes.

This is more of a rant and link-dump article. I didn't research everything in detail, as I generally avoid Snaps - so far they've caused me more trouble than benefit. But I wanted to keep all my arguments and links to certain documentation sites in one place.. So here we go.

Snap is, compared to packaging formats like RPM and DEB, relatively new. Maybe this explains why it still has some teething problems. The most annoying one for me: Snap packages don't use your custom SSL certificates stored in /usr/share/ca-certificates, which is the default place for your company's Root CA certificates. Every browser respects this. There is tooling in every Linux distribution to take care of stored certificates and add them to the truststore, and there is tooling to automatically add these certificates to any Java (or other language) truststore that might reside on your system.

But no, mirroring that behaviour in Snap would be too easy, right? After all.. why just reinvent the wheel when you can have more fun by forgetting the spokes?
Yes, I am aware that centreless/hubless wheels do exist. ;-)

The root cause is often: Snap confinement (or to be precise: the strict confinement mode). This means a snap is separated from the system and only has access to directories which are configured at build time of the snap, as stated in the Interface Management documentation (see also: https://snapcraft.io/docs/home-interface and https://snapcraft.io/docs/supported-interfaces). And as desirable and good as this is, reality teaches us that for every application you need a way to modify and configure it - at least certain aspects. With Snap.. not so much?

In the latest Ubuntu releases Firefox and Chromium have been migrated exclusively to Snap. You cannot install Firefox via apt and get a normal install; you will get a snap package. Which means: goodbye SSL client certificate authentication, goodbye internal company SSL certificates. Bug 1901586: [snap] CA Certificates from /usr/local/share/ca-certificates are not used has all the details.

Yes, clever people might add: but you can do a bind mount and mount that directory somewhere under /home, where nearly every Snap application has access. But why do I need to do this? Why does Snap impose that burden on me, the user? And it doesn't even fix the SSL issue..

Oh, I should copy them to /etc/ssl/certs? And what about the other troubles this might cause? .. Hello? Nothing? Oh, okay..

And it keeps getting better: if you install the video player VLC as a snap package and your files are not located under /home.. you can't access them. Snap developers will happily advise you to move your files, change the mountpoint or do a mount --bind under /home - roughly sketched below. This can be seen here: Bug 1643706: snap apps need to be able to browse outside of user $HOME dir. for Desktop installs
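
For completeness, such a bind-mount workaround would look roughly like this; the paths are examples only:

# Make a directory outside /home visible where Snap apps can reach it
root@host:~# mount --bind /data/videos /home/user/videos
# The equivalent /etc/fstab line, to survive reboots:
# /data/videos  /home/user/videos  none  bind  0  0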

And this is the point I want to make. It's OK to design a new packaging format for applications. It's okay to add security mechanisms like Snap confinement. But designing them in a way that forces each and every user to manage/store/organize files the way Snap dictates? Nope, sorry.

Someone on the Internet wrote that Snap is a great idea for things like phones, PCs in public spaces (libraries, schools, etc.) or other confined environments where the user doesn't and shouldn't configure many aspects - systems designed and offered for a few specific purposes which are more or less static and change seldom.

And I agree. For me the root cause of all my problems with Snap is: it isn't the only daemon on my system. It HAS to integrate into the existing system and processes, like update-ca-certificates, or even where in the filesystem I store my files. All of this was there before Snap existed, and now applications which always worked don't, because someone thought it is better that way.. No, sorry. Like I said before: it HAS to integrate into the existing system. If it doesn't.. it might still have its use-cases - I'm not arguing against that. But then please let me have a choice! Breaking existing workflows, file structures, etc. is not acceptable for me. And as the Internet shows, not for many other users either.

The sad part is that the decision to make Firefox available only as a Snap was made by Mozilla & Canonical. Therefore you can't download a .deb from the Mozilla webpage or the like. (Luckily there is still a PPA, offered by volunteers, which builds Firefox as a .deb package. This means it can potentially vanish if no one is left doing it. But that's more or less the risk with any piece of software/technology.)

The announcement is on their Discourse: https://discourse.ubuntu.com/t/feature-freeze-exception-seeding-the-official-firefox-snap-in-ubuntu-desktop/24210/1

Oh, and Snaps tend to auto-upgrade, even without unattended-upgrades configured - which can be a problem if you require a specific version. Luckily snap refresh --hold holds updates for all Snap packages. Or, if you just want it for a specific package/time, use snap refresh --hold=72h vlc. This post has more information: Hold your horses, I mean snaps! New feature lets you stop snap updates, for as long as you need (snapcraft.io)
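
In short - assuming a reasonably recent snapd, as older versions don't know these options:

# Hold updates for all snaps indefinitely
snap refresh --hold
# Hold only vlc, for 72 hours
snap refresh --hold=72h vlc
# Release the hold again
snap refresh --unhold vlc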

Security considerations

Snap was invented by Canonical. The Snap Store is run by Canonical. Snap packages are slowly replacing more and more .deb packages in Ubuntu: when you type apt-get install packagename you will automatically get a snap package if one exists.

The same happens if you try to execute a command which isn't present on your system: the command-not-found helper from Ubuntu will recommend the associated .deb or Snap package.

The problem? Aquasec discovered that many traditional programs are not available as a Snap, but Canonical still allows you to register Snaps with the exact same name, despite them providing an entirely different program. (aquasec.com)

Additionally you can specify aliases or register Snaps for the literal filenames. The example shows how, when you try to execute the command tarquingui, command-not-found recommends installing the tarquin Snap, which provides the tarquingui program. But the researchers were also able to register a Snap named tarquingui. What happens? command-not-found now recommends both Snaps.

My personal opinion: it is a dangerous and dumb oversight not to reserve the name of every APT package on the Snap Store, preventing the hijacking of well-established software packages by potentially malicious third parties. After all it was Canonical who introduced Snap, and it's Canonical who solely operates the Snap Store...

How to install Firefox as a .deb package (Ubuntu 23.04)

Update: I recently learned of a bug in unattended-upgrades which ignores apt-pinnings if the origin is not listed in the allowed-origins: #2033646 unattended-upgrade ignores apt-pinning to not-allowed origins

To work around this bug, do the following:

root@host:~# echo 'Unattended-Upgrade::Allowed-Origins:: "LP-PPA-mozillateam:${distro_codename}";' | sudo tee /etc/apt/apt.conf.d/51unattended-upgrades-firefox
  1. Remove the Firefox snap (omit if not present)
    • snap remove firefox
    • If dpkg -l firefox still shows it as installed, execute:
      • apt-get remove --purge firefox
  2. Add the Mozilla PPA
    • add-apt-repository ppa:mozillateam/ppa
      • If the command add-apt-repository is missing, install the package software-properties-common
  3. Pin the Mozilla PPA repository with a higher priority, so packages are installed from this PPA instead of other repositories where they might be available too
    • echo -e "Package: *\nPin: release o=LP-PPA-mozillateam\nPin-Priority: 1001" | sudo tee /etc/apt/preferences.d/mozilla-firefox
  4. Update repository information
    • apt-get update
  5. Now check that your APT-Package Policies are correct
    • apt-cache policy firefox
      • If set up correctly, Candidate: will list the package version which is not the Snap transition package. See the example below.
      • root@ubuntu:/# apt-cache policy firefox
        firefox:
          Installed: (none)
          Candidate: 1:1snap1-0ubuntu3
          Version table:
             1:1snap1-0ubuntu3 500
                500 http://archive.ubuntu.com/ubuntu lunar/main amd64 Packages
             120.0.1+build1-0ubuntu0.23.04.1~mt1 500
                500 https://ppa.launchpadcontent.net/mozillateam/ppa/ubuntu lunar/main amd64 Packages
        
        root@ubuntu:/# echo -e "Package: *\nPin: release o=LP-PPA-mozillateam\nPin-Priority: 1001" | tee /etc/apt/preferences.d/mozilla-firefox
        Package: *
        Pin: release o=LP-PPA-mozillateam
        Pin-Priority: 1001
        
        root@ubuntu:/# apt-cache policy firefox
        firefox:
          Installed: (none)
          Candidate: 120.0.1+build1-0ubuntu0.23.04.1~mt1
          Version table:
             1:1snap1-0ubuntu3 500
                500 http://archive.ubuntu.com/ubuntu lunar/main amd64 Packages
             120.0.1+build1-0ubuntu0.23.04.1~mt1 1001
               1001 https://ppa.launchpadcontent.net/mozillateam/ppa/ubuntu lunar/main amd64 Packages
  6. Install Firefox. You should see that https://ppa.launchpadcontent.net/mozillateam/ppa/ubuntu is being used to download Firefox. If not, search for errors.
      • apt-get install firefox