Feuerfest

Just the private blog of a Linux sysadmin

Using and configuring unattended-upgrades under Debian Bookworm - Part 1: Preparations

Photo by Markus Winkler: https://www.pexels.com/photo/wood-writing-mathematics-typography-19915915/

Preface

At the time of this writing Debian Bookworm is the stable release of Debian. It utilizes Systemd timers for all automation tasks, such as updating the package lists and executing the actual apt-get upgrade. Therefore we won't need to configure APT parameters in files like /etc/apt/apt.conf.d/02periodic. In fact, some of these files don't even exist on my systems. Keep that in mind if you read this article alongside others who might do things differently - or who write about older/newer releases of Debian.

Part 2 can be found here: Using and configuring unattended-upgrades under Debian Bookworm - Part 2: Practise

unattended-upgrades is not your enemy

I've read and heard my fair share of people condemning unattended-upgrades as it:

  • Made things unnecessarily complicated
  • Broke the system
  • Ignored APT-Pinnings and caused problems because of that
  • Just didn't work
  • Caused problems with High Availability (HA) services
  • Made the database fail
  • Caused split-brains solely through restarting services
  • Restarted services during work hours causing disruptions and annoyances

You should get the gist. And then there is my experience, running unattended-upgrades on over 6000 Debian servers running all sorts of services: from Apache, Tomcat, JBoss and HAProxy over DRBD, GlusterFS, Corosync, Pacemaker and some NoSQL DBs to services like keepalived and ipvsadm for Layer 2/3 load balancing or Quagga/FRR for BGP route announcements. And in all those years unattended-upgrades wasn't the root cause of a single problem, outage or incident.

Therefore I suspect that many of the people whose complaints I've read just installed it and never cared about it again. Or, at least, didn't care enough. Presumably they just wanted a quick solution to help them get their systems updated. Well yes, unattended-upgrades is the right tool for this. But as always:

Know your system and plan accordingly!

Things to consider beforehand

Let me give you a list of things to consider before you even install unattended-upgrades.

  1. Are all of your APT-Repositories in order?
    • GPG-keys valid and present?
    • Correct repository for your Debian release?
    • Do you use only official Debian repositories?
    • Do you use Debian repositories from other projects/vendors?
    • Are your repositories in sync on all your servers?
      • Preferably you use the same Debian Mirror on all of your systems. At least make sure you don't use different repositories for security and system updates across all your Debian systems.
      • This prevents problems from outdated mirrors. Happens rarely, but can happen.
        • Trust me: When 2 machines out of a cluster of 5 behave differently, your first thought or troubleshooting step won't be to check if the different APT repositories have the same content.
      • Remember: Those are often provided as-is by volunteers. They are NOT a commercial service. And yet most of these volunteers do an exceptional & awesome job despite not being paid for it.
      • Best case scenario: Your company has an internal Debian Mirror. Saving you money on bandwidth usage.
      • Have a look at https://www.debian.org/mirror/ftpmirror.en.html or Aptly if you plan on setting up a mirror.
    • Basically: If an apt-get update prints out anything other than your configured repositories, followed by the line Reading package lists... Done: Fix your repositories!
    • Yes, you can have vendors with broken Debian repositories. Most often the Release file will be buggy or missing entirely. If you can't get your vendor to fix it, well, then the best option is to not specify the repository at all and ensure you have another form of automation to update those packages.
      • Sadly a wget http://some-company.tld/some.deb && dpkg -i some.deb is still considered valuable quality work at far too many enterprise software companies out there..
        • Or you just work around that nuisance, create an internal repository yourself and upload the packages there.
      • A good time to remind them that you pay them and that you need a fully working Debian repository, following the Debian guidelines so you can automatically patch all your systems.
    • Do NOT continue otherwise. You have been warned.
  2. What kind of services does your system provide?
    • Here it is essential to know what the system does. Technically and organizationally.
    • What services are provided, why, to whom?
    • Are there any Service Level Agreements (SLAs) in place?
      • Do downtimes of a service need to be scheduled first?
      • Or can service restarts happen at any time?
      • Also automatically? Or is human supervision required by law/standards? (Looking at you PCI DSS (Wikipedia) 😉)
      • Or is there already an agreed upon maintenance window during which the service is allowed to be unresponsive?
    • Do all services have High Availability (HA)?
      • Do the service(s) survive if the primary/master/main system is shut down?
      • How many systems can be unavailable at the same time?
      • Or do other tasks need to be done (manually) on secondary/slave/standby systems?
      • Does your failover work?
      • Is the failover tested regularly?
    • How are you informed if the service(s) stop responding?
      • Is your monitoring set up to check right after updates have happened? (You can specify the time at which unattended-upgrades installs the updates.)
  3. This will enable you to classify all your installed packages into the following categories for unattended-upgrades:
    • Can be updated anytime
    • Can be updated at certain times / Only a certain number of systems is allowed to be unavailable at any time
      • See: systemctl list-timers especially apt-daily-upgrade.timer and maybe apt-daily.timer
    • Only manual updates (Blacklisting)
      • Unattended-Upgrade::Package-Blacklist in /etc/apt/apt.conf.d/50unattended-upgrades is your friend
      • Then execute apt-get update && apt-get upgrade manually to update the blacklisted packages
    • Never update this package / We need a specific version
      • This will need blacklisting AND APT-Pinnings to be in place
      • As an apt-get update && apt-get upgrade can still be executed manually
  4. In certain situations (HA nodes, manually triggered fail-over, etc.) specific commands need to be executed before or after a package update.
    • This is something unattended-upgrades can't help you with
    • Sometimes these are commands which should be included in the Debian package itself. Namely in the preinst, prerm, postinst and postrm files. The so-called package maintainer scripts (Official Debian documentation).
      • But more often the commands are customer/environment specific and adding them to the maintainer script makes no sense or can even cause disruptions in the service.
    • A feasible workaround is the utilisation of a drop-in file if your service is started via a Systemd Unit-File, see https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html and search for "drop-in". The ArchWiki also has a good article: https://wiki.archlinux.org/title/systemd#Drop-in_files - but keep in mind that Arch may use different paths than Debian
    • If the correct procedure is missing from the package, or isn't suitable for your environment, then it's best to exclude the packages from unattended-upgrades and get yourself a tool like Ansible, Puppet or Rundeck to automate the execution of those manual tasks. (God, how I love Rundeck. 🥰) There you are able to ensure everything is valid and verified before switching your primary cluster node and running package updates.
  5. If you have some kind of ITIL Change Management process in place, this, of course, also can't be handled by unattended-upgrades
    • The only viable solution would be to have different repositories based on your environment classifications (development, QA testing, production) or approval classifications (untested, testing, approved) and push packages to the corresponding repository once the Change Management process is completed
    • Then you can install the updates automatically from there
  6. Unattended-Upgrades is great! But you can't automate every single scenario by just using it!
    • Sometimes this even means going back to the drawing board and optimising your internal company processes first. Especially in situations where approvals have to be given manually by real humans
  7. When you start rolling out unattended-upgrades it's better to first get an overview of how many patches are missing on each server. The more updates you install, the more likely it is that things will change or even break. Or it may even just be those informational log messages telling you that some parameter or option will be deprecated in the next major release.
    • I advise to start with non-critical systems first
    • Ideally systems which are somewhat up-to-date so you can identify & troubleshoot problems easier
    • It's perfectly fine to update all systems manually first and only then enable unattended-upgrades
      • As you will then have a common ground across all your systems, and bugs are easier to identify because the same bug will affect all systems at roughly the same time
  8. You need to have a quick & easy, bureaucracy-free process to add packages to the blacklist
    • There once were broken corosync and keepalived packages in Debian in the early 2010s
    • Once we saw that, we immediately added them to the blacklist and downgraded the other systems where the update was already installed
    • You don't want to spend one hour on the phone frantically trying to reach a member of your Change Advisory Board to greenlight the change which lets you modify the blacklist while unattended-upgrades happily wrecks one system after another
  9. Point 8 sets a requirement: How do you roll out a new blacklist/config for unattended-upgrades to 6000 servers in under 15 minutes?
    • You are too slow to do that manually, no matter how fast you can type
    • for HOST in $(cat host-list.txt); do ssh $HOST some-command; done? Yeah... No. It works, sure... But.. Come on, really? And it only does one host at a time.. Still taking too long.
    • You do want something like Puppet, Ansible or Rundeck for this very reason (see the sketch right after this list).
      • Change the configuration in hiera
      • Commit it to git
      • Log in to Rundeck and execute the "Do a manually triggered Puppet/Ansible run" job
      • Then drink a coffee while you watch Rundeck doing its job and take care of the 10 or so servers that will fail for various other reasons.
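To give a rough idea of what such a rollout can look like (a minimal sketch, assuming Ansible with an inventory group named debian-servers and a prepared config snippet in files/ - all names are hypothetical), an ad-hoc run pushing the new blacklist file to all hosts in parallel:

user@lanadmin:~$ ansible debian-servers -f 50 -b -m copy \
    -a "src=files/51unattended-upgrades-blacklist dest=/etc/apt/apt.conf.d/51unattended-upgrades-blacklist owner=root group=root mode=0644"

-f 50 runs 50 hosts in parallel and -b escalates to root. With Puppet or Rundeck the idea is the same: change the file in one place and let the tooling fan it out.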

Understanding apt-cache policy

The next thing that will help you vastly in providing a smoothly running unattended-upgrades service is understanding and using the apt-cache tool, especially with the policy argument: apt-cache policy. Even more so if you use Debian repositories from other projects or vendors.

Why? unattended-upgrades comes with some default configured Origins-Pattern in /etc/apt/apt.conf.d/50unattended-upgrades. These cover the official Debian security updates as well as new point releases of Debian (12.5, 12.6, etc.), which incorporate all updates since the last point release. Non-security updates for installed packages between point releases won't be installed with the default configuration.

If you want these when they are ready, you have to uncomment the line for "origin=Debian,codename=${distro_codename}-updates";.

${distro_codename}-proposed-updates are updates which may not be stable yet. Read https://wiki.debian.org/StableProposedUpdates for the details. Personally I keep it disabled. Especially on production systems.

For Debian Bookworm the default Origins-Pattern are:

Unattended-Upgrade::Origins-Pattern {
[...]
//      "origin=Debian,codename=${distro_codename}-updates";
//      "origin=Debian,codename=${distro_codename}-proposed-updates";
        "origin=Debian,codename=${distro_codename},label=Debian";
        "origin=Debian,codename=${distro_codename},label=Debian-Security";
        "origin=Debian,codename=${distro_codename}-security,label=Debian-Security";
[...]
};

This means only packages matching these Origins-Pattern will be updated. Therefore you have to verify that these patterns include all packages you want to update and, additionally, that packages you do not wish to update are excluded. Although the use of the blacklist might be easier here, as this simply works on the name of the package.
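As a sketch of what such a blacklist can look like (the package names are purely illustrative), here is the relevant block in /etc/apt/apt.conf.d/50unattended-upgrades:

Unattended-Upgrade::Package-Blacklist {
        // Never update these automatically - they are handled in a manual maintenance window
        "corosync";
        "keepalived";
        // Entries are regular expressions matched against the package name,
        // so this matches every package starting with "percona-"
        "percona-.*";
};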

Also note that codename=${distro_codename} is always part of an Origins-Pattern. It is important to not just use terms like stable or oldstable as these do not identify a unique distribution. Those change whenever a new major Debian release happens. (This is also true if you are referencing OS images in things like Docker. debian:latest is most certainly a bad idea, debian:bookworm is better.)

Matching Origin-Patterns to apt-cache policy's output

How do you translate these Origins-Patterns into the lines apt-cache policy gives us?

Short side-note: apt-cache uses the metadata (repository information) obtained via apt-get update. Therefore it also works if the configured repositories are offline, but can also show outdated data if you haven't updated the package list information via apt-get update recently.

If executed you will see output like the following:

root@lanadmin:~# apt-cache policy
Package files:
 100 /var/lib/dpkg/status
     release a=now
 500 http://debian.tu-bs.de/debian bookworm-updates/non-free-firmware amd64 Packages
     release v=12-updates,o=Debian,a=stable-updates,n=bookworm-updates,l=Debian,c=non-free-firmware,b=amd64
     origin debian.tu-bs.de
 500 http://debian.tu-bs.de/debian bookworm-updates/main amd64 Packages
     release v=12-updates,o=Debian,a=stable-updates,n=bookworm-updates,l=Debian,c=main,b=amd64
     origin debian.tu-bs.de
 500 http://security.debian.org/debian-security bookworm-security/non-free-firmware amd64 Packages
     release v=12,o=Debian,a=stable-security,n=bookworm-security,l=Debian-Security,c=non-free-firmware,b=amd64
     origin security.debian.org
 500 http://security.debian.org/debian-security bookworm-security/main amd64 Packages
     release v=12,o=Debian,a=stable-security,n=bookworm-security,l=Debian-Security,c=main,b=amd64
     origin security.debian.org
 500 http://debian.tu-bs.de/debian bookworm/non-free-firmware amd64 Packages
     release v=12.5,o=Debian,a=stable,n=bookworm,l=Debian,c=non-free-firmware,b=amd64
     origin debian.tu-bs.de
 500 http://debian.tu-bs.de/debian bookworm/main amd64 Packages
     release v=12.5,o=Debian,a=stable,n=bookworm,l=Debian,c=main,b=amd64
     origin debian.tu-bs.de
Pinned packages:
root@lanadmin:~#

For the sake of simplicity, let us look at just this single entry:

 500 http://security.debian.org/debian-security bookworm-security/non-free-firmware amd64 Packages
     release v=12,o=Debian,a=stable-security,n=bookworm-security,l=Debian-Security,c=non-free-firmware,b=amd64
     origin security.debian.org

I won't go over each and every single listed value. If you want to know more: man apt_preferences probably has all the details and apt-cache policy packagename lists all policies for a single package.
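For example (version numbers and mirror purely illustrative), for a single package the output looks like this:

root@lanadmin:~# apt-cache policy openssl
openssl:
  Installed: 3.0.11-1~deb12u2
  Candidate: 3.0.11-1~deb12u2
  Version table:
 *** 3.0.11-1~deb12u2 500
        500 http://security.debian.org/debian-security bookworm-security/main amd64 Packages
        100 /var/lib/dpkg/status
     3.0.9-1 500
        500 http://debian.tu-bs.de/debian bookworm/main amd64 Packages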

We want to pay attention to the second line of the entry above, the one starting with release. Here we have the relevant values. These are defined in the Release file for each Debian release/repository. The documentation can be found here: https://wiki.debian.org/DebianRepository/Format#A.22Release.22_files

But what do they mean? man apt_preferences also explains them, but as they are relevant, let's make a short table.

Field | Alias | unattended-upgrades variable | Description
Version | v | N/A | Contains the release version
Origin | o | ${distro_id} | Originator of the packages. (Debian if used for packages from the Debian project. If you have commercial packages, most likely the company or product name will show up here.)
Archive or Suite | a | N/A | Names the archive to which all the packages in the directory tree belong (on the repository server)
Codename | n | ${distro_codename} | Codename of the Debian release. In our case: bookworm
Label | l | N/A | Names the label of the packages in the directory tree of the Release file.
Component | c | N/A | The licensing component associated with the packages in the directory tree. You may know this as: main, contrib, non-free-firmware, etc.
Architecture | b | N/A | The processor architecture for which a package is compiled. (amd64, i386, arm, etc.)
Site | N/A | N/A | FQDN of the APT repository. Useful when fields like Archive/Suite/Label are empty or ambiguous. (Example with component: "site=www.example.com,component=main";)

All this information is usually listed in the first 30 lines of /etc/apt/apt.conf.d/50unattended-upgrades. Therefore: Read it to ensure you get the currently valid information.

Understanding this makes it easy to match the configured Origin-Patterns to your configured Debian repositories.

If you are in doubt: Browse your Debian repository via HTTP/HTTPS and have a look at the Release file, for example: http://debian.tu-bs.de/debian/dists/bookworm/Release. The first 5 lines are:

Origin: Debian
Label: Debian
Suite: stable
Version: 12.5
Codename: bookworm

Filling in the variables we see that only the following Origin-Pattern matches:

"origin=Debian,codename=${distro_codename},label=Debian";

All others either have a different label and/or codename.

Looking at http://debian.tu-bs.de/debian/dists/bookworm-updates/Release we can see that this matches the following commented out Origin-Pattern:

"origin=Debian,codename=${distro_codename}-updates";

This means:

http://debian.tu-bs.de/debian/dists/bookworm/ holds all packages for the current point release of Debian Bookworm.
http://debian.tu-bs.de/debian/dists/bookworm-updates/ holds all published updates which come out before the next point release is made. That is the reason why I always enable this repository too. But depending on your operating strategy, only installing updates whenever a new point release is published is also fine.

Security updates are always published via the http://security.debian.org/debian-security/ repository and will be installed when available. 
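If you want to double-check which origins your configuration actually allows, a dry run of unattended-upgrades shows them (output heavily trimmed; the exact wording differs slightly between versions):

root@lanadmin:~# unattended-upgrade --dry-run --debug
Starting unattended upgrades script
Allowed origins are: origin=Debian,codename=bookworm,label=Debian, origin=Debian,codename=bookworm,label=Debian-Security, origin=Debian,codename=bookworm-security,label=Debian-Security
Initial blacklist:
Initial whitelist (not strict):
No packages found that can be upgraded unattended and no pending auto-removals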

Key points / Story time

What you should have understood is:

  1. Each repository must be uniquely identifiable.
    • This means: Each Origins-Pattern should only match one of all your configured repositories
    • Of course you can have multiple repositories matching the same Origins-Pattern, but keep the possible implications in mind!
  2. Packages that share the same name must have the same content
    • And by content I mean: Their hashsums must be identical for each given version

If they don't, you are potentially in for a wild ride.

At a former company we used many internal repositories. And this was fine for a long time. Suddenly some developer started pushing Debian packages with the exact same name as official Debian packages to those internal repositories. We immediately complained, laying out how that could wreak havoc on our infrastructure, as we had to use those repositories and at the same time couldn't blacklist those packages, since blacklisting works solely on the name of a package.

You can't do things like: "Blacklist package test-foo, but only if it's coming from the repository on repo.coolhost.tld"

We urged him to simply rename those packages or upload them to another repository - as those packages had to share the same name as they contained a not-yet included fix for a bug the company encountered. He wouldn't as he saw no problem with his approach. "Just don't use them." (Yeah thanks.. That's not how it works with automation.. Especially not if you - sort of - hijack repositories which are used for something entirely different..)

The workaround we implemented was to utilize the APT priority for each repository, so that packages from our internal mirror of the official Debian repositories took priority. That worked, but it was still annoying.
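For reference, such a priority pin (the hostname is purely illustrative) is just a small file under /etc/apt/preferences.d/:

user@host:~$ cat /etc/apt/preferences.d/prefer-internal-mirror
Explanation: Prefer packages from our internal Debian mirror over all other repositories
Package: *
Pin: origin "mirror.internal.example.com"
Pin-Priority: 900

A priority of 900 wins against the default of 500, but stays below 1000, so APT still won't downgrade already installed packages.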

Some weeks later those packages caused an incident and in the root cause analysis the problem was identified and those packages were moved to a separate repository.

Lesson learned. Care for your repositories.

And this ends the first part of this post. The next part will focus on the practical side. We will look at the Systemd Unit- and Timer-File, how we can add new repositories to unattended-upgrades, for example to also upgrade our Proxmox installation using the Proxmox Debian repositories, how to blacklist packages and more.


Creating a systemd timer to regularly pull Git-Repositories and getting to know an uncomfortable systemd/journalctl bug along the way

Photo by Christina Morillo: https://www.pexels.com/photo/white-dry-erase-board-with-red-diagram-1181311/

I have a VM in my LAN which serves as a central admin host. There I wanted to create a systemd unit and timer to automatically update my Git repositories. The reason is that it happens too often that I push some changes from other machines but forget to pull the repos before I commit my changes done on the admin host.

Sure, stash the changes, pull, fix any conflicts. No big deal. But still annoying. Therefore: a systemd unit file with a timer that updates the repos every hour and be done with it.

Encountering the bug

While reading the systemd documentation I learned that you can list multiple ExecStart parameters if the service is of Type=oneshot. Then all commands will be executed in sequential order.

Unless Type= is oneshot, exactly one command must be given. When Type=oneshot is used, zero or more commands may be specified. [...] If more than one command is specified, the commands are invoked sequentially in the order they appear in the unit file. If one of the commands fails (and is not prefixed with "-"), other lines are not executed, and the unit is considered failed.

From: https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#ExecStart=

This seemed to be the easiest way to achieve my goal. Just add one ExecStart= line for every Git repo, done. For testing I didn't write the timer file yet, as I first wanted to verify that the unit file works. And the following unit file works flawlessly.

user@lanadmin:~$ systemctl --user cat git-update
# /home/user/.config/systemd/user/git-update.service
[Unit]
Description=Update git-Repositories
After=network-online.target
Wants=network-online.target

[Service]
# Allows the execution of multiple ExecStart parameters in sequential order
Type=oneshot
# Show status "dead" after commands are executed (this is just commands being run)
RemainAfterExit=no
# git pull = git fetch + git merge
ExecStart=/usr/bin/git -C %h/git/github/chrlau/dotfiles pull
ExecStart=/usr/bin/git -C %h/git/github/chrlau/scripts pull

[Install]
WantedBy=default.target

However, while initially writing it and looking at the output of systemctl --user status git-update I noticed something.

user@lanadmin:~$ systemctl --user status git-update
○ git-update.service - Update git-Repositories
     Loaded: loaded (/home/user/.config/systemd/user/git-update.service; enabled; preset: enabled)
     Active: inactive (dead) since Fri 2024-04-26 22:16:21 CEST; 15s ago
    Process: 39510 ExecStart=/usr/bin/git -C /home/user/git/github/chrlau/dotfiles pull (code=exited, status=0/SUCCESS)
    Process: 39516 ExecStart=/usr/bin/git -C /home/user/git/github/chrlau/scripts pull (code=exited, status=0/SUCCESS)
   Main PID: 39516 (code=exited, status=0/SUCCESS)
        CPU: 40ms

Apr 26 22:16:18 lanadmin systemd[9234]: Starting git-update.service - Update git-Repositories...
Apr 26 22:16:21 lanadmin git[39521]: Already up to date.
Apr 26 22:16:21 lanadmin systemd[9234]: Finished git-update.service - Update git-Repositories.

There should be two log lines with git[pid]: Already up to date. After all we call git two times. But there is only one line. Why!?

At first I considered something like rate-limiting or the de-duplication of identical log messages. But I found nothing. Only some old Red Hat bug reports from 2013 & 2017 about how journald can't always catch the necessary process information (cgroup, etc.) from /proc before the process is gone (read: Bug 963620 - journald: we need a way to get audit, cgroup, ... information attached to log messages instead of asynchronously reading them in and Bug 1426152 - Journalctl miss to show logs from unit). Especially with short-running processes this occurs regularly. This can't be the reason, can it?

I checked the journal for the unit file. Line still missing.

user@lanadmin:~$ journalctl --user -u git-update
Apr 26 22:16:18 lanadmin systemd[9234]: Starting git-update.service - Update git-Repositories...
Apr 26 22:16:21 lanadmin git[39521]: Already up to date.
Apr 26 22:16:21 lanadmin systemd[9234]: Finished git-update.service - Update git-Repositories.

Then I accidentally executed journalctl without any parameters and...

user@lanadmin:~$ journalctl
[...]
Apr 26 22:16:18 lanadmin systemd[9234]: Starting git-update.service - Update git-Repositories...
Apr 26 22:16:20 lanadmin git[39515]: Already up to date.
Apr 26 22:16:21 lanadmin git[39521]: Already up to date.
Apr 26 22:16:21 lanadmin systemd[9234]: Finished git-update.service - Update git-Repositories.
[...]

There it is. So why does a simple journalctl display both lines, while a systemctl --user status git-update doesn't?

Remember the bug report we just read about? journalctl has a verbose mode. This displays all fields for every log line. This should tell us the difference between those two log messages.

First we have the entry for the Starting git-update.service - Update git-Repositories message. Nothing suspicious here.

user@lanadmin:~$ journalctl -o verbose
Fri 2024-04-26 22:16:18.724396 CEST [s=78cb2a728dda4d579b41ba58b655d4c2;i=6a32;b=859f56a381394260854aeac3b77d87a3;m=1a58bdb7ae0;t=6170593979632;x=ed56438f3a913535]
    PRIORITY=6
    SYSLOG_FACILITY=3
    TID=9234
    SYSLOG_IDENTIFIER=systemd
    _TRANSPORT=journal
    _PID=9234
    _UID=1000
    _GID=1000
    _COMM=systemd
    _EXE=/usr/lib/systemd/systemd
    _CMDLINE=/lib/systemd/systemd --user
    _CAP_EFFECTIVE=0
    _SELINUX_CONTEXT=unconfined
    _AUDIT_SESSION=393
    _AUDIT_LOGINUID=1000
    _SYSTEMD_CGROUP=/user.slice/user-1000.slice/user@1000.service/init.scope
    _SYSTEMD_OWNER_UID=1000
    _SYSTEMD_UNIT=user@1000.service
    _SYSTEMD_USER_UNIT=init.scope
    _SYSTEMD_SLICE=user-1000.slice
    _SYSTEMD_USER_SLICE=-.slice
    _BOOT_ID=859f56a381394260854aeac3b77d87a3
    _MACHINE_ID=e83bb1062b594b79817a5c8a5605f9fd
    _HOSTNAME=lanadmin
    _RUNTIME_SCOPE=system
    CODE_FILE=src/core/job.c
    JOB_TYPE=start
    CODE_LINE=581
    CODE_FUNC=job_emit_start_message
    MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5
    MESSAGE=Starting git-update.service - Update git-Repositories...
    JOB_ID=10
    USER_INVOCATION_ID=8f476f2ef43245ba89a9cb69a26f8577
    USER_UNIT=git-update.service
    _SOURCE_REALTIME_TIMESTAMP=1714162578724396

Then comes the entry for the first Already up to date. log message. And its entry is way shorter than the previous one. No systemd-related fields are associated with it.

Fri 2024-04-26 22:16:20.137988 CEST [s=78cb2a728dda4d579b41ba58b655d4c2;i=6a33;b=859f56a381394260854aeac3b77d87a3;m=1a58bf10cb2;t=6170593ad2804;x=730951cbebf4e84a]
    PRIORITY=6
    SYSLOG_FACILITY=3
    _UID=1000
    _GID=1000
    _BOOT_ID=859f56a381394260854aeac3b77d87a3
    _MACHINE_ID=e83bb1062b594b79817a5c8a5605f9fd
    _HOSTNAME=lanadmin
    _RUNTIME_SCOPE=system
    _TRANSPORT=stdout
    _STREAM_ID=36f2542db1e249da8c5c5b1342d065e8
    SYSLOG_IDENTIFIER=git
    MESSAGE=Already up to date.
    _PID=39515
    _COMM=git

And yep, here is the second Already up to date. log message. It contains all fields, and this is the message we see when we display the journal entries for our git-update.service unit.

Fri 2024-04-26 22:16:21.471040 CEST [s=78cb2a728dda4d579b41ba58b655d4c2;i=6a34;b=859f56a381394260854aeac3b77d87a3;m=1a58c0563ee;t=6170593c17f40;x=2574d8467f36d20]
    PRIORITY=6
    SYSLOG_FACILITY=3
    _UID=1000
    _GID=1000
    _CAP_EFFECTIVE=0
    _SELINUX_CONTEXT=unconfined
    _AUDIT_SESSION=393
    _AUDIT_LOGINUID=1000
    _SYSTEMD_OWNER_UID=1000
    _SYSTEMD_UNIT=user@1000.service
    _SYSTEMD_SLICE=user-1000.slice
    _BOOT_ID=859f56a381394260854aeac3b77d87a3
    _MACHINE_ID=e83bb1062b594b79817a5c8a5605f9fd
    _HOSTNAME=lanadmin
    _RUNTIME_SCOPE=system
    _TRANSPORT=stdout
    SYSLOG_IDENTIFIER=git
    MESSAGE=Already up to date.
    _COMM=git
    _STREAM_ID=cfc8932e3cf9431aa59873d163d624a8
    _PID=39521
    _SYSTEMD_CGROUP=/user.slice/user-1000.slice/user@1000.service/app.slice/git-update.service
    _SYSTEMD_USER_UNIT=git-update.service
    _SYSTEMD_USER_SLICE=app.slice
    _SYSTEMD_INVOCATION_ID=8f476f2ef43245ba89a9cb69a26f8577

Great. So how do I fix this? Yeah, I can't. Unless I can make the git process run longer there is no real solution. I tried adding ExecStart=/usr/bin/sleep 1 after each git command, but that of course didn't change anything, as sleep is a different process.

Now I'm left with the following situation: Sometimes both log entries are logged correctly with all fields. Sometimes just one (either the first or the second). And rarely none is logged at all. Then all I have are the standard Starting git-update.service - Update git-Repositories... and Finished git-update.service - Update git-Repositories. log messages which systemd emits when a unit is started and when it finishes.

Beautiful. Just beautiful. I mean.. The syslog facility, identifier and priority are logged each time. So yeah, that's actually a reason for good old rsyslog.

A somewhat of a solution?

The best advice I can currently give is: If you have short-lived processes started via systemd and it's important that you can easily view all their log messages:

  1. Make sure ForwardToSyslog=yes is set in /etc/systemd/journald.conf. Note that the default values are usually listed as comments. So if the line #ForwardToSyslog=yes is present, you should be fine
  2. Install rsyslog or any other traditional syslog service
  3. Configure it to store your log messages in a separate logfile or let them go to /var/log/messages (a minimal rsyslog sketch follows after this list)
  4. Don't forget to configure logrotate (or some other sort of logfile rotating) for all logfiles created by rsyslog 😉
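A minimal rsyslog rule for my git-update example (filename and path purely illustrative) could look like this - everything logged under the program name git goes into its own file and is not processed any further:

user@lanadmin:~$ cat /etc/rsyslog.d/30-git-update.conf
# Write everything logged under the program name "git" into a separate logfile
:programname, isequal, "git"    /var/log/git-update.log
# ...and stop processing these messages afterwards
& stop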

I just learned to always execute a plain journalctl during troubleshooting sessions just to make sure that I spot messages from short running processes.

And what about the timer?

This is the timer file I use. It runs once every hour.

user@lanadmin:~$ systemctl --user cat git-update.timer
# /home/user/.config/systemd/user/git-update.timer
[Unit]
Description=Update git-repositories every hour

[Timer]
# Documentation: https://www.freedesktop.org/software/systemd/man/latest/systemd.time.html#Calendar%20Events
OnCalendar=*-*-* *:00:00
Unit=git-update.service

[Install]
WantedBy=default.target
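If you are ever unsure whether an OnCalendar= expression means what you think it does, systemd-analyze calendar evaluates it and shows the next elapse time (timestamps obviously depend on when you run it):

user@lanadmin:~$ systemd-analyze calendar "*-*-* *:00:00"
  Original form: *-*-* *:00:00
Normalized form: *-*-* *:00:00
    Next elapse: Sun 2024-04-28 16:00:00 CEST
       (in UTC): Sun 2024-04-28 14:00:00 UTC
       From now: 49min left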

After creating the file you need to enable and start it.

user@lanadmin:~$ systemctl --user enable git-update.timer
Created symlink /home/user/.config/systemd/user/default.target.wants/git-update.timer → /home/user/.config/systemd/user/git-update.timer.

user@lanadmin:~$ systemctl --user start git-update.timer

Using systemctl --user list-timers we can verify that the timer is scheduled to run.

user@lanadmin:~$ systemctl --user list-timers
NEXT                         LEFT       LAST PASSED UNIT                   ACTIVATES
Sun 2024-04-28 16:00:00 CEST 49min left -    -      git-update.timer git-update.service

1 timers listed.
Pass --all to see loaded but inactive timers, too.

Icinga2 Monitoring notifications via Telegram Bot (Part 1)

Photo by Kindel Media: https://www.pexels.com/photo/low-angle-shot-of-robot-8566526/

One thing I wanted to set up for a long time was to get my Icinga2 notifications via some of the Instant Messaging apps I have on my mobile. So there was Threema, Telegram and Whatsapp to choose from.

Well.. Threema wants money for this kind of service, and WhatsApp requires a business account which must be connected with the bot. And WhatsApp Business means I have to pay again? - I don't know - I didn't pursue that path any further, as it either meant I would've needed to convert my private account into a business account or get a second account. No, sorry. Not something I want.

Telegram on the other hand? "Yeah, well, message the @botfather account, type /start, type /newbot, set a display and username, get your Bot-Token (for API usage) and that's it. The only requirement we have? The username must end in bot." From here it was an easy decision which app I chose. (Telegram documentation here.)

Creating your telegram bot

  1. Search for the @botfather account on Telegram; doesn't matter if you use the mobile app or do it via Telegram Web.
  2. Type /start and a help message will be displayed.
  3. To create a new bot, type: /newbot
  4. Via the question "Alright, a new bot. How are we going to call it? Please choose a name for your bot." you are asked for the display name of your bot.
    • I chose something generic like "Icinga Monitoring Notifications".
  5. Likewise the question "Good. Now let's choose a username for your bot. It must end in `bot`. Like this, for example: TetrisBot or tetris_bot." asks for the username.
    • Choose whatever you like.
  6. If the username is already taken Telegram will state this and simply ask for a new username until you find one which is available.
  7. In the final message you will get your token to access the HTTP API. Note this down and save it in your password manager. We will need this later for Icinga.
  8. To test everything send a message to your bot in Telegram

That's the Telegram part. Pretty easy, right?

Testing our bot from the command line

We are now able to receive (via /getUpdates) and send messages (via /sendMessage) from/to our bot. Define the token as a shell variable and execute the following curl command to get the message that was sent to your bot. Note: Only new messages are retrieved. If you have already viewed them in Telegram Web the response will be empty, as seen in the first executed curl command.

Just close Telegram Web and the app on your phone, send a new message to your bot and fetch it via curl. This should do the trick. Later we will define our API token as a constant in the Icinga2 configuration.

For better readability I pipe the output through jq.

When there is a new message from your Telegram account to your bot, you will see a field named id. Note this number down. This is the Chat-ID of your account and we need it so that your bot can actually send you messages.

Relevant documentation links are:

user@host:~$ TOKEN="YOUR-TOKEN"
user@host:~$ curl --silent "https://api.telegram.org/bot${TOKEN}/getUpdates" | jq
{
  "ok": true,
  "result": []
}
user@host:~$ curl --silent "https://api.telegram.org/bot${TOKEN}/getUpdates" | jq
{
  "ok": true,
  "result": [
    {
      "update_id": NUMBER,
      "message": {
        "message_id": 3,
        "from": {
          "id": CHAT-ID,
          "is_bot": false,
          "first_name": "John Doe Example",
          "username": "JohnDoeExample",
          "language_code": "de"
        },
        "chat": {
          "id": CHAT-ID,
          "first_name": "John Doe Example",
          "username": "JohnDoeExample",
          "type": "private"
        },
        "date": 1694637798,
        "text": "This is a test message"
      }
    }
  ]
}
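To test the other direction - the bot sending a message to you - use the /sendMessage endpoint with the Chat-ID we just noted down (output shortened):

user@host:~$ curl --silent "https://api.telegram.org/bot${TOKEN}/sendMessage" \
    --data-urlencode "chat_id=CHAT-ID" \
    --data-urlencode "text=Test message from the shell" | jq
{
  "ok": true,
  "result": { ... }
}

If "ok" is true and the message shows up on your phone, the bot token and Chat-ID are working.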

Configuring Icinga2

Now we need to integrate our bot into the Icinga2 notification process. Luckily there were many people before us doing this, so there are already some notification scripts and example configuration files on GitHub.

I chose the scripts found here: https://github.com/lazyfrosch/icinga2-telegram

As I use the distributed monitoring I store some configuration files beneath /etc/icinga2/zones.d/. If you don't use this, feel free to store those files somewhere else. However, as I define the token in /etc/icinga2/constants.conf, which isn't synced via the config file sync, I have to make sure that the notification configuration is also stored outside of /etc/icinga2/zones.d/. Otherwise the distributed setup will fail, as the config file sync throws a syntax error on all other machines due to the missing TelegramBotToken constant.

First we define the API-Token in the /etc/icinga2/constants.conf file:

user@host:/etc/icinga2$ grep -B1 TelegramBotToken constants.conf
/* Telegram Bot Token */
const TelegramBotToken = "YOUR-TOKEN-HERE"

Afterwards we download the host and service notification scripts into /etc/icinga2/scripts and set the executable bit.

user@host:/etc/icinga2/scripts$ wget https://raw.githubusercontent.com/lazyfrosch/icinga2-telegram/master/telegram-host-notification.sh
user@host:/etc/icinga2/scripts$ wget https://raw.githubusercontent.com/lazyfrosch/icinga2-telegram/master/telegram-service-notification.sh
user@host:/etc/icinga2/scripts$ chmod +x telegram-host-notification.sh telegram-service-notification.sh

Based on the notifications we want to receive, we need to define the variable vars.telegram_chat_id in the appropriate user/group object(s). An example for the icingaadmin is shown below and can be found in the icinga2-example.conf on GitHub: https://github.com/lazyfrosch/icinga2-telegram/blob/master/icinga2-example.conf along with the notification commands which we are setting up after this.

user@host:~$ cat /etc/icinga2/zones.d/global-templates/users.conf
object User "icingaadmin" {
  import "generic-user"

  display_name = "Icinga 2 Admin"
  groups = [ "icingaadmins" ]

  email = "root@localhost"
  vars.telegram_chat_id = "YOUR-CHAT-ID-HERE"
}

Notifications for Host & Service

We need to define 2 new NotificationCommand objects which trigger the telegram-(host|service)-notification.sh scripts. These are stored in /etc/icinga2/conf.d/telegrambot-commands.conf.

Note: We store the NotificationCommands and Notifications in /etc/icinga2/conf.d and NOT in /etc/icinga2/zones.d/master. This is because I have only one master in my setup which sends out notifications, and because we defined the constant TelegramBotToken in /etc/icinga2/constants.conf - which is not synced via the zone config sync. Therefore we would run into a syntax error on all Icinga2 agents.

See https://admin.brennt.net/icinga2-error-check-command-does-not-exist-because-of-missing-constant for details.

Of course we could also define the constant in a file under /etc/icinga2/zones.d/master but I chose not to do so for security reasons.

user@host:/etc/icinga2$ cat /etc/icinga2/conf.d/telegrambot-commands.conf
/*
 * Notification Commands for Telegram Bot
 */
object NotificationCommand "telegram-host-notification" {
  import "plugin-notification-command"

  command = [ SysconfDir + "/icinga2/scripts/telegram-host-notification.sh" ]

  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    HOSTNAME = "$host.name$"
    HOSTALIAS = "$host.display_name$"
    HOSTADDRESS = "$address$"
    HOSTSTATE = "$host.state$"
    LONGDATETIME = "$icinga.long_date_time$"
    HOSTOUTPUT = "$host.output$"
    NOTIFICATIONAUTHORNAME = "$notification.author$"
    NOTIFICATIONCOMMENT = "$notification.comment$"
    HOSTDISPLAYNAME = "$host.display_name$"
    TELEGRAM_BOT_TOKEN = TelegramBotToken
    TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"

    // optional
    ICINGAWEB2_URL = "https://host.domain.tld/icingaweb2"
  }
}

object NotificationCommand "telegram-service-notification" {
  import "plugin-notification-command"

  command = [ SysconfDir + "/icinga2/scripts/telegram-service-notification.sh" ]

  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    SERVICEDESC = "$service.name$"
    HOSTNAME = "$host.name$"
    HOSTALIAS = "$host.display_name$"
    HOSTADDRESS = "$address$"
    SERVICESTATE = "$service.state$"
    LONGDATETIME = "$icinga.long_date_time$"
    SERVICEOUTPUT = "$service.output$"
    NOTIFICATIONAUTHORNAME = "$notification.author$"
    NOTIFICATIONCOMMENT = "$notification.comment$"
    HOSTDISPLAYNAME = "$host.display_name$"
    SERVICEDISPLAYNAME = "$service.display_name$"
    TELEGRAM_BOT_TOKEN = TelegramBotToken
    TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"

    // optional
    ICINGAWEB2_URL = "https://host.domain.tld/icingaweb2"
  }
}

As I want to get all notifications for all hosts and services, I simply apply the notification object to all hosts and services which have host.name set - same as in the example.

user@host:/etc/icinga2$ cat /etc/icinga2/conf.d/telegrambot-notifications.conf
/*
 * Notifications for alerting via Telegram Bot
 */
apply Notification "telegram-icingaadmin" to Host {
  import "mail-host-notification"
  command = "telegram-host-notification"

  users = [ "icingaadmin" ]

  assign where host.name
}

apply Notification "telegram-icingaadmin" to Service {
  import "mail-service-notification"
  command = "telegram-service-notification"

  users = [ "icingaadmin" ]

  assign where host.name
}

Checking configuration

Now we check if our Icinga2 config has no errors and reload the service:

root@host:~ # icinga2 daemon -C
[2023-09-14 22:19:37 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2023-09-14 22:19:37 +0200] information/cli: Loading configuration file(s).
[2023-09-14 22:19:37 +0200] information/ConfigItem: Committing config item(s).
[2023-09-14 22:19:37 +0200] information/ApiListener: My API identity: hostname.domain.tld
[2023-09-14 22:19:37 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[...]
[2023-09-14 22:19:37 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2023-09-14 22:19:37 +0200] information/cli: Finished validating the configuration file(s).
root@host:~ # systemctl reload icinga2.service

Verify it works

Log into your Icingaweb2 frontend, click the Notification link for a host or service and trigger a custom notification.

And we have a message in Telegram:

What about CheckMK?

If you use CheckMK see this blogpost: https://www.srcbox.net/posts/monitoring-notifications-via-telegram/

Lessons learned

And if you use this setup you will encounter one situation/question rather quickly: Do I really need to be woken up at 3am for every notification?

No. No you don't want to. Not in a professional context and even less in a private context.

Therefore: In part 2 we will register a second bot account and then use the two bots to differentiate between important and unimportant notifications. The unimportant ones will be muted in the Telegram client on our smartphone. The important ones won't be muted, and are therefore able to wake us at 3am.


Snap packages and SSL-certificates.. A neverending story

Photo by Magda Ehlers: https://www.pexels.com/photo/a-broken-bicycle-leaning-on-the-wall-5342343/

Alternative title: Why I hate developers who reinvent the wheel - but forget the spokes.

This is more of a rant and link-dump article. I didn't research everything in detail as I generally avoid Snaps. So far they've caused me more troubles than benefits.. But I wanted a place to keep all my arguments and links to certain documentation sites in one place.. So here we go.

Snap is, compared to packaging formats like RPM and DEB, relatively new. Maybe this explains why it still has some teething problems. The most annoying one for me is: Snap packages don't use your custom SSL certificates, stored in /usr/local/share/ca-certificates. Which is the default place to store your company's Root-CA certificates. Every browser respects this. There is tooling in every Linux distribution to take care of stored certificates and adding them to the truststore. There is tooling to automatically add these certificates to any Java (or other language) truststore that might reside on your system.

But no, mirroring that behaviour in Snap would be too easy, right? After all.. why just reinvent the wheel when you can have more fun by forgetting the spokes?
Yes, I am aware that centreless/hubless wheels do exist. ;-)

The root cause is often: Snap confinement (or to be precise: the strict confinement mode). This means a snap is separated from the system and only has access to directories which are configured at build time of the snap, as stated in the Interface Management documentation (see also: https://snapcraft.io/docs/home-interface and https://snapcraft.io/docs/supported-interfaces). And as desirable and good as this is, reality teaches us that for every application you always need a way to modify and configure it. At least certain aspects. With Snap.. not so much?
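You can check which interfaces a given snap is actually allowed to use with snap connections (output shortened, the exact list depends on the snap):

user@host:~$ snap connections firefox
Interface       Plug              Slot      Notes
home            firefox:home      :home     -
network         firefox:network   :network  -
[...]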

In the latest Ubuntu releases Firefox and Chromium have been migrated exclusively to Snap. You cannot install Firefox via apt and get a normal install - you will get a snap package. Which means: goodbye SSL client cert authentication, goodbye internal company SSL certificates. Bug 1901586: [snap] CA Certificates from /usr/local/share/ca-certificates are not used has all the details.

Yes, clever people might add: But you can do a bind mount and mount that directory under /home where nearly every Snap application has access. But why do I need to do this? Why does Snap impose that burden on me, the user? And this doesn't fix the SSL issue..

Oh, I should copy them to /etc/ssl/certs? And what about the other troubles this might cause? .. Hello? Nothing? Oh, okay..

And it keeps getting better: If you install the video player VLC as a snap package and your files are not located under /home.. you can't access them. Snap developers will happily advise you to move your files, change the mountpoint or do a mount --bind under /home. Which can be seen here: Bug 1643706: snap apps need to be able to browse outside of user $HOME dir. for Desktop installs
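For completeness, the suggested workaround (paths purely illustrative) boils down to this - which is exactly the kind of extra hoop I am complaining about:

user@host:~$ mkdir -p /home/user/media
user@host:~$ sudo mount --bind /srv/media /home/user/media
user@host:~$ echo '/srv/media /home/user/media none bind 0 0' | sudo tee -a /etc/fstab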

And this is the point I want to make. It's OK to design a new packaging format for applications. It's okay to add security mechanisms like snap confinement. But designing them in a way that forces each and every user to manage/store/organize files the way Snap dictates? Nope, sorry.

Someone on the Internet wrote that Snap is a great idea for things like phones, PCs in public space (libraries, schools, etc.) or other confined environments where the user doesn't and shouldn't be allowed to configure many aspects. Where a system is designed and offered for only some specific purposes which are more or less static and change seldom.

And I agree. For me the root cause of all my problems regarding Snap is: It isn't the only daemon on my system. It HAS to integrate into the existing system and processes, like update-ca-certificates or even where in the filesystem I store my files. This was all there before Snap existed, and now applications which always worked won't work anymore because someone thought it is better that way.. No, sorry. Like I said before: It HAS to integrate into the existing system. If it doesn't.. it might still have its use cases. I'm not arguing against that. But then please let me have a choice! Breaking existing workflows, file structures, etc. - that is not acceptable for me. And as the Internet shows, also not for many other users.

The sad part is that the decision to make Firefox only available as a Snap was made by Mozilla & Canonical. Therefore you can't download the .deb from the Mozilla webpage or the like. (Luckily there is still a PPA offered by volunteers which builds Firefox as a .deb package. This means it can potentially vanish if there is no one left doing it. But that's more or less the risk with any piece of software/technology.)

The announcement is on their Discourse: https://discourse.ubuntu.com/t/feature-freeze-exception-seeding-the-official-firefox-snap-in-ubuntu-desktop/24210/1

Oh, and Snaps tend to auto-upgrade, even without unattended-upgrades configured. Which can be a problem if you require a specific version. Luckily snap refresh --hold can hold updates for all Snap packages. Or, if you just want it for a specific package/time, use snap refresh --hold=72h vlc. This post has more information: Hold your horses, I mean snaps! New feature lets you stop snap updates, for as long as you need (snapcraft.io)

Security considerations

Snap was invented by Canonical. The Snap Store is run by Canonical. Snap packages are slowly replacing more and more .deb packages in Ubuntu: when you type apt-get install packagename you will automatically get a snap package if one exists.

Or when you try to execute a command which isn't present on your system, the command-not-found helper from Ubuntu will recommend the associated .deb or Snap package.

The problem? Aquasec discovered that many traditional programs are not available as a Snap, but Canonical still allows you to register Snaps with the exact same name, despite providing an entirely different program. (aquasec.com)

Additionally you can specify aliases or register Snaps for the literal filenames. The example shows how, when you try to execute the command tarquingui, command-not-found recommends installing the tarquin Snap, which provides the tarquingui program. But they were able to additionally register a Snap with the name tarquingui. What happens? command-not-found now recommends both Snaps.

My personal opinion is that it is a dangerous and dumb oversight to not reserve the name of every APT package in the Snap Store, preventing the hijacking of well-established software packages by potentially malicious third parties. After all, it was Canonical who introduced Snap and it is Canonical who solely operates the Snap Store...

How to install Firefox as .deb package (Ubuntu 23.04)

Update: I recently learned of a bug in unattended-upgrades which ignores apt-pinnings if the origin is not listed in the allowed-origins: #2033646 unattended-upgrade ignores apt-pinning to not-allowed origins

To prevent this bug, use the following workaround:

root@host:~# echo 'Unattended-Upgrade::Allowed-Origins:: "LP-PPA-mozillateam:${distro_codename}";' | sudo tee /etc/apt/apt.conf.d/51unattended-upgrades-firefox
  1. Remove the Firefox snap (omit if not present)
    • snap remove firefox
    • If dpkg -l firefox still shows it as installed, execute:
      • apt-get remove --purge firefox
  2. Add the Mozilla PPA
    • add-apt-repository ppa:mozillateam/ppa
      • If the command add-apt-repository is missing, install the package software-properties-common
  3. Pin the Mozilla PPA-Repository with higher priority so packages are installed out of this PPA instead of other repositories where they might be available too
    • echo -e "Package: *\nPin: release o=LP-PPA-mozillateam\nPin-Priority: 1001" | sudo tee /etc/apt/preferences.d/mozilla-firefox
  4. Update repository information
    • apt-get update
  5. Now check that your APT-Package Policies are correct
    • apt-cache policy firefox
      • If set up correctly, Candidate: will list the version from the PPA instead of the Snap transition package. See the example below.
      • root@ubuntu:/# apt-cache policy firefox
        firefox:
          Installed: (none)
          Candidate: 1:1snap1-0ubuntu3
          Version table:
             1:1snap1-0ubuntu3 500
                500 http://archive.ubuntu.com/ubuntu lunar/main amd64 Packages
             120.0.1+build1-0ubuntu0.23.04.1~mt1 500
                500 https://ppa.launchpadcontent.net/mozillateam/ppa/ubuntu lunar/main amd64 Packages
        
        root@ubuntu:/# echo -e "Package: *\nPin: release o=LP-PPA-mozillateam\nPin-Priority: 1001" | tee /etc/apt/preferences.d/mozilla-firefox
        Package: *
        Pin: release o=LP-PPA-mozillateam
        Pin-Priority: 1001
        
        root@ubuntu:/# apt-cache policy firefox
        firefox:
          Installed: (none)
          Candidate: 120.0.1+build1-0ubuntu0.23.04.1~mt1
          Version table:
             1:1snap1-0ubuntu3 500
                500 http://archive.ubuntu.com/ubuntu lunar/main amd64 Packages
             120.0.1+build1-0ubuntu0.23.04.1~mt1 1001
               1001 https://ppa.launchpadcontent.net/mozillateam/ppa/ubuntu lunar/main amd64 Packages
  6. Install Firefox. You should see that https://ppa.launchpadcontent.net/mozillateam/ppa/ubuntu is being used to download Firefox. If not search for errors.
      • apt-get install firefox