Feuerfest

Just the private blog of a Linux sysadmin

How to stop logging HTTP-Requets from check_http in Apache 2

Photo by Sena: https://www.pexels.com/photo/wood-covered-in-snow-10944259/

I encountered a situation where the HTTP(S)-Requests done by Icinga via check_http checkplugin seriously messed up the generated website traffic reports. The software used to generate said reports wasn't capable of filtering out certain requests based on User-Agents. Restructuring the report in a way that the hits from the monitoring requests could be easily identified was also out of the question as this would generate too much follow-up work for other involved parties & systems.

The solution (or rather: workaround...) was to simply omit the logging of said monitoring requests to the logfile of the Apache vHost.

The relevant line in the virtual host config was the following:

CustomLog /var/log/apache2/host.domain-tld-access.log combined

This defines where to store the logfile in the "NCSA extended/combined log format" as specified on https://httpd.apache.org/docs/2.4/mod/mod_log_config.html#formats respectively in the /etc/apache2/apache2.conf in Debian.

root@host:~ # grep LogFormat /etc/apache2/apache2.conf
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

SetEnvIf to the rescue!

Apache2 allows us to set environment variables for each request via the SetEnvIf directive (among others). This is documented on the page for mod_setenvif: https://httpd.apache.org/docs/2.4/mod/mod_setenvif.html#setenvif

A typical line in the access.log would log like the following:

256.133.7.42 - - [30/Dec/2024:05:46:18 +0100] "GET / HTTP/1.0" 302 390 "-" "check_http/v2.3.1 (monitoring-plugins 2.3.1)

Since the requests were for the URI / we can't exclude requests based on Request_URI. Instead, we have to go for the supplied user-agent. Fortunately this is quite unique.

To catch future versions we use a wildcard-regex for the version numbers. As environment variable I choose monitoring.

# Don't log Icinga monitoring requests
SetEnvIf User-Agent "check_http/v(.*) \(monitoring-plugins (.*)\)" monitoring

Now all that is needed is to adopt the CustomLog-directive in the virtual host configuration. As the documentation for the CustomLog-directive explains we can access the environment variables via the env= option. Negating this will simply omit all requests with that environment variable set.

Therefore our resulting CustomLog line now looks like this:

CustomLog /var/log/apache2/admin.brennt.net-access.log combined env=!monitoring

Now just check your configuration for errors via apach2ctl configtest and restart your Apache webserver.

Comments

Using ping to monitor if your systems are working is wrong

Generated via Dall-E

I've seen this far too often. When monitoring a system a simple ping command is used to verify that the system is "up and running". In reality nothing could be further from the truth as this is not what you are actively checking.

Ping utilizes the Internet Control Message Protocol (ICMP) to send so-called ping-request packets to a server, which the server will answer with ping-reply packets. Giving us a method to verify that the server is actively answering our requests. But if the server is answering the packets, it doesn't mean that the server itself is in working condition. That all services are running.

And this is because of several reasons:

1. The network is reliable - mostly

In https://aphyr.com/posts/288-the-network-is-reliable Kyle Kingsbury, a.k.a "Aphyr" and Peter Bailis discuss common network fallacies, similar to the Fallacies of distributed computing by L. Peter Deutsch. As it is commonly assumed that the network "just works" and no strange things will happen. When indeed they do happen all the time.

In regard to our ICMP ping this means:

  1. There can be a firewall blocking ICMP or simply all traffic from our monitoring system
  2. Routing can be misconfigured
  3. Datacenter outages can happen
  4. Bandwidth can be drastically reduced
  5. VLANs can be misconfigured
  6. Cables can be broken
  7. Switchports can be defect
  8. Add your own ideas what can go wrong in a network

And you do want a monitoring method which allows you to reliably distinguish between network and system problems.

2. CPU Lockups

ICMP packets are answered by the kernel itself. This can have the nasty side-effect that your server literally hangs. Trapped in a state known as either Soft or Hard Lockup. And while overall they are somewhat rare - CPU Soft Lockups still do occur from time to time in my experience. Especially with earlier versions of hypervisors for virtual machines (VMs) as a CPU Soft Lockup can be triggered if there is simply too much CPU load on a system.

But the nasty side-effect of CPU Soft Lockups? The system will still reply to ICMP packets, while all other services are unreachable.

I once had problems with power management (ACPI) with a servers hardware. Somehow the ACPI kernel module would lock resources without freeing them. This effectively meant that the system came to a complete stop - but it didn't reboot or shutdown. Nor did it crash as in "Completely unreachable". No, ICMP packets were still answered quite fine.

Just no SSH connection was possible. No TCP or UDP services reachable. As the CPU was stuck at a certain operation and never switched to process other tasks.

Only disabling ACPI by adding the acpi=off parameter to the grub kernel boot command line "fixed" this.

Regarding soft lockups I can recommend reading the following:

  1. Linux Magic System Request Key Hacks: Here you learn how you can trigger a kernel panic yourself and how to configure a few things 
  2. https://www.baeldung.com/linux/terminal-kernel-panic also has a nice list of ways to trigger a kernel panic from the command line
  3. https://www.kernel.org/doc/Documentation/lockup-watchdogs.txt Kernel documentation regarding "Softlockup detector and hardlockup detector (aka nmi_watchdog)"
  4. This SuSE knowledge base article also has some good basic information on how to configure timers, etc. https://www.suse.com/support/kb/doc/?id=000018705

Takeaways

  1. ICMP is suited to check if the system is reachable in the network
    • After all ICMP is more cost-effective than TCP in terms of package size and number of packages sent
  2. A TCP connect to the port providing the used service is usually better for the reasons stated above
  3. You must incorporate your network topology in your monitoring system; only then you will be able to properly distinguish between: "System unreachable", "Misconfigured switchport" and "Service stopped responding"
    • This means switches, firewalls, routers, loadbalancers, gateways - everything your users/service depends upon to be reachable must be included in your monitoring system, and:
  4. If possible the dependencies between them should be defined
    • Like: Router → Firewall → LoadBalancer → Switch → System → Service

Conclusion

Knowing all this you should keep the following in mind: A successful ping only verifies that the system is reachable via your network. And this doesn't imply anything about the state of the OS.

Yes, this is no mind-blowing truth that I reveal here. But still I encounter monitoring setups where ICMP is used to verify that a system is "up and running" far too often.

Comments

How to install DiagnosisTools on my Synology DiskStation 411, or: Why I love the Internet

Synology Inc. https://www.synology.com/img/products/detail/DS423/heading.png

I own a Synology DiskStation 411 - in short: DS411. It looks like the one in the picture - which shows the successor model DS423. The DS411 runs some custom Linux and therefore is missing a lot of common tools. Recently I had some network error which made my NAS unreachable from one of my Proxmox VMs. The debugging process was made harder as I had no tools such as lsof, strace or strings.

I googled a bit and learned that Synology offers a DiagnosisTool package which contains all these tools and more. The package center however showed no such package for me. From the search results I got that it should be shown in the package center if there is a compatible version for the DSM version running on the NAS. So seems there is no compatible version for my DS411 running DSM 6.4.2?

Luckily there is a command to install the tools: synogear install

root@DiskStation:~# synogear install
failed to get DiagnosisTool ... can't parse actual package download link from info file

Okay, that's uncool. But still doesn't explain why we can't install them. What does synogear install actually do? Time to investigate. As we have no stat or whereis we have to resort to command -v which is included in the POSIX standard. (Yes, which is available, but as command is available everywhere that's the better choice.)

root@DiskStation:~# command -v synogear
/usr/syno/bin/synogear
root@DiskStation:~# head /usr/syno/bin/synogear
#!/bin/sh

DEBUG_MODE="no"
TOOL_PATH="/var/packages/DiagnosisTool/target/tool/"
TEMP_PROFILE_DIR="/var/packages/DiagnosisTool/etc/"

STATUS_NOT_INSTALLED=101
STATUS_NOT_LOADED=102
STATUS_LOADED=103
STATUS_REMOVED=104

Nice! /usr/syno/bin/synogear is a simple shellscript. Therefore we can just run it in debug mode and see what is happening without having to read every single line.

root@DiskStation:~# bash -x synogear install
+ DEBUG_MODE=no
+ TOOL_PATH=/var/packages/DiagnosisTool/target/tool/
+ TEMP_PROFILE_DIR=/var/packages/DiagnosisTool/etc/
+ STATUS_NOT_INSTALLED=101
[...]
++ curl -s -L 'https://pkgupdate.synology.com/firmware/v1/get?language=enu&timezone=Amsterdam&unique=synology_88f6282_411&major=6&minor=2&build=25556&package_update_channel=stable&package=DiagnosisTool'
+ reference_json='{"package":{}}'
++ echo '{"package":{}}'
++ jq -e -r '.["package"]["link"]'
+ download_url=null
+ echo 'failed to get DiagnosisTool ... can'\''t parse actual package download link from info file'
failed to get DiagnosisTool ... can't parse actual package download link from info file
+ return 255
+ return 255
+ status=255
+ '[' 255 '!=' 102 ']'
+ return 1
root@DiskStation:~#

The main problem seems to be an empty JSON-Response from https://pkgupdate.synology.com/firmware/v1/get?language=enu&timezone=Amsterdam&unique=synology_88f6282_411&major=6&minor=2&build=25556&package_update_channel=stable&package=DiagnosisTool and opening that URL in a browser confirms it. So there really seems to be no package for my Model and DSM version combination.

Through my search I also learned that there is a package archive at https://archive.synology.com/download/Package/DiagnosisTool/ which listed several versions of the DiagnosisTool. One package for each CPU architecture. But that also didn't give many clues as I wasn't familiar with many of the CPU architectures and nothing seems to match my CPU. No Feroceon or 88FR131 or the like.

root@DiskStation:~# cat /proc/cpuinfo
Processor       : Feroceon 88FR131 rev 1 (v5l)
BogoMIPS        : 1589.24
Features        : swp half thumb fastmult edsp
CPU implementer : 0x56
CPU architecture: 5TE
CPU variant     : 0x2
CPU part        : 0x131
CPU revision    : 1

Hardware        : Synology 6282 board
Revision        : 0000
Serial          : 0000000000000000

As I didn't want to randomly install all sorts of packages for different CPU architectures - not knowing how good or bad Synology is in preventing the installation of non-matching packages, I opted for the r/synology subreddit and stopped my side-quest at this point, focusing on the main problem.

Random redditors to the rescue!

Nothing much happened for 2 months and I had already forgotten about that thread. My problem with the VM was solved in the meantime and I had no reason to pursue it any further.

Then someone replied. This person apparently did try all sorts of packages for the DiskStation additionally providing a link to the package that worked. It was there that I noticed something. The link provided was: https://global.synologydownload.com/download/Package/spk/DiagnosisTool/1.1-0112/DiagnosisTool-88f628x-1.1-0112.spk and I recognized the string 88f628x but couldn't pin down where I had spotted it. Only then it dawned on me: 88f for the Feroceon 88FR131 and 628x for all Synology 628x boards. Could this really be it?

Armed with this information I quickly identified https://global.synologydownload.com/download/Package/spk/DiagnosisTool/3.0.1-3008/DiagnosisTool-88f628x-3.0.1-3008.spk to be the last version for my DS411 and installed it via the package center GUI (button "Manual installation" and then navigated to the downloaded .spk file on my computer).

Note: The .spk file seems to be a normal compressed tar file and can be opened with the usual tools. The file structure inside is roughly the same as with any RPM or DEB package. Making it easy to understand what happens during/after package installation.

The installation went fine, no errors reported and after that synogear install worked:

root@DiskStation:~# synogear install
Tools are installed and ready to use.
DiagnosisTool version: 3.0.1-3008

And I was finally able to use lsof!

root@DiskStation:~# lsof -v
lsof version information:
    revision: 4.89
    latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
    latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
    latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
    constructed: Mon Jan 20 18:58:20 CST 2020
    compiler: /usr/local/arm-marvell-linux-gnueabi/bin/arm-marvell-linux-gnueabi-ccache-gcc
    compiler version: 4.6.4
    compiler flags: -DSYNOPLAT_F_ARMV5 -O2 -mcpu=marvell-f -include /usr/syno/include/platformconfig.h -DSYNO_ENVIRONMENT -DBUILD_ARCH=32 -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -g -DSDK_VER_MIN_REQUIRED=600 -pipe -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -O2 -Wno-unused-result -DNETLINK_SOCK_DIAG=4 -DLINUXV=26032 -DGLIBCV=215 -DHASIPv6 -DNEEDS_NETINET_TCPH -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -DLSOF_ARCH="arm" -DLSOF_VSTR="2.6.32" -I/usr/local/arm-marvell-linux-gnueabi/arm-marvell-linux-gnueabi/libc/usr/include -I/usr/local/arm-marvell-linux-gnueabi/arm-marvell-linux-gnueabi/libc/usr/include -O
    loader flags: -L./lib -llsof
    Anyone can list all files.
    /dev warnings are disabled.
    Kernel ID check is disabled.

The redditor was also so nice to let me know that I have to execute synogear install each time before I can use these tools. Huh? Why that? Shouldn't they be in my path?

Turns out: No, the directory /var/packages/DiagnosisTool/target/tool/ isn't included into our PATH environment variable.

root@DiskStation:~# echo $PATH
/sbin:/bin:/usr/sbin:/usr/bin:/usr/syno/sbin:/usr/syno/bin:/usr/local/sbin:/usr/local/bin

synogear install does that. It copies /etc/profile to /var/packages/DiagnosisTool/etc/.profile, while removing all lines starting with PATH or export PATH. Adding the path to the tools directory to the new PATH and exporting that and setting the new .profile file in the ENV environment variable.

Most likely this is just a precaution for novice users.

And to check if the tools are "loaded" they grep if /var/packages/DiagnosisTool/target/tool/ is included in $PATH.

So yeah, there is no technical reason preventing us from just adding /var/packages/DiagnosisTool/target/tool/ to $PATH and be done with that.

And this is why I love the internet. I thought I would've never figured that out if not for someone to post an reply and included the file which worked.

New learnings

Synology mailed me about critical security vulnerabilities present in DSM 6.2.4-25556 Update 7 which are fixed in 6.2.4-25556 Update 8 (read the Release Notes). However the update wasn't offered to me via the normal update dialog in the DSM, as it is a staged rollout. Therefore I opted to download a patch file from the Synology support site. This means I did not download the whole DSM package for a specific version but just from one version to another. And here I noticed that the architecture name is included in the patch filename. Nice.

If you visit https://www.synology.com/en-global/support/download/DS411?version=6.2 do not just click on Download, instead opt to choose your current DSM version and the target version you wish to update to. Then you are offered a patch file and identifier/name for the architecture your DiskStation uses is part of the filename.

This should be a reliable way to identify the architecture for all Synology models - in case it isn't clear through the CPU/board name, etc. as in my case.

List of included tools

Someone ask me for a list of all tools which are included in the DiagnosesTool package. Here you go:

admin@DiskStation:~$ ls -l /var/packages/DiagnosisTool/target/tool/ | awk '{print $9}'
addr2line
addr2name
ar
arping
as
autojump
capsh
c++filt
cifsiostat
clockdiff
dig
domain_test.sh
elfedit
eu-addr2line
eu-ar
eu-elfcmp
eu-elfcompress
eu-elflint
eu-findtextrel
eu-make-debug-archive
eu-nm
eu-objdump
eu-ranlib
eu-readelf
eu-size
eu-stack
eu-strings
eu-strip
eu-unstrip
file
fio
fix_idmap.sh
free
gcore
gdb
gdbserver
getcap
getpcaps
gprof
iftop
iostat
iotop
iperf
iperf3
kill
killall
ld
ld.bfd
ldd
log-analyzer.sh
lsof
ltrace
mpstat
name2addr
ncat
ndisc6
nethogs
nm
nmap
nping
nslookup
objcopy
objdump
perf-check.py
pgrep
pidof
pidstat
ping
ping6
pkill
pmap
ps
pstree
pwdx
ranlib
rarpd
rdisc
rdisc6
readelf
rltraceroute6
run
sa1
sa2
sadc
sadf
sar
setcap
sid2ugid.sh
size
slabtop
sockstat
speedtest-cli.py
strace
strings
strip
sysctl
sysstat
tcpdump_wrapper
tcpspray
tcpspray6
tcptraceroute6
telnet
tload
tmux
top
tracepath
traceroute6
tracert6
uptime
vmstat
w
watch
zblacklist
zmap
ztee
Comments

Etsy is also in the progress of enshittyfication?

Photo by Andrea Piacquadio: https://www.pexels.com/photo/young-annoyed-female-freelancer-using-laptop-at-home-3808008/

Amber McDaniel from SustainableJungle.com made a comprehensive video about Etsy and the downward spiral the platform seems to be locked-in. The developments, changes in processes etc. she describes clearly shout "Enshittyfication" to me.

If you are interested in the details hop over to Youtube and watch here video:

Source: https://www.youtube.com/watch?v=lSyEQAWLQbk

Comments

De mortuis nihil nisi bene

Photo by Veronika Valdova: https://www.pexels.com/photo/cemetery-of-fallen-soldiers-and-veterans-930711/

This is a Latin saying commonly translated to "Speak no ill of the dead." And I somewhat agree with that, however, due to a recent event in Germany I realized that I apply this behaviour in a more contextualized way.

But what happened? Ursula Haverbeck died. She was one of Germany's most known holocaust deniers. Despite being born in 1928 and therefore must having experienced - or at least heard of - the horrors first-hand. She must have seen people vanishing at night. Burning shops from "unwanted people" etc.

Yet she denied the holocaust publicly several times - which is a crime punishable by law in Germany. And to prison she went. I think between 3 to 5 times. For a sentence of, in total, 4 years.

Now she is dead at the age of 96.

And of course there are many jokes about her dead, people being generally happy that this mean-spirited woman is gone, etc. and so on. Just the Internet being ... well, The Internet.

Personally I smiled about some remarks or jokes but saw a line crossed when people were proposing to do illegal things to her grave. That's definitely against too many of my personal viewpoints. No matter if you believe in (a/any) god at all, our of which faith you are, a graveyard is sacred ground. A place where the living can meet the dead on a highly personal level. To ease the sorrow of a lost one. Completely disconnected from any religious dogmas or viewpoints - no matter if you share the same faith as the deceased person or not.
Religious arguments aside: Desecrating just one grave affects all people who have a connection to this graveyard. Totally not acceptable.

However there are many people who post comments with "Speak no ill of the dead." in order to ask people to stop making fun of her. And the common reply is: "There is nothing wrong in telling the truth about a dead person."

And I second this. We do not speak well of many people from the history of mankind either. Of course Hitler & Stalin immediately come to mind.
Well, certain people do, of course. But most people will be very determined in what they think of such people.

So, yes. Say anything about a dead person. As long as it is true. But keep in mind to whom you are speaking.

And this is what I realized. When I am at a funeral I won't go to the griefing partner/family-member/whomever and tell this person: "Ah, well you know.. I never really like X anyway." No, you won't. Common courtesy. Not the time nor the place to play games or live your personal vendetta. And if you can't bring yourself to not say anything like this: Be a nice human being and don't show up at all. Sometimes staying away from a funeral you have been invited to already says more than enough.

Maybe you would state that you will still miss this person - despite giving you hard times every now and then. Again focusing on the good. And this should be fine. As usually the bereaved know the character of the deceased very well for themselves.

For me, the saying therefore reads as: "Speak no lie of the dead and mind who you are talking to."

If we can collectively agree on this, than the Internet will be a better place.

Comments

Howto properly split all logfile content based on timestamps - and realizing my own fallacy

Photo by Mikhail Nilov: https://www.pexels.com/photo/person-in-black-hoodie-using-a-computer-6963061/

I use a Pi-hole for DNS based AdBlocking in my home network. Additionally I installed Unbound as recursive DNS resolver on it. Meaning: I can use the RaspberryPi in my network at home as the DNS server for all my devices. This way I don't have to use the DNS-Servers of my ISP granting me some additionally privacy. Additionally I can see which DNS queries are sent by each device. Leading to surprising revelations.

However recently my internet connection was interrupted and afterwards I noticed that I couldn't access any site or services where I used a domain or hostname to connect to. And while the problem itself (dnsmasq: Maximum number of concurrent DNS queries reached (max: 150)) was fixed easily with a simple restart of the unbound service, I noticed that the /var/log/unbound/unbound.log logfile was uncompressed, unrotated and 3.3 gigabyte in size. Whoops. That happens when no logrotate job is present.

Side gig: A logrotate config for Unbound

Fixing this issue was rather easy. A short search additionally revealed that unbound-control has a log_reopen option which is a good idea to trigger after the logrotate. This way Unbound properly closes old filehandles and uses the new logfile.

root@pihole:~# cat /etc/logrotate.d/unbound
/var/log/unbound/unbound.log {
        monthly
        missingok
        rotate 12
        compress
        delaycompress
        notifempty
        sharedscripts
        create 644
        postrotate
                /usr/sbin/unbound-control log_reopen
        endscript
}

But wait, there is more

However I had it on my list to dig deeper into the dnsmasq: Maximum number of concurrent DNS queries reached (max: 150) error in order to better understand the whole construct of Pi-hole, dnsmasq and Unbound.

However, the logfile was way too big to work conveniently with it. 49.184.687 lines are just too much. Especially on a RaspberryPi with the, in comparison, limited CPU power. Now I could have just split it up after n lines using split -l number-of-lines but that is:

  • Too easy and
  • Did I encounter the need for a script which splits logfile lines based on a range of timestamps more often in the recent time

How to properly split a logfile - and overcomplicating stuff

Most of the unbound logfile lines will have the Unix timestamp in brackets, followed by the process name, the log level the message belongs too and the actual message.

root@pihole:~# head -n 1 /var/log/unbound/unbound.log
[1700653509] unbound[499:0] debug: module config: "subnetcache validator iterator"

However some multi-line message wont follow this format:

[1700798246] unbound[1506:0] info: incoming scrubbed packet: ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: qr aa ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
chat.cdn.whatsapp.net.  IN      A

;; ANSWER SECTION:
chat.cdn.whatsapp.net.  60      IN      A       157.240.252.61

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:
;; MSG SIZE  rcvd: 55

[1700798246] unbound[1506:0] debug: iter_handle processing q with state QUERY RESPONSE STATE

This means we need the following technical approach:

  1. Generate the Unix-timestamp for the first day in a month at 00:00:00 o'clock
    • Alternatively formulated: The Unix-timestamp for the first second of a month
  2. Generate the Unix-timestamp for the last day of the month at 23:59:59 o'clock
    • The last second of a month
  3. Find the first occurrence of the timestamp from point 1
  4. Find the last occurrence of the timestamp from point 2
  5. Use sed to move the lines for each month into a separate logfile

I will however also show an awk command on how to filter based on the timestamps, useful for logfiles where every line is prefix with a timestamp.

Calculating with date

Luckily date is powerful and easy to use for date calculations. %s gives us the Unix timestamp. We do not need to specify hours:minutes:seconds as date automatically takes 00:00:00 for these values. Automatically giving us the first second of a day. And date also takes care of leap years and possible a lot of other nuisances when it comes to time and date calculations.

To get the last second of a month we simply take the first day of the month, add a month and subtract one second. It can't be easier.

# Unix timestamp for the first second in a month
user@host:~$ date -d "$(date +%Y/%m/01)" "+%Y/%m/%d %X - %s"
2024/11/01 00:00:00 - 1730415600

# Unix timestamp for the last second in a month
user@host:~$ date -d "$(date +%Y/%m/01) + 1 month - 1 second" "+%Y/%m/%d %X - %s"
2024/11/30 23:59:59 - 1733007599

To verify the value we can use this for-loop. It will give us all the date and timestamps we need to confirm that our commands are correct.

user@host:~$ for YEAR in {2023..2024}; do for MONTH in {1..12}; do echo -n "$(date -d "$(date +$YEAR/$MONTH/01)" "+%Y/%m/%d %X - %s")  "; date -d "$(date +$YEAR/$MONTH/01) + 1 month - 1 second" "+%Y/%m/%d %X - %s"; done; done
2023/01/01 00:00:00 - 1672527600  2023/01/31 23:59:59 - 1675205999
2023/02/01 00:00:00 - 1675206000  2023/02/28 23:59:59 - 1677625199
2023/03/01 00:00:00 - 1677625200  2023/03/31 23:59:59 - 1680299999
2023/04/01 00:00:00 - 1680300000  2023/04/30 23:59:59 - 1682891999
2023/05/01 00:00:00 - 1682892000  2023/05/31 23:59:59 - 1685570399
2023/06/01 00:00:00 - 1685570400  2023/06/30 23:59:59 - 1688162399
2023/07/01 00:00:00 - 1688162400  2023/07/31 23:59:59 - 1690840799
2023/08/01 00:00:00 - 1690840800  2023/08/31 23:59:59 - 1693519199
2023/09/01 00:00:00 - 1693519200  2023/09/30 23:59:59 - 1696111199
2023/10/01 00:00:00 - 1696111200  2023/10/31 23:59:59 - 1698793199
2023/11/01 00:00:00 - 1698793200  2023/11/30 23:59:59 - 1701385199
2023/12/01 00:00:00 - 1701385200  2023/12/31 23:59:59 - 1704063599
2024/01/01 00:00:00 - 1704063600  2024/01/31 23:59:59 - 1706741999
2024/02/01 00:00:00 - 1706742000  2024/02/29 23:59:59 - 1709247599
2024/03/01 00:00:00 - 1709247600  2024/03/31 23:59:59 - 1711922399
2024/04/01 00:00:00 - 1711922400  2024/04/30 23:59:59 - 1714514399
2024/05/01 00:00:00 - 1714514400  2024/05/31 23:59:59 - 1717192799
2024/06/01 00:00:00 - 1717192800  2024/06/30 23:59:59 - 1719784799
2024/07/01 00:00:00 - 1719784800  2024/07/31 23:59:59 - 1722463199
2024/08/01 00:00:00 - 1722463200  2024/08/31 23:59:59 - 1725141599
2024/09/01 00:00:00 - 1725141600  2024/09/30 23:59:59 - 1727733599
2024/10/01 00:00:00 - 1727733600  2024/10/31 23:59:59 - 1730415599
2024/11/01 00:00:00 - 1730415600  2024/11/30 23:59:59 - 1733007599
2024/12/01 00:00:00 - 1733007600  2024/12/31 23:59:59 - 1735685999

To verify we can do the reverse (Unix timestamp to date) with the following command:

user@host:~$ date -d @1698793200
Wed  1 Nov 00:00:00 CET 2023

Solution solely working on timestamps

As the logfile timestamp is enclosed in brackets we need to tell awk to treat either [ or ] as a field separator. Then we can use awk to check if the second field is in a given time frame. For the first test run we define the variables manually in our shell and adjust the date commands to only output the Unix timestamp.

And as the logfile starts in November 2023 I set the values accordingly. awk then conveniently puts all lines whose timestamp is between these to values into a separate logfile.

user@host:~$ YEAR=2023
user@host:~$ MONTH=11
user@host:~$ FIRST_SECOND=$(date -d "$(date +$YEAR/$MONTH/01)" "+%s")
user@host:~$ LAST_SECOND=$(date -d "$(date +$YEAR/$MONTH/01) + 1 month - 1 second" "+%s")
user@host:~$ awk -F'[\\[\\]]' -v MIN=${FIRST_SECOND} -v MAX=${LAST_SECOND} '{if($2 >= MIN && $2 =< MAX) print}' /var/log/unbound/unbound.log >> /var/log/unbound/unbound-$YEAR-$MONTH.log

And this would already work fine, if every line would start with the timestamp. As this is not the case we need to add a bit more logic.

So the resulting script would look like this:

user@host:~$ cat date-split.sh
#!/bin/bash
# vim: set tabstop=2 smarttab shiftwidth=2 softtabstop=2 expandtab foldmethod=syntax :

# Split a logfile based on timestamps

LOGFILE="/var/log/unbound/unbound.log"
AWK="$(command -v awk)"
GZIP="$(command -v gzip)"

for YEAR in {2023..2024}; do
  for MONTH in {1..12}; do

    # Logfile starts November 2023 and ends November 2024 - don't grep for values before/after that time window
    if  [[ "$YEAR" -eq 2023 && "$MONTH" -gt 10 ]] ||  [[ "$YEAR" -eq 2024 && "$MONTH" -lt 12 ]]; then

      # Debug
      echo "$YEAR/$MONTH"

      # Calculate first and last second of each month
      FIRST_SECOND="$(date -d "$(date +"$YEAR"/"$MONTH"/01)" "+%s")"
      LAST_SECOND="$(date -d "$(date +"$YEAR"/"$MONTH"/01) + 1 month - 1 second" "+%s")"

      # Export variables so the grep in the sub-shells have this value
      export FIRST_SECOND
      export LAST_SECOND

      # Split logfiles solely based on timestamps
      awk -F'[\\[\\]]' -v MIN=${FIRST_SECOND} -v MAX=${LAST_SECOND} '{if($2 >= MIN && $2 <= MAX) print}' unbound.log >> "unbound-$YEAR-$MONTH.log"

      # Creating all those separate logfiles will probably fill up our diskspace
      #  therefore we gzip them immediately afterwards
      "$GZIP" "/var/log/unbound/unbound-$YEAR-$MONTH.log"

    fi

  done;
done

However, this script is vastly over-engineered. Why? Read on.

StackOverflow to the rescue

I still had the problem with the multi-line log messages. At first I wanted to use grep to get the matching first and last line numbers with head and tail. But uh.. Yeah, I had a fallacy here. As still wouldn't have worked with multi-line logmessages without a timestamp. Also using grep like this is highly inefficient. While it would be fine for a one-time usage script I still hit a road block.

I just wasn't able to get awk to do what I wanted and I resorted to asking my question on StackOverflow. Better to get the input from others then wasting a lot of time.

awk to the rescue

It was only through the answer that I realized that my solution was a bit over-engineered. Why use date if you can use strftime to calculate the year and month from the timestamp directly? The initial answer was:

awk '
$1 ~ /^\[[0-9]+]$/ {
  f = "unbound-" strftime("%m-%Y", substr($1, 2, length($1)-2)) ".log"
  if (f != prev) close(f); prev = f
}
{
  print > f
}' unbound.log

How this works has been explained in detail on StackOverflow, so I just copy & paste it here.

For each line which first field is a [timestamp] (that is, matches regexp ^\[[0-9]+]$), we use substr and length to extract timestamp, strftime to convert it to a mm-YYYY string and assign "unbound-mm-YYYY.log" to variable f. In the second block, that applies to all lines, we print the current line in file f. Note: contrary to shell redirections, in awk, print > FILE appends to FILE.

Edit: as suggested by Ed Morton closing each file when we are done with it should significantly improve the performance if the total number of files is large. if (f != prev) close(f); prev = f added. Ed also noted that escaping the final ] in the regex is useless (and undefined behavior per POSIX). Backslash removed.

And this worked flawlessly. The generated monthly logfiles from my testfile matched exactly the line-numbers per month. Even multi-line log messages and empty lines were included.

All I then did was adding gzip to compress the files directly before the next file is created. Just to prevent filling up the disk completely. Additionally I change the filename from unbound-MM-YYYY.log to unbound-YYYY-MM.log. Yes, the logfile name won't work with logrotate. But I just need it to properly dig through the files and the Year-Month naming will be of great help here. Afterwards I don't need them anymore and will delete them. So this was none of my concern.

This was my new working solution:

awk '$1 ~ /^\[[0-9]+]$/ {
  f = "unbound-" strftime("%Y-%m", substr($1, 2, length($1)-2)) ".log"
  if (f != prev) {
    if (prev) system("gzip " prev)
    close(prev)
    prev = f
  }
}
{
  print > f
}
END {
  if (prev) system("gzip " prev)
}' unbound.log

No bash script with convoluted logic needed. And easily useable for other logfiles too. Just adopt the starting regular expression to match the one the logfile uses and adopt the logic for strftime so the proper timestamp can be created.

Sometimes it's better to ask other people. 😄

Comments