Feuerfest

Just the private blog of a Linux sysadmin

Why I prefer !requiretty over "ssh -t"

Image generated via Dall-E: https://admin.brennt.net/bl-content/uploads/pages/dad5b98ab9f04a2cdca5de3afe2f6b0e/dall-e_sudo.jpg

Claudio Künzler, whom I know briefly from working with him on enhancing his check_equallogic plugin back in 2010, wrote an article over at Geeker's Digest on How to use sudo inside SSH command. Of course he mentions the ssh -t parameter, as without it we would get the following error message when calling sudo: (Example shamelessly stolen from his article. 😇)

ck@linux:~$ ssh targetserver "sudo whoami"
sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure an askpass helper
sudo: a password is required

And ssh -t is the right call here. Well, to be fair: it's not the only solution and, in my eyes, not even the best one.

No, I am not talking about piping the password into the command prompt, which is so often recommended as a solution (it's not!) that it makes me sad.

I am talking about negating requiretty in the /etc/sudoers file or in a file under /etc/sudoers.d/ respectively.

Let's take the /etc/sudoers.d/icinga2 file I use in my article How to monitor your APT-repositories with Icinga:

Here I must use NOPASSWD for all executed commands and monitoring plugins, as well as the line Defaults:icinga2 !requiretty. This negates the need for a tty for the icinga2 user completely. Omitting either NOPASSWD or !requiretty will give us the error message we saw above.

root@admin:~ # cat /etc/sudoers.d/icinga2
# This line disables the need for a tty for sudo
#  else we will get all kind of "sudo: a password is required" errors
Defaults:icinga2 !requiretty

# sudo rights for Icinga2
icinga2  ALL=(ALL) NOPASSWD: /usr/bin/unattended-upgrades
icinga2  ALL=(ALL) NOPASSWD: /usr/bin/unattended-upgrade
icinga2  ALL=(ALL) NOPASSWD: /usr/bin/apt-get
icinga2  ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/check_apt
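To verify the drop-in you can let sudo list what the icinga2 user may run. A quick sketch (output shortened and reconstructed, not copied from a real system):

root@admin:~ # sudo -l -U icinga2
Matching Defaults entries for icinga2 on admin:
    !requiretty
User icinga2 may run the following commands on admin:
    (ALL) NOPASSWD: /usr/bin/unattended-upgrades
    (ALL) NOPASSWD: /usr/bin/unattended-upgrade
    (ALL) NOPASSWD: /usr/bin/apt-get
    (ALL) NOPASSWD: /usr/lib/nagios/plugins/check_apt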

It's also possible to negate requiretty only for a specific binary, as mentioned in this StackExchange question: How to disable requiretty for a single command in sudoers? A small sketch of this follows below.
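A minimal sketch of such a per-command default - the plugin path is only an example, adjust it to the binary you actually call:

# Only this binary may be run without a tty
Defaults!/usr/lib/nagios/plugins/check_apt !requiretty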

However keep in mind that the ordering of lines in a sudoers file is important! Quoting man sudoers from the SUDOERS FILE FORMAT section:

When multiple entries match for a user, they are applied in order. Where there are multiple matches, the last match is used (which is not necessarily the most specific match).

Why not just use ssh -t?

Personally I prefer configuring sudo-related parameters in a file under /etc/sudoers.d/. My reasons are:

When properly configured via a sudoers file it doesn't matter if a command is called via ssh, ssh -t or any other way. This enhances operational stability and makes it easier for users, as they don't have to remember to add the -t parameter (see the quick demo after this list).

And it at least serves as some form of documentation that this user/binary is called from another script/host/etc., giving you a clue what these sudo rights are needed/used for.
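A quick demo - hedged, as it assumes a drop-in on targetserver that grants the ck user NOPASSWD for /usr/bin/whoami and sets Defaults:ck !requiretty - the command from the beginning of this post then works without -t:

ck@linux:~$ ssh targetserver "sudo whoami"
root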


Using ping to monitor if your systems are working is wrong

Generated via Dall-E

I've seen this far too often: when monitoring a system, a simple ping command is used to verify that the system is "up and running". In reality nothing could be further from the truth, as this is not what you are actually checking.

Ping utilizes the Internet Control Message Protocol (ICMP) and sends so-called echo request packets to a server, which the server answers with echo reply packets. This gives us a method to verify that the server is actively answering our requests. But the fact that the server answers these packets doesn't mean that the server itself is in working condition and that all services are running.

This is due to several reasons:

1. The network is reliable - mostly

In https://aphyr.com/posts/288-the-network-is-reliable Kyle Kingsbury, a.k.a. "Aphyr", and Peter Bailis discuss common network fallacies, similar to the Fallacies of distributed computing by L. Peter Deutsch. It is commonly assumed that the network "just works" and no strange things will happen - when in fact they happen all the time.

In regard to our ICMP ping this means:

  1. There can be a firewall blocking ICMP or simply all traffic from our monitoring system
  2. Routing can be misconfigured
  3. Datacenter outages can happen
  4. Bandwidth can be drastically reduced
  5. VLANs can be misconfigured
  6. Cables can be broken
  7. Switchports can be defective
  8. Add your own ideas of what can go wrong in a network

And you do want a monitoring method which allows you to reliably distinguish between network and system problems.

2. CPU Lockups

ICMP packets are answered by the kernel itself. This can have the nasty side-effect that your server literally hangs, trapped in a state known as either a soft or hard lockup. And while they are somewhat rare overall, CPU soft lockups still do occur from time to time in my experience - especially with earlier versions of hypervisors for virtual machines (VMs), as a CPU soft lockup can be triggered if there is simply too much CPU load on a system.

But the nasty side-effect of CPU soft lockups? The system will still reply to ICMP packets while all other services are unreachable.

I once had problems with power management (ACPI) on a server's hardware. Somehow the ACPI kernel module would lock resources without freeing them. This effectively meant that the system came to a complete stop - but it didn't reboot or shut down. Nor did it crash in the sense of "completely unreachable". No, ICMP packets were still answered just fine.

Just no SSH connection was possible and no TCP or UDP services were reachable, as the CPU was stuck at a certain operation and never switched to processing other tasks.
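To illustrate the situation - hostname and output are reconstructed examples, not the original logs:

user@monitoring:~$ ping -c 3 server.example.com
[...]
3 packets transmitted, 3 received, 0% packet loss

user@monitoring:~$ ssh server.example.com
ssh: connect to host server.example.com port 22: Connection timed out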

Only disabling ACPI by adding the acpi=off parameter to the grub kernel boot command line "fixed" this.

Regarding soft lockups I can recommend reading the following:

  1. Linux Magic System Request Key Hacks: Here you learn how you can trigger a kernel panic yourself and how to configure a few things (see the small sketch after this list)
  2. https://www.baeldung.com/linux/terminal-kernel-panic also has a nice list of ways to trigger a kernel panic from the command line
  3. https://www.kernel.org/doc/Documentation/lockup-watchdogs.txt Kernel documentation regarding "Softlockup detector and hardlockup detector (aka nmi_watchdog)"
  4. This SuSE knowledge base article also has some good basic information on how to configure timers, etc. https://www.suse.com/support/kb/doc/?id=000018705
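If you want to trigger a kernel panic yourself, as described in the first link, the Magic SysRq interface can do it. A minimal sketch - run this only on a disposable test VM, never on a production system:

root@testvm:~# echo 1 > /proc/sys/kernel/sysrq   # enable all SysRq functions
root@testvm:~# echo c > /proc/sysrq-trigger      # trigger an immediate crash / kernel panic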

Takeaways

  1. ICMP is suited to check if the system is reachable in the network
    • After all, ICMP is more cost-effective than TCP in terms of packet size and number of packets sent
  2. A TCP connect to the port providing the used service is usually better, for the reasons stated above (see the sketch after this list)
  3. You must incorporate your network topology into your monitoring system; only then will you be able to properly distinguish between: "System unreachable", "Misconfigured switchport" and "Service stopped responding"
    • This means switches, firewalls, routers, loadbalancers, gateways - everything your users/service depends upon to be reachable must be included in your monitoring system, and:
  4. If possible the dependencies between them should be defined
    • Like: Router → Firewall → LoadBalancer → Switch → System → Service
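A hedged sketch of what such a TCP connect check looks like in practice - hostname, port and output are examples; check_tcp is part of the standard monitoring-plugins package:

user@monitoring:~$ /usr/lib/nagios/plugins/check_tcp -H server.example.com -p 443
TCP OK - 0.015 second response time on server.example.com port 443|time=0.014713s;;;0.000000;10.000000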

Conclusion

Knowing all this you should keep the following in mind: a successful ping only verifies that the system is reachable via your network. It doesn't imply anything about the state of the OS.

Yes, this is no mind-blowing truth that I reveal here. But I still encounter monitoring setups far too often where ICMP is used to verify that a system is "up and running".


Monitoring Teamspeak3 servers with check_teamspeak3 & Icinga2

Photo by cottonbro studio: https://www.pexels.com/photo/woman-sitting-on-the-floor-among-laptops-and-tangled-cables-and-wearing-goggles-8721343/

Just a short note, as no one seems to really mention this: if you want to monitor your Teamspeak 3 server with Icinga2 (or Nagios or any other compatible monitoring system), check_teamspeak3 from xicon.eu / xiconfjs still works flawlessly despite the last commit being 3 years old.

A little bit of background, or: UDP monitoring is hard

Teamspeak 3 - being a voice chat - utilizes UDP for most of its services. Understandable, as speed is key for enjoyable voice communication, and voice can cope well with a few lost packets.

This, however, is the main problem in monitoring. With TCP you can open a simple TCP connect and, if the port is open, assume that your service is working. With UDP you can't, as UDP won't give you any feedback on whether any of your packets were received: UDP in its entirety lacks the "Transmission Control" part of TCP in favour of faster packet sending/processing. Therefore you have to send a request that provokes a reply, which enables you to check that reply against your expected "known good/working" reply.

This means that you must delve into the depths of a protocol in order to know what your packets must contain. And this is the part where it often gets complicated, time-consuming and cumbersome. More often than not people rather go with a "Let's just check if the process is running." or "If the application opens a TCP port too, let's just check that one." solution.

I however wanted a detailed check, and check_teamspeak3 does exactly this. It connects to the voice datagram port, sends appropriately encoded UDP packets and checks the result. Et voilà, we have a monitoring check that really checks the working condition.

Thanks xiconfjs!

Alternatives

For those coming here in search of other ways to monitor Teamspeak 3, you can of course do one of the following:

  • Check if the Teamspeak process is running
  • Examine the list of locally opened ports and check if the ports are in listening mode (lsof, netstat, etc.)
  • Expose the serverquery port and do your checks via commands (see: https://community.teamspeak.com/t/how-to-use-the-server-query/25386)
  • The FileTransfer part of Teamspeak uses a TCP port, so you could verify that this port is open with a simple check_tcp servicecheck (see the sketch after this list)
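Hedged sketches for the first and last option - hostname and output are examples; 30033 is Teamspeak 3's default file transfer port and ts3server the default process name, adjust both if your setup differs:

user@host:~$ /usr/lib/nagios/plugins/check_procs -C ts3server -c 1:
PROCS OK: 1 process with command name 'ts3server'

user@host:~$ /usr/lib/nagios/plugins/check_tcp -H ts.example.com -p 30033
TCP OK - 0.010 second response time on ts.example.com port 30033|time=0.010112s;;;0.000000;10.000000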

Icinga2 error "check command does not exist" because of missing constant

Photo by Christina Morillo: https://www.pexels.com/photo/software-engineer-standing-beside-server-racks-1181354/

Apparently this problem kept me busy far too long, as I kept looking into the Icinga2 master logfiles only. Mainly due to the service definition for my icinga CheckCommand still being from a time when there was only one master without any agents. This led to it being executed on the master and hence I never saw the problems on the agent.

Additionally, the cluster and cluster-health checks only check if all endpoints are connected - which was the case all the time. Therefore I got no error there either.

But what happened?

I defined a new CheckCommand. It worked fine on the master. Then I rewrote the service apply rule so that it matches all Linux hosts being monitored. And then I got "Check command not found" for all these new service checks on all agent hosts.

I deleted the API config sync directories and restarted Icinga2 on the agents to trigger a new sync:

root@agent:/etc/icinga2# rm /var/lib/icinga2/api/zones-stage/* -rf && rm /var/lib/icinga2/api/zones/* -rf
root@agent:/etc/icinga2# systemctl restart icinga2.service

And suddenly all CheckCommands which are not part of the Icinga Template Library stopped working on the agents.

Uhm, ok. At this point I suspected I had somehow messed up my /etc/icinga2/zones.conf file some time ago. Turns out this wasn't the case.

The root cause

Some weeks ago I defined a service check which is only executed on my Icinga2 master. However, I stored the CheckCommand and service configuration under /etc/icinga2/zones.d/master anyway, as you never know when this comes in handy. (This has since been corrected in the article.) But the Telegram API requires a token, and I defined that in /etc/icinga2/constants.conf - a file which isn't synced, as it is outside of /etc/icinga2/zones.d/. Something I did on purpose, as I didn't want to sync the token to all agents.

This apparently caused the config file sync to run into a syntax error, as the constant for the token couldn't be resolved.
But again: this was only logged in the logfiles on the agents.

root@agent:/etc/icinga2# cat /var/log/icinga2/icinga2.log
[...]
[2024-07-17 22:39:04 +0200] information/ApiListener: Received configuration for zone 'global-templates' from endpoint 'master.domain.tld'. Comparing the timestamp and checksums.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/eventcommands.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/groups.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/host-templates.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/notifications.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/service-templates.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/telegrambot-notifications.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/templates.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/timeperiods.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//_etc/users.conf' for zone 'global-templates'.
[2024-07-17 22:39:04 +0200] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/global-templates' (6688 Bytes).
[2024-07-17 22:39:04 +0200] information/ApiListener: Received configuration updates (2) from endpoint 'master.domain.tld' are different to production, triggering validation and reload.
[2024-07-17 22:39:04 +0200] critical/ApiListener: Config validation failed for staged cluster config sync in '/var/lib/icinga2/api/zones-stage/'. Aborting. Logs: '/var/lib/icinga2/api/zones-stage//startup.log'
[...]

The /var/lib/icinga2/api/zones-stage/startup.log has the details:

root@agent:/etc/icinga2# cat /var/lib/icinga2/api/zones-stage/startup.log
[2024-07-17 23:36:19 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2024-07-17 23:36:19 +0200] information/cli: Loading configuration file(s).
[2024-07-17 23:36:19 +0200] information/ConfigItem: Committing config item(s).
[2024-07-17 23:36:19 +0200] critical/config: Error: Error while evaluating expression: Tried to access undefined script variable 'TelegramBotToken'
Location: in /var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf: 46:26-46:41
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(44):     HOSTDISPLAYNAME = "$host.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(45):     SERVICEDISPLAYNAME = "$service.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(46):     TELEGRAM_BOT_TOKEN = TelegramBotToken
                                                                                                               ^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(47):     TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(48):

[2024-07-17 23:36:19 +0200] critical/config: Error: Error while evaluating expression: Tried to access undefined script variable 'TelegramBotToken'
Location: in /var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf: 20:26-20:41
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(18):     NOTIFICATIONCOMMENT = "$notification.comment$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(19):     HOSTDISPLAYNAME = "$host.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(20):     TELEGRAM_BOT_TOKEN = TelegramBotToken
                                                                                                               ^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(21):     TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(22):

[2024-07-17 23:36:19 +0200] critical/config: 2 errors
[2024-07-17 23:36:19 +0200] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.

However... The tricky part is that a config validation will succeed!

root@agent:/etc/icinga2# icinga2 daemon -C
[2024-07-18 00:00:16 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2024-07-18 00:00:16 +0200] information/cli: Loading configuration file(s).
[2024-07-18 00:00:16 +0200] information/ConfigItem: Committing config item(s).
[2024-07-18 00:00:16 +0200] information/ApiListener: My API identity: agent.domaint.tld
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 5 Zones.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 2 Endpoints.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 235 CheckCommands.
[2024-07-18 00:00:16 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2024-07-18 00:00:16 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2024-07-18 00:00:16 +0200] information/cli: Finished validating the configuration file(s).

And this was the reason why I was too focused on the master.

What I learned later is that you can use the following command to validate the configuration from the stage directory. This is described in the Icinga2 documentation on the config sync (Config Sync: Receive Config).

root@agent:/var/log/icinga2# icinga2 daemon -C --define System.ZonesStageVarDir=/var/lib/icinga2/api/zones-stage/
[2024-07-21 16:28:51 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2024-07-21 16:28:51 +0200] information/cli: Loading configuration file(s).
[2024-07-21 16:28:51 +0200] information/ConfigItem: Committing config item(s).
[2024-07-21 16:28:51 +0200] critical/config: Error: Error while evaluating expression: Tried to access undefined script variable 'TelegramBotToken'
Location: in /var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf: 20:26-20:41
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(18):     NOTIFICATIONCOMMENT = "$notification.comment$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(19):     HOSTDISPLAYNAME = "$host.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(20):     TELEGRAM_BOT_TOKEN = TelegramBotToken
                                                                                                               ^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(21):     TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(22):

[2024-07-21 16:28:51 +0200] critical/config: Error: Error while evaluating expression: Tried to access undefined script variable 'TelegramBotToken'
Location: in /var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf: 46:26-46:41
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(44):     HOSTDISPLAYNAME = "$host.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(45):     SERVICEDISPLAYNAME = "$service.display_name$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(46):     TELEGRAM_BOT_TOKEN = TelegramBotToken
                                                                                                               ^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(47):     TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"
/var/lib/icinga2/api/zones-stage//global-commands/_etc/telegrambot-commands.conf(48):

[2024-07-21 16:28:51 +0200] critical/config: 2 errors
[2024-07-21 16:28:51 +0200] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.

The solution

On the master I moved the 2 files from /etc/icinga2/zones.d to /etc/icinga2/conf.d and restarted the service.

root@master:/etc/icinga2# mv /etc/icinga2/zones.d/global-commands/telegrambot-commands.conf /etc/icinga2/conf.d/
root@master:/etc/icinga2# mv /etc/icinga2/zones.d/global-templates/telegrambot-notifications.conf /etc/icinga2/conf.d/
root@master:/etc/icinga2# systemctl restart icinga2.service

On the agent a simple restart is enough:

root@agent:/etc/icinga2# systemctl restart icinga2.service

And after that everything worked again.

Another problem detected - an even deeper-rooted cause

In the aftermath I was curious why & how Icinga didn't notify me that the config in the stage-dir couldn't be validated. Shouldn't there be some kind of included check for this?

Yes, it turns out the built-in icinga CheckCommand does exactly this. But it was never executed on my agents, as I still had a service definition from a time when I didn't have any agents. Initially the configuration was the following:

// Checks the agent health
apply Service "icinga" {
  import "generic-service"

  check_command = "icinga"

  assign where (host.address || host.address6) && host.vars.os == "Linux"
}

This was still a remnant of having only the Icinga master and no agents. But it led to the check being executed on the master - which is not smart if you want to validate the configuration on the agent.

After changing it to the following:

// Checks the agent health - must be executed on the agent
apply Service "icinga" {
  import "generic-service"

  check_command = "icinga"

  command_endpoint = host.vars.agent_endpoint

  assign where host.vars.agent_endpoint
}

The check worked as intended.
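For completeness, a hedged sketch of how the matching host object might set vars.agent_endpoint - hostname and address are examples; the pattern follows the Icinga2 distributed monitoring documentation:

object Host "agent.domain.tld" {
  import "generic-host"

  address = "192.0.2.10"
  vars.os = "Linux"

  // name of the Endpoint object of this agent,
  // used as command_endpoint by the service above
  vars.agent_endpoint = name
}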

Oh, and I opened a pull request to enhance Icinga's documentation regarding the config sync: https://github.com/Icinga/icinga2/pull/10101. Let's see if it gets accepted.


Icinga2 Monitoring notifications via Telegram Bot (Part 1)

Photo by Kindel Media: https://www.pexels.com/photo/low-angle-shot-of-robot-8566526/

One thing I had wanted to set up for a long time was to get my Icinga2 notifications via one of the instant messaging apps I have on my mobile. There were Threema, Telegram and WhatsApp to choose from.

Well.. Threema wants money for this kind of service. WhatsApp requires a business account which must be connected with the bot - and WhatsApp Business means I have to pay again? I don't know, I didn't pursue that path any further, as it either meant converting my private account into a business account or getting a second account. No, sorry. Not something I want.

Telegram on the other hand? "Yeah, well, message the @botfather account, type /start, type /newbot, set a display name and username, get your Bot-Token (for API usage), that's it. The only requirement we have? The username must end in bot." From here it was an easy decision which app I chose. (See the Telegram bot documentation for details.)

Creating your telegram bot

  1. Search for the @botfather account on Telegram; doesn't matter if you use the mobile app or do it via Telegram Web.
  2. Type /start and a help message will be displayed.
  3. To create a new bot, type: /newbot
  4. Via the question "Alright, a new bot. How are we going to call it? Please choose a name for your bot." you are asked for the display name of your bot.
    • I chose something generic like "Icinga Monitoring Notifications".
  5. Likewise the question "Good. Now let's choose a username for your bot. It must end in `bot`. Like this, for example: TetrisBot or tetris_bot." asks for the username.
    • Choose whatever you like.
  6. If the username is already taken Telegram will state this and simply ask for a new username until you find one which is available.
  7. In the final message you will get your token to access the HTTP API. Note this down and save it in your password manager. We will need this later for Icinga.
  8. To test everything, send a message to your bot in Telegram.

That's the Telegram part. Pretty easy, right?

Testing our bot from the command line

We are now able to receive messages (via /getUpdates) and send messages (via /sendMessage) from/to our bot. Define the token as a shell variable and execute the following curl command to get the messages that were sent to your bot. Note: only new messages are received. If you have already viewed them in Telegram Web, the response will be empty - as seen in the first executed curl command.

Just close Telegram Web and the app on your phone, send a message to the bot and fetch it via curl; this should do the trick. Later we will define our API token as a constant in the Icinga2 configuration.

For better readability I pipe the output through jq.

When there is a new message from your Telegram account to your bot, you will see a field named id. Note this number down: it is the Chat-ID of your account and we need it so that your bot can actually send you messages.

The relevant documentation is the Telegram Bot API reference for the getUpdates and sendMessage methods.

user@host:~$ TOKEN="YOUR-TOKEN"
user@host:~$ curl --silent "https://api.telegram.org/bot${TOKEN}/getUpdates" | jq
{
  "ok": true,
  "result": []
}
user@host:~$ curl --silent "https://api.telegram.org/bot${TOKEN}/getUpdates" | jq
{
  "ok": true,
  "result": [
    {
      "update_id": NUMBER,
      "message": {
        "message_id": 3,
        "from": {
          "id": CHAT-ID,
          "is_bot": false,
          "first_name": "John Doe Example",
          "username": "JohnDoeExample",
          "language_code": "de"
        },
        "chat": {
          "id": CHAT-ID,
          "first_name": "John Doe Example",
          "username": "JohnDoeExample",
          "type": "private"
        },
        "date": 1694637798,
        "text": "This is a test message"
      }
    }
  ]
}
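If you only want the Chat-ID, you can extract it directly with jq ('.result[].message.chat.id'). Sending a message works just as easily - a hedged sketch using the Chat-ID noted above, all values are placeholders:

user@host:~$ TOKEN="YOUR-TOKEN"
user@host:~$ CHAT_ID="YOUR-CHAT-ID"
user@host:~$ curl --silent "https://api.telegram.org/bot${TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${CHAT_ID}" \
    --data-urlencode "text=Test message from the shell" | jq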

Configuring Icinga2

Now we need to integrate our bot into the Icinga2 notification process. Luckily many people have done this before us, so there are already some notification scripts and example configuration files on GitHub.

I chose the scripts found here: https://github.com/lazyfrosch/icinga2-telegram

As I use distributed monitoring, I store some configuration files beneath /etc/icinga2/zones.d/. If you don't use this, feel free to store those files somewhere else. However, as I define the token in /etc/icinga2/constants.conf, which isn't synced via the config file sync, I have to make sure that the notification configuration is also stored outside of /etc/icinga2/zones.d/. Otherwise the distributed setup will fail, as the config file sync throws a syntax error on all other machines due to the missing TelegramBotToken constant.

First we define the API-Token in the /etc/icinga2/constants.conf file:

user@host:/etc/icinga2$ grep -B1 TelegramBotToken constants.conf
/* Telegram Bot Token */
const TelegramBotToken = "YOUR-TOKEN-HERE"

Afterwards we download the host and service notification scripts into /etc/icinga2/scripts and set the executable bit.

user@host:/etc/icinga2/scripts$ wget https://raw.githubusercontent.com/lazyfrosch/icinga2-telegram/master/telegram-host-notification.sh
user@host:/etc/icinga2/scripts$ wget https://raw.githubusercontent.com/lazyfrosch/icinga2-telegram/master/telegram-service-notification.sh
user@host:/etc/icinga2/scripts$ chmod +x telegram-host-notification.sh telegram-service-notification.sh

Based on the notifications we want to receive, we need to define the variable vars.telegram_chat_id in the appropriate user/group object(s). An example for the icingaadmin user is shown below and can be found in the icinga2-example.conf on GitHub: https://github.com/lazyfrosch/icinga2-telegram/blob/master/icinga2-example.conf, along with the notification commands which we will set up after this.

user@host:~$ cat /etc/icinga2/zones.d/global-templates/users.conf
object User "icingaadmin" {
  import "generic-user"

  display_name = "Icinga 2 Admin"
  groups = [ "icingaadmins" ]

  email = "root@localhost"
  vars.telegram_chat_id = "YOUR-CHAT-ID-HERE"
}

Notifications for Host & Service

We need to define 2 new NotificationCommand objects which trigger the telegram-(host|service)-notification.sh scripts. These are stored in /etc/icinga2/conf.d/telegrambot-commands.conf.

Note: We store the NotificationCommands and Notifications in /etc/icinga2/conf.d and NOT in /etc/icinga2/zones.d/master. This is because I have only one master in my setup which sends out notifications, and because we defined the constant TelegramBotToken in /etc/icinga2/constants.conf - which is not synced via the zone config sync. Otherwise we would run into a syntax error on all Icinga2 agents.

See https://admin.brennt.net/icinga2-error-check-command-does-not-exist-because-of-missing-constant for details.

Of course we could also define the constant in a file under /etc/icinga2/zones.d/master, but I chose not to do so for security reasons.

user@host:/etc/icinga2$ cat /etc/icinga2/conf.d/telegrambot-commands.conf
/*
 * Notification Commands for Telegram Bot
 */
object NotificationCommand "telegram-host-notification" {
  import "plugin-notification-command"

  command = [ SysconfDir + "/icinga2/scripts/telegram-host-notification.sh" ]

  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    HOSTNAME = "$host.name$"
    HOSTALIAS = "$host.display_name$"
    HOSTADDRESS = "$address$"
    HOSTSTATE = "$host.state$"
    LONGDATETIME = "$icinga.long_date_time$"
    HOSTOUTPUT = "$host.output$"
    NOTIFICATIONAUTHORNAME = "$notification.author$"
    NOTIFICATIONCOMMENT = "$notification.comment$"
    HOSTDISPLAYNAME = "$host.display_name$"
    TELEGRAM_BOT_TOKEN = TelegramBotToken
    TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"

    // optional
    ICINGAWEB2_URL = "https://host.domain.tld/icingaweb2"
  }
}

object NotificationCommand "telegram-service-notification" {
  import "plugin-notification-command"

  command = [ SysconfDir + "/icinga2/scripts/telegram-service-notification.sh" ]

  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    SERVICEDESC = "$service.name$"
    HOSTNAME = "$host.name$"
    HOSTALIAS = "$host.display_name$"
    HOSTADDRESS = "$address$"
    SERVICESTATE = "$service.state$"
    LONGDATETIME = "$icinga.long_date_time$"
    SERVICEOUTPUT = "$service.output$"
    NOTIFICATIONAUTHORNAME = "$notification.author$"
    NOTIFICATIONCOMMENT = "$notification.comment$"
    HOSTDISPLAYNAME = "$host.display_name$"
    SERVICEDISPLAYNAME = "$service.display_name$"
    TELEGRAM_BOT_TOKEN = TelegramBotToken
    TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"

    // optional
    ICINGAWEB2_URL = "https://host.domain.tld/icingaweb2"
  }
}

As I want to get all notifications for all hosts and services, I simply apply the notification objects to all hosts and services which have host.name set - same as in the example.

user@host:/etc/icinga2$ cat /etc/icinga2/conf.d/telegrambot-notifications.conf
/*
 * Notifications for alerting via Telegram Bot
 */
apply Notification "telegram-icingaadmin" to Host {
  import "mail-host-notification"
  command = "telegram-host-notification"

  users = [ "icingaadmin" ]

  assign where host.name
}

apply Notification "telegram-icingaadmin" to Service {
  import "mail-service-notification"
  command = "telegram-service-notification"

  users = [ "icingaadmin" ]

  assign where host.name
}

Checking configuration

Now we check if our Icinga2 config has no errors and reload the service:

root@host:~ # icinga2 daemon -C
[2023-09-14 22:19:37 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2023-09-14 22:19:37 +0200] information/cli: Loading configuration file(s).
[2023-09-14 22:19:37 +0200] information/ConfigItem: Committing config item(s).
[2023-09-14 22:19:37 +0200] information/ApiListener: My API identity: hostname.domain.tld
[2023-09-14 22:19:37 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[...]
[2023-09-14 22:19:37 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2023-09-14 22:19:37 +0200] information/cli: Finished validating the configuration file(s).
root@host:~ # systemctl reload icinga2.service

Verify it works

Log into your Icingaweb2 frontend, click the Notification link for a host or service and trigger a custom notification.

And we have a message in Telegram.

What about CheckMK?

If you use CheckMK see this blogpost: https://www.srcbox.net/posts/monitoring-notifications-via-telegram/

Lessons learned

And if you use this setup you will encounter one situation/question rather quickly: Do I really need to be woken up at 3am for every notification?

No. No you don't want to. Not in a professional context and even less in a private context.

Therefore: in part 2 we will register a second bot account and use the two bots to differentiate between important and unimportant notifications. The unimportant ones will be muted in the Telegram client on our smartphone. The important ones won't be - and are therefore able to wake us at 3am.
