Feuerfest

Just the private blog of a Linux sysadmin

Using nebula-sync to synchronize your Pi-hole instances

As I run two Pi-hole instances, I have the problem of keeping them in sync. I want all local DNS entries to be present on both, and I also want the same filter lists, exceptions, etc.

Enter: nebula-sync. After Pi-hole released version 6 and made huge changes to its architecture & API, the widely used gravity-sync stopped working and was archived.

As nebula-sync can be run as a Docker container, the setup is fairly easy.

I just logged into my Portainer instance and spun up a new stack with the following YAML. I disable the Gravity run after each sync, as this currently breaks the replica: the webserver process is killed and doesn't come back. Hence no connections to port 80 (HTTP) or 443 (HTTPS) are possible and therefore the API can't be reached either.

This is currently being investigated in the following issue: Pi-hole FTL issue #2395: FTL takes 5 minutes to reboot?

---
services:
  nebula-sync:
    image: ghcr.io/lovelaze/nebula-sync:latest
    container_name: nebula-sync
    environment:
    - PRIMARY=http://ip1.ip1.ip1.ip1|password1
    - REPLICAS=http://ip2.ip2.ip2.ip2|password2
    - FULL_SYNC=true
    - RUN_GRAVITY=false   # running Gravity after sync breaks the replica, see: https://github.com/pi-hole/FTL/issues/2395
    - CRON=*/15 * * * *
    - TZ=Europe/Berlin
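
If you don't use Portainer, you can deploy the same stack from a shell. A minimal sketch, assuming the YAML above is saved as docker-compose.yml in the current directory:

root@host:~# docker compose up -d
root@host:~# docker logs -f nebula-sync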

And if everything works, we see the following:

2025-05-26T16:01:24Z INF Starting nebula-sync v0.11.0
2025-05-26T16:01:24Z INF Running sync mode=full replicas=1
2025-05-26T16:01:24Z INF Authenticating clients...
2025-05-26T16:01:25Z INF Syncing teleporters...
2025-05-26T16:01:25Z INF Syncing configs...
2025-05-26T16:01:25Z INF Running gravity...
2025-05-26T16:01:25Z INF Invalidating sessions...
2025-05-26T16:01:25Z INF Sync completed

When I still had RUN_GRAVITY=true in my stack, the first sync would always succeed. However, that run would kill the webserver on the replica - hence the API wasn't reachable any more and the nebula-sync container would only log the following error messages:

2025-05-26T16:34:29Z INF Starting nebula-sync v0.11.0
2025-05-26T16:34:29Z INF Running sync mode=full replicas=1
2025-05-26T16:34:29Z INF Authenticating clients...
2025-05-26T16:34:31Z INF Syncing teleporters...
2025-05-26T16:34:31Z INF Syncing configs...
2025-05-26T16:34:31Z INF Running gravity...
2025-05-26T16:34:31Z INF Invalidating sessions...
2025-05-26T16:34:31Z INF Sync completed

2025-05-26T16:35:00Z INF Running sync mode=full replicas=1
2025-05-26T16:35:00Z INF Authenticating clients...
2025-05-26T16:35:02Z INF Invalidating sessions...
2025-05-26T16:35:04Z WRN Failed to invalidate session for target: http://ip2.ip2.ip2.ip2
2025-05-26T16:35:04Z ERR Sync failed error="authenticate: http://ip2.ip2.ip2.ip2/api/auth: Post \"http://ip2.ip2.ip2.ip2/api/auth\": dial tcp ip2.ip2.ip2.ip2:80: connect: connection refused"


Installing Unbound as a recursive DNS server on my Pi-hole

I run a Pi-hole installation on each of my Raspberry Pi 3 & 4. As I like to keep my DNS queries under my own control as much as I can, I also installed Unbound to serve as a recursive DNS server. This way all DNS queries are handled by my Raspberry Pis.

Pi-hole should already be installed using one of the methods described here: https://github.com/pi-hole/pi-hole/#one-step-automated-install. If you haven't done that yet, do it first.

There is a good guide on the Pi-hole website which I will basically be following:

https://docs.pi-hole.net/guides/dns/unbound/

root@host:~# apt install unbound

Regarding the configuration file, I go with the one from the guide. However, as I had some problems in the past that I needed to troubleshoot, I include the following lines regarding log levels and verbosity:

root@host:~# head /etc/unbound/unbound.conf.d/pihole.conf
server:
    # If no logfile is specified, syslog is used
    logfile: "/var/log/unbound/unbound.log"
    val-log-level: 2
    # Default is 1
    #verbosity: 4
    verbosity: 1

    interface: 127.0.0.1
    port: 5335
root@host:~# 

You can add that if you want, but it's not needed to make Unbound work.
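
Either way, it's a good idea to let Unbound validate the configuration before restarting it. If everything is fine, unbound-checkconf should report that no errors were found:

root@host:~# unbound-checkconf
unbound-checkconf: no errors in /etc/unbound/unbound.conf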

Next the guide tells us to download the root hints: a file maintained by InterNIC which contains information about the 13 DNS root name servers. Under Debian we don't need to download the named.root file from InterNIC as shown in the guide, as Debian has its own package for that: dns-root-data.

It not only contains information about the 13 DNS root name servers but also the needed DNSSEC keys (also called root trust anchors). And together with unattended-upgrades we can even automate updating that, saving us the creation of a cronjob or systemd timer.

root@host:~# apt install dns-root-data
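
The package places the files under /usr/share/dns/. If you want Unbound to explicitly use Debian's root hints, you can point to them in the server: block of the config; note that Debian's unbound package usually already wires up the trust anchor via /etc/unbound/unbound.conf.d/root-auto-trust-anchor-file.conf, so there is no need to add that yourself:

server:
    root-hints: "/usr/share/dns/root.hints"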

In order for Unbound to have a directory and logfile to write into, we need to create them:

root@host:~# mkdir -p /var/log/unbound
root@host:~# touch /var/log/unbound/unbound.log
root@host:~# chown unbound /var/log/unbound/unbound.log

As we are running under Debian, we now need to tweak the Unbound configuration a little bit, else we will get problems with DNSSEC. For this we delete a Debian-generated file from Unbound and comment out the unbound_conf= line in /etc/resolvconf.conf so that the file isn't included anymore.

root@host:~# sed -Ei 's/^unbound_conf=/#unbound_conf=/' /etc/resolvconf.conf
root@host:~# rm /etc/unbound/unbound.conf.d/resolvconf_resolvers.conf
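
Depending on the Debian release there may also be an unbound-resolvconf.service, which registers the local Unbound instance in /etc/resolv.conf via resolvconf. If you don't want that behaviour, disable it:

root@host:~# systemctl disable --now unbound-resolvconf.service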

Now all that is left is restarting Unbound.

root@host:~# systemctl restart unbound.service

Testing DNS resolution:

root@host:~# dig pi-hole.net @127.0.0.1 -p 5335

; <<>> DiG 9.18.33-1~deb12u2-Raspbian <<>> pi-hole.net @127.0.0.1 -p 5335
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46191
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;pi-hole.net.                   IN      A

;; ANSWER SECTION:
pi-hole.net.            300     IN      A       3.18.136.52

;; Query time: 169 msec
;; SERVER: 127.0.0.1#5335(127.0.0.1) (UDP)
;; WHEN: Sun May 25 18:21:25 CEST 2025
;; MSG SIZE  rcvd: 56

And now to verify & falsify DNSSEC validation. This first request must return an A record for dnssec.works:

root@host:~# dig dnssec.works @127.0.0.1 -p 5335

; <<>> DiG 9.18.33-1~deb12u2-Raspbian <<>> dnssec.works @127.0.0.1 -p 5335
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14076
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;dnssec.works.                  IN      A

;; ANSWER SECTION:
dnssec.works.           3600    IN      A       46.23.92.212

;; Query time: 49 msec
;; SERVER: 127.0.0.1#5335(127.0.0.1) (UDP)
;; WHEN: Sun May 25 18:22:52 CEST 2025
;; MSG SIZE  rcvd: 57

This request, however, must not result in an A record: the DNSSEC signatures of fail01.dnssec.works are deliberately broken, so the resolver answers with SERVFAIL.

root@host:~# dig fail01.dnssec.works @127.0.0.1 -p 5335

; <<>> DiG 9.18.33-1~deb12u2-Raspbian <<>> fail01.dnssec.works @127.0.0.1 -p 5335
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1552
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;fail01.dnssec.works.           IN      A

;; Query time: 19 msec
;; SERVER: 127.0.0.1#5335(127.0.0.1) (UDP)
;; WHEN: Sun May 25 18:23:41 CEST 2025
;; MSG SIZE  rcvd: 48

Now all that is left is to connect our Pi-hole with Unbound. Log on to your Pi-hole web interface and navigate to Settings -> DNS. Expand the line Custom DNS servers and enter the IP and port of our Unbound server: 127.0.0.1#5335 for IPv4 and ::1#5335 for IPv6. If you don't use one of the two protocols, just don't add that line. After that hit "Save & Apply" and we are done.
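
To verify the whole chain, you can repeat the dig test against Pi-hole itself on the regular DNS port 53; the answer should now be resolved through Unbound:

root@host:~# dig pi-hole.net @127.0.0.1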

Creating a logrotate config for Unbound

Sadly, Unbound still doesn't ship a logrotate config with its package. Therefore I just copy & paste the one from my previous article Howto properly split all logfile content based on timestamps - and realizing my own fallacy.

root@host:~# cat /etc/logrotate.d/unbound
/var/log/unbound/unbound.log {
        monthly
        missingok
        rotate 12
        compress
        delaycompress
        notifempty
        sharedscripts
        create 644
        postrotate
                /usr/sbin/unbound-control log_reopen
        endscript
}
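
logrotate can verify the new config in debug mode, which only simulates the rotation without touching any files:

root@host:~# logrotate -d /etc/logrotate.d/unbound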

Troubleshooting

fail01.dnssec.works timed out

The host fail01.dnssec.works sometimes tends not to answer requests. Others have noticed this too. dig will then only show the following message:

root@host:~# dig fail01.dnssec.works @127.0.0.1 -p 5335
;; communications error to 127.0.0.1#5335: timed out
;; communications error to 127.0.0.1#5335: timed out
;; communications error to 127.0.0.1#5335: timed out

; <<>> DiG 9.18.33-1~deb12u2-Raspbian <<>> fail01.dnssec.works @127.0.0.1 -p 5335
;; global options: +cmd
;; no servers could be reached

If that is the case, just execute the command again; usually it will work the second time. Or just wait a few minutes. Sometimes the line ;; communications error to 127.0.0.1#5335: timed out will be printed, but the dig query will work after that nonetheless.
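
If you don't want to wait for dig's default retry behaviour, you can limit it to a single attempt with a longer timeout (the values here are just an example, tune to taste):

root@host:~# dig +tries=1 +timeout=10 fail01.dnssec.works @127.0.0.1 -p 5335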


Better SQL*Plus output

Oracle's sqlplus tool is horrible to work with, as it lacks many basic command-line features: no tab completion, no command-history scrolling with Arrow-Up/-Down or the like, and additionally the output formatting is nearly unreadable.

Hence I was delighted when I found out that the command set markup csv on switches to CSV output formatting.

Enable it the following way:

oracle@host:~$ sqlplus user/password @iXXXc1
SQL> set markup csv on
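
A quick illustration with a hypothetical table: after switching the markup, each query returns a header line followed by comma-separated values, with character data quoted and numbers unquoted.

SQL> select id, name from t_example;

"ID","NAME"
1,"foo"
2,"bar"

If needed, the delimiter and quoting can be tweaked as well, e.g. set markup csv on delimiter ; quote off.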

Working in IT is sometimes like this...

Task: Upgrade a custom application on a customer system

Technical details: This involves executing a script beforehand, then a Puppet run and some menial afterwork/cleanup tasks.

  1. Prepare everything. Shut down the system. Take a snapshot.
  2. Executing the script: Error! I can't find this dependency!
    • Oh, yeah. That package requiring the dependency isn't needed here. Commenting out that package from the list.
  3. Executing script again. It finishes properly. Fine.
  4. Running Puppet agent.
    • Error after 45 seconds: Left over process found. Aborting.
    • Checking the Release Notes: Ah, known bug. ps -ef | grep for the process and kill the PID is the workaround. OK.
    • ps'ing and kill'ing.
  5. Run puppet agent -t again.
  6. It runs for 30 minutes, then aborts. A partition doesn't have enough free space. Narf.
    • Searching for stuff to delete/compress/move.
    • Start Puppet again to verify it works. It does.
  7. Revert system to snapshot as this is policy.
  8. Shutdown. Snapshot. So I have a clean start with all known error sources eliminated.
  9. Puppet agent run.
  10. It runs for 2.5 hours. Then breaks with Execution expired and some Ruby stacktrace.
    • Checking Release Notes again. Nothing. Hmm ok.
    • Reading code to understand the problem. No real insight gained. It SHOULD work.
  11. Reverting to snapshot. Now taking a 2nd snapshot right before the Puppet run.
  12. 3rd Puppet agent run. Same error after 2.5 hours.
    • Oh come on..
  13. Reverting to snapshot.
  14. Manually installing the RPM whose installation triggers the stacktrace. It works. The package install takes 45 minutes.
  15. Reverting to snapshot.
  16. Just for the sake of trying, I start a 4th Puppet run, expecting no change.
  17. After 55 minutes of Puppet running I send a status mail to the project lead, some other involved people, the application developer, etc., saying that I couldn't update the system in time today and will need help and an additional 1-2 hours tomorrow.
  18. 5min after sending the mail: Hey, it's me! Puppet! I finished. No errors!
  19. I send out a follow-up mail, stating that the biggest part of the update is done. But due to the missed time window I will still need 1-2 hours tomorrow for the afterwork/clean up tasks.

And I still don't know where or what the error was! Due to the Execution expired message I suspect that somewhere deep in the Ruby code something timed out. Maybe something that isn't documented nor written to any logfile. Something that had a maintenance window at exactly the same hours as we did.

Hopefully the developer knows more.

Sometimes IT sucks.😂


You want my opinion? Then follow my blog, my Fediverse account. And NOT my LinkedIn profile!

Today I read a post on LinkedIn (in German) criticising that too few IT people speak about their opinions, their views, their stances on various topics. The author was sad that so much know-how and so many stories from real projects were left untold.

And I must admit that, at first, I didn't understand what he meant. I was all like:

"What is he talking about? I read a dozens or so post each day from (former) colleagues, people I met on various IT or CCC events or whom I just happen to follow which are highly political. Or taking a technical deep dive on some obscure technology (sometimes even from the 1970s) in which they have taken an interest. I don't get his point."

Only then did I realise: Did he refer solely to LinkedIn?

And that explained a lot. Regrettably I must admit: instantly I had several prejudices against that person.

This led to me writing the following two comments (which I translated into English):

    I have to disagree. The IT bubble, at least the one I'm in, is very political. But not on LinkedIn, because that's completely the wrong place for many people. I also had this experience myself when I was researching the status of various alternatives to Google apps for Android mobile phones and noticed how many Russian open source developers have suddenly gone completely silent in the last 3-4 years. Before, the GitHub activity bars were a bright, lush green. Regular posts in the XDA Developers forum, etc. and then, from one day to the next: Nothing. Silence. Completely.

    I wrote about it here on LinkedIn. Made a reference to the Ukraine war and Russian recruitment for their war of aggression. And yes, I also got carried away with a ‘4-letter-word Putin!’. (Note: I meant "Fuck Putin!" which I obviously couldn't write again on LinkedIn.)

    The result? Less than 10 minutes later, the post was set to invisible. I objected twice, but each time I was automatically rejected by the system.

    So sorry, but if you want to see the politically active IT scene: Get out. Get out of the silos of a Silicon Valley tech company. Into the private blogs, into the Fediverse.

    I also have a private blog. I'm generally not very reserved with my private opinions. And fortunately, I've received much more praise and recognition for this than criticism.

    But not everyone has this luxury. Many companies have very strict social media guidelines on how you are allowed to appear as an employee of the company. And there are court rulings saying that simply mentioning your employer in your profile means these rules apply and you have to comply with them, as you are no longer 100% private.

    And some people simply don't want to express their political opinions under their real name. Be it for protection or because they value privacy.

    In addition, many IT professionals have a very well-founded (and justified) aversion to so-called social networks. The content there is too watered down. Moderation is too automated. Appeals against content violations, TOS decisions, etc. are too undemocratic.

    Plus the fact that many networks like LinkedIn are perceived as nothing more than a business horseshow (which, unfortunately, they often are).

    Yes, LinkedIn is not a good platform if you are interested in IT. For that I recommend the various blogs, Fediverse accounts, some forums (if they are still alive), newsgroups (if they are still alive) and IRC channels (if they are... you get the gist).

    Why?

    1. IT people are technical, they dislike business mumbo-jumbo and attention farming. They want to write a text about something they care about and discuss it with like-minded individuals. LinkedIn is the completely wrong platform for that.
      • Also, technical content on LinkedIn gets far fewer views, interactions, etc. Believe me, I can compare the views here on my blog with those for my content on LinkedIn.
      • Well, yeah, because IT people usually don't like automated feeds that they can't configure. It's much more enjoyable to look in your RSS reader and see which one of the blogs you follow has posted something new.
      • Would you write huge technical texts on LinkedIn knowing that your audience is on a completely different platform? You wouldn't. That's why I blog here and just do a post on LinkedIn to redirect a few people from LinkedIn towards my blog.
    2. IT people are generally very prejudiced towards social networks operated by commercial companies. They are well aware of the downsides of Facebook/Meta, Twitter/X, Google Plus, Instagram, TikTok and yes - also LinkedIn. That these networks are about as far away from a democratic platform as you can get. That moderation is largely automated with no chance of "getting a human on the line". Why should we make ourselves dependent on that?
      • LinkedIn doesn't even provide a button to copy the link from a post into your clipboard! All actions are aimed at keeping people on LinkedIn. LinkedIn/Microsoft don't want you to drive people away from LinkedIn! And IT people seriously hate silos and gatekeeping.
    3. Many companies have social media guidelines that make it impossible to engage in meaningful conversations about sensitive topics.
    4. IT people value their privacy. If they have a LinkedIn account, it shows their real name and employer. And maybe they don't want a connection between the two.
    5. Some may see blogging as a hobby that they do in their spare time, so they stay away from LinkedIn.

    And there are maybe more points I forgot to mention..


The 11th commandment: Thou shalt not copy from the artificious intellect without understanding

It happened again. Someone asked an LLM a benign technical question. "How to check the speed of a hard drive?"

And the LLM answered!

Only the human wasn't clever enough to understand the answer. Nor was he particularly careful in formulating the question. Certain details were left out, critical information was not recognised as such. The human was too greedy to acquire this long-sought knowledge.

But... Is an answer to a thoughtlessly asked question a reliable & helpful one?

The human didn't care. He copied & pasted the command directly into the root shell of his machine.

The sacred machine spirit awakened to life and fulfilled its divine duty.

And... It actually gave the human the answer he was looking for. Only later did the human learn that the machine had given him even more. The machine spirit had answered the question: "How quickly can the hard drive overwrite all my cherished memories from 20 years ago that I never backed up?"

TL;DR: r/selfhosted: TIFU by copypasting code from AI. Lost 20 years of memories

People! Make backups! Never, ever work on your only, single, lifetime copy of your data while executing potentially harmful commands. Jesus! Why do so many people fail to grasp this? I don't get it...

And as for AI tools like ChatGPT, DeepSeek & others: Yes, they can be great & useful. But they don't understand. They have no sentience. They can't comprehend. They have no understanding of syntax or semantics. And therefore they can't check if the two match. ChatGPT won't notice that the answer it gives you doesn't match the question. Hell, there are enough people out there who won't notice. YOU have to think for the LLM! YOU have to give the full, complete context. Anything left out will not be considered in the answer.

In addition: Pick a topic you're well-versed in. Do you know how much just plain wrong stuff there is on the internet about that subject? Exactly.

Now ask yourself: Why should it be any different in a field you know nothing about?

All the AI companies have just pirated the whole internet. Copied & absorbed what they could in the vague hope of getting that sweet, sweet venture capital money. Every technically incorrect solution. Every "easy fix" that says "just do a chmod -R 777 * and it works".

If an LLM gives you an answer, you do the following:

  1. You ask the LLM to check if the answer is correct. Oh, yes. You will be surprised how often an LLM will correct itself.
    • And we haven't even touched the topic of hallucinations...
  2. Then you find the documentation/manpage for that command.
  3. You read said documentation/manpage(s) until you understand what the code/command does. How it works.
  4. Now you may execute that command.

Sounds like too much work? Yeah... About that...
