Simplifying Raspberry Pi Cluster Management using Ansible
Managing a cluster of Raspberry Pis can be either super easy or super tedious, depending on how you approach it. There are several tools you can choose from to make your life much easier: Ansible, Docker, Terraform, GitHub Actions, and on and on. But with so many options, it’s sometimes hard to choose what suits you best, and you also have to learn how to use it.
The first piece of advice I want to share before we get on with the post: if you’re getting lost among all the possible tools, focus on the problem you’re trying to solve rather than on the tools you’re trying to use. That’s the main idea, and one I’ve failed to follow many times myself, because I wanted to learn a specific skill by building something with it. My intuition now says you need to approach it the other way around. Okay, let’s move on to the fun part.
In this post, we’ll look at the problem of identifying a dysfunctional, unresponsive Raspberry Pi in a cluster of many. This might sound trivial, but it’s actually quite tricky. For that, we’ll use Ansible, since it not only provides an elegant solution but is also beginner friendly, and you can get going really fast with lots of resources and tutorials online.
So what’s even the problem?
About a year ago I acquired multiple Pis to build a Pi cluster, and I must say it was a piece of cake to get it up and running. After some time, for many different reasons, some Pis became unresponsive, such that I couldn’t SSH into them and gracefully reboot them. The most obvious solution was to unplug the power from the Pi and plug it back in. But wait, I have 8 Pis running, which one is down?
You just can’t tell from the IP addresses when you scan the network with nmap, or from the LEDs. Well, you can if you look long enough and figure out whose green LED hasn’t blinked for a while, but who the hell in this world has time to look at LEDs and guess which one is faulty? Definitely not me 😅
Actually, at first, I was only operating 3 Pis and I had no problem rebooting each one of them when one failed. But once I acquired 5 more and started to deploy long-running jobs on them, I couldn’t afford to reboot all of them, since I could corrupt the output of the scripts. So that was an issue...
I had to think of a solution, but since I had other things to do in my life than taking care of my Raspberry Pi cluster, it needed to be smooth, easy to use, and obviously quick to develop, because otherwise I might as well just buy UTP cables of different colors and set the Pi hostnames according to the colors.
Solution to the mystery
After some searching through the web and the Raspberry Pi documentation, I fairly quickly figured out I could turn ON and OFF the green LED that is beside the red Power LED on the Pis to figure out which Pis are “alive” and just reboot the rest. If you have a terminal console open on the Raspberry Pi it’s as simple as executing:
sudo sh -c "echo 1 > /sys/class/leds/led0/brightness"
This worked, but ughh, I had to SSH into each accessible Pi and repeat the command, and we DevOps engineers don’t like to repeat ourselves (and I’m lazy as well). Can we do it more elegantly? Here comes Ansible to the rescue!
A better, DevOps-like solution!
I don’t want to go into much detail about Ansible, since there are many great articles on the web, but basically one of its features is letting users run the same script on multiple machines at once using a single command. If you need an introductory tutorial on Ansible, take a look at the amazing blog post Working with Ansible Playbooks — Tips & Tricks with Examples, written by Ioannis from Spacelift. I really like the structure and the approach of the post, since it is easy to understand even if you’re a beginner.
Now that you’ve probably got a feel for Ansible, here’s a short bash script I came up with that basically just makes the green LED blink 5 times:
#!/bin/bash
# Writing to /sys/class/leds requires root privileges.
if [[ $EUID -ne 0 ]]; then
    echo "This script requires root, run it again with sudo"
    exit 1
fi
# Disable the default trigger so the LED can be controlled manually.
echo none > /sys/class/leds/led0/trigger
# Blink the green LED 5 times: 1 second on, 1 second off.
for i in 1 2 3 4 5
do
    echo 1 > /sys/class/leds/led0/brightness
    sleep 1
    echo 0 > /sys/class/leds/led0/brightness
    sleep 1
done
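One side effect worth noting: once the trigger is set to none, the green LED no longer shows its usual activity. The following is a minimal sketch, not the version from the repository, of how the blinking could be wrapped so that the original trigger is put back afterwards. The function name blink_led and its parameters (LED directory, blink count, delay in seconds) are my own illustrative choices. It relies on the kernel marking the active trigger in square brackets when you read the trigger file. Also, on some newer Raspberry Pi OS releases the activity LED may be exposed under a different name than led0 (for example ACT), so it is worth checking with ls /sys/class/leds first.

```shell
#!/bin/bash
# Sketch: blink an LED and restore its original trigger afterwards.
# blink_led and its arguments are illustrative names, not part of the original script.

blink_led() {
    local led_dir="$1" count="${2:-5}" delay="${3:-1}"
    local original_trigger i

    # The trigger file lists all triggers; the active one is shown in [brackets].
    original_trigger="$(sed -n 's/.*\[\(.*\)\].*/\1/p' "$led_dir/trigger")"

    # Take manual control of the LED.
    echo none > "$led_dir/trigger"

    for ((i = 0; i < count; i++)); do
        echo 1 > "$led_dir/brightness"
        sleep "$delay"
        echo 0 > "$led_dir/brightness"
        sleep "$delay"
    done

    # Hand the LED back to whatever was driving it before, if we found one.
    if [[ -n "$original_trigger" ]]; then
        echo "$original_trigger" > "$led_dir/trigger"
    fi
}
```

On a Pi this would be called as root with something like blink_led /sys/class/leds/led0 5 1. Passing the LED directory as an argument also makes it easy to try the function against a scratch directory before touching real hardware.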
In general, I think you should be good with any language Ansible supports. The script, saved as blink.sh, is then integrated into the Ansible setup as follows:
- name: Aliveness Check
  hosts: cluster
  become: yes
  tasks:
    - name: Blink if alive
      script: blink.sh
We can run the playbook from a PC that has access to the cluster using the command:
ansible-playbook playbook.yml -i inventory.yml
Here, playbook.yml (the file shown just above) defines the task to be run on the nodes, while inventory.yml bundles the nodes in a structured way. I haven’t shown inventory.yml since it depends on how your cluster is set up on your local or public network, but you can find it in the repository to get a feel for how I grouped my cluster. And that’s basically it! Nothing else, just a couple of lines of code and we’ve solved the problem. Sweeeet!
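For readers who want something to start from, here is a rough sketch of what a minimal inventory could look like, written out by a small shell helper. Everything specific in it is an assumption for illustration only: the helper name write_inventory, the hostnames pi1 to pi3, the 192.168.1.x addresses, and the pi user. The only thing taken from the post is the group name cluster, which has to match hosts: cluster in the playbook. The real inventory in the repository may well be organized differently.

```shell
#!/bin/bash
# Sketch: write a hypothetical inventory.yml for a three-node cluster.
# Hostnames, IP addresses and the SSH user are placeholders, not the author's setup.

write_inventory() {
    local path="${1:-inventory.yml}"
    cat > "$path" <<'EOF'
all:
  children:
    cluster:          # must match "hosts: cluster" in playbook.yml
      hosts:
        pi1:
          ansible_host: 192.168.1.101
        pi2:
          ansible_host: 192.168.1.102
        pi3:
          ansible_host: 192.168.1.103
      vars:
        ansible_user: pi
EOF
}
```

After calling write_inventory inventory.yml and adjusting the placeholders, a quick way to see which nodes Ansible can reach at all is the built-in ping module:

ansible cluster -i inventory.yml -m ping

Nodes that do not answer are reported as unreachable, and the blinking playbook then helps you find the physical boards that are still alive. To blink a single node, the playbook run can be narrowed with the --limit option, for example ansible-playbook playbook.yml -i inventory.yml --limit pi2.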
The complete source code can be found under the AlivenessCheck directory, among other Ansible configurations you might like, all available on my GitHub repository:
Conclusion
Last but not least, some people who thought about this issue before building their cluster acquired UTP cables of different colors and matched them to the hostnames, or did something similar, to be able to tell the Pis apart (and maybe you should too). But if you’re in the same situation as I am, feel free to use the scripts.
Check out the rest of my content on Teodor J. Podobnik, @dorkamotorka and follow me for more, cheers!