For best results I use an out-of-band network device to cut power to devices and reboot them when they fail the watchdog criteria.
Normally that means they have stopped pinging, or a service still isn't responding after a Nagios plugin has attempted to restart it.
I would have a look at webpowerswitch.com
I use this with PCS and a GFS2 cluster for enforcing and recovering fencing. Works well.
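
Here is a rough sketch in Python of the decision logic I described above, in case it helps. It is only an illustration, not what I actually run: the names (NodeState, ping_ok, should_power_cycle), the threshold of 3 consecutive failures, and the ping flags (Linux iputils) are all assumptions on my part. The actual outlet switching is left out on purpose, since it depends on whatever interface your power switch exposes.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class NodeState:
    """Tracks consecutive failed checks for one node (hypothetical helper)."""
    name: str
    failures: int = 0


def ping_ok(host: str, timeout_s: int = 2) -> bool:
    """Return True if a single ICMP echo succeeds (assumes Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_power_cycle(
    state: NodeState,
    pinging: bool,
    service_up: bool,
    restart_tried: bool,
    max_failures: int = 3,  # assumed threshold, tune for your cluster
) -> bool:
    """Decide whether the node should be power-cycled out of band."""
    if pinging and service_up:
        # Healthy check: clear the failure counter.
        state.failures = 0
        return False
    if pinging and not restart_tried:
        # Node is reachable but the service is down: let the monitoring
        # system attempt a service restart before counting a failure.
        return False
    # Either the node stopped pinging, or the restart attempt did not help.
    state.failures += 1
    return state.failures >= max_failures
```

You would call should_power_cycle() once per monitoring interval for each node, and only trigger the power switch when it returns True. Requiring several consecutive failures avoids rebooting a node over a single dropped ping.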
On Sat, Jan 7, 2023 at 3:12 PM Pierre-Francois Renard <pfrenard@gmail.com> wrote:
Hello guys,
I am running 6 RPi 4s with Fedora 37. K3s is powering this cluster and it
is working well :)
But from time to time, 1 RPI is randomly hanging.
I am thinking about implementing a watchdog:
- software based, using the watchdog support built into the Linux kernel
- hardware based, such as https://www.omzlo.com/articles/the-piwatcher
Do you have any experience with either of these two solutions? Do you have
alternatives?
By the way, your work is fantastic and it is a great pleasure to be able
to run F37 on aarch64 so easily!
Thanks a lot
_______________________________________________
arm mailing list -- arm@lists.fedoraproject.org
To unsubscribe send an email to arm-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/arm@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue