Bug 1522916
| Summary: | [RFE] Use systemd to monitor vdsm | ||
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Piotr Kliczewski <pkliczew> |
| Component: | Core | Assignee: | Marcin Sobczyk <msobczyk> |
| Status: | CLOSED DEFERRED | QA Contact: | Lukas Svaty <lsvaty> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.20.15 | CC: | bugs, gveitmic, mgoldboi, mperina, msobczyk, nsoffer, pkliczew |
| Target Milestone: | --- | Keywords: | FutureFeature |
| Target Release: | --- | Flags: | mperina: ovirt-4.5? rule-engine: planning_ack? rule-engine: devel_ack? rule-engine: testing_ack? |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-04-01 14:48:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Piotr Kliczewski
2017-12-06 17:50:46 UTC

Adding more info from the discussion on the vdsm call.

Systemd notify provides two important mechanisms that we would like to use:

- Startup completion detection: vdsmd will notify systemd when it has started and is ready to accept requests. This will help other services (e.g. mom, hosted engine agent) communicate with vdsmd without having to handle "connection refused" errors.
- Watching for vdsmd hangs: vdsmd will notify the systemd watchdog periodically. If vdsmd stops notifying the watchdog because of a deadlock, a complete process hang, or some other critical error, systemd will restart vdsmd. If vdsmd is blocked in D state and cannot be restarted, we will have logs about it in the journal.

General solution:

1. Use Type=notify in vdsmd.service (READY=1).
2. Notify systemd via the systemd.notify python module after vdsmd has started to listen on the vdsmd port.
3. Specify WatchdogSec in vdsmd.service.
4. Add a health thread that checks the vdsm subsystems periodically. If all subsystems are healthy, notify the systemd watchdog using the systemd.notify python module (WATCHDOG=1). If one of the subsystems is considered unhealthy, do not notify systemd, which triggers a vdsm restart.

Related docs:

- https://www.freedesktop.org/software/systemd/man/systemd.service.html#Type=
- https://www.freedesktop.org/software/systemd/man/systemd.service.html#WatchdogSec=
- http://man7.org/linux/man-pages/man3/sd_notify.3.html

Please also consider watching supervdsmd as suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1666123#c23, and maybe other host daemons like ovirt-ha.

We didn't get to this bug for more than 2 years, and it's not being considered for the upcoming 4.4. It's unlikely that it will ever be addressed, so I'm suggesting to close it. If you feel this needs to be addressed and want to work on it, please remove cond nack and target accordingly.

Closing old bug. Please reopen if still relevant/you want to work on it.
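
For anyone picking this up later, the general solution from the description (Type=notify, READY=1, WatchdogSec, a health thread sending WATCHDOG=1) could look roughly like the sketch below. It is not vdsm code. It speaks the sd_notify datagram protocol directly over the NOTIFY_SOCKET Unix socket using only the Python standard library, rather than the systemd python module mentioned in the description. The names sd_notify, HealthMonitor, checks and interval are hypothetical, and the unit file values in the comment are illustrative assumptions, not settings proposed in this bug.

```python
import os
import socket
import threading

# Assumed vdsmd.service fragment (values are illustrative only):
#   [Service]
#   Type=notify
#   WatchdogSec=60
#   Restart=on-failure


def sd_notify(state):
    """Send a state string such as "READY=1" to systemd.

    Returns False when NOTIFY_SOCKET is unset, i.e. the process was not
    started by systemd with notification enabled.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract namespace socket,
        # which is addressed with a leading NUL byte.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), path)
    return True


class HealthMonitor:
    """Hypothetical health thread: pings the watchdog while all checks pass."""

    def __init__(self, checks, interval):
        self._checks = checks      # callables returning True when healthy
        self._interval = interval  # seconds; keep well below WatchdogSec
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def healthy(self):
        return all(check() for check in self._checks)

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            if self.healthy():
                sd_notify("WATCHDOG=1")
            # Otherwise stay silent; systemd acts once WatchdogSec elapses.

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

In a service, the call order would be: start listening on the service port, call sd_notify("READY=1"), then start HealthMonitor with one check per subsystem. An interval of roughly a third to a half of WatchdogSec leaves room for a delayed ping before systemd treats the service as hung.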