Description of problem:
Trivial hooks placed in the after_vdsm_stop folder are not executed when the vdsm service is stopped, even though they are executable, because soft fencing automatically activates the host again right afterwards. If I put the host into maintenance first, the hooks are executed.

Version-Release number of selected component (if applicable):
vdsm-4.40.60.7-1.el8ev.x86_64

How reproducible:
always

Steps to Reproduce:
1. echo 'touch /tmp/after' >/usr/libexec/vdsm/hooks/after_vdsm_stop/touching
2. chmod +x /usr/libexec/vdsm/hooks/after_vdsm_stop/touching
3. systemctl stop vdsmd

Actual results:
If the host is up in RHV, the hooks are not executed.

Expected results:
after_vdsm_stop hooks should always be executed.

Additional info:
Scripts in before_vdsm_start are executed even with this soft fencing, so scripts in after_vdsm_stop should be executed in the same way.
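To rule out a problem with the hook script itself, it can be installed and run on its own, without involving vdsmd. The following is only a minimal sketch under assumptions: HOOK_DIR and MARKER are hypothetical variables introduced here (they default to a scratch directory so the script can be tried anywhere), and the hook is written with an explicit shebang, which the one-liner in the steps above does not have.

```shell
#!/bin/sh
# Sketch: install the trivial hook from the report and run it directly.
# HOOK_DIR and MARKER are illustrative names, not part of vdsm. On a real
# host, HOOK_DIR would be /usr/libexec/vdsm/hooks/after_vdsm_stop and
# MARKER would be /tmp/after.
WORK="$(mktemp -d)"
HOOK_DIR="${HOOK_DIR:-$WORK/after_vdsm_stop}"
MARKER="${MARKER:-$WORK/after}"
mkdir -p "$HOOK_DIR"

# Write the hook with an explicit interpreter line and make it executable.
printf '#!/bin/sh\ntouch "%s"\n' "$MARKER" > "$HOOK_DIR/touching"
chmod +x "$HOOK_DIR/touching"

# Remove any stale marker, then run the hook by hand.
rm -f "$MARKER"
"$HOOK_DIR/touching"

# Report whether the hook left its marker behind.
if [ -e "$MARKER" ]; then
    echo "hook ran: marker present"
else
    echo "hook did not create the marker"
fi
```

If the marker appears here but not after `systemctl stop vdsmd` on a host that is up in RHV, the script itself is fine and the problem lies in how the stop is handled.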
(In reply to Petr Matyáš from comment #0)
> Description of problem:
> Having some trivial hooks in after_vdsm_stop folder and stopping the vdsm
> service is not executing the hooks even though they are executable due to
> host being automatically activated right after with soft fencing.
> If I put to host to maintenance the hooks are executed.
>
> Version-Release number of selected component (if applicable):
> vdsm-4.40.60.7-1.el8ev.x86_64
>
> How reproducible:
> always
>
> Steps to Reproduce:
> 1. echo 'touch /tmp/after' >/usr/libexec/vdsm/hooks/after_vdsm_stop/touching
> 2. chmod +x /usr/libexec/vdsm/hooks/after_vdsm_stop/touching
> 3. systemctl stop vdsmd

Hmm, how could those hooks be loaded into VDSM when VDSM is stopped right after their creation? Shouldn't the correct flow contain an additional VDSM restart step?

1. echo 'touch /tmp/after' >/usr/libexec/vdsm/hooks/after_vdsm_stop/touching
2. chmod +x /usr/libexec/vdsm/hooks/after_vdsm_stop/touching
3. systemctl restart vdsmd
4. systemctl stop vdsmd
If you cared to open the log, you would see that the hook was loaded on multiple occasions. I tried stopping vdsmd multiple times without changing the script (which in most cases resulted in a restart), with different settings for the host in RHV.
This is effectively happening because of [1]. There was a bug in systemd [2] that could cause real trouble [3], and an unfortunate side effect of the fix was that 'after_vdsm_stop' hooks now work less reliably. Since the bug in systemd has been fixed, I'm discussing with the vdsm maintainers the possibility of reverting [1]. In case we decide to do so, I have posted a patch for OST that checks whether 'after_vdsm_stop' hooks work reliably [4].

[1] https://github.com/oVirt/vdsm/commit/f13aa4fe12602777938bf5d36b977ad19053f745
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1761260
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1759388
[4] https://gerrit.ovirt.org/#/c/ovirt-system-tests/+/115299/
Implementing this requires very complicated changes and tests. Due to resource limitations, closing as deferred.