Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1965979

Summary: After vdsm stop hooks are not executed due to soft fencing
Product: [oVirt] vdsm
Reporter: Petr Matyáš <pmatyas>
Component: Core
Assignee: Marcin Sobczyk <msobczyk>
Status: CLOSED DEFERRED
QA Contact: Guilherme Santos <gdeolive>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.40.60.7
CC: bugs, lsvaty, mperina
Target Milestone: ---
Keywords: Automation, Regression
Target Release: ---
Flags: lsvaty: blocker-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-02-01 11:05:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Petr Matyáš 2021-05-31 10:00:39 UTC
Description of problem:
When there are trivial hooks in the after_vdsm_stop folder, stopping the vdsm service does not execute them, even though they are executable, because the host is automatically activated again right afterwards by soft fencing.
If I put the host into maintenance first, the hooks are executed.

Version-Release number of selected component (if applicable):
vdsm-4.40.60.7-1.el8ev.x86_64

How reproducible:
always

Steps to Reproduce:
1. echo 'touch /tmp/after' >/usr/libexec/vdsm/hooks/after_vdsm_stop/touching
2. chmod +x /usr/libexec/vdsm/hooks/after_vdsm_stop/touching
3. systemctl stop vdsmd
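The same reproduction can be wrapped in a small shell function so the outcome is checked explicitly instead of by inspecting /tmp by hand. This is a minimal sketch, not part of the original report: the function name reproduce_after_stop_hook and its three optional arguments (hook directory, marker file, stop command) are hypothetical, and the defaults are the paths and command from the steps above. Unlike the one-liner in step 1, the generated hook gets a #!/bin/sh line so it is a valid executable for any caller. It has to run as root on a host with vdsm installed.

```shell
#!/bin/sh
# Hypothetical helper: reproduces steps 1-3 and reports whether the
# after_vdsm_stop hook ran. Arguments are optional overrides (assumptions
# for this sketch); defaults mirror the bug report.
reproduce_after_stop_hook() {
    hook_dir="${1:-/usr/libexec/vdsm/hooks/after_vdsm_stop}"
    marker="${2:-/tmp/after}"
    stop_cmd="${3:-systemctl stop vdsmd}"

    # Remove any marker left by an earlier run so it cannot cause a
    # false "hook ran" result.
    rm -f "$marker"

    # Steps 1-2: create the hook (with a shebang) and make it executable.
    printf '#!/bin/sh\ntouch %s\n' "$marker" > "$hook_dir/touching"
    chmod +x "$hook_dir/touching"

    # Step 3: stop the service. $stop_cmd is deliberately left unquoted
    # so "systemctl stop vdsmd" is split into a command and its arguments.
    $stop_cmd

    # The hook ran only if it created the marker file.
    if [ -e "$marker" ]; then
        echo "after_vdsm_stop hook ran"
    else
        echo "after_vdsm_stop hook did NOT run"
        return 1
    fi
}
```

Usage, as root, with the function saved in a file such as repro.sh (a hypothetical name): `. ./repro.sh && reproduce_after_stop_hook`. A non-zero return status corresponds to the "Actual results" reported below.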

Actual results:
If the host is Up in RHV, the hooks are not executed.

Expected results:
after_vdsm_stop hooks should always be executed.

Additional info:

Comment 2 Petr Matyáš 2021-05-31 10:13:37 UTC
Scripts in before_vdsm_start are executed even with this soft fencing, so scripts in after_vdsm_stop should be executed in the same way.

Comment 3 Martin Perina 2021-06-01 06:35:29 UTC
(In reply to Petr Matyáš from comment #0)
> Description of problem:
> Having some trivial hooks in after_vdsm_stop folder and stopping the vdsm
> service is not executing the hooks even though they are executable due to
> host being automatically activated right after with soft fencing.
> If I put to host to maintenance the hooks are executed.
> 
> Version-Release number of selected component (if applicable):
> vdsm-4.40.60.7-1.el8ev.x86_64
> 
> How reproducible:
> always
> 
> Steps to Reproduce:
> 1. echo 'touch /tmp/after' >/usr/libexec/vdsm/hooks/after_vdsm_stop/touching
> 2. chmod +x /usr/libexec/vdsm/hooks/after_vdsm_stop/touching
> 3. systemctl stop vdsmd

Hmm, how could those hooks be loaded into VDSM when VDSM is stopped right after they are created? Shouldn't the correct flow contain an additional VDSM restart step?

1. echo 'touch /tmp/after' >/usr/libexec/vdsm/hooks/after_vdsm_stop/touching
2. chmod +x /usr/libexec/vdsm/hooks/after_vdsm_stop/touching
3. systemctl restart vdsmd
4. systemctl stop vdsmd

Comment 4 Petr Matyáš 2021-06-01 06:56:41 UTC
If you had opened the log, you would have seen that the hook was loaded on multiple occasions. I stopped vdsmd multiple times (without changing the script), which caused a restart in most cases, with different settings in RHV for the host.

Comment 7 Marcin Sobczyk 2021-06-17 09:57:05 UTC
This is effectively happening because of [1]. There was a bug in systemd [2] that could cause real trouble [3],
and an unfortunate side effect of the workaround in [1] is that 'after_vdsm_stop' hooks now work less reliably. Since the systemd bug
has been fixed, I'm discussing with the vdsm maintainers the possibility of reverting [1]. In case we decide to do so, I have posted
a patch for OST (oVirt System Tests) that checks whether 'after_vdsm_stop' hooks work reliably [4].

[1] https://github.com/oVirt/vdsm/commit/f13aa4fe12602777938bf5d36b977ad19053f745
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1761260
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1759388
[4] https://gerrit.ovirt.org/#/c/ovirt-system-tests/+/115299/

Comment 8 Martin Perina 2022-02-01 11:05:45 UTC
Implementing this requires very complicated changes and tests; due to resource limitations, closing as deferred.