Bug 1222564
| Field | Value |
|---|---|
| Summary | regression for EL7: spmprotect always reboot when fencing vdsm on systemd |
| Product | Red Hat Enterprise Virtualization Manager |
| Reporter | Yaniv Bronhaim <ybronhei> |
| Component | vdsm |
| Assignee | Nir Soffer <nsoffer> |
| Status | CLOSED ERRATA |
| QA Contact | Kevin Alon Goldblatt <kgoldbla> |
| Severity | high |
| Docs Contact | |
| Priority | high |
| Version | 3.5.0 |
| CC | amureini, bazulay, gklein, lpeer, lsurette, nsoffer, tnisan, yeylon, ykaul, ylavi |
| Target Milestone | ovirt-3.6.0-rc |
| Keywords | Regression, ZStream |
| Target Release | 3.6.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | Bug Fix |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| | 1265177 (view as bug list) |
| Environment | |
| Last Closed | 2016-03-09 19:40:05 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | Storage |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1265177 |
Description
Yaniv Bronhaim
2015-05-18 14:22:45 UTC
This is indeed a regression, but it is not a 3.5 regression. This has been broken since we added support for systemd on Fedora and EL 7.

This affects only storage domain v1, which is still supported for backward compatibility.

(In reply to Nir Soffer from comment #4)
> This affects only storage domain v1, which is still supported for backward
> compatibility.

Yaniv - given this analysis, I'm fine with pushing it off to 3.5.5. Having a 3.0 DC with a RHEV >= 3.5 host isn't really interesting, IMHO.

(In reply to Allon Mureinik from comment #5)
> Yaniv - given this analysis, I'm fine with pushing it off to 3.5.5.
> Having a 3.0 DC with a RHEV >= 3.5 host isn't really interesting, IMHO.

Done. Happy to hear soft fencing works on 3.5.

Patch is ready - Nir, please help to verify, or state how it can be verified properly...

This is a storage bug that needs to be tested with storage functional flows. Dima might be able to help, as he did with the patch after Nir asked for assistance, but verifying it should be handled by storage. I would use the testing environment that found it in the first place; we certainly can't ack it, as we are not familiar with the full usage of the spmprotect script.

(In reply to Yaniv Bronhaim from comment #8)
> This is a storage bug that needs to be tested with storage functional flows.

Yaniv, I'm verifying the attached patch, no action needed on your side.

Because spmprotect fails to get the vdsm pid, it fails to do a clean shutdown of the SPM, and then fails to terminate and kill vdsm. Finally, it reboots the machine. This affects only legacy DCs using storage domain format 3.0.

How to verify this:

1. create data center v3.0
2. create cluster v3.4
3. add host
4. create storage domain v1
5. wait until spm is up
6. check maintenance/activation now (no regression)
7. block access to storage
8. watch vdsm being killed without reboot (previously the host would reboot after this)
9. unblock access to storage
10. watch vdsm become spm again (no regression)

Repeat for NFS and iSCSI (cannot mix domains in a DC in this version).

(In reply to Nir Soffer from comment #11)

Please clarify:
------------------------------------
> 5. wait until spm is up
(Wouldn't the SPM be up before you could create a SD in step 4? Should this step be before step 3?)

> 6. check maintenance/activation now (no regression)
(Check by setting the host into maintenance and then reactivating?)

> 8. watch vdsm being killed without reboot (previously the host would reboot
> after this)
(How long do we need to wait till this happens?)
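Comment 11 above attributes the reboot to spmprotect failing to get the vdsm pid under systemd: without a pid, the clean SPM shutdown and the terminate/kill steps cannot run, so the script falls through to rebooting the host. Below is a minimal sketch of a systemd-based pid lookup and escalation, assuming the unit is named vdsmd.service; it is an illustration only, not the code from the attached patch.

```bash
#!/bin/bash
# Illustration: read the vdsm PID from systemd rather than relying on the
# older lookup that no longer applies on EL7 (per comment 11).
pid=$(systemctl show -p MainPID vdsmd.service | cut -d= -f2)

if [ -z "$pid" ] || [ "$pid" = "0" ]; then
    # This is the failure mode in the bug: with no PID, the terminate/kill
    # steps are skipped and the only remaining protection is a reboot.
    echo "spmprotect: cannot find vdsm pid" >&2
    exit 1
fi

# With a valid PID the script can escalate gradually instead of rebooting:
kill -TERM "$pid" 2>/dev/null                      # ask vdsm to exit
sleep 10
kill -0 "$pid" 2>/dev/null && kill -KILL "$pid"    # force kill if still alive
```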
(In reply to Kevin Alon Goldblatt from comment #14)
> > 5. wait until spm is up
> (Wouldn't the SPM be up before you could create a SD in step 4? Should this
> step be before step 3?)

No. Until you have storage, the host is up, but it is not SPM.

> > 8. watch vdsm being killed without reboot (previously the host would reboot
> > after this)
> (How long do we need to wait till this happens?)

Vdsm will be killed about 20 seconds after safelease fails to renew the lease. See the patch commit message; it describes the flow precisely.

Verified using the following code:
-----------------------------------
vdsm-4.17.10.1-0.el7ev.noarch
rhevm-3.6.0.3-0.1.el6.noarch

Verified using the following scenario:
---------------------------------------
Steps to reproduce:
1. Create a DC with V3.0
2. Create a cluster with V3.4
3. Add a host
4. Create a SD (iscsi/nfs)
5. Wait till the host becomes SPM
6. Verify that the host can be put into maintenance and then activated successfully
7. Block access to the storage using iptables
8. Verify that VDSM is killed and that the host is not rebooted
9. Unblock access to the storage
10. Verify that the host becomes SPM again

Moving to VERIFIED!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html
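The verification scenario blocks access to the storage with iptables (step 7) and then checks that vdsm is killed without the host rebooting (step 8). A rough sketch of those steps as run on the SPM host follows; STORAGE_IP and the wait time are placeholders, and the exact rules QE used are not recorded in this report.

```bash
#!/bin/bash
# Run as root on the SPM host. STORAGE_IP stands in for the NFS or iSCSI
# server address backing the data domain.
STORAGE_IP=192.168.1.10

# Step 7: block traffic to and from the storage server.
iptables -A OUTPUT -d "$STORAGE_IP" -j DROP
iptables -A INPUT  -s "$STORAGE_IP" -j DROP

# Step 8: per the reply above, vdsm should be killed roughly 20 seconds after
# safelease fails to renew the lease, while the host stays up.
sleep 60
systemctl status vdsmd.service --no-pager
uptime    # confirms the host did not reboot

# Steps 9-10: unblock the storage and let the host become SPM again.
iptables -D OUTPUT -d "$STORAGE_IP" -j DROP
iptables -D INPUT  -s "$STORAGE_IP" -j DROP
```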