Bug 1415740
| Summary: | Soft fencing causes a "host is rebooting" message that might be misleading | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Michael Burman <mburman> |
| Component: | BLL.Infra | Assignee: | Miroslava Voglova <mvoglova> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Petr Matyáš <pmatyas> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.1.0.2 | CC: | bugs, mburman, mperina, oourfali, pmatyas, ybronhei |
| Target Milestone: | ovirt-4.2.0 | Flags: | rule-engine: ovirt-4.2+ |
| Target Release: | 4.2.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-12-20 10:45:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | messages log, engine log | | |
Watching the engine log you can see that the engine triggers soft fencing when you stop the vdsm service and a network exception is raised. Soft fencing SSHes to the host and restarts vdsmd; if that works, the host goes back to Up, and if not, an actual fence is done. In any case, it has behaved this way since 4.0 and it is not related to mom. The SSH connection from mburman-4-upgrade-env.scl.lab.tlv.redhat.com is what brings vdsm up.

Mom has nothing to do with it according to this log (it is only started after the vdsm service).

OK, but this behavior still seems wrong. It is new, it didn't happen on 4.0, and it happens only on some hosts, not all of them. What if I need to stop vdsmd on the host?

You probably need to update the bug component as well, but this is the soft fencing logic. You need to disable fencing in the cluster edit tab; I'm quite sure that will do the trick and nothing will disturb your vdsmd downtime.

Michael, can you attach the engine.log? I want to check where the "rebooting" comes from. If you don't put the host into Maintenance and stop VDSM, the host becomes NonResponsive and the engine will try to fence it (first by SSH Soft Fencing and then by Power Management). If your host was in Maintenance before you stopped VDSM manually, then please provide engine logs.

Created attachment 1243847 [details]
engine log

This seems to be a general message returned when the non-responsive treatment succeeds; the soft fencing is part of that treatment. Martin, thoughts? I think we can rephrase, or add a specific audit log message to distinguish the cases. Also, what happens in case it is skipped? I've updated the title and set the severity to low.

Right, we don't distinguish between SSH Soft Fencing success and Power Management restart success when returning the result of VdsNotRespondingTreatment (the code works fine, but the events displayed to the user may be a bit confusing). For now, moving to 4.2; when a patch is ready we can retarget.

Verified on ovirt-engine-4.2.0-0.0.master.20171113223918.git25568c3.el7.centos.noarch

This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in the oVirt 4.2.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
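The rephrasing discussed in the comments above could amount to emitting a different event-log message depending on which step of the non-responsive treatment succeeded. A minimal sketch of that idea; `treatment_success_message` and the message texts are hypothetical, not actual ovirt-engine code:

```python
# Hypothetical sketch of the proposed fix: distinguish SSH Soft Fencing
# success from a Power Management restart when reporting the result of the
# non-responsive treatment. All names here are illustrative.

def treatment_success_message(host: str, soft_fencing_succeeded: bool) -> str:
    """Event-log line for a host recovered by the non-responsive treatment."""
    if soft_fencing_succeeded:
        # SSH Soft Fencing only restarted vdsmd over SSH; the host itself
        # never rebooted, so "is rebooting" here is the misleading part.
        return f"Host {host} was recovered by SSH Soft Fencing (vdsmd restarted)."
    # Power Management really did power-cycle the host.
    return f"Host {host} is rebooting."

print(treatment_success_message("10.35.129.22", True))
```

With a distinction like this, the scenario in this bug (vdsmd stopped, SSH Soft Fencing restarts it) would no longer show the "rebooting" event.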
Created attachment 1243632 [details]
messages log

Description of problem:
mom-vdsm appears to bring vdsmd up on its own. It seems that mom-vdsm is bringing vdsmd up even though I stopped vdsmd with systemctl.

```
Jan 23 16:28:44 orchid-vds1.qa.lab.tlv.redhat.com systemd[1]: Stopping MOM instance configured for VDSM purposes...
...
Jan 23 16:28:54 orchid-vds1.qa.lab.tlv.redhat.com systemd[1]: Stopped MOM instance configured for VDSM purposes.
Jan 23 16:28:54 orchid-vds1.qa.lab.tlv.redhat.com systemd[1]: Unit mom-vdsm.service entered failed state.
Jan 23 16:28:54 orchid-vds1.qa.lab.tlv.redhat.com systemd[1]: mom-vdsm.service failed.
Jan 23 16:28:54 orchid-vds1.qa.lab.tlv.redhat.com systemd[1]: Stopping Virtual Desktop Server Manager...
Jan 23 16:28:54 orchid-vds1.qa.lab.tlv.redhat.com systemd[1]: Stopped Virtual Desktop Server Manager.
Jan 23 16:30:03 orchid-vds1.qa.lab.tlv.redhat.com sshd[15886]: Accepted publickey for root from 10.35.163.149 port 56318 ssh2: RSA 2c:d8:62:
Jan 23 16:30:04 orchid-vds1.qa.lab.tlv.redhat.com systemd-logind[694]: New session 21 of user root.
Jan 23 16:30:04 orchid-vds1.qa.lab.tlv.redhat.com systemd[1]: Started Session 21 of user root.
Jan 23 16:30:04 orchid-vds1.qa.lab.tlv.redhat.com systemd[1]: Starting Session 21 of user root.
Jan 23 16:30:04 orchid-vds1.qa.lab.tlv.redhat.com sshd[15886]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jan 23 16:30:04 orchid-vds1.qa.lab.tlv.redhat.com systemd[1]: Starting Virtual Desktop Server Manager
```

(The "mom-vdsm.service failed" entry is expected; we should probably increase the timeout.) The connecting host, 10.35.163.149, resolves to mburman-4-upgrade-env.scl.lab.tlv.redhat.com.

Version-Release number of selected component (if applicable):
mom-0.5.8-1.el7ev.noarch
vdsm-4.19.2-2.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Stop vdsmd on the host using systemctl.

Actual results:
After about a minute the host comes up and vdsmd is running. The engine reports in the UI event log:

Host 10.35.129.22 is non responsive.
Status of host 10.35.129.22 was set to Up.
Host 10.35.129.22 is rebooting. (why rebooting?)

Expected results:
If vdsmd is stopped with systemctl, it should remain down.
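As the comments in this bug explain, the observed behavior follows from the engine's fencing policy rather than from mom: a host that is not in Maintenance and stops responding is fenced, first by SSH Soft Fencing and then by Power Management. A minimal sketch of that decision, with assumed names (`engine_will_fence` is illustrative, not ovirt-engine code):

```python
# Illustrative sketch of why stopping vdsmd manually triggers fencing.
# Per the comments above, the engine leaves a host alone only if it is in
# Maintenance or if fencing is disabled in the cluster edit tab.

def engine_will_fence(host_status: str, cluster_fencing_enabled: bool) -> bool:
    """Rough engine-side decision when vdsmd stops answering."""
    if host_status == "Maintenance":
        return False  # planned downtime: vdsmd may stay stopped
    if not cluster_fencing_enabled:
        return False  # fencing disabled for the cluster
    # An "Up" host that stops responding becomes NonResponsive and is
    # fenced: first SSH Soft Fencing, then Power Management.
    return True

print(engine_will_fence("Up", True))
```

So, to keep vdsmd stopped intentionally, put the host into Maintenance first or disable fencing for the cluster, as suggested in the comments.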