Bug 842635 - [RHEV] When there is no connection between the host and the storage domain, the host reboots.
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: sanlock
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: David Teigland
Blocks: 800588 837024
Reported: 2012-07-24 06:00 EDT by Leonid Natapov
Modified: 2012-10-24 11:34 EDT

Doc Type: Bug Fix
Last Closed: 2012-10-24 11:34:20 EDT
Type: Bug

Attachments
logs (1.08 MB, application/octet-stream)
2012-07-24 06:00 EDT, Leonid Natapov

Description Leonid Natapov 2012-07-24 06:00:43 EDT
Created attachment 599980

I have a host which is the SPM, and I am blocking the connection between this host and the storage domain. The host reboots within a minute or so.

According to David, when the connection to storage is lost, sanlock tries to kill all the pids using it. If it can't kill them in time, the host is reset by the watchdog. For some reason, the root sanlock helper process was killed. Sanlock has to use the helper process to do the kill(). If the helper is not there to kill the pids, the watchdog will reset the host.
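
The escalation described above can be sketched as a toy Python model. This is only an illustration of the reasoning, not sanlock's code: the `Host` class, `kill_via_helper`, `handle_storage_loss`, the `grace_ticks` deadline and the returned strings are all hypothetical names invented for this sketch, and no real signals are sent.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy model (hypothetical) of a host whose processes use shared storage."""
    pids: set                       # processes still using the storage
    helper_alive: bool = True       # is the root helper available to deliver kill()?
    stuck: set = field(default_factory=set)  # pids that do not exit when signalled

    def kill_via_helper(self, pid):
        # Without the helper no signal is delivered, so the pid stays alive.
        if self.helper_alive and pid not in self.stuck:
            self.pids.discard(pid)


def handle_storage_loss(host, grace_ticks):
    """Try to clear all pids within grace_ticks rounds; otherwise the watchdog fires."""
    for _ in range(grace_ticks):
        if not host.pids:
            break
        for pid in list(host.pids):
            host.kill_via_helper(pid)
    return "no reset" if not host.pids else "watchdog reset"
```

In this model the host is reset both when the helper is missing and when a pid simply fails to exit before the deadline, so a reboot on its own does not tell you which of the two happened.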

As I said above, all I did was block the connection between the host and the SD, so there is a problem with the sanlock root helper process being killed.

vdsm and sanlock logs are attached.

Comment 2 David Teigland 2012-07-30 10:32:23 EDT
After looking more closely, Federico and I found that the problem was not related to the helper process being killed, but instead was related to an unmount from vdsm being stuck or taking too long.

Comment 3 David Teigland 2012-09-20 17:16:31 EDT
Can we close this bz?  I don't think this was a bug.
