Bug 842635 - [RHEV] When there is no connection between the host and the storage domain, the host goes to reboot.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: sanlock
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: David Teigland
QA Contact: Haim
URL:
Whiteboard: storage
Depends On:
Blocks: 800588 837024
 
Reported: 2012-07-24 10:00 UTC by Leonid Natapov
Modified: 2012-10-24 15:34 UTC
CC: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-10-24 15:34:20 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
logs (1.08 MB, application/octet-stream), 2012-07-24 10:00 UTC, Leonid Natapov

Description Leonid Natapov 2012-07-24 10:00:43 UTC
Created attachment 599980 [details]
logs

Scenario:
I have a host which is the SPM, and I am blocking the connection between this host and the Storage Domain. The host reboots within a minute or so.

According to David, when the connection to storage is lost, sanlock will try to kill all the pids using it. If it cannot kill them in time, the host is reset by the watchdog. For some reason the root sanlock helper process had been killed. Sanlock must use the helper process to perform the kill(); if the helper is not there to kill the pids, the watchdog resets the host.
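David's description boils down to a two-stage escalation: try to kill the lease-holding pids through the root helper, and fall back to a watchdog reset if that fails or the helper itself is gone. A minimal Python sketch of that logic follows; `handle_storage_loss`, `helper_kill`, and `all_dead` are illustrative names for this example, not sanlock's actual interfaces:

```python
def handle_storage_loss(pids, helper_kill, all_dead, max_attempts=10):
    """Illustrative escalation logic (not sanlock's real API):
    ask the root helper to kill every pid still using the lost
    storage; if they are not all gone before the watchdog deadline
    (modeled here as a bounded number of attempts), the watchdog
    resets the host."""
    if helper_kill is None:
        # The root helper process is gone: there is no way to
        # deliver kill(), so the watchdog reset is inevitable.
        return "watchdog-reset"
    for _ in range(max_attempts):
        for pid in pids:
            helper_kill(pid)       # helper performs kill() as root
        if all_dead():
            return "pids-killed"   # clean escape, no reset needed
    return "watchdog-reset"        # pids survived past the deadline
```

The sketch shows why a missing helper is fatal: without it there is no path to the "pids-killed" outcome, and the watchdog reset becomes the only way to release the storage leases safely.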

As I said above, all I did was block the connection between the host and the SD, so there appears to be a problem with the sanlock root helper process being killed.

vdsm and sanlock logs are attached.


sanlock-2.3-2.1.el6.x86_64
libvirt-lock-sanlock-0.9.10-21.el6.x86_64
sanlock-lib-2.3-2.1.el6.x86_64
sanlock-python-2.3-2.1.el6.x86_64
vdsm-4.9.6-23.0.el6_3.x86_64

Comment 2 David Teigland 2012-07-30 14:32:23 UTC
After looking more closely, Federico and I found that it was not related to killing the helper process, but instead was related to an unmount from vdsm being stuck or taking too long.

Comment 3 David Teigland 2012-09-20 21:16:31 UTC
Can we close this bz?  I don't think this was a bug.

