Created attachment 519406 [details]
vdsm log

Description of problem:
After restarting vdsm and initiating initVdsmOnUp, rhevm sends validateStorageServerConnection for the export domain (an FC DC in my case). I have noticed that when the mount already exists vdsm returns:

Thread-206::INFO::2011-08-23 10:38:20,222::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: validateStorageServerConnection, Return response: {'status': {'message': 'OK', 'code': 0}, 'statuslist': [{'status': 453, 'id': '3c2a6165-d61a-4a4a-b414-f973222019f3'}]}

after:

Thread-206::ERROR::2011-08-23 10:38:20,217::storage_connection::268::Storage.ServerConnection::(__validateFileServer) Error during storage connection validation: Operation Stuck
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_connection.py", line 248, in __validateFileServer
    rc = oop.fileUtils.mount(con['rp'], mountpoint, fsType)
  File "/usr/share/vdsm/storage/processPool.py", line 33, in wrapper
    return self.runExternally(func, *args, **kwds)
  File "/usr/share/vdsm/storage/processPool.py", line 61, in runExternally
    raise Timeout("Operation Stuck")
Timeout: Operation Stuck

When the mount does not already exist, vdsm returns successfully.

Version-Release number of selected component (if applicable):
vdsm-4.9-95.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Have vdsm connected with an FC/iSCSI DC and an NFS export domain (HSM).
2. Restart vdsm.

Actual results:
validateStorageServerConnection for the export domain times out ("Operation Stuck") and returns status 453 when the mount already exists.

Expected results:
validateStorageServerConnection succeeds whether or not the mount already exists.

Additional info:
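For anyone unfamiliar with the oop (out-of-process) machinery in the traceback: the call is handed to a forked helper process and the caller waits a bounded time for the answer. A rough sketch of that pattern (illustrative only, not vdsm's actual processPool.py; the _helper and run_externally names are made up here):

import multiprocessing

class Timeout(Exception):
    pass

def _helper(conn):
    # Helper process: receive (func, args, kwargs), run it, send the result back.
    while True:
        func, args, kwargs = conn.recv()
        conn.send(func(*args, **kwargs))

def run_externally(conn, func, timeout, *args, **kwargs):
    conn.send((func, args, kwargs))
    if not conn.poll(timeout):   # helper never answered -> "Operation Stuck"
        raise Timeout("Operation Stuck")
    return conn.recv()

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()
    multiprocessing.Process(target=_helper, args=(child_conn,), daemon=True).start()
    print(run_externally(parent_conn, max, 5, 1, 2))   # -> 2

If the helper hangs before reaching conn.recv(), the caller only sees the timeout, which matches the log above.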
This is a ProcessPool issue. Apparently ProcessPool.runExternally sends the command, but it is never picked up by the Helper. I added some debug output:

send: (<function mount at 0x7f5667704f50>, ('rhev-i8c-03.mpc.lab.eng.bos.redhat.com:/home/export', '/tmp/tmpwCWYK2', 'nfs'), {})
send: (<function umount at 0x7f566770e050>, ('rhev-i8c-03.mpc.lab.eng.bos.redhat.com:/home/export', '/tmp/tmpwCWYK2', 'nfs'), {})
receive: (<function umount at 0x7f566770e050>, ('rhev-i8c-03.mpc.lab.eng.bos.redhat.com:/home/export', '/tmp/tmpwCWYK2', 'nfs'), {})
return: (<function umount at 0x7f566770e050>, -1, None)

As we can see, the "mount" command was never picked up and timed out. I still can't reproduce this on my host; it might be related to the load present on the host.
The issue is that oop.fileUtils.mount uses execCmd, which tries to log some information. Sadly, if the logging lock was acquired before the oop fork, we get stuck:

http://www.mail-archive.com/python-bugs-list@python.org/msg116242.html
http://bugs.python.org/issue6721#msg141286

That said, we can take several approaches here:
1) remove or make logging optional in execCmd
2) move mount out of the oop (revert part of 7c20a19)
3) reorganize the code so that only isStaleHandle is oop (race prone)
4) fix the problem in python
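For reference, the failure mode described in issue6721 can be reproduced outside vdsm with a few lines of Python (an illustrative sketch, not vdsm code; the background_logger thread stands in for whatever holds logging's internal lock at fork time):

import os
import threading
import time

lock = threading.Lock()          # stands in for logging's internal handler/module lock

def background_logger():
    while True:
        with lock:               # another thread holds the lock most of the time
            time.sleep(0.01)

threading.Thread(target=background_logger, daemon=True).start()
time.sleep(0.1)                  # make it very likely the lock is held at fork time

pid = os.fork()                  # POSIX only; mirrors the oop Helper fork
if pid == 0:
    # Child: only the forking thread survives the fork, so the inherited lock
    # stays locked forever. This acquire can hang, exactly like execCmd's
    # logging call hanging inside the processPool Helper.
    lock.acquire()
    os._exit(0)
else:
    os.waitpid(pid, 0)           # never returns if the child deadlocked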
commit ae8a9e17f4ccf35ef3a04f2460b906ead97b0e96
Author: Federico Simoncelli <fsimonce>
Date:   Thu Sep 1 09:39:04 2011 +0000

    BZ#732652 Avoid logging in the processPool Helper

    Logging can cause a deadlock in a multi-threaded, multi-process
    environment; therefore closing the logging file descriptors is not
    enough, we should also remove all the handlers.

    Reference: http://bugs.python.org/issue6721

    Change-Id: Ic70c2b390db23b48f96ba677054039d2773960ae

http://gerrit.usersys.redhat.com/887
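In other words, the fix makes the forked Helper stop using logging altogether. A hedged sketch of that idea (not the literal patch from the gerrit change above; the function name disable_logging_in_child is made up here):

import logging

def disable_logging_in_child():
    # Called in the Helper right after fork(): drop every handler from the
    # root logger so nothing in the child ever touches the handler locks
    # (and their file descriptors) inherited from the parent.
    root = logging.getLogger()
    for handler in list(root.handlers):
        root.removeHandler(handler)
        try:
            handler.close()
        except Exception:
            pass                 # a half-closed fd in the child is harmless
    # Belt and braces: drop any records that still reach the logging module.
    logging.disable(logging.CRITICAL)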
Verified on vdsm-4.9-97.el6.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html