Bug 732652 - [vdsm][storage] validateStorageServerConnection fails if the mount already exists on the host
Summary: [vdsm][storage]validateStorageServerConnection fails in case mount is already...
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Federico Simoncelli
QA Contact: Moran Goldboim
Whiteboard: storage
Depends On:
Reported: 2011-08-23 08:26 UTC by Moran Goldboim
Modified: 2011-12-06 07:39 UTC
7 users

Fixed In Version: vdsm-4.9-97.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2011-12-06 07:39:20 UTC

Attachments
vdsm log (1.11 MB, application/x-tar)
2011-08-23 08:26 UTC, Moran Goldboim

System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2011:1782 normal SHIPPED_LIVE new packages: vdsm 2011-12-06 11:55:51 UTC

Description Moran Goldboim 2011-08-23 08:26:31 UTC
Created attachment 519406 [details]
vdsm log

Description of problem:
After restarting vdsm and initiating initVdsmOnUp, RHEV-M sends validateStorageServerConnection for the export domain (an FC DC in my case). I have noticed that when the mount already exists, vdsm returns:
Thread-206::INFO::2011-08-23 10:38:20,222::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: validateStorageServerConnection, Return response: {'status': {'message': 'OK', 'code': 0}, 'statuslist': [{'status': 453, 'id': '3c2a6165-d61a-4a4a-b414-f973222019f3'}]}

This response follows the error:

Thread-206::ERROR::2011-08-23 10:38:20,217::storage_connection::268::Storage.ServerConnection::(__validateFileServer) Error during storage connection validation: Operation Stuck
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_connection.py", line 248, in __validateFileServer
    rc = oop.fileUtils.mount(con['rp'], mountpoint, fsType)
  File "/usr/share/vdsm/storage/processPool.py", line 33, in wrapper
    return self.runExternally(func, *args, **kwds)
  File "/usr/share/vdsm/storage/processPool.py", line 61, in runExternally
    raise Timeout("Operation Stuck")
Timeout: Operation Stuck

When the mount does not exist, vdsm returns successfully.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Have vdsm connected to an FC/iSCSI DC with an NFS export domain (HSM).
2. Restart vdsm.
Actual results:

Expected results:

Additional info:

Comment 2 Federico Simoncelli 2011-08-25 12:03:58 UTC
This is a ProcessPool issue.
Apparently ProcessPool.runExternally is sending the command but it's not picked up by the Helper.
I added some debug and this is the output:

send: (<function mount at 0x7f5667704f50>, ('rhev-i8c-03.mpc.lab.eng.bos.redhat.com:/home/export', '/tmp/tmpwCWYK2', 'nfs'), {})
send: (<function umount at 0x7f566770e050>, ('rhev-i8c-03.mpc.lab.eng.bos.redhat.com:/home/export', '/tmp/tmpwCWYK2', 'nfs'), {})
receive: (<function umount at 0x7f566770e050>, ('rhev-i8c-03.mpc.lab.eng.bos.redhat.com:/home/export', '/tmp/tmpwCWYK2', 'nfs'), {})
return: (<function umount at 0x7f566770e050>, -1, None)

As we can see, the "mount" command was never picked up and timed out.
I still can't reproduce this on my host; it might be related to the load on the host.
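The dispatch pattern being debugged here can be sketched as follows. This is hypothetical code with invented names, not the actual vdsm ProcessPool API: the caller sends a (function, args) pair over a pipe and waits with a timeout, and if the helper never picks the command up, the caller raises Timeout("Operation Stuck") just like the traceback in the description. The real helper is a separate forked process; a thread stands in for it to keep the sketch self-contained.

```python
# Rough sketch of the ProcessPool dispatch/timeout pattern (invented
# names, not vdsm code).
import threading
from multiprocessing import Pipe


class Timeout(Exception):
    pass


def run_externally(conn, func, args=(), timeout=5.0):
    conn.send((func.__name__, args))   # the "send:" lines above
    if not conn.poll(timeout):         # helper never answered in time
        raise Timeout("Operation Stuck")
    return conn.recv()                 # the "return:" line above


def helper(conn):
    name, args = conn.recv()           # the "receive:" line above
    conn.send((name, 0, None))         # pretend rc = 0, no error


parent_end, helper_end = Pipe()
t = threading.Thread(target=helper, args=(helper_end,))
t.start()
result = run_externally(parent_end, print, ("hello",))
t.join()
print(result)
```

In the failure seen above, the helper never executes its recv for the "mount" command, so the caller's poll expires and the Timeout is raised.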

Comment 3 Federico Simoncelli 2011-08-31 15:34:56 UTC
The issue is that oop.fileUtils.mount uses execCmd, which tries to log some information. Unfortunately, if the log lock was acquired before the oop fork, we get stuck:


That said, we can take several approaches here:

 1) remove or make logging optional in execCmd
 2) move mount out of the oop (revert part of 7c20a19)
 3) reorganize the code so that only isStaleHandle is oop (race prone)
 4) fix the problem in python
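
The failure mode described in this comment can be demonstrated with a minimal sketch (hypothetical code, not vdsm's): a threading.Lock, standing in for a logging handler's internal lock, that another thread holds at the moment of os.fork() is inherited by the child in the locked state. Since the owning thread does not exist in the child, the lock can never be released there, so any attempt to take it (as execCmd's logging does) blocks forever. This is the behavior tracked at http://bugs.python.org/issue6721. The sketch is POSIX-only because it uses os.fork().

```python
# Minimal reproducer of "lock held across fork" (not vdsm code).
import os
import threading

lock = threading.Lock()        # stands in for logging.Handler.lock
lock_held = threading.Event()
fork_done = threading.Event()


def hold_lock_across_fork():
    with lock:
        lock_held.set()        # tell the main thread the lock is taken
        fork_done.wait()       # keep holding it while the fork happens


t = threading.Thread(target=hold_lock_across_fork)
t.start()
lock_held.wait()               # make sure the other thread owns the lock

pid = os.fork()
if pid == 0:
    # Child: the lock was copied in the *locked* state, and the thread
    # that owns it does not exist here, so it can never be released.
    # acquire(False) failing immediately is the deadlock in miniature;
    # a blocking acquire() would hang forever.
    os._exit(0 if not lock.acquire(False) else 1)

fork_done.set()
t.join()
_, status = os.waitpid(pid, 0)
child_saw_locked = (os.WEXITSTATUS(status) == 0)
print("child inherited a locked lock:", child_saw_locked)
```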

Comment 9 Federico Simoncelli 2011-09-01 09:44:23 UTC
commit ae8a9e17f4ccf35ef3a04f2460b906ead97b0e96
Author: Federico Simoncelli <fsimonce@redhat.com>
Date:   Thu Sep 1 09:39:04 2011 +0000

    BZ#732652 Avoid logging in the processPool Helper
    Logging can cause a deadlock in a multi-threaded multi-process
    environment; therefore closing the logging file descriptors is
    not enough, we should also remove all the handlers.
    Reference: http://bugs.python.org/issue6721
    Change-Id: Ic70c2b390db23b48f96ba677054039d2773960ae
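
The fix this commit describes could look roughly like the sketch below (assumed shape, not the actual vdsm patch): immediately after the fork, the helper child removes every handler from every known logger, so no code path can touch a handler whose lock or file descriptor was inherited from the parent.

```python
# Hypothetical sketch of "remove all the handlers" after fork
# (not the actual vdsm patch).
import logging


def disable_inherited_logging():
    # Collect the root logger plus every logger the manager knows
    # about (loggerDict also holds PlaceHolder objects, which we skip).
    loggers = [logging.getLogger()] + [
        lg for lg in logging.Logger.manager.loggerDict.values()
        if isinstance(lg, logging.Logger)
    ]
    for lg in loggers:
        for handler in list(lg.handlers):
            lg.removeHandler(handler)
            try:
                handler.close()    # close fds too, but removal is key
            except Exception:
                pass               # never let cleanup raise in the helper


# Usage sketch: call right after the fork in the helper child, before
# any code path that might log, e.g.
#   pid = os.fork()
#   if pid == 0:
#       disable_inherited_logging()
#       helper_main_loop()
```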


Comment 15 Moran Goldboim 2011-09-22 09:55:52 UTC
verified on vdsm-4.9-97.el6

Comment 16 errata-xmlrpc 2011-12-06 07:39:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

