Bug 680034

Summary: [vdsm][storage][error-handling] connectStorageServer takes about 10 minutes during storage disconnection on certain topologies
Product: Red Hat Enterprise Linux 6
Reporter: Moran Goldboim <mgoldboi>
Component: vdsm
Assignee: Yotam Oron <yoron>
Status: CLOSED ERRATA
QA Contact: Daniel Paikov <dpaikov>
Severity: high
Docs Contact:
Priority: low
Version: 6.1
CC: abaron, bazulay, danken, hateya, iheim, ilvovsky, mkenneth, ykaul
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: vdsm-4.9-80.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-12-06 07:08:07 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description    Flags
vdsm log       none

Description Moran Goldboim 2011-02-24 06:50:13 UTC
Created attachment 480662: vdsm log

Description of problem:
Topology:
- 2 storage domains on 2 different NFS servers

When one of the 2 storage domains is disconnected, then after vdsm is restarted rhevm sends connectStorageServer with the connection strings of both servers in a single call. This causes a timeout on rhevm, and the action fails.

Version-Release number of selected component (if applicable):
vdsm-4.9-49.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create 2 SDs (storage domains) on 2 different storage servers.
2. Disconnect the master domain.
3.
  
Actual results:


Expected results:


Additional info:
Thread-27::DEBUG::2011-02-23 16:03:15,075::clientIF::229::Storage.Dispatcher.Protect::(wrapper) [10.35.104.7]                                                                             
Thread-27::INFO::2011-02-23 16:03:15,075::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: connectStorageServer, args: (domType=1, spUUID=07bf7773-b187-4540-bbef-1f3f33d5ceea, conList=[{'connection': 'orion.qa.lab.tlv.redhat.com:/export/mgoldboi/data1', 'iqn': '', 'portal': '', 'user': '', 'password': '******', 'id': '6e9e7a32-2565-4662-ab40-08a3a1c73b3d', 'port': ''}, {'connection': 'qanashead.qa.lab.tlv.redhat.com:/export/mgoldboi/data1', 'iqn': '', 'portal': '', 'user': '', 'password': '******', 'id': '3100cb1a-5ae9-49ee-8450-ea268e24531f', 'port': ''}])
Thread-27::DEBUG::2011-02-23 16:03:15,076::task::491::TaskManager.Task::(_debug) Task 3085df0c-08f2-459b-bc07-18ec7c811c51: moving from state init -> state preparing
Thread-27::INFO::2011-02-23 16:03:15,076::storage_connection::83::Storage.ServerConnection::(connect) Request to connect NFS storage server
Thread-27::INFO::2011-02-23 16:03:15,076::storage_connection::41::Storage.ServerConnection::(__validateConnectionParams) conList=[{'connection': 'orion.qa.lab.tlv.redhat.com:/export/mgoldboi/data1', 'iqn': '', 'portal': '', 'user': '', 'password': '******', 'id': '6e9e7a32-2565-4662-ab40-08a3a1c73b3d', 'port': ''}, {'connection': 'qanashead.qa.lab.tlv.redhat.com:/export/mgoldboi/data1', 'iqn': '', 'portal': '', 'user': '', 'password': '******', 'id': '3100cb1a-5ae9-49ee-8450-ea268e24531f', 'port': ''}]
Thread-27::DEBUG::2011-02-23 16:09:15,081::fileUtils::109::Storage.Misc.excCmd::(umount) '/usr/bin/sudo -n /bin/umount -f /rhev/data-center/mnt/qanashead.qa.lab.tlv.redhat.com:_export_mgoldboi_data1' (cwd None)
Thread-27::DEBUG::2011-02-23 16:09:41,139::fileUtils::109::Storage.Misc.excCmd::(umount) FAILED: <err> = 'umount2: Device or resource busy\numount.nfs: /rhev/data-center/mnt/qanashead.qa.lab.tlv.redhat.com:_export_mgoldboi_data1: device is busy\numount2: Device or resource busy\numount.nfs: /rhev/data-center/mnt/qanashead.qa.lab.tlv.redhat.com:_export_mgoldboi_data1: device is busy\n'; <rc> = 16
Thread-27::ERROR::2011-02-23 16:12:41,140::storage_connection::169::Storage.ServerConnection::(__connectFileServer) Error during storage connection: [Errno 17] File exists: '/rhev/data-center/mnt/qanashead.qa.lab.tlv.redhat.com:_export_mgoldboi_data1'
Thread-27::DEBUG::2011-02-23 16:12:41,162::task::491::TaskManager.Task::(_debug) Task 3085df0c-08f2-459b-bc07-18ec7c811c51: finished: {'statuslist': [{'status': 0, 'id': '6e9e7a32-2565-4662-ab40-08a3a1c73b3d'}, {'status': 451, 'id': '3100cb1a-5ae9-49ee-8450-ea268e24531f'}]}
Thread-27::DEBUG::2011-02-23 16:12:41,163::task::491::TaskManager.Task::(_debug) Task 3085df0c-08f2-459b-bc07-18ec7c811c51: moving from state preparing -> state finished
Thread-27::DEBUG::2011-02-23 16:12:41,163::resourceManager::786::irs::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-27::DEBUG::2011-02-23 16:12:41,163::resourceManager::821::irs::(cancelAll) Owner.cancelAll requests {}
Thread-27::DEBUG::2011-02-23 16:12:41,164::task::491::TaskManager.Task::(_debug) Task 3085df0c-08f2-459b-bc07-18ec7c811c51: ref 0 aborting False
Thread-27::INFO::2011-02-23 16:12:41,164::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: connectStorageServer, Return response: {'status': {'message': 'OK', 'code': 0}, 'statuslist': [{'status': 0, 'id': '6e9e7a32-2565-4662-ab40-08a3a1c73b3d'}, {'status': 451, 'id': '3100cb1a-5ae9-49ee-8450-ea268e24531f'}]}
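
To see where the time goes in the log above, the gaps between consecutive log lines can be computed from their timestamps. The following is only a minimal sketch, not part of vdsm: the helper names (parse_ts, find_gaps), the 10-second threshold and the SAMPLE list are assumptions made for illustration. The SAMPLE lines are abbreviated copies of lines from the excerpt above, with their tails cut to "...".

```python
import re
from datetime import datetime

# Timestamp field of a vdsm log line, e.g.
# "Thread-27::DEBUG::2011-02-23 16:03:15,075::clientIF::229::..."
TS_RE = re.compile(r"::(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::")


def parse_ts(line):
    """Return the timestamp of a log line, or None if it has none."""
    match = TS_RE.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def find_gaps(lines, threshold_s=10.0):
    """Return (gap_seconds, previous_line, line) for each pair of
    consecutive timestamped lines more than threshold_s apart."""
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip lines without a timestamp
        if prev_ts is not None:
            delta = (ts - prev_ts).total_seconds()
            if delta > threshold_s:
                gaps.append((delta, prev_line, line))
        prev_ts, prev_line = ts, line
    return gaps


# Abbreviated lines from the log excerpt in this report (tails cut).
SAMPLE = [
    "Thread-27::INFO::2011-02-23 16:03:15,076::storage_connection::41::Storage.ServerConnection::(__validateConnectionParams) ...",
    "Thread-27::DEBUG::2011-02-23 16:09:15,081::fileUtils::109::Storage.Misc.excCmd::(umount) ...",
    "Thread-27::DEBUG::2011-02-23 16:09:41,139::fileUtils::109::Storage.Misc.excCmd::(umount) FAILED: ...",
    "Thread-27::ERROR::2011-02-23 16:12:41,140::storage_connection::169::Storage.ServerConnection::(__connectFileServer) ...",
    "Thread-27::INFO::2011-02-23 16:12:41,164::dispatcher::100::Storage.Dispatcher.Protect::(run) ...",
]

if __name__ == "__main__":
    for delta, _before, after in find_gaps(SAMPLE):
        print("%.3f s before: %s" % (delta, after[:90]))
```

On this excerpt the sketch reports three gaps of roughly 360 s (before the umount is issued), 26 s (until the umount fails) and 180 s (until the connection error is logged). Together they account for nearly all of the roughly 9.5 minutes between the request and the response.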

Comment 3 Ayal Baron 2011-02-24 09:45:18 UTC
Moran, why did you mark this as a regression?
Is this in 2.3 and not in 2.2?

Comment 4 Moran Goldboim 2011-02-24 09:54:24 UTC
In 2.2 vdsm got stuck in this scenario (GIL), so the behaviour was actually worse, but it is still buggy now.
Anyhow, removing the regression flag.

Comment 6 Yotam Oron 2011-06-16 14:27:09 UTC
http://gerrit.usersys.redhat.com/#change,596

Comment 9 Daniel Paikov 2011-07-19 13:23:30 UTC
Checked on 4.9-81.

Comment 10 errata-xmlrpc 2011-12-06 07:08:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html