Bug 680034 - [vdsm][storage][error-handling]connectStorageServer takes about 10 minutes during storage disconnection on certain topologies
Summary: [vdsm][storage][error-handling]connectStorageServer takes about 10 minutes du...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Yotam Oron
QA Contact: Daniel Paikov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-02-24 06:50 UTC by Moran Goldboim
Modified: 2011-12-06 07:08 UTC (History)

Fixed In Version: vdsm-4.9-80.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 07:08:07 UTC
Target Upstream Version:


Attachments (Terms of Use)
vdsm log (930.02 KB, application/x-tar)
2011-02-24 06:50 UTC, Moran Goldboim


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2011:1782 0 normal SHIPPED_LIVE new packages: vdsm 2011-12-06 11:55:51 UTC

Description Moran Goldboim 2011-02-24 06:50:13 UTC
Created attachment 480662 [details]
vdsm log

Description of problem:
Topology:
- 2 storage domains, on 2 NFS servers

When one of the 2 storage domains is disconnected and vdsm is then restarted, rhevm sends connectStorageServer with the connection strings of both servers together. This causes a timeout on rhevm, and the action fails.

Version-Release number of selected component (if applicable):
vdsm-4.9-49.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create 2 SDs on 2 different storage servers.
2. Disconnect the master domain.
3.
  
Actual results:


Expected results:


Additional info:
Thread-27::DEBUG::2011-02-23 16:03:15,075::clientIF::229::Storage.Dispatcher.Protect::(wrapper) [10.35.104.7]                                                                             
Thread-27::INFO::2011-02-23 16:03:15,075::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: connectStorageServer, args: (domType=1, spUUID=07bf7773-b187-4540-bbef-1f3f33d5ceea, conList=[{'connection': 'orion.qa.lab.tlv.redhat.com:/export/mgoldboi/data1', 'iqn': '', 'portal': '', 'user': '', 'password': '******', 'id': '6e9e7a32-2565-4662-ab40-08a3a1c73b3d', 'port': ''}, {'connection': 'qanashead.qa.lab.tlv.redhat.com:/export/mgoldboi/data1', 'iqn': '', 'portal': '', 'user': '', 'password': '******', 'id': '3100cb1a-5ae9-49ee-8450-ea268e24531f', 'port': ''}])
Thread-27::DEBUG::2011-02-23 16:03:15,076::task::491::TaskManager.Task::(_debug) Task 3085df0c-08f2-459b-bc07-18ec7c811c51: moving from state init -> state preparing
Thread-27::INFO::2011-02-23 16:03:15,076::storage_connection::83::Storage.ServerConnection::(connect) Request to connect NFS storage server
Thread-27::INFO::2011-02-23 16:03:15,076::storage_connection::41::Storage.ServerConnection::(__validateConnectionParams) conList=[{'connection': 'orion.qa.lab.tlv.redhat.com:/export/mgoldboi/data1', 'iqn': '', 'portal': '', 'user': '', 'password': '******', 'id': '6e9e7a32-2565-4662-ab40-08a3a1c73b3d', 'port': ''}, {'connection': 'qanashead.qa.lab.tlv.redhat.com:/export/mgoldboi/data1', 'iqn': '', 'portal': '', 'user': '', 'password': '******', 'id': '3100cb1a-5ae9-49ee-8450-ea268e24531f', 'port': ''}]
Thread-27::DEBUG::2011-02-23 16:09:15,081::fileUtils::109::Storage.Misc.excCmd::(umount) '/usr/bin/sudo -n /bin/umount -f /rhev/data-center/mnt/qanashead.qa.lab.tlv.redhat.com:_export_mgoldboi_data1' (cwd None)
Thread-27::DEBUG::2011-02-23 16:09:41,139::fileUtils::109::Storage.Misc.excCmd::(umount) FAILED: <err> = 'umount2: Device or resource busy\numount.nfs: /rhev/data-center/mnt/qanashead.qa.lab.tlv.redhat.com:_export_mgoldboi_data1: device is busy\numount2: Device or resource busy\numount.nfs: /rhev/data-center/mnt/qanashead.qa.lab.tlv.redhat.com:_export_mgoldboi_data1: device is busy\n'; <rc> = 16
Thread-27::ERROR::2011-02-23 16:12:41,140::storage_connection::169::Storage.ServerConnection::(__connectFileServer) Error during storage connection: [Errno 17] File exists: '/rhev/data-center/mnt/qanashead.qa.lab.tlv.redhat.com:_export_mgoldboi_data1'
Thread-27::DEBUG::2011-02-23 16:12:41,162::task::491::TaskManager.Task::(_debug) Task 3085df0c-08f2-459b-bc07-18ec7c811c51: finished: {'statuslist': [{'status': 0, 'id': '6e9e7a32-2565-4662-ab40-08a3a1c73b3d'}, {'status': 451, 'id': '3100cb1a-5ae9-49ee-8450-ea268e24531f'}]}
Thread-27::DEBUG::2011-02-23 16:12:41,163::task::491::TaskManager.Task::(_debug) Task 3085df0c-08f2-459b-bc07-18ec7c811c51: moving from state preparing -> state finished
Thread-27::DEBUG::2011-02-23 16:12:41,163::resourceManager::786::irs::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-27::DEBUG::2011-02-23 16:12:41,163::resourceManager::821::irs::(cancelAll) Owner.cancelAll requests {}
Thread-27::DEBUG::2011-02-23 16:12:41,164::task::491::TaskManager.Task::(_debug) Task 3085df0c-08f2-459b-bc07-18ec7c811c51: ref 0 aborting False
Thread-27::INFO::2011-02-23 16:12:41,164::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: connectStorageServer, Return response: {'status': {'message': 'OK', 'code': 0}, 'statuslist': [{'status': 0, 'id': '6e9e7a32-2565-4662-ab40-08a3a1c73b3d'}, {'status': 451, 'id': '3100cb1a-5ae9-49ee-8450-ea268e24531f'}]}
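The delay is easiest to see from the gaps between consecutive timestamps in the excerpt above: roughly 6 minutes before the umount attempt, 26 seconds for the umount itself, and 3 more minutes before the error, about 9.5 minutes in total. The following is a minimal sketch, not part of vdsm, for locating such gaps in a log. The helper names (parse_time, find_gaps), the 10-second threshold, and the shortened sample lines are assumptions made for illustration; only the timestamp layout is taken from the log.

```python
import re
from datetime import datetime

# Matches the "::YYYY-MM-DD HH:MM:SS,mmm::" timestamp field of a vdsm log line.
TIMESTAMP = re.compile(r"::(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::")


def parse_time(line):
    """Return the timestamp of a log line, or None if the line has none."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def find_gaps(lines, threshold_seconds=10.0):
    """Return (seconds, earlier_line, later_line) for each pair of
    consecutive timestamped lines at least threshold_seconds apart."""
    gaps = []
    previous_time = None
    previous_line = None
    for line in lines:
        current = parse_time(line)
        if current is None:
            continue  # skip lines without a timestamp
        if previous_time is not None:
            delta = (current - previous_time).total_seconds()
            if delta >= threshold_seconds:
                gaps.append((delta, previous_line, line))
        previous_time, previous_line = current, line
    return gaps


# Shortened copies of four lines from the excerpt (message text trimmed).
SAMPLE = [
    "Thread-27::INFO::2011-02-23 16:03:15,076::storage_connection::41::(__validateConnectionParams)",
    "Thread-27::DEBUG::2011-02-23 16:09:15,081::fileUtils::109::(umount) start",
    "Thread-27::DEBUG::2011-02-23 16:09:41,139::fileUtils::109::(umount) FAILED",
    "Thread-27::ERROR::2011-02-23 16:12:41,140::storage_connection::169::(__connectFileServer)",
]

if __name__ == "__main__":
    for seconds, earlier, later in find_gaps(SAMPLE):
        print("%8.3f s between:\n  %s\n  %s" % (seconds, earlier, later))
```

Running the same function over the full attached log, filtered to a single thread name, should show where a given request spends its time.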

Comment 3 Ayal Baron 2011-02-24 09:45:18 UTC
Moran, why did you mark this as a regression?
Is this in 2.3 and not in 2.2?

Comment 4 Moran Goldboim 2011-02-24 09:54:24 UTC
In 2.2, vdsm got stuck in this scenario (GIL); the behaviour was actually worse, but it's still buggy now...
Anyhow, removing regression.

Comment 6 Yotam Oron 2011-06-16 14:27:09 UTC
http://gerrit.usersys.redhat.com/#change,596

Comment 9 Daniel Paikov 2011-07-19 13:23:30 UTC
Checked on 4.9-81.

Comment 10 errata-xmlrpc 2011-12-06 07:08:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html

