RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 870756 - [engine-core] getStorageDomainsList command takes more then 60 sec if DC has a broken connection to NFS storage
Summary: [engine-core] getStorageDomainsList command takes more then 60 sec if DC has ...
Keywords:
Status: CLOSED DUPLICATE of bug 865034
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.4
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: rc
: ---
Assignee: Ayal Baron
QA Contact: Haim
URL:
Whiteboard: storage
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-10-28 14:22 UTC by vvyazmin@redhat.com
Modified: 2012-11-04 10:08 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-11-03 22:02:06 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
## Logs vdsm (3.99 MB, application/octet-stream)
2012-10-28 14:22 UTC, vvyazmin@redhat.com
no flags Details

Description vvyazmin@redhat.com 2012-10-28 14:22:17 UTC
Created attachment 634534 [details]
## Logs vdsm

Description of problem:  getStorageDomainsList command takes more then 60 sec, if DC have a broken connection to NFS storage


Version-Release number of selected component (if applicable):
RHEVM 3.1 - SI22

RHEVM: rhevm-3.1.0-22.el6ev.noarch
VDSM: vdsm-4.9.6-39.0.el6_3.x86_64
LIBVIRT: libvirt-0.9.10-21.el6_3.5.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.295.el6_3.2.x86_64
SANLOCK: sanlock-2.3-4.el6_3.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create  iSCSI DC with 2 hosts, with one iSCSI SD, Export NFS (or ISO NFS) domain
2. Block (via iptables) connection to Export NFS – for simulation broken connection to Export NFS (or ISO NFS) domain
3. Run “vdsClient -s 0 getStorageDomainsList
“ command
 
Actual results:
[root@cougar08 ~]# time vdsClient -s 0 getStorageDomainsList
8f81428c-612d-4f01-a834-aa7fff4e9485


real	1m2.653s
user	0m0.100s
sys	0m0.022s

If we wait less than 5 min, and restore connection to Export NFS, and run again “getStorageDomainsList”, we will get a reasonable response time, see below:
[root@cougar08 ~]# time vdsClient -s 0 getStorageDomainsList
8f81428c-612d-4f01-a834-aa7fff4e9485
27fedd2e-d04e-4a16-a9f7-714f2931e6d3


real	0m2.646s
user	0m0.093s
sys	0m0.029s

but if we wait a couple of hours, and restore connection to Export NFS, the respond be again more than 60 sec.


Expected results:
Command “vdsClient -s 0 getStorageDomainsList
” should run less than 60 seconds, much less reasonable time

Additional info:

With broken connection:

# egrep Thread-5240 less /var/log/vdsm/vdsm.log | less -S


/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:19,116::BindingXMLRPC::171::vds::(wrapper) [10.35.160.137]
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:19,116::task::588::TaskManager.Task::(_updateState) Task=`4818df65-4c8a-4d4e-ab7c-b4d044dd750f`::moving from state init -> sta
/var/log/vdsm/vdsm.log:Thread-5240::INFO::2012-10-28 15:45:19,117::logUtils::37::dispatcher::(wrapper) Run and protect: getStorageDomainsList(spUUID='00000000-0000-0000-0000-000000000000
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:19,117::misc::1043::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage)
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:19,118::misc::1045::SamplingMethod::(__call__) Got in to sampling method
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:19,118::misc::1043::SamplingMethod::(__call__) Trying to enter sampling method (storage.iscsi.rescan)
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:19,118::misc::1045::SamplingMethod::(__call__) Got in to sampling method
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:19,119::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/iscsiadm -m session -R' (cwd None)
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:19,151::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = ''; <rc> = 0
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:19,151::misc::1053::SamplingMethod::(__call__) Returning last result
/var/log/vdsm/vdsm.log:MainProcess|Thread-5240::DEBUG::2012-10-28 15:45:19,153::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host0/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5240::DEBUG::2012-10-28 15:45:19,156::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host1/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5240::DEBUG::2012-10-28 15:45:19,159::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host2/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5240::DEBUG::2012-10-28 15:45:19,162::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host3/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5240::DEBUG::2012-10-28 15:45:19,165::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host4/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5240::DEBUG::2012-10-28 15:45:19,168::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host5/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5240::DEBUG::2012-10-28 15:45:19,170::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host6/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5240::DEBUG::2012-10-28 15:45:19,173::iscsi::376::Storage.ISCSI::(forceIScsiScan) Performing SCSI scan, this will take up to 30 seconds
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:21,177::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/multipath' (cwd None)
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:21,253::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = ''; <rc> = 0
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:21,258::lvm::460::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' got the operation mutex
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:21,258::lvm::462::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' released the operation mutex
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:21,259::lvm::472::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' got the operation mutex
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:21,259::lvm::474::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' released the operation mutex
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:21,260::lvm::493::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' got the operation mutex
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:21,260::lvm::495::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' released the operation mutex
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:21,260::misc::1053::SamplingMethod::(__call__) Returning last result
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:21,261::lvm::352::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:21,263::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:21,586::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = ''; <rc> = 0
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:45:21,588::lvm::379::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
/var/log/vdsm/vdsm.log:Thread-5240::INFO::2012-10-28 15:46:21,596::logUtils::39::dispatcher::(wrapper) Run and protect: getStorageDomainsList, Return response: {'domlist': ['8f81428c-612
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:46:21,597::task::1172::TaskManager.Task::(prepare) Task=`4818df65-4c8a-4d4e-ab7c-b4d044dd750f`::finished: {'domlist': ['8f81428c-
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:46:21,597::task::588::TaskManager.Task::(_updateState) Task=`4818df65-4c8a-4d4e-ab7c-b4d044dd750f`::moving from state preparing -
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:46:21,597::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:46:21,598::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
/var/log/vdsm/vdsm.log:Thread-5240::DEBUG::2012-10-28 15:46:21,598::task::978::TaskManager.Task::(_decref) Task=`4818df65-4c8a-4d4e-ab7c-b4d044dd750f`::ref 0 aborting False

Without broken connection

# egrep Thread-5466 less /var/log/vdsm/vdsm.log | less -S

/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:54,538::BindingXMLRPC::171::vds::(wrapper) [10.35.160.137]
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:54,538::task::588::TaskManager.Task::(_updateState) Task=`2bea6ecb-fa41-4850-a309-83e0bf589ed5`::moving from state init -> sta
/var/log/vdsm/vdsm.log:Thread-5466::INFO::2012-10-28 15:49:54,539::logUtils::37::dispatcher::(wrapper) Run and protect: getStorageDomainsList(spUUID='00000000-0000-0000-0000-000000000000
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:54,539::misc::1043::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage)
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:54,540::misc::1045::SamplingMethod::(__call__) Got in to sampling method
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:54,540::misc::1043::SamplingMethod::(__call__) Trying to enter sampling method (storage.iscsi.rescan)
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:54,540::misc::1045::SamplingMethod::(__call__) Got in to sampling method
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:54,541::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/iscsiadm -m session -R' (cwd None)
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:54,572::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = ''; <rc> = 0
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:54,572::misc::1053::SamplingMethod::(__call__) Returning last result
/var/log/vdsm/vdsm.log:MainProcess|Thread-5466::DEBUG::2012-10-28 15:49:54,574::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host0/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5466::DEBUG::2012-10-28 15:49:54,577::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host1/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5466::DEBUG::2012-10-28 15:49:54,580::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host2/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5466::DEBUG::2012-10-28 15:49:54,583::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host3/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5466::DEBUG::2012-10-28 15:49:54,586::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host4/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5466::DEBUG::2012-10-28 15:49:54,588::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host5/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5466::DEBUG::2012-10-28 15:49:54,591::__init__::1164::Storage.Misc.excCmd::(_log) '/bin/dd of=/sys/class/scsi_host/host6/scan' (cwd None)
/var/log/vdsm/vdsm.log:MainProcess|Thread-5466::DEBUG::2012-10-28 15:49:54,594::iscsi::376::Storage.ISCSI::(forceIScsiScan) Performing SCSI scan, this will take up to 30 seconds
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:56,598::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/multipath' (cwd None)
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:56,680::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = ''; <rc> = 0
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:56,681::lvm::460::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' got the operation mutex
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:56,681::lvm::462::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' released the operation mutex
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:56,681::lvm::472::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' got the operation mutex
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:56,682::lvm::474::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' released the operation mutex
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:56,682::lvm::493::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' got the operation mutex
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:56,683::lvm::495::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' released the operation mutex
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:56,683::misc::1053::SamplingMethod::(__call__) Returning last result
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:56,683::lvm::352::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:56,686::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:56,996::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = ''; <rc> = 0
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:56,998::lvm::379::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
/var/log/vdsm/vdsm.log:Thread-5466::INFO::2012-10-28 15:49:57,007::logUtils::39::dispatcher::(wrapper) Run and protect: getStorageDomainsList, Return response: {'domlist': ['8f81428c-612
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:57,008::task::1172::TaskManager.Task::(prepare) Task=`2bea6ecb-fa41-4850-a309-83e0bf589ed5`::finished: {'domlist': ['8f81428c-
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:57,008::task::588::TaskManager.Task::(_updateState) Task=`2bea6ecb-fa41-4850-a309-83e0bf589ed5`::moving from state preparing -
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:57,009::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:57,009::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
/var/log/vdsm/vdsm.log:Thread-5466::DEBUG::2012-10-28 15:49:57,009::task::978::TaskManager.Task::(_decref) Task=`2bea6ecb-fa41-4850-a309-83e0bf589ed5`::ref 0 aborting False

Comment 4 Yaniv Bronhaim 2012-10-29 09:21:35 UTC
It doesn't look like a bug. The operation takes time to restore connectivity to storage and scan the domains list if we had stuck operation before. That's the nature of timeout. 
We fail the old getDomainsList and then you get the new answer. The last operation failure finished only after timeout that takes here more than a minute.  

I don't think its related to how long you have waited before running getStorageDomainsList, the time changes if you need to wait for old operations to end and the storage response.

Comment 5 Eduardo Warszawski 2012-11-03 22:02:06 UTC
The ~two minutes:

Thread-5240::DEBUG::2012-10-28 15:45:19,151::misc::1053::SamplingMethod::(__call__) Returning last result
Thread-5240::DEBUG::2012-10-28 15:45:21,177::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/multipath' (cwd None)


# > 1min whithout logs?
Dummy-5046::DEBUG::2012-10-28 15:45:19,972::__init__::1164::Storage.Misc.excCmd::(_log) 'dd if=/rhev/data-center/7455040b-1fe6-4434-95a4-57cab522d76e/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Dummy-5046::DEBUG::2012-10-28 15:45:20,047::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0318764 s, 32.1 MB/s\n'; <rc> = 0
Thread-5240::DEBUG::2012-10-28 15:45:21,177::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/multipath' (cwd None)
Thread-5240::DEBUG::2012-10-28 15:45:21,253::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = ''; <rc> = 0

*** This bug has been marked as a duplicate of bug 865034 ***

Comment 6 Eduardo Warszawski 2012-11-04 10:08:46 UTC
This is the corect one:

# ~1 min gap

Thread-5240::DEBUG::2012-10-28 15:45:21,588::lvm::379::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-5240::INFO::2012-10-28 15:46:21,596::logUtils::39::dispatcher::(wrapper) Run and protect: getStorageDomainsList, Return response: {'domlist': ['8f81428c-612d-4f01-a834-aa7fff4e9485']}


Note You need to log in before you can comment on or make changes to this bug.