| Summary: | blocked on lvm scalability: vdsm fails to connect to storage pool for 5 minutes - on a loaded environment | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Moran Goldboim <mgoldboi> | ||||
| Component: | vdsm | Assignee: | Eduardo Warszawski <ewarszaw> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Moran Goldboim <mgoldboi> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.2 | CC: | abaron, bazulay, ewarszaw, fsimonce, hateya, iheim, ilvovsky, ykaul | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | vdsm-4.9-80.el6 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-12-06 07:19:11 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | 658639 | ||||||
| Bug Blocks: | |||||||
| Attachments: |
|
||||||
There are two commands that are taking a long time to complete, they both are lvm vgchanges:
Thread-321::DEBUG::2011-05-24 11:33:42,997::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgchange --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"a%/dev/mapper/36006016066102900341493a64a79e011|/dev/mapper/36006016066102900351493a64a79e011|/dev/mapper/36006016066102900361493a64a79e011|/dev/mapper/360060160661029007e98a17ffa81e011|/dev/mapper/36006016066102900be8c283b0344e011%\\", \\"r%.*%\\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --refresh 40b5a575-8e2e-44a7-ab56-46ab5aa4bab9' (cwd None)
Thread-321::DEBUG::2011-05-24 11:35:04,221::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0
Thread-321::DEBUG::2011-05-24 11:36:33,305::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgchange --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"a%/dev/mapper/36006016066102900341493a64a79e011|/dev/mapper/36006016066102900351493a64a79e011|/dev/mapper/36006016066102900361493a64a79e011|/dev/mapper/360060160661029007e98a17ffa81e011|/dev/mapper/36006016066102900be8c283b0344e011%\\", \\"r%.*%\\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --refresh 2b27b725-9063-4a27-899a-3ec8d02c1ade' (cwd None)
Thread-321::DEBUG::2011-05-24 11:37:59,899::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0
The 'refresh' option is known to take long time in vgs with many lvs.
Resolution of this issue depends on bug 658639 - [RHEL 6.2] [RFE] LVM scalability for RHEV
verified (4.9-83). on host loaded with luns (120), connectStoragePool takes 1 minute and half. still there are 5 lvm commands, 4 specifically for various vgs and another general one. there still work to be done here, but this bz can be closed. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2011-1782.html |
Created attachment 500606 [details] vdsm log Description of problem: Running a host with 60 vms and restarting libvirt, vdsm is restarting and getting connectStoragePool command from rhevm. this command isn't answered for 3 minutes and rhevm is starting migration process for the running vms. some lvm operations are taking lots of time: Thread-321::DEBUG::2011-05-24 11:33:42,997::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgchange --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"a%/dev/mapper/36006016066102900341493a64a79e011|/dev/mapper/36006016066102900351493a64a79e011|/dev/mapper/36006016066102900361493a64a79e011|/dev/mapper/360060160661029007e98a17ffa81e011|/dev/mapper/36006016066102900be8c283b0344e011%\\", \\"r%.*%\\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --refresh 40b5a575-8e2e-44a7-ab56-46ab5aa4bab9' (cwd None) Thread-321::DEBUG::2011-05-24 11:35:04,221::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0 Thread-321::DEBUG::2011-05-24 11:35:04,340::lvm::527::OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-321::DEBUG::2011-05-24 11:36:33,083::lvm::527::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex Thread-321::DEBUG::2011-05-24 11:36:33,106::lvm::529::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex Thread-321::DEBUG::2011-05-24 11:36:33,187::lvm::538::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex Thread-321::DEBUG::2011-05-24 11:36:33,214::lvm::550::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex Thread-321::INFO::2011-05-24 11:36:33,227::sp::862::Storage.StoragePool::(_linkStorageDomain) Linking /rhev/data-center/mnt/blockSD/40b5a575-8e2e-44a7-ab56-46ab5aa4bab9 to /rhev/data-center/06b29e0f-8c22-431e-a6f1-75989a582c52/40b5a575-8e2e-44a7-ab56-46ab5aa4bab9 Thread-321::DEBUG::2011-05-24 11:36:33,279::persistentDict::144::Storage.PersistentDict::(transaction) Starting transaction Thread-321::DEBUG::2011-05-24 11:36:33,289::persistentDict::137::Storage.PersistentDict::(transaction) Reusing active transaction Thread-321::DEBUG::2011-05-24 11:36:33,291::persistentDict::137::Storage.PersistentDict::(transaction) Reusing active transaction Thread-321::DEBUG::2011-05-24 11:36:33,295::persistentDict::152::Storage.PersistentDict::(transaction) Finished transaction Thread-321::DEBUG::2011-05-24 11:36:33,305::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgchange --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"a%/dev/mapper/36006016066102900341493a64a79e011|/dev/mapper/36006016066102900351493a64a79e011|/dev/mapper/36006016066102900361493a64a79e011|/dev/mapper/360060160661029007e98a17ffa81e011|/dev/mapper/36006016066102900be8c283b0344e011%\\", \\"r%.*%\\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --refresh 2b27b725-9063-4a27-899a-3ec8d02c1ade' (cwd None) Thread-321::DEBUG::2011-05-24 11:37:59,899::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0 in addition there are errors on interliving getVolumeSize of vms in the middle of it: Thread-49::INFO::2011-05-24 11:33:17,045::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: getVolumeSize, args: ( sdUUID=2b27b725-9063-4a27-899a-3ec8d02c1ade spUUID=06b29e0f-8c22-431e-a6f1-75989a582c52 imgUUID=46e4383 4-4827-408a-b444-9754446d762d volUUID=044189e5-8b79-4644-8a75-ed4037e713d2) clientIFinit::INFO::2011-05-24 11:33:17,050::clientIF::527::vds::(_getNetworkIp) network None: using 0 Thread-49::DEBUG::2011-05-24 11:33:17,054::task::492::TaskManager.Task::(_debug) Task 1a0ba393-b832-4418-8d47-4c4b4f7489b2: moving from state init -> state preparing clientIFinit::INFO::2011-05-24 11:33:17,057::clientIF::670::vds::(create) vmContainerLock aquired by vm 57965f76-667b-4d56-aa9f-d1508dc88ef8 Thread-49::ERROR::2011-05-24 11:33:17,058::task::856::TaskManager.Task::(_setError) Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 864, in _run return fn(*args, **kargs) File "/usr/share/vdsm/storage/hsm.py", line 1611, in public_getVolumeSize self.validateConnectedPool(spUUID) File "/usr/share/vdsm/storage/hsm.py", line 84, in validateConnectedPool raise se.StoragePoolUnknown(spUUID) StoragePoolUnknown: Unknown pool id, pool not connected: ('06b29e0f-8c22-431e-a6f1-75989a582c52',) Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: