Bug 707217

Summary: blocked on lvm scalability: vdsm fails to connect to storage pool for 5 minutes - on a loaded environment
Product: Red Hat Enterprise Linux 6
Component: vdsm
Version: 6.2
Hardware: Unspecified
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Target Milestone: rc
Reporter: Moran Goldboim <mgoldboi>
Assignee: Eduardo Warszawski <ewarszaw>
QA Contact: Moran Goldboim <mgoldboi>
CC: abaron, bazulay, ewarszaw, fsimonce, hateya, iheim, ilvovsky, ykaul
Fixed In Version: vdsm-4.9-80.el6
Doc Type: Bug Fix
Bug Depends On: 658639
Last Closed: 2011-12-06 07:19:11 UTC
Attachments: vdsm log

Description Moran Goldboim 2011-05-24 12:42:34 UTC
Created attachment 500606: vdsm log

Description of problem:
Running a host with 60 VMs and restarting libvirt: vdsm restarts and receives a connectStoragePool command from RHEV-M. The command is not answered for 3 minutes, so RHEV-M starts the migration process for the running VMs.
Some LVM operations take a very long time:
Thread-321::DEBUG::2011-05-24 11:33:42,997::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgchange --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"a%/dev/mapper/36006016066102900341493a64a79e011|/dev/mapper/36006016066102900351493a64a79e011|/dev/mapper/36006016066102900361493a64a79e011|/dev/mapper/360060160661029007e98a17ffa81e011|/dev/mapper/36006016066102900be8c283b0344e011%\\", \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --refresh 40b5a575-8e2e-44a7-ab56-46ab5aa4bab9' (cwd None)
Thread-321::DEBUG::2011-05-24 11:35:04,221::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0
Thread-321::DEBUG::2011-05-24 11:35:04,340::lvm::527::OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-321::DEBUG::2011-05-24 11:36:33,083::lvm::527::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-321::DEBUG::2011-05-24 11:36:33,106::lvm::529::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-321::DEBUG::2011-05-24 11:36:33,187::lvm::538::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-321::DEBUG::2011-05-24 11:36:33,214::lvm::550::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-321::INFO::2011-05-24 11:36:33,227::sp::862::Storage.StoragePool::(_linkStorageDomain) Linking /rhev/data-center/mnt/blockSD/40b5a575-8e2e-44a7-ab56-46ab5aa4bab9 to /rhev/data-center/06b29e0f-8c22-431e-a6f1-75989a582c52/40b5a575-8e2e-44a7-ab56-46ab5aa4bab9
Thread-321::DEBUG::2011-05-24 11:36:33,279::persistentDict::144::Storage.PersistentDict::(transaction) Starting transaction
Thread-321::DEBUG::2011-05-24 11:36:33,289::persistentDict::137::Storage.PersistentDict::(transaction) Reusing active transaction
Thread-321::DEBUG::2011-05-24 11:36:33,291::persistentDict::137::Storage.PersistentDict::(transaction) Reusing active transaction
Thread-321::DEBUG::2011-05-24 11:36:33,295::persistentDict::152::Storage.PersistentDict::(transaction) Finished transaction
Thread-321::DEBUG::2011-05-24 11:36:33,305::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgchange --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"a%/dev/mapper/36006016066102900341493a64a79e011|/dev/mapper/36006016066102900351493a64a79e011|/dev/mapper/36006016066102900361493a64a79e011|/dev/mapper/360060160661029007e98a17ffa81e011|/dev/mapper/36006016066102900be8c283b0344e011%\\", \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --refresh 2b27b725-9063-4a27-899a-3ec8d02c1ade' (cwd None)
Thread-321::DEBUG::2011-05-24 11:37:59,899::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0
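
The OperationMutex lines above show the pattern behind the stall: vdsm serializes all LVM cache operations behind a single named mutex, so one slow vgchange holds up every other storage thread (Thread-321 waits from 11:35:04 to 11:36:33 for the 'lvm reload operation' to finish). A minimal sketch of that pattern in Python, with the class name and log format modeled on the log above (illustrative, not vdsm's actual implementation):

import logging
import threading

log = logging.getLogger("OperationMutex")

class OperationMutex(object):
    """Allows one named operation at a time; waiters log who blocks them."""

    def __init__(self):
        self._lock = threading.Lock()
        self._operation = None

    def acquire(self, operation):
        # Try a non-blocking acquire first, purely so that contention can
        # be logged the way the vdsm log above does.
        if not self._lock.acquire(False):
            log.debug("Operation '%s' is holding the operation mutex, "
                      "waiting...", self._operation)
            self._lock.acquire()
        self._operation = operation
        log.debug("Operation '%s' got the operation mutex", operation)

    def release(self):
        operation, self._operation = self._operation, None
        log.debug("Operation '%s' released the operation mutex", operation)
        self._lock.release()

# Every cache reload/invalidate wraps itself in the mutex, e.g.:
#   mutex.acquire("lvm invalidate operation")
#   try: ... finally: mutex.release()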

In addition, interleaved getVolumeSize calls for VMs fail in the middle of this:
Thread-49::INFO::2011-05-24 11:33:17,045::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: getVolumeSize, args: ( sdUUID=2b27b725-9063-4a27-899a-3ec8d02c1ade spUUID=06b29e0f-8c22-431e-a6f1-75989a582c52 imgUUID=46e43834-4827-408a-b444-9754446d762d volUUID=044189e5-8b79-4644-8a75-ed4037e713d2)
clientIFinit::INFO::2011-05-24 11:33:17,050::clientIF::527::vds::(_getNetworkIp) network None: using 0
Thread-49::DEBUG::2011-05-24 11:33:17,054::task::492::TaskManager.Task::(_debug) Task 1a0ba393-b832-4418-8d47-4c4b4f7489b2: moving from state init -> state preparing
clientIFinit::INFO::2011-05-24 11:33:17,057::clientIF::670::vds::(create) vmContainerLock aquired by vm 57965f76-667b-4d56-aa9f-d1508dc88ef8
Thread-49::ERROR::2011-05-24 11:33:17,058::task::856::TaskManager.Task::(_setError) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 864, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1611, in public_getVolumeSize
    self.validateConnectedPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 84, in validateConnectedPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('06b29e0f-8c22-431e-a6f1-75989a582c52',)
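
The traceback explains the interleaved failures: until connectStoragePool completes, the pool UUID is not registered on the host, so every pool-scoped verb is rejected up front. A minimal sketch of that guard in Python, assuming the connected pools are kept in a dict keyed by spUUID (the HSM attribute and method bodies here are illustrative, reconstructed from the traceback):

class StoragePoolUnknown(Exception):
    def __init__(self, spUUID):
        # Matches the message format seen in the traceback above.
        Exception.__init__(
            self, "Unknown pool id, pool not connected: %r" % ((spUUID,),))

class HSM(object):
    def __init__(self):
        # Populated only once connectStoragePool finishes; while the slow
        # vgchange calls are still running, this stays empty.
        self.pools = {}

    def validateConnectedPool(self, spUUID):
        if spUUID not in self.pools:
            raise StoragePoolUnknown(spUUID)

    def getVolumeSize(self, sdUUID, spUUID, imgUUID, volUUID):
        self.validateConnectedPool(spUUID)
        # Volume lookup elided; it is never reached while the pool is
        # unknown, which is what produces the errors above.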

Comment 2 Federico Simoncelli 2011-06-10 14:15:23 UTC
There are two commands that take a long time to complete; both are lvm vgchange invocations:

Thread-321::DEBUG::2011-05-24 11:33:42,997::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgchange --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"a%/dev/mapper/36006016066102900341493a64a79e011|/dev/mapper/36006016066102900351493a64a79e011|/dev/mapper/36006016066102900361493a64a79e011|/dev/mapper/360060160661029007e98a17ffa81e011|/dev/mapper/36006016066102900be8c283b0344e011%\\", \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --refresh 40b5a575-8e2e-44a7-ab56-46ab5aa4bab9' (cwd None)
Thread-321::DEBUG::2011-05-24 11:35:04,221::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0

Thread-321::DEBUG::2011-05-24 11:36:33,305::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgchange --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"a%/dev/mapper/36006016066102900341493a64a79e011|/dev/mapper/36006016066102900351493a64a79e011|/dev/mapper/36006016066102900361493a64a79e011|/dev/mapper/360060160661029007e98a17ffa81e011|/dev/mapper/36006016066102900be8c283b0344e011%\\", \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --refresh 2b27b725-9063-4a27-899a-3ec8d02c1ade' (cwd None)
Thread-321::DEBUG::2011-05-24 11:37:59,899::lvm::359::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0

The --refresh option is known to take a long time in VGs with many LVs.
Resolution of this issue depends on bug 658639 - [RHEL 6.2] [RFE] LVM scalability for RHEV.
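
For reference, the slow path can be timed outside vdsm by re-running the same vgchange --refresh invocation the log shows. A minimal sketch in Python (run as root; the VG name is taken from the log above, and the devices { filter = [...] } section is omitted for brevity, so this is an approximation of what vdsm executes):

import subprocess
import time

# Modeled on the --config string in the vdsm log above.
LVM_CONFIG = ('global { locking_type=1 prioritise_write_locks=1 '
              'wait_for_locks=1 }')

def refresh_vg(vg_name):
    """Time one 'vgchange --refresh'; cost grows with the number of LVs."""
    cmd = ["lvm", "vgchange", "--config", LVM_CONFIG, "--refresh", vg_name]
    start = time.time()
    rc = subprocess.call(cmd)
    print("vgchange --refresh %s: rc=%d in %.1f s"
          % (vg_name, rc, time.time() - start))

refresh_vg("40b5a575-8e2e-44a7-ab56-46ab5aa4bab9")  # VG from the log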

Comment 5 Eduardo Warszawski 2011-06-24 05:42:23 UTC
http://gerrit.usersys.redhat.com/#change,621

Comment 8 Haim 2011-07-20 14:37:31 UTC
Verified (4.9-83). On a host loaded with 120 LUNs, connectStoragePool takes a minute and a half.
There are still 5 lvm commands: 4 for specific VGs and one general one.
There is still work to be done here, but this BZ can be closed.

Comment 9 errata-xmlrpc 2011-12-06 07:19:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html