Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Created attachment 486074[details]
full vdsm-log
Description of problem:
When trying to activate unreachable NFS storage domain - vms become non-responsive for about 200 seconds.
from watching the logs refreshIfDirty is causing the problem
Thread-4160::DEBUG::2011-03-17 18:09:37,659::misc::1011::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc._refreshDomains)
Thread-2871::DEBUG::2011-03-17 18:09:37,660::task::491::TaskManager.Task::(_debug) Task be4a9093-a850-49ba-bd38-9cac62eb8607: moving from state init -> state preparing
Thread-2871::INFO::2011-03-17 18:09:37,661::sp::1410::Storage.StoragePool::(getDomains) Get storage pool domains: {'9908a9cd-c8d2-4bc8-b4c5-57edd2625c7c': 'Active', '74b99101-ae
73-4c38-a2bf-4f8cd930e2f7': 'Attached'}
Thread-2871::DEBUG::2011-03-17 18:09:37,661::sdc::54::Storage.StorageDomainCache::(_refreshIfDirty) Cache is cold-expired
Thread-2871::DEBUG::2011-03-17 18:09:37,661::misc::1011::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc._refreshDomains)
...
Thread-5162::DEBUG::2011-03-17 18:12:59,248::
'monitorResponse': '-1' statsAge': '203.60'
Version-Release number of selected component (if applicable):
-vdsm-4.9-54.el6.x86_64
-lvm2-2.02.83-2.el6.x86_64
Steps to Reproduce:
1.Have 2 nfs DSD in the pool from 2 different servers
2.Block connection to non master domain using iptables
3.try to activate the unreachable SD
Full vdsm log attached
Comment 3Federico Simoncelli
2011-03-24 10:30:43 UTC
The vm stats are not updated (statsAge > 2) because the thread is blocked on getVolumeSize.
This is very easy to reproduce without a vm running (the monitorResponse/statsAge values are only a symptom of the problem).
See also bug 679123.
For debug purpose we can disable the pool.refresh() call in public_activateStorageDomain:
--- a/vdsm/storage/spm.py
+++ b/vdsm/storage/spm.py
@@ -1191,9 +1191,7 @@ class SPM:
hsm.HSM.validateSdUUID(sdUUID)
hsm.HSM.validatePoolSD(spUUID, sdUUID)
except:
- pool.refresh()
- hsm.HSM.validateSdUUID(sdUUID)
- hsm.HSM.validatePoolSD(spUUID, sdUUID)
+ raise
pool.activateSD(sdUUID)
Having changed that getVolumeSize doesn't get stuck anymore and we finally see the real error in the logs:
Thread-82::ERROR::2011-03-24 10:23:10,380::task::854::TaskManager.Task::(_setError) Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 862, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/storage/spm.py", line 114, in run
return self.func(*args, **kwargs)
File "/usr/share/vdsm/storage/spm.py", line 1191, in public_activateStorageDomain
hsm.HSM.validateSdUUID(sdUUID)
File "/usr/share/vdsm/storage/hsm.py", line 111, in validateSdUUID
SDF.produce(sdUUID=sdUUID).validate()
File "/usr/share/vdsm/storage/fileSD.py", line 185, in validate
self.Metadata(cache=False)
File "/usr/share/vdsm/storage/sd.py", line 482, in Metadata
return self._metadata.metadata
File "/usr/share/vdsm/storage/metadata.py", line 110, in __getmetadata
md = self._get()
File "/usr/share/vdsm/storage/fileSD.py", line 39, in _get
return misc.stripNewLines(oop.directReadLines(self.metafile))
File "/usr/share/vdsm/storage/processPool.py", line 35, in wrapper
return self.runExternally(func, *args, **kwds)
File "/usr/share/vdsm/storage/processPool.py", line 63, in runExternally
raise Timeout("Operation Stuck")
Timeout: Operation Stuck
Created attachment 486074 [details] full vdsm-log Description of problem: When trying to activate unreachable NFS storage domain - vms become non-responsive for about 200 seconds. from watching the logs refreshIfDirty is causing the problem Thread-4160::DEBUG::2011-03-17 18:09:37,659::misc::1011::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc._refreshDomains) Thread-2871::DEBUG::2011-03-17 18:09:37,660::task::491::TaskManager.Task::(_debug) Task be4a9093-a850-49ba-bd38-9cac62eb8607: moving from state init -> state preparing Thread-2871::INFO::2011-03-17 18:09:37,661::sp::1410::Storage.StoragePool::(getDomains) Get storage pool domains: {'9908a9cd-c8d2-4bc8-b4c5-57edd2625c7c': 'Active', '74b99101-ae 73-4c38-a2bf-4f8cd930e2f7': 'Attached'} Thread-2871::DEBUG::2011-03-17 18:09:37,661::sdc::54::Storage.StorageDomainCache::(_refreshIfDirty) Cache is cold-expired Thread-2871::DEBUG::2011-03-17 18:09:37,661::misc::1011::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc._refreshDomains) ... Thread-5162::DEBUG::2011-03-17 18:12:59,248:: 'monitorResponse': '-1' statsAge': '203.60' Version-Release number of selected component (if applicable): -vdsm-4.9-54.el6.x86_64 -lvm2-2.02.83-2.el6.x86_64 Steps to Reproduce: 1.Have 2 nfs DSD in the pool from 2 different servers 2.Block connection to non master domain using iptables 3.try to activate the unreachable SD Full vdsm log attached