Bug 1022976 - SD is partially accessible after extending.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Sergey Gotliv
QA Contact: Aharon Canan
Whiteboard: storage
Keywords: ZStream
Depends On: 1023206
Blocks: 1025467 3.3snap3
 
Reported: 2013-10-24 07:49 EDT by Pavel Zhukov
Modified: 2016-02-10 12:31 EST
CC: 21 users

See Also:
Fixed In Version: is24
Doc Type: Bug Fix
Doc Text:
LvmCache did not invalidate stale filters, so after adding a new FC or iSCSI LUN to a volume group, hosts could not access the storage domains and became non-operational. Now, all filters are validated after a new device is added and before the storage domain is extended, so hosts can access storage domains which have been extended.
Story Points: ---
Clone Of:
Cloned to: 1025467
Environment:
Last Closed: 2014-01-21 11:19:15 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers:
Red Hat Knowledge Base (Solution) 522173 (last updated: never)
oVirt gerrit 20552 (last updated: never)
oVirt gerrit 21223 (last updated: never)
Red Hat Product Errata RHBA-2014:0040: normal, SHIPPED_LIVE, "vdsm bug fix and enhancement update", last updated 2014-01-21 15:26:21 EST

Description Pavel Zhukov 2013-10-24 07:49:14 EDT
Description of problem:
After extending the SD, hosts went to non-operational status because the SD became inaccessible.

Version-Release number of selected component (if applicable):
vdsm-4.10.2-23.0.el6ev.x86_64

How reproducible:
Not known yet

Steps to Reproduce:
1. Map new LUN
2. Run multipath -r (optional)
3. Extend SD

Actual results:
Hosts went to Non-Operational state. VMs started to migrate, but the migrations failed and the VMs got stuck in 'Migrating from' status.

Expected results:
Hosts are up 

Additional info:
Comment 7 Pavel Zhukov 2013-10-24 16:37:49 EDT
It looks like the vgs command (_reloadvgs) returns 0 even if PVs are missing [1]; as a result, domainThreadMonitor uses the wrong lvmcache (with stale filters).
vgck returns 5 if PVs are missing, and the cmd function then invalidates the filters and retries with new ones. But chkVG doesn't update the lvmcache.

[1] 
Thread-688802::DEBUG::2013-10-23 11:09:01,967::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \'a%20017380063ea059a|20017380063ea059b|20017380063ea06a3|20017380063ea06a4|20017380063ea06a5|20017380063ea06a6|20017380063ea06a7|20017380063ea06a8|20017380063ea06b6|20017380063ea06b7|20017380063ea06b8|20017380063ea06b9|20017380063ea06ba|20017380063ea06bb|20017380063ea06bc|20017380063ea06bd|20017380063ea06be|20017380063ea06bf|20017380063ea06c0|20017380063ea06c1|20017380063ea06c2|20017380063ea06c3|20017380063ea06c4|20017380063ea06c5|20017380063ea06c6|20017380063ea06c7|20017380063ea06c8|20017380063ea06c9|20017380063ea06ca|20017380063ea06cb|20017380063ea084b|20017380063ea084c|20017380063ea084d|20017380063ea084e|20017380063ea084f%\', \'r%.*%\' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free 8a9259ec-90c7-455a-ba90-9d29584425e4' (cwd None)
Thread-688802::DEBUG::2013-10-23 11:09:02,639::misc::83::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = "  Couldn't find device with uuid hegDlo-Q0sQ-bmf3-E293-bIJF-0fj3-85jMDP.\n  Couldn't find device with uuid FkqpOc-6XDn-2igg-nA2n-110Q-LHlU-TwqiL9.\n  Couldn't find device with uuid VUSiHy-oTWE-ORNh-HkxU-TDu6-GBNk-pgwZBo.\n  Couldn't find device with uuid lGs4mM-wZix-pYYn-8uTr-i3As-Ocnm-PM1Aia.\n  Couldn't find device with uuid 4WtfsQ-Kpxm-uylt-MQ2R-hJjb-KEUU-SI1v66.\n"; <rc> = 0
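
For illustration only, here is a minimal Python sketch of the caching problem described above. This is not the vdsm implementation; LvmCacheSketch, _build_filter and reload_vgs are hypothetical names. It shows why a wrapper that rebuilds its device filter only on a non-zero exit code keeps using a stale filter when vgs reports missing PVs but still exits 0:

import subprocess


class LvmCacheSketch:
    """Cached-filter wrapper around LVM commands (illustration only)."""

    def __init__(self):
        self._filter = None  # cached LVM "filter = [...]" string

    def _build_filter(self):
        # A real implementation would rebuild this from the multipath
        # devices currently visible on the host.
        return "[ 'a|^/dev/mapper/|', 'r|.*|' ]"

    def invalidate_filter(self):
        self._filter = None

    def _run_lvm(self, *args):
        if self._filter is None:
            self._filter = self._build_filter()
        config = 'devices { filter = %s }' % self._filter
        cmd = ['lvm'] + list(args) + ['--config', config]
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return proc.returncode, proc.stdout, proc.stderr

    def reload_vgs(self, vg_name):
        rc, out, err = self._run_lvm('vgs', '--noheadings', vg_name)
        # vgs can exit 0 even when PVs are missing (see the log above), so
        # the exit code alone is not a reliable trigger for rebuilding the
        # filter. Checking for missing devices (or running vgck, which
        # exits 5 in that case) and then invalidating the cached filter
        # once is the behaviour this comment argues for.
        if rc != 0 or "Couldn't find device" in err:
            self.invalidate_filter()
            rc, out, err = self._run_lvm('vgs', '--noheadings', vg_name)
        return rc, out, err

A complete fix would also have to cover the chkVG path, which is exactly the gap pointed out above.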
Comment 10 Nir Soffer 2013-10-27 19:58:17 EDT
This seems to be the error in this log:

This vgck command failed because of stale filters:

Thread-688802::DEBUG::2013-10-23 11:09:02,647::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm vgck --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \'a%20017380063ea059a|20017380063ea059b|20017380063ea06a3|20017380063ea06a4|20017380063ea06a5|20017380063ea06a6|20017380063ea06a7|20017380063ea06a8|20017380063ea06b6|20017380063ea06b7|20017380063ea06b8|20017380063ea06b9|20017380063ea06ba|20017380063ea06bb|20017380063ea06bc|20017380063ea06bd|20017380063ea06be|20017380063ea06bf|20017380063ea06c0|20017380063ea06c1|20017380063ea06c2|20017380063ea06c3|20017380063ea06c4|20017380063ea06c5|20017380063ea06c6|20017380063ea06c7|20017380063ea06c8|20017380063ea06c9|20017380063ea06ca|20017380063ea06cb|20017380063ea084b|20017380063ea084c|20017380063ea084d|20017380063ea084e|20017380063ea084f%\', \'r%.*%\' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " 8a9259ec-90c7-455a-ba90-9d29584425e4' (cwd None)
Thread-688802::DEBUG::2013-10-23 11:09:03,230::misc::83::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = "  Couldn't find device with uuid hegDlo-Q0sQ-bmf3-E293-bIJF-0fj3-85jMDP.\n  Couldn't find device with uuid FkqpOc-6XDn-2igg-nA2n-110Q-LHlU-TwqiL9.\n  Couldn't find device with uuid VUSiHy-oTWE-ORNh-HkxU-TDu6-GBNk-pgwZBo.\n  Couldn't find device with uuid lGs4mM-wZix-pYYn-8uTr-i3As-Ocnm-PM1Aia.\n  Couldn't find device with uuid 4WtfsQ-Kpxm-uylt-MQ2R-hJjb-KEUU-SI1v66.\n  The volume group is missing 5 physical volumes.\n"; <rc> = 5

Then the filters are invalidated and the command is run again, and it succeeds:

Thread-688802::DEBUG::2013-10-23 11:09:03,238::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm vgck --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \'a%20017380063ea059a|20017380063ea059b|20017380063ea06a3|20017380063ea06a4|20017380063ea06a5|20017380063ea06a6|20017380063ea06a7|20017380063ea06a8|20017380063ea06b6|20017380063ea06b7|20017380063ea06b8|20017380063ea06b9|20017380063ea06ba|20017380063ea06bb|20017380063ea06bc|20017380063ea06bd|20017380063ea06be|20017380063ea06bf|20017380063ea06c0|20017380063ea06c1|20017380063ea06c2|20017380063ea06c3|20017380063ea06c4|20017380063ea06c5|20017380063ea06c6|20017380063ea06c7|20017380063ea06c8|20017380063ea06c9|20017380063ea06ca|20017380063ea06cb|20017380063ea084b|20017380063ea084c|20017380063ea084d|20017380063ea084e|20017380063ea084f|20017380063ea08c5|20017380063ea08c6|20017380063ea08c7|20017380063ea08c8|20017380063ea08c9%\', \'r%.*%\' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " 8a9259ec-90c7-455a-ba90-9d29584425e4' (cwd None)

But it seems that the vg.partial flag was not corrected, and therefore selftest() raises:

Thread-688802::ERROR::2013-10-23 11:09:13,460::domainMonitor::225::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 8a9259ec-90c7-455a-ba90-9d29584425e4 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 201, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/blockSD.py", line 805, in selftest
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('8a9259ec-90c7-455a-ba90-9d29584425e4',)

So it seems that at least a partial solution is to update the VG status after running vgck.
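
For illustration, a minimal sketch of that direction (hypothetical names, not the vdsm code): after vgck succeeds with the refreshed filter, re-read the VG attributes so a stale partial flag cannot make selftest() raise.

from dataclasses import dataclass
import subprocess


class StorageDomainAccessError(Exception):
    pass


@dataclass
class CachedVG:
    name: str
    partial: bool = False


class VGCacheSketch:
    def __init__(self):
        self._vgs = {}  # vg_name -> CachedVG

    def _query_vg(self, vg_name):
        # vg_attr is a 6-character field; the 4th character is 'p' when
        # the VG is partial (one or more PVs are missing).
        proc = subprocess.run(
            ['lvm', 'vgs', '--noheadings', '-o', 'vg_attr', vg_name],
            capture_output=True, text=True, check=True)
        attrs = proc.stdout.strip()
        return CachedVG(name=vg_name,
                        partial=len(attrs) > 3 and attrs[3] == 'p')

    def check_vg(self, vg_name):
        rc = subprocess.run(['lvm', 'vgck', vg_name]).returncode
        if rc == 0:
            # The suggestion above: once vgck succeeds (after the filter
            # was refreshed), re-read the VG so the cached partial flag
            # reflects the current state, not the state seen through the
            # stale filter.
            self._vgs[vg_name] = self._query_vg(vg_name)
        return rc

    def selftest(self, vg_name):
        vg = self._vgs.get(vg_name) or self._query_vg(vg_name)
        if vg.partial:
            raise StorageDomainAccessError(vg_name)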
Comment 26 Zdenek Kabelac 2013-10-31 17:46:48 EDT
Looking at comment 15, and putting it into context with Bug 1020401, I'd suggest applying the same workaround:

Modify /etc/lvm/lvm.conf  - devices { obtain_device_list_from_udev=0 }

Udev in RHEL 6.4/6.5 is unfortunately broken and can't be fixed to work reliably under heavy workload.

Also, please remove the dependency on 1023206: vgs is not a tool for checking consistency, it's a reporting tool.
Comment 27 Nir Soffer 2013-10-31 19:24:09 EDT
(In reply to Zdenek Kabelac from comment #26)
> Looking at comment 15, and putting it into context with Bug 1020401,
> I'd suggest applying the same workaround:
> 
> Modify /etc/lvm/lvm.conf  - devices { obtain_device_list_from_udev=0 }
> 
> Udev in RHEL 6.4/6.5 is unfortunately broken and can't be fixed to work
> reliably under heavy workload.

There is no load here; this is caused by wrong filter caching in vdsm. The PV is not found because our filter is missing the new PV.
Comment 28 Aharon Canan 2013-11-24 11:46:32 EST
Verified using is24.1, after consulting with Sergey and following comment #22 and the steps in the description.
Comment 29 errata-xmlrpc 2014-01-21 11:19:15 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html
