Bug 1014942
| Field | Value |
|---|---|
| Summary | [vdsm] Creating several disks at once sometimes fails CreateVolume with LVReplaceTagError |
| Product | Red Hat Enterprise Virtualization Manager |
| Component | vdsm |
| Version | 3.3.0 |
| Target Release | 3.3.0 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | unspecified |
| Reporter | Gadi Ickowicz <gickowic> |
| Assignee | Nir Soffer <nsoffer> |
| QA Contact | Gadi Ickowicz <gickowic> |
| CC | abaron, acanan, amureini, bazulay, gickowic, iheim, lpeer, nlevinki, nsoffer, scohen, srevivo, yeylon |
| Flags | nsoffer: needinfo-, amureini: Triaged+ |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | storage |
| oVirt Team | Storage |
| Fixed In Version | is22 |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2014-01-21 16:17:04 UTC |
| Bug Blocks | 1019461, 1032811 |

Doc Text:

The udev cache was used to obtain the device list, but it sometimes returned an incomplete list, which led to missing-volume errors. As a result, creating several disks in parallel could fail.

LVM disabled the udev cache in RHEL 6.5, eliminating this error. However, on machines upgraded from earlier versions, LVM does not change the existing configuration during the upgrade; it is the responsibility of the system administrator to review the new LVM configuration and update the old configuration with the new values.

The udev cache is now also disabled in the vdsm LVM configuration, which ensures a correct configuration on upgraded machines as well.
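For illustration only: the fix described above amounts to vdsm forcing the udev device cache off in the LVM configuration it controls. A sketch of the relevant section (the real vdsm-managed configuration also carries device filter and locking settings, as the lvm command lines in the logs below show):

```
devices {
    obtain_device_list_from_udev = 0
}
```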
1. Can you share the code running in the automated tests that reproduce this?
2. Is it possible that a previous test was not finished before the new test began? Do you run the test with a fresh storage domain, or several tests on the same domain one after another?

Created attachment 809892 [details]
vdsm logs from failed run and successful run

(In reply to Nir Soffer from comment #1)
> 1. Can you share the code running in the automated tests that reproduce this?
> 2. Is it possible that a previous test was not finished before the new test
> began? Do you run the test with a fresh storage domain or run several tests
> on the same domain one after another?

The storage domain is a new storage domain each time. Attaching an archive with a successful run and a failed run for comparison.

Can you provide the /var/log/messages log for the duration of the test, for both a successful and a failed run?

Created attachment 809991 [details]
var/log/messages for failed and successful run

Added /var/log/messages for both a successful and a failed run.
First, please complete the basic details:

- RHEL version:
- multipathd version:
- lvm version:

Second, please run the same test with the previous release, IS 16.

Please run the same test with build IS 15; we have a theory that the failure is caused by a change introduced in IS 16.

Nir, please try to ask for all the missing info you need at once. As Gadi wrote, we added logs of a successful run and a failed run for comparison. We can't rerun, as we have other things in our pipe. I am sure you can recheck it without us (you asked to run using IS 15 and then asked for IS 16). I agree that Gadi could attach the automation code, but you can understand the flow from the logs; he also mentioned what the test does. We will do our best to help, based on our priorities.

(In reply to Aharon Canan from comment #7)
When your priorities allow and you run this test again, please enable lvm logging. The default lvm log level is 0, meaning no messages are logged, even for critical errors. Please set the lvm log level to 7 (LOG_DEBUG) while running this test.

To change the lvm log level, edit /etc/lvm/lvm.conf, find the log section, and set level to 7:

    log {
        ...
        level = 7
        ...
    }

This bug is caused by an lvm regression in 6.5; see bug 1020401.

A fix for lvm bug 1020401 is in QA now and will be available in lvm2-2.02.100-7.el6. Please test again when this version is available.

If you want to test now, you can edit /etc/lvm/lvm.conf and disable the obtain_device_list_from_udev option:

    devices {
        obtain_device_list_from_udev = 0
    }

The patch was merged upstream.

(In reply to Nir Soffer from comment #10)
> A fix for lvm bug 1020401 is in QA now and will be available in
> lvm2-2.02.100-7.el6.

This package is installed on our hosts, and so far I have not seen this error.

> Please test again when this version is available.
>
> If you want to test now, you can edit /etc/lvm/lvm.conf and disable the
> obtain_device_list_from_udev option:
>
>     devices {
>         obtain_device_list_from_udev = 0
>     }

(In reply to Gadi Ickowicz from comment #12)
> This package is installed on our hosts, and so far I have not seen this error.

It depends on how you installed the host. If you did a clean install of 6.5, this issue should be resolved with any vdsm build. But if you installed an older release that includes lvm2 < 2.02.100-7, your lvm.conf will not be updated with the new lvm defaults, and the issue will be reproduced with current vdsm builds, but should be fixed by the next build containing the fix.

Are the test machines upgraded from 6.4 to 6.5?

(In reply to Nir Soffer from comment #13)
> Are the test machines upgraded from 6.4 to 6.5?

The machine is a clean install of 6.5.

Unless you want to test a machine upgraded from 6.4, you can close this bug. Now that we have included our own fix, we no longer depend on bug 1020401.

Verified on is22 - the error did not show up in any nightly automation runs since is22.

This bug is currently attached to errata RHBA-2013:15291.
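The manual lvm.conf edit suggested in the comments above can also be scripted. Below is a minimal sketch in Python (disable_udev_device_list is a hypothetical helper for illustration, not vdsm's actual configurator) that forces obtain_device_list_from_udev = 0 in lvm.conf-style text, inserting the option when an older configuration omits it:

```python
import re

def disable_udev_device_list(conf_text):
    # If the option is already present, force its value to 0.
    pattern = re.compile(r"(obtain_device_list_from_udev\s*=\s*)\d+")
    if pattern.search(conf_text):
        return pattern.sub(r"\g<1>0", conf_text)
    # Option absent (typical for a config carried over from a pre-6.5
    # install): insert it at the top of the devices section.
    return conf_text.replace(
        "devices {", "devices {\n    obtain_device_list_from_udev = 0", 1)

old_conf = "devices {\n    obtain_device_list_from_udev = 1\n}\n"
print(disable_udev_device_list(old_conf))
```

In practice the same effect is achieved by editing /etc/lvm/lvm.conf by hand as described above; the vdsm fix itself disables the option in the LVM configuration that vdsm controls.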
If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore.')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format, please refer to:
https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html
Created attachment 806897 [details]
vdsm + engine logs

Description of problem:
Sometimes, when creating several vm disks in parallel (for several vms), one of the disk creations fails with LVReplaceTagError. So far this was only seen using automated tests.

    9fc26478-80c6-4315-a4d0-386f9394f5c0::ERROR::2013-10-02 01:56:20,557::volume::505::Storage.Volume::(create) Unexpected error
    Traceback (most recent call last):
      File "/usr/share/vdsm/storage/volume.py", line 466, in create
        srcVolUUID, imgPath, volPath)
      File "/usr/share/vdsm/storage/blockVolume.py", line 205, in _create
        addTags=mdTags)
      File "/usr/share/vdsm/storage/lvm.py", line 1198, in changeLVTags
        (lvname, ", ".join(addTags), ", ".join(delTags), err[-1]))
    LogicalVolumeReplaceTagError: Replace Logical Volume tag error: ('lv: `48874cca-4e85-4beb-8afe-3d72a0d908cc/2c612fee-8612-4df1-8ff8-899ecd3b98df` add: `MD_6, PU_00000000-0000-0000-0000-000000000000, IU_3825fdcd-1290-4c88-9607-ac7397b4b211` del: `OVIRT_VOL_INITIALIZING` ( Skipping volume group 48874cca-4e85-4beb-8afe-3d72a0d908cc)',)
    c26ef2d7-e6b3-4b96-8f67-c6128b6575f0::DEBUG::2013-10-02 01:56:20,559::lvm::439::OperationMutex::(_reloadlvs) Operation 'lvm reload operation' got the operation mutex
    9fc26478-80c6-4315-a4d0-386f9394f5c0::DEBUG::2013-10-02 01:56:20,561::resourceManager::615::ResourceManager::(releaseResource) Trying to release resource '48874cca-4e85-4beb-8afe-3d72a0d908cc_imageNS.3825fdcd-1290-4c88-9607-ac7397b4b211'
    9fc26478-80c6-4315-a4d0-386f9394f5c0::DEBUG::2013-10-02 01:56:20,563::resourceManager::634::ResourceManager::(releaseResource) Released resource '48874cca-4e85-4beb-8afe-3d72a0d908cc_imageNS.3825fdcd-1290-4c88-9607-ac7397b4b211' (0 active users)
    9fc26478-80c6-4315-a4d0-386f9394f5c0::DEBUG::2013-10-02 01:56:20,564::resourceManager::640::ResourceManager::(releaseResource) Resource '48874cca-4e85-4beb-8afe-3d72a0d908cc_imageNS.3825fdcd-1290-4c88-9607-ac7397b4b211' is free, finding out if anyone is waiting for it.
    9fc26478-80c6-4315-a4d0-386f9394f5c0::DEBUG::2013-10-02 01:56:20,565::resourceManager::648::ResourceManager::(releaseResource) No one is waiting for resource '48874cca-4e85-4beb-8afe-3d72a0d908cc_imageNS.3825fdcd-1290-4c88-9607-ac7397b4b211', Clearing records.
    c26ef2d7-e6b3-4b96-8f67-c6128b6575f0::DEBUG::2013-10-02 01:56:20,566::lvm::308::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm lvs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \'a|/dev/mapper/3514f0c54d5c017fe|/dev/mapper/3514f0c54d5c017ff|/dev/mapper/3514f0c54d5c01800|/dev/mapper/3514f0c54d5c0181b|/dev/mapper/3514f0c54d5c01832|/dev/mapper/3514f0c54d5c01833|/dev/mapper/3514f0c54d5c018ba|\', \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags 48874cca-4e85-4beb-8afe-3d72a0d908cc' (cwd None)
    9fc26478-80c6-4315-a4d0-386f9394f5c0::ERROR::2013-10-02 01:56:20,574::task::850::TaskManager.Task::(_setError) Task=`9fc26478-80c6-4315-a4d0-386f9394f5c0`::Unexpected error
    Traceback (most recent call last):
      File "/usr/share/vdsm/storage/task.py", line 857, in _run
        return fn(*args, **kargs)
      File "/usr/share/vdsm/storage/task.py", line 318, in run
        return self.cmd(*self.argslist, **self.argsdict)
      File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
        return f(self, *args, **kwargs)
      File "/usr/share/vdsm/storage/sp.py", line 1994, in createVolume
        desc=desc, srcImgUUID=srcImgUUID, srcVolUUID=srcVolUUID)
      File "/usr/share/vdsm/storage/sd.py", line 411, in createVolume
        preallocate, diskType, volUUID, desc, srcImgUUID, srcVolUUID)
      File "/usr/share/vdsm/storage/volume.py", line 466, in create
        srcVolUUID, imgPath, volPath)
      File "/usr/share/vdsm/storage/blockVolume.py", line 205, in _create
        addTags=mdTags)
      File "/usr/share/vdsm/storage/lvm.py", line 1198, in changeLVTags
        (lvname, ", ".join(addTags), ", ".join(delTags), err[-1]))
    LogicalVolumeReplaceTagError: Replace Logical Volume tag error: ('lv: `48874cca-4e85-4beb-8afe-3d72a0d908cc/2c612fee-8612-4df1-8ff8-899ecd3b98df` add: `MD_6, PU_00000000-0000-0000-0000-000000000000, IU_3825fdcd-1290-4c88-9607-ac7397b4b211` del: `OVIRT_VOL_INITIALIZING` ( Skipping volume group 48874cca-4e85-4beb-8afe-3d72a0d908cc)',)

Version-Release number of selected component (if applicable):
vdsm-4.12.0-170.el6ev.x86_64

How reproducible:
~50%

Steps to Reproduce:
Reproduced through automated tests.
1. Try to create 6 vms in parallel, each with a 10GB thin provision disk.

Actual results:
Sometimes one of the disk creations fails.

Expected results:
The disks should be created.

Additional info:
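The reproduction step above boils down to issuing several disk-creation calls concurrently. A minimal sketch of that pattern (create_vm_disk is a hypothetical stub; the actual test drives the engine API to create real thin-provisioned disks):

```python
from concurrent.futures import ThreadPoolExecutor

def create_vm_disk(index):
    # Stub standing in for the CreateVolume request the automated test
    # sends for each vm; the real call goes through the engine API.
    return "disk-%d" % index

# Six creations in flight at once, mirroring step 1 above.
with ThreadPoolExecutor(max_workers=6) as pool:
    results = list(pool.map(create_vm_disk, range(6)))
print(results)
```

With the buggy udev cache enabled, it is exactly this kind of concurrent load that intermittently hit the incomplete device list and failed with LVReplaceTagError.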