Bug 1160620
| Summary: | createVolume (iscsi) fails on qemu-img create | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Ilanit Stein <istein> | ||||
| Component: | vdsm | Assignee: | Liron Aravot <laravot> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Raz Tamir <ratamir> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.4.3-1 | CC: | acanan, amureini, bazulay, ebenahar, ecohen, fdeutsch, gcheresh, gklein, iheim, istein, kwolf, lpeer, lsurette, nsoffer, rbalakri, Rhev-m-bugs, scohen, tnisan, yeylon | ||||
| Target Milestone: | --- | Keywords: | AutomationBlocker, Regression, Triaged, ZStream | ||||
| Target Release: | 3.5.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | storage | ||||||
| Fixed In Version: | vdsm-4.16.7.6-1.el6ev | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 1167484 (view as bug list) | Environment: | |||||
| Last Closed: | 2015-02-16 13:39:31 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1167484 | ||||||
| Attachments: |
|
||||||
|
Description
Ilanit Stein
2014-11-05 09:25:00 UTC
Created attachment 953971 [details]
test logs
Looking in the vdsm logs, it can be see that qemu-img fails inconsistently (sometimes it succeeds) - we create the lv and immediately fails on qemu-img create on input/output error.
1. Ilanit, any chance that there were issues with the storage machine? have you tried that again/on other storage? did it ever run fine on RHEL 6.6?
please attach also /var/log/messages relevant output
2. Kevin, any idea about that?
1 - failed
----------------------------------------------------------------------
Thread-31::DEBUG::2014-11-03 17:28:47,393::task::595::TaskManager.Task::(_updateState) Task=`54165d94-129f-4bcf-b48e-dd76a9dcaf54`::moving from state init -> state preparing
Thread-31::INFO::2014-11-03 17:28:47,393::logUtils::44::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='81eac9e9-9e9c-42b2-9321-a51460f87daa', spUUID='0c02850d-6a66-403e-b1ee-d5d46083685f', imgUUID='2a364c66-fc99-40b2-ae7f-a6c9a4e4430e', size='3221225472', volFormat=4, preallocate=2, diskType=2, volUUID='b1463849-697e-4f8d-8bc0-4c469c2f9c56', desc='', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000')
Thread-31::DEBUG::2014-11-03 17:28:47,394::resourceManager::198::ResourceManager.Request::(__init__) ResName=`Storage.81eac9e9-9e9c-42b2-9321-a51460f87daa`ReqID=`a4d32dcc-b9a2-4c89-a02d-51e731d38345`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '1464' at 'createVolume'
LV operations:
ab5d4378-cb58-4be8-b069-07a13cf5b0ef::DEBUG::2014-11-03 17:28:51,059::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm lvcreate --config " dev
ices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filte
r = [ \'a|/dev/mapper/360060160f4a030001418580f8161e411|/dev/mapper/360060160f4a0300020761df78f62e411|/dev/mapper/360060160f4a030005c85dd3b9062e411|/dev/mappe
r/360060160f4a030007e6871105c61e411|/dev/mapper/360060160f4a0300080b208299062e411|/dev/mapper/360060160f4a03000d2ef2a189062e411|/dev/mapper/360060160f4a03000d
cb0b29c5e63e411|\', \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_da
ys = 0 } " --autobackup n --contiguous n --size 1024m --addtag OVIRT_VOL_INITIALIZING --name 93690b57-0297-4f8d-a7c1-4f670064485b 81eac9e9-9e9c-42b2-9321-a514
60f87daa' (cwd None)
f4cdf24f-4152-49c9-89b4-278f753fbb41::DEBUG::2014-11-03 17:28:51,131::lvm::296::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: This metadata update i
s NOT backed up\n WARNING: This metadata update is NOT backed up\n'; <rc> = 0
f4cdf24f-4152-49c9-89b4-278f753fbb41::DEBUG::2014-11-03 17:28:51,132::lvm::513::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the
operation mutex
f4cdf24f-4152-49c9-89b4-278f753fbb41::DEBUG::2014-11-03 17:28:51,132::lvm::525::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex
f4cdf24f-4152-49c9-89b4-278f753fbb41::DEBUG::2014-11-03 17:28:51,133::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm lvchange --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/360060160f4a030001418580f8161e411|/dev/mapper/360060160f4a0300020761df78f62e411|/dev/mapper/360060160f4a030005c85dd3b9062e411|/dev/mapper/360060160f4a030007e6871105c61e411|/dev/mapper/360060160f4a0300080b208299062e411|/dev/mapper/360060160f4a03000d2ef2a189062e411|/dev/mapper/360060160f4a03000dcb0b29c5e63e411|\', \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } " --autobackup n --available n 81eac9e9-9e9c-42b2-9321-a51460f87daa/2e46ef36-7276-4d19-b1f9-59079c0e7bf1' (cwd None)
Dummy-30::DEBUG::2014-11-03 17:28:51,372::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail) 'dd if=/rhev/data-center/0c02850d-6a66-403e-b1ee-d5d46083685f/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
qemu-img:
c469c2f9c56 3145728K' (cwd None)
54165d94-129f-4bcf-b48e-dd76a9dcaf54::DEBUG::2014-11-03 17:28:48,063::volume::1042::Storage.Misc.excCmd::(createVolume) FAILED: <err> = '/rhev/data-center/0c0
2850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/images/2a364c66-fc99-40b2-ae7f-a6c9a4e4430e/b1463849-697e-4f8d-8bc0-4c469c2f9c56: error
while creating qcow2: Input/output error\n'; <rc> = 1
54165d94-129f-4bcf-b48e-dd76a9dcaf54::ERROR::2014-11-03 17:28:48,064::volume::502::Storage.Volume::(create) Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/volume.py", line 463, in create
srcVolUUID, imgPath, volPath)
File "/usr/share/vdsm/storage/blockVolume.py", line 165, in _create
preallocate)
File "/usr/share/vdsm/storage/volume.py", line 1044, in createVolume
raise se.VolumeCreationError(out)
VolumeCreationError: Error creating a new volume: (["Formatting '/rhev/data-center/0c02850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/i
mages/2a364c66-fc99-40b2-ae7f-a6c9a4e4430e/b1463849-697e-4f8d-8bc0-4c469c2f9c56', fmt=qcow2 size=3221225472 encryption=off cluster_size=65536 "],)
----------------------------------------------------------------------
succeeds:
----------------------------------------------------------------------
Thread-31::INFO::2014-11-03 17:28:39,651::logUtils::44::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='81eac9e9-9e9c-42b2-9321-a51460f87daa', spUUID='0c02850d-6a66-403e-b1ee-d5d46083685f', imgUUID='0e120fde-8bde-4817-83dc-10f542229007', size='3221225472', volFormat=4, preallocate=2, diskType=2, volUUID='91597b02-0892-4f43-9e2e-31da2735213d', desc='', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000')
db851c79-2ea8-4297-a6ae-5507a630c2b2::DEBUG::2014-11-03 17:28:40,443::volume::1042::Storage.Misc.excCmd::(createVolume) '/usr/bin/qemu-img create -f qcow2 /rhev/data-center/0c02850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/images/0e120fde-8bde-4817-83dc-10f542229007/91597b02-0892-4f43-9e2e-31da2735213d 3145728K' (cwd None)
db851c79-2ea8-4297-a6ae-5507a630c2b2::DEBUG::2014-11-03 17:28:40,461::volume::1042::Storage.Misc.excCmd::(createVolume) SUCCESS: <err> = ''; <rc> = 0
----------------------------------------------------------------------
(In reply to Liron Aravot from comment #3) > 2. Kevin, any idea about that? It's always hard to understand VDSM logs. Which part of the following error actually comes from qemu-img? > c469c2f9c56 3145728K' (cwd None) > 54165d94-129f-4bcf-b48e-dd76a9dcaf54::DEBUG::2014-11-03 > 17:28:48,063::volume::1042::Storage.Misc.excCmd::(createVolume) FAILED: > <err> = '/rhev/data-center/0c0 > 2850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/ > images/2a364c66-fc99-40b2-ae7f-a6c9a4e4430e/b1463849-697e-4f8d-8bc0- > 4c469c2f9c56: error > while creating qcow2: Input/output error\n'; <rc> = 1 I checked for the string "error while creating" in qemu-img and it doesn't exist. This part must come from VDSM. This makes me wonder where the "Input/output error" comes from. If this was a qemu-img output line, it should be including 'qemu-img:' and the file name - does VDSM parse the output lines? Anyway -EIO isn't used internally in qemu-img in the path for creating qcow2 images. This must be an error coming from the kernel, probably in response to a write request. You could try to confirm with strace, if you can reproduce this manually outside VDSM. I have the same behavior in 3.5 Although I can see Status as locked in xml, it fails on Status not found (In reply to comment #3) Ilanit, any chance that there were issues with the storage machine? -> Not that I am aware of. have you tried that again/on other storage? -> No did it ever run fine on RHEL 6.6? -> Yes, on av12.1 please attach also /var/log/messages relevant output -> It is already attached (test logs, see for each server, has it's own messages log. This might be related to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1162078 (In reply to Kevin Wolf from comment #4) > (In reply to Liron Aravot from comment #3) > > 2. Kevin, any idea about that? > > It's always hard to understand VDSM logs. Which part of the following error > actually comes from qemu-img? > > > c469c2f9c56 3145728K' (cwd None) > > 54165d94-129f-4bcf-b48e-dd76a9dcaf54::DEBUG::2014-11-03 > > 17:28:48,063::volume::1042::Storage.Misc.excCmd::(createVolume) FAILED: > > <err> = '/rhev/data-center/0c0 > > 2850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/ > > images/2a364c66-fc99-40b2-ae7f-a6c9a4e4430e/b1463849-697e-4f8d-8bc0- > > 4c469c2f9c56: error > > while creating qcow2: Input/output error\n'; <rc> = 1 > > I checked for the string "error while creating" in qemu-img and it doesn't > exist. This part must come from VDSM. This makes me wonder where the > "Input/output error" comes from. > > If this was a qemu-img output line, it should be including 'qemu-img:' and > the file name - does VDSM parse the output lines? > Hi Kevin, that error comes from qemu-img, i assume it's added somewhere along the way - see the above when running it manually on my machine [root@dhcp-1-43 ~]# /usr/bin/qemu-img create -f qcow2 /mnt/asdd/a.zzz 3145728K Formatting '/mnt/asdd/a.zzz', fmt=qcow2 size=3221225472 encryption=off cluster_size=65536 /mnt/asdd/a.zzz: error while creating qcow2: No such file or directory looking in /var/log/messages i see no difference between the sucessful creation to the failed one: failed- ---------------------------------------------------------- Nov 3 17:28:46 puma18 multipathd: dm-16: remove map (uevent) Nov 3 17:28:46 puma18 multipathd: dm-16: devmap not registered, can't remove Nov 3 17:28:48 puma18 vdsm root WARNING File: /rhev/data-center/0c02850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/images/2a364c66-fc99-40b2-ae7f-a6c9a4e4430e/b1463849-697e-4f8d-8bc0-4c469c2f9c56 already removed Nov 3 17:28:48 puma18 vdsm TaskManager.Task ERROR Task=`54165d94-129f-4bcf-b48e-dd76a9dcaf54`::Unexpected error#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line 873, in _run#012 return fn(*args, **kargs)#012 File "/usr/share/vdsm/storage/task.py", line 334, in run#012 return self.cmd(*self.argslist, **self.argsdict)#012 File "/usr/share/vdsm/storage/securable.py", line 75, in wrapper#012 return method(self, *args, **kwargs)#012 File "/usr/share/vdsm/storage/sp.py", line 1771, in createVolume#012 desc=desc, srcImgUUID=srcImgUUID, srcVolUUID=srcVolUUID)#012 File "/usr/share/vdsm/storage/sd.py", line 421, in createVolume#012 preallocate, diskType, volUUID, desc, srcImgUUID, srcVolUUID)#012 File "/usr/share/vdsm/storage/volume.py", line 463, in create#012 srcVolUUID, imgPath, volPath)#012 File "/usr/share/vdsm/storage/blockVolume.py", line 165, in _create#012 preallocate)#012 File "/usr/share/vdsm/storage/volume.py", line 1044, in createVolume#012 raise se.VolumeCreationError(out)#012VolumeCreationError: Error creating a new volume: (["Formatting '/rhev/data-center/0c02850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/images/2a364c66-fc99-40b2-ae7f-a6c9a4e4430e/b1463849-697e-4f8d-8bc0-4c469c2f9c56', fmt=qcow2 size=3221225472 encryption=off cluster_size=65536 "],) Nov 3 17:28:48 puma18 multipathd: dm-16: remove map (uevent) Nov 3 17:28:48 puma18 multipathd: dm-16: devmap not registered, can't remove ---------------------------------------------------------- succeeded: ---------------------------------------------------------- Nov 3 17:26:15 puma18 cpuspeed: Disabling ondemand cpu frequency scaling governor Nov 3 17:26:15 puma18 cpuspeed: Enabling performance cpu frequency scaling governor Nov 3 17:28:40 puma18 vdsm root WARNING File: /rhev/data-center/0c02850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/images/0e120fde-8bde-4817-83dc-10f542229007/91597b02-0892-4f43-9e2e-31da2735213d already removed Nov 3 17:28:41 puma18 multipathd: dm-16: remove map (uevent) Nov 3 17:28:41 puma18 multipathd: dm-16: devmap not registered, can't remove Nov 3 17:28:41 puma18 multipathd: dm-16: remove map (uevent) Nov 3 17:28:41 puma18 multipathd: dm-16: devmap not registered, can't remove ---------------------------------------------------------- > Anyway -EIO isn't used internally in qemu-img in the path for creating qcow2 > images. This must be an error coming from the kernel, probably in response > to a > write request. You could try to confirm with strace, if you can reproduce > this > manually outside VDSM. Well that doesn't reproduce consistently, as some calls finish with success. I assume from your answer that you haven't encountered that issue. Ilanit, any chance you can try the above just be on the safe side on the test env? Kevin, any ideas on how to proceed with that besides that? thanks (In reply to Liron Aravot from comment #8) > (In reply to Kevin Wolf from comment #4) > > I checked for the string "error while creating" in qemu-img and it doesn't > > exist. This part must come from VDSM. This makes me wonder where the > > "Input/output error" comes from. > > > > If this was a qemu-img output line, it should be including 'qemu-img:' and > > the file name - does VDSM parse the output lines? > > > > Hi Kevin, > that error comes from qemu-img, i assume it's added somewhere along the way > - see the above when running it manually on my machine Sorry, yes. I was looking at the RHEL 7 version of qemu-img. The conclusion that there is no internal usage of -EIO remains the same for RHEL 6, though. The error must come from the kernel. > Well that doesn't reproduce consistently, as some calls finish with success. > I assume from your answer that you haven't encountered that issue. No, it doesn't ring a bell. > Kevin, any ideas on how to proceed with that besides that? I think finding a minimal reproducer outside of VDSM is the key. My expectation is that the same should be possible to trigger even without qemu-img then, with something as simple as dd, but an strace of qemu-img (with children) of a failing case would be useful to confirm that qemu-img isn't trying to do anything crazy. This problem was not seen on same automatic test, and same storage server, for av12.1, running on rhel6.6 hosts. Ilanit, can you check what were the qemu versions in the success/failed runs? Some more info: 1. Problem was not seen on same automatic test, and same storage server, for 3.5, running on rhel7 hosts. 2. Same test, run on av13.1, with NFS, instead of iscsi - PASS (no add disk failures) For both tests, the 1 that pass on av12.1, and these that fail, starting av12.4, the qemu-img is the same:
'qemu-img': {'release': '2.448.el6', 'buildtime': 1411764164L, 'version': '0.12.1.2'},
I ma not sure about the inner workings of vdsm, but if it's LVM/dm related the it could be related to bug 1148979. (In reply to Fabian Deutsch from comment #14) > I ma not sure about the inner workings of vdsm, but if it's LVM/dm related > the it could be related to bug 1148979. Nir, does this align with your findings? Fabian, we tested with device-mapper-multipath-0.4.9-80.el6_6.1 (which the bug you suggested was fixed in), but still managed to reproduce - so i'm afraid that this doesn't solve it. thanks, Liron *** Bug 1164030 has been marked as a duplicate of this bug. *** by mistake moved to verified. Verified - vt13.1 |