Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1160620

Summary: createVolume (iscsi) fails on qemu-img create
Product: Red Hat Enterprise Virtualization Manager Reporter: Ilanit Stein <istein>
Component: vdsmAssignee: Liron Aravot <laravot>
Status: CLOSED CURRENTRELEASE QA Contact: Raz Tamir <ratamir>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.4.3-1CC: acanan, amureini, bazulay, ebenahar, ecohen, fdeutsch, gcheresh, gklein, iheim, istein, kwolf, lpeer, lsurette, nsoffer, rbalakri, Rhev-m-bugs, scohen, tnisan, yeylon
Target Milestone: ---Keywords: AutomationBlocker, Regression, Triaged, ZStream
Target Release: 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: vdsm-4.16.7.6-1.el6ev Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1167484 (view as bug list) Environment:
Last Closed: 2015-02-16 13:39:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1167484    
Attachments:
Description Flags
test logs none

Description Ilanit Stein 2014-11-05 09:25:00 UTC
Description of problem:
In automatic test [1], 5 VMs are created, and disks are added, in parallel.
For 2 or 3 of the 5, add disk fail on: 
"VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume, code = 205"

[1] http://jenkins.qa.lab.tlv.redhat.com:8080/view/Compute/view/3.4-git/view/Virt/job/3.4-git-compute-virt-vm_migration-iscsi/68/

Version-Release number of selected component (if applicable):
It stated to show up in av12.4, and it still in av12.5, 
Till av12.2 it was not seen.
There is no test result for av12.3. 
test hosts are rhel6.6.

How reproducible:
100% on recent av12.5 test runs.

Additional info:
engine.log ERROR:
2014-11-03 15:43:02,295 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-4) Failed in HSMGetAllTasksStatusesVDS method
2014-11-03 15:43:02,299 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-4) Failed in HSMGetAllTasksStatusesVDS method
2014-11-03 15:43:02,300 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (DefaultQuartzScheduler_Worker-4) SPMAsyncTask::PollTask: Polling task 22eeac25-9005-42b5-afcb-7e205bee685c (Parent Command AddDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'.
2014-11-03 15:43:02,360 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (DefaultQuartzScheduler_Worker-4) BaseAsyncTask::LogEndTaskFailure: Task 22eeac25-9005-42b5-afcb-7e205bee685c (Parent Command AddDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
-- Result: cleanSuccess
-- Message: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume, code = 205,
-- Exception: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume, code = 205
2014-11-03 15:43:02,373 INFO  [org.ovirt.engine.core.bll.CommandAsyncTask] (DefaultQuartzScheduler_Worker-4) CommandAsyncTask::EndActionIfNecessary: All tasks of command 35307878-f0d6-4503-8c93-6716666ee888 has ended -> executing endAction
2014-11-03 15:43:02,373 INFO  [org.ovirt.engine.core.bll.CommandAsyncTask] (DefaultQuartzScheduler_Worker-4) CommandAsyncTask::endAction: Ending action for 1 tasks (command ID: 35307878-f0d6-4503-8c93-6716666ee888): calling endAction .
2014-11-03 15:43:02,374 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (DefaultQuartzScheduler_Worker-4) SPMAsyncTask::PollTask: Polling task 82e40619-8e53-4290-99c1-0acd5de340b7 (Parent Command AddDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'.
2014-11-03 15:43:02,377 INFO  [org.ovirt.engine.core.bll.CommandAsyncTask] (org.ovirt.thread.pool-4-thread-31) CommandAsyncTask::EndCommandAction [within thread] context: Attempting to endAction AddDisk, executionIndex: 0
2014-11-03 15:43:02,413 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (DefaultQuartzScheduler_Worker-4) BaseAsyncTask::LogEndTaskFailure: Task 82e40619-8e53-4290-99c1-0acd5de340b7 (Parent Command AddDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
-- Result: cleanSuccess
-- Message: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume, code = 205,
-- Exception: VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume, code = 205
2014-11-03 15:43:02,439 INFO  [org.ovirt.engine.core.bll.CommandAsyncTask] (DefaultQuartzScheduler_Worker-4) CommandAsyncTask::EndActionIfNecessary: All tasks of command 6c5714a4-a18c-48f8-a3c6-efa89a2acf00 has ended -> executing endAction
2014-11-03 15:43:02,440 INFO  [org.ovirt.engine.core.bll.CommandAsyncTask] (DefaultQuartzScheduler_Worker-4) CommandAsyncTask::endAction: Ending action for 1 tasks (command ID: 6c5714a4-a18c-48f8-a3c6-efa89a2acf00): calling endAction .
2014-11-03 15:43:02,440 INFO  [org.ovirt.engine.core.bll.CommandAsyncTask] (org.ovirt.thread.pool-4-thread-32) CommandAsyncTask::EndCommandAction [within thread] context: Attempting to endAction AddDisk, executionIndex: 0
2014-11-03 15:43:02,453 ERROR [org.ovirt.engine.core.bll.AddDiskCommand] (org.ovirt.thread.pool-4-thread-31) [a5e3d605-174a-4fda_addDisk_create] Ending command with failure: org.ovirt.engine.core.bll.AddDiskCommand
2014-11-03 15:43:02,486 ERROR [org.ovirt.engine.core.bll.AddImageFromScratchCommand] (org.ovirt.thread.pool-4-thread-31) [6be4e8c8] Ending command with failure: org.ovirt.engine.core.bll.AddImageFromScratchCommand
2014-11-03 15:43:02,511 ERROR [org.ovirt.engine.core.bll.AddDiskCommand] (org.ovirt.thread.pool-4-thread-32) [761c2472-76ad-4f99_addDisk_create] Ending command with failure: org.ovirt.engine.core.bll.AddDiskCommand
2014-11-03 15:43:02,559 ERROR [org.ovirt.engine.core.bll.AddImageFromScratchCommand] (org.ovirt.thread.pool-4-thread-32) [6998a7e2] Ending command with failure: org.ovirt.engine.core.bll.AddImageFromScratchCommand

Comment 1 Ilanit Stein 2014-11-05 09:27:01 UTC
Created attachment 953971 [details]
test logs

Comment 3 Liron Aravot 2014-11-09 17:07:29 UTC
Looking in the vdsm logs, it can be see that qemu-img fails inconsistently (sometimes it succeeds) - we create the lv and immediately fails on qemu-img create on input/output error.

1. Ilanit, any chance that there were issues with the storage machine? have you tried that again/on other storage? did it ever run fine on RHEL 6.6?
please attach also /var/log/messages relevant output

2. Kevin, any idea about that?


1 - failed
----------------------------------------------------------------------

Thread-31::DEBUG::2014-11-03 17:28:47,393::task::595::TaskManager.Task::(_updateState) Task=`54165d94-129f-4bcf-b48e-dd76a9dcaf54`::moving from state init -> state preparing
Thread-31::INFO::2014-11-03 17:28:47,393::logUtils::44::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='81eac9e9-9e9c-42b2-9321-a51460f87daa', spUUID='0c02850d-6a66-403e-b1ee-d5d46083685f', imgUUID='2a364c66-fc99-40b2-ae7f-a6c9a4e4430e', size='3221225472', volFormat=4, preallocate=2, diskType=2, volUUID='b1463849-697e-4f8d-8bc0-4c469c2f9c56', desc='', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000')
Thread-31::DEBUG::2014-11-03 17:28:47,394::resourceManager::198::ResourceManager.Request::(__init__) ResName=`Storage.81eac9e9-9e9c-42b2-9321-a51460f87daa`ReqID=`a4d32dcc-b9a2-4c89-a02d-51e731d38345`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '1464' at 'createVolume'


LV operations:

ab5d4378-cb58-4be8-b069-07a13cf5b0ef::DEBUG::2014-11-03 17:28:51,059::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm lvcreate --config " dev
ices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filte
r = [ \'a|/dev/mapper/360060160f4a030001418580f8161e411|/dev/mapper/360060160f4a0300020761df78f62e411|/dev/mapper/360060160f4a030005c85dd3b9062e411|/dev/mappe
r/360060160f4a030007e6871105c61e411|/dev/mapper/360060160f4a0300080b208299062e411|/dev/mapper/360060160f4a03000d2ef2a189062e411|/dev/mapper/360060160f4a03000d
cb0b29c5e63e411|\', \'r|.*|\' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_da
ys = 0 } " --autobackup n --contiguous n --size 1024m --addtag OVIRT_VOL_INITIALIZING --name 93690b57-0297-4f8d-a7c1-4f670064485b 81eac9e9-9e9c-42b2-9321-a514
60f87daa' (cwd None)
f4cdf24f-4152-49c9-89b4-278f753fbb41::DEBUG::2014-11-03 17:28:51,131::lvm::296::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: This metadata update i
s NOT backed up\n  WARNING: This metadata update is NOT backed up\n'; <rc> = 0
f4cdf24f-4152-49c9-89b4-278f753fbb41::DEBUG::2014-11-03 17:28:51,132::lvm::513::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the 
operation mutex
f4cdf24f-4152-49c9-89b4-278f753fbb41::DEBUG::2014-11-03 17:28:51,132::lvm::525::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex
f4cdf24f-4152-49c9-89b4-278f753fbb41::DEBUG::2014-11-03 17:28:51,133::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm lvchange --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/360060160f4a030001418580f8161e411|/dev/mapper/360060160f4a0300020761df78f62e411|/dev/mapper/360060160f4a030005c85dd3b9062e411|/dev/mapper/360060160f4a030007e6871105c61e411|/dev/mapper/360060160f4a0300080b208299062e411|/dev/mapper/360060160f4a03000d2ef2a189062e411|/dev/mapper/360060160f4a03000dcb0b29c5e63e411|\', \'r|.*|\' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } " --autobackup n --available n 81eac9e9-9e9c-42b2-9321-a51460f87daa/2e46ef36-7276-4d19-b1f9-59079c0e7bf1' (cwd None)
Dummy-30::DEBUG::2014-11-03 17:28:51,372::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail) 'dd if=/rhev/data-center/0c02850d-6a66-403e-b1ee-d5d46083685f/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)


qemu-img:


c469c2f9c56 3145728K' (cwd None)
54165d94-129f-4bcf-b48e-dd76a9dcaf54::DEBUG::2014-11-03 17:28:48,063::volume::1042::Storage.Misc.excCmd::(createVolume) FAILED: <err> = '/rhev/data-center/0c0
2850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/images/2a364c66-fc99-40b2-ae7f-a6c9a4e4430e/b1463849-697e-4f8d-8bc0-4c469c2f9c56: error
 while creating qcow2: Input/output error\n'; <rc> = 1
54165d94-129f-4bcf-b48e-dd76a9dcaf54::ERROR::2014-11-03 17:28:48,064::volume::502::Storage.Volume::(create) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 463, in create
    srcVolUUID, imgPath, volPath)
  File "/usr/share/vdsm/storage/blockVolume.py", line 165, in _create
    preallocate)
  File "/usr/share/vdsm/storage/volume.py", line 1044, in createVolume
    raise se.VolumeCreationError(out)
VolumeCreationError: Error creating a new volume: (["Formatting '/rhev/data-center/0c02850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/i
mages/2a364c66-fc99-40b2-ae7f-a6c9a4e4430e/b1463849-697e-4f8d-8bc0-4c469c2f9c56', fmt=qcow2 size=3221225472 encryption=off cluster_size=65536 "],)
----------------------------------------------------------------------


succeeds:
----------------------------------------------------------------------
Thread-31::INFO::2014-11-03 17:28:39,651::logUtils::44::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='81eac9e9-9e9c-42b2-9321-a51460f87daa', spUUID='0c02850d-6a66-403e-b1ee-d5d46083685f', imgUUID='0e120fde-8bde-4817-83dc-10f542229007', size='3221225472', volFormat=4, preallocate=2, diskType=2, volUUID='91597b02-0892-4f43-9e2e-31da2735213d', desc='', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000')

db851c79-2ea8-4297-a6ae-5507a630c2b2::DEBUG::2014-11-03 17:28:40,443::volume::1042::Storage.Misc.excCmd::(createVolume) '/usr/bin/qemu-img create -f qcow2 /rhev/data-center/0c02850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/images/0e120fde-8bde-4817-83dc-10f542229007/91597b02-0892-4f43-9e2e-31da2735213d 3145728K' (cwd None)
db851c79-2ea8-4297-a6ae-5507a630c2b2::DEBUG::2014-11-03 17:28:40,461::volume::1042::Storage.Misc.excCmd::(createVolume) SUCCESS: <err> = ''; <rc> = 0

----------------------------------------------------------------------

Comment 4 Kevin Wolf 2014-11-10 11:07:06 UTC
(In reply to Liron Aravot from comment #3)
> 2. Kevin, any idea about that?

It's always hard to understand VDSM logs. Which part of the following error
actually comes from qemu-img?

> c469c2f9c56 3145728K' (cwd None)
> 54165d94-129f-4bcf-b48e-dd76a9dcaf54::DEBUG::2014-11-03
> 17:28:48,063::volume::1042::Storage.Misc.excCmd::(createVolume) FAILED:
> <err> = '/rhev/data-center/0c0
> 2850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/
> images/2a364c66-fc99-40b2-ae7f-a6c9a4e4430e/b1463849-697e-4f8d-8bc0-
> 4c469c2f9c56: error
>  while creating qcow2: Input/output error\n'; <rc> = 1

I checked for the string "error while creating" in qemu-img and it doesn't
exist. This part must come from VDSM. This makes me wonder where the
"Input/output error" comes from.

If this was a qemu-img output line, it should be including 'qemu-img:' and
the file name - does VDSM parse the output lines?

Anyway -EIO isn't used internally in qemu-img in the path for creating qcow2
images. This must be an error coming from the kernel, probably in response to a
write request. You could try to confirm with strace, if you can reproduce this
manually outside VDSM.

Comment 5 GenadiC 2014-11-10 14:15:55 UTC
I have the same behavior in 3.5
Although I can see Status as locked in xml, it fails on Status not found

Comment 6 Ilanit Stein 2014-11-10 15:15:28 UTC
(In reply to comment #3)
Ilanit, any chance that there were issues with the storage machine? 

-> Not that I am aware of.
have you tried that again/on other storage?
-> No

did it ever run fine on RHEL 6.6?
-> Yes, on av12.1

please attach also /var/log/messages relevant output
-> It is already attached (test logs, see for each server, has it's own messages log.

Comment 7 Ilanit Stein 2014-11-11 09:38:28 UTC
This might be related to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1162078

Comment 8 Liron Aravot 2014-11-11 10:00:33 UTC
(In reply to Kevin Wolf from comment #4)
> (In reply to Liron Aravot from comment #3)
> > 2. Kevin, any idea about that?
> 
> It's always hard to understand VDSM logs. Which part of the following error
> actually comes from qemu-img?
> 
> > c469c2f9c56 3145728K' (cwd None)
> > 54165d94-129f-4bcf-b48e-dd76a9dcaf54::DEBUG::2014-11-03
> > 17:28:48,063::volume::1042::Storage.Misc.excCmd::(createVolume) FAILED:
> > <err> = '/rhev/data-center/0c0
> > 2850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/
> > images/2a364c66-fc99-40b2-ae7f-a6c9a4e4430e/b1463849-697e-4f8d-8bc0-
> > 4c469c2f9c56: error
> >  while creating qcow2: Input/output error\n'; <rc> = 1
> 
> I checked for the string "error while creating" in qemu-img and it doesn't
> exist. This part must come from VDSM. This makes me wonder where the
> "Input/output error" comes from.
> 
> If this was a qemu-img output line, it should be including 'qemu-img:' and
> the file name - does VDSM parse the output lines?
> 

Hi Kevin,
that error comes from qemu-img, i assume it's added somewhere along the way - see the above when running it manually on my machine

[root@dhcp-1-43 ~]# /usr/bin/qemu-img create -f qcow2 /mnt/asdd/a.zzz 3145728K
Formatting '/mnt/asdd/a.zzz', fmt=qcow2 size=3221225472 encryption=off cluster_size=65536 
/mnt/asdd/a.zzz: error while creating qcow2: No such file or directory


looking in /var/log/messages i see no difference between the sucessful creation to the failed one:

failed-
----------------------------------------------------------
Nov  3 17:28:46 puma18 multipathd: dm-16: remove map (uevent)
Nov  3 17:28:46 puma18 multipathd: dm-16: devmap not registered, can't remove
Nov  3 17:28:48 puma18 vdsm root WARNING File: /rhev/data-center/0c02850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/images/2a364c66-fc99-40b2-ae7f-a6c9a4e4430e/b1463849-697e-4f8d-8bc0-4c469c2f9c56 already removed
Nov  3 17:28:48 puma18 vdsm TaskManager.Task ERROR Task=`54165d94-129f-4bcf-b48e-dd76a9dcaf54`::Unexpected error#012Traceback (most recent call last):#012  File "/usr/share/vdsm/storage/task.py", line 873, in _run#012    return fn(*args, **kargs)#012  File "/usr/share/vdsm/storage/task.py", line 334, in run#012    return self.cmd(*self.argslist, **self.argsdict)#012  File "/usr/share/vdsm/storage/securable.py", line 75, in wrapper#012    return method(self, *args, **kwargs)#012  File "/usr/share/vdsm/storage/sp.py", line 1771, in createVolume#012    desc=desc, srcImgUUID=srcImgUUID, srcVolUUID=srcVolUUID)#012  File "/usr/share/vdsm/storage/sd.py", line 421, in createVolume#012    preallocate, diskType, volUUID, desc, srcImgUUID, srcVolUUID)#012  File "/usr/share/vdsm/storage/volume.py", line 463, in create#012    srcVolUUID, imgPath, volPath)#012  File "/usr/share/vdsm/storage/blockVolume.py", line 165, in _create#012    preallocate)#012  File "/usr/share/vdsm/storage/volume.py", line 1044, in createVolume#012    raise se.VolumeCreationError(out)#012VolumeCreationError: Error creating a new volume: (["Formatting '/rhev/data-center/0c02850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/images/2a364c66-fc99-40b2-ae7f-a6c9a4e4430e/b1463849-697e-4f8d-8bc0-4c469c2f9c56', fmt=qcow2 size=3221225472 encryption=off cluster_size=65536 "],)
Nov  3 17:28:48 puma18 multipathd: dm-16: remove map (uevent)
Nov  3 17:28:48 puma18 multipathd: dm-16: devmap not registered, can't remove
----------------------------------------------------------

succeeded:
----------------------------------------------------------
Nov  3 17:26:15 puma18 cpuspeed: Disabling ondemand cpu frequency scaling governor
Nov  3 17:26:15 puma18 cpuspeed: Enabling performance cpu frequency scaling governor
Nov  3 17:28:40 puma18 vdsm root WARNING File: /rhev/data-center/0c02850d-6a66-403e-b1ee-d5d46083685f/81eac9e9-9e9c-42b2-9321-a51460f87daa/images/0e120fde-8bde-4817-83dc-10f542229007/91597b02-0892-4f43-9e2e-31da2735213d already removed
Nov  3 17:28:41 puma18 multipathd: dm-16: remove map (uevent)
Nov  3 17:28:41 puma18 multipathd: dm-16: devmap not registered, can't remove
Nov  3 17:28:41 puma18 multipathd: dm-16: remove map (uevent)
Nov  3 17:28:41 puma18 multipathd: dm-16: devmap not registered, can't remove
----------------------------------------------------------

> Anyway -EIO isn't used internally in qemu-img in the path for creating qcow2
> images. This must be an error coming from the kernel, probably in response
> to a
> write request. You could try to confirm with strace, if you can reproduce
> this
> manually outside VDSM.

Well that doesn't reproduce consistently, as some calls finish with success. I assume from your answer that you haven't encountered that issue.

Ilanit, any chance you can try the above just be on the safe side on the test env?

Kevin, any ideas on how to proceed with that besides that?

thanks

Comment 9 Kevin Wolf 2014-11-11 10:29:35 UTC
(In reply to Liron Aravot from comment #8)
> (In reply to Kevin Wolf from comment #4)
> > I checked for the string "error while creating" in qemu-img and it doesn't
> > exist. This part must come from VDSM. This makes me wonder where the
> > "Input/output error" comes from.
> > 
> > If this was a qemu-img output line, it should be including 'qemu-img:' and
> > the file name - does VDSM parse the output lines?
> > 
> 
> Hi Kevin,
> that error comes from qemu-img, i assume it's added somewhere along the way
> - see the above when running it manually on my machine

Sorry, yes. I was looking at the RHEL 7 version of qemu-img.

The conclusion that there is no internal usage of -EIO remains the same for
RHEL 6, though. The error must come from the kernel.

> Well that doesn't reproduce consistently, as some calls finish with success.
> I assume from your answer that you haven't encountered that issue.

No, it doesn't ring a bell.

> Kevin, any ideas on how to proceed with that besides that?

I think finding a minimal reproducer outside of VDSM is the key. My expectation
is that the same should be possible to trigger even without qemu-img then, with
something as simple as dd, but an strace of qemu-img (with children) of a
failing case would be useful to confirm that qemu-img isn't trying to do
anything crazy.

Comment 10 Ilanit Stein 2014-11-11 16:39:08 UTC
This problem was not seen on same automatic test, and same storage server,   
for av12.1, running on rhel6.6 hosts.

Comment 11 Liron Aravot 2014-11-12 07:19:59 UTC
Ilanit, can you check what were the qemu versions in the success/failed runs?

Comment 12 Ilanit Stein 2014-11-12 08:12:14 UTC
Some more info:

1. Problem was not seen on same automatic test, and same storage server,
for 3.5, running on rhel7 hosts.
 
2. Same test, run on av13.1, with NFS, instead of iscsi - PASS (no add disk failures)

Comment 13 Ilanit Stein 2014-11-12 08:45:47 UTC
For both tests, the 1 that pass on av12.1, and these that fail, starting av12.4, the qemu-img is the same:

'qemu-img': {'release': '2.448.el6', 'buildtime': 1411764164L, 'version': '0.12.1.2'},

Comment 14 Fabian Deutsch 2014-11-19 16:41:24 UTC
I ma not sure about the inner workings of vdsm, but if it's LVM/dm related the it could be related to bug 1148979.

Comment 15 Allon Mureinik 2014-11-20 09:27:04 UTC
(In reply to Fabian Deutsch from comment #14)
> I ma not sure about the inner workings of vdsm, but if it's LVM/dm related
> the it could be related to bug 1148979.
Nir, does this align with your findings?

Comment 16 Liron Aravot 2014-11-20 13:55:42 UTC
Fabian,
we tested with device-mapper-multipath-0.4.9-80.el6_6.1 (which the bug you suggested was fixed in), but still managed to reproduce - so i'm afraid that this doesn't solve it.

thanks,
Liron

Comment 17 Allon Mureinik 2014-11-23 20:58:56 UTC
*** Bug 1164030 has been marked as a duplicate of this bug. ***

Comment 19 Aharon Canan 2014-11-30 11:27:49 UTC
by mistake moved to verified.

Comment 20 Raz Tamir 2014-12-07 08:49:50 UTC
Verified - vt13.1