Created attachment 1343781 [details]
engine, vdsm logs

Description of problem:
Hotplug fails when attaching a disk with cow format on glusterfs.

Version-Release number of selected component (if applicable):
Engine: ovirt-engine-4.2.0-0.0.master.20171025204923.git6f4cbc5.el7.centos.noarch
VDSM: 4.20.3-224.gitef2ce48

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a bootable disk and start it.
2. Via REST API only, create a disk on the gluster domain with format=cow (see the Python sketch after the logs below). For example:
POST:
URL: https://storage-ge-08.scl.lab.tlv.redhat.com/ovirt-engine/api/disks
body:
<disk>
    <storage_domains>
        <storage_domain id="7db03af9-67b7-4ebd-a7e5-4c2cbc1f5d72"/>
    </storage_domains>
    <name>mydisk</name>
    <provisioned_size>1048576</provisioned_size>
    <format>cow</format>
</disk>
3. Attach the newly created disk to the running VM.

Actual results:
Hotplug fails.

Expected results:
Hotplug succeeds.

Additional info:

Engine:
2017-10-26 16:49:03,030+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] START, HotPlugDiskVDSCommand(HostName = host_mixed_1, HotPlugDiskVDSParameters:{hostId='8c188005-b476-4837-b147-4dc84d2caac1', vmId='cd1ef6fb-3e4d-474b-802f-ce82d1ea46ec', diskId='616d13fc-68d7-4284-bbb0-25711d4b490d', addressMap='[bus=0, controller=0, unit=1, type=drive, target=0]'}), log id: 1573ec6
2017-10-26 16:49:05,936+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] Failed in 'HotPlugDiskVDS' method
2017-10-26 16:49:05,949+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command HotPlugDiskVDS failed: internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-scsi0-0-0-1' could not be initialized
2017-10-26 16:49:05,949+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand' return value 'StatusOnlyReturn [status=Status [code=45, message=internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-scsi0-0-0-1' could not be initialized]]'
2017-10-26 16:49:05,949+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] HostName = host_mixed_1
2017-10-26 16:49:05,949+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] Command 'HotPlugDiskVDSCommand(HostName = host_mixed_1, HotPlugDiskVDSParameters:{hostId='8c188005-b476-4837-b147-4dc84d2caac1', vmId='cd1ef6fb-3e4d-474b-802f-ce82d1ea46ec', diskId='616d13fc-68d7-4284-bbb0-25711d4b490d', addressMap='[bus=0, controller=0, unit=1, type=drive, target=0]'})' execution failed: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-scsi0-0-0-1' could not be initialized, code = 45
2017-10-26 16:49:05,950+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] FINISH, HotPlugDiskVDSCommand, log id: 1573ec6
2017-10-26 16:49:05,950+03 ERROR [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] Command 'org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-scsi0-0-0-1' could not be initialized, code = 45 (Failed with error FailedToPlugDisk and code 45)
2017-10-26 16:49:05,959+03 ERROR [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand'.
2017-10-26 16:49:06,001+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] EVENT_ID: USER_FAILED_ATTACH_DISK_TO_VM(2,017), Failed to attach Disk mydisk to VM vm_TestCase18258_2615484417 (User: admin@internal-authz).

vdsm:
2017-10-26 16:49:03,411+0300 INFO  (jsonrpc/5) [virt.vm] (vmId='cd1ef6fb-3e4d-474b-802f-ce82d1ea46ec') Hotplug disk xml: <?xml version='1.0' encoding='UTF-8'?>
<disk device="disk" snapshot="no" type="network">
    <address bus="0" controller="0" target="0" type="drive" unit="1" />
    <source name="storage_local_ge8_volume_1/7db03af9-67b7-4ebd-a7e5-4c2cbc1f5d72/images/616d13fc-68d7-4284-bbb0-25711d4b490d/c1d04aa6-868e-4d28-ba14-f516635d4669" protocol="gluster">
        <host name="gluster01.scl.lab.tlv.redhat.com" port="0" transport="tcp" />
    </source>
    <target bus="scsi" dev="sde" />
    <serial>616d13fc-68d7-4284-bbb0-25711d4b490d</serial>
    <driver cache="none" error_policy="stop" io="threads" name="qemu" type="qcow2" />
</disk>
 (vm:3569)
2017-10-26 16:49:04,251+0300 DEBUG (mailbox-spm) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0 dd if=/rhev/data-center/d5861267-455b-4b1d-9b5d-355175c78189/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None) (commands:70)
2017-10-26 16:49:04,277+0300 DEBUG (mailbox-spm) [storage.Misc.excCmd] SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0102381 s, 100 MB/s\n'; <rc> = 0 (commands:94)
2017-10-26 16:49:05,909+0300 ERROR (jsonrpc/5) [virt.vm] (vmId='cd1ef6fb-3e4d-474b-802f-ce82d1ea46ec') Hotplug failed (vm:3577)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3575, in hotplugDisk
    self._dom.attachDevice(driveXml)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 125, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 586, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 540, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-scsi0-0-0-1' could not be initialized
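For reference, a minimal Python sketch of the disk-creation request from step 2, using the requests library. The engine URL, storage-domain ID and disk body are taken from this report; the credentials, the Accept header and the disabled certificate verification are assumptions and need to be adapted to the actual environment.

import requests

ENGINE = "https://storage-ge-08.scl.lab.tlv.redhat.com/ovirt-engine"

# Disk definition from step 2: a 1 MiB cow-format disk on the gluster domain.
DISK_XML = """<disk>
    <storage_domains>
        <storage_domain id="7db03af9-67b7-4ebd-a7e5-4c2cbc1f5d72"/>
    </storage_domains>
    <name>mydisk</name>
    <provisioned_size>1048576</provisioned_size>
    <format>cow</format>
</disk>"""

response = requests.post(
    ENGINE + "/api/disks",
    data=DISK_XML,
    headers={"Content-Type": "application/xml", "Accept": "application/xml"},
    auth=("admin@internal", "password"),  # placeholder credentials
    verify=False,                         # lab engine with a self-signed certificate
)
print(response.status_code)
print(response.text)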
The bug was found while running automation: test_download_image.py::TestCase18258
The bug is marked as a 'regression' as it does not occur in 4.1.7.3.
I wouldn't be surprised if there's some common root cause that causes both this and bug 1506608.
@Denis - have you checked hotplug disk with libgfapi?
Raising severity.
This fails a few TPs from our tier 1 automation.

@Daniel,
libgfapi is not enabled on this execution.
(In reply to Raz Tamir from comment #4)
> Raising severity.
> This fails a few TPs from our tier 1 automation.
>
> @Daniel,
> libgfapi is not enabled on this execution.

It's enabled by default in 4.2 since commit 8dc926eae05a (merged Aug 1st). Are you explicitly switching it off and/or running on a 4.1 DC?
No,
libgfapi is enabled:
LibgfApiSupported: true version: 4.2
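For reference, a minimal sketch of how the flag can be queried on the engine host with the standard engine-config tool; wrapping it in Python's subprocess is only for illustration and assumes engine-config is on the PATH.

# Sketch: query the libgfapi feature flag on the engine machine.
# Assumes the standard oVirt engine-config tool is installed and on PATH.
import subprocess

output = subprocess.check_output(["engine-config", "-g", "LibgfApiSupported"])
# Expected output looks like "LibgfApiSupported: true version: 4.2", one line per cluster level.
print(output.decode())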
Raz, maybe we have a misunderstanding here. To quote your last two comments:

(In reply to Raz Tamir from comment #4)
> @Daniel,
> libgfapi is not enabled on this execution.

(In reply to Raz Tamir from comment #6)
> No,
>
> libgfapi is enabled:
> LibgfApiSupported: true version: 4.2

So which is it? Or did I misunderstand one of these comments?
Too much noise from comment #4; please ignore the comments related to libgfapi from my side.
The needinfo for Daniel is still relevant.
(In reply to Daniel Erez from comment #3)
> @Denis - have you checked hotplug disk with libgfapi?
Daniel - this is marked as a blocker for the oVirt GA. What is the status of this bug?
I'm not able to reproduce it in my environment with libgfapi either enabled or disabled.

In my case disk hotplugging works perfectly.
(In reply to Denis Chaplygin from comment #11)
> I'm not able to reproduce it in my environment with libgfapi either enabled
> or disabled.
>
> In my case disk hotplugging works perfectly.

Closing per comment 11.
The issue reproduces on my environment (storage-ge-08.scl.lab.tlv.redhat.com) with the exact same scenario; see the attached logs.

Engine:
2017-11-21 18:17:01,420+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] Failed in 'HotPlugDiskVDS' method
2017-11-21 18:17:01,446+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_3 command HotPlugDiskVDS failed: failed to initialize gluster connection (src=0x7fcd3c000d80 priv=0x7fcd3c001d50): Input/output error
2017-11-21 18:17:01,446+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand' return value 'StatusOnlyReturn [status=Status [code=45, message=failed to initialize gluster connection (src=0x7fcd3c000d80 priv=0x7fcd3c001d50): Input/output error]]'
2017-11-21 18:17:01,446+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] HostName = host_mixed_3
2017-11-21 18:17:01,447+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] Command 'HotPlugDiskVDSCommand(HostName = host_mixed_3, HotPlugDiskVDSParameters:{hostId='a1060797-25e3-4ca5-9c62-67d9d830f5e6', vmId='32196200-1e99-4393-8cd5-3ae2bc7441d4', diskId='93b45fd5-5a37-4340-a327-ac35a6a0216b', addressMap='[bus=0, controller=0, unit=1, type=drive, target=0]'})' execution failed: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = failed to initialize gluster connection (src=0x7fcd3c000d80 priv=0x7fcd3c001d50): Input/output error, code = 45
2017-11-21 18:17:01,447+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] FINISH, HotPlugDiskVDSCommand, log id: 35abedc1
2017-11-21 18:17:01,451+02 ERROR [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] Command 'org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = failed to initialize gluster connection (src=0x7fcd3c000d80 priv=0x7fcd3c001d50): Input/output error, code = 45 (Failed with error FailedToPlugDisk and code 45)
2017-11-21 18:17:01,771+02 ERROR [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand'.
VDSM:
2017-11-21 18:17:01,381+0200 DEBUG (check/loop) [storage.check] START check u'/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge8__volume__2/db55099b-738b-4182-ba67-6560ae68fc6b/dom_md/metadata' (delay=0.00) (check:282)
2017-11-21 18:17:01,392+0200 ERROR (jsonrpc/2) [virt.vm] (vmId='32196200-1e99-4393-8cd5-3ae2bc7441d4') Hotplug failed (vm:3564)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3562, in hotplugDisk
    self._dom.attachDevice(driveXml)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 125, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 586, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 540, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: failed to initialize gluster connection (src=0x7fcd3c000d80 priv=0x7fcd3c001d50): Input/output error
Created attachment 1356852 [details]
engine, vdsm logs - new reproduction

Adding logs. The new reproduction was found on builds:
Engine: 4.2.0-0.0.master.20171118160347.git80b3e22.el7.centos
VDSM: 4.20.7-35.git0f9b415
Avihai - can you also attach libvirt/qemu's logs?
@Denis - can you please take a look at the new logs?
(In reply to Allon Mureinik from comment #15)
> Avihai - can you also attach libvirt/qemu's logs?

Added the relevant logs.
Note that for some reason the current libvirt.log is empty; unfortunately only the previous libvirt log has content.
Created attachment 1356930 [details] libvirt logs
Well, in that case it failed with a clear error:

libvirtError: failed to initialize gluster connection (src=0x7fcd3c000d80 priv=0x7fcd3c001d50): Input/output error

You probably have an issue with gluster cluster connectivity or with the volume itself.

You need to verify that:
* gluster01.scl.lab.tlv.redhat.com is reachable from your VM host
* Gluster ports are not filtered on the VM host or on any of the gluster hosts
* The gluster peers can reach each other
* Volume storage_local_ge8_volume_0 is started and all bricks are running
* The volume is not under a heal/split-brain condition
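For reference, a short sketch of how the gluster-side checks above could be scripted on one of the gluster hosts; it assumes the standard gluster CLI is installed, and the volume name is the one mentioned in this report.

# Sketch of the gluster-side health checks listed above, run on a gluster host.
# Assumes the standard gluster CLI; the volume name is taken from this report.
import subprocess

VOLUME = "storage_local_ge8_volume_0"

for cmd in (
    ["gluster", "peer", "status"],                  # peers reachable and connected
    ["gluster", "volume", "status", VOLUME],        # volume started, bricks online
    ["gluster", "volume", "heal", VOLUME, "info"],  # pending heals / split-brain entries
):
    print("$ " + " ".join(cmd))
    print(subprocess.check_output(cmd).decode())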
There are two problems:

1) The gluster cluster is in an inconsistent state, so when you try to hot plug a new drive the operation fails with a gluster error. That issue should disappear after the gluster cluster is repaired.
2) For some reason, even with libgfapi enabled, the first disk of a freshly started VM is a 'file' disk, not a 'network' disk, and I can reproduce this on my environment too. This definitely needs to be fixed urgently and I'm working on it right now. Unfortunately, once it is fixed, Avihai's VM will not be able to start, as it will require a working gluster cluster at the startup phase.
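For reference, a minimal sketch of how one could check from the host whether a running VM's disks ended up defined as 'file' (fuse mount path) or 'network' (libgfapi) devices. It uses the libvirt Python binding already present on a VDSM host; the connection URI is the usual qemu one, and the VM name is taken from this report purely as a placeholder.

# Sketch: list each disk of a running domain together with its libvirt type
# ("file" = fuse mount path, "network" = libgfapi). The VM name below is the
# one from this report and is only a placeholder.
import xml.etree.ElementTree as ET
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("vm_TestCase18258_2615484417")
root = ET.fromstring(dom.XMLDesc(0))
for disk in root.findall("./devices/disk"):
    target = disk.find("target")
    dev = target.get("dev") if target is not None else "?"
    print(dev, disk.get("type"))
conn.close()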
(In reply to Denis Chaplygin from comment #20)
> There are two problems:
>
> 1) The gluster cluster is in an inconsistent state, so when you try to hot
> plug a new drive the operation fails with a gluster error. That issue should
> disappear after the gluster cluster is repaired.
> 2) For some reason, even with libgfapi enabled, the first disk of a freshly
> started VM is a 'file' disk, not a 'network' disk, and I can reproduce this
> on my environment too. This definitely needs to be fixed urgently and I'm
> working on it right now. Unfortunately, once it is fixed, Avihai's VM will
> not be able to start, as it will require a working gluster cluster at the
> startup phase.

Thanks Denis!
So I'm moving the bug to the Gluster team for further investigation.
Leave it with me; I just found the root cause inside the engine and will make a patch tomorrow.
(In reply to Denis Chaplygin from comment #19)
> Well, in that case it failed with a clear error:
>
> libvirtError: failed to initialize gluster connection (src=0x7fcd3c000d80
> priv=0x7fcd3c001d50): Input/output error
>
> You probably have an issue with gluster cluster connectivity or with the
> volume itself.
>
> You need to verify that:
> * gluster01.scl.lab.tlv.redhat.com is reachable from your VM host
> * Gluster ports are not filtered on the VM host or on any of the gluster hosts
> * The gluster peers can reach each other
> * Volume storage_local_ge8_volume_0 is started and all bricks are running
> * The volume is not under a heal/split-brain condition

As the root cause is known, my input is not necessary.
My gluster cluster is finally up and fixed. Please tell me if any more info is missing.
Bug verification is blocked due to Bug 1497511.
Verified on:
Engine 4.2.1.4-0.1.el7
VDSM 4.20.17-1
Libvirt 3.9.0-7
Qemu 2.10.0-18
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.