Bug 1506677 - Hotplug fails when attaching a disk with cow format on glusterfs
Summary: Hotplug fails when attaching a disk with cow format on glusterfs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: 4.2.0
Assignee: Denis Chaplygin
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks: 1509635
 
Reported: 2017-10-26 14:17 UTC by Avihai
Modified: 2018-02-12 10:09 UTC
CC List: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-02-12 10:09:51 UTC
oVirt Team: Gluster
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: blocker+


Attachments
engine, vdsm log (1.98 MB, application/x-gzip)
2017-10-26 14:17 UTC, Avihai
engine, vdsm logs new reproduction (348.90 KB, application/x-gzip)
2017-11-21 16:25 UTC, Avihai
libvirt logs (946.83 KB, application/x-gzip)
2017-11-21 18:32 UTC, Avihai


Links (oVirt gerrit patches, all MERGED on master, last updated 2020-08-02 13:57:56 UTC):
84563 - vm: Send DiskType with every disk creation/hotplug
84583 - virt: Made disk type detection code reusable.
84588 - virt: Libvirt VM XML builder should use disk type.
84592 - virt: Added network disk support to libvirtxml

Description Avihai 2017-10-26 14:17:52 UTC
Created attachment 1343781 [details]
engine, vdsm log

Description of problem:
Hotplug fails when attaching a disk with cow format on glusterfs.

Version-Release number of selected component (if applicable):
Engine:
ovirt-engine-4.2.0-0.0.master.20171025204923.git6f4cbc5.el7.centos.noarch 

VDSM:
4.20.3-224.gitef2ce48

How reproducible:
100%


Steps to Reproduce:
1. Create a VM with a bootable disk and start it.
2. Via REST API only, create a disk on the gluster domain with format=cow (see the sketch after these steps).

For example:
POST:

URL:
https://storage-ge-08.scl.lab.tlv.redhat.com/ovirt-engine/api/disks

Body:

<disk>
  <storage_domains>
    <storage_domain id="7db03af9-67b7-4ebd-a7e5-4c2cbc1f5d72"/>
  </storage_domains>
  <name>mydisk</name>
  <provisioned_size>1048576</provisioned_size>
  <format>cow</format>
</disk>

3. Attach the newly created disk to the running VM.
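
For reference, steps 2 and 3 can be scripted against the same REST API. The sketch below uses Python's requests library; the credentials, XML headers and the shape of the diskattachments call are assumptions based on the oVirt 4.x REST API rather than taken from this report, while the engine URL, storage domain id, disk id and VM id are the ones that appear in the report and its logs:

# Sketch only: create the cow disk and attach it to the running VM over REST.
import requests

ENGINE = "https://storage-ge-08.scl.lab.tlv.redhat.com/ovirt-engine/api"
AUTH = ("admin@internal", "password")  # assumed credentials
HEADERS = {"Content-Type": "application/xml", "Accept": "application/xml"}

disk_xml = """
<disk>
  <storage_domains>
    <storage_domain id="7db03af9-67b7-4ebd-a7e5-4c2cbc1f5d72"/>
  </storage_domains>
  <name>mydisk</name>
  <provisioned_size>1048576</provisioned_size>
  <format>cow</format>
</disk>
"""

# Step 2: create the disk on the gluster storage domain.
resp = requests.post(ENGINE + "/disks", data=disk_xml,
                     headers=HEADERS, auth=AUTH, verify=False)
resp.raise_for_status()

# Step 3: attach it to the running VM. In practice the disk id is parsed from
# resp.text; the ids below are the ones seen in this report's logs.
disk_id = "616d13fc-68d7-4284-bbb0-25711d4b490d"
vm_id = "cd1ef6fb-3e4d-474b-802f-ce82d1ea46ec"
attach_xml = ('<disk_attachment><active>true</active>'
              '<interface>virtio_scsi</interface>'
              '<disk id="%s"/></disk_attachment>' % disk_id)
requests.post(ENGINE + "/vms/%s/diskattachments" % vm_id, data=attach_xml,
              headers=HEADERS, auth=AUTH, verify=False).raise_for_status()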

Actual results:
Hotplug fails.


Expected results:
Hotplug succeeds and the disk is attached to the running VM.

Additional info:
Engine:
2017-10-26 16:49:03,030+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] START, HotPlugDiskVDSCommand(HostName = host_mixed_1, HotPlugDiskVDSParameters:{hostId='8c188005-b476-4837-b147-4dc84d2caac1', vmId='cd1ef6fb-3e4d-474b-802f-ce82d1ea46ec', diskId='616d13fc-68d7-4284-bbb0-25711d4b490d', addressMap='[bus=0, controller=0, unit=1, type=drive, target=0]'}), log id: 1573ec6
2017-10-26 16:49:05,936+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] Failed in 'HotPlugDiskVDS' method
2017-10-26 16:49:05,949+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command HotPlugDiskVDS failed: internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-scsi0-0-0-1' could not be initialized
2017-10-26 16:49:05,949+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand' return value 'StatusOnlyReturn [status=Status [code=45, message=internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-scsi0-0-0-1' could not be initialized]]'
2017-10-26 16:49:05,949+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] HostName = host_mixed_1
2017-10-26 16:49:05,949+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] Command 'HotPlugDiskVDSCommand(HostName = host_mixed_1, HotPlugDiskVDSParameters:{hostId='8c188005-b476-4837-b147-4dc84d2caac1', vmId='cd1ef6fb-3e4d-474b-802f-ce82d1ea46ec', diskId='616d13fc-68d7-4284-bbb0-25711d4b490d', addressMap='[bus=0, controller=0, unit=1, type=drive, target=0]'})' execution failed: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-scsi0-0-0-1' could not be initialized, code = 45
2017-10-26 16:49:05,950+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] FINISH, HotPlugDiskVDSCommand, log id: 1573ec6
2017-10-26 16:49:05,950+03 ERROR [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] Command 'org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-scsi0-0-0-1' could not be initialized, code = 45 (Failed with error FailedToPlugDisk and code 45)
2017-10-26 16:49:05,959+03 ERROR [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand'.
2017-10-26 16:49:06,001+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [aa61a91e-44a5-4cd1-adaa-c653f70d1e91] EVENT_ID: USER_FAILED_ATTACH_DISK_TO_VM(2,017), Failed to attach Disk mydisk to VM vm_TestCase18258_2615484417 (User: admin@internal-authz).


vdsm:
2017-10-26 16:49:03,411+0300 INFO  (jsonrpc/5) [virt.vm] (vmId='cd1ef6fb-3e4d-474b-802f-ce82d1ea46ec') Hotplug disk xml: <?xml version='1.0' encoding='UTF-8'?>
<disk device="disk" snapshot="no" type="network">
    <address bus="0" controller="0" target="0" type="drive" unit="1" />
    <source name="storage_local_ge8_volume_1/7db03af9-67b7-4ebd-a7e5-4c2cbc1f5d72/images/616d13fc-68d7-4284-bbb0-25711d4b490d/c1d04aa6-868e-4d28-ba14-f516635d4669" protocol="gluster">
        <host name="gluster01.scl.lab.tlv.redhat.com" port="0" transport="tcp" />
    </source>
    <target bus="scsi" dev="sde" />
    <serial>616d13fc-68d7-4284-bbb0-25711d4b490d</serial>
    <driver cache="none" error_policy="stop" io="threads" name="qemu" type="qcow2" />
</disk>
 (vm:3569)
2017-10-26 16:49:04,251+0300 DEBUG (mailbox-spm) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0 dd if=/rhev/data-center/d5861267-455b-4b1d-9b5d-355175c78189/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None) (commands:70)
2017-10-26 16:49:04,277+0300 DEBUG (mailbox-spm) [storage.Misc.excCmd] SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0102381 s, 100 MB/s\n'; <rc> = 0 (commands:94)
2017-10-26 16:49:05,909+0300 ERROR (jsonrpc/5) [virt.vm] (vmId='cd1ef6fb-3e4d-474b-802f-ce82d1ea46ec') Hotplug failed (vm:3577)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3575, in hotplugDisk
    self._dom.attachDevice(driveXml)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 125, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 586, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 540, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: internal error: unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-scsi0-0-0-1' could not be initialized

Comment 1 Avihai 2017-10-26 14:23:09 UTC
Bug was found running automation:
test_download_image.py::TestCase18258

Bug marked as 'regression' as it does not occur in 4.1.7.3

Comment 2 Allon Mureinik 2017-10-26 15:52:06 UTC
I wouldn't be surprised if there's a common root cause behind both this and bug 1506608.

Comment 3 Daniel Erez 2017-10-30 13:43:47 UTC
@Denis - have you checked hotplug disk with libgfapi?

Comment 4 Raz Tamir 2017-11-05 11:10:11 UTC
Raising severity.
This fails a few TPs from our tier 1 automation.

@Daniel,
libgfapi is not enabled on this execution

Comment 5 Allon Mureinik 2017-11-05 16:06:38 UTC
(In reply to Raz Tamir from comment #4)
> Raising severity.
> This fails a few TPs from our tier 1 automation.
> 
> @Daniel,
> libgfapi is not enabled on this execution
It's enabled by default in 4.2 since commit 8dc926eae05a (merged Aug 1st). Are you explicitly switching it off and/or running on a 4.1 DC?

Comment 6 Raz Tamir 2017-11-06 15:57:28 UTC
No,

libgfapi is enabled:
LibgfApiSupported: true version: 4.2
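
That looks like the output of an engine-config query; for reference, it can typically be read on the engine host with something like the sketch below, which assumes the standard ovirt-engine engine-config tool:

# Sketch: read the libgfapi flag on the engine host via engine-config.
import subprocess
print(subprocess.check_output(["engine-config", "-g", "LibgfApiSupported"]).decode())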

Comment 7 Allon Mureinik 2017-11-06 16:08:22 UTC
Raz, maybe we have a misunderstanding here. To quote your last two comments:

(In reply to Raz Tamir from comment #4)
> @Daniel,
> libgfapi is not enabled on this execution

(In reply to Raz Tamir from comment #6)
> No,
> 
> libgfapi is enabled:
> LibgfApiSupported: true version: 4.2


So which is it?
Or did I misunderstand one of these comments?

Comment 8 Raz Tamir 2017-11-07 12:52:09 UTC
Too much noise from comment #4.

Ignore comments related to libgfapi from my side.
The needinfo for Daniel is still relevant.

Comment 9 Daniel Erez 2017-11-07 13:20:34 UTC
(In reply to Daniel Erez from comment #3)
> @Denis - have you checked hotplug disk with libgfapi?

Comment 10 Yaniv Kaul 2017-11-16 08:22:51 UTC
Daniel - this is marked as blocker for oVirt GA. What is the status of the bug?

Comment 11 Denis Chaplygin 2017-11-21 14:38:27 UTC
I'm not able to reproduce it in my environment, with libgfapi either enabled or disabled.

In my case disk hotplugging works perfectly.

Comment 12 Daniel Erez 2017-11-21 16:03:51 UTC
(In reply to Denis Chaplygin from comment #11)
> I'm not able to reproduce it in my environment, with libgfapi either
> enabled or disabled.
> 
> In my case disk hotplugging works perfectly.

Closing per comment 11.

Comment 13 Avihai 2017-11-21 16:21:49 UTC
The issue reproduces in my environment (storage-ge-08.scl.lab.tlv.redhat.com) with the exact same scenario.

See logs attached.

Engine:
2017-11-21 18:17:01,420+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] Failed in 'HotPlugDiskVDS' method
2017-11-21 18:17:01,446+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_3 command HotPlugDiskVDS failed: failed to initialize gluster connection (src=0x7fcd3c000d80 priv=0x7fcd3c001d50): Input/output error
2017-11-21 18:17:01,446+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand' return value 'StatusOnlyReturn [status=Status [code=45, message=failed to initialize gluster connection (src=0x7fcd3c000d80 priv=0x7fcd3c001d50): Input/output error]]'
2017-11-21 18:17:01,446+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] HostName = host_mixed_3
2017-11-21 18:17:01,447+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] Command 'HotPlugDiskVDSCommand(HostName = host_mixed_3, HotPlugDiskVDSParameters:{hostId='a1060797-25e3-4ca5-9c62-67d9d830f5e6', vmId='32196200-1e99-4393-8cd5-3ae2bc7441d4', diskId='93b45fd5-5a37-4340-a327-ac35a6a0216b', addressMap='[bus=0, controller=0, unit=1, type=drive, target=0]'})' execution failed: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = failed to initialize gluster connection (src=0x7fcd3c000d80 priv=0x7fcd3c001d50): Input/output error, code = 45
2017-11-21 18:17:01,447+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] FINISH, HotPlugDiskVDSCommand, log id: 35abedc1
2017-11-21 18:17:01,451+02 ERROR [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] Command 'org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = failed to initialize gluster connection (src=0x7fcd3c000d80 priv=0x7fcd3c001d50): Input/output error, code = 45 (Failed with error FailedToPlugDisk and code 45)
2017-11-21 18:17:01,771+02 ERROR [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-10) [75a44285-6766-40bc-a749-c8a662b6235d] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand'.


VDSM:
2017-11-21 18:17:01,381+0200 DEBUG (check/loop) [storage.check] START check u'/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge8__volume__2/db55099b-738b-4182-ba67-6560ae68fc6b/dom_md/metadata' (delay=0.00) (check:282)
2017-11-21 18:17:01,392+0200 ERROR (jsonrpc/2) [virt.vm] (vmId='32196200-1e99-4393-8cd5-3ae2bc7441d4') Hotplug failed (vm:3564)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3562, in hotplugDisk
    self._dom.attachDevice(driveXml)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 125, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 586, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 540, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: failed to initialize gluster connection (src=0x7fcd3c000d80 priv=0x7fcd3c001d50): Input/output error

Comment 14 Avihai 2017-11-21 16:25:08 UTC
Created attachment 1356852 [details]
engine, vdsm logs new reproduction

Adding logs.


New reproduction found on builds:
Engine: 4.2.0-0.0.master.20171118160347.git80b3e22.el7.centos
VDSM: 4.20.7-35.git0f9b415.

Comment 15 Allon Mureinik 2017-11-21 16:48:45 UTC
Avihai - can you also attach libvirt/qemu's logs?

Comment 16 Daniel Erez 2017-11-21 16:53:41 UTC
@Denis - can you please take a look at the new logs?

Comment 17 Avihai 2017-11-21 18:32:02 UTC
(In reply to Allon Mureinik from comment #15)
> Avihai - can you also attach libvirt/qemu's logs?

Added relevant logs.

Note that, for some reason, libvirt.log is empty; unfortunately only the previous libvirt log has content.

Comment 18 Avihai 2017-11-21 18:32:35 UTC
Created attachment 1356930 [details]
libvirt logs

Comment 19 Denis Chaplygin 2017-11-22 09:35:37 UTC
Well, in that case it failed with a clear error:

libvirtError: failed to initialize gluster connection (src=0x7fcd3c000d80 priv=0x7fcd3c001d50): Input/output error


Probably you have an issue with gluster cluster connectivity or the volume itself.

You need to verify that (a scripted version of these checks is sketched after this list):
* gluster01.scl.lab.tlv.redhat.com is available from your VM host
* Gluster ports are not filtered on both VM host and all gluster hosts
* gluster peers can reach each other
* Volume storage_local_ge8_volume_0 is started and all bricks are running
* Volume is not under heal/split-brain condition.
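
A rough, scripted version of those checks, assuming the standard gluster CLI on the nodes and the default glusterd management port (24007); the host and volume names are the ones mentioned above:

# Sketch of the checks above; run the socket test from the VM host and the
# gluster CLI commands on one of the gluster nodes.
import socket
import subprocess

GLUSTER_HOST = "gluster01.scl.lab.tlv.redhat.com"
VOLUME = "storage_local_ge8_volume_0"

# Is the gluster management port reachable from the VM host? (24007 is the
# default glusterd port; the brick ports must be open as well.)
socket.create_connection((GLUSTER_HOST, 24007), timeout=5).close()

# Peer, volume/brick and heal status.
for cmd in (["gluster", "peer", "status"],
            ["gluster", "volume", "status", VOLUME],
            ["gluster", "volume", "heal", VOLUME, "info"]):
    print(subprocess.check_output(cmd).decode())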

Comment 20 Denis Chaplygin 2017-11-22 12:39:31 UTC
There are two problems:

1) The gluster cluster is in an inconsistent state, so when you try to hot plug a new drive, the operation fails with a gluster error. That issue should disappear after the gluster cluster is repaired.
2) For some reason, even with libgfapi enabled, the first disk of a freshly started VM is a 'file' disk, not a 'network' disk, and I can reproduce that in my environment too. This definitely needs to be fixed urgently and I'm working on it right now. Unfortunately, once it is fixed, Avihai's VM will not be able to start, as it will require a working gluster cluster at the startup phase (see the diagnostic sketch below).
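
A quick way to check the symptom in point 2 on the hypervisor is to dump the running VM's libvirt XML and look at each disk's type attribute: with libgfapi in effect, gluster-backed disks should be type='network' with protocol='gluster', while the buggy path yields type='file'. A minimal sketch with the libvirt Python bindings follows (the VM name is the one from this report's logs):

# Sketch: list each disk's libvirt type ('file' vs 'network') for a running VM.
import xml.etree.ElementTree as ET
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("vm_TestCase18258_2615484417")  # VM name from the logs
root = ET.fromstring(dom.XMLDesc(0))

for disk in root.findall("./devices/disk"):
    src = disk.find("source")
    # 'file' disks carry a path in source/@file; gluster 'network' disks carry
    # protocol="gluster" and the volume path in source/@name.
    if src is not None:
        print(disk.get("type"), src.get("protocol"), src.get("file") or src.get("name"))
    else:
        print(disk.get("type"))
conn.close()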

Comment 21 Daniel Erez 2017-11-22 16:30:03 UTC
(In reply to Denis Chaplygin from comment #20)
> There are two problems:
> 
> 1) The gluster cluster is in an inconsistent state, so when you try to hot
> plug a new drive, the operation fails with a gluster error. That issue
> should disappear after the gluster cluster is repaired.
> 2) For some reason, even with libgfapi enabled, the first disk of a freshly
> started VM is a 'file' disk, not a 'network' disk, and I can reproduce that
> in my environment too. This definitely needs to be fixed urgently and I'm
> working on it right now. Unfortunately, once it is fixed, Avihai's VM will
> not be able to start, as it will require a working gluster cluster at the
> startup phase.

Thanks Denis!
So I'm moving the bug to Gluster team for further investigation.

Comment 22 Denis Chaplygin 2017-11-22 16:38:41 UTC
Leave it to me, I just found the root cause inside the engine and will make a patch tomorrow.

Comment 23 Avihai 2017-11-27 09:24:27 UTC
(In reply to Denis Chaplygin from comment #19)
> Well, in that case it failed with a clear error:
> 
> libvirtError: failed to initialize gluster connection (src=0x7fcd3c000d80
> priv=0x7fcd3c001d50): Input/output error
> 
> 
> Probably you have an issue with gluster cluster connectivity or the volume
> itself.
> 
> You need to verify that:
> * gluster01.scl.lab.tlv.redhat.com is available from your VM host
> * Gluster ports are not filtered on both VM host and all gluster hosts
> * gluster peers can reach each other
> * Volume storage_local_ge8_volume_0 is started and all bricks are running
> * Volume is not under heal/split-brain condition.

As the root cause is known, my input is not necessary.

My gluster cluster is finally up and fixed.
Please tell me if you need any more info.

Comment 24 Avihai 2017-12-13 06:29:28 UTC
Bug verification is blocked due to Bug 1497511.

Comment 25 Avihai 2018-02-01 11:44:30 UTC
Verified on:
Engine 4.2.1.4-0.1.el7
VDSM 4.20.17-1
Libvirt 3.9.0-7
Qemu 2.10.0-18

Comment 26 Sandro Bonazzola 2018-02-12 10:09:51 UTC
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in oVirt 4.2.0, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

