Bug 1719789

Summary: dynamic_ownership enabled breaks file ownership after virtual machine migration and shutdown for disk images on Gluster SD when libgfapi is enabled
Product: [oVirt] vdsm
Component: General
Version: 4.30.13
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Reporter: Daniel Milewski <daniel.milewski>
Assignee: Michal Skrivanek <michal.skrivanek>
QA Contact: Beni Pelled <bpelled>
CC: budic, bugs, dev, godas, michal.skrivanek, rbarry, sabose, swachira
Keywords: ZStream
Target Milestone: ovirt-4.3.6
Target Release: 4.30.29
Flags: michal.skrivanek: ovirt-4.3?
Fixed In Version: vdsm-4.30.29
Doc Type: No Doc Update
Type: Bug
oVirt Team: Virt
Last Closed: 2019-09-26 19:42:47 UTC

Description Daniel Milewski 2019-06-12 14:46:50 UTC
Description of problem:
dynamic_ownership enabled in oVirt 4.3 changes disk image ownership for virtual machines that are migrated and then shut down, so that they cannot be powered on again until the ownership is fixed manually. It happens only for images located on a Gluster storage domain when libgfapi is enabled. The ownership change is apparently done by libvirtd. After switching off dynamic_ownership in /etc/libvirt/qemu.conf on the oVirt hosts, correct ownership is maintained.
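
For reference, the workaround mentioned above is a single setting in libvirt's QEMU driver configuration on each host, followed by a libvirtd restart. Note that this disables dynamic ownership globally, so it is a workaround rather than a fix:

    # /etc/libvirt/qemu.conf
    dynamic_ownership = 0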

Version-Release number of selected component (if applicable):
oVirt 4.3.3 and VDSM 4.30.13

How reproducible:
Happens every time.

Steps to Reproduce:
1. Power on a virtual machine.
2. Migrate the virtual machine to a different host.
3. Shut down the virtual machine.
4. Try to power on the virtual machine again.

Actual results:
After migration, the disk image ownership is changed from vdsm:kvm to qemu:qemu. When the virtual machine shuts down, it is changed again from qemu:qemu to root:root, preventing vdsmd and libvirtd from accessing the disk image. The engine log says:
VM vm-1 is down with error. Exit message: Bad volume specification {'protocol': 'gluster', 'address': {'bus': '0', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'}, 'serial': 'cb868474-52fc-46a3-9a5c-c069ba1c0e02', 'index': 0, 'iface': 'scsi', 'apparentsize': '17179869184', 'specParams': {}, 'cache': 'none', 'imageID': 'cb868474-52fc-46a3-9a5c-c069ba1c0e02', 'truesize': '17179869184', 'type': 'disk', 'domainID': '9d33b830-6fc9-4190-a33a-19940a3a8589', 'reqsize': '0', 'format': 'raw', 'poolID': '2ccac895-215d-4883-a353-003d9ea272b1', 'device': 'disk', 'path': 'portal-shared/9d33b830-6fc9-4190-a33a-19940a3a8589/images/cb868474-52fc-46a3-9a5c-c069ba1c0e02/fa4ff8bf-89ef-4ceb-95a1-6d0985f1589f', 'propagateErrors': 'off', 'name': 'sda', 'bootOrder': '1', 'volumeID': 'fa4ff8bf-89ef-4ceb-95a1-6d0985f1589f', 'diskType': 'network', 'alias': 'ua-cb868474-52fc-46a3-9a5c-c069ba1c0e02', 'hosts': [{'name': 'backend-1', 'port': '0'}], 'discard': False}.

Expected results:
libvirtd either manages file ownership correctly or does not change it.

Additional info:
It looks similar to bug 1666795 and its duplicates/dependent bugs, but that one should already be fixed in 4.3.3.

Comment 1 Ryan Barry 2019-06-13 00:10:34 UTC
Gobinda, how is this not already reported on RHHI?

Is there a gluster replication setting getting in the way here?

Comment 2 Michal Skrivanek 2019-06-13 07:23:03 UTC
well, it was in bug 1687126

however the merged solution is to do this on incoming migration:
    for disk_type in (storage.DISK_TYPE.BLOCK, storage.DISK_TYPE.FILE,):
        xpath = "./devices//disk[@type='%s']//source" % (disk_type,)
        for element in tree.findall(xpath):
            storage.disable_dynamic_ownership(element)

...doesn't gluster use the NETWORK type?

Comment 3 Daniel Milewski 2019-06-13 09:37:53 UTC
Yes, I believe that Gluster disk images use the network type when libgfapi is enabled:

<disk type='network' device='disk' snapshot='no'>
  <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
  <source protocol='gluster' name='portal-shared/9d33b830-6fc9-4190-a33a-19940a3a8589/images/cb868474-52fc-46a3-9a5c-c069ba1c0e02/fa4ff8bf-89ef-4ceb-95a1-6d0985f1589f'>
    <host name='backend-1' port='24007'/>
  </source>
  <backingStore/>
  <target dev='sda' bus='scsi'/>
  <serial>cb868474-52fc-46a3-9a5c-c069ba1c0e02</serial>
  <boot order='1'/>
  <alias name='ua-cb868474-52fc-46a3-9a5c-c069ba1c0e02'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>

As far as I can see, it's the only disk type for which dynamic ownership is not disabled in lib/vdsm/virt/vmdevices/storage.py.
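
To make the direction concrete, here is a minimal, self-contained sketch of what extending that loop to network disks could look like. The disable_dynamic_ownership below is a local stand-in for the vdsm helper (which, for block and file disks, adds a <seclabel> telling libvirt not to relabel the source), and the domain XML is trimmed from the disk element above:

import xml.etree.ElementTree as ET

# Trimmed version of the libvirt disk XML above; only what matters here.
DOMAIN_XML = """
<domain>
  <devices>
    <disk type='network' device='disk'>
      <source protocol='gluster' name='volume/path/to/image'>
        <host name='backend-1' port='24007'/>
      </source>
    </disk>
  </devices>
</domain>
"""

def disable_dynamic_ownership(source):
    # Stand-in for the vdsm helper: add <seclabel model='dac' relabel='no'/>
    # to the <source> element so libvirt leaves the image ownership alone.
    seclabel = ET.SubElement(source, 'seclabel')
    seclabel.set('model', 'dac')
    seclabel.set('relabel', 'no')

tree = ET.fromstring(DOMAIN_XML)
# The loop from comment 2, with 'network' added alongside 'block' and 'file'.
for disk_type in ('block', 'file', 'network'):
    xpath = "./devices//disk[@type='%s']//source" % (disk_type,)
    for element in tree.findall(xpath):
        disable_dynamic_ownership(element)

print(ET.tostring(tree, encoding='unicode'))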

Comment 4 Ivan 2019-07-02 15:49:22 UTC
We have the same problem. 

oVirt 4.3.3 - 4.3.5rc2 and VDSM 4.30.20

Comment 5 Sahina Bose 2019-07-19 06:37:48 UTC
(In reply to Ryan Barry from comment #1)
> Gobinda, how is this not already reported on RHHI?
> 
> Is there a gluster replication setting getting in the way here?

In RHHI, we do not use libgfapi, and the disk type is FILE, not NETWORK.

Comment 6 Sahina Bose 2019-07-19 06:44:02 UTC
Assigning to virt, as the NETWORK disk type would need to be handled similarly to bug 1666795?

Comment 7 Michal Skrivanek 2019-07-19 15:34:00 UTC
The problem is in the original code adding a seclabel in vdsm, which was obsoleted by https://gerrit.ovirt.org/#/c/98088/. VMs started before that (i.e. VMs from 4.2) are wrong, and the original code is still needed to prevent libvirt from messing up the ownership.
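
For context, the seclabel element in question sits inside the disk's <source> and looks roughly like this (per libvirt's seclabel documentation; the name attribute is trimmed here):

<source protocol='gluster' name='...'>
  <host name='backend-1' port='24007'/>
  <seclabel model='dac' relabel='no'/>
</source>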

Comment 8 RHV bug bot 2019-08-15 14:05:16 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Tag 'v4.30.27' doesn't contain patch 'https://gerrit.ovirt.org/102604']
gitweb: https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=shortlog;h=refs/tags/v4.30.27

For more info please contact: infra

Comment 9 Beni Pelled 2019-09-04 10:35:42 UTC
Verified on RHV 4.3.6.3-0.1.el7 with vdsm-4.30.29-1.el7ev.x86_64 and libvirt-4.5.0-23.el7.x86_64

Verification steps:

1. Verify dynamic_ownership is enabled (/etc/libvirt/qemu.conf on the hosts contains dynamic_ownership=1)
2. Enable libgfapi with 'engine-config --set LibgfApiSupported=True' (choose 4.3)
3. Restart the engine with 'systemctl restart ovirt-engine.service'
4. Create a VM with a disk located on a Gluster storage domain
5. Power on the VM
6. Migrate the VM to any host
7. Shut down the VM
8. Power on the VM again

Result:

- The VM is up and running.
- The VM image file ownership remains vdsm:kvm from the creation stage through migration and after shutdown.
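
As a quick spot-check of that result, the ownership can be read directly from the Gluster mount on the host; the path below is illustrative and depends on the storage domain layout, and the expected output is shown on the second line:

stat -c '%U:%G' /rhev/data-center/mnt/glusterSD/<server>:_<volume>/<sd-uuid>/images/<image-uuid>/<volume-uuid>
vdsm:kvm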

Comment 10 Sandro Bonazzola 2019-09-26 19:42:47 UTC
This bug is included in the oVirt 4.3.6 release, published on September 26th, 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.6 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.