Bug 1774373 - [CNV 2.2] VMI fails to start on "Unable to set XATTR trusted.libvirt.security.dac"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 2.2.0
Assignee: Roman Mohr
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks: 1783343
 
Reported: 2019-11-20 07:54 UTC by Ruth Netser
Modified: 2020-01-31 12:58 UTC
CC List: 16 users

Fixed In Version: virt-launcher-container-v2.2.0-9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1783343 (view as bug list)
Environment:
Last Closed: 2020-01-30 16:27:33 UTC
Target Upstream Version:
Embargoed:
phoracek: needinfo-


Attachments
domain with empty disk (2.99 KB, application/xml)
2019-11-22 08:11 UTC, Roman Mohr
libvirt debug logs of a VM started with libvirt 5.6 inside a kubevirt pod (2.29 MB, text/plain)
2019-11-28 10:06 UTC, Roman Mohr


Links:
Red Hat Product Errata RHEA-2020:0307 (last updated 2020-01-30 16:27:48 UTC)

Description Ruth Netser 2019-11-20 07:54:27 UTC
Description of problem:

On a newly installed PSI environment with OCP 4.3 + CNV 2.2, VMI remains in Scheduled state.

The VMI fails to start on:

server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=0, Message='internal error: child reported (status=125): Unable to set XATTR trusted.libvirt.security.dac on /var/lib/libvirt/qemu/domain-1-default_vm-cirros: Operation not permitted')"


Version-Release number of selected component (if applicable):
OCP 4.3 + CNV 2.2

How reproducible:
100%

Steps to Reproduce:
1. Create a VM from yaml
====================================
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-cirros
    kubevirt.io/svc: "true"
  name: vm-cirros
spec:
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-cirros
    spec:
      evictionStrategy: LiveMigrate
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: disk1
        machine:
          type: ""
        resources:
          requests:
            memory: 64M
      terminationGracePeriodSeconds: 0
      volumes:
      - name: disk1
        containerDisk:
          image: kubevirt/cirros-container-disk-demo:latest

====================================

2. Check the VMI status
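For reference, the two steps correspond roughly to the following commands (a sketch; the file name vm-cirros.yaml is only an example):

```
# Step 1: create the VM; spec.running: true makes KubeVirt start the VMI immediately.
oc create -f vm-cirros.yaml
# Step 2: watch the VMI phase; it should leave Scheduled and reach Running.
oc get vmi vm-cirros -w
```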


Actual results:
VMI fails to start.

Expected results:
VMI should be running.

Additional info:
====================================

$ oc get vmi
NAME        AGE       PHASE       IP             NODENAME
vm-cirros   5m4s      Scheduled   10.129.0.139   host-172-16-0-39
====================================

$ oc describe vmi
Name:         vm-cirros
Namespace:    default
Labels:       kubevirt.io/nodeName=host-172-16-0-39
              kubevirt.io/vm=vm-cirros
Annotations:  kubevirt.io/latest-observed-api-version=v1alpha3
              kubevirt.io/storage-observed-api-version=v1alpha3
API Version:  kubevirt.io/v1alpha3
Kind:         VirtualMachineInstance
Metadata:
  Creation Timestamp:  2019-11-20T07:47:09Z
  Finalizers:
    foregroundDeleteVirtualMachine
  Generate Name:  vm-cirros
  Generation:     1617
  Owner References:
    API Version:           kubevirt.io/v1alpha3
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  VirtualMachine
    Name:                  vm-cirros
    UID:                   07f6a0d0-60c5-4654-bbed-c4163420dbf4
  Resource Version:        1082427
  Self Link:               /apis/kubevirt.io/v1alpha3/namespaces/default/virtualmachineinstances/vm-cirros
  UID:                     c6d9a135-1db5-4c26-a2d2-d87fa3e4cd8b
Spec:
  Domain:
    Devices:
      Disks:
        Disk:
          Bus:  virtio
        Name:   disk1
      Interfaces:
        Bridge:
        Name:  default
    Features:
      Acpi:
        Enabled:  true
    Firmware:
      Uuid:  0d2a2043-41c0-59c3-9b17-025022203668
    Machine:
      Type:  q35
    Resources:
      Requests:
        Cpu:          100m
        Memory:       64M
  Eviction Strategy:  LiveMigrate
  Networks:
    Name:  default
    Pod:
  Termination Grace Period Seconds:  0
  Volumes:
    Container Disk:
      Image:              kubevirt/cirros-container-disk-demo:latest
      Image Pull Policy:  Always
    Name:                 disk1
Status:
  Conditions:
    Last Probe Time:       <nil>
    Last Transition Time:  <nil>
    Message:               cannot migrate VMI with a bridge interface connected to a pod network
    Reason:                InterfaceNotLiveMigratable
    Status:                False
    Type:                  LiveMigratable
  Guest OS Info:
  Interfaces:
    Ip Address:      10.129.0.139
    Mac:             52:54:00:9e:0c:d6
    Name:            default
  Migration Method:  BlockMigration
  Node Name:         host-172-16-0-39
  Phase:             Scheduled
  Qos Class:         Burstable
Events:
  Type     Reason            Age                  From                            Message
  ----     ------            ----                 ----                            -------
  Normal   SuccessfulCreate  5m                   disruptionbudget-controller     Created PodDisruptionBudget kubevirt-disruption-budget-l5jq2
  Normal   SuccessfulCreate  5m                   virtualmachine-controller       Created virtual machine pod virt-launcher-vm-cirros-2w2h9
  Warning  SyncFailed        5m                   virt-handler, host-172-16-0-39  server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=0, Message='internal error: child reported (status=125): Unable to set XATTR trusted.libvirt.security.dac on /var/lib/libvirt/qemu/domain-1-default_vm-cirros: Operation not permitted')"
  Warning  SyncFailed        5m                   virt-handler, host-172-16-0-39  server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=0, Message='internal error: child reported (status=125): Unable to set XATTR trusted.libvirt.security.dac on /var/lib/libvirt/qemu/domain-2-default_vm-cirros: Operation not permitted')"
  Warning  SyncFailed        5m                   virt-handler, host-172-16-0-39  server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=0, Message='internal error: child reported (status=125): Unable to set XATTR trusted.libvirt.security.dac on /var/lib/libvirt/qemu/domain-3-default_vm-cirros: Operation not permitted')"
  Warning  SyncFailed        5m                   virt-handler, host-172-16-0-39  server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=0, Message='internal error: child reported (status=125): Unable to set XATTR trusted.libvirt.security.dac on /var/lib/libvirt/qemu/domain-4-default_vm-cirros: Operation not permitted')"
  Warning  SyncFailed        5m                   virt-handler, host-172-16-0-39  server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=0, Message='internal error: child reported (status=125): Unable to set XATTR trusted.libvirt.security.dac on /var/lib/libvirt/qemu/domain-5-default_vm-cirros: Operation not permitted')"
  Warning  SyncFailed        5m                   virt-handler, host-172-16-0-39  server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=0, Message='internal error: child reported (status=125): Unable to set XATTR trusted.libvirt.security.dac on /var/lib/libvirt/qemu/domain-6-default_vm-cirros: Operation not permitted')"
  Warning  SyncFailed        5m                   virt-handler, host-172-16-0-39  server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=0, Message='internal error: child reported (status=125): Unable to set XATTR trusted.libvirt.security.dac on /var/lib/libvirt/qemu/domain-7-default_vm-cirros: Operation not permitted')"
  Warning  SyncFailed        5m                   virt-handler, host-172-16-0-39  server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=0, Message='internal error: child reported (status=125): Unable to set XATTR trusted.libvirt.security.dac on /var/lib/libvirt/qemu/domain-8-default_vm-cirros: Operation not permitted')"
  Warning  SyncFailed        5m                   virt-handler, host-172-16-0-39  server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=0, Message='internal error: child reported (status=125): Unable to set XATTR trusted.libvirt.security.dac on /var/lib/libvirt/qemu/domain-9-default_vm-cirros: Operation not permitted')"
  Warning  SyncFailed        25s (x1481 over 5m)  virt-handler, host-172-16-0-39  (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=0, Message='internal error: child reported (status=125): Unable to set XATTR trusted.libvirt.security.dac on /var/lib/libvirt/qemu/domain-1490-default_vm-cirros: Operation not permitted')"

Comment 1 Yan Du 2019-11-20 08:51:51 UTC
I hit the same issue on an OCP 4.2 + CNV 2.2 cluster too.

Comment 4 Roman Mohr 2019-11-20 14:59:05 UTC
So far I can only rule out SELinux as the cause. Still investigating.

Comment 6 Roman Mohr 2019-11-21 13:26:03 UTC
This seems to be an issue introduced by running libvirt 5.6.0 inside a container. I could reproduce it with a custom libvirt-5.6.0 container:


```
10s         Warning   SyncFailed                VirtualMachineInstance   server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=0, Message='internal error: child reported (status=125): Unable to set XATTR trusted.libvirt.security.dac on /var/lib/libvirt/qemu/domain-9-default_vmi-nocloud: Operation not permitted')"
```

Comment 7 Michal Privoznik 2019-11-21 14:29:05 UTC
(In reply to Ruth Netser from comment #0)
> Description of problem:
> 
> On a newly installed PSI environment with OCP 4.3 + CNV 2.2, VMI remains in
> Scheduled state.
> 
> The VMI fails to start on:
> 
> server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=0,
> Message='internal error: child reported (status=125): Unable to set XATTR
> trusted.libvirt.security.dac on
> /var/lib/libvirt/qemu/domain-1-default_vm-cirros: Operation not permitted')"
> 
> 

This is libvirt trying to remember the original owner of the file. It does so by setting some extended attributes. However, before doing that, libvirt tries to read an XATTR to see whether it gets ENOSYS or ENOTSUP. EPERM is then handled just like any other errno - we simply error out. I can try to cook a patch for libvirt, but I'm not sure about the implications. Anyway, as a workaround you can set remember_owner=0 in qemu.conf.
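Applied inside the virt-launcher compute container, that workaround amounts to roughly the following (a sketch; the stock config path /etc/libvirt/qemu.conf and a daemon restart are assumed):

```
# Disable owner remembering so libvirt skips the trusted.libvirt.security.* XATTR writes.
echo 'remember_owner = 0' >> /etc/libvirt/qemu.conf
# libvirtd has to re-read the config, e.g. by restarting the daemon or the pod.
```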

Comment 8 Roman Mohr 2019-11-21 14:46:56 UTC
> Anyway, as a workaround you can set remember_owner=0 in qemu.conf.

Can confirm that I managed to start a VM now. Running the whole test suite now in https://github.com/kubevirt/kubevirt/pull/2893.

Thanks Michal.

> I can try to cook a patch for libvirt

So this patch would check if getting and setting xattrs works at all and then disable the relevant sections?

Comment 9 Michal Privoznik 2019-11-21 15:29:55 UTC
(In reply to Roman Mohr from comment #8)
> > Anyway, as a workaround you can set remember_owner=0 in qemu.conf.
> 
> Can confirm that I managed to start a VM now. Running the whole test suite
> now in https://github.com/kubevirt/kubevirt/pull/2893.
> 
> Thanks Michal.
> 
> > I can try to cook a patch for libvirt
> 
> So this patch would check if getting and setting xattrs works at all and
> then disable the relevant sections?

No, the code currently looks like this:

https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/security/security_util.c;h=4f661fd75e5380a84db00979f2c526639d4b5a17;hb=HEAD#l360

To write it in a more readable language (yes, it resembles Python):

def remember_label(path):
  xattr = read_xattrs(path, "libvirt.security.ref_dac");
  if failed then:
    if error is ENOSYS or error is ENOTSUP:
      return 0 /* XATTRs are not supported on @path, claim success */
    if error is ENODATA:
      pass /* cool! filesystem where @path lives does support XATTRs, but has none stored yet. */
    else:
      raise("Unable to get XATTR libvirt.security.ref_dac on @path");

  /* At this point, @path MUST be on a FS that supports XATTRs,
     because either we read "ref_dac" successfully, or got ENODATA
     which means "no such attribute". */

  if xattr is None:
    xattr = 0;

  /* The xattr variable is a refcounter */
  xattr = int(xattr) + 1;

  /* This is where we write three XATTRs. Read carefully. */
  if (xattr is 1) then:
    write_xattr(path, "libvirt.security.dac", original_owner)
    write_xattr(path, "libvirt.security.timestamp_dac", host_boot_time)
  write_xattr(path, "libvirt.security.ref_dac", xattr)


Now, the problem is that we successfully got past the initial check. Either the FS returned ENODATA or indeed some XATTRs are set (can you please share debug logs?). How is the FS mounted anyway?
My idea was to check the first write_xattr() and, if it returns EPERM, not to propagate the error but to act as if XATTRs are not supported -> as if ENOTSUP had been returned in the first block.

Comment 11 Roman Mohr 2019-11-21 16:17:43 UTC
Many tests work now, but https://github.com/kubevirt/kubevirt/pull/2893 shows that we will most likely need more modifications.

Comment 12 Michal Privoznik 2019-11-21 16:27:03 UTC
(In reply to Roman Mohr from comment #11)
> Many tests work now, but https://github.com/kubevirt/kubevirt/pull/2893
> shows that we will most likely need more modifications.

I haven't seen any libvirt error in failed tests. But then again, I don't understand kubevirt or its testing.
Anyway, back to my original question - can you please share how filesystems are mounted (esp. why they don't have XATTRs)? I guess sharing the mount table from within the container should be enough.

Comment 13 Roman Mohr 2019-11-21 16:36:05 UTC
> I haven't seen any libvirt error in failed tests. But then again, I don't understand kubevirt or its testing.

Sorry for the confusion, but wanted to update QE people. Did not yet look at the detailed log. It is definitely related to the new libvirt container, but does not necessarily mean that they are failing because of libvirt.

> Anyway, back to my original question - can you please share how filesystems are mounted (esp. why they don't have XATTRs)? I guess sharing the mount table from within the container should be enough.

I think that the main issue is that overlay2, aufs and all the other container filesystems just don't support XATTRs.


Here is the mount table:

```
2612 1772 0:480 / / rw,relatime - overlay overlay rw,context="system_u:object_r:container_file_t:s0:c375,c542",lowerdir=/var/lib/docker/overlay2/l/PY5W2CNJDFJAHUIMXMTZHUELBW:/var/lib/docker/overlay2/l/4GCDSYWFCV2ZNMTFP4UOLHHOLQ:/var/lib/docker/overlay2/l/577DWR2ZOKLAVLOQWSFGVTCLLS:/var/lib/docker/overlay2/l/4V7FSDH7B6W6XK6YRKUHVGVWBF:/var/lib/docker/overlay2/l/FA24K4GSDMNEI35V4IJSNR2AE4:/var/lib/docker/overlay2/l/TXYYEVRWC7KD7NKYEZTUJYLUIW:/var/lib/docker/overlay2/l/FOJ5WC45ZMPROYLCW37FCDWG4B:/var/lib/docker/overlay2/l/BS5VETE6OSR5GP7VN6WJK5OC6Z:/var/lib/docker/overlay2/l/CUPRASXRTFCH26SGSVMKFUR5AI:/var/lib/docker/overlay2/l/OOVHEVMPRGBFGW2PQSM6JDCRHX:/var/lib/docker/overlay2/l/QGOCKRCZMQGA5PY45QDQPJQLQ6,upperdir=/var/lib/docker/overlay2/7a44a2fa2d473124b0120494703aef9c3bbd04f014c132033e129ab50e983546/diff,workdir=/var/lib/docker/overlay2/7a44a2fa2d473124b0120494703aef9c3bbd04f014c132033e129ab50e983546/work
2613 2612 0:481 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
2614 2612 0:482 / /dev rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c375,c542",mode=755
2615 2614 0:483 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,context="system_u:object_r:container_file_t:s0:c375,c542",gid=5,mode=620,ptmxmode=666
2616 2612 0:475 / /sys ro,nosuid,nodev,noexec,relatime - sysfs sysfs ro,seclabel
2617 2616 0:484 / /sys/fs/cgroup ro,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c375,c542",mode=755
2618 2617 0:22 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3cef3f11_0c7c_11ea_92a7_525500d15501.slice/docker-9f757b8c04b84c57b21df5208ada4948287f28fd546966300af3d737910e7c79.scope /sys/fs/cgroup/systemd ro,nosuid,nodev,noexec,relatime master:9 - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
2619 2617 0:24 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3cef3f11_0c7c_11ea_92a7_525500d15501.slice/docker-9f757b8c04b84c57b21df5208ada4948287f28fd546966300af3d737910e7c79.scope /sys/fs/cgroup/cpuset ro,nosuid,nodev,noexec,relatime master:10 - cgroup cgroup rw,seclabel,cpuset
2620 2617 0:25 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3cef3f11_0c7c_11ea_92a7_525500d15501.slice/docker-9f757b8c04b84c57b21df5208ada4948287f28fd546966300af3d737910e7c79.scope /sys/fs/cgroup/cpuacct,cpu ro,nosuid,nodev,noexec,relatime master:11 - cgroup cgroup rw,seclabel,cpuacct,cpu
2621 2617 0:26 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3cef3f11_0c7c_11ea_92a7_525500d15501.slice/docker-9f757b8c04b84c57b21df5208ada4948287f28fd546966300af3d737910e7c79.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime master:12 - cgroup cgroup rw,seclabel,memory
2622 2617 0:27 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3cef3f11_0c7c_11ea_92a7_525500d15501.slice/docker-9f757b8c04b84c57b21df5208ada4948287f28fd546966300af3d737910e7c79.scope /sys/fs/cgroup/freezer ro,nosuid,nodev,noexec,relatime master:13 - cgroup cgroup rw,seclabel,freezer
2623 2617 0:28 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3cef3f11_0c7c_11ea_92a7_525500d15501.slice/docker-9f757b8c04b84c57b21df5208ada4948287f28fd546966300af3d737910e7c79.scope /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime master:14 - cgroup cgroup rw,seclabel,devices
2624 2617 0:29 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3cef3f11_0c7c_11ea_92a7_525500d15501.slice/docker-9f757b8c04b84c57b21df5208ada4948287f28fd546966300af3d737910e7c79.scope /sys/fs/cgroup/net_prio,net_cls ro,nosuid,nodev,noexec,relatime master:15 - cgroup cgroup rw,seclabel,net_prio,net_cls
2625 2617 0:30 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3cef3f11_0c7c_11ea_92a7_525500d15501.slice/docker-9f757b8c04b84c57b21df5208ada4948287f28fd546966300af3d737910e7c79.scope /sys/fs/cgroup/blkio ro,nosuid,nodev,noexec,relatime master:16 - cgroup cgroup rw,seclabel,blkio
2626 2617 0:31 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3cef3f11_0c7c_11ea_92a7_525500d15501.slice/docker-9f757b8c04b84c57b21df5208ada4948287f28fd546966300af3d737910e7c79.scope /sys/fs/cgroup/pids ro,nosuid,nodev,noexec,relatime master:17 - cgroup cgroup rw,seclabel,pids
2627 2617 0:32 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3cef3f11_0c7c_11ea_92a7_525500d15501.slice/docker-9f757b8c04b84c57b21df5208ada4948287f28fd546966300af3d737910e7c79.scope /sys/fs/cgroup/perf_event ro,nosuid,nodev,noexec,relatime master:18 - cgroup cgroup rw,seclabel,perf_event
2628 2617 0:33 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod3cef3f11_0c7c_11ea_92a7_525500d15501.slice/docker-9f757b8c04b84c57b21df5208ada4948287f28fd546966300af3d737910e7c79.scope /sys/fs/cgroup/hugetlb ro,nosuid,nodev,noexec,relatime master:19 - cgroup cgroup rw,seclabel,hugetlb
2629 2614 0:471 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw,seclabel
2630 2614 253:1 /var/lib/kubelet/pods/3cef3f11-0c7c-11ea-92a7-525500d15501/containers/compute/3006c557 /dev/termination-log rw,relatime - xfs /dev/vda1 rw,seclabel,attr2,inode64,noquota
2631 2612 253:1 /var/lib/docker/containers/2ffd8997b882392bfc2cb897a953c76d94606ebe9841c078b4051f2b4979c501/hostname /etc/hostname rw,relatime - xfs /dev/vda1 rw,seclabel,attr2,inode64,noquota
2632 2612 253:1 /var/lib/kubelet/pods/3cef3f11-0c7c-11ea-92a7-525500d15501/etc-hosts /etc/hosts rw,relatime - xfs /dev/vda1 rw,seclabel,attr2,inode64,noquota
2633 2614 0:470 / /dev/shm rw,nosuid,nodev,noexec,relatime - tmpfs shm rw,context="system_u:object_r:container_file_t:s0:c3,c642",size=65536k
2634 2612 253:1 /var/lib/docker/containers/9f757b8c04b84c57b21df5208ada4948287f28fd546966300af3d737910e7c79/secrets /run/secrets rw,relatime - xfs /dev/vda1 rw,seclabel,attr2,inode64,noquota
2635 2612 253:1 /var/lib/docker/containers/2ffd8997b882392bfc2cb897a953c76d94606ebe9841c078b4051f2b4979c501/resolv.conf /etc/resolv.conf rw,relatime - xfs /dev/vda1 rw,seclabel,attr2,inode64,noquota
2636 2612 253:1 /var/lib/kubelet/pods/3cef3f11-0c7c-11ea-92a7-525500d15501/volumes/kubernetes.io~empty-dir/infra-ready-mount /run/kubevirt-infra rw,relatime - xfs /dev/vda1 rw,seclabel,attr2,inode64,noquota
2637 2612 253:1 /var/lib/kubelet/pods/3cef3f11-0c7c-11ea-92a7-525500d15501/volumes/kubernetes.io~empty-dir/ephemeral-disks /run/kubevirt-ephemeral-disks rw,relatime - xfs /dev/vda1 rw,seclabel,attr2,inode64,noquota
2638 2612 0:20 /kubevirt /run/kubevirt rw,nosuid,nodev - tmpfs tmpfs rw,seclabel,mode=755
2639 2612 253:1 /var/lib/kubelet/pods/3cef3f11-0c7c-11ea-92a7-525500d15501/volumes/kubernetes.io~empty-dir/libvirt-runtime /run/libvirt rw,relatime - xfs /dev/vda1 rw,seclabel,attr2,inode64,noquota
2640 2638 0:20 /kubevirt/container-disks/3cec6825-0c7c-11ea-92a7-525500d15501 /run/kubevirt/container-disks rw,nosuid,nodev master:23 - tmpfs tmpfs rw,seclabel,mode=755
1773 2613 0:481 /bus /proc/bus ro,nosuid,nodev,noexec,relatime - proc proc rw
1857 2613 0:481 /fs /proc/fs ro,nosuid,nodev,noexec,relatime - proc proc rw
1858 2613 0:481 /irq /proc/irq ro,nosuid,nodev,noexec,relatime - proc proc rw
1859 2613 0:481 /sys /proc/sys ro,nosuid,nodev,noexec,relatime - proc proc rw
1860 2613 0:481 /sysrq-trigger /proc/sysrq-trigger ro,nosuid,nodev,noexec,relatime - proc proc rw
1861 2613 0:485 / /proc/acpi ro,relatime - tmpfs tmpfs ro,seclabel
1890 2613 0:482 /null /proc/kcore rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c375,c542",mode=755
1891 2613 0:482 /null /proc/timer_list rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c375,c542",mode=755
1892 2613 0:482 /null /proc/timer_stats rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c375,c542",mode=755
1895 2613 0:482 /null /proc/sched_debug rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c375,c542",mode=755
1896 2613 0:486 / /proc/scsi ro,relatime - tmpfs tmpfs ro,seclabel
1897 2616 0:487 / /sys/firmware ro,relatime - tmpfs tmpfs ro,seclabel
1937 2640 0:488 /disk/downloaded /run/kubevirt/container-disks/disk_0.img rw,relatime master:203 - overlay overlay rw,context="system_u:object_r:container_file_t:s0:c393,c983",lowerdir=/var/lib/docker/overlay2/l/7F3MH6DCCFQVDBBM276DZKEMFQ:/var/lib/docker/overlay2/l/7YG2XJCPWYY743T3PN7B3OABLU,upperdir=/var/lib/docker/overlay2/4b3fc399f028701075d61718dff2d2eeda22d8a866fbd53f2e774d67f9efeb1b/diff,workdir=/var/lib/docker/overlay2/4b3fc399f028701075d61718dff2d2eeda22d8a866fbd53f2e774d67f9efeb1b/work

```
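A quick way to check from inside the pod whether trusted.* XATTRs can be written at all (a hedged sketch; the probe path is only an example and the commands need root):

```
# Try to set and read back the same XATTR libvirt uses. EPERM here reproduces the
# reported failure; ENOTSUP would mean the filesystem rejects trusted.* XATTRs entirely.
touch /var/lib/libvirt/qemu/xattr-probe
setfattr -n trusted.libvirt.security.dac -v '+0:+0' /var/lib/libvirt/qemu/xattr-probe
getfattr -d -m - /var/lib/libvirt/qemu/xattr-probe
rm -f /var/lib/libvirt/qemu/xattr-probe
```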

Comment 14 Roman Mohr 2019-11-22 08:11:24 UTC
Created attachment 1638634 [details]
domain with empty disk

OK, so the remaining issue seems to be related to file permissions.

The attached domain with an empty disk at "/var/run/libvirt/empty-disks/emptydisk.qcow2" gives us:

```
{"component":"virt-launcher","kind":"","level":"error","msg":"Starting the VirtualMachineInstance failed.","name":"vmi-nocloud","namespace":"default","pos":"manager.go:1051","reason":"virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2019-11-22T08:03:54.181600Z qemu-system-x86_64: -drive file=/var/run/libvirt/empty-disks/emptydisk.qcow2,format=qcow2,if=none,id=drive-ua-emptydisk,cache=none: Could not reopen file: Permission denied')","timestamp":"2019-11-22T08:03:54.385299Z","uid":"9cd516e6-0cfe-11ea-bc4d-525500d15501"}
```

It could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1762178

Comment 15 Roman Mohr 2019-11-22 08:28:36 UTC
> It could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1762178

I will now also make the disks explicitly owned by `qemu`, like we already do for all other disks anyway. Still, something seems to have changed. Maybe it is related to dynamic_ownership?
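"Explicitly owned by `qemu`" amounts to roughly this for the empty disk from the attached domain (a hedged sketch; the path comes from the attached domain XML, and the qemu user/group names are the usual defaults):

```
# Hand the backing file to the qemu user when libvirt's dynamic_ownership does not relabel it.
chown qemu:qemu /var/run/libvirt/empty-disks/emptydisk.qcow2
chmod 0660 /var/run/libvirt/empty-disks/emptydisk.qcow2
```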

Comment 16 Michal Privoznik 2019-11-25 15:04:14 UTC
(In reply to Roman Mohr from comment #13)
> > I haven't seen any libvirt error in failed tests. But then again, I don't understand kubevirt or its testing.
> 
> Sorry for the confusion, but wanted to update QE people. Did not yet look at
> the detailed log. It is definitely related to the new libvirt container,
> but does not necessarily mean that they are failing because of libvirt.
> 
> > Anyway, back to my original question - can you please share how filesystems are mounted (esp. why they don't have XATTRs)? I guess sharing the mount table from within the container should be enough.
> 
> I think that the main issue is that overlay2, aufs and all the other
> container filesystems just don't support XATTRs.

I've just tested overlayfs (the one that comes with the kernel) and it does support XATTRs:

From my mount table:
none on /mnt/cdrom type tmpfs (rw,relatime,size=10240k)
none on /mnt/floppy type overlay (rw,relatime,lowerdir=/etc,upperdir=/mnt/cdrom/a,workdir=/mnt/cdrom/b/)

# setfattr -n trusted.libvirt.security.dac -v "+0:+0" /mnt/floppy/rhashrc
# setfattr -n trusted.libvirt.security.dac -v "+0:+0" /mnt/floppy/tigrc

# getfattr -d -m - /etc/rhashrc /mnt/cdrom/a/rhashrc /mnt/floppy/rhashrc /etc/tigrc /mnt/cdrom/a/tigrc /mnt/floppy/tigrc 
getfattr: Removing leading '/' from absolute path names
# file: mnt/cdrom/a/rhashrc
trusted.libvirt.security.dac="+0:+0"
trusted.overlay.origin=0sAPshAIF0aLAjFMJK2pktLYyDjyyjbWwAEAAAAAA+8Imp

# file: mnt/floppy/rhashrc
trusted.libvirt.security.dac="+0:+0"

# file: mnt/cdrom/a/tigrc
trusted.libvirt.security.dac="+0:+0"
trusted.overlay.origin=0sAPshAIF0aLAjFMJK2pktLYyDjyyjbe8EAAAAAABpW1NG

# file: mnt/floppy/tigrc
trusted.libvirt.security.dac="+0:+0"

I haven't tested aufs though. So I'm not sure how to address this.

Comment 17 Michal Privoznik 2019-11-25 15:41:29 UTC
Roman, are you sure that libvirtd is running as root (and with CAP_SYS_ADMIN)? Also, the man page of setxattr(2) suggests that the error might be because the file is marked as immutable:

  EPERM  The file is marked immutable or append-only.  (See ioctl_iflags(2).)

Can you confirm this is not the case?
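One way to rule that out from inside the container (a hedged sketch; lsattr comes from e2fsprogs and may not be supported on every filesystem, and the domain directory path is taken from the error message):

```
# An 'i' (immutable) or 'a' (append-only) flag here would explain EPERM from setxattr(2).
lsattr -d /var/lib/libvirt/qemu
lsattr -d /var/lib/libvirt/qemu/domain-1-default_vm-cirros
```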

Comment 18 Roman Mohr 2019-11-25 15:48:04 UTC
> Roman, are you sure that libvirtd is running as root (and with CAP_SYS_ADMIN)? Also, the man page of setxattr(2) suggests that the error might be because the file is marked as immutable:

Yep, that is probably it: CAP_SYS_ADMIN. We don't grant that. It was not needed for libvirt 5.0. Do you know why it is needed now? Does that also explain why dynamic_ownership does not work anymore?
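For reference, the capability set actually granted to libvirtd can be checked from inside the compute container roughly like this (a hedged sketch; capsh comes from libcap, and the process name is an assumption):

```
# Effective capabilities of the running libvirtd, decoded to names; CAP_SYS_ADMIN
# missing here matches the XATTR EPERM, and CAP_CHOWN is what dynamic_ownership needs.
grep CapEff "/proc/$(pidof libvirtd)/status"
capsh --decode="$(grep CapEff "/proc/$(pidof libvirtd)/status" | awk '{print $2}')"
```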

Comment 19 Roman Mohr 2019-11-25 15:59:24 UTC
Will provide debug logs to dive deeper into the dynamic_ownership issue.

Comment 27 Michal Privoznik 2019-11-27 16:47:03 UTC
(In reply to Roman Mohr from comment #19)
> Will provide debug logs to dive deeper into the dynamic_ownership issue.

Does the container run with CAP_CHOWN? Thing is, libvirt defaults to dynamic_ownership = 1 and remember_owner = 1 if geteuid() == 0. I'm guessing the container also runs with a UID mapping, doesn't it? We might need to tweak those defaults (i.e. enable dynamic_ownership if CAP_CHOWN is present and remember_owner if CAP_SYS_ADMIN is present). What I still don't get is how come SELinux works? It also uses XATTRs (although a different namespace - security.* rather than trusted.* but man xattr(7) says both need CAP_SYS_ADMIN).

Anyway, to debug the dynamic_ownership not working I'd need to see debug logs please.

https://wiki.libvirt.org/page/DebugLogs
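The settings from that page boil down to roughly the following in libvirtd.conf (a hedged sketch; the filter and output values follow the wiki and may need adjusting for the container image):

```
# Enable verbose libvirtd logging; libvirtd must be restarted to pick this up.
cat >> /etc/libvirt/libvirtd.conf <<'EOF'
log_filters="1:qemu 1:libvirt 4:object 4:json 4:event 1:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
EOF
```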

Comment 31 Roman Mohr 2019-11-28 08:49:48 UTC
(In reply to Michal Privoznik from comment #27)
> (In reply to Roman Mohr from comment #19)
> > Will provide debug logs to dive deeper into the dynamic_ownership issue.
> 
> Does the container run with CAP_CHOWN? Thing is, libvirt defaults to
> dynamic_ownership = 1 and remember_owner = 1 if geteuid() == 0. I'm guessing
> the container also runs with a UID mapping, doesn't it?

So the libvirt container currently runs as root inside the container, and we start the VMs by running the qemu processes as the "qemu" user.
Not running libvirt as root is a goal, but not something we do right now.


> We might need to
> tweak those defaults (i.e. enable dynamic_ownership if CAP_CHOWN is present

CAP_CHOWN should be present. I will attach the capabilities of the libvirt and qemu processes to the debug logs.

> and remember_owner if CAP_SYS_ADMIN is present). What I still don't get is
> how come SELinux work? It also uses XATTRs (although different namespace -
> security.* rather than trusted.* but man xattr(7) says both need
> CAP_SYS_ADMIN).
> 
> Anyway, to debug the dynamic_ownership not working I'd need to see debug
> logs please.


Working on it now.

> 
> https://wiki.libvirt.org/page/DebugLogs

Comment 32 Roman Mohr 2019-11-28 10:06:45 UTC
Created attachment 1640327 [details]
libvirt debug logs of a VM started with libvirt 5.6 inside a kubevirt pod

Attached the logs.

The libvirt process capabilities: 

```
Capabilities for `4716': = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_admin,cap_net_raw,cap_sys_chroot,cap_sys_nice,cap_mknod,cap_audit_write,cap_setfcap+eip
```

The qemu process capabilities:

```
Capabilities for `5018': = cap_net_bind_service+ep

```

Comment 33 Roman Mohr 2019-11-29 08:56:00 UTC
Moving to MODIFIED, since the issues for the reporter are now resolved.

Comment 34 Dan Kenigsberg 2019-12-01 12:29:40 UTC
By now, CPaaS must have built this and attached it to Errata.

Comment 35 Ruth Netser 2019-12-02 09:17:06 UTC
Manually verified that a VMI can be started.
Due to the nature of the fix, we will close the bug once we complete regression testing on this version.

Comment 36 Israel Pinto 2019-12-04 10:06:14 UTC
Started a VM [1].
The VM is running:
$ oc get vmi
NAME        AGE   PHASE     IP            NODENAME
vm-cirros   17m   Running   10.131.0.84   host-172-16-0-48


[1]
$ cat cirros.yaml 
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-cirros
  name: vm-cirros
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-cirros
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
        machine:
          type: ""
        resources:
          requests:
            memory: 64M
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: kubevirt/cirros-container-disk-demo:latest
        name: containerdisk
      - cloudInitNoCloud:
          userData: |
            #!/bin/sh
            
             echo 'printed from cloud-init userdata'
        name: cloudinitdisk

Comment 38 errata-xmlrpc 2020-01-30 16:27:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0307

