Description of problem:
Migration fails if 'Desktop' workload profile was selected

Version-Release number of selected component (if applicable):
CNV 2.3.0 latest

How reproducible:
100%

Steps to Reproduce:
1. Start the migration wizard
2. Connect to VMware (it doesn't matter whether it is an existing connection or a new one)
3. Select 'Desktop' workload profile
4. Fill in the rest of the required fields as usual

Actual results:
Migration fails; the only error in the log of the conversion pod is:
libguestfs: error: security: cached appliance /var/tmp/.guestfs-0 is not owned by UID 0

Expected results:
Conversion should succeed

Additional info:
@Pino, is this a SELinux issue? @Igor, can you please attach the virt-v2v and virt-v2v-wrapper logs?
(In reply to Brett Thurber from comment #3)
> @Pino, is this a SELinux issue?

No, this is an error reported by libguestfs. From what I know, libguestfs is run during the build of the UCI, so there is a cached appliance already available. If that is the case, then there is something wrong in the way that command execution is run, so that the "root" user during the build does not really have UID 0. Tomáš, do you have any idea about this?

> @Igor, can you please attach the virt-v2v and virt-v2v-wrapper logs?

Yes, we definitely need them.
Below is the log I could grab. Please explain how to capture more if you need it.

# oc logs kubevirt-v2v-conversion-v2v-rhel7-igor-imported-wxfgx -n test-llicx
+ VDDK=/opt/vmware-vix-disklib-distrib/
+ ls -l /usr/lib64/nbdkit/plugins/nbdkit-vddk-plugin.so
-rwxr-xr-x. 1 root root 23344 Dec 20 02:10 /usr/lib64/nbdkit/plugins/nbdkit-vddk-plugin.so
+ ls -ld /opt/vmware-vix-disklib-distrib/
drwxrwxrwx. 7 root root 84 Mar 18 10:56 /opt/vmware-vix-disklib-distrib/
++ find /opt/vmware-vix-disklib-distrib/ -name libvixDiskLib.so.6
+ lib=/opt/vmware-vix-disklib-distrib/lib64/libvixDiskLib.so.6
++ dirname /opt/vmware-vix-disklib-distrib/lib64/libvixDiskLib.so.6
+ LD_LIBRARY_PATH=/opt/vmware-vix-disklib-distrib/lib64
+ nbdkit --dump-plugin vddk
path=/usr/lib64/nbdkit/plugins/nbdkit-vddk-plugin.so
name=vddk
version=1.12.5
api_version=2
struct_size=288
thread_model=serialize_all_requests
errno_is_preserved=0
has_longname=1
has_load=1
has_unload=1
has_dump_plugin=1
has_config=1
has_config_complete=1
has_config_help=1
has_open=1
has_close=1
has_get_size=1
has_pread=1
has_pwrite=1
has_flush=1
has_can_extents=1
has_extents=1
vddk_default_libdir=/usr/lib64/vmware-vix-disklib
vddk_has_nfchostport=1
vddk_dll=/data/vddklib/vmware-vix-disklib-distrib/lib64/libvixDiskLib.so.6.7.0
+ LIBGUESTFS_BACKEND=direct
+ libguestfs-test-tool -t 0
************************************************************
*                    IMPORTANT NOTICE
*
* When reporting bugs, include the COMPLETE, UNEDITED
* output below in your bug report.
*
************************************************************
LIBGUESTFS_BACKEND=direct
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
SELinux: Disabled
guestfs_get_append: (null)
guestfs_get_autosync: 1
guestfs_get_backend: direct
guestfs_get_backend_settings: []
guestfs_get_cachedir: /var/tmp
guestfs_get_hv: /usr/libexec/qemu-kvm
guestfs_get_memsize: 768
guestfs_get_network: 0
guestfs_get_path: /usr/lib64/guestfs
guestfs_get_pgroup: 0
guestfs_get_program: libguestfs-test-tool
guestfs_get_recovery_proc: 1
guestfs_get_smp: 1
guestfs_get_sockdir: /tmp
guestfs_get_tmpdir: /tmp
guestfs_get_trace: 0
guestfs_get_verbose: 1
host_cpu: x86_64
Launching appliance, timeout set to 0 seconds.
libguestfs: launch: program=libguestfs-test-tool
libguestfs: launch: version=1.40.2rhel=8,release=16.module+el8.1.1+5309+6d656f05,libvirt
libguestfs: launch: backend registered: unix
libguestfs: launch: backend registered: uml
libguestfs: launch: backend registered: libvirt
libguestfs: launch: backend registered: direct
libguestfs: launch: backend=direct
libguestfs: launch: tmpdir=/tmp/libguestfsOA4muL
libguestfs: launch: umask=0022
libguestfs: launch: euid=0
libguestfs: begin testing qemu features
libguestfs: error: security: cached appliance /var/tmp/.guestfs-0 is not owned by UID 0
libguestfs: closing guestfs handle 0x55d15c2b8e50 (state 0)
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfsOA4muL
> Version-Release number of selected component (if applicable): CNV 2.3.0 latest

Igor, could you be more specific please? What is the exact version of the image?
# oc get kubevirt -o yaml -n openshift-cnv | grep -i kubevirtversion
  observedDeploymentConfig: '{"id":"e32b89d9b82fa001bf5b9efc5712ea70174041c9","namespace":"openshift-cnv","registry":"registry-proxy.engineering.redhat.com/rh-osbs","imagePrefix":"container-native-virtualization-","kubeVirtVersion":"sha256:efa5e6bb3db2d3352913dcbe09424d7d1f634a475653bf6b6c07623c43442702","virtOperatorSha":"sha256:efa5e6bb3db2d3352913dcbe09424d7d1f634a475653bf6b6c07623c43442702","virtApiSha":"sha256:d6f1e5d26c9a62b345946f2d2bc58b5118c68b572dfbb0247435cc1e0a49e2ef","virtControllerSha":"sha256:d15ddf31cbd82184146d1400ba53365342eee7a17d97a9ebd5807651632dcadd","virtHandlerSha":"sha256:af73bb2f4c41028cb4f62f81fa8113ee499e6e2ea3eb97095a3dc9f95ec995e8","virtLauncherSha":"sha256:035856c249cb310760f75f852452b71a2f5f337be6f9ec6901e33a49aa9c4946","additionalProperties":{"ImagePullPolicy":"","MonitorAccount":"","MonitorNamespace":"","UninstallStrategy":"BlockUninstallIfWorkloadsExist"}}'
  observedKubeVirtVersion: sha256:efa5e6bb3db2d3352913dcbe09424d7d1f634a475653bf6b6c07623c43442702
  targetDeploymentConfig: '{"id":"e32b89d9b82fa001bf5b9efc5712ea70174041c9","namespace":"openshift-cnv","registry":"registry-proxy.engineering.redhat.com/rh-osbs","imagePrefix":"container-native-virtualization-","kubeVirtVersion":"sha256:efa5e6bb3db2d3352913dcbe09424d7d1f634a475653bf6b6c07623c43442702","virtOperatorSha":"sha256:efa5e6bb3db2d3352913dcbe09424d7d1f634a475653bf6b6c07623c43442702","virtApiSha":"sha256:d6f1e5d26c9a62b345946f2d2bc58b5118c68b572dfbb0247435cc1e0a49e2ef","virtControllerSha":"sha256:d15ddf31cbd82184146d1400ba53365342eee7a17d97a9ebd5807651632dcadd","virtHandlerSha":"sha256:af73bb2f4c41028cb4f62f81fa8113ee499e6e2ea3eb97095a3dc9f95ec995e8","virtLauncherSha":"sha256:035856c249cb310760f75f852452b71a2f5f337be6f9ec6901e33a49aa9c4946","additionalProperties":{"ImagePullPolicy":"","MonitorAccount":"","MonitorNamespace":"","UninstallStrategy":"BlockUninstallIfWorkloadsExist"}}'
  targetKubeVirtVersion: sha256:efa5e6bb3db2d3352913dcbe09424d7d1f634a475653bf6b6c07623c43442702

This command stopped returning the build version. How else can I get the version number of the installed build?
This is what I could find regarding image version:

# oc get pod kubevirt-v2v-conversion-v2v-rhel7-igor-mdk9j -o yaml | grep -i image
- image: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubevirt-v2v-conversion@sha256:3c043a2c6def14845a1e9c4594541c854d166d76f65f11e8e3e99c57a0e2b90d
The error is from: https://github.com/libguestfs/libguestfs/blob/088b6d1c80e4bd9347c6350927e1d0ad347ba18f/lib/tmpdirs.c#L287 and is a security check to make sure that libguestfs doesn't use a cached appliance which was maliciously created by another user. In this case it's likely that the original ownership of /var/tmp/.guestfs-0 has been changed by something (or as Pino says, UID 0 is no longer really UID 0 because of the mysteries of user namespaces). I would question why the UCI contains this directory at all. Let it be created when virt-v2v runs the first time after the appliance has been deployed, and this problem cannot arise.
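The check Rich points at boils down to comparing the cache directory's owner against the current effective UID. A minimal shell sketch of the same idea (the demo path is made up; the real check lives in lib/tmpdirs.c and is done in C on the actual cache directory):

```shell
# Illustrative re-creation of the libguestfs ownership check.
# Uses a throwaway directory so it can run anywhere.
cachedir="${TMPDIR:-/tmp}/.guestfs-demo-$$"
mkdir -p "$cachedir"

# The cache dir must be owned by the current effective UID,
# otherwise libguestfs refuses to use it.
owner=$(stat -c '%u' "$cachedir")
if [ "$owner" = "$(id -u)" ]; then
    echo "ownership check passed"
else
    echo "error: security: cached appliance $cachedir is not owned by UID $(id -u)"
fi
rmdir "$cachedir"
```

In the failing pod, /var/tmp/.guestfs-0 is owned by someone other than UID 0, so the equivalent of this check fails before the appliance is ever launched.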
(In reply to Igor Braginsky from comment #10)
> This is what I could find regarding image version:
> # oc get pod kubevirt-v2v-conversion-v2v-rhel7-igor-mdk9j -o yaml|grep -i image
> - image: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubevirt-v2v-conversion@sha256:3c043a2c6def14845a1e9c4594541c854d166d76f65f11e8e3e99c57a0e2b90d

So this is kubevirt-v2v-conversion:v2.3.0-10

(In reply to Richard W.M. Jones from comment #11)
> The error is from:
> In this case it's likely that the original ownership of /var/tmp/.guestfs-0
> has been changed by something (or as Pino says, UID 0 is no longer really
> UID 0 because of the mysteries of user namespaces).

I don't see anything wrong with the image. Everything is as it used to be, so it must be some change in the namespace magic that OpenShift does. We could remove the directory, it contains only some leftover qemu logs and state files, but we may run into the same issue with the real (fixed) appliance directory, which is in /usr/lib64/guestfs/.

> I would question why the UCI contains this directory at all. Let it be
> created when virt-v2v runs the first time after the appliance has been
> deployed, and this problem cannot arise.

It doesn't make much sense to rebuild the libguestfs appliance at the beginning of every conversion. Removing the appliance would make the image much smaller. The disadvantages would be:
- increased startup time (~1 minute)
- increased requirements on temp space (~300 MB)

Igor, could you start a pod with the image and run the following commands?

$ id
$ ls -lnR /var/run/.guestfs-0
$ ls -lnd /usr/lib64/guestfs/
> Igor, could you start a pod with the image and run the following commands?
>
> $ id
> $ ls -lnR /var/run/.guestfs-0

this should be: $ ls -lnaR /var/tmp/.guestfs-0

> $ ls -lnd /usr/lib64/guestfs/
I have just installed the latest build (March 25) and hit the same error even when using the `server` workload profile.
(In reply to Igor Braginsky from comment #14)
> I have just installed latest build (March 25) and faced the same error
> regardless using `server` workload profile.

Thanks Igor.
(In reply to Tomáš Golembiovský from comment #13)
> > Igor, could you start a pod with the image and run the following commands?
> >
> > $ id
> > $ ls -lnR /var/run/.guestfs-0
>
> this should be: $ ls -lnaR /var/tmp/.guestfs-0
>
> > $ ls -lnd /usr/lib64/guestfs/

Tomáš, I tried to run

# oc rsh kubevirt-v2v-conversion-v2v-rhel7-igor2-kt92v

and got:

error: cannot exec into a container in a completed pod; current phase is Failed

Any ideas how I can get inside the pod and run the commands you are asking about?
resetting priority & status
I got inside one of the NFS shares that are used as PVs.

igor@localhost:~/NFS_CNV/ibragins.cnv-qe.rhcloud.com.pvs/pv1$ ls -la
total 12
drwxr-xr-x.  3 nobody nobody 4096 Mar 26 09:17 .
drwxr-xr-x. 22 nobody nobody 4096 Mar 25 14:20 ..
drwxr-xr-x.  2 nobody nobody 4096 Mar 26 09:17 .guestfs-0
We're still waiting for answers to comment 12 & comment 13. Unfortunately NFS will squash root permissions to nobody.nobody so it's not very helpful. We need the actual permissions when run from inside the UCI as it's running in the end user environment.
Please don't modify the priority/severity fields. They are for the developers to use to prioritise bugs. I can also tell you that absolutely no work will take place on this bug unless you can provide the requested information.
I can't provide that info, but I can share my environment details so someone can connect to it and gather the required data. And I didn't change any fields on purpose.
(In reply to Tomáš Golembiovský from comment #13)
> this should be: $ ls -lnaR /var/tmp/.guestfs-0

$ id
uid=0(root) gid=0(root) groups=0(root)

$ ls -lnaZR /var/tmp/.guestfs-0
/var/tmp/.guestfs-0:
total 8
drwxr-xr-x. 2 99 99 system_u:object_r:nfs_t:s0 4096 Mar 26 07:17 .
drwxr-xr-x. 3 99 99 system_u:object_r:nfs_t:s0 4096 Mar 26 07:17 ..

$ ls -lnaZR /usr/lib64/guestfs
/usr/lib64/guestfs:
total 319308
drwxr-xr-x. 2 0 0 system_u:object_r:container_file_t:s0:c843,c873        66 Mar 27 22:07 .
dr-xr-xr-x. 1 0 0 system_u:object_r:container_file_t:s0:c843,c873     20480 Mar 27 22:07 ..
-rw-r--r--. 1 0 0 system_u:object_r:container_file_t:s0:c843,c873         0 Mar 27 22:07 README.fixed
-rw-r--r--. 1 0 0 system_u:object_r:container_file_t:s0:c843,c873   4261888 Mar 27 22:07 initrd
-rwxr-xr-x. 1 0 0 system_u:object_r:container_file_t:s0:c843,c873   8106848 Jan 14 16:01 kernel
-rw-r--r--. 1 0 0 system_u:object_r:container_file_t:s0:c843,c873 314572800 Mar 27 22:07 root
Thanks Igor, just one more check:

$ ls -la /var/tmp/

At least from the log it seems that /var/tmp/.guestfs-0 is owned by UID 99 and GID 99 (which IIRC is nobody:nobody in RHEL). With the above command we can know that for sure.

----

If I get the whole situation correctly, we have:
- a fixed appliance directly in /usr/lib64/guestfs (README.fixed, initrd, kernel, root)
- the supermin appliance in /usr/lib64/guestfs/supermin.d
- the default libguestfs appliance path (LIBGUESTFS_PATH) is /usr/lib64/guestfs
- the default libguestfs cache directory (LIBGUESTFS_CACHEDIR) is /var/tmp

What happens is that libguestfs first finds a supermin appliance under $LIBGUESTFS_PATH/supermin.d, so it tries to rebuild it: one of the first steps is to create $LIBGUESTFS_CACHEDIR/.guestfs-$UID, ensuring it is actually owned by that UID and writable only by that UID. Apparently this fails because $LIBGUESTFS_CACHEDIR/.guestfs-$UID already exists with a different owner. Also, this step (trying to rebuild a supermin appliance) comes *before* the lookup of a fixed appliance, so the fixed appliance will not be used even if libguestfs is able to create $LIBGUESTFS_CACHEDIR/.guestfs-$UID with the right permissions.

If the goal is to use a fixed appliance, then what I suggest are the following steps:
- relocate the fixed appliance to a different directory: one usual path that distros without supermin support use is $LIBDIR/guestfs/appliance, which would be /usr/lib64/guestfs/appliance in our case; this is needed so the fixed appliance is not in the same place as a supermin appliance
- export LIBGUESTFS_PATH to the path where the fixed appliance is; this is needed so libguestfs finds the fixed appliance

As is more or less explicit from the steps above: *both* are needed.
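The two steps could be sketched roughly as below. To keep the sketch runnable anywhere, it operates on a throwaway directory tree; in the real image build the paths would be the actual /usr/lib64/guestfs/... ones, and the exact recipe is up to the UCI build:

```shell
# Stand-in for the container filesystem root (assumption: throwaway tree,
# populated with empty files where the fixed appliance would be).
root=$(mktemp -d)
mkdir -p "$root/usr/lib64/guestfs"
touch "$root/usr/lib64/guestfs/README.fixed" \
      "$root/usr/lib64/guestfs/initrd" \
      "$root/usr/lib64/guestfs/kernel" \
      "$root/usr/lib64/guestfs/root"

# Step 1: relocate the fixed appliance out of the supermin lookup path.
mkdir -p "$root/usr/lib64/guestfs/appliance"
mv "$root/usr/lib64/guestfs/README.fixed" \
   "$root/usr/lib64/guestfs/initrd" \
   "$root/usr/lib64/guestfs/kernel" \
   "$root/usr/lib64/guestfs/root" \
   "$root/usr/lib64/guestfs/appliance/"

# Step 2: point libguestfs at the relocated fixed appliance
# (in the real pod this would go in the conversion entrypoint).
export LIBGUESTFS_PATH="$root/usr/lib64/guestfs/appliance"
```

With both in place, libguestfs finds only the fixed appliance under $LIBGUESTFS_PATH and never attempts the supermin rebuild that needs the cache directory.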
I agree with Pino, with the addition that I really don't think using the fixed appliance is a good idea at all. If optimization is really a concern then we can look into that later. Tomáš, is it possible you could scratch-build a UCI without the fixed appliance at all, just to see if that fixes the bug?
(In reply to Richard W.M. Jones from comment #25)
> with the addition that I really don't think using
> the fixed appliance is a good idea at all. If optimization is really
> a concern then we can look into that later.

This is a POD (container), which is instantiated anew every time it is spawned. Theoretically speaking, you can mount directories (e.g. with shared storage) in PODs, however mounting /var/tmp/ is highly impractical (it is a semi-temporary directory, after all). Also, it would be hard to properly share it across PODs in multiple namespaces/projects (IIRC, I don't remember the exact terminology). So in practice shipping a fixed appliance built from the same content as the container is not that bad an idea:
- it avoids the need to rebuild the supermin appliance at first startup, which would happen every time the POD is started
- the content of the container is immutable anyway, so the content of the fixed appliance will not be invalidated by updates to the OS
- whenever the container is rebuilt, the fixed appliance is rebuilt too, still matching the container

> Tomas is it possible you could scratch-build a UCI without the fixed appliance
> at all, just to see if that fixes the bug?

I'm not Tomáš, however I think this will not make a difference: the fixed appliance is ignored at the moment due to the directory layout (the supermin appliance is found first and used).
Oh, I misread comment 23; I thought the fixed appliance had been placed in /var/tmp/.guestfs-0. So what creates /var/tmp/.guestfs-0? Is /var/tmp mounted over NFS? NFS for /var/tmp isn't going to work.
(In reply to Pino Toscano from comment #24)
> What happens is that libguestfs first finds a supermin appliance under
> $LIBGUESTFS_PATH/supermin.d so it tries to rebuild it: one of the first
> steps is to create $LIBGUESTFS_CACHEDIR/.guestfs-$UID, and ensuring it is
> actually owned by UID, and writable only by that UID. Apparently this fails
> because $LIBGUESTFS_CACHEDIR/.guestfs-$UID exists already with a different
> owner.

This is not true. I asked Igor to test a modified image and the problem occurs even when there is no /var/tmp/.guestfs-0 beforehand.

Also what I wrote...

> ..., it contains only some leftover qemu logs and state files

...is not completely true either. It is true that the directory is in the image, but there is a volume mounted over it, so it is invisible to libguestfs.

I have to agree with the others here. It looks to me like misconfigured storage. My guess is NFS with the root_squash option that strips the user and turns the owner into 'nobody'. Igor, is it NFS or block storage?
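For reference, root squashing is a per-export setting on the NFS server side. An illustrative /etc/exports fragment (paths and client specs are assumptions, not this environment's actual exports):

```
# /etc/exports (illustrative)
# root_squash (the default): client UID 0 is mapped to the anonymous
# user, typically nobody (UID 99 / 65534), which breaks the libguestfs
# ownership check seen in this bug.
/exports/pv1  *(rw,sync,root_squash)

# no_root_squash: client UID 0 stays UID 0 on the server.
/exports/pv2  *(rw,sync,no_root_squash)
```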
This is CNV version 2.3, so we are testing NFS, and the NFS share in this environment is configured as the backend for PVs.
Created attachment 1674727 [details] Conversion log, new conversion POD
The updated log that Tomas asked for and Igor provided does appear to indicate that /var/tmp is an NFS filesystem with root_squash set, and virt-v2v cannot run in such a configuration. You'll have to make /var/tmp be a real local filesystem, or use the fixed appliance by setting up LIBGUESTFS_PATH, or put the cachedir into another directory by setting LIBGUESTFS_CACHEDIR. The environment variables are documented here: http://libguestfs.org/guestfs.3.html#environment-variables
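The LIBGUESTFS_CACHEDIR option could look roughly like this in the conversion entrypoint; the path below is an assumption for illustration, not what the UCI actually ships:

```shell
# Sketch: redirect the libguestfs cache away from the NFS-backed
# /var/tmp onto local storage, so the .guestfs-$UID ownership check
# is done against a local filesystem. Path is a placeholder.
export LIBGUESTFS_CACHEDIR=/tmp/guestfs-cache
mkdir -p "$LIBGUESTFS_CACHEDIR"
```

Whether /tmp is actually local in the pod would of course need checking; any guaranteed-local mount point would do.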
Since /var/tmp is temporary storage, the natural kubernetes match would be emptyDir [0]. This would not require us to have a separate local storage class available for temporary storage. We can also set a sizeLimit on emptyDir to help the scheduler determine which node to use for conversion. It is also local storage to the node, and it is cleaned up after the pod exits, so we wouldn't have to worry about clean up after the conversion. [0] https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
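A rough sketch of what that could look like in the conversion pod spec; the pod name, image reference, and size limit are all placeholders, not the actual conversion pod definition:

```yaml
# Sketch: back /var/tmp with a node-local emptyDir (disk-backed, not
# tmpfs) and cap its size so the scheduler can account for it.
apiVersion: v1
kind: Pod
metadata:
  name: v2v-conversion-example        # placeholder name
spec:
  containers:
  - name: conversion
    image: kubevirt-v2v-conversion    # placeholder image reference
    volumeMounts:
    - name: v2v-var-tmp
      mountPath: /var/tmp
  volumes:
  - name: v2v-var-tmp
    emptyDir:
      sizeLimit: 2Gi                  # illustrative limit
```

Because no `medium` is set, the emptyDir is backed by node-local disk, and kubelet removes it when the pod exits.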
I'm not sure if emptyDir as described there will work. If it is tmpfs then we'll consume hundreds of megabytes of RAM. If it's NFS, it also won't work. And also it's described as being shared between containers in the POD - do multiple copies of virt-v2v run in different containers in the same POD? If they do that's another reason it couldn't be used.
(In reply to Richard W.M. Jones from comment #33)
> I'm not sure if emptyDir as described there will work. If it is tmpfs then
> we'll consume hundreds of megabytes of RAM.

As I understand it, it is not tmpfs unless you explicitly ask for it.

> If it's NFS, it also won't work.

If you mean an NFS share mounted on the host (and not a provisioned NFS volume) then it is in theory possible, but I cannot say how common this configuration is in the field. Alexander, is this configuration used?

> And also it's described as being shared between containers in the POD - do
> multiple copies of virt-v2v run in different containers in the same POD? If
> they do that's another reason it couldn't be used.

The only containers in our POD are the VDDK container and the conversion container. There's no risk of multiple virt-v2v instances sharing the space.
(In reply to Tomáš Golembiovský from comment #34)
> (In reply to Richard W.M. Jones from comment #33)
> > I'm not sure if emptyDir as described there will work. If it is tmpfs then
> > we'll consume hundreds of megabytes of RAM.
>
> As I understand it it is not tmpfs unless you explicitly ask for it.

Correct, emptyDir does not use tmpfs unless you explicitly ask for it with medium: Memory. Some documentation on emptyDir: https://kubernetes.io/docs/concepts/storage/volumes/#emptydir

> > If it's NFS, it also won't work.
>
> If you mean NFS share mounted on the host (and not provisioned NFS volume)
> then it is in theory possible, but I cannot say how common is this
> configuration in the field. Alexander, is this configuration used?

I have never seen it, but admittedly I haven't seen many production clusters.

> > And also it's described as being shared between containers in the POD - do
> > multiple copies of virt-v2v run in different containers in the same POD? If
> > they do that's another reason it couldn't be used.
>
> The only containers in our POD is VDDK container and the conversion
> container. There's no risk of multiple virt-v2v instances sharing the space.
I forgot to link the upstream discussion back in this bug. Here it is: https://www.redhat.com/archives/libguestfs/2020-June/msg00052.html https://www.redhat.com/archives/libguestfs/2020-June/msg00065.html https://www.redhat.com/archives/libguestfs/2020-June/msg00066.html
We don't need to expose it to the user if it is alright to hardcode the value to always use emptyDir. In that case we don't even need to show the temp disk in the import wizard as it is an implementation detail and there is nothing to be customized.
(In reply to Filip Krepinsky from comment #52)
> We don't need to expose it to the user if it is alright to hardcode the
> value to always use emptyDir. In that case we don't even need to show the
> temp disk in the import wizard as it is an implementation detail and there
> is nothing to be customized.

Ack
This bug is verified on CNV 2.5; currently there is no need to use a conversion PV, and this bug is no longer relevant.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196