Created attachment 1976227
vm with 2 disks

Description of problem:
Can't create a snapshot of a VM with a containerDisk and a *mounted* PVC disk:

> $ oc get vmsnapshot --watch
> NAME                            SOURCEKIND       SOURCENAME           PHASE        READYTOUSE   CREATIONTIME   ERROR
> snapshot-uninterested-cheetah   VirtualMachine   vm-fedora-with-pvc   InProgress   false
> snapshot-uninterested-cheetah   VirtualMachine   vm-fedora-with-pvc   Failed       false

> $ oc get vmsnapshot snapshot-uninterested-cheetah -o json | jq .status.conditions
> [
>   ...
>   {
>     "lastProbeTime": null,
>     "lastTransitionTime": "2023-07-17T17:32:10Z",
>     "reason": "snapshot deadline exceeded",
>     "status": "True",
>     "type": "Failure"
>   }
> ]

With the second disk unmounted, the snapshot completed successfully:

> $ oc get vmsnapshot
> NAME                         SOURCEKIND       SOURCENAME           PHASE       READYTOUSE   CREATIONTIME   ERROR
> snapshot-recent-flyingfish   VirtualMachine   vm-fedora-with-pvc   Succeeded   true         8s

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a VM with a containerDisk (root) and a PVC (second disk).
2. Open the VM console.
3. Create an ext4 filesystem on the second disk:
   mkfs.ext4 /dev/vdb
4. Mount the second disk:
   mkdir /mnt/test
   mount /dev/vdb /mnt/test
5. Create a new file with some text inside that directory, e.g.:
   vi /mnt/test/TEST_FILE
6. Try to take a snapshot.

Actual results:
Failed to create the snapshot.

Expected results:
Snapshot created successfully.

Additional info:
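Step 6 does not say how the snapshot was requested. To request the same snapshot from the CLI, a minimal shell sketch could look like the one below. It assumes the cluster serves the `snapshot.kubevirt.io/v1alpha1` API (newer KubeVirt releases serve `v1beta1` instead); the helper `vm_snapshot_manifest` and the snapshot name `snapshot-repro` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a VirtualMachineSnapshot manifest for a VM.
# Assumption: the cluster serves snapshot.kubevirt.io/v1alpha1
# (switch apiVersion to v1beta1 on newer KubeVirt releases).
vm_snapshot_manifest() {
  vm="$1"    # name of the VirtualMachine to snapshot
  snap="$2"  # name for the new VirtualMachineSnapshot
  cat <<EOF
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: ${snap}
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: ${vm}
EOF
}

# Usage against a live cluster, then watch the phase as in the report:
#   vm_snapshot_manifest vm-fedora-with-pvc snapshot-repro | oc apply -f -
#   oc get vmsnapshot snapshot-repro --watch
```

Keeping the manifest in a function makes it easy to inspect before piping it to `oc apply`.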
For info: I see the same behavior with Fedora 38 and RHEL 9.2.
Summarizing offline chats:
The underlying issue is a failure in the guest agent freeze:

{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to freeze vmi","name":"vm-fedora-with-pvc","namespace":"test-clone","pos":"server.go:269","reason":"virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': failed to open /mnt/test: Permission denied')","timestamp":"2023-07-17T18:23:58.258646Z","uid":"04ce94f3-5f77-472a-9c61-21eb0f0fb41f"}

The corresponding bug for this scenario, and its conclusion, is here:
https://bugzilla.redhat.com/show_bug.cgi?id=1747960#c35

Some comments on that bug suggest qemu-ga cannot do anything more than expose an (off-by-default) SELinux boolean:
https://bugzilla.redhat.com/show_bug.cgi?id=1747960#c20
https://bugzilla.redhat.com/show_bug.cgi?id=1747960#c22

So I am not sure there is anything we can do on the CNV side, but I am curious how this has not bitten other users before.
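To check whether another failed snapshot hits the same freeze error, the launcher log can be filtered for the "Failed to freeze vmi" line and the denied path pulled out of it. A minimal shell sketch: the helper name `freeze_denied_path` is hypothetical, and the live usage assumes the virt-launcher pod carries the `kubevirt.io/domain` label and runs QEMU in a container named `compute`.

```shell
#!/bin/sh
# Hypothetical helper: read virt-launcher log lines on stdin and print the
# path that qemu-ga could not open during guest-fsfreeze-freeze.
# Only lines containing "Failed to freeze vmi" are considered; sed -n with
# the p flag prints nothing for lines that do not match.
freeze_denied_path() {
  sed -n '/Failed to freeze vmi/s/.*failed to open \([^:]*\): Permission denied.*/\1/p'
}

# Usage against a live cluster (label and container name are assumptions):
#   oc logs -n test-clone -l kubevirt.io/domain=vm-fedora-with-pvc -c compute \
#     | freeze_denied_path
```

For the log line quoted above, this prints `/mnt/test`, i.e. the mount point from step 4 of the reproducer.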
Thanks for the explanation, Alex. I think single-disk VMs are overwhelmingly the norm in the field. Also, I wonder whether this would reproduce if the second disk were a block device initialized with LVM. In any case, I think we should have a KCS article for this topic. Adding Jean-Francois: what do you think?
(In reply to Adam Litke from comment #3)
> I think single disk VMs are overwhelmingly the norm in the field.

From a quick look at the hotplug tests, this looks like a common pattern (minus taking a snapshot at the end), but yeah, I agree that single-disk VMs are the norm.
Whoops, messed up the needinfo.
Michael, I was about to ask whether there is anything we can do on our side, like:
- Integrate this SELinux boolean into our golden images
- Change the boolean before calling freeze

Both seem risky to me, as this should be something the VM owner does consciously.
I don't think we should change any VM settings.