Bug 1805627 - VM cannot be accessed with console
Summary: VM cannot be accessed with console
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.3.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 2.3.0
Assignee: sgott
QA Contact: zhe peng
URL:
Whiteboard:
: 1803107 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-02-21 08:31 UTC by Radim Hrazdil
Modified: 2023-09-15 01:02 UTC (History)
10 users (show)

Fixed In Version: hco-bundle-registry-container-v2.2.0-445 virt-operator-container-v2.3.0-33
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 19:10:56 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
vm yaml (3.06 KB, text/plain)
2020-02-21 08:31 UTC, Radim Hrazdil
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CNV-32998 0 None None None 2023-09-15 00:40:39 UTC
Red Hat Product Errata RHEA-2020:2011 0 None None None 2020-05-04 19:11:06 UTC

Description Radim Hrazdil 2020-02-21 08:31:00 UTC
Created attachment 1664652 [details]
vm yaml

Description of problem:
When I create a Ferora/RHEL VM, I'm unable to access the VM with UI console or virtctl console <vm-name>

Error returned by virtctl command:
Successfully connected to fedoracloudbase console. The escape sequence is ^]

You were disconnected from the console. This has one of the following reasons:
 - another user connected to the console of the target vm
 - network issues
websocket: close 1006 (abnormal closure): unexpected EOF

Although it says successfully at first connected, the connection is closed immediately.



Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-02-17-022408
CNV 2.3

How reproducible:
100%

Steps to Reproduce:
1. Create VM from the attached yaml
2. Start it
3. Open console, either in UI of virtctl console fedoracloudbase

Actual results:
Error stated above

Expected results:


Additional info:
I have also tried the same with cirros VM created from container, where the console works, fine. So I suspected it could be storage related, however, creating the fedora VM with kubevirt/fedora-cloud-container-disk-demo
container leads to the same error.

Comment 1 Radim Hrazdil 2020-02-21 11:12:27 UTC
I have also noticed that the very VM I have atteched to this BZ is being restarted in regular intervals (15-20 minutes). 

Events:
  Type    Reason                      Age                    From                       Message
  ----    ------                      ----                   ----                       -------
  Normal  SuccessfulDataVolumeCreate  177m                   virtualmachine-controller  Created DataVolume fedoracloudbase-rootdisk
  Normal  SuccessfulDelete            168m                   virtualmachine-controller  Stopped the virtual machine by deleting the virtual machine instance d52fc09e-2e7c-444e-bb8d-5819c2eda94d
  Normal  SuccessfulDelete            162m                   virtualmachine-controller  Stopped the virtual machine by deleting the virtual machine instance 45f48e99-0202-4a55-82c5-2b98e289c683
  Normal  SuccessfulDelete            156m                   virtualmachine-controller  Stopped the virtual machine by deleting the virtual machine instance 69a325b8-99ef-42c5-a480-36e412f90789
  Normal  SuccessfulDelete            149m                   virtualmachine-controller  Stopped the virtual machine by deleting the virtual machine instance 5de46652-c263-4a01-ac73-8fe934fdb9bb
  Normal  SuccessfulDelete            143m                   virtualmachine-controller  Stopped the virtual machine by deleting the virtual machine instance 9bb77304-78b5-4dcc-bbfe-21ed5d6b8b3f
  Normal  SuccessfulDelete            136m                   virtualmachine-controller  Stopped the virtual machine by deleting the virtual machine instance 500f65e8-8c08-461c-ab60-5bef21ed8539
  Normal  SuccessfulDelete            130m                   virtualmachine-controller  Stopped the virtual machine by deleting the virtual machine instance 2d073947-3201-4e91-8196-0cc6e31eb90f
  Normal  SuccessfulDelete            123m                   virtualmachine-controller  Stopped the virtual machine by deleting the virtual machine instance b1f8c4ec-668f-4eaa-bbb1-d86ba6794a03
  Normal  SuccessfulDelete            117m                   virtualmachine-controller  Stopped the virtual machine by deleting the virtual machine instance a2068a74-63b5-48e1-bcd8-9606143388f5
  Normal  SuccessfulCreate            3m15s (x28 over 175m)  virtualmachine-controller  Started the virtual machine by creating the new virtual machine instance fedoracloudbase
  Normal  SuccessfulDelete            3m15s (x18 over 111m)  virtualmachine-controller  (combined from similar events): Stopped the virtual machine by deleting the virtual machine instance f939f550-624f-4ba4-a161-1b72f7322512

Comment 2 Radim Hrazdil 2020-02-24 12:51:37 UTC
Retested on freshly deployed environment today.
Still observing the same issue, tested both nfs and standard (cinder) storage classes.


Noticed this error in VM events:
Generated from virt-handler
server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-02-24T12:45:38.282601Z qemu-kvm: -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=drive-ua-rootdisk,id=ua-rootdisk,bootindex=1,write-cache=on: Could not reopen file: Permission denied')"

Comment 3 Radim Hrazdil 2020-02-24 15:11:05 UTC
I think I have figured it out.

When I create a VM in a way that I first create a DV with the imported OS image, wait for the import to finish and then create the VM using the existing PVC, everything works fine.
However, when the VM manifest contains definition of the datavolume in dataVolumeTemplates, I encounter all the issues stated in this bug description.

Discussed with Ruth Netser and it seems that similar issue has been already encountered with rook-ceph in BZ1796342.
It seems that the same issue may be happening with NFS storage.

Comment 4 Alexander Wels 2020-02-24 18:38:09 UTC
I just tried to replicate this on an NFS server with CDI 1.13 and KUBEVIRT 0.26 and I was unable to reproduce. The VM started as soon as the importer was and it booted successfully.

Comment 5 Alexander Wels 2020-02-24 19:13:50 UTC
I am able to reproduce on the provided system, this is what is in the virt-launcher log. It then repeats the last few lines over and over. @Stu maybe you can shed some light on it. To me it looks like its failing starting libvirt.

{"component":"virt-launcher","level":"info","msg":"Collected all requested hook sidecar sockets","pos":"manager.go:68","timestamp":"2020-02-24T19:06:14.357247Z"}
{"component":"virt-launcher","level":"info","msg":"Sorted all collected sidecar sockets per hook point based on their priority and name: map[]","pos":"manager.go:71","timestamp":"2020-02-24T19:06:14.357325Z"}
{"component":"virt-launcher","level":"info","msg":"Watchdog file created at /var/run/kubevirt/watchdog-files/default_test-import","pos":"virt-launcher.go:143","timestamp":"2020-02-24T19:06:14.358637Z"}
{"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon: qemu:///system","pos":"libvirt.go:355","timestamp":"2020-02-24T19:06:14.372098Z"}
{"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon failed: virError(Code=38, Domain=7, Message='Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory')","pos":"libvirt.go:363","timestamp":"2020-02-24T19:06:14.372695Z"}
{"component":"virt-launcher","level":"info","msg":"libvirt version: 5.6.0, package: 10.module+el8.1.1+5309+6d656f05 (Red Hat, Inc. \u003chttp://bugzilla.redhat.com/bugzilla\u003e, 2019-12-20-01:08:55, )","subcomponent":"libvirt","thread":"55","timestamp":"2020-02-24T19:06:14.398000Z"}
{"component":"virt-launcher","level":"info","msg":"hostname: test-import","subcomponent":"libvirt","thread":"55","timestamp":"2020-02-24T19:06:14.398000Z"}
{"component":"virt-launcher","level":"error","msg":"internal error: Unable to get DBus system bus connection: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory","pos":"virDBusGetSystemBus:110","subcomponent":"libvirt","thread":"55","timestamp":"2020-02-24T19:06:14.398000Z"}
{"component":"virt-launcher","level":"error","msg":"internal error: Child process (/usr/sbin/dmidecode -q -t 0,1,2,3,4,17) unexpected exit status 1: /dev/mem: No such file or directory","pos":"virCommandWait:2796","subcomponent":"libvirt","thread":"55","timestamp":"2020-02-24T19:06:14.401000Z"}
{"component":"virt-launcher","level":"info","msg":"Connected to libvirt daemon","pos":"libvirt.go:371","timestamp":"2020-02-24T19:06:14.874779Z"}
{"component":"virt-launcher","level":"info","msg":"Registered libvirt event notify callback","pos":"client.go:294","timestamp":"2020-02-24T19:06:14.879435Z"}
{"component":"virt-launcher","level":"info","msg":"Marked as ready","pos":"virt-launcher.go:73","timestamp":"2020-02-24T19:06:14.879788Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Executing PreStartHook on VMI pod environment","name":"test-import","namespace":"default","pos":"manager.go:775","timestamp":"2020-02-24T19:06:18.450354Z","uid":"8f410d55-0423-4b50-9394-bab936061f15"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Starting PreCloudInitIso hook","name":"test-import","namespace":"default","pos":"manager.go:794","timestamp":"2020-02-24T19:06:18.450437Z","uid":"8f410d55-0423-4b50-9394-bab936061f15"}
{"component":"virt-launcher","kind":"","level":"info","msg":"masquerade pod interface: VIF: { Name: eth0, IP: 10.0.2.2, Mask: ffffff00, MAC: 02:e4:00:00:00:00, Gateway: 10.0.2.1, MTU: 1400}","name":"test-import","namespace":"default","pos":"podinterface.go:446","timestamp":"2020-02-24T19:06:18.491965Z","uid":"8f410d55-0423-4b50-9394-bab936061f15"}
{"component":"virt-launcher","level":"info","msg":"Found nameservers in /etc/resolv.conf: \ufffd\u001e\u0000\n","pos":"converter.go:1518","timestamp":"2020-02-24T19:06:18.492113Z"}
{"component":"virt-launcher","level":"info","msg":"Found search domains in /etc/resolv.conf: default.svc.cluster.local svc.cluster.local cluster.local rhrazdil42.cnv-qe.rhcloud.com","pos":"converter.go:1519","timestamp":"2020-02-24T19:06:18.492156Z"}
{"component":"virt-launcher","level":"info","msg":"Starting SingleClientDHCPServer","pos":"dhcp.go:64","timestamp":"2020-02-24T19:06:18.492325Z"}
{"component":"virt-launcher","level":"info","msg":"Driver cache mode for /var/run/kubevirt-private/vmi-disks/rootdisk/disk.img set to none","pos":"converter.go:176","timestamp":"2020-02-24T19:06:18.495773Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Domain defined.","name":"test-import","namespace":"default","pos":"manager.go:1054","timestamp":"2020-02-24T19:06:18.500589Z","uid":"8f410d55-0423-4b50-9394-bab936061f15"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 0 with reason 0 received","pos":"client.go:259","timestamp":"2020-02-24T19:06:18.500739Z"}
{"component":"virt-launcher","level":"error","msg":"internal error: Unable to get DBus system bus connection: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory","pos":"virDBusGetSystemBus:110","subcomponent":"libvirt","thread":"40","timestamp":"2020-02-24T19:06:18.503000Z"}
{"component":"virt-launcher","level":"error","msg":"At least one cgroup controller is required: No such device or address","pos":"virCgroupDetect:420","subcomponent":"libvirt","thread":"40","timestamp":"2020-02-24T19:06:18.522000Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Paused(3):StartingUp(11)","pos":"client.go:180","timestamp":"2020-02-24T19:06:18.525248Z"}
{"component":"virt-launcher","level":"info","msg":"Detected domain with UUID c6f91c69-9dcc-5c33-bec3-febfe3ccf2ed","pos":"virt-launcher.go:234","timestamp":"2020-02-24T19:06:18.527403Z"}
{"component":"virt-launcher","level":"info","msg":"Monitoring loop: rate 1s start timeout 5m0s","pos":"monitor.go:237","timestamp":"2020-02-24T19:06:18.527473Z"}
{"component":"virt-launcher","level":"error","msg":"Unable to read from monitor: Connection reset by peer","pos":"qemuMonitorIORead:604","subcomponent":"libvirt","thread":"39","timestamp":"2020-02-24T19:06:18.608000Z"}
{"component":"virt-launcher","level":"error","msg":"internal error: qemu unexpectedly closed the monitor: 2020-02-24T19:06:18.607246Z qemu-kvm: -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=drive-ua-rootdisk,id=ua-rootdisk,bootindex=1,write-cache=on: Could not reopen file: Permission denied","pos":"qemuProcessReportLogError:2098","subcomponent":"libvirt","thread":"39","timestamp":"2020-02-24T19:06:18.608000Z"}
{"component":"virt-launcher","kind":"","level":"error","msg":"Starting the VirtualMachineInstance failed.","name":"test-import","namespace":"default","pos":"manager.go:1082","reason":"virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-02-24T19:06:18.607246Z qemu-kvm: -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=drive-ua-rootdisk,id=ua-rootdisk,bootindex=1,write-cache=on: Could not reopen file: Permission denied')","timestamp":"2020-02-24T19:06:18.810464Z","uid":"8f410d55-0423-4b50-9394-bab936061f15"}
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to sync vmi","name":"test-import","namespace":"default","pos":"server.go:161","reason":"virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-02-24T19:06:18.607246Z qemu-kvm: -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=drive-ua-rootdisk,id=ua-rootdisk,bootindex=1,write-cache=on: Could not reopen file: Permission denied')","timestamp":"2020-02-24T19:06:18.810572Z","uid":"8f410d55-0423-4b50-9394-bab936061f15"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 0 with reason 1 received","pos":"client.go:259","timestamp":"2020-02-24T19:06:18.843910Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"test-import","namespace":"default","pos":"server.go:167","timestamp":"2020-02-24T19:06:18.844704Z","uid":"8f410d55-0423-4b50-9394-bab936061f15"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Shutoff(5):Failed(6)","pos":"client.go:180","timestamp":"2020-02-24T19:06:18.845589Z"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 0 with reason 1 received","pos":"client.go:259","timestamp":"2020-02-24T19:06:18.851279Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"test-import","namespace":"default","pos":"server.go:167","timestamp":"2020-02-24T19:06:18.852291Z","uid":"8f410d55-0423-4b50-9394-bab936061f15"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Shutoff(5):Failed(6)","pos":"client.go:180","timestamp":"2020-02-24T19:06:18.852781Z"}

Comment 6 sgott 2020-02-24 19:27:15 UTC
What version of HCO/virt-operator is deployed on this system?

Comment 7 sgott 2020-02-24 21:10:00 UTC
The root cause of this issue appears to be file ownership. On the system exhibiting this issue, /var/run/kubevirt-private/vmi-disks/rootdisk/disk.img is owned by root.

10106136 -rwxr-xr-x. 1 root root 10447220736 Feb 24 20:48 /var/run/kubevirt-private/vmi-disks/rootdisk/disk.img

we changed the file ownership to be qemu (uid/gid 107) and virt-launcher was able to run properly:

10106136 -rwxr-xr-x. 1 qemu qemu 10447220736 Feb 24 20:48 /var/run/kubevirt-private/vmi-disks/rootdisk/disk.img

As an experiment, we added a securityContext of "fsGroup: 107", and that caused openshift to alter the permissions of the file:

10106136 -rwxrwxr-x. 1 root qemu 10447220736 Feb 24 20:53 /var/run/kubevirt-private/vmi-disks/rootdisk/disk.img

Comment 8 sgott 2020-02-24 22:52:14 UTC
KubeVirt-side fix implemented here: https://github.com/kubevirt/kubevirt/pull/3100

It's possible we might also want to address ownership in CDI at some point, but the above is sufficient to address this for now.

Comment 10 Qixuan Wang 2020-03-05 14:24:30 UTC
*** Bug 1803107 has been marked as a duplicate of this bug. ***

Comment 11 zhe peng 2020-03-12 08:53:23 UTC
I can reproduce this.
verefy with build:
$ oc version
Client Version: 4.4.0-0.nightly-2020-02-17-022408
Server Version: 4.4.0-0.nightly-2020-03-06-170328
Kubernetes Version: v1.17.1
$ virtctl version
Client Version: version.Info{GitVersion:"v0.26.1", GitCommit:"e40ff7965e2aadbf21131626dfa3be85524e3a2c", GitTreeState:"clean", BuildDate:"2020-02-19T16:16:36Z", GoVersion:"go1.12.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{GitVersion:"v0.26.3", GitCommit:"e053fc2fe81102215a5e0cac2fbb705348f52ff1", GitTreeState:"clean", BuildDate:"2020-03-11T13:14:00Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

step: 
1 create a vm with yaml file
2 start it
3 use virtctl console to connect it, 
$ virtctl console fedoracloudbase
Successfully connected to fedoracloudbase console. The escape sequence is ^]

fedoracloudbase login:

use UI to connect it, worked.

check disk.img, it's Qemu:qemu

ls /var/run/kubevirt-private/vmi-disks/rootdisk/disk.img -l
-rw-r--r--. 1 qemu qemu 10737418240 Mar 12 08:49 /var/run/kubevirt-private/vmi-disks/rootdisk/disk.img

move to verified.

Comment 14 errata-xmlrpc 2020-05-04 19:10:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2011

Comment 15 Vladik Romanovsky 2020-12-14 16:50:04 UTC
Alexander, Adam,

Was this permissions issue addressed in CDI somehow?

Comment 16 Alexander Wels 2020-12-14 17:39:32 UTC
On the CDI side we added fsGroup to the importer pods, which should set the group to the correct user for KubeVirt. I also believe KubeVirt did some work to set it properly if needed.

Comment 17 Red Hat Bugzilla 2023-09-15 00:29:46 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days


Note You need to log in before you can comment on or make changes to this bug.