Bug 1849561 - VM cannot be accessed with console
Summary: VM cannot be accessed with console
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 2.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Adam Litke
QA Contact: Ying Cui
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-22 08:52 UTC by Qixuan Wang
Modified: 2020-07-07 11:19 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-07 11:19:00 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1845901 0 high CLOSED Filesystem corruption related to smart clone 2021-02-22 00:41:40 UTC

Description Qixuan Wang 2020-06-22 08:52:16 UTC
Description of problem:
After cloning a Fedora or RHEL disk image from a source PVC into a DV and creating a VM with that DV, the running VM cannot be accessed via the console; the console repeats the following kernel panic in a loop. (A Cirros image works fine.)

[    4.407476] Code: 02 00 73 0c 65 48 0f b3 1c 25 d6 08 02 00 eb 05 49 0f ba ee 3f 41 0f 22 de e9 cb f7 ff ff fc 56 48 8b 74 24 08 48 89 7c 24 08 <52> 31 d2 51 31 c9 50 41 50 45 31 c0 41 51 45 31 c9 41 52 45 31 d2
[    4.407476] RSP: 0000:fffffe0000006000 EFLAGS: 00010086
[    4.407476] RAX: aaaaaaaaaaaaaaab RBX: 0000000000000000 RCX: 000000000000000c
[    4.407476] RDX: 0000000000000728 RSI: ffffffff99a00ffa RDI: fffffe0000006050
[    4.407476] RBP: ffffffff994b6411 R08: ffffffff9996d010 R09: 0000000000000000
[    4.407476] R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000000
[    4.407476] R13: 0000000000000003 R14: ffff8b430318be80 R15: 0000000000000000
[    4.407476] FS:  0000000000000000(0000) GS:ffff8b4303600000(0000) knlGS:0000000000000000
[    4.407476] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.407476] CR2: fffffe0000005ff8 CR3: 000000000220e001 CR4: 0000000000160ef0
[    4.407476] Call Trace:
[    4.407476] PANIC: double fault, error_code: 0x0
[    4.407476] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.9-301.fc30.x86_64 #1
[    4.407476] Hardware name: Red Hat Container-native virtualization/RHEL-AV, BIOS 1.13.0-1.module+el8.2.0+5520+4e5817f3 04/01/2014
[    4.407476] RIP: 0010:error_entry+0xc/0x100
[    4.407476] Code: 02 00 73 0c 65 48 0f b3 1c 25 d6 08 02 00 eb 05 49 0f ba ee 3f 41 0f 22 de e9 cb f7 ff ff fc 56 48 8b 74 24 08 48 89 7c 24 08 <52> 31 d2 51 31 c9 50 41 50 45 31 c0 41 51 45 31 c9 41 52 45 31 d2
[    4.407476] RSP: 0000:fffffe0000006000 EFLAGS: 00010086
[    4.407476] RAX: aaaaaaaaaaaaaaab RBX: 0000000000000000 RCX: 000000000000000c
[    4.407476] RDX: 0000000000000728 RSI: ffffffff99a00ffa RDI: fffffe0000006050
[    4.407476] RBP: ffffffff994b6411 R08: ffffffff9996d010 R09: 0000000000000000
[    4.407476] R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000000
[    4.407476] R13: 0000000000000003 R14: ffff8b430318be80 R15: 0000000000000000
[    4.407476] FS:  0000000000000000(0000) GS:ffff8b4303600000(0000) knlGS:0000000000000000
[    4.407476] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.407476] CR2: fffffe0000005ff8 CR3: 000000000220e001 CR4: 0000000000160ef0
[    4.407476] Call Trace:
[    4.407476] PANIC: double fault, error_code: 0x0
[    4.407476] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.9-301.fc30.x86_64 #1
[    4.407476] Hardware name: Red Hat Container-native virtualization/RHEL-AV, BIOS 1.13.0-1.module+el8.2.0+5520+4e5817f3 04/01/2014
[    4.407476] RIP: 0010:error_entry+0xc/0x100
[    4.407476] Code: 02 00 73 0c 65 48 0f b3 1c 25 d6 08 02 00 eb 05 49 0f ba ee 3f 41 0f 22 de e9 cb f7 ff ff fc 56 48 8b 74 24 08 48 89 7c 24 08 <52> 31 d2 51 31 c9 50 41 50 45 31 c0 41 51 45 31 c9 41 52 45 31 d2

You were disconnected from the console. This has one of the following reasons:
 - another user connected to the console of the target vm
 - network issues
websocket: close 1006 (abnormal closure): unexpected EOF


Version-Release number of selected component (if applicable):
OCP 4.6.0-0.nightly-2020-06-09-081928
CNV 2.5.0-2
CDI 2.5.0-2


How reproducible:
Sometimes


Steps to Reproduce:
1. Create a source DV
2. Clone disk image from source DV to target DV
3. Create a VM with the target DV
4. Run the VM and access it 
$ virtctl console <vm>
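
For reference, a minimal sketch of the two DataVolumes used in steps 1-2. The names, namespace, size, and storage class are taken from the PVC and VM listings elsewhere in this report; the apiVersion and the HTTP source URL are assumptions, not taken from the report:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: cnv-3516-source-dv
  namespace: test-hostpath
spec:
  source:
    http:
      url: <fedora-or-rhel-image-url>   # assumption: the actual source image URL is not given in this report
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 48Gi
    storageClassName: hostpath-provisioner
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: cnv-3516-target-dv
  namespace: test-hostpath
spec:
  source:
    pvc:                                 # clone from the PVC created by the source DV
      name: cnv-3516-source-dv
      namespace: test-hostpath
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 48Gi
    storageClassName: hostpath-provisioner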


Actual results:
4. See description.


Expected results:
4. Can log in to the Fedora VM.


Additional info:

[cloud-user@ocp-psi-executor ~]$ oc -n test-hostpath  get pod -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE   READINESS GATES
virt-launcher-fedora-vm-qklbf   1/1     Running   0          81s   10.131.0.95   verify-25-82nmv-worker-45xm4   <none>           <none>


[cloud-user@ocp-psi-executor ~]$ oc -n test-hostpath  logs -f virt-launcher-fedora-vm-qklbf
{"component":"virt-launcher","level":"info","msg":"Collected all requested hook sidecar sockets","pos":"manager.go:68","timestamp":"2020-06-22T07:47:05.662853Z"}
{"component":"virt-launcher","level":"info","msg":"Sorted all collected sidecar sockets per hook point based on their priority and name: map[]","pos":"manager.go:71","timestamp":"2020-06-22T07:47:05.662992Z"}
{"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon: qemu:///system","pos":"libvirt.go:356","timestamp":"2020-06-22T07:47:05.683997Z"}
{"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon failed: virError(Code=38, Domain=7, Message='Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory')","pos":"libvirt.go:364","timestamp":"2020-06-22T07:47:05.684854Z"}
{"component":"virt-launcher","level":"info","msg":"libvirt version: 6.0.0, package: 17.2.module+el8.2.0+6629+3fc0f2c2 (Red Hat, Inc. \u003chttp://bugzilla.redhat.com/bugzilla\u003e, 2020-05-14-11:03:02, )","subcomponent":"libvirt","thread":"45","timestamp":"2020-06-22T07:47:05.722000Z"}
{"component":"virt-launcher","level":"info","msg":"hostname: fedora-vm","subcomponent":"libvirt","thread":"45","timestamp":"2020-06-22T07:47:05.722000Z"}
{"component":"virt-launcher","level":"error","msg":"internal error: Child process (/usr/sbin/dmidecode -q -t 0,1,2,3,4,17) unexpected exit status 1: /dev/mem: No such file or directory","pos":"virCommandWait:2713","subcomponent":"libvirt","thread":"45","timestamp":"2020-06-22T07:47:05.722000Z"}
{"component":"virt-launcher","level":"info","msg":"Connected to libvirt daemon","pos":"libvirt.go:372","timestamp":"2020-06-22T07:47:06.186985Z"}
{"component":"virt-launcher","level":"info","msg":"Registered libvirt event notify callback","pos":"client.go:372","timestamp":"2020-06-22T07:47:06.190826Z"}
{"component":"virt-launcher","level":"info","msg":"Marked as ready","pos":"virt-launcher.go:72","timestamp":"2020-06-22T07:47:06.191089Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Executing PreStartHook on VMI pod environment","name":"fedora-vm","namespace":"test-hostpath","pos":"manager.go:922","timestamp":"2020-06-22T07:47:09.561643Z","uid":"f71bc953-2dcd-400a-bb09-f06dc0d4d14f"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Starting PreCloudInitIso hook","name":"fedora-vm","namespace":"test-hostpath","pos":"manager.go:941","timestamp":"2020-06-22T07:47:09.561725Z","uid":"f71bc953-2dcd-400a-bb09-f06dc0d4d14f"}
{"component":"virt-launcher","kind":"","level":"info","msg":"bridge pod interface: VIF: { Name: eth0, IP: 10.131.0.95, Mask: fffffe00, MAC: 0a:58:0a:83:00:5f, Gateway: 10.131.0.1, MTU: 1400, IPAMDisabled: false} \u0026{vmi:0xc000178800 vif:0xc001ae6fc0 iface:0xc001ab0460 virtIface:0xc0002b6f00 podNicLink:\u003cnil\u003e domain:0xc000343400 podInterfaceName:eth0 bridgeInterfaceName:k6t-eth0}","name":"fedora-vm","namespace":"test-hostpath","pos":"podinterface.go:317","timestamp":"2020-06-22T07:47:09.562377Z","uid":"f71bc953-2dcd-400a-bb09-f06dc0d4d14f"}
{"component":"virt-launcher","level":"info","msg":"Found nameservers in /etc/resolv.conf: \ufffd\u001e\u0000\n","pos":"converter.go:1548","timestamp":"2020-06-22T07:47:09.562520Z"}
{"component":"virt-launcher","level":"info","msg":"Found search domains in /etc/resolv.conf: test-hostpath.svc.cluster.local svc.cluster.local cluster.local verify-25.cnv-qe.rhcloud.com","pos":"converter.go:1549","timestamp":"2020-06-22T07:47:09.562560Z"}
{"component":"virt-launcher","level":"info","msg":"Driver cache mode for /var/run/kubevirt-private/vmi-disks/dv-disk/disk.img set to none","pos":"converter.go:184","timestamp":"2020-06-22T07:47:09.562709Z"}
{"component":"virt-launcher","level":"info","msg":"Starting SingleClientDHCPServer","pos":"dhcp.go:64","timestamp":"2020-06-22T07:47:09.563315Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Domain defined.","name":"fedora-vm","namespace":"test-hostpath","pos":"manager.go:1200","timestamp":"2020-06-22T07:47:09.994327Z","uid":"f71bc953-2dcd-400a-bb09-f06dc0d4d14f"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 0 with reason 0 received","pos":"client.go:337","timestamp":"2020-06-22T07:47:09.994797Z"}
{"component":"virt-launcher","level":"warning","msg":"Failed to send audit message virt=kvm vm=\"test-hostpath_fedora-vm\" uuid=bb0bebea-15a4-55d1-8a61-c5e10c294b1d vm-ctx=+107:+107 img-ctx=+107:+107 model=dac: Permission denied","pos":"virAuditSend:141","subcomponent":"libvirt","thread":"33","timestamp":"2020-06-22T07:47:09.999000Z"}
{"component":"virt-launcher","level":"warning","msg":"Failed to send audit message virt=kvm resrc=net reason=open vm=\"test-hostpath_fedora-vm\" uuid=bb0bebea-15a4-55d1-8a61-c5e10c294b1d net=0a:58:0a:83:00:5f path=\"/dev/net/tun\" rdev=0A:C8: Permission denied\ufffd\ufffd\ufffd\u0026","pos":"virAuditSend:141","subcomponent":"libvirt","thread":"33","timestamp":"2020-06-22T07:47:10.020000Z"}
{"component":"virt-launcher","level":"warning","msg":"Failed to send audit message virt=kvm resrc=net reason=open vm=\"test-hostpath_fedora-vm\" uuid=bb0bebea-15a4-55d1-8a61-c5e10c294b1d net=0a:58:0a:83:00:5f path=\"/dev/vhost-net\" rdev=0A:EE: Permission denied","pos":"virAuditSend:141","subcomponent":"libvirt","thread":"33","timestamp":"2020-06-22T07:47:10.020000Z"}
{"component":"virt-launcher","level":"error","msg":"At least one cgroup controller is required: No such device or address","pos":"virCgroupDetect:417","subcomponent":"libvirt","thread":"33","timestamp":"2020-06-22T07:47:10.031000Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Shutoff(5):Unknown(0)","pos":"client.go:231","timestamp":"2020-06-22T07:47:10.046121Z"}
{"component":"virt-launcher","level":"info","msg":"7) OSINFO IN EVENT CALLBACK: \u003cnil\u003e","pos":"client.go:253","timestamp":"2020-06-22T07:47:10.050762Z"}
{"component":"virt-launcher","level":"info","msg":"Detected domain with UUID bb0bebea-15a4-55d1-8a61-c5e10c294b1d","pos":"virt-launcher.go:213","timestamp":"2020-06-22T07:47:10.050880Z"}
{"component":"virt-launcher","level":"info","msg":"Monitoring loop: rate 1s start timeout 5m0s","pos":"monitor.go:164","timestamp":"2020-06-22T07:47:10.050916Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: test-hostpath_fedora-vm","pos":"client.go:312","timestamp":"2020-06-22T07:47:10.051751Z"}
{"component":"virt-launcher","level":"info","msg":"GuestAgentLifecycle event state 2 with reason 1 received","pos":"client.go:355","timestamp":"2020-06-22T07:47:10.113333Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Paused(3):StartingUp(11)","pos":"client.go:231","timestamp":"2020-06-22T07:47:10.118186Z"}
{"component":"virt-launcher","level":"info","msg":"7) OSINFO IN EVENT CALLBACK: \u003cnil\u003e","pos":"client.go:253","timestamp":"2020-06-22T07:47:10.118237Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: test-hostpath_fedora-vm","pos":"client.go:312","timestamp":"2020-06-22T07:47:10.120100Z"}
{"component":"virt-launcher","level":"warning","msg":"Failed to send audit message virt=kvm resrc=disk reason=start vm=\"test-hostpath_fedora-vm\" uuid=bb0bebea-15a4-55d1-8a61-c5e10c294b1d old-disk=\"?\" new-disk=\"/var/run/kubevirt-private/vmi-disks/dv-disk/disk.img\": Permission denied\ufffd\ufffd\ufffd\u0026","pos":"virAuditSend:141","subcomponent":"libvirt","thread":"33","timestamp":"2020-06-22T07:47:10.563000Z"}
{"component":"virt-launcher","level":"warning","msg":"Failed to send audit message virt=kvm resrc=net reason=start vm=\"test-hostpath_fedora-vm\" uuid=bb0bebea-15a4-55d1-8a61-c5e10c294b1d old-net=\"?\" new-net=\"0a:58:0a:83:00:5f\": Permission denied\ufffdC\ufffd\u0026","pos":"virAuditSend:141","subcomponent":"libvirt","thread":"33","timestamp":"2020-06-22T07:47:10.563000Z"}
{"component":"virt-launcher","level":"warning","msg":"Failed to send audit message virt=kvm resrc=chardev reason=start vm=\"test-hostpath_fedora-vm\" uuid=bb0bebea-15a4-55d1-8a61-c5e10c294b1d old-chardev=\"?\" new-chardev=\"/var/run/kubevirt-private/f71bc953-2dcd-400a-bb09-f06dc0d4d14f/virt-serial0\": Permission denied\ufffd\ufffd\ufffd\u0026","pos":"virAuditSend:141","subcomponent":"libvirt","thread":"33","timestamp":"2020-06-22T07:47:10.563000Z"}
{"component":"virt-launcher","level":"warning","msg":"Failed to send audit message virt=kvm resrc=chardev reason=start vm=\"test-hostpath_fedora-vm\" uuid=bb0bebea-15a4-55d1-8a61-c5e10c294b1d old-chardev=\"?\" new-chardev=\"/var/lib/libvirt/qemu/channel/target/domain-1-test-hostpath_fedora/org.qemu.guest_agent.0\": Permission denied\ufffd\ufffd\ufffd\u0026","pos":"virAuditSend:141","subcomponent":"libvirt","thread":"33","timestamp":"2020-06-22T07:47:10.563000Z"}
{"component":"virt-launcher","level":"warning","msg":"Failed to send audit message virt=kvm resrc=rng reason=start vm=\"test-hostpath_fedora-vm\" uuid=bb0bebea-15a4-55d1-8a61-c5e10c294b1d old-rng=\"?\" new-rng=\"/dev/urandom\": Permission denied\ufffd\ufffd\ufffd\u0026","pos":"virAuditSend:141","subcomponent":"libvirt","thread":"33","timestamp":"2020-06-22T07:47:10.563000Z"}
{"component":"virt-launcher","level":"warning","msg":"Failed to send audit message virt=kvm resrc=mem reason=start vm=\"test-hostpath_fedora-vm\" uuid=bb0bebea-15a4-55d1-8a61-c5e10c294b1d old-mem=0 new-mem=63488: Permission denied\ufffd\ufffd\ufffd\u0026","pos":"virAuditSend:141","subcomponent":"libvirt","thread":"33","timestamp":"2020-06-22T07:47:10.563000Z"}
{"component":"virt-launcher","level":"warning","msg":"Failed to send audit message virt=kvm resrc=vcpu reason=start vm=\"test-hostpath_fedora-vm\" uuid=bb0bebea-15a4-55d1-8a61-c5e10c294b1d old-vcpu=0 new-vcpu=1: Permission denied\ufffd\ufffd\ufffd\u0026","pos":"virAuditSend:141","subcomponent":"libvirt","thread":"33","timestamp":"2020-06-22T07:47:10.564000Z"}
{"component":"virt-launcher","level":"warning","msg":"Failed to send audit message virt=kvm resrc=iothread reason=start vm=\"test-hostpath_fedora-vm\" uuid=bb0bebea-15a4-55d1-8a61-c5e10c294b1d old-iothread=0 new-iothread=1: Permission denied\ufffd\ufffd\ufffd\u0026","pos":"virAuditSend:141","subcomponent":"libvirt","thread":"33","timestamp":"2020-06-22T07:47:10.564000Z"}
{"component":"virt-launcher","level":"warning","msg":"Failed to send audit message virt=kvm op=start reason=booted vm=\"test-hostpath_fedora-vm\" uuid=bb0bebea-15a4-55d1-8a61-c5e10c294b1d vm-pid=74: Permission denied\ufffd@\ufffd\u0026","pos":"virAuditSend:141","subcomponent":"libvirt","thread":"33","timestamp":"2020-06-22T07:47:10.564000Z"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 4 with reason 0 received","pos":"client.go:337","timestamp":"2020-06-22T07:47:10.564458Z"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 2 with reason 0 received","pos":"client.go:337","timestamp":"2020-06-22T07:47:10.564651Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Domain started.","name":"fedora-vm","namespace":"test-hostpath","pos":"manager.go:1235","timestamp":"2020-06-22T07:47:10.565476Z","uid":"f71bc953-2dcd-400a-bb09-f06dc0d4d14f"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"fedora-vm","namespace":"test-hostpath","pos":"server.go:167","timestamp":"2020-06-22T07:47:10.567140Z","uid":"f71bc953-2dcd-400a-bb09-f06dc0d4d14f"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Running(1):Unknown(1)","pos":"client.go:231","timestamp":"2020-06-22T07:47:10.568420Z"}
{"component":"virt-launcher","level":"info","msg":"7) OSINFO IN EVENT CALLBACK: \u003cnil\u003e","pos":"client.go:253","timestamp":"2020-06-22T07:47:10.568479Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: test-hostpath_fedora-vm","pos":"client.go:312","timestamp":"2020-06-22T07:47:10.571432Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Running(1):Unknown(1)","pos":"client.go:231","timestamp":"2020-06-22T07:47:10.573070Z"}
{"component":"virt-launcher","level":"info","msg":"7) OSINFO IN EVENT CALLBACK: \u003cnil\u003e","pos":"client.go:253","timestamp":"2020-06-22T07:47:10.573109Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: test-hostpath_fedora-vm","pos":"client.go:312","timestamp":"2020-06-22T07:47:10.574422Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"fedora-vm","namespace":"test-hostpath","pos":"server.go:167","timestamp":"2020-06-22T07:47:10.661584Z","uid":"f71bc953-2dcd-400a-bb09-f06dc0d4d14f"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"fedora-vm","namespace":"test-hostpath","pos":"server.go:167","timestamp":"2020-06-22T07:47:10.703675Z","uid":"f71bc953-2dcd-400a-bb09-f06dc0d4d14f"}
{"component":"virt-launcher","level":"info","msg":"Found PID for bb0bebea-15a4-55d1-8a61-c5e10c294b1d: 74","pos":"monitor.go:133","timestamp":"2020-06-22T07:47:11.053726Z"}



On the node where the disk image is located:
sh-4.4# ls -lsh
total 892M
892M -rw-r--r--. 1 qemu qemu 10G Jun 22 07:47 disk.img



[cloud-user@ocp-psi-executor ~]$ oc -n test-hostpath  get vm
NAME        AGE   RUNNING   VOLUME
fedora-vm   12s   true

[cloud-user@ocp-psi-executor ~]$ oc -n test-hostpath  get vm -o yaml
apiVersion: v1
items:
- apiVersion: kubevirt.io/v1alpha3
  kind: VirtualMachine
  metadata:
    annotations:
      kubevirt.io/latest-observed-api-version: v1alpha3
      kubevirt.io/storage-observed-api-version: v1alpha3
    creationTimestamp: "2020-06-22T08:00:14Z"
    finalizers:
    - k8s.v1.cni.cncf.io/kubeMacPool
    generation: 4
    managedFields:
    - apiVersion: kubevirt.io/v1alpha3
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:template:
            .: {}
            f:metadata:
              .: {}
              f:labels:
                .: {}
                f:kubevirt.io/domain: {}
                f:kubevirt.io/vm: {}
            f:spec:
              .: {}
              f:domain:
                .: {}
                f:devices:
                  .: {}
                  f:disks: {}
                  f:rng: {}
                f:resources:
                  .: {}
                  f:requests:
                    .: {}
                    f:memory: {}
              f:nodeSelector:
                .: {}
                f:kubernetes.io/hostname: {}
              f:volumes: {}
      manager: Swagger-Codegen
      operation: Update
      time: "2020-06-22T08:00:14Z"
    - apiVersion: kubevirt.io/v1alpha3
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:running: {}
      manager: virt-api
      operation: Update
      time: "2020-06-22T08:00:14Z"
    - apiVersion: kubevirt.io/v1alpha3
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubevirt.io/latest-observed-api-version: {}
            f:kubevirt.io/storage-observed-api-version: {}
        f:status:
          f:conditions: {}
          f:created: {}
          f:ready: {}
      manager: virt-controller
      operation: Update
      time: "2020-06-22T08:00:24Z"
    name: fedora-vm
    namespace: test-hostpath
    resourceVersion: "9427870"
    selfLink: /apis/kubevirt.io/v1alpha3/namespaces/test-hostpath/virtualmachines/fedora-vm
    uid: 9c1930cc-ee6f-4712-b74f-184c592edecc
  spec:
    running: true
    template:
      metadata:
        creationTimestamp: null
        labels:
          kubevirt.io/domain: fedora-vm
          kubevirt.io/vm: fedora-vm
      spec:
        domain:
          devices:
            disks:
            - disk:
                bus: virtio
              name: dv-disk
            rng: {}
          machine:
            type: pc-q35-rhel8.2.0
          resources:
            requests:
              memory: 64M
        nodeSelector:
          kubernetes.io/hostname: verify-25-82nmv-worker-45xm4
        volumes:
        - dataVolume:
            name: cnv-3516-target-dv
          name: dv-disk
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2020-06-22T08:00:22Z"
      status: "True"
      type: Ready
    created: true
    ready: true
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""



[cloud-user@ocp-psi-executor ~]$ oc -n test-hostpath  get vmi -o yaml
apiVersion: v1
items:
- apiVersion: kubevirt.io/v1alpha3
  kind: VirtualMachineInstance
  metadata:
    annotations:
      kubevirt.io/latest-observed-api-version: v1alpha3
      kubevirt.io/storage-observed-api-version: v1alpha3
    creationTimestamp: "2020-06-22T08:00:14Z"
    finalizers:
    - foregroundDeleteVirtualMachine
    generateName: fedora-vm
    generation: 8
    labels:
      kubevirt.io/domain: fedora-vm
      kubevirt.io/nodeName: verify-25-82nmv-worker-45xm4
      kubevirt.io/vm: fedora-vm
    managedFields:
    - apiVersion: kubevirt.io/v1alpha3
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubevirt.io/latest-observed-api-version: {}
            f:kubevirt.io/storage-observed-api-version: {}
          f:generateName: {}
          f:labels:
            .: {}
            f:kubevirt.io/domain: {}
            f:kubevirt.io/nodeName: {}
            f:kubevirt.io/vm: {}
          f:ownerReferences: {}
        f:spec:
          .: {}
          f:domain:
            .: {}
            f:devices:
              .: {}
              f:disks: {}
              f:rng: {}
            f:firmware:
              .: {}
              f:uuid: {}
            f:machine:
              .: {}
              f:type: {}
            f:resources:
              .: {}
              f:requests:
                .: {}
                f:memory: {}
          f:nodeSelector:
            .: {}
            f:kubernetes.io/hostname: {}
          f:volumes: {}
        f:status:
          .: {}
          f:activePods:
            .: {}
            f:00d60d74-0a69-4f91-9d61-cc336781132e: {}
          f:conditions: {}
          f:guestOSInfo: {}
          f:nodeName: {}
          f:qosClass: {}
      manager: virt-controller
      operation: Update
      time: "2020-06-22T08:00:24Z"
    - apiVersion: kubevirt.io/v1alpha3
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:interfaces: {}
          f:migrationMethod: {}
          f:phase: {}
      manager: virt-handler
      operation: Update
      time: "2020-06-22T08:00:24Z"
    name: fedora-vm
    namespace: test-hostpath
    ownerReferences:
    - apiVersion: kubevirt.io/v1alpha3
      blockOwnerDeletion: true
      controller: true
      kind: VirtualMachine
      name: fedora-vm
      uid: 9c1930cc-ee6f-4712-b74f-184c592edecc
    resourceVersion: "9427869"
    selfLink: /apis/kubevirt.io/v1alpha3/namespaces/test-hostpath/virtualmachineinstances/fedora-vm
    uid: a0595468-9f3b-4476-95a0-cac47f4ebace
  spec:
    domain:
      devices:
        disks:
        - disk:
            bus: virtio
          name: dv-disk
        interfaces:
        - bridge: {}
          name: default
        rng: {}
      features:
        acpi:
          enabled: true
      firmware:
        uuid: bb0bebea-15a4-55d1-8a61-c5e10c294b1d
      machine:
        type: pc-q35-rhel8.2.0
      resources:
        requests:
          cpu: 100m
          memory: 64M
    networks:
    - name: default
      pod: {}
    nodeSelector:
      kubernetes.io/hostname: verify-25-82nmv-worker-45xm4
    volumes:
    - dataVolume:
        name: cnv-3516-target-dv
      name: dv-disk
  status:
    activePods:
      00d60d74-0a69-4f91-9d61-cc336781132e: verify-25-82nmv-worker-45xm4
    conditions:
    - lastProbeTime: null
      lastTransitionTime: null
      message: cannot migrate VMI with non-shared PVCs
      reason: DisksNotLiveMigratable
      status: "False"
      type: LiveMigratable
    - lastProbeTime: null
      lastTransitionTime: null
      message: cannot migrate VMI which does not use masquerade to connect to the
        pod network
      reason: InterfaceNotLiveMigratable
      status: "False"
      type: LiveMigratable
    - lastProbeTime: null
      lastTransitionTime: "2020-06-22T08:00:22Z"
      status: "True"
      type: Ready
    guestOSInfo: {}
    interfaces:
    - ipAddress: 10.131.0.99
      mac: 0a:58:0a:83:00:63
      name: default
    migrationMethod: BlockMigration
    nodeName: verify-25-82nmv-worker-45xm4
    phase: Running
    qosClass: Burstable
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Comment 1 Fabian Deutsch 2020-06-22 14:09:26 UTC
Can you please confirm that you were using an 8.2 RHCOS?

Comment 2 sgott 2020-06-22 21:26:42 UTC
Adam, since DVs were observed as a possible root cause, could your team help with an initial investigation/triage? The Storage team might be able to provide valuable insights.

Comment 3 Qixuan Wang 2020-06-23 08:20:01 UTC
(In reply to Fabian Deutsch from comment #1)
> Can you please confirm that you were using an 8.2 RHCOS?

It is a cluster with OCP 4.6.0-0.nightly-2020-06-09-081928, RHCOS 4.5, with el8.2 packages:
[core@verify-25-82nmv-master-1 ~]$ cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="4.5"
VERSION_ID="4.5"
OPENSHIFT_VERSION="4.5"
RHEL_VERSION="8.2"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 4.5 (Ootpa)"
ID="rhcos"
ID_LIKE="rhel fedora"
ANSI_COLOR="0;31"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.5"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.5"
OSTREE_VERSION='46.81.202006031446-0'

It can also be reproduced on a cluster with OCP 4.5.0-rc.2, RHCOS 4.5, el8.2:
[core@storage-mw8pp-master-0 ~]$  cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="45.82.202006190229-0"
VERSION_ID="4.5"
OPENSHIFT_VERSION="4.5"
RHEL_VERSION="8.2"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 45.82.202006190229-0 (Ootpa)"
ID="rhcos"
ID_LIKE="rhel fedora"
ANSI_COLOR="0;31"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.5"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.5"
OSTREE_VERSION='45.82.202006190229-0'

Comment 4 Adam Litke 2020-06-24 14:01:07 UTC
This could be related to bug 1845901. We are investigating filesystem corruption issues where cloning commences before the data on the source PVC has settled.

Comment 5 sgott 2020-06-30 11:44:35 UTC
Qixuan, assuming Adam's assessment is correct, this would mean the DV had not yet fully quiesced when the VM was started, so the observed panic is essentially filesystem corruption.

Would you be able to repeat the steps to reproduce, making sure the clone process has fully completed before starting the VM?

Comment 6 sgott 2020-06-30 12:02:03 UTC
Actually, my instructions in Comment #5 are slightly incorrect. It's important to ensure the source disk has been fully populated before starting the clone operation. This will help us determine if it's a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1845901

Also, to ensure you have a correct environment, can you please repeat these steps on a released version of OCP 4.5.x instead of a nightly snapshot?
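
A minimal sketch of how to check this before each step, using the namespace and target DV name from the VM definition above (the source DV placeholder and the polling loop itself are only illustrative, not part of the original steps):

$ # wait until the source DV has finished importing before creating the clone
$ until [ "$(oc -n test-hostpath get dv <source-dv> -o jsonpath='{.status.phase}')" = "Succeeded" ]; do sleep 5; done
$ # likewise, wait until the cloned DV reports Succeeded before starting the VM
$ until [ "$(oc -n test-hostpath get dv cnv-3516-target-dv -o jsonpath='{.status.phase}')" = "Succeeded" ]; do sleep 5; done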

Comment 7 Alexander Wels 2020-06-30 12:03:39 UTC
Can you run an md5sum on the source disk.img and the target disk.img? Does the source boot?

Comment 9 Qixuan Wang 2020-07-02 16:33:37 UTC
Tested with OCP 4.5.0-rc.4, CNV v2.4.0-52 and CDI v2.4.0-25

I can NOT reproduce the bug when I perform all the steps manually. The content, size, and user/group of the source image and the target (cloned) image are the same, and the VM boots with either the source DV or the target DV and can then be accessed normally.

Source DV
[cloud-user@ocp-psi-executor qwang]$ oc debug node/gouyang0701-7ppwh-worker-45nvt -t
Starting pod/gouyang0701-7ppwh-worker-45nvt-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.3.67
If you don't see a command prompt, try pressing enter.
sh-4.2# ls -lsh /host/var/hpvolumes/pvc-d0612139-757e-4d09-9c97-66295d6f451f/disk.img
892M -rw-r--r--. 1 root root 10G Jul  2 14:47 /host/var/hpvolumes/pvc-d0612139-757e-4d09-9c97-66295d6f451f/disk.img
sh-4.2# md5sum /host/var/hpvolumes/pvc-d0612139-757e-4d09-9c97-66295d6f451f/disk.img
995bc410a630f08a8b525ab7b685f783  /host/var/hpvolumes/pvc-d0612139-757e-4d09-9c97-66295d6f451f/disk.img

Target DV
[cloud-user@ocp-psi-executor qwang]$ oc debug node/gouyang0701-7ppwh-worker-7dp8c -t
Starting pod/gouyang0701-7ppwh-worker-7dp8c-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.1.69
If you don't see a command prompt, try pressing enter.
sh-4.2# ls -lsh /host/var/hpvolumes/pvc-16dedc5d-7f26-442e-abfa-cc9038cf2fac/disk.img
892M -rw-r--r--. 1 root root 10G Jul  2 14:47 /host/var/hpvolumes/pvc-16dedc5d-7f26-442e-abfa-cc9038cf2fac/disk.img
sh-4.2# md5sum /host/var/hpvolumes/pvc-16dedc5d-7f26-442e-abfa-cc9038cf2fac/disk.img
995bc410a630f08a8b525ab7b685f783  /host/var/hpvolumes/pvc-16dedc5d-7f26-442e-abfa-cc9038cf2fac/disk.img


==============================
I can reproduce the bug every time the Tier-2 automation runs. The content of the source image and the cloned image differs, while the size is the same. The user/group of the source DV is root, while that of the target DV is qemu (107). The VM boots and can be accessed normally with the source image, but cannot be accessed with the cloned image even though the cloned DV showed 100% complete. Here are the details:

[cloud-user@ocp-psi-executor qwang]$ oc get pvc
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
cnv-3516-source-dv   Bound    pvc-545d4b33-9897-4f3c-8975-ea63442f7b79   48Gi       RWO            hostpath-provisioner   95s
cnv-3516-target-dv   Bound    pvc-a2e95960-ac9a-4959-bf77-0e2c6e614619   48Gi       RWO            hostpath-provisioner   53s

[cloud-user@ocp-psi-executor qwang]$ oc get dv
NAME                 PHASE       PROGRESS   RESTARTS   AGE
cnv-3516-source-dv   Succeeded   100.0%     0          3m55s
cnv-3516-target-dv   Succeeded   100.0%     0          3m13s

Source DV. The md5sum result was 995bc410a630f08a8b525ab7b685f783 each time.
[cloud-user@ocp-psi-executor qwang]$ oc debug node/gouyang0701-7ppwh-worker-45nvt -t
Starting pod/gouyang0701-7ppwh-worker-45nvt-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.3.67
If you don't see a command prompt, try pressing enter.
sh-4.2# ls -ls /host/var/hpvolumes/pvc-545d4b33-9897-4f3c-8975-ea63442f7b79/disk.img
913032 -rw-r--r--. 1 root root 10737418240 Jul  2 15:37 /host/var/hpvolumes/pvc-545d4b33-9897-4f3c-8975-ea63442f7b79/disk.img
sh-4.2# md5sum /host/var/hpvolumes/pvc-545d4b33-9897-4f3c-8975-ea63442f7b79/disk.img
995bc410a630f08a8b525ab7b685f783  /host/var/hpvolumes/pvc-545d4b33-9897-4f3c-8975-ea63442f7b79/disk.img

Target DV. The md5sum result was fafb16b77c764a8ad0786bf86264741c each time.
[cloud-user@ocp-psi-executor qwang]$ oc debug node/gouyang0701-7ppwh-worker-45nvt -t
Starting pod/gouyang0701-7ppwh-worker-45nvt-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.3.67
If you don't see a command prompt, try pressing enter.
sh-4.2# ls -ls /host/var/hpvolumes/pvc-a2e95960-ac9a-4959-bf77-0e2c6e614619/disk.img
913932 -rw-r--r--. 1 107 107 10737418240 Jul  2 15:39 /host/var/hpvolumes/pvc-a2e95960-ac9a-4959-bf77-0e2c6e614619/disk.img
sh-4.2# md5sum /host/var/hpvolumes/pvc-a2e95960-ac9a-4959-bf77-0e2c6e614619/disk.img
fafb16b77c764a8ad0786bf86264741c  /host/var/hpvolumes/pvc-a2e95960-ac9a-4959-bf77-0e2c6e614619/disk.img

Hope that helps.

Comment 11 sgott 2020-07-06 18:01:52 UTC
As per comment #9, there is a consistent correlation between cases where the md5sums of the source and cloned DV don't match and problems with the running VM. Thus I believe this is a storage issue (possibly even a dup of bug 1845901). Re-assigning to the Storage team for triage.

Comment 13 Adam Litke 2020-07-07 11:19:00 UTC
Marking as NOTABUG because the VM did not have enough memory defined. The mismatching md5sum issue brought up here is being addressed in bug 1845901.
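
For context, the VM definition earlier in this report requests only 64M of memory (spec.template.spec.domain.resources.requests.memory), which is not enough for a Fedora guest to boot. A sketch of the relevant fragment with a larger request; the 1Gi value is only an illustrative choice, not something specified in this report:

spec:
  template:
    spec:
      domain:
        resources:
          requests:
            memory: 1Gi   # reported VM requested 64M; 1Gi is an example of a more realistic size for Fedora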

