Created attachment 1687282 [details] VM manifests

Description of problem:
VMs created with any provisioning method always get stuck in the Starting state and never become Running.

Version-Release number of selected component (if applicable):
CNV 2.4
4.5.0-0.nightly-2020-05-04-113741

How reproducible:
100%

Steps to Reproduce:
1. Create any VM (the example VM will do)
2. Start the VM

Actual results:
The VM stays in Starting.

Expected results:
The VM becomes Running.

Additional info:
After discussion with Tomas, I'm moving this to CNV Virt. The issue seems similar to BZ1832179, except that the VMIs are in the Scheduled phase, not Running, and I'm hitting this issue without setting runStrategy=Always.

$ oc get vm,vmi -A
NAMESPACE    NAME                                                            AGE    RUNNING   VOLUME
default      virtualmachine.kubevirt.io/test                                 3m7s   true
default      virtualmachine.kubevirt.io/vm-example                           130m   true
test-vfwta   virtualmachine.kubevirt.io/vm-list-actions-test-vfwta           120m   true

NAMESPACE    NAME                                                            AGE    PHASE       IP             NODENAME
default      virtualmachineinstance.kubevirt.io/test                         3m5s   Scheduled   10.128.2.106   rhrazdil24-bs7br-worker-hl2cm
default      virtualmachineinstance.kubevirt.io/vm-example                   107s   Scheduled   10.129.2.72    rhrazdil24-bs7br-worker-dljvg
test-vfwta   virtualmachineinstance.kubevirt.io/vm-list-actions-test-vfwta   107s   Scheduled   10.129.2.71    rhrazdil24-bs7br-worker-dljvg

Indeed, the VM doesn't seem to be running:

$ virtctl vnc vm-example
Can't access VMI vm-example: Unable to connect to VirtualMachineInstance because phase is Scheduled instead of Running

The status in the UI is sometimes Starting and sometimes Stopping, depending on the virt-launcher status - the pod is sometimes in Error state and gets recreated, which is what causes the VM status to change.
Created attachment 1687354 [details] vmi
This looks like your VMs are really stuck in scheduling. Please provide must-gather output in order to help us diagnose the root cause.
Running must-gather failed. Further inspection revealed that two master nodes are in NotReady state:

NAME                            STATUS     ROLES    AGE   VERSION
rhrazdil24-bs7br-master-0       NotReady   master   31h   v1.18.0-rc.1
rhrazdil24-bs7br-master-1       Ready      master   31h   v1.18.0-rc.1
rhrazdil24-bs7br-master-2       NotReady   master   31h   v1.18.0-rc.1
rhrazdil24-bs7br-worker-dljvg   Ready      worker   31h   v1.18.0-rc.1
rhrazdil24-bs7br-worker-hl2cm   Ready      worker   31h   v1.18.0-rc.1
rhrazdil24-bs7br-worker-xvsd9   Ready      worker   31h   v1.18.0-rc.1

Moreover, several pods were crashlooping: kube-apiserver, virt-handler, nmstate-handler.

Apparently the cluster is not in a healthy state, which may have been caused by a PSI networking issue. Apologies for the false positive, I got confused with BZ1832179 because of the similarity in behaviour. Closing as notabug.
Re-opening. This BZ is reproducible. Radim re-deployed the cluster and ran into the same issue.

virt-handler will crash after a VMI is deployed with this traceback:

{"component":"virt-handler","level":"info","msg":"Processing event default/vm-example","pos":"vm.go:860","timestamp":"2020-05-12T15:39:41.279009Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Scheduled\n","name":"vm-example","namespace":"default","pos":"vm.go:862","timestamp":"2020-05-12T15:39:41.279337Z","uid":"15d575dd-2e81-49ec-ba43-95baae6609cf"}
{"component":"virt-handler","level":"info","msg":"Domain does not exist","pos":"vm.go:869","timestamp":"2020-05-12T15:39:41.279395Z"}
{"component":"virt-handler","level":"error","msg":"failed to create nat rules for vm error: failed to load nftable ipv4-nat error ","pos":"podinterface.go:582","timestamp":"2020-05-12T15:39:41.510071Z"}
{"component":"virt-handler","level":"fatal","msg":"failed to prepare pod networking","pos":"podinterface.go:109","reason":"failed to load nftable ipv4-nat error ","timestamp":"2020-05-12T15:39:41.510140Z"}

panic: failed to load nftable ipv4-nat error

goroutine 668 [running, locked to thread]:
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network.(*PodInterface).PlugPhase1(0x2c3e9b8, 0xc000ae2400, 0xc000d16990, 0xc0011c8240, 0x19b97a3, 0x4, 0x3b196d, 0xc000c9af78, 0xc000d16990)
	/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network/podinterface.go:110 +0x903
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network.SetupNetworkInterfacesPhase1(0xc000ae2400, 0x3b196d, 0xc001333ce0, 0x21)
	/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network/network.go:111 +0x267
kubevirt.io/kubevirt/pkg/virt-handler.(*VirtualMachineController).setPodNetworkPhase1.func1(0x0, 0x0)
	/go/src/kubevirt.io/kubevirt/pkg/virt-handler/vm.go:277 +0x3a
kubevirt.io/kubevirt/pkg/virt-handler/isolation.(*realIsolationResult).DoNetNS.func1(0x1c82320, 0xc000ccc1c0, 0x0, 0x0)
	/go/src/kubevirt.io/kubevirt/pkg/virt-handler/isolation/isolation.go:254 +0x26
github.com/containernetworking/plugins/pkg/ns.(*netNS).Do.func1(0x1c82320, 0xc000ccc1c0, 0x0, 0x0)
	/go/src/kubevirt.io/kubevirt/vendor/github.com/containernetworking/plugins/pkg/ns/ns_linux.go:183 +0x2c1
github.com/containernetworking/plugins/pkg/ns.(*netNS).Do.func2(0xc0005a0b70, 0xc000e15f60, 0x1c82320, 0xc000ccc1c0, 0xc000ccc1d0)
	/go/src/kubevirt.io/kubevirt/vendor/github.com/containernetworking/plugins/pkg/ns/ns_linux.go:200 +0x76
created by github.com/containernetworking/plugins/pkg/ns.(*netNS).Do
	/go/src/kubevirt.io/kubevirt/vendor/github.com/containernetworking/plugins/pkg/ns/ns_linux.go:197 +0x224
I see the same issue on 2.4. The logs are a bit different from the above, but very similar to BZ1743250:

$ oc logs virt-launcher-vm-cirros-hdg5h -c compute
{"component":"virt-launcher","level":"info","msg":"Collected all requested hook sidecar sockets","pos":"manager.go:68","timestamp":"2020-05-13T05:16:30.676549Z"}
{"component":"virt-launcher","level":"info","msg":"Sorted all collected sidecar sockets per hook point based on their priority and name: map[]","pos":"manager.go:71","timestamp":"2020-05-13T05:16:30.676865Z"}
{"component":"virt-launcher","level":"info","msg":"Watchdog file created at /var/run/kubevirt/watchdog-files/default_vm-cirros","pos":"virt-launcher.go:155","timestamp":"2020-05-13T05:16:30.680029Z"}
{"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon: qemu:///system","pos":"libvirt.go:355","timestamp":"2020-05-13T05:16:30.682169Z"}
{"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon failed: virError(Code=38, Domain=7, Message='Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory')","pos":"libvirt.go:363","timestamp":"2020-05-13T05:16:30.683294Z"}
{"component":"virt-launcher","level":"info","msg":"libvirt version: 6.0.0, package: 19.module+el8.2.1+6538+c148631f (Red Hat, Inc. \u003chttp://bugzilla.redhat.com/bugzilla\u003e, 2020-05-07-17:48:29, )","subcomponent":"libvirt","thread":"44","timestamp":"2020-05-13T05:16:30.734000Z"}
{"component":"virt-launcher","level":"info","msg":"hostname: vm-cirros","subcomponent":"libvirt","thread":"44","timestamp":"2020-05-13T05:16:30.734000Z"}
{"component":"virt-launcher","level":"error","msg":"internal error: Child process (/usr/sbin/dmidecode -q -t 0,1,2,3,4,17) unexpected exit status 1: /dev/mem: No such file or directory","pos":"virCommandWait:2713","subcomponent":"libvirt","thread":"44","timestamp":"2020-05-13T05:16:30.734000Z"}
{"component":"virt-launcher","level":"info","msg":"Connected to libvirt daemon","pos":"libvirt.go:371","timestamp":"2020-05-13T05:16:31.194884Z"}
{"component":"virt-launcher","level":"info","msg":"Registered libvirt event notify callback","pos":"client.go:315","timestamp":"2020-05-13T05:16:31.230093Z"}
{"component":"virt-launcher","level":"info","msg":"Marked as ready","pos":"virt-launcher.go:73","timestamp":"2020-05-13T05:16:31.230580Z"}

panic: timed out waiting for domain to be defined

goroutine 1 [running]:
main.waitForDomainUUID(0x45d964b800, 0xc0000f2fc0, 0xc001a7a000, 0x19ca940, 0xc0000c4370, 0xc0000f2fc0)
	/go/src/kubevirt.io/kubevirt/cmd/virt-launcher/virt-launcher.go:238 +0x3de
main.main()
	/go/src/kubevirt.io/kubevirt/cmd/virt-launcher/virt-launcher.go:449 +0x115f

{"component":"virt-launcher","level":"error","msg":"dirty virt-launcher shutdown","pos":"virt-launcher.go:531","reason":"exit status 2","timestamp":"2020-05-13T05:21:31.250249Z"}
The traceback in Comment #6 is an after-effect of the traceback in Comment #5. Because virt-handler panics and CrashLoops, it never sends the domain to virt-launcher, which results in the observed timeout. It is worth noting that a second panic ("file exists", shown below) can occur in virt-handler, but it is likely masking the first one. No matter what happens with a VMI, virt-handler should not panic; it should surface the error and retry (a minimal sketch of that idea follows the traceback below).

------

{"component":"virt-handler","level":"info","msg":"Starting device plugin controller","pos":"device_controller.go:92","timestamp":"2020-05-12T15:31:05.598088Z"}
{"component":"virt-handler","level":"info","msg":"kvm device plugin started","pos":"generic_device.go:157","timestamp":"2020-05-12T15:31:05.609547Z"}
{"component":"virt-handler","level":"info","msg":"vhost-net device plugin started","pos":"generic_device.go:157","timestamp":"2020-05-12T15:31:05.610529Z"}
{"component":"virt-handler","level":"info","msg":"tun device plugin started","pos":"generic_device.go:157","timestamp":"2020-05-12T15:31:05.611529Z"}
{"component":"virt-handler","level":"info","msg":"Processing event default/vm-example","pos":"vm.go:860","timestamp":"2020-05-12T15:31:05.698813Z"}
{"component":"virt-handler","kind":"VirtualMachineInstance","level":"info","msg":"VMI is in phase: Scheduled\n","name":"vm-example","namespace":"default","pos":"vm.go:862","timestamp":"2020-05-12T15:31:05.698987Z","uid":"46331abb-248b-4567-8b05-bfb3304350f8"}
{"component":"virt-handler","level":"info","msg":"Domain does not exist","pos":"vm.go:869","timestamp":"2020-05-12T15:31:05.699085Z"}
{"component":"virt-handler","level":"info","msg":"Updating cluster config to resource version '40015'","pos":"config-map.go:443","timestamp":"2020-05-12T15:31:05.748644Z"}
{"component":"virt-handler","level":"error","msg":"failed to create an interface: k6t-eth0-nic","pos":"podinterface.go:558","reason":"file exists","timestamp":"2020-05-12T15:31:05.765450Z"}
{"component":"virt-handler","level":"fatal","msg":"failed to prepare pod networking","pos":"podinterface.go:109","reason":"file exists","timestamp":"2020-05-12T15:31:05.765517Z"}

panic: file exists

goroutine 611 [running, locked to thread]:
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network.(*PodInterface).PlugPhase1(0x2c3e9b8, 0xc00244a000, 0xc00221d200, 0xc0027cc180, 0x19b97a3, 0x4, 0x39b930, 0xc00240ef78, 0xc00221d200)
	/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network/podinterface.go:110 +0x903
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network.SetupNetworkInterfacesPhase1(0xc00244a000, 0x39b930, 0xc00258d380, 0x21)
	/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network/network.go:111 +0x267
kubevirt.io/kubevirt/pkg/virt-handler.(*VirtualMachineController).setPodNetworkPhase1.func1(0x0, 0x0)
	/go/src/kubevirt.io/kubevirt/pkg/virt-handler/vm.go:277 +0x3a
kubevirt.io/kubevirt/pkg/virt-handler/isolation.(*realIsolationResult).DoNetNS.func1(0x1c82320, 0xc001a6fa10, 0x0, 0x0)
	/go/src/kubevirt.io/kubevirt/pkg/virt-handler/isolation/isolation.go:254 +0x26
github.com/containernetworking/plugins/pkg/ns.(*netNS).Do.func1(0x1c82320, 0xc001a6fa10, 0x0, 0x0)
	/go/src/kubevirt.io/kubevirt/vendor/github.com/containernetworking/plugins/pkg/ns/ns_linux.go:183 +0x2c1
github.com/containernetworking/plugins/pkg/ns.(*netNS).Do.func2(0xc0010172b0, 0xc001a729c0, 0x1c82320, 0xc001a6fa10, 0xc001a6fa20)
	/go/src/kubevirt.io/kubevirt/vendor/github.com/containernetworking/plugins/pkg/ns/ns_linux.go:200 +0x76
created by github.com/containernetworking/plugins/pkg/ns.(*netNS).Do
	/go/src/kubevirt.io/kubevirt/vendor/github.com/containernetworking/plugins/pkg/ns/ns_linux.go:197 +0x224
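To illustrate the "should not panic" point above, here is a minimal, hypothetical Go sketch (not the actual KubeVirt code; the function names are made up) of returning the error from the phase-1 network setup to the reconcile loop instead of calling panic/fatal, so one bad VMI cannot crash-loop the whole virt-handler pod:

package main

import (
	"fmt"
	"log"
)

// loadNATRules stands in for the nftables/iptables work that failed above.
func loadNATRules(vmiName string) error {
	return fmt.Errorf("failed to load nftable ipv4-nat")
}

// preparePodNetworking wraps and returns the error instead of panicking,
// leaving the decision to retry (with backoff) to the caller.
func preparePodNetworking(vmiName string) error {
	if err := loadNATRules(vmiName); err != nil {
		return fmt.Errorf("failed to prepare pod networking for %s: %w", vmiName, err)
	}
	return nil
}

func main() {
	if err := preparePodNetworking("vm-example"); err != nil {
		// Log and requeue the VMI rather than exiting the whole process.
		log.Printf("requeueing VMI: %v", err)
	}
}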
Thanks Stu for the close analysis; I think comment #5 takes us really close to the source of the issue. Appreciated!
(In reply to sgott from comment #5)
> Re-opening. This BZ is reproducible. Radim re-deployed the cluster and ran
> into the same issue.
>
> virt-handler will crash after a VMI is deployed with this traceback:
>
> {"component":"virt-handler","level":"error","msg":"failed to create nat rules for vm error: failed to load nftable ipv4-nat error ","pos":"podinterface.go:582","timestamp":"2020-05-12T15:39:41.510071Z"}
> {"component":"virt-handler","level":"fatal","msg":"failed to prepare pod networking","pos":"podinterface.go:109","reason":"failed to load nftable ipv4-nat error ","timestamp":"2020-05-12T15:39:41.510140Z"}
> panic: failed to load nftable ipv4-nat error

There is something strange here. When ipv6 support was introduced, this error message was changed to "failed to create ipv4 nat rules for vm error" instead of "failed to create nat rules for vm error". Are you sure that's a nightly build of a 2.4 cluster?
I collected virt-handler logs from a recent run. It seems to panic for a different reason now, though it is still networking-related. It looks like an idempotency issue in our code (a minimal sketch of an idempotent check follows the traceback below). I don't see how it could be related to the previous nftables issue. Maybe we first failed due to the nftables issue, which restarted virt-handler, and on the second run we got into this one?

{"component":"virt-handler","level":"info","msg":"Starting device plugin controller","pos":"device_controller.go:92","timestamp":"2020-05-16T01:51:03.470937Z"}
{"component":"virt-handler","level":"info","msg":"vhost-net device plugin started","pos":"generic_device.go:157","timestamp":"2020-05-16T01:51:03.487833Z"}
{"component":"virt-handler","level":"info","msg":"kvm device plugin started","pos":"generic_device.go:157","timestamp":"2020-05-16T01:51:03.488828Z"}
{"component":"virt-handler","level":"info","msg":"tun device plugin started","pos":"generic_device.go:157","timestamp":"2020-05-16T01:51:03.489232Z"}
{"component":"virt-handler","level":"info","msg":"Processing event kubevirt-test-alternative/testvmijfjpzl6hknw9wssq8r799ht7rsrjf4pgzsshrck2tdmhgkrv","pos":"vm.go:860","timestamp":"2020-05-16T01:51:03.572725Z"}
{"component":"virt-handler","kind":"VirtualMachineInstance","level":"info","msg":"VMI is in phase: Scheduled\n","name":"testvmijfjpzl6hknw9wssq8r799ht7rsrjf4pgzsshrck2tdmhgkrv","namespace":"kubevirt-test-alternative","pos":"vm.go:862","timestamp":"2020-05-16T01:51:03.572857Z","uid":"9a277791-de2b-4fd1-afaa-e24f03ee8e4f"}
{"component":"virt-handler","level":"info","msg":"Domain does not exist","pos":"vm.go:869","timestamp":"2020-05-16T01:51:03.572958Z"}
{"component":"virt-handler","level":"info","msg":"Processing event kubevirt-test-default/testvmi6zqfspm9c7h7c4f8579b8222ghrb4fslzk57kr8w5glqc77w","pos":"vm.go:860","timestamp":"2020-05-16T01:51:03.572602Z"}
{"component":"virt-handler","kind":"VirtualMachineInstance","level":"info","msg":"VMI is in phase: Scheduled\n","name":"testvmi6zqfspm9c7h7c4f8579b8222ghrb4fslzk57kr8w5glqc77w","namespace":"kubevirt-test-default","pos":"vm.go:862","timestamp":"2020-05-16T01:51:03.573823Z","uid":"6a565390-12ce-44b3-b669-3471d3bed4f5"}
{"component":"virt-handler","level":"info","msg":"Domain does not exist","pos":"vm.go:869","timestamp":"2020-05-16T01:51:03.573870Z"}
{"component":"virt-handler","level":"error","msg":"failed to create an interface: k6t-eth0-nic","pos":"podinterface.go:558","reason":"file exists","timestamp":"2020-05-16T01:51:03.629576Z"}
{"component":"virt-handler","level":"fatal","msg":"failed to prepare pod networking","pos":"podinterface.go:109","reason":"file exists","timestamp":"2020-05-16T01:51:03.630434Z"}

panic: file exists

goroutine 659 [running, locked to thread]:
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network.(*PodInterface).PlugPhase1(0x2c3e9b8, 0xc0015fc000, 0xc000ac9d40, 0xc001c3e9f0, 0x19b97a3, 0x4, 0xe5d3f, 0xc001ae2348, 0xc000ac9d40)
	/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network/podinterface.go:110 +0x903
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network.SetupNetworkInterfacesPhase1(0xc0015fc000, 0xe5d3f, 0xc000a51760, 0x1f)
	/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network/network.go:111 +0x267
kubevirt.io/kubevirt/pkg/virt-handler.(*VirtualMachineController).setPodNetworkPhase1.func1(0x0, 0x0)
	/go/src/kubevirt.io/kubevirt/pkg/virt-handler/vm.go:277 +0x3a
kubevirt.io/kubevirt/pkg/virt-handler/isolation.(*realIsolationResult).DoNetNS.func1(0x1c82320, 0xc0013f36a0, 0x0, 0x0)
	/go/src/kubevirt.io/kubevirt/pkg/virt-handler/isolation/isolation.go:254 +0x26
github.com/containernetworking/plugins/pkg/ns.(*netNS).Do.func1(0x1c82320, 0xc0013f36a0, 0x0, 0x0)
	/go/src/kubevirt.io/kubevirt/vendor/github.com/containernetworking/plugins/pkg/ns/ns_linux.go:183 +0x2c1
github.com/containernetworking/plugins/pkg/ns.(*netNS).Do.func2(0xc00112a380, 0xc000c57880, 0x1c82320, 0xc0013f36a0, 0xc0013f36b0)
	/go/src/kubevirt.io/kubevirt/vendor/github.com/containernetworking/plugins/pkg/ns/ns_linux.go:200 +0x76
created by github.com/containernetworking/plugins/pkg/ns.(*netNS).Do
	/go/src/kubevirt.io/kubevirt/vendor/github.com/containernetworking/plugins/pkg/ns/ns_linux.go:197 +0x224
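On the idempotency point above, a minimal, hypothetical sketch (using github.com/vishvananda/netlink; the link type and function name are illustrative, this is not the actual KubeVirt fix) of an "ensure" style check that tolerates an interface left over from an earlier, interrupted run instead of failing with "file exists":

package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// ensureLink creates the named link only if it does not already exist, so
// replaying phase 1 after a virt-handler restart becomes a no-op.
func ensureLink(name string) error {
	if _, err := netlink.LinkByName(name); err == nil {
		// Already present from a previous run - nothing to do.
		return nil
	}
	link := &netlink.Dummy{LinkAttrs: netlink.LinkAttrs{Name: name}}
	if err := netlink.LinkAdd(link); err != nil {
		return fmt.Errorf("failed to create an interface: %s: %w", name, err)
	}
	return nil
}

func main() {
	if err := ensureLink("k6t-eth0-nic"); err != nil {
		fmt.Println(err)
	}
}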
Petr, the log you attached seems to be from the second run. It fails because "k6t-eth0-nic" already exists (it was created on the first run).

Anyway, PR https://github.com/kubevirt/kubevirt/pull/3128 removes the 'panic' from those two failures (the nftables one and the already-existing NIC).

Looking at the log from the first run (attached below), the unsuccessful nftable load caused the panic, but virt-handler wasn't 'healthy' even before this error. So, since the panic is already gone from the code, I think we should put our efforts into understanding why we get those "context deadline exceeded" errors (an illustrative sketch of the cmd-socket dial that produces them follows the log below). The nftable is not the issue here.

BTW, Radim, do you have the same failures when running a VM with bridge binding (there we don't use iptables/nftables)?

{"component":"virt-handler","level":"error","msg":"failed to connect to cmd client socket","pos":"prometheus.go:469","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:24.568976Z"}
{"component":"virt-handler","level":"info","msg":"failed to dial cmd socket: /var/run/kubevirt/sockets/3895ca29-d8a3-48c8-8c2b-51bc008eedf0_sock","pos":"client.go:135","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:24.824147Z"}
{"component":"virt-handler","level":"error","msg":"failed to connect to cmd client socket","pos":"cache.go:131","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:24.824497Z"}
{"component":"virt-handler","level":"info","msg":"failed to dial cmd socket: /var/run/kubevirt/sockets/49c851a2-d732-4560-ad4d-1041da663b32_sock","pos":"client.go:135","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:26.825351Z"}
{"component":"virt-handler","level":"error","msg":"failed to connect to cmd client socket","pos":"cache.go:131","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:26.825498Z"}
{"component":"virt-handler","level":"info","msg":"failed to dial cmd socket: /var/run/kubevirt/sockets/590e13e6-1f35-4517-8066-6b67d78e5493_sock","pos":"client.go:135","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:28.825771Z"}
{"component":"virt-handler","level":"error","msg":"failed to connect to cmd client socket","pos":"cache.go:131","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:28.826051Z"}
{"component":"virt-handler","level":"info","msg":"failed to dial cmd socket: /var/run/kubevirt/sockets/684cac8a-af3f-4777-beb5-bb9c28914abf_sock","pos":"client.go:135","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:30.826483Z"}
{"component":"virt-handler","level":"error","msg":"failed to connect to cmd client socket","pos":"cache.go:131","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:30.826685Z"}
{"component":"virt-handler","level":"info","msg":"failed to dial cmd socket: /var/run/kubevirt/sockets/86145cd0-b69c-4923-ab5e-355ca3d9a599_sock","pos":"client.go:135","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:32.828264Z"}
{"component":"virt-handler","level":"error","msg":"failed to connect to cmd client socket","pos":"cache.go:131","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:32.829149Z"}
{"component":"virt-handler","level":"info","msg":"failed to dial cmd socket: /var/run/kubevirt/sockets/921d19ff-e436-40b2-bc46-f9e2cf6e1099_sock","pos":"client.go:135","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:34.829939Z"}
{"component":"virt-handler","level":"error","msg":"failed to connect to cmd client socket","pos":"cache.go:131","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:34.830359Z"}
{"component":"virt-handler","kind":"Domain","level":"warning","msg":"detected expired watchdog for domain","name":"vm-example","namespace":"default","pos":"cache.go:114","timestamp":"2020-05-14T20:20:35.823281Z","uid":""}
{"component":"virt-handler","kind":"Domain","level":"warning","msg":"detected expired watchdog for domain","name":"vmi-masquerade","namespace":"default","pos":"cache.go:114","timestamp":"2020-05-14T20:20:35.823524Z","uid":""}
{"component":"virt-handler","level":"info","msg":"failed to dial cmd socket: /var/run/kubevirt/sockets/9e6ff55c-81fc-4284-a015-6ecac35ade0a_sock","pos":"client.go:135","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:36.831055Z"}
{"component":"virt-handler","level":"error","msg":"failed to connect to cmd client socket","pos":"cache.go:131","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:36.831300Z"}
{"component":"virt-handler","level":"info","msg":"failed to dial cmd socket: /var/run/kubevirt/sockets/9fc7297f-00c7-4ee5-bf1a-7fe5d71649d3_sock","pos":"client.go:135","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:38.832277Z"}
{"component":"virt-handler","level":"error","msg":"failed to connect to cmd client socket","pos":"cache.go:131","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:38.832391Z"}
{"component":"virt-handler","level":"info","msg":"failed to dial cmd socket: /var/run/kubevirt/sockets/daaf555e-b65c-49ce-9dd3-65fdb2052934_sock","pos":"client.go:135","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:40.832680Z"}
{"component":"virt-handler","level":"error","msg":"failed to connect to cmd client socket","pos":"cache.go:131","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:40.832844Z"}
{"component":"virt-handler","level":"info","msg":"Starting device plugin controller","pos":"device_controller.go:92","timestamp":"2020-05-14T20:20:40.921636Z"}
{"component":"virt-handler","level":"info","msg":"kvm device plugin started","pos":"generic_device.go:157","timestamp":"2020-05-14T20:20:40.933315Z"}
{"component":"virt-handler","level":"info","msg":"tun device plugin started","pos":"generic_device.go:157","timestamp":"2020-05-14T20:20:40.934935Z"}
{"component":"virt-handler","level":"info","msg":"vhost-net device plugin started","pos":"generic_device.go:157","timestamp":"2020-05-14T20:20:40.938356Z"}
{"component":"virt-handler","level":"info","msg":"Processing event default/vmi-masquerade","pos":"vm.go:860","timestamp":"2020-05-14T20:20:41.022274Z"}
{"component":"virt-handler","kind":"VirtualMachineInstance","level":"info","msg":"VMI is in phase: Failed\n","name":"vmi-masquerade","namespace":"default","pos":"vm.go:862","timestamp":"2020-05-14T20:20:41.022463Z","uid":"9e6ff55c-81fc-4284-a015-6ecac35ade0a"}
{"component":"virt-handler","level":"info","msg":"Domain does not exist","pos":"vm.go:869","timestamp":"2020-05-14T20:20:41.022636Z"}
{"component":"virt-handler","level":"info","msg":"Updating cluster config to resource version '353569'","pos":"config-map.go:443","timestamp":"2020-05-14T20:20:41.145806Z"}
{"component":"virt-handler","level":"info","msg":"failed to dial cmd socket: /var/run/kubevirt/sockets/9e6ff55c-81fc-4284-a015-6ecac35ade0a_sock","pos":"client.go:135","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:42.923121Z"}
{"component":"virt-handler","level":"error","msg":"failed to connect to cmd client socket","pos":"prometheus.go:469","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:42.923216Z"}
{"component":"virt-handler","kind":"Domain","level":"warning","msg":"detected expired watchdog for domain","name":"vm-example","namespace":"default","pos":"cache.go:114","timestamp":"2020-05-14T20:20:50.823080Z","uid":""}
{"component":"virt-handler","kind":"Domain","level":"warning","msg":"detected expired watchdog for domain","name":"vmi-masquerade","namespace":"default","pos":"cache.go:114","timestamp":"2020-05-14T20:20:50.823253Z","uid":""}
{"component":"virt-handler","level":"info","msg":"failed to dial cmd socket: /var/run/kubevirt/sockets/9e6ff55c-81fc-4284-a015-6ecac35ade0a_sock","pos":"client.go:135","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:54.508496Z"}
{"component":"virt-handler","level":"error","msg":"failed to connect to cmd client socket","pos":"prometheus.go:469","reason":"context deadline exceeded","timestamp":"2020-05-14T20:20:54.509665Z"}
{"component":"virt-handler","level":"info","msg":"Processing event default/vm-example","pos":"vm.go:860","timestamp":"2020-05-14T20:20:58.961270Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Scheduled\n","name":"vm-example","namespace":"default","pos":"vm.go:862","timestamp":"2020-05-14T20:20:58.961412Z","uid":"92478755-d503-4cd9-aa8f-42b2e99b7251"}
{"component":"virt-handler","level":"info","msg":"Domain does not exist","pos":"vm.go:869","timestamp":"2020-05-14T20:20:58.961480Z"}
{"component":"virt-handler","level":"error","msg":"failed to create nat rules for vm error: failed to load nftable ipv4-nat error ","pos":"podinterface.go:582","timestamp":"2020-05-14T20:20:59.075115Z"}
{"component":"virt-handler","level":"fatal","msg":"failed to prepare pod networking","pos":"podinterface.go:109","reason":"failed to load nftable ipv4-nat error ","timestamp":"2020-05-14T20:20:59.075215Z"}

panic: failed to load nftable ipv4-nat error

goroutine 544 [running, locked to thread]:
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network.(*PodInterface).PlugPhase1(0x2c3e9b8, 0xc001d28c00, 0xc00155c5a0, 0xc001bd1aa0, 0x19b97a3, 0x4, 0x30d0fe, 0xc001614278, 0xc00155c5a0)
	/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network/podinterface.go:110 +0x903
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network.SetupNetworkInterfacesPhase1(0xc001d28c00, 0x30d0fe, 0xc001ae57a0, 0x21)
	/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/network/network.go:111 +0x267
kubevirt.io/kubevirt/pkg/virt-handler.(*VirtualMachineController).setPodNetworkPhase1.func1(0x0, 0x0)
	/go/src/kubevirt.io/kubevirt/pkg/virt-handler/vm.go:277 +0x3a
kubevirt.io/kubevirt/pkg/virt-handler/isolation.(*realIsolationResult).DoNetNS.func1(0x1c82320, 0xc0016121e0, 0x0, 0x0)
	/go/src/kubevirt.io/kubevirt/pkg/virt-handler/isolation/isolation.go:254 +0x26
github.com/containernetworking/plugins/pkg/ns.(*netNS).Do.func1(0x1c82320, 0xc0016121e0, 0x0, 0x0)
	/go/src/kubevirt.io/kubevirt/vendor/github.com/containernetworking/plugins/pkg/ns/ns_linux.go:183 +0x2c1
github.com/containernetworking/plugins/pkg/ns.(*netNS).Do.func2(0xc000fa78b0, 0xc001bdf840, 0x1c82320, 0xc0016121e0, 0xc0016121f0)
	/go/src/kubevirt.io/kubevirt/vendor/github.com/containernetworking/plugins/pkg/ns/ns_linux.go:200 +0x76
created by github.com/containernetworking/plugins/pkg/ns.(*netNS).Do
	/go/src/kubevirt.io/kubevirt/vendor/github.com/containernetworking/plugins/pkg/ns/ns_linux.go:197 +0x224
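For context on the "context deadline exceeded" lines above: virt-handler connects to a per-VMI command socket exposed by virt-launcher (the /var/run/kubevirt/sockets/*_sock paths in the log) and gives up if the launcher never answers within the deadline. A minimal, hypothetical sketch of such a dial (not the actual KubeVirt client code; the path and timeout are placeholders):

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// dialCmdSocket tries to connect to a unix socket within a deadline; on a
// broken node this is where "context deadline exceeded" comes from.
func dialCmdSocket(path string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	var d net.Dialer
	conn, err := d.DialContext(ctx, "unix", path)
	if err != nil {
		return fmt.Errorf("failed to dial cmd socket: %s: %w", path, err)
	}
	defer conn.Close()
	return nil
}

func main() {
	fmt.Println(dialCmdSocket("/var/run/kubevirt/sockets/example_sock", 2*time.Second))
}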
With the latest build, we see:

{"component":"virt-launcher","level":"info","msg":"Collected all requested hook sidecar sockets","pos":"manager.go:68","timestamp":"2020-05-16T01:53:29.347653Z"}
{"component":"virt-launcher","level":"info","msg":"Sorted all collected sidecar sockets per hook point based on their priority and name: map[]","pos":"manager.go:71","timestamp":"2020-05-16T01:53:29.347987Z"}
{"component":"virt-launcher","level":"info","msg":"Watchdog file created at /var/run/kubevirt/watchdog-files/kubevirt-test-default_testvmi7w54k772bfhd5wmp4xh2vxpg25bx8jsdw9r9jqct88x4bmlq","pos":"virt-launcher.go:155","timestamp":"2020-05-16T01:53:29.350645Z"}
{"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon: qemu:///system","pos":"libvirt.go:355","timestamp":"2020-05-16T01:53:29.368118Z"}
{"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon failed: virError(Code=38, Domain=7, Message='Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory')","pos":"libvirt.go:363","timestamp":"2020-05-16T01:53:29.373286Z"}
{"component":"virt-launcher","level":"info","msg":"libvirt version: 6.0.0, package: 17.module+el8.2.0+6257+0d066c28 (Red Hat, Inc. \u003chttp://bugzilla.redhat.com/bugzilla\u003e, 2020-04-08-16:41:56, )","subcomponent":"libvirt","thread":"43","timestamp":"2020-05-16T01:53:29.484000Z"}
{"component":"virt-launcher","level":"info","msg":"hostname: testvmi7w54k772bfhd5wmp4xh2vxpg25bx8jsdw9r9jqct88x4bmlq","subcomponent":"libvirt","thread":"43","timestamp":"2020-05-16T01:53:29.484000Z"}
{"component":"virt-launcher","level":"error","msg":"internal error: Child process (/usr/sbin/dmidecode -q -t 0,1,2,3,4,17) unexpected exit status 1: /dev/mem: No such file or directory","pos":"virCommandWait:2713","subcomponent":"libvirt","thread":"43","timestamp":"2020-05-16T01:53:29.484000Z"}
{"component":"virt-launcher","level":"info","msg":"Connected to libvirt daemon","pos":"libvirt.go:371","timestamp":"2020-05-16T01:53:29.885439Z"}
{"component":"virt-launcher","level":"info","msg":"Registered libvirt event notify callback","pos":"client.go:315","timestamp":"2020-05-16T01:53:29.976887Z"}
{"component":"virt-launcher","level":"info","msg":"Marked as ready","pos":"virt-launcher.go:73","timestamp":"2020-05-16T01:53:29.977635Z"}

panic: timed out waiting for domain to be defined

goroutine 1 [running]:
main.waitForDomainUUID(0x45d964b800, 0xc000254c60, 0xc0000aa480, 0x19ca940, 0xc0000e4870, 0xc000254c60)
	/go/src/kubevirt.io/kubevirt/cmd/virt-launcher/virt-launcher.go:238 +0x3de
main.main()
	/go/src/kubevirt.io/kubevirt/cmd/virt-launcher/virt-launcher.go:449 +0x115f

{"component":"virt-launcher","level":"error","msg":"dirty virt-launcher shutdown","pos":"virt-launcher.go:531","reason":"exit status 2","timestamp":"2020-05-16T01:58:30.015708Z"}
Nelly, can you please attach the virt-handler log?
Created attachment 1689452 [details] virt-handler-qpb9t.log
Created attachment 1689453 [details] virt-handler-n6k5j.log
Created attachment 1689454 [details] virt-handler-h4knk.log
I uploaded logs from all virt-handlers. FYI: all are plain text except virt-handler-n6k5j.log; it was too large, so I had to compress it with tar.gz.
Hello Alona, I confirm that a VM with a bridge-binding NIC starts without issue.
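For reference, the two bindings differ only in the interface stanza of the VMI spec; masquerade goes through the iptables/nftables NAT path that is failing here, while bridge does not. An abbreviated, hypothetical snippet (not one of the attached manifests):

spec:
  domain:
    devices:
      interfaces:
      - name: default
        masquerade: {}   # NAT path (iptables/nftables) - the failing case
        # bridge: {}     # bridge binding - confirmed working above
  networks:
  - name: default
    pod: {}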
I deployed 2.4 via IPI a couple of minutes ago and my VM started successfully. I will leave it to QE to double-check, but it seems the issue may be gone.
It started successfully, and a couple of minutes later it completely crashed. Attaching logs.

panic: timed out waiting for domain to be defined

goroutine 1 [running]:
main.waitForDomainUUID(0x45d964b800, 0xc0003fb5c0, 0xc00197f380, 0x19ca940, 0xc00056c140, 0xc0003fb5c0)
	/go/src/kubevirt.io/kubevirt/cmd/virt-launcher/virt-launcher.go:238 +0x3de
main.main()
	/go/src/kubevirt.io/kubevirt/cmd/virt-launcher/virt-launcher.go:449 +0x115f

{"component":"virt-launcher","level":"error","msg":"dirty virt-launcher shutdown","pos":"virt-launcher.go:531","reason":"exit status 2","timestamp":"2020-05-21T16:58:22.218450Z"}
Created attachment 1690725 [details] virt-handler.log
Created attachment 1690726 [details] virt-launcher.log
Created attachment 1690727 [details] virt-handler.prev.log

It seems to be the same issue as previously. Maybe we don't have the latest U/S version yet.
Root cause has been identified. When the iptables/sysctl responsibility was shifted to virt-handler, the required packages were not included in the virt-handler container image. The missing packages were iptables and procps-ng.
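A quick way to confirm this class of failure on a cluster (illustrative commands; the pod name is a placeholder, and the openshift-cnv namespace and virt-handler container name are assumptions) is to check that the binaries those packages provide are actually present inside the virt-handler container:

$ oc exec -n openshift-cnv <virt-handler-pod> -c virt-handler -- iptables --version
$ oc exec -n openshift-cnv <virt-handler-pod> -c virt-handler -- sysctl --version

If the packages are missing, these commands should fail with an "executable file not found" error.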
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:3194