Bug 1852786
| Summary: | VM fails to start: launcher-sock: connect: connection refused | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Yossi Segev <ysegev> |
| Component: | Networking | Assignee: | Petr Horáček <phoracek> |
| Status: | CLOSED DEFERRED | QA Contact: | Meni Yakove <myakove> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 2.4.0 | CC: | cnv-qe-bugs, ellorent, myakove, phoracek |
| Target Milestone: | --- | | |
| Target Release: | 2.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-07-05 14:11:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | Attachments: | |
Description
Yossi Segev
2020-07-01 10:18:04 UTC
Created attachment 1699465 [details]
bridge-nncp.yaml
Created attachment 1699466 [details]
bridge-nad.yaml
Created attachment 1699467 [details]
dhcp-server-vm.yaml
Relevant logs from the virt-launcher compute container, gathered with:
oc logs --tail=1000 -n yoss-ns -l kubevirt.io=virt-launcher -c compute
{"component":"virt-launcher","level":"info","msg":"Found nameservers in /etc/resolv.conf: \ufffd\u001e\u0000\n","pos":"converter.go:1564","timestamp":"2020-07-01T11:19:21.072418Z"}
{"component":"virt-launcher","level":"info","msg":"Found search domains in /etc/resolv.conf: yoss-ns.svc.cluster.local svc.cluster.local cluster.local openstacklocal","pos":"converter.go:1565","timestamp":"2020-07-01T11:19:21.072457Z"}
{"component":"virt-launcher","level":"info","msg":"Starting SingleClientDHCPServer","pos":"dhcp.go:64","timestamp":"2020-07-01T11:19:21.072563Z"}
{"component":"virt-launcher","level":"info","msg":"/var/run/kubevirt/container-disks/disk_0.img backing file system does not support direct I/O","pos":"converter.go:168","timestamp":"2020-07-01T11:19:21.086636Z"}
{"component":"virt-launcher","level":"info","msg":"Driver cache mode for /var/run/kubevirt-ephemeral-disks/disk-data/containerdisk/disk.qcow2 set to none","pos":"converter.go:187","timestamp":"2020-07-01T11:19:21.086740Z"}
{"component":"virt-launcher","level":"info","msg":"Driver cache mode for /var/run/kubevirt-ephemeral-disks/cloud-init-data/yoss-ns/dhcp-server-vm/noCloud.iso set to none","pos":"converter.go:187","timestamp":"2020-07-01T11:19:21.086774Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Domain defined.","name":"dhcp-server-vm","namespace":"yoss-ns","pos":"manager.go:1204","timestamp":"2020-07-01T11:19:21.440929Z","uid":"ea6c40dd-6a12-4693-af95-0e97669de101"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event 0 with reason 0 received","pos":"client.go:337","timestamp":"2020-07-01T11:19:21.441187Z"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x142a95c]
Stack trace
goroutine 54 [running]:
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap.(*LibvirtDomainManager).buildDevicesMetadata(0xc0019bc070, 0xc001ae0000, 0x19ee4a0, 0xc001b76218, 0xc0000d2eb0, 0xc001b7a490, 0xc001b7a494, 0xc000339310, 0x1541140)
/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/manager.go:1615 +0x27c
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap.(*LibvirtDomainManager).generateCloudInitISO(0xc0019bc070, 0xc001ae0000, 0xc001af0bf0, 0x0, 0x0)
/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/manager.go:888 +0x173
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap.(*LibvirtDomainManager).SyncVMI(0xc0019bc070, 0xc001ae0000, 0xc0019a6400, 0xc000010030, 0x0, 0x0, 0x0)
/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/manager.go:1230 +0x1116
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/cmd-server.(*Launcher).SyncVirtualMachine(0xc0019480a0, 0x19c2940, 0xc000286270, 0xc000392520, 0xc0019480a0, 0xc000286270, 0xc00197db30)
/go/src/kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/cmd-server/server.go:160 +0x9b
kubevirt.io/kubevirt/pkg/handler-launcher-com/cmd/v1._Cmd_SyncVirtualMachine_Handler(0x16c13a0, 0xc0019480a0, 0x19c2940, 0xc000286270, 0xc0001007e0, 0x0, 0x19c2940, 0xc000286270, 0xc000162000, 0xea6)
/go/src/kubevirt.io/kubevirt/pkg/handler-launcher-com/cmd/v1/cmd.pb.go:556 +0x21a
kubevirt.io/kubevirt/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc0002df200, 0x19ddbe0, 0xc00012d200, 0xc001adc000, 0xc0002bc4e0, 0x27978a0, 0x0, 0x0, 0x0)
/go/src/kubevirt.io/kubevirt/vendor/google.golang.org/grpc/server.go:1024 +0x4f4
kubevirt.io/kubevirt/vendor/google.golang.org/grpc.(*Server).handleStream(0xc0002df200, 0x19ddbe0, 0xc00012d200, 0xc001adc000, 0x0)
/go/src/kubevirt.io/kubevirt/vendor/google.golang.org/grpc/server.go:1313 +0xd97
kubevirt.io/kubevirt/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc00192cc30, 0xc0002df200, 0x19ddbe0, 0xc00012d200, 0xc001adc000)
/go/src/kubevirt.io/kubevirt/vendor/google.golang.org/grpc/server.go:722 +0xbb
created by kubevirt.io/kubevirt/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
/go/src/kubevirt.io/kubevirt/vendor/google.golang.org/grpc/server.go:720 +0xa1
{"component":"virt-launcher","level":"error","msg":"dirty virt-launcher shutdown","pos":"virt-launcher.go:510","reason":"exit status 2","timestamp":"2020-07-01T11:19:21.450017Z"}
I also tested a similar scenario, except that this time I configured the bridge over one of the node's physical interfaces rather than over a VLAN interface.
This time the VMI started successfully, with the secondary NIC seen when entering the VM console.
[fedora@multus-vm ~]$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc fq_codel state UP group default qlen 1000
link/ether 02:7f:3d:00:01:d4 brd ff:ff:ff:ff:ff:ff
altname enp1s0
inet 10.0.2.2/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0
valid_lft 86312782sec preferred_lft 86312782sec
inet6 fe80::7f:3dff:fe00:1d4/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 02:7f:3d:00:01:d5 brd ff:ff:ff:ff:ff:ff
altname enp2s0
inet 10.200.1.1/24 brd 10.200.1.255 scope global noprefixroute eth1
valid_lft forever preferred_lft forever
inet6 fe80::7f:3dff:fe00:1d5/64 scope link
valid_lft forever preferred_lft forever
Quique found what needed to be changed for this to work properly: the network name of the secondary interface in the VM spec could not be consumed. The original network name was "eno4.1001-br". When we changed it to "blah" (and in the corresponding interface entry, of course), the VMI managed to start successfully. So the issue now is that when there is a problem with the naming, it must be surfaced to the user as an error/event in the VMI description. Quique - do we know which component fails to consume this name? Is it the virt-launcher pod, the Multus CNI, or some other entity?

I have managed to pinpoint the problem in the network name - it's the dot. When I replace the dot with a dash, i.e. "eno4-1001-br" instead of "eno4.1001-br", the VMI starts running successfully. Hilarious...

Another point worth emphasizing, although it seems quite obvious now: this has nothing to do with the VM being intended to serve as a DHCP server, nor with the bridge's port being a VLAN interface. I verified that by testing the same setup I used in comment #6, where the bridge is configured over a physical node interface (and not over a VLAN interface). The VMI failed to start when the network name was "eno4.1001-br", and ran successfully with "eno4-1001-br".

One more finding: this issue also occurs when the default network name includes a dot (for example, "d.efault"). So I assume the problem is not in Multus, but elsewhere (Kubernetes, OpenShift, CNV...?).

Closing this one; I opened a more general and relevant ticket for this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1853911
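A plausible explanation (an assumption, not confirmed by this bug) is that the network name ends up somewhere that is validated or consumed as an RFC 1123 DNS label, which permits lowercase alphanumerics and dashes but not dots; DNS-1123 subdomains, by contrast, do allow dots, which would explain why the name passed initial validation but broke later. The sketch below illustrates the label rule with the names from this report; `is_valid_dns1123_label` is a hypothetical helper, not a KubeVirt or Kubernetes API:

```python
import re

# RFC 1123 DNS label: lowercase alphanumerics and dashes only, must start
# and end with an alphanumeric, at most 63 characters. Note: no dots.
DNS1123_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")

def is_valid_dns1123_label(name: str) -> bool:
    """Return True if name is a valid RFC 1123 DNS label."""
    return len(name) <= 63 and bool(DNS1123_LABEL.match(name))

# The names from this bug report:
print(is_valid_dns1123_label("eno4.1001-br"))  # False - the dot is rejected
print(is_valid_dns1123_label("eno4-1001-br"))  # True - dash instead of dot
print(is_valid_dns1123_label("d.efault"))      # False - same failure mode
```

This matches the observed behavior exactly: every failing name contains a dot, and replacing the dot with a dash makes the same VMI start successfully.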