Description of problem:

The mdns-publisher worker pods are crashing and restarting due to panics in the Go code.

~~~
# oc get pod mdns-publisher-name-xn2w4-worker-0-tkpmc -o jsonpath='{.status}' | jq .
[...]
  "containerStatuses": [
    {
      "containerID": "cri-o://d58725c656d0cf9f8c83dfb54eb03b8b06513461ccc6c73a0392d64375897b25",
      "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:582087fd639adc5ed9064c32ff3891babad6d339f8046fd217a69e2f5caef9dc",
      "imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:582087fd639adc5ed9064c32ff3891babad6d339f8046fd217a69e2f5caef9dc",
      "lastState": {
        "terminated": {
          "containerID": "cri-o://edf27522aea137b84a3671401f51457ea8d222034d80b0603925a1408b3c86b0",
          "exitCode": 2,
          "finishedAt": "2021-07-05T00:06:41Z",
          "message": " 0x10, 0x2, 0x5a2, 0xc000024a1c, 0x4, 0xc0001b8610, 0x6, 0x1470, ...)\n\t/go/src/github.com/openshift/mdns-publisher/pkg/publisher/publisher.go:95 +0x45\ngithub.com/openshift/mdns-publisher/pkg/publisher.IfaceCheck(0xc0000249c0, 0x10, 0x10, 0x2, 0x5a2, 0xc000024a1c, 0x4, 0xc0001b8610, 0x6, 0x1470, ...)\n\t/go/src/github.com/openshift/mdns-publisher/pkg/publisher/publisher.go:101 +0xa5\ncreated by github.com/openshift/mdns-publisher/cmd.glob..func1\n\t/go/src/github.com/openshift/mdns-publisher/cmd/publish.go:93 +0xc6a\n\ngoroutine 22 [IO wait]:\ninternal/poll.runtime_pollWait(0x7efe9e7c4e90, 0x72, 0xc0004803c0)\n\t/usr/lib/golang/src/runtime/netpoll.go:222 +0x55\ninternal/poll.(*pollDesc).wait(0xc000198818, 0x72, 0x0, 0x0, 0x0)\n\t/usr/lib/golang/src/internal/poll/fd_poll_runtime.go:87 +0x45\ninternal/poll.(*pollDesc).waitRead(...)\n\t/usr/lib/golang/src/internal/poll/fd_poll_runtime.go:92\ninternal/poll.(*FD).RawRead(0xc000198800, 0xc00017b4a0, 0x0, 0x0)\n\t/usr/lib/golang/src/internal/poll/fd_unix.go:533 +0xfc\nnet.(*rawConn).Read(0xc000012150, 0xc00017b4a0, 0x1, 0x1)\n\t/usr/lib/golang/src/net/rawconn.go:43 +0x68\ngolang.org/x/net/internal/socket.(*Conn).recvMsg(0xc000010e20, 0xc000208ee0, 0x0, 0x28, 0xc000208f38)\n\t/go/src/github.com/openshift/mdns-publisher/vendor/golang.org/x/net/internal/socket/rawconn_msg.go:31 +0x20e\ngolang.org/x/net/internal/socket.(*Conn).RecvMsg(...)\n\t/go/src/github.com/openshift/mdns-publisher/vendor/golang.org/x/net/internal/socket/socket.go:255\ngolang.org/x/net/ipv6.(*payloadHandler).ReadFrom(0xc0000a2e70, 0xc00015a000, 0x10000, 0x10000, 0x2, 0x94f740, 0xc00017b440, 0x0, 0x0, 0x0)\n\t/go/src/github.com/openshift/mdns-publisher/vendor/golang.org/x/net/ipv6/payload_cmsg.go:31 +0x1df\ngithub.com/celebdor/zeroconf.(*Server).recv6(0xc000099260, 0xc0000a2e60)\n\t/go/src/github.com/openshift/mdns-publisher/vendor/github.com/celebdor/zeroconf/server.go:302 +0x109\ncreated by github.com/celebdor/zeroconf.(*Server).mainloop\n\t/go/src/github.com/openshift/mdns-publisher/vendor/github.com/celebdor/zeroconf/server.go:211 +0x65\n",
          "reason": "Error",
          "startedAt": "2021-07-04T17:28:19Z"
        }
[...]

# oc logs mdns-publisher-name-xn2w4-worker-0-tkpmc -p
time="2021-07-04T17:28:19Z" level=info msg="Publishing with settings" collision_avoidance=hostname ip=x.x.x.x
time="2021-07-04T17:28:19Z" level=info msg="Binding interface" name=ens3
time="2021-07-04T17:28:19Z" level=debug msg="Changing service name" new="name Workstation-name-xn2w4-worker-0-tkpmc" original="name Workstation"
time="2021-07-04T17:28:19Z" level=info msg="Publishing service" domain=local. hostname=name-xn2w4-worker-0-tkpmc.local. name="name Workstation-name-xn2w4-worker-0-tkpmc" port=42424 ttl=3200 type=_workstation._tcp
time="2021-07-04T17:28:19Z" level=info msg="Zeroconf registering service" name="name Workstation-name-xn2w4-worker-0-tkpmc"
time="2021-07-04T17:28:19Z" level=info msg="Zeroconf setting service ttl" name="name Workstation-name-xn2w4-worker-0-tkpmc" ttl=3200

fatal error: concurrent map read and map write

goroutine 21 [running]:
runtime.throw(0x8d6800, 0x21)
        /usr/lib/golang/src/runtime/panic.go:1116 +0x72 fp=0xc000202d18 sp=0xc000202ce8 pc=0x43b952
runtime.mapaccess2_fast64(0x85d0a0, 0xc0001c2660, 0x2, 0xc00001000c, 0xc000202de0)
        /usr/lib/golang/src/runtime/map_fast64.go:61 +0x1ac fp=0xc000202d40 sp=0xc000202d18 pc=0x41832c
github.com/celebdor/zeroconf.(*Server).handleQuery(0xc000099260, 0xc000202eb0, 0x2, 0x94f740, 0xc0004269c0, 0x0, 0x0)
        /go/src/github.com/openshift/mdns-publisher/vendor/github.com/celebdor/zeroconf/server.go:366 +0x2f3 fp=0xc000202e78 sp=0xc000202d40 pc=0x693cd3
github.com/celebdor/zeroconf.(*Server).parsePacket(0xc000099260, 0xc00028e000, 0x29, 0x10000, 0x2, 0x94f740, 0xc0004269c0, 0xc0004269c0, 0x0)
        /go/src/github.com/openshift/mdns-publisher/vendor/github.com/celebdor/zeroconf/server.go:323 +0xf9 fp=0xc000202f48 sp=0xc000202e78 pc=0x693999
github.com/celebdor/zeroconf.(*Server).recv4(0xc000099260, 0xc0000a2e10)
        /go/src/github.com/openshift/mdns-publisher/vendor/github.com/celebdor/zeroconf/server.go:281 +0x16f fp=0xc000202fd0 sp=0xc000202f48 pc=0x69364f
runtime.goexit()
        /usr/lib/golang/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc000202fd8 sp=0xc000202fd0 pc=0x470401
created by github.com/celebdor/zeroconf.(*Server).mainloop
        /go/src/github.com/openshift/mdns-publisher/vendor/github.com/celebdor/zeroconf/server.go:208 +0x89

goroutine 1 [select, 396 minutes]:
github.com/openshift/mdns-publisher/cmd.glob..func1(0xbbeaa0, 0xc00007b280, 0x0, 0x1)
        /go/src/github.com/openshift/mdns-publisher/cmd/publish.go:95 +0xcfa
github.com/spf13/cobra.(*Command).execute(0xbbeaa0, 0xc000010090, 0x1, 0x1, 0xbbeaa0, 0xc000010090)
        /go/src/github.com/openshift/mdns-publisher/vendor/github.com/spf13/cobra/command.go:830 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xbbeaa0, 0x44b24a, 0xb84160, 0xc000000180)
        /go/src/github.com/openshift/mdns-publisher/vendor/github.com/spf13/cobra/command.go:914 +0x30b
github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/mdns-publisher/vendor/github.com/spf13/cobra/command.go:864
github.com/openshift/mdns-publisher/cmd.Execute()
        /go/src/github.com/openshift/mdns-publisher/cmd/publish.go:139 +0x31
main.main()
        /go/src/github.com/openshift/mdns-publisher/main.go:8 +0x25

goroutine 18 [syscall, 396 minutes]:
os/signal.signal_recv(0x0)
        /usr/lib/golang/src/runtime/sigqueue.go:147 +0x9d
os/signal.loop()
        /usr/lib/golang/src/os/signal/signal_unix.go:23 +0x25
created by os/signal.Notify.func1.1
        /usr/lib/golang/src/os/signal/signal.go:150 +0x45

goroutine 19 [chan receive, 396 minutes]:
github.com/openshift/mdns-publisher/pkg/publisher.Publish(0xc0000249c0, 0x10, 0x10, 0x2, 0x5a2, 0xc000024a1c, 0x4, 0xc0001b8610, 0x6, 0x1470, ...)
        /go/src/github.com/openshift/mdns-publisher/pkg/publisher/publisher.go:41 +0x6ac
created by github.com/openshift/mdns-publisher/cmd.glob..func1
        /go/src/github.com/openshift/mdns-publisher/cmd/publish.go:90 +0xba5
goroutine 20 [sleep]:
time.Sleep(0x12a05f200)
        /usr/lib/golang/src/runtime/time.go:188 +0xbf
github.com/openshift/mdns-publisher/pkg/publisher.ifaceCheck(0xc0000249c0, 0x10, 0x10, 0x2, 0x5a2, 0xc000024a1c, 0x4, 0xc0001b8610, 0x6, 0x1470, ...)
        /go/src/github.com/openshift/mdns-publisher/pkg/publisher/publisher.go:95 +0x45
github.com/openshift/mdns-publisher/pkg/publisher.IfaceCheck(0xc0000249c0, 0x10, 0x10, 0x2, 0x5a2, 0xc000024a1c, 0x4, 0xc0001b8610, 0x6, 0x1470, ...)
        /go/src/github.com/openshift/mdns-publisher/pkg/publisher/publisher.go:101 +0xa5
created by github.com/openshift/mdns-publisher/cmd.glob..func1
        /go/src/github.com/openshift/mdns-publisher/cmd/publish.go:93 +0xc6a

goroutine 22 [IO wait]:
internal/poll.runtime_pollWait(0x7efe9e7c4e90, 0x72, 0xc0004803c0)
        /usr/lib/golang/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc000198818, 0x72, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
        /usr/lib/golang/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).RawRead(0xc000198800, 0xc00017b4a0, 0x0, 0x0)
        /usr/lib/golang/src/internal/poll/fd_unix.go:533 +0xfc
net.(*rawConn).Read(0xc000012150, 0xc00017b4a0, 0x1, 0x1)
        /usr/lib/golang/src/net/rawconn.go:43 +0x68
golang.org/x/net/internal/socket.(*Conn).recvMsg(0xc000010e20, 0xc000208ee0, 0x0, 0x28, 0xc000208f38)
        /go/src/github.com/openshift/mdns-publisher/vendor/golang.org/x/net/internal/socket/rawconn_msg.go:31 +0x20e
golang.org/x/net/internal/socket.(*Conn).RecvMsg(...)
        /go/src/github.com/openshift/mdns-publisher/vendor/golang.org/x/net/internal/socket/socket.go:255
golang.org/x/net/ipv6.(*payloadHandler).ReadFrom(0xc0000a2e70, 0xc00015a000, 0x10000, 0x10000, 0x2, 0x94f740, 0xc00017b440, 0x0, 0x0, 0x0)
        /go/src/github.com/openshift/mdns-publisher/vendor/golang.org/x/net/ipv6/payload_cmsg.go:31 +0x1df
github.com/celebdor/zeroconf.(*Server).recv6(0xc000099260, 0xc0000a2e60)
        /go/src/github.com/openshift/mdns-publisher/vendor/github.com/celebdor/zeroconf/server.go:302 +0x109
created by github.com/celebdor/zeroconf.(*Server).mainloop
        /go/src/github.com/openshift/mdns-publisher/vendor/github.com/celebdor/zeroconf/server.go:211 +0x65
~~~

There is no dual stack in use:

~~~
# oc describe network.config/cluster
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         Network
Metadata:
  Creation Timestamp:  2021-06-21T11:23:30Z
  Generation:          2
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:clusterNetwork:
        f:externalIP:
          .:
          f:policy:
        f:networkType:
        f:serviceNetwork:
      f:status:
    Manager:      cluster-bootstrap
    Operation:    Update
    Time:         2021-06-21T11:23:30Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:clusterNetwork:
        f:networkType:
        f:serviceNetwork:
    Manager:         cluster-network-operator
    Operation:       Update
    Time:            2021-06-21T11:32:03Z
  Resource Version:  3631
  Self Link:         /apis/config.openshift.io/v1/networks/cluster
  UID:               eb3a9750-6040-4be4-8ac4-ef9b4e325eea
Spec:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  External IP:
    Policy:
  Network Type:  Kuryr
  Service Network:
    172.30.0.0/16
Status:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  Network Type:  Kuryr
  Service Network:
    172.30.0.0/16
Events:  <none>
~~~
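For context, this failure mode is not a recoverable panic: the Go runtime deliberately aborts the whole process when it detects an unsynchronized map read racing with a map write, which matches the exit code 2 and the restarts above. The following standalone program is illustrative only (it is not mdns-publisher code); run with two or more CPUs it typically dies with the same "fatal error: concurrent map read and map write". The runtime's detection is best-effort, so it can take a moment to trigger:

~~~
package main

// One goroutine writes a plain Go map while the main goroutine reads it,
// with no synchronization. The runtime usually detects the race and aborts
// with "fatal error: concurrent map read and map write".
func main() {
	m := map[uint64]uint32{}

	go func() {
		for i := uint64(0); ; i++ {
			m[i%10] = uint32(i) // unsynchronized write
		}
	}()

	for {
		_ = m[3] // unsynchronized read, racing with the writer
	}
}
~~~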
Version-Release number of selected component (if applicable):

How reproducible:
This was observed in SoS with Kuryr enabled, but also with the IPI installer in non-RHOSP environments.

Steps to Reproduce:

Actual results:
The mdns-publisher worker pods get restarted often.

Expected results:
- mdns-publisher worker pods do not restart often.

Additional info:
I found these issues related to it:
* https://github.com/celebdor/zeroconf/pull/3
* https://github.com/dcbw/zeroconf/commit/cf83d55efa2450344cb81a395c7ba439a001f6ca
It turns out this was already fixed by https://github.com/dcbw/zeroconf/commit/cf83d55efa2450344cb81a395c7ba439a001f6ca, but we need to re-vendor zeroconf in mdns-publisher for the fix to take effect.
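For illustration, the usual fix for this class of bug is to guard the shared map with a sync.RWMutex: the trace shows handleQuery (called from the recv4/recv6 goroutines) reading a map that another goroutine writes when the service TTL is set. The sketch below is hypothetical; Server, ttls, SetTTL, and handleQuery are stand-in names, not the actual zeroconf types, and the linked commit is the authoritative change:

~~~
package zeroconf

import "sync"

// Server stands in for zeroconf.Server: one goroutine per transport
// (recv4/recv6) handles incoming mDNS queries while the publishing
// goroutine can still mutate shared state such as a TTL table.
type Server struct {
	mu   sync.RWMutex      // guards ttls
	ttls map[uint64]uint32 // read in handleQuery, written by SetTTL
}

// SetTTL is called from the publishing goroutine ("Zeroconf setting
// service ttl" in the logs). Without the lock, this write can race with
// the reads below and crash the process.
func (s *Server) SetTTL(key uint64, ttl uint32) {
	s.mu.Lock()
	s.ttls[key] = ttl
	s.mu.Unlock()
}

// handleQuery runs on the recv4/recv6 goroutines for every packet.
func (s *Server) handleQuery(key uint64) (uint32, bool) {
	s.mu.RLock()
	ttl, ok := s.ttls[key]
	s.mu.RUnlock()
	return ttl, ok
}
~~~

A sync.Map or a single owning goroutine fed by channels would also work; the mutex variant is the smallest change to existing code.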
Can't really verify on this build, since the mDNS pods were removed from 4.8; will do the real verification on the 4.7 backport.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759
Automation is not relevant for this version, only for 4.7 and 4.6.