Description of problem:
I've set up a cluster with IPv6 and installed the cluster-nfd-operator. All of the pods come up OK, but the nfd-worker can't connect and continually restarts. It looks like the IPv6 address is not enclosed in '[]' brackets, so the port is treated as part of the IPv6 address:

"address fd02::77ec:12000: too many colons in address"

[stack@openshift-master-0 cluster-nfd-operator]$ oc get pod -A | grep nfd
openshift-nfd   nfd-master-55mhq                1/1   Running   0    178m
openshift-nfd   nfd-master-96pqt                1/1   Running   0    178m
openshift-nfd   nfd-master-w8fwn                1/1   Running   0    178m
openshift-nfd   nfd-operator-58cdbbb559-9gcgf   1/1   Running   0    178m
openshift-nfd   nfd-worker-5qbg2                1/1   Running   33   178m
openshift-nfd   nfd-worker-jkjwz                1/1   Running   33   178m

This is on the worker node, where the nfd-worker has exited:

[core@worker-0 ~]$ sudo crictl ps -a | grep nfd-worker
2ef3ac57c9d8b   virthost.ostest.test.metalkube.org:5000/localimages/origin-node-feature-discovery@sha256:75929c498301af285a8dcca4b17a45d5b53062c28b3a672a07b791be371757a1   4 minutes ago   Exited   nfd-worker   32   8b4166c54dbae

[core@worker-0 ~]$ sudo crictl logs 2ef3ac57c9d8b
2021/01/07 18:59:08 Node Feature Discovery Worker 1.15
2021/01/07 18:59:08 NodeName: 'worker-0.ostest.test.metalkube.org'
INFO: 2021/01/07 18:59:08 parsed scheme: ""
INFO: 2021/01/07 18:59:08 scheme "" not registered, fallback to default scheme
INFO: 2021/01/07 18:59:08 ccResolverWrapper: sending update to cc: {[{fd02::77ec:12000 <nil> 0 <nil>}] <nil> <nil>}
INFO: 2021/01/07 18:59:08 ClientConn switching balancer to "pick_first"
WARNING: 2021/01/07 18:59:08 grpc: addrConn.createTransport failed to connect to {fd02::77ec:12000 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: address fd02::77ec:12000: too many colons in address". Reconnecting...
WARNING: 2021/01/07 18:59:09 grpc: addrConn.createTransport failed to connect to {fd02::77ec:12000 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: address fd02::77ec:12000: too many colons in address". Reconnecting...
WARNING: 2021/01/07 18:59:10 grpc: addrConn.createTransport failed to connect to {fd02::77ec:12000 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: address fd02::77ec:12000: too many colons in address". Reconnecting...
WARNING: 2021/01/07 18:59:13 grpc: addrConn.createTransport failed to connect to {fd02::77ec:12000 <nil> 0 <nil>}. Err :connection error: desc = "transport:
...
2021/01/07 19:00:08 ERROR: failed to connect: context deadline exceeded

Version-Release number of selected component (if applicable):
Using these images:
quay.io/openshift-psap/cluster-nfd-operator       BZ1906129   fb5bfca77c05   3 weeks ago   292 MB
quay.io/openshift/origin-node-feature-discovery   4.7         211b098df529   3 weeks ago   256 MB

How reproducible:
Happens every time.

Steps to Reproduce:
1. Download the images listed above using podman pull.
2. Use a local image store so that the master can access the images locally instead of attempting to reach quay at an IPv4 address:
   $ sudo podman push --tls-verify=false --authfile /opt/dev-scripts/pull_secret.json fb5bfca77c05 virthost.ostest.test.metalkube.org:5000/localimages/origin-cluster-nfd-operator:master
   $ sudo podman push --tls-verify=false --authfile /opt/dev-scripts/pull_secret.json 211b098df529 virthost.ostest.test.metalkube.org:5000/localimages/origin-node-feature-discovery:4.7
3. Update the Makefile and manifests/0700_cr.yaml to use the local images.
4. In cluster-nfd-operator, run 'make deploy'.

Actual results:
nfd-worker continuously restarts with connection errors.

Expected results:
Can view labels updated by nfd-worker.

Additional info:
I think the issue is here:
https://github.com/openshift/cluster-nfd-operator/blob/master/assets/worker/0700_worker_daemonset.yaml#L44

If NFD_MASTER_SERVICE_HOST is an IPv6 address, it needs to be enclosed in brackets to separate it from the ":$(NFD_MASTER_SERVICE_PORT)" suffix.
*** This bug has been marked as a duplicate of bug 1823765 ***