Created attachment 1729053
NetworkManager.log

Description of problem:

Baremetal IPI deployment with IPv6 control plane fails when the nodes obtain both SLAAC and DHCPv6 addresses, because they set their hostname to 'localhost'.

Version-Release number of selected component (if applicable):

4.6.3

How reproducible:

100%

Steps to Reproduce:

1. Deploy baremetal setup via IPI flow with IPv6 control plane

2. Make sure that the control plane NICs obtain both SLAAC and DHCPv6 addresses, e.g.:

ip a s dev br-ex
15: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 48:df:37:c7:75:d8 brd ff:ff:ff:ff:ff:ff
    inet6 2620:52:0:2e39::20/128 scope global tentative dynamic noprefixroute
       valid_lft 3600sec preferred_lft 3600sec
    inet6 2620:52:0:2e39:2506:eb60:1bb9:8bb3/64 scope global tentative dynamic noprefixroute
       valid_lft 86400sec preferred_lft 14400sec
    inet6 fe80::1981:fdfa:87d7:7643/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

2620:52:0:2e39::20 is provided via DHCPv6:

dnsmasq-dhcp[1283315]: 10225194 DHCPSOLICIT(baremetal) 00:03:00:01:48:df:37:c7:75:d8
dnsmasq-dhcp[1283315]: 10225194 DHCPREPLY(baremetal) 2620:52:0:2e39::20 00:03:00:01:48:df:37:c7:75:d8 openshift-master-0
dnsmasq-dhcp[1283315]: 10225194 requested options: 23:dns-server, 24:domain-search, 56:ntp-server,
dnsmasq-dhcp[1283315]: 10225194 requested options: 31:sntp-server
dnsmasq-dhcp[1283315]: 10225194 tags: known, dhcpv6, baremetal
dnsmasq-dhcp[1283315]: 10225194 sent size: 10 option: 1 client-id 00:03:00:01:48:df:37:c7:75:d8
dnsmasq-dhcp[1283315]: 10225194 sent size: 14 option: 2 server-id 00:01:00:01:27:3e:c2:7a:94:40:c9:f8:24:2a
dnsmasq-dhcp[1283315]: 10225194 sent size: 0 option: 14 rapid-commit
dnsmasq-dhcp[1283315]: 10225194 sent size: 40 option: 3 ia-na IAID=935818712 T1=1800 T2=3150
dnsmasq-dhcp[1283315]: 10225194 nest size: 24 option: 5 iaaddr 2620:52:0:2e39::20 PL=3600 VL=3600
dnsmasq-dhcp[1283315]: 10225194 sent size: 9 option: 13 status 0 success
dnsmasq-dhcp[1283315]: 10225194 sent size: 1 option: 7 preference 0
dnsmasq-dhcp[1283315]: 10225194 sent size: 16 option: 23 dns-server 2620:52:0:aa0::dead:beef
dnsmasq-dhcp[1283315]: 10225194 sent size: 55 option: 39 FQDN openshift-master-0.ocp-edge1.lab.eng.tlv2.redhat.com

2620:52:0:2e39:2506:eb60:1bb9:8bb3 is the SLAAC address. In this environment RAs were provided by radvd with AdvAutonomous on; below is the radvd.conf:

interface baremetal
{
    AdvManagedFlag on;
    AdvOtherConfigFlag on;
    AdvSendAdvert on;
    MinRtrAdvInterval 30;
    MaxRtrAdvInterval 100;
    AdvDefaultLifetime 100;
    prefix 2620:52:0:2e39::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
        AdvRouterAddr off;
    };
    route ::/0 {
        AdvRoutePreference medium;
        RemoveRoute off;
    };
};

3. SSH to one of the nodes and check the hostname:

Actual results:

[root@localhost core]# hostname -f
localhost

Expected results:

The hostname is set according to the option provided by the DHCPv6 server, e.g.:

option: 39 FQDN openshift-master-0.ocp-edge1.lab.eng.tlv2.redhat.com

Additional info:

Attaching NetworkManager log from one of the master nodes.
Is there a DNS AAAA record for 2620:52:0:2e39::20 from which to obtain a name? Is there a DHCPv6 provided hostname?
(In reply to Antoni Segura Puimedon from comment #1)
> Is there a DNS AAAA record for 2620:52:0:2e39::20 from which to obtain a
> name? Is there a DHCPv6 provided hostname?

Yes, there's a PTR record:

dig -x 2620:52:0:2e39::20 +short
openshift-master-0.ocp-edge1.lab.eng.tlv2.redhat.com.

DHCPv6 also provides the hostname, pasting below the dnsmasq config:

cat /etc/dnsmasq.d/baremetal.conf
strict-order
local=/ocp-edge1.lab.eng.tlv2.redhat.com/
domain=ocp-edge1.lab.eng.tlv2.redhat.com
expand-hosts
pid-file=/var/run/dnsmasq.pid
except-interface=lo
bind-dynamic
interface=baremetal
dhcp-option=option6:dns-server,[2620:52:0:aa0::dead:beef]
dhcp-range=2620:52:0:2e39::d1,2620:52:0:2e39::f4,64
dhcp-lease-max=81
dhcp-hostsfile=/var/lib/dnsmasq/baremetal.hostsfile
log-dhcp

cat /var/lib/dnsmasq/baremetal.hostsfile
id:00:03:00:01:48:df:37:c7:75:d8,openshift-master-0.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::20]
id:00:03:00:01:48:df:37:c7:76:48,openshift-master-1.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::21]
id:00:03:00:01:48:df:37:c7:76:18,openshift-master-2.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::22]
id:00:03:00:01:48:df:37:c6:39:f8,openshift-worker-0.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::23]
id:00:03:00:01:48:df:37:c7:76:b8,openshift-worker-1.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::24]
id:00:03:00:01:BC:97:E1:69:DA:81,openshift-worker-2.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::25]
id:00:03:00:01:BC:97:E1:29:9C:81,openshift-worker-3.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::26]

RAs providing the SLAAC address were originating from radvd:

interface baremetal
{
    AdvManagedFlag on;
    AdvOtherConfigFlag on;
    AdvSendAdvert on;
    MinRtrAdvInterval 30;
    MaxRtrAdvInterval 100;
    AdvDefaultLifetime 100;
    prefix 2620:52:0:2e39::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
        AdvRouterAddr off;
    };
    route ::/0 {
        AdvRoutePreference medium;
        RemoveRoute off;
    };
};
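For anyone cross-checking the hostsfile against the dnsmasq log, here is a minimal Python sketch that parses entries of the form shown above and looks one up by the client-id from the log. The function names (parse_hostsfile_line, duid_ll_mac) are made up for illustration and are not part of dnsmasq. It only handles the three-field "id:<DUID>,<hostname>,[<IPv6>]" shape used in this file, not the full dhcp-hostsfile syntax. The MAC extraction assumes the DUID is a DUID-LL for Ethernet (prefix 00:03:00:01), which is the case for every entry here.

```python
import ipaddress


def parse_hostsfile_line(line):
    """Split one 'id:<DUID>,<hostname>,[<IPv6>]' entry into its parts.

    Assumption: exactly three comma-separated fields, as in the file above.
    """
    duid_field, hostname, addr_field = line.strip().split(",")
    if not duid_field.startswith("id:"):
        raise ValueError(f"expected 'id:' prefix: {line!r}")
    # Lowercase the DUID so entries written in upper case still match the log.
    duid = duid_field[len("id:"):].lower()
    address = ipaddress.IPv6Address(addr_field.strip("[]"))
    return duid, hostname, address


def duid_ll_mac(duid):
    """Return the MAC embedded in a DUID-LL (type 3, hardware type 1), else None."""
    octets = duid.lower().split(":")
    if octets[:4] != ["00", "03", "00", "01"]:
        return None
    return ":".join(octets[4:])


# Two entries copied from the hostsfile in this comment.
SAMPLE = """\
id:00:03:00:01:48:df:37:c7:75:d8,openshift-master-0.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::20]
id:00:03:00:01:BC:97:E1:69:DA:81,openshift-worker-2.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::25]
"""

entries = [parse_hostsfile_line(l) for l in SAMPLE.splitlines() if l.strip()]
by_duid = {duid: (host, addr) for duid, host, addr in entries}

# client-id taken from the DHCPSOLICIT line in the description.
client_id = "00:03:00:01:48:df:37:c7:75:d8"
host, addr = by_duid[client_id]
print(host, addr, duid_ll_mac(client_id))
```

The MAC recovered from the client-id matches the link/ether address of br-ex in the description, so the reservation for openshift-master-0 is the one that was hit.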
I think the SLAAC aspect is a red herring. I am able to deploy with both DHCPv6 and SLAAC addresses locally.

Looking through the logs I see a couple of issues. First, the hostname is not in the dhcp6 options reported by NM. I thought that was fixed for 4.6, but I could be mistaken. Maybe it wasn't fixed yet in 4.6.3? It should definitely be fixed in 4.7.

Second, the resolv-prepender script is failing. Possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=1905233 since the subnets overlap? I think we'd need to see the output from the resolv-prepender in order to figure out what's going on there.

I should note that I've done several deployments with SLAAC addresses in my dev environment and haven't been able to reproduce this problem. Even using overlapping subnets has not been an issue. Tomorrow I might try deploying 4.6.3 specifically just to see if this is something that was accidentally fixed since then.
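To make the overlap concrete, here is a small Python sketch using the standard ipaddress module with the addresses from the description. It only shows that the DHCPv6 /128 sits inside the /64 advertised by radvd. Whether that is the kind of overlap the linked bug is about is an assumption on my part.

```python
import ipaddress

# Prefix advertised by radvd and the two global addresses seen on br-ex.
slaac_prefix = ipaddress.IPv6Network("2620:52:0:2e39::/64")
dhcp_addr = ipaddress.IPv6Interface("2620:52:0:2e39::20/128")
slaac_addr = ipaddress.IPv6Interface("2620:52:0:2e39:2506:eb60:1bb9:8bb3/64")

# The DHCPv6 lease is a single host address inside the SLAAC prefix.
dhcp_in_prefix = dhcp_addr.ip in slaac_prefix

# Seen as networks, the /128 and the /64 overlap.
networks_overlap = dhcp_addr.network.overlaps(slaac_addr.network)

print(dhcp_in_prefix, networks_overlap)
```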
I was able to reproduce this bug in 4.6.3 but not in 4.6.9, so I believe this was fixed since the bug was opened.
Verified on OCP 4.7.0-fc.1

[core@master-0-0 ~]$ hostname
master-0-0.ocp-edge-cluster-0.qe.lab.redhat.com
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633