Bug 1897516 - Baremetal IPI deployment with IPv6 control plane fails when the nodes obtain both SLAAC and DHCPv6 addresses as they set their hostname to 'localhost'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Antoni Segura Puimedon
QA Contact: Victor Voronkov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-13 10:08 UTC by Marius Cornea
Modified: 2021-02-24 15:33 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:33:26 UTC
Target Upstream Version:
Embargoed:


Attachments
NetworkManager.log (48.78 KB, text/plain)
2020-11-13 10:08 UTC, Marius Cornea


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:33:49 UTC

Description Marius Cornea 2020-11-13 10:08:33 UTC
Created attachment 1729053 [details]
NetworkManager.log

Description of problem:

A baremetal IPI deployment with an IPv6 control plane fails when the nodes obtain both SLAAC and DHCPv6 addresses, because they set their hostname to 'localhost'.


Version-Release number of selected component (if applicable):
4.6.3

How reproducible:
100%

Steps to Reproduce:

1. Deploy baremetal setup via IPI flow with IPv6 control plane

2. Make sure that the control plane NICs obtain both SLAAC and DHCPv6 addresses, e.g.:

ip a s dev br-ex

15: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 48:df:37:c7:75:d8 brd ff:ff:ff:ff:ff:ff
    inet6 2620:52:0:2e39::20/128 scope global tentative dynamic noprefixroute 
       valid_lft 3600sec preferred_lft 3600sec
    inet6 2620:52:0:2e39:2506:eb60:1bb9:8bb3/64 scope global tentative dynamic noprefixroute 
       valid_lft 86400sec preferred_lft 14400sec
    inet6 fe80::1981:fdfa:87d7:7643/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

2620:52:0:2e39::20 is provided via DHCPv6:

dnsmasq-dhcp[1283315]: 10225194 DHCPSOLICIT(baremetal) 00:03:00:01:48:df:37:c7:75:d8
dnsmasq-dhcp[1283315]: 10225194 DHCPREPLY(baremetal) 2620:52:0:2e39::20 00:03:00:01:48:df:37:c7:75:d8 openshift-master-0
dnsmasq-dhcp[1283315]: 10225194 requested options: 23:dns-server, 24:domain-search, 56:ntp-server,
dnsmasq-dhcp[1283315]: 10225194 requested options: 31:sntp-server
dnsmasq-dhcp[1283315]: 10225194 tags: known, dhcpv6, baremetal
dnsmasq-dhcp[1283315]: 10225194 sent size: 10 option:  1 client-id  00:03:00:01:48:df:37:c7:75:d8
dnsmasq-dhcp[1283315]: 10225194 sent size: 14 option:  2 server-id  00:01:00:01:27:3e:c2:7a:94:40:c9:f8:24:2a
dnsmasq-dhcp[1283315]: 10225194 sent size:  0 option: 14 rapid-commit
dnsmasq-dhcp[1283315]: 10225194 sent size: 40 option:  3 ia-na  IAID=935818712 T1=1800 T2=3150
dnsmasq-dhcp[1283315]: 10225194 nest size: 24 option:  5 iaaddr  2620:52:0:2e39::20 PL=3600 VL=3600
dnsmasq-dhcp[1283315]: 10225194 sent size:  9 option: 13 status  0 success
dnsmasq-dhcp[1283315]: 10225194 sent size:  1 option:  7 preference  0
dnsmasq-dhcp[1283315]: 10225194 sent size: 16 option: 23 dns-server  2620:52:0:aa0::dead:beef
dnsmasq-dhcp[1283315]: 10225194 sent size: 55 option: 39 FQDN  openshift-master-0.ocp-edge1.lab.eng.tlv2.redhat.com

2620:52:0:2e39:2506:eb60:1bb9:8bb3 is the SLAAC address. In this environment the RAs were sent by radvd with AdvAutonomous on; the radvd.conf is below:

interface baremetal
{
AdvManagedFlag on;
AdvOtherConfigFlag on;
AdvSendAdvert on;
MinRtrAdvInterval 30;
MaxRtrAdvInterval 100;
AdvDefaultLifetime 100;
prefix 2620:52:0:2e39::/64
{
AdvOnLink on;
AdvAutonomous on;
AdvRouterAddr off;
};
route ::/0 {
AdvRoutePreference medium;
RemoveRoute off;
};
};

3. SSH to one of the nodes and check the hostname:


Actual results:

[root@localhost core]# hostname -f
localhost


Expected results:

The hostname is set according to the FQDN option provided by the DHCPv6 server, e.g.:
 option: 39 FQDN  openshift-master-0.ocp-edge1.lab.eng.tlv2.redhat.com
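
For anyone re-checking this, a minimal Python sketch of the comparison between actual and expected results: it pulls the FQDN out of dnsmasq log-dhcp lines like the ones in step 2 and compares it with the hostname seen on the node. The function names and the regular expression are illustrative assumptions, not part of any OpenShift or dnsmasq tooling.

```python
import re

# Assumed pattern for the FQDN line that dnsmasq writes with log-dhcp enabled,
# e.g. "... sent size: 55 option: 39 FQDN  openshift-master-0.example.com"
FQDN_RE = re.compile(r"option:\s*39\s+FQDN\s+(\S+)")


def fqdn_from_dnsmasq_log(lines):
    """Return the last FQDN that dnsmasq reported sending, or None."""
    fqdn = None
    for line in lines:
        match = FQDN_RE.search(line)
        if match:
            fqdn = match.group(1)
    return fqdn


def hostname_matches(observed, expected_fqdn):
    """Compare case-insensitively, ignoring a trailing dot."""
    if expected_fqdn is None:
        return False
    return observed.rstrip(".").lower() == expected_fqdn.rstrip(".").lower()


# Two lines copied from the dnsmasq log in step 2.
log = [
    "dnsmasq-dhcp[1283315]: 10225194 sent size: 16 option: 23 dns-server  2620:52:0:aa0::dead:beef",
    "dnsmasq-dhcp[1283315]: 10225194 sent size: 55 option: 39 FQDN  openshift-master-0.ocp-edge1.lab.eng.tlv2.redhat.com",
]
expected = fqdn_from_dnsmasq_log(log)
# "localhost" is what `hostname -f` returned on the affected node.
print(hostname_matches("localhost", expected))
```

On an affected node the comparison is false, which is the failure reported here.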

Additional info:

Attaching NetworkManager log from one of the master nodes.

Comment 1 Antoni Segura Puimedon 2020-11-18 20:33:41 UTC
Is there a DNS AAAA record for 2620:52:0:2e39::20 from which to obtain a name? Is there a DHCPv6 provided hostname?

Comment 2 Marius Cornea 2020-11-19 10:31:41 UTC
(In reply to Antoni Segura Puimedon from comment #1)
> Is there a DNS AAAA record for 2620:52:0:2e39::20 from which to obtain a
> name? Is there a DHCPv6 provided hostname?

Yes, there's a PTR record:

dig -x 2620:52:0:2e39::20  +short
openshift-master-0.ocp-edge1.lab.eng.tlv2.redhat.com.
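
Note that the lookup above queries a PTR record under ip6.arpa; an AAAA record maps in the other direction, from name to address. As a small sketch, the query name that `dig -x` builds can be reproduced with Python's standard `ipaddress` module (the helper name `ptr_query_name` is made up for illustration):

```python
import ipaddress


def ptr_query_name(address):
    """Return the reverse-DNS name (ip6.arpa or in-addr.arpa) for an address."""
    return ipaddress.ip_address(address).reverse_pointer


# The DHCPv6-assigned address of openshift-master-0 from this report.
name = ptr_query_name("2620:52:0:2e39::20")
print(name)
```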

DHCPv6 also provides the hostname. The dnsmasq config is pasted below:

cat /etc/dnsmasq.d/baremetal.conf 

strict-order
local=/ocp-edge1.lab.eng.tlv2.redhat.com/
domain=ocp-edge1.lab.eng.tlv2.redhat.com
expand-hosts
pid-file=/var/run/dnsmasq.pid
except-interface=lo
bind-dynamic
interface=baremetal
dhcp-option=option6:dns-server,[2620:52:0:aa0::dead:beef]
dhcp-range=2620:52:0:2e39::d1,2620:52:0:2e39::f4,64
dhcp-lease-max=81
dhcp-hostsfile=/var/lib/dnsmasq/baremetal.hostsfile
log-dhcp

cat /var/lib/dnsmasq/baremetal.hostsfile
id:00:03:00:01:48:df:37:c7:75:d8,openshift-master-0.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::20]
id:00:03:00:01:48:df:37:c7:76:48,openshift-master-1.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::21]
id:00:03:00:01:48:df:37:c7:76:18,openshift-master-2.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::22]
id:00:03:00:01:48:df:37:c6:39:f8,openshift-worker-0.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::23]
id:00:03:00:01:48:df:37:c7:76:b8,openshift-worker-1.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::24]
id:00:03:00:01:BC:97:E1:69:DA:81,openshift-worker-2.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::25]
id:00:03:00:01:BC:97:E1:29:9C:81,openshift-worker-3.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::26]
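
The client IDs in this hostsfile look like DUID-LL values (DUID type 3, hardware type 1 for Ethernet, followed by the MAC), which is how the entry for openshift-master-0 lines up with the link/ether address on br-ex. A rough Python sketch of splitting an entry and recovering the MAC follows; it assumes exactly the three-field `id:<duid>,<hostname>,[<address>]` layout shown above, and the function names are hypothetical.

```python
def parse_hostsfile_line(line):
    """Split an 'id:<duid>,<hostname>,[<ipv6>]' entry into its three fields."""
    duid_field, hostname, address = line.strip().split(",")
    if not duid_field.startswith("id:"):
        raise ValueError("expected an 'id:' client identifier")
    duid = duid_field[len("id:"):].lower()
    return duid, hostname, address.strip("[]")


def mac_from_duid_ll(duid):
    """Return the MAC embedded in an Ethernet DUID-LL, or None for other DUIDs."""
    octets = duid.split(":")
    # 00:03 = DUID-LL, 00:01 = Ethernet hardware type, then 6 MAC octets.
    if len(octets) == 10 and octets[:4] == ["00", "03", "00", "01"]:
        return ":".join(octets[4:])
    return None


entry = (
    "id:00:03:00:01:48:df:37:c7:75:d8,"
    "openshift-master-0.ocp-edge1.lab.eng.tlv2.redhat.com,[2620:52:0:2e39::20]"
)
duid, hostname, address = parse_hostsfile_line(entry)
print(mac_from_duid_ll(duid), hostname, address)
```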

The RAs providing the SLAAC address originated from radvd:

interface baremetal
{
AdvManagedFlag on;
AdvOtherConfigFlag on;
AdvSendAdvert on;
MinRtrAdvInterval 30;
MaxRtrAdvInterval 100;
AdvDefaultLifetime 100;
prefix 2620:52:0:2e39::/64
{
AdvOnLink on;
AdvAutonomous on;
AdvRouterAddr off;
};
route ::/0 {
AdvRoutePreference medium;
RemoveRoute off;
};
};
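
With AdvManagedFlag and AdvAutonomous both on, a node ends up with a DHCPv6 lease and a SLAAC address in the same /64. A minimal Python sketch for telling the addresses on br-ex apart is below. It rests on an assumption: any non-link-local address inside the advertised prefix that is not a known DHCPv6 lease is treated as SLAAC, so a manually configured address would be misclassified. The `classify` helper is illustrative only.

```python
import ipaddress


def classify(address, ra_prefix, dhcpv6_leases):
    """Label an address as link-local, dhcpv6, slaac (assumed), or other."""
    addr = ipaddress.ip_address(address)
    if addr.is_link_local:
        return "link-local"
    if addr in dhcpv6_leases:
        return "dhcpv6"
    # Assumption: remaining addresses inside the RA prefix came from SLAAC.
    if addr in ipaddress.ip_network(ra_prefix):
        return "slaac (assumed)"
    return "other"


# Lease taken from the dnsmasq hostsfile, prefix from radvd.conf.
leases = {ipaddress.ip_address("2620:52:0:2e39::20")}
prefix = "2620:52:0:2e39::/64"

# The three inet6 addresses reported on br-ex in the description.
for a in (
    "2620:52:0:2e39::20",
    "2620:52:0:2e39:2506:eb60:1bb9:8bb3",
    "fe80::1981:fdfa:87d7:7643",
):
    print(a, classify(a, prefix, leases))
```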

Comment 3 Ben Nemec 2020-12-22 22:58:58 UTC
I think the SLAAC aspect is a red herring. I am able to deploy with both DHCPv6 and SLAAC addresses locally.

Looking through the logs I see a couple of issues. First, the hostname is not in the dhcp6 options reported by NM. I thought that was fixed for 4.6, but I could be mistaken. Maybe it wasn't fixed yet in 4.6.3? It should definitely be fixed in 4.7.

Second, the resolv-prepender script is failing. Possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=1905233 since the subnets overlap? I think we'd need to see the output from the resolv-prepender in order to figure out what's going on there.

I should note that I've done several deployments with SLAAC addresses in my dev environment and haven't been able to reproduce this problem. Even using overlapping subnets has not been an issue. Tomorrow I might try deploying 4.6.3 specifically just to see if this is something that was accidentally fixed since then.

Comment 4 Ben Nemec 2020-12-23 18:31:18 UTC
I was able to reproduce this bug in 4.6.3 but not in 4.6.9, so I believe this was fixed since the bug was opened.

Comment 5 Victor Voronkov 2021-01-08 13:32:06 UTC
Verified on OCP 4.7.0-fc.1

[core@master-0-0 ~]$ hostname
master-0-0.ocp-edge-cluster-0.qe.lab.redhat.com

Comment 8 errata-xmlrpc 2021-02-24 15:33:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
