Bug 2050296 - Metal Day-1 - IPv6 Deployments with static IP fail to recognise hostname configured with rDNS
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Tomas Sedovic
QA Contact: Yoav Porag
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-03 16:26 UTC by Yoav Porag
Modified: 2023-09-18 04:31 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2052098 (view as bug list)
Environment:
Last Closed: 2023-03-03 14:31:16 UTC
Target Upstream Version:
Embargoed:


Attachments
Trace logs from localhost node (7.18 MB, text/plain)
2022-02-04 21:19 UTC, Ben Nemec

Comment 1 Derek Higgins 2022-02-03 16:40:10 UTC
On the master server, coredns is returning localhost as the server name

[root@localhost core]# dig -x fd2e:6f44:5dd8::face  
e.c.a.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.d.d.5.4.4.f.6.e.2.d.f.ip6.arpa. 30 IN	PTR localhost.
e.c.a.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.d.d.5.4.4.f.6.e.2.d.f.ip6.arpa. 30 IN	PTR localhost.ocp-edge-cluster-0.qe.lab.redhat.com.

;; Query time: 0 msec
;; SERVER: fd2e:6f44:5dd8::face#53(fd2e:6f44:5dd8::face)
;; WHEN: Thu Feb 03 16:16:47 UTC 2022
;; MSG SIZE  rcvd: 340


The coredns config contains localhost for the IP in question; I'd have expected it to be master-0-0:
[root@localhost core]# cat /etc/coredns/Corefile 
...
    hosts {
        fd2e:6f44:5dd8::face localhost localhost.ocp-edge-cluster-0.qe.lab.redhat.com
        fd2e:6f44:5dd8::86 master-0-1 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com
        fd2e:6f44:5dd8::61 master-0-2 master-0-2.ocp-edge-cluster-0.qe.lab.redhat.com
        fd2e:6f44:5dd8::35 worker-0-0 worker-0-0.ocp-edge-cluster-0.qe.lab.redhat.com
        fd2e:6f44:5dd8::83 worker-0-1 worker-0-1.ocp-edge-cluster-0.qe.lab.redhat.com
        fallthrough
    }
}
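
For comparison, the hosts entry for fd2e:6f44:5dd8::face would be expected to carry the node's real name, along these lines (a sketch; the remaining entries are unchanged):

```
    hosts {
        fd2e:6f44:5dd8::face master-0-0 master-0-0.ocp-edge-cluster-0.qe.lab.redhat.com
        ...
    }
```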


I've also confirmed that the dnsmasq server is returning the correct hostname
[root@localhost core]# cat /etc/resolv.conf 
# Generated by KNI resolv prepender NM dispatcher script
nameserver fd2e:6f44:5dd8::face
nameserver fd2e:6f44:5dd8::1

[root@localhost core]# dig @fd2e:6f44:5dd8::1 -x fd2e:6f44:5dd8::face | grep PTR
;e.c.a.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.d.d.5.4.4.f.6.e.2.d.f.ip6.arpa. IN PTR
e.c.a.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.d.d.5.4.4.f.6.e.2.d.f.ip6.arpa. 0 IN PTR master-0-0.ocp-edge-cluster-0.qe.lab.redhat.com.
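
As an aside, the ip6.arpa name shown in the dig output above is purely mechanical; it can be derived from the address with a minimal Python sketch using only the standard library:

```python
# Minimal sketch: derive the ip6.arpa reverse-lookup name for an IPv6
# address using only the Python standard library. The address below is
# the address in question from the dig output above.
import ipaddress

addr = ipaddress.ip_address("fd2e:6f44:5dd8::face")
print(addr.reverse_pointer)
# e.c.a.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.d.d.5.4.4.f.6.e.2.d.f.ip6.arpa
```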

Comment 2 Ben Nemec 2022-02-03 23:10:38 UTC
I reproduced this in my local env too. I don't think the problem is the local coredns config, though: the incorrect name is a side effect of the node registering itself as "localhost", which must have happened before it showed up in the coredns hosts list.

I'm not sure why it's not doing the reverse lookup on the IP in the first place. Oddly enough, on my node it did eventually pick a hostname, just the wrong one. After the node took the API VIP, it looked up the VIP address and took the hostname "api". :-/

Comment 4 Ben Nemec 2022-02-04 21:18:07 UTC
I think this is some odd (and I assume incorrect) behavior from NetworkManager. It's looking up the link-local IPv6 address, getting back "localhost", and stopping there. See the trace logs from my local deployment:

Feb 04 19:30:27 localhost NetworkManager[1728]: <debug> [1644003027.9957] device[72e74ac9a16e8cd6] (enp2s0): hostname-from-dns: lookup done for fe80::218:28ff:feec:68c8, result "localhost"
Feb 04 19:30:27 localhost NetworkManager[1728]: <trace> [1644003027.9957] policy: set-hostname: updating hostname (lookup finished)

I think we're going to need help from the NM team to fix this, unless we can get the hostname setting through CBO to work so we don't have to rely on rDNS lookup. I'll upload the full trace logs as well.
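
If the rDNS lookup is the culprit, one possible mitigation (a sketch, not a verified fix, assuming NetworkManager 1.30 or later, which added per-connection hostname settings; check nm-settings(5) for the deployed version) is to disable the reverse-DNS hostname lookup on the connection profile:

```ini
# Sketch, not a verified fix: per-connection [hostname] settings in the
# NetworkManager keyfile (/etc/NetworkManager/system-connections/<name>.nmconnection).
# Requires NetworkManager >= 1.30; consult nm-settings(5) for the exact
# value encoding on the deployed version.
[hostname]
from-dns-lookup=false
```

The same property can be set with `nmcli connection modify <conn> hostname.from-dns-lookup false`, again assuming a NetworkManager version that exposes it.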

Comment 5 Ben Nemec 2022-02-04 21:19:49 UTC
Created attachment 1859142 [details]
Trace logs from localhost node

Comment 6 Pedro Amoedo 2022-02-16 14:56:29 UTC
My 2 cents:

I also hit the same behavior when deploying UPI BM IPv6 single-stack. When setting the IPv6 address statically over an LACP bond (iPXE UEFI), NetworkManager misbehaves and tries to set the hostname via dhcp-internal, resulting in an unexpected "localhost" configuration.

NOTE: despite the static settings (there is no DHCPv6), there is standard DHCP present in the cloud provider network that could explain the behavior.

FWIW, I was able to work around the problem by setting the hostname statically via additional coreos-installer kernel arguments, using an ignition hook delivered over iPXE. I'm not sure whether this procedure is suitable for IPI BM, but I wanted to share it in case it helps. Example ignition hook:

~~~
{
  "ignition": { "version": "3.1.0" },
  "systemd": {
    "units": [
      {
        "contents": "[Unit]\nDescription=Run installer with custom kargs\nRequires=coreos-installer-pre.target\nAfter=coreos-installer-pre.target\nOnFailure=emergency.target\nOnFailureJobMode=replace-irreversibly\nAfter=network-online.target\nWants=network-online.target\n\n[Service]\nType=oneshot\nExecStart=/usr/bin/coreos-installer install /dev/sda --delete-karg console=ttyS0,115200n8 --append-karg console=ttyS0,115200n8 --append-karg bond=bond0:<NIC0>,<NIC1>:mode=802.3ad,lacp_rate=0,miimon=100,updelay=200,downdelay=200 --append-karg ip=<NIC0>:off --append-karg ip=[<IPV6_ADDR>]::[<IPV6_GW>]:127:<HOSTNAME>:bond0:none --append-karg nameserver=[<IPV6_DNS1>] --append-karg nameserver=[<IPV6_DNS2>] --fetch-retries 10 --ignition-url http://<AUX_SERVER_IP>:8000/rhcos/ignitions/<CLUSTER_NAME>/<FLAVOR>.ign --insecure-ignition\nExecStart=/usr/bin/systemctl --no-block reboot\nStandardOutput=kmsg+console\nStandardError=kmsg+console\n\n[Install]\nRequiredBy=default.target\n",
        "enabled": true,
        "name": "install.service"
      }
    ]
  }
}
~~~
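
For readers unfamiliar with the dracut ip= syntax used in the hook above, here is a hedged sketch (with made-up placeholder values, not from a real deployment) of how the kernel argument is assembled; the fifth field is the one that pins the hostname statically:

```python
# Hedged sketch (illustrative values only, not from a real deployment):
# assemble the dracut "ip=" kernel argument that statically configures an
# IPv6 address and, crucially for this bug, pins the hostname.
def static_ip_karg(addr: str, gw: str, prefix: int, hostname: str, iface: str) -> str:
    # dracut syntax: ip=<client-IP>:[<peer>]:<gateway>:<netmask>:<hostname>:<iface>:<autoconf>
    # IPv6 addresses must be bracketed; the empty second field skips <peer>.
    return f"ip=[{addr}]::[{gw}]:{prefix}:{hostname}:{iface}:none"

karg = static_ip_karg("fd2e:6f44:5dd8::10", "fd2e:6f44:5dd8::1", 64, "master-0-0", "bond0")
print(karg)  # ip=[fd2e:6f44:5dd8::10]::[fd2e:6f44:5dd8::1]:64:master-0-0:bond0:none
```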

NOTE: The "--fetch-retries" parameter was also needed in my case because the secondary interface in the LACP bond sometimes takes more time to come up, so the ignition fetch fails on the first attempt.

Best Regards.

Comment 7 Adina Wolff 2022-07-14 16:28:47 UTC
Update on the behavior: This issue does not occur if the cluster is deployed in an environment without a DHCP server, or if the 'dhcp' field is absent from networkConfig in install-config.yaml.
In those cases, the deployment fails at a later stage; that issue is described in bz2105973.

Comment 13 Red Hat Bugzilla 2023-09-18 04:31:31 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

