Bug 2050296 - Metal Day-1 - IPv6 Deployments with static IP fail to recognise hostname configured with rDNS
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Tomas Sedovic
QA Contact: Yoav Porag
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-03 16:26 UTC by Yoav Porag
Modified: 2023-09-18 04:31 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2052098 (view as bug list)
Environment:
Last Closed: 2023-03-03 14:31:16 UTC
Target Upstream Version:
Embargoed:


Attachments
Trace logs from localhost node (7.18 MB, text/plain)
2022-02-04 21:19 UTC, Ben Nemec

Comment 1 Derek Higgins 2022-02-03 16:40:10 UTC
On the master server, coredns is returning localhost as the server name

[root@localhost core]# dig -x fd2e:6f44:5dd8::face  
e.c.a.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.d.d.5.4.4.f.6.e.2.d.f.ip6.arpa. 30 IN	PTR localhost.
e.c.a.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.d.d.5.4.4.f.6.e.2.d.f.ip6.arpa. 30 IN	PTR localhost.ocp-edge-cluster-0.qe.lab.redhat.com.

;; Query time: 0 msec
;; SERVER: fd2e:6f44:5dd8::face#53(fd2e:6f44:5dd8::face)
;; WHEN: Thu Feb 03 16:16:47 UTC 2022
;; MSG SIZE  rcvd: 340


The coredns config contains localhost for the IP in question; I'd have expected it to be master-0-0:
[root@localhost core]# cat /etc/coredns/Corefile 
...
    hosts {
        fd2e:6f44:5dd8::face localhost localhost.ocp-edge-cluster-0.qe.lab.redhat.com
        fd2e:6f44:5dd8::86 master-0-1 master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com
        fd2e:6f44:5dd8::61 master-0-2 master-0-2.ocp-edge-cluster-0.qe.lab.redhat.com
        fd2e:6f44:5dd8::35 worker-0-0 worker-0-0.ocp-edge-cluster-0.qe.lab.redhat.com
        fd2e:6f44:5dd8::83 worker-0-1 worker-0-1.ocp-edge-cluster-0.qe.lab.redhat.com
        fallthrough
    }
}
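
For comparison, the hosts entry for fd2e:6f44:5dd8::face would be expected to carry the node's real name, along these lines (a sketch; the remaining entries are unchanged):

```
    hosts {
        fd2e:6f44:5dd8::face master-0-0 master-0-0.ocp-edge-cluster-0.qe.lab.redhat.com
        ...
    }
```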


I've also confirmed that the dnsmasq server is returning the correct hostname
[root@localhost core]# cat /etc/resolv.conf 
# Generated by KNI resolv prepender NM dispatcher script
nameserver fd2e:6f44:5dd8::face
nameserver fd2e:6f44:5dd8::1

[root@localhost core]# dig @fd2e:6f44:5dd8::1 -x fd2e:6f44:5dd8::face | grep PTR
;e.c.a.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.d.d.5.4.4.f.6.e.2.d.f.ip6.arpa. IN PTR
e.c.a.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.d.d.5.4.4.f.6.e.2.d.f.ip6.arpa. 0 IN PTR master-0-0.ocp-edge-cluster-0.qe.lab.redhat.com.
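
As an aside, the ip6.arpa name shown in the dig output above is purely mechanical; it can be derived from the address with a minimal Python sketch using only the standard library:

```python
# Minimal sketch: derive the ip6.arpa reverse-lookup name for an IPv6
# address using only the Python standard library. The address below is
# the address in question from the dig output above.
import ipaddress

addr = ipaddress.ip_address("fd2e:6f44:5dd8::face")
print(addr.reverse_pointer)
# e.c.a.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.d.d.5.4.4.f.6.e.2.d.f.ip6.arpa
```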

Comment 2 Ben Nemec 2022-02-03 23:10:38 UTC
I reproduced this in my local env too. I don't think the problem is the local coredns config, though: the incorrect name is a side effect of the node registering itself as "localhost", which must have happened before it showed up in the coredns hosts list.

I'm not sure why it's not doing the reverse lookup on the IP in the first place. Oddly enough, on my node it did eventually pick a hostname, just the wrong one. After the node took the API VIP, it looked up the VIP address and took the hostname "api". :-/

Comment 4 Ben Nemec 2022-02-04 21:18:07 UTC
I think this is some odd (and I assume incorrect) behavior from NetworkManager. It's looking up the link-local IPv6 address, getting back "localhost", and stopping there. See the trace logs from my local deployment:

Feb 04 19:30:27 localhost NetworkManager[1728]: <debug> [1644003027.9957] device[72e74ac9a16e8cd6] (enp2s0): hostname-from-dns: lookup done for fe80::218:28ff:feec:68c8, result "localhost"
Feb 04 19:30:27 localhost NetworkManager[1728]: <trace> [1644003027.9957] policy: set-hostname: updating hostname (lookup finished)

I think we're going to need help from the NM team to fix this, unless we can get the hostname setting through CBO to work so we don't have to rely on rDNS lookup. I'll upload the full trace logs as well.
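
If the rDNS lookup is the culprit, one possible mitigation (a sketch, not a verified fix, assuming NetworkManager 1.30 or later, which added per-connection hostname settings; check nm-settings(5) for the deployed version) is to disable the reverse-DNS hostname lookup on the connection profile:

```ini
# Sketch, not a verified fix: per-connection [hostname] settings in the
# NetworkManager keyfile (/etc/NetworkManager/system-connections/<name>.nmconnection).
# Requires NetworkManager >= 1.30; consult nm-settings(5) for the exact
# value encoding on the deployed version.
[hostname]
from-dns-lookup=false
```

The same property can be set with `nmcli connection modify <conn> hostname.from-dns-lookup false`, again assuming a NetworkManager version that exposes it.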

Comment 5 Ben Nemec 2022-02-04 21:19:49 UTC
Created attachment 1859142 [details]
Trace logs from localhost node

Comment 6 Pedro Amoedo 2022-02-16 14:56:29 UTC
My 2 cents:

I also hit the same behavior when deploying UPI BM IPv6 single-stack. When setting the IPv6 address statically over an LACP bond (iPXE UEFI), NetworkManager misbehaves and tries to set the hostname via dhcp-internal, resulting in an unexpected "localhost" configuration.

NOTE: despite the static settings (there is no DHCPv6), there is standard DHCP present in the cloud provider network that could explain the behavior.

FWIW, I was able to work around the problem by setting the hostname statically via additional coreos-installer kernel arguments, using an ignition hook delivered over iPXE. I'm not sure whether this procedure is suitable for IPI BM, but I wanted to share it in case it helps. Example ignition hook:

~~~
{
  "ignition": { "version": "3.1.0" },
  "systemd": {
    "units": [
      {
        "contents": "[Unit]\nDescription=Run installer with custom kargs\nRequires=coreos-installer-pre.target\nAfter=coreos-installer-pre.target\nOnFailure=emergency.target\nOnFailureJobMode=replace-irreversibly\nAfter=network-online.target\nWants=network-online.target\n\n[Service]\nType=oneshot\nExecStart=/usr/bin/coreos-installer install /dev/sda --delete-karg console=ttyS0,115200n8 --append-karg console=ttyS0,115200n8 --append-karg bond=bond0:<NIC0>,<NIC1>:mode=802.3ad,lacp_rate=0,miimon=100,updelay=200,downdelay=200 --append-karg ip=<NIC0>:off --append-karg ip=[<IPV6_ADDR>]::[<IPV6_GW>]:127:<HOSTNAME>:bond0:none --append-karg nameserver=[<IPV6_DNS1>] --append-karg nameserver=[<IPV6_DNS2>] --fetch-retries 10 --ignition-url http://<AUX_SERVER_IP>:8000/rhcos/ignitions/<CLUSTER_NAME>/<FLAVOR>.ign --insecure-ignition\nExecStart=/usr/bin/systemctl --no-block reboot\nStandardOutput=kmsg+console\nStandardError=kmsg+console\n\n[Install]\nRequiredBy=default.target\n",
        "enabled": true,
        "name": "install.service"
      }
    ]
  }
}
~~~
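
For readers unfamiliar with the dracut ip= syntax used in the hook above, here is a hedged sketch (with made-up placeholder values, not from a real deployment) of how the kernel argument is assembled; the fifth field is the one that pins the hostname statically:

```python
# Hedged sketch (illustrative values only, not from a real deployment):
# assemble the dracut "ip=" kernel argument that statically configures an
# IPv6 address and, crucially for this bug, pins the hostname.
def static_ip_karg(addr: str, gw: str, prefix: int, hostname: str, iface: str) -> str:
    # dracut syntax: ip=<client-IP>:[<peer>]:<gateway>:<netmask>:<hostname>:<iface>:<autoconf>
    # IPv6 addresses must be bracketed; the empty second field skips <peer>.
    return f"ip=[{addr}]::[{gw}]:{prefix}:{hostname}:{iface}:none"

karg = static_ip_karg("fd2e:6f44:5dd8::10", "fd2e:6f44:5dd8::1", 64, "master-0-0", "bond0")
print(karg)  # ip=[fd2e:6f44:5dd8::10]::[fd2e:6f44:5dd8::1]:64:master-0-0:bond0:none
```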

NOTE: The "--fetch-retries" parameter was also needed in my case because the secondary interface in the LACP bond sometimes takes more time to come up, so the ignition fetch fails on the first attempt.

Best Regards.

Comment 7 Adina Wolff 2022-07-14 16:28:47 UTC
Update on the behavior: This issue does not occur if the cluster is deployed in an environment without a DHCP server, or if the 'dhcp' field is absent from networkConfig in install-config.yaml.
In those cases, the deployment fails at a later stage; that issue is described in bz2105973.

Comment 13 Red Hat Bugzilla 2023-09-18 04:31:31 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

