Bug 2073754
Summary: | OCP 4-10 deployment IPv6 Address not included in node InternalIP list. | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Greg Kopels <gkopels> |
Component: | oc | Assignee: | Nobody <nobody> |
oc sub component: | oc | QA Contact: | zhou ying <yinzhou> |
Status: | CLOSED WONTFIX | Docs Contact: | |
Severity: | high | ||
Priority: | unspecified | CC: | bgalvani, bzvonar, dornelas, elevin, grajaiya, jligon, keyoung, mfojtik, miabbott, mrussell, nobody, nstielau, smilner, spresti, thaller |
Version: | 4.10 | Keywords: | Reopened, TestBlocker |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2023-02-10 16:14:28 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Greg Kopels
2022-04-10 07:56:03 UTC
Hello, would it be possible to get the full journal of the broken node that includes the address assignment?

@thaller @bgalvani wanted to reach out to both of you for your expertise on NetworkManager.

Yes, it's probably related to the linked bugs. I have trouble understanding the log from comment 3. Would it be possible to collect complete `level=TRACE` logs of NetworkManager? See https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/main/contrib/fedora/rpm/NetworkManager.conf#L27 for hints about logging. Also, the logfile only spans a few seconds. Does this show the relevant part? Consider attaching the entire log (of the past minutes), possibly with additional hints, such as which IP addresses you expect and on which interface.

Hi, I will need to run a redeploy of the cluster. I will run it at the end of the day and update tomorrow.

Added full journals for both nodes.

@thaller Reporter has provided the requested journals, etc. Could you have another look and see if there is enough data collected to perform additional triage?

Hi, btw, the attached files from comments 14 and 15 are not at `level=TRACE`. That might be useful... But I am confused. Could you give some guidance as to what is happening? What does it mean that the "InternalIP" IPv6 address is missing? Should this reflect the addresses actually configured on the interface? What does `ip addr` say at this point? In the non-working log, we see towards the end

    [1652271045.8020] dhcp6 (br-ex): state changed unknown -> bound, address=2620:52:0:2e38::114

so it would seem that this interface should be up with the expected(?) IPv6 address.

Hi, as a reminder: we have a hybrid cluster with two BM workers. The br-ex main interfaces are configured with dnsmasq. Both workers receive IPv4 and IPv6 addresses.
However, when I run `oc describe node` on the workers, only one of the workers has both IPv4 and IPv6 addresses as InternalIP.

Worker0
    Addresses:
      InternalIP:  10.46.56.13
      InternalIP:  2620:52:0:2e38::113
      Hostname:    helix13.lab.eng.tlv2.redhat.com

Worker1 (Worker Node 2):
    Addresses:
      InternalIP:  10.46.56.14
      Hostname:    helix14.lab.eng.tlv2.redhat.com

Worker1 had an IPv6 address 2620:52:0:2e38::114 on the br-ex interface. Not sure I answered your question. Feel free to ping me on Slack.
Greg

Could you collect a complete `level=TRACE` log that shows the boot? Otherwise, detailed information about IP addresses is not logged, and it cannot be seen why an IP address might be missing. Debug logging can be enabled by setting `rd.debug` on the kernel command line and booting. Is there a difficulty reproducing the issue?

Hi, as soon as I have a free cluster I will rerun the deployment. Can you send me a doc on how to correctly run `level=TRACE`? Thanks

(In reply to Greg Kopels from comment #24)
> Hi as soon as I have a free cluster I will rerun the deployment. Can you
> send me a doc on how to correctly run `level=TRACE` ?
> Thanks

This is NM in initrd, is that right? Then pass `rd.debug` on the kernel command line. That is documented in `man dracut.cmdline`. Alternatively, how `level=TRACE` works is documented in `man NetworkManager.conf` and (more to the point) see the example at https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/main/contrib/fedora/rpm/NetworkManager.conf#L27 . That is mainly useful if you enable debug logging in real-root.

Hi, in attempting to deploy a dual stack cluster with OCP 4.11 we are hitting a new bug blocking us from further investigation of this bz. https://bugzilla.redhat.com/show_bug.cgi?id=2102158

Sorry, please ignore the above comment; I made an incorrect reference to a 4.11 bug. We currently don't have a free cluster to deploy 4.10 dualstack. I believe it will be free already tomorrow or by Thursday.
And then I will supply you with the trace logs. Thanks

I will have a cluster today to start deploying a 4.10 dualstack. I will reach out to Thomas during deployment.

You're talking about the output of an `oc node ...` command here, but what are the IP addresses on the nodes? Can you give us the output of `ip a`?

Hi, I am still blocked from deploying a dualstack cluster by bz https://bugzilla.redhat.com/show_bug.cgi?id=2102158

Sure. However, we still need some info here to make progress, so I'm keeping the NEEDINFO.

We are unable to make progress on this bug without the requested information, so the bug is now being closed. If the problem persists, please provide the requested information and reopen the bug.

(In reply to Timothée Ravier from comment #30)
> You're talking about the output of an `oc node ...` command here, but what
> are the IP addresses on the nodes? Can you give us the output of `ip a`?
PROBLEMATIC NODE:

    19: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ether 34:48:ed:f3:88:c4 brd ff:ff:ff:ff:ff:ff
        inet 10.46.56.13/24 brd 10.46.56.255 scope global dynamic noprefixroute br-ex
           valid_lft 2689sec preferred_lft 2689sec
        inet 10.46.56.72/32 scope global br-ex
           valid_lft forever preferred_lft forever
        inet6 2620:52:0:2e38::113/128 scope global dynamic noprefixroute
           valid_lft 2459sec preferred_lft 2459sec
        inet6 fe80::3648:edff:fef3:88c4/64 scope link noprefixroute
           valid_lft forever preferred_lft forever

------------------
but (`oc node ...`):

    status:
      addresses:
      - address: 10.46.56.13
        type: InternalIP
      - address: helix13.lab.eng.tlv2.redhat.com
        type: Hostname

====================================================
ANOTHER NODE:

    19: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ether 34:48:ed:f3:e2:2c brd ff:ff:ff:ff:ff:ff
        inet 10.46.56.14/24 brd 10.46.56.255 scope global dynamic noprefixroute br-ex
           valid_lft 2135sec preferred_lft 2135sec
        inet6 2620:52:0:2e38::114/128 scope global dynamic noprefixroute
           valid_lft 2233sec preferred_lft 2233sec
        inet6 fe80::3648:edff:fef3:e22c/64 scope link noprefixroute
           valid_lft forever preferred_lft forever

-----------------
`oc node ...`:

    status:
      addresses:
      - address: 10.46.56.14
        type: InternalIP
      - address: 2620:52:0:2e38::114
        type: InternalIP
      - address: helix14.lab.eng.tlv2.redhat.com
        type: Hostname

So if I understand correctly, this is a problem with the output of `oc`, not the IP address set on the node itself. Redirecting to the `oc` team.

I am rerunning the test with latest 4.10 OCP.

OCP 4.10.47: still the same issue. Deployed a dual stack cluster.

Worker 0
oc describe node helix13.lab.eng.tlv2.redhat.com

    Annotations:  k8s.ovn.org/host-addresses: ["10.46.56.13","2620:52:0:2e38::113"]

* But the internal IP address shows only IPv4:

    Addresses:
      InternalIP:  10.46.56.13
      Hostname:    helix13.lab.eng.tlv2.redhat.com

Worker 1
oc describe node helix14.lab.eng.tlv2.redhat.com

    Annotations:  k8s.ovn.org/host-addresses: ["10.46.56.14","10.46.56.72","2620:52:0:2e38::114"]

    Addresses:
      InternalIP:  10.46.56.14
      Hostname:    helix14.lab.eng.tlv2.redhat.com
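
The comparison made by hand in the last comment (the `k8s.ovn.org/host-addresses` annotation versus the `InternalIP` entries) could be scripted. The following is only a minimal sketch, not something used in this bug: it assumes the node object is available as JSON (for example from `oc get node <name> -o json`), and the function name `missing_ip_families` is hypothetical. It reports which IP families appear in the annotation but have no `InternalIP` entry.

```python
import ipaddress
import json


def missing_ip_families(node: dict) -> list[int]:
    """Return IP versions (4 or 6) present in the OVN host-addresses
    annotation but absent from the node's InternalIP addresses.

    `node` is assumed to be a parsed Kubernetes Node object.
    """
    annotations = node.get("metadata", {}).get("annotations", {})
    # The annotation value is itself a JSON-encoded list of address strings.
    raw = annotations.get("k8s.ovn.org/host-addresses", "[]")
    host_families = {ipaddress.ip_address(a).version for a in json.loads(raw)}

    internal_families = {
        ipaddress.ip_address(entry["address"]).version
        for entry in node.get("status", {}).get("addresses", [])
        if entry.get("type") == "InternalIP"
    }
    return sorted(host_families - internal_families)


# Sample data copied from the helix13 output reported above.
helix13 = {
    "metadata": {
        "annotations": {
            "k8s.ovn.org/host-addresses": '["10.46.56.13","2620:52:0:2e38::113"]'
        }
    },
    "status": {
        "addresses": [
            {"type": "InternalIP", "address": "10.46.56.13"},
            {"type": "Hostname", "address": "helix13.lab.eng.tlv2.redhat.com"},
        ]
    },
}

print(missing_ip_families(helix13))  # prints [6]
```

The check compares address families rather than individual addresses, because an extra address such as 10.46.56.72/32 can be present on br-ex without being expected in the InternalIP list.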