Bug 2087771 - [tracker] NetworkManager 1.36.0 loses DHCP lease and doesn't try again
Summary: [tracker] NetworkManager 1.36.0 loses DHCP lease and doesn't try again
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Micah Abbott
QA Contact: Aashish Radhakrishnan
URL:
Whiteboard:
Depends On: 2090280
Blocks:
Reported: 2022-05-18 13:26 UTC by Michael Nguyen
Modified: 2022-08-10 11:13 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:12:53 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 11:13:11 UTC

Description Michael Nguyen 2022-05-18 13:26:27 UTC
This bug was initially created as a copy of Bug #2077605

I am copying this bug because: 



In OpenShift, we rebased on RHEL 8.6 and saw failures to install with ovn-kubernetes.  Oddly, this only happens on Azure some of the time.

What appears to happen is that when NetworkManager is restarted very shortly after getting a lease, it runs DHCP again but the interface never gets the IP. The logs say "ip4: set state fail (was pending, reason: check-ip-state)". What is check-ip-state?

I will attach a whole journal, but the timeline I've pieced together looks like this:



**** 14:00:54 Start NM

Apr 21 14:00:54 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg configure-ovs.sh[1742]: + systemctl restart NetworkManager

**** 14:00:54 We got a DHCP lease

Apr 21 14:00:54 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg systemd[1]: NetworkManager-wait-online.service: Succeeded.
[...]
Apr 21 14:00:54 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2153]: <debug> [1650549654.8062] dhcp4 (eth0): option ip_address           => '10.0.128.6'

**** 14:00:54 Network Manager shut down again....

Apr 21 14:00:54 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg systemd[1]: Stopping Network Manager...
Apr 21 14:00:54 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2153]: <info>  [1650549654.8070] caught SIGTERM, shutting down normally.

**** 14:00:54 Starting again....

Apr 21 14:00:54 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg systemd[1]: Starting Network Manager...

**** 14:00:55 Try to DHCP again

Apr 21 14:00:55 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2753]: <debug> [1650549655.0340] dhcp4 (eth0): send REQUEST to 255.255.255.255

**** 14:00:55 Get a lease again

Apr 21 14:00:55 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2753]: <debug> [1650549655.0492] dhcp4 (eth0): option ip_address           => '10.0.128.6'

**** 14:00:55 But NetworkManager doesn't apply it to the interface

Apr 21 14:00:55 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2753]: <debug> [1650549655.0505] device[e1c2da34f52474c2] (eth0): ip:dhcp4: set state fail (was pending)
Apr 21 14:00:55 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2753]: <debug> [1650549655.0509] device[e1c2da34f52474c2] (eth0): ip4: set state fail (was pending, reason: check-ip-state)
[...]
Apr 21 14:01:15 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2753]: <debug> [1650549675.0392] device[e1c2da34f52474c2] (eth0): ip4: required-timeout: expired
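
For anyone checking their own journal for the same pattern, here is a minimal sketch (not part of the original report) that scans NetworkManager debug lines for a DHCPv4 lease followed, in the same NetworkManager process, by the "set state fail ... check-ip-state" message. The function name `find_unapplied_leases` and the regular expressions are illustrative assumptions based only on the log lines quoted above; other NetworkManager versions may word these messages differently.

```python
import re

# Assumed line shape, taken from the excerpts above:
#   "... NetworkManager[PID]: <level> [epoch.frac] message"
LINE_RE = re.compile(
    r"NetworkManager\[(?P<pid>\d+)\]: <(?P<level>\w+)>\s+\[(?P<ts>\d+\.\d+)\] (?P<msg>.*)"
)
# "dhcp4 (eth0): option ip_address           => '10.0.128.6'"
LEASE_RE = re.compile(
    r"dhcp4 \((?P<dev>[^)]+)\): option ip_address\s+=> '(?P<ip>[^']+)'"
)
# "device[...] (eth0): ip4: set state fail (was pending, reason: check-ip-state)"
FAIL_RE = re.compile(
    r"\((?P<dev>[^)]+)\): ip4: set state fail \(was pending, reason: check-ip-state\)"
)


def find_unapplied_leases(lines):
    """Return (pid, device, ip, lease_ts, fail_ts) for each lease that is
    followed by an ip4 'set state fail' in the same NetworkManager process."""
    leases = {}  # (pid, device) -> (ip, timestamp of the lease line)
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a NetworkManager log line (e.g. systemd output)
        pid, ts, msg = int(m["pid"]), float(m["ts"]), m["msg"]
        lease = LEASE_RE.search(msg)
        if lease:
            leases[(pid, lease["dev"])] = (lease["ip"], ts)
            continue
        fail = FAIL_RE.search(msg)
        if fail and (pid, fail["dev"]) in leases:
            ip, lease_ts = leases[(pid, fail["dev"])]
            hits.append((pid, fail["dev"], ip, lease_ts, ts))
    return hits
```

Fed the lines from the timeline above (for example via `find_unapplied_leases(open("journal.txt"))`, where `journal.txt` is a hypothetical saved journal), this should report the second NetworkManager process (PID 2753) on eth0 and ignore the first process (PID 2153), which was stopped before any failure was logged.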

Comment 1 Micah Abbott 2022-05-25 16:36:35 UTC
Updating Depends On to point to the 8.6.0.z BZ

Comment 3 Micah Abbott 2022-06-13 20:03:28 UTC
We are temporarily carrying `NetworkManager-1.36.0-5.el8_6` in RHCOS until the release of RHEL 8.6.0.1 which will include the fix for 2090280.

Moving this to MODIFIED as we have the fix included in our version of NM in RHCOS 4.11.

Comment 5 Aashish Radhakrishnan 2022-06-17 16:24:14 UTC
sh-4.4# rpm -qa NetworkManager
NetworkManager-1.36.0-5.el8_6.x86_64

sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:50772552e45a9e42287cc479bd5ecad826c136ae716f19623c963a9a122f84c0
              CustomOrigin: Managed by machine-config-operator
                   Version: 411.86.202206131434-0 (2022-06-13T14:37:45Z)

  e0746a6268898ad4761a04d8c531ee3a45250866d5c62bfeb8b0efc008ffb8e9
                   Version: 411.85.202205101201-0 (2022-05-10T12:05:02Z)

sh-4.4# rpm -qa kernel
kernel-4.18.0-372.9.1.el8.x86_64
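
A rough sketch of how the `rpm -qa NetworkManager` check above could be scripted, assuming the only reference point is the build named in Comment 3 (`NetworkManager-1.36.0-5.el8_6`). The helper name `has_fixed_build` is hypothetical, and the comparison is deliberately naive: it looks only at the numeric version and the leading release number, and it assumes that any later build also carries the fix, which this bug does not confirm.

```python
import re

# Build named in Comment 3 as carrying the fix for 2090280.
FIXED_VERSION = (1, 36, 0)
FIXED_RELEASE = 5

# Matches e.g. "NetworkManager-1.36.0-5.el8_6.x86_64"
NVR_RE = re.compile(r"^NetworkManager-(\d+)\.(\d+)\.(\d+)-(\d+)\.")


def has_fixed_build(rpm_line):
    """Return True if the package line is at or above 1.36.0-5.

    Assumption: newer versions/releases keep the fix. This is not a full
    RPM version comparison (no epoch, no dist-tag handling).
    """
    m = NVR_RE.match(rpm_line.strip())
    if not m:
        raise ValueError(f"unrecognized package line: {rpm_line!r}")
    version = tuple(int(x) for x in m.groups()[:3])
    release = int(m.group(4))
    return (version, release) >= (FIXED_VERSION, FIXED_RELEASE)
```

For the output shown in this comment, `has_fixed_build("NetworkManager-1.36.0-5.el8_6.x86_64")` evaluates to `True`.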

Comment 7 errata-xmlrpc 2022-08-10 11:12:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

