Bug 2087771

Summary: [tracker] NetworkManager 1.36.0 loses DHCP lease and doesn't try again
Product: OpenShift Container Platform
Component: RHCOS
Reporter: Michael Nguyen <mnguyen>
Assignee: Micah Abbott <miabbott>
QA Contact: Aashish Radhakrishnan <aaradhak>
Docs Contact:
CC: dornelas, jligon, miabbott, mrussell, nstielau
Status: CLOSED ERRATA
Severity: medium
Priority: high
Version: 4.11
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-10 11:12:53 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2090280
Bug Blocks:

Description Michael Nguyen 2022-05-18 13:26:27 UTC
This bug was initially created as a copy of Bug #2077605

In OpenShift, we rebased on RHEL 8.6 and saw failures to install with ovn-kubernetes.  Oddly, this only happens on Azure some of the time.

What appears to happen is this: when NetworkManager is restarted very shortly after obtaining a lease, it runs DHCP again and receives a lease, but the interface never gets the IP address. The logs say "ip4: set state fail (was pending, reason: check-ip-state)". What is check-ip-state?

I will attach a whole journal, but the timeline I've pieced together looks like this:



**** 14:00:54 Start NM

Apr 21 14:00:54 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg configure-ovs.sh[1742]: + systemctl restart NetworkManager

**** 14:00:54 We got a DHCP lease

Apr 21 14:00:54 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg systemd[1]: NetworkManager-wait-online.service: Succeeded.
[...]
Apr 21 14:00:54 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2153]: <debug> [1650549654.8062] dhcp4 (eth0): option ip_address           => '10.0.128.6'

**** 14:00:54 Network Manager shut down again....

Apr 21 14:00:54 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg systemd[1]: Stopping Network Manager...
Apr 21 14:00:54 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2153]: <info>  [1650549654.8070] caught SIGTERM, shutting down normally.

**** 14:00:54 Starting again....

Apr 21 14:00:54 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg systemd[1]: Starting Network Manager...

**** 14:00:55 Try to DHCP again

Apr 21 14:00:55 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2753]: <debug> [1650549655.0340] dhcp4 (eth0): send REQUEST to 255.255.255.255

**** 14:00:55 Get a lease again

Apr 21 14:00:55 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2753]: <debug> [1650549655.0492] dhcp4 (eth0): option ip_address           => '10.0.128.6'

**** 14:00:55 But NetworkManager doesn't apply it to the interface

Apr 21 14:00:55 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2753]: <debug> [1650549655.0505] device[e1c2da34f52474c2] (eth0): ip:dhcp4: set state fail (was pending)
Apr 21 14:00:55 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2753]: <debug> [1650549655.0509] device[e1c2da34f52474c2] (eth0): ip4: set state fail (was pending, reason: check-ip-state)
[...]
Apr 21 14:01:15 ci-ln-v6m988k-002ac-p4b5q-worker-centralus2-r4mkg NetworkManager[2753]: <debug> [1650549675.0392] device[e1c2da34f52474c2] (eth0): ip4: required-timeout: expired
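
For anyone checking their own journals for the same sequence, here is a rough sketch of a scanner. It is not part of NetworkManager or of the fix. It assumes debug-level NetworkManager logging is enabled and that the messages look exactly like the lines quoted above. The function name `find_unapplied_leases` and the regular expressions are made up for illustration.

```python
import re

# Both patterns are derived from the journal lines quoted in this report.
# They are assumptions about the message format, not a stable interface.
LEASE_RE = re.compile(
    r"NetworkManager\[(\d+)\]: <debug> \[[\d.]+\] "
    r"dhcp4 \(([^)]+)\): option ip_address\s+=> '([^']+)'"
)
FAIL_RE = re.compile(
    r"NetworkManager\[(\d+)\]: <debug> \[[\d.]+\] "
    r"device\[[0-9a-f]+\] \(([^)]+)\): "
    r"ip4: set state fail \(was pending, reason: check-ip-state\)"
)


def find_unapplied_leases(lines):
    """Return (pid, iface, ip) for every DHCP lease that the same
    NetworkManager process later failed with check-ip-state."""
    leases = {}  # (pid, iface) -> most recently offered IP address
    hits = []
    for line in lines:
        m = LEASE_RE.search(line)
        if m:
            pid, iface, ip = m.groups()
            leases[(pid, iface)] = ip
            continue
        m = FAIL_RE.search(line)
        if m:
            key = (m.group(1), m.group(2))
            if key in leases:
                hits.append((key[0], key[1], leases[key]))
    return hits


# Example use ("journal.txt" is a hypothetical saved journal):
# with open("journal.txt") as fh:
#     for pid, iface, ip in find_unapplied_leases(fh):
#         print(f"NetworkManager[{pid}] got {ip} on {iface} but did not apply it")
```

Matching on the PID keeps the lease seen by the first NetworkManager instance (2153 above) from being attributed to the failure in the restarted instance (2753).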

Comment 1 Micah Abbott 2022-05-25 16:36:35 UTC
Updating Depends On to point to the 8.6.0.z BZ

Comment 3 Micah Abbott 2022-06-13 20:03:28 UTC
We are temporarily carrying `NetworkManager-1.36.0-5.el8_6` in RHCOS until the release of RHEL 8.6.0.1 which will include the fix for 2090280.

Moving this to MODIFIED as we have the fix included in our version of NM in RHCOS 4.11.

Comment 5 Aashish Radhakrishnan 2022-06-17 16:24:14 UTC
sh-4.4# rpm -qa NetworkManager
NetworkManager-1.36.0-5.el8_6.x86_64

sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:50772552e45a9e42287cc479bd5ecad826c136ae716f19623c963a9a122f84c0
              CustomOrigin: Managed by machine-config-operator
                   Version: 411.86.202206131434-0 (2022-06-13T14:37:45Z)

  e0746a6268898ad4761a04d8c531ee3a45250866d5c62bfeb8b0efc008ffb8e9
                   Version: 411.85.202205101201-0 (2022-05-10T12:05:02Z)

sh-4.4# rpm -qa kernel
kernel-4.18.0-372.9.1.el8.x86_64

Comment 7 errata-xmlrpc 2022-08-10 11:12:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069