Bug 1853750 - NM OVS: Fails to DHCP correctly on reboot
Summary: NM OVS: Fails to DHCP correctly on reboot
Keywords:
Status: CLOSED DUPLICATE of bug 1852106
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 8.0
Assignee: NetworkManager Development Team
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-03 19:26 UTC by Tim Rozet
Modified: 2020-07-29 13:27 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-29 13:27:16 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
journal logs (4.64 MB, application/gzip)
2020-07-03 19:27 UTC, Tim Rozet

Description Tim Rozet 2020-07-03 19:26:34 UTC
Description of problem:
It looks like there is a race condition when the server boots and OVS is configured with a port using DHCP (the internal port br-ex). Sometimes the interface gets its IP address and other times it does not, causing NetworkManager-wait-online to fail. Once that happens, the interface never obtains a DHCP lease, even when retrying 5 minutes later. I'm not sure if this is similar to:
https://bugzilla.redhat.com/show_bug.cgi?id=1852106

with perhaps a different trigger, where the interface is using the wrong MAC. I tried using the fix for 1852106, but it did not fix this issue. I have no access to the server while the problem is happening; I can only reboot it and log in afterwards. I will attach the journal with trace-level debugging.

Version-Release number of selected component (if applicable):
NetworkManager-libnm-1.22.8-4.el8.x86_64
NetworkManager-ovs-1.22.8-4.el8.x86_64
NetworkManager-1.22.8-4.el8.x86_64
NetworkManager-team-1.22.8-4.el8.x86_64
NetworkManager-tui-1.22.8-4.el8.x86_64

How reproducible:
50% of the time

Steps to Reproduce:
1. Use the script located here to configure NetworkManager via nmcli: https://github.com/openshift/machine-config-operator/pull/1860/files#diff-8197d7b27eb62b1ba2b9b44460cd3238
2. Reboot the server.
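
A minimal sketch for telling a good boot from a bad one after step 2, assuming iproute2 with JSON output (`ip -j`) is available on the host. The helper names and the sample address below are hypothetical and are not part of the linked script; the check only reports whether br-ex currently holds an IPv4 address.

```python
import json
import subprocess


def ipv4_addresses(ip_json):
    """Extract IPv4 addresses (as 'addr/prefix') from `ip -j addr show` output."""
    addrs = []
    # `ip -j` prints a JSON list with one object per interface; an empty
    # string or empty objects are treated as "no addresses".
    for iface in json.loads(ip_json or "[]"):
        for info in iface.get("addr_info", []):
            if info.get("family") == "inet":
                addrs.append(f"{info['local']}/{info['prefixlen']}")
    return addrs


def bridge_has_ipv4(dev="br-ex"):
    """Return True if the given device currently has at least one IPv4 address."""
    result = subprocess.run(
        ["ip", "-j", "-4", "addr", "show", "dev", dev],
        check=True,
        capture_output=True,
        text=True,
    )
    return bool(ipv4_addresses(result.stdout))


# Hypothetical sample of `ip -j` output, using a documentation address.
SAMPLE_IP_JSON = json.dumps([
    {
        "ifname": "br-ex",
        "addr_info": [
            {"family": "inet", "local": "192.0.2.10", "prefixlen": 24},
        ],
    }
])
```

Running `bridge_has_ipv4()` after each reboot and recording the result would give a rough count of how often the DHCP lease is missing.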

Comment 1 Tim Rozet 2020-07-03 19:27:59 UTC
Created attachment 1699884 [details]
journal logs

Comment 2 Tim Rozet 2020-07-03 19:31:54 UTC
I wonder if this is a race where br-ex comes up and tries DHCP before its dependent interface ens3 has link:
Jul 03 17:06:47 localhost NetworkManager[1360]: <info>  [1593796007.2618] device (ens3): Activation: connection 'ovs-port-phys0' enslaved, continuing activation
Jul 03 17:06:47 localhost NetworkManager[1360]: <info>  [1593796007.2620] device (ens3): state change: ip-config -> secondaries (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Jul 03 17:06:47 localhost NetworkManager[1360]: <info>  [1593796007.2624] device (br-ex): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Jul 03 17:06:47 localhost NetworkManager[1360]: <info>  [1593796007.2627] device (br-ex): Activation: connection 'ovs-port-br-ex' enslaved, continuing activation
Jul 03 17:06:47 localhost NetworkManager[1360]: <info>  [1593796007.2628] device (br-ex): state change: ip-config -> secondaries (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Jul 03 17:06:47 localhost NetworkManager[1360]: <info>  [1593796007.2631] device (br-ex): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Jul 03 17:06:47 localhost NetworkManager[1360]: <info>  [1593796007.2645] policy: set-hostname: set hostname to 'localhost.localdomain' (no default device)
Jul 03 17:06:47 localhost NetworkManager[1360]: <info>  [1593796007.2649] device (br-ex): Activation: successful, device activated.
Jul 03 17:06:47 localhost.localdomain systemd-hostnamed[1365]: Changed host name to 'localhost.localdomain'
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.2660] device (ens3): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.2680] device (ens3): Activation: connection 'ovs-if-phys0' enslaved, continuing activation
Jul 03 17:06:47 localhost.localdomain chronyd[1180]: Source 169.254.169.123 offline
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.3123] device (ens3): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.3141] device (ens3): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.3151] device (ens3): Activation: successful, device activated.
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.3157] device (br-ex): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.3165] device (br-ex): Activation: connection 'ovs-if-br-ex' enslaved, continuing activation
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.3167] device (br-ex): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Jul 03 17:06:47 localhost.localdomain kernel: ixgbevf 0000:00:03.0: NIC Link is Up 10 Gbps
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.3185] device (br-ex): Activation: successful, device activated.
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.3219] device (ens3): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.3222] device (ens3): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.3236] device (ens3): Activation: successful, device activated.
Jul 03 17:06:47 localhost.localdomain kernel: device br-ex entered promiscuous mode
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.3262] device (br-ex): carrier: link connected
Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  [1593796007.3274] dhcp4 (br-ex): activation: beginning transaction (timeout in 45 seconds)
Jul 03 17:06:47 localhost.localdomain systemd-udevd[1416]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jul 03 17:06:48 localhost.localdomain NetworkManager[1360]: <info>  [1593796008.2592] device (ens3): carrier: link connected
Jul 03 17:06:57 localhost.localdomain systemd[1]: NetworkManager-dispatcher.service: Consumed 116ms CPU time
Jul 03 17:07:16 localhost.localdomain systemd[1]: NetworkManager-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Jul 03 17:07:16 localhost.localdomain systemd[1]: NetworkManager-wait-online.service: Failed with result 'exit-code'.
Jul 03 17:07:16 localhost.localdomain systemd[1]: Failed to start Network Manager Wait Online.
Jul 03 17:07:16 localhost.localdomain systemd[1]: Dependency failed for Configures OVS with proper host networking configuration.
Jul 03 17:07:16 localhost.localdomain systemd[1]: ovs-configuration.service: Job ovs-configuration.service/start failed with result 'dependency'.
Jul 03 17:07:16 localhost.localdomain systemd[1]: NetworkManager-wait-online.service: Consumed 62ms CPU time
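
The ordering suspected above can be checked mechanically. The following is a rough sketch, assuming journal lines in the same format as the excerpt; the function names are hypothetical. It compares the NetworkManager timestamp of the first "dhcp4 (br-ex): activation: beginning transaction" line with that of the first "device (ens3): carrier: link connected" line.

```python
import re

# Matches NetworkManager journal lines such as:
#   ... NetworkManager[1360]: <info>  [1593796007.3274] dhcp4 (br-ex): ...
LINE_RE = re.compile(
    r"NetworkManager\[\d+\]: <\w+>\s+\[(?P<ts>\d+\.\d+)\] (?P<msg>.*)$"
)


def first_event(lines, needle):
    """Return the timestamp of the first NetworkManager message containing needle."""
    for line in lines:
        match = LINE_RE.search(line)
        if match and needle in match.group("msg"):
            return float(match.group("ts"))
    return None


def dhcp_started_before_carrier(lines, bridge="br-ex", uplink="ens3"):
    """True if DHCP began on the bridge before the uplink reported carrier.

    Returns None when either event is missing from the given lines.
    """
    dhcp = first_event(lines, f"dhcp4 ({bridge}): activation: beginning transaction")
    carrier = first_event(lines, f"device ({uplink}): carrier: link connected")
    if dhcp is None or carrier is None:
        return None
    return dhcp < carrier


# Two lines copied from the journal excerpt above.
SAMPLE = [
    "Jul 03 17:06:47 localhost.localdomain NetworkManager[1360]: <info>  "
    "[1593796007.3274] dhcp4 (br-ex): activation: beginning transaction "
    "(timeout in 45 seconds)",
    "Jul 03 17:06:48 localhost.localdomain NetworkManager[1360]: <info>  "
    "[1593796008.2592] device (ens3): carrier: link connected",
]
```

For the two sample lines, the DHCP transaction on br-ex starts roughly 0.93 seconds before ens3 reports carrier, which is consistent with the race described above but does not prove it is the cause.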


Another example with trace-level logs:

Jul 03 18:53:20 localhost.localdomain NetworkManager[1401]: <debug> [1593802400.8029] ndisc[0x561cb1217860,"br-ex"]: router solicitation sent
Jul 03 18:53:20 localhost.localdomain NetworkManager[1401]: <debug> [1593802400.8030] ndisc[0x561cb1217860,"br-ex"]: did not receive a router advertisement after 3 solicitations.
Jul 03 18:53:20 localhost.localdomain NetworkManager[1401]: <debug> [1593802400.8030] ndisc-lndp[0x561cb1217860,"br-ex"]: processing libndp events
Jul 03 18:53:20 localhost.localdomain systemd[1]: NetworkManager-dispatcher.service: Consumed 134ms CPU time
Jul 03 18:53:28 localhost.localdomain NetworkManager[1401]: <debug> [1593802408.4209] dhcp4 (br-ex): send DISCOVER to 255.255.255.255
Jul 03 18:53:39 localhost.localdomain systemd[1]: NetworkManager-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Jul 03 18:53:39 localhost.localdomain systemd[1]: NetworkManager-wait-online.service: Failed with result 'exit-code'.
Jul 03 18:53:39 localhost.localdomain systemd[1]: Failed to start Network Manager Wait Online

Comment 3 Tim Rozet 2020-07-29 13:27:16 UTC
The fix for 1852106 fixes the issue if the OVS interface is configured with the cloned MAC address.

*** This bug has been marked as a duplicate of bug 1852106 ***

