Bug 2037098 - nmstate managed interfaces fail to consistently bring up managed second/third static interfaces on nodes, dhcp remains enabled
Status: CLOSED DUPLICATE of bug 1970021
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Assignee: Ben Nemec
QA Contact: Victor Voronkov
Reported: 2022-01-04 23:00 UTC by Will Russell
Modified: 2022-03-24 19:01 UTC (History)

Last Closed: 2022-01-07 15:53:04 UTC

Description Will Russell 2022-01-04 23:00:07 UTC
Description of problem:


Version-Release number of selected component (if applicable):
OCP 4.9.11
nmstate operator


How reproducible:
Nearly every time after a reboot

Steps to Reproduce:
1. Reboot the node.
2. Observe that the primary DHCP interface (NNCP-managed) comes online, while the second and third interfaces (NNCP-managed, static) flap or fail to configure.
3. The NNCP status reports Available and the NNCE reports READY, yet the interfaces are down in nmcli.
4. Toggle the interface state in the NNCP for the node to 'absent', then back to 'up'. The interfaces configure immediately and move to UP, and nmcli reports the connections online.
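
The toggle in step 4 can be sketched as the two desired states that would be applied in turn. This is only an illustration, not the customer's actual policy, which is not included in this report. The interface names come from the logs below; the IP addresses are hypothetical placeholders from documentation ranges, the helper names are made up for this sketch, and the field names follow the nmstate desired-state schema as I understand it.

```python
import json

# Hypothetical static addresses (documentation ranges); the real values
# are not given in this bug report.
STATIC = {"ens224": ("192.0.2.10", 24), "ens256": ("198.51.100.10", 24)}


def interface_state(name, state, address=None, prefix_length=None):
    """Build one nmstate-style interface entry (illustrative helper)."""
    entry = {"name": name, "type": "ethernet", "state": state}
    if state == "up" and address is not None:
        # Static IPv4 with DHCP explicitly disabled.
        entry["ipv4"] = {
            "enabled": True,
            "dhcp": False,
            "address": [{"ip": address, "prefix-length": prefix_length}],
        }
    return entry


def desired_state(interfaces):
    """Wrap interface entries the way a policy's desiredState expects them."""
    return {"interfaces": interfaces}


# First pass: mark the secondary interfaces absent.
absent = desired_state([interface_state(n, "absent") for n in STATIC])
# Second pass: bring them back up with their static configuration.
up = desired_state(
    [interface_state(n, "up", ip, plen) for n, (ip, plen) in STATIC.items()]
)

# JSON is also valid YAML, so either document could be pasted under
# spec.desiredState of a policy for a manual test.
print(json.dumps(absent, indent=2))
print(json.dumps(up, indent=2))
```

Applying the first document and then the second mirrors the manual toggle; it does not address why the static configuration is lost after a reboot.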

Actual results:
The NNCP does not consistently bring all connections up.


Expected results:

The NNCP should bring all connections up consistently.

Additional info:
- Theories:

- Could this be a race condition between NNCP and NetworkManager, with both trying to provision the interfaces?

Observed in logs:

~~~
Dec 27 20:38:04  NetworkManager[1436]: <info>  [1640637484.8484] device (ens224): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Dec 27 20:38:04  NetworkManager[1436]: <warn>  [1640637484.8492] device (ens224): Activation: failed for connection 'Wired connection 2'
Dec 27 20:38:04  NetworkManager[1436]: <info>  [1640637484.8492] device (ens256): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Dec 27 20:38:04  NetworkManager[1436]: <warn>  [1640637484.8498] device (ens256): Activation: failed for connection 'Wired connection 3'
Dec 27 20:38:04  NetworkManager[1436]: <info>  [1640637484.8508] device (ens224): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
Dec 27 20:38:04  NetworkManager[1436]: <info>  [1640637484.8611] dhcp4 (ens224): canceled DHCP transaction
Dec 27 20:38:04  NetworkManager[1436]: <info>  [1640637484.8611] dhcp4 (ens224): state changed timeout -> done
Dec 27 20:38:04  NetworkManager[1436]: <info>  [1640637484.8618] device (ens256): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
Dec 27 20:38:04  NetworkManager[1436]: <info>  [1640637484.8731] dhcp4 (ens256): canceled DHCP transaction
Dec 27 20:38:04  NetworkManager[1436]: <info>  [1640637484.8731] dhcp4 (ens256): state changed timeout -> done
~~~

On nodes that deployed successfully after reboot, the DHCP status for these interfaces is listed as 'dhcp: false'. On the node that fails to deploy, we see 'dhcp: true', and the flapping messages shown above repeat in the logs.
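
To check quickly which devices are affected on a given node, the journal excerpt above can be summarized per device. The following is a minimal sketch that assumes the NetworkManager log line format is exactly as shown in this report; the function and variable names are made up for illustration. A device that logs DHCP timeouts is still attempting DHCP, which should not happen on an interface meant to be static.

```python
import re
from collections import defaultdict

# Device state transitions, e.g.
#   device (ens224): state change: ip-config -> failed (reason '...'
STATE_RE = re.compile(
    r"device \((?P<dev>[^)]+)\): state change: (?P<old>[\w-]+) -> (?P<new>[\w-]+) "
    r"\(reason '(?P<reason>[^']*)'"
)
# DHCPv4 client state lines, e.g.
#   dhcp4 (ens224): state changed timeout -> done
DHCP_RE = re.compile(
    r"dhcp4 \((?P<dev>[^)]+)\): state changed (?P<old>\w+) -> (?P<new>\w+)"
)


def summarize(lines):
    """Count ip-config failures and DHCP timeouts per device."""
    summary = defaultdict(lambda: {"ip_config_failures": 0, "dhcp_timeouts": 0})
    for line in lines:
        m = STATE_RE.search(line)
        if m and m["old"] == "ip-config" and m["new"] == "failed":
            summary[m["dev"]]["ip_config_failures"] += 1
            continue
        m = DHCP_RE.search(line)
        if m and m["old"] == "timeout":
            summary[m["dev"]]["dhcp_timeouts"] += 1
    return dict(summary)


# A subset of the lines quoted in this report.
SAMPLE = [
    "Dec 27 20:38:04  NetworkManager[1436]: <info>  [1640637484.8484] device (ens224): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')",
    "Dec 27 20:38:04  NetworkManager[1436]: <warn>  [1640637484.8492] device (ens224): Activation: failed for connection 'Wired connection 2'",
    "Dec 27 20:38:04  NetworkManager[1436]: <info>  [1640637484.8492] device (ens256): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')",
    "Dec 27 20:38:04  NetworkManager[1436]: <info>  [1640637484.8611] dhcp4 (ens224): canceled DHCP transaction",
    "Dec 27 20:38:04  NetworkManager[1436]: <info>  [1640637484.8611] dhcp4 (ens224): state changed timeout -> done",
    "Dec 27 20:38:04  NetworkManager[1436]: <info>  [1640637484.8731] dhcp4 (ens256): state changed timeout -> done",
]

for dev, counts in sorted(summarize(SAMPLE).items()):
    print(dev, counts)
```

The same function could be fed the full journal output for NetworkManager from a failing node and from a healthy node to compare the two.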


Expanded description in next comment + additional logs in linked case

Comment 2 Ben Nemec 2022-01-05 22:14:11 UTC
Based on the fact that this is happening after reboot, I would say it is almost certainly https://bugzilla.redhat.com/show_bug.cgi?id=1970021 . There is a workaround in https://bugzilla.redhat.com/show_bug.cgi?id=1970021#c7 that provides a simple way to verify it is the same problem. Can you give that a try?

Based on the number of problems this behavior has caused, we're discussing a backport of the 4.10 fix, which should also fix this. The machine-config workaround is safe to use as well, though, and provides an immediate solution.

Comment 3 Will Russell 2022-01-05 22:52:30 UTC
Thanks Ben,

I'll have our customer give this a go. I've passed on the instructions and will report back with our results, but I have a good feeling about it!

Cheers,
~Will

