Bug 1627820

Summary: dracut configured bridge interface does not stay up
Product: Red Hat Enterprise Linux 8
Component: NetworkManager
Reporter: Orion Poplawski <orion>
Assignee: Beniamino Galvani <bgalvani>
QA Contact: Desktop QE <desktop-qa-list>
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 8.1
CC: atragler, bgalvani, fgiudici, fpokryvk, jmaxwell, lrintel, rkhan, sukulkar, thaller, tpelka, vbenes
Target Milestone: rc
Target Release: 8.1
Flags: pm-rhel: mirror+
Hardware: x86_64
OS: All
Fixed In Version: NetworkManager-1.25.1-1.el8
Last Closed: 2020-11-04 01:48:18 UTC
Type: Bug
Bug Depends On: 1626348, 1715493
Bug Blocks: 1701002, 1955571

Description Orion Poplawski 2018-09-11 14:59:32 UTC
Description of problem:

I'm using clevis to decrypt the root volume during boot, so I have to configure the bridge via the kernel command line:

bridge=br0:eth0

This creates:

# cat ifcfg-eth0
# Generated by dracut initrd
NAME="eth0"
TYPE=Ethernet
ONBOOT=yes
NETBOOT=yes
BRIDGE="br0"
UUID="14de37bb-5ac7-466b-b69c-aa7a461d77a9"
HWADDR="00:22:19:34:72:40"
# cat ifcfg-br0
# Generated by dracut initrd
NAME="br0"
DEVICE="br0"
ONBOOT=yes
NETBOOT=yes
UUID="ee7154ee-7bbe-4292-965a-42c0c38c4c94"
IPV6INIT=yes
BOOTPROTO=dhcp
TYPE=Bridge
NAME="br0"
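
For anyone inspecting the generated files, here is a minimal Python sketch that parses the ifcfg-br0 content shown above and reports keys that appear more than once (NAME is written twice in that file). The function name `parse_ifcfg` is hypothetical and the parsing is a simplification of shell-style KEY=VALUE syntax; it is not how NetworkManager's ifcfg-rh plugin reads these files.

```python
import shlex
from collections import Counter

# Content of ifcfg-br0 exactly as quoted in the report.
IFCFG_BR0 = '''\
# Generated by dracut initrd
NAME="br0"
DEVICE="br0"
ONBOOT=yes
NETBOOT=yes
UUID="ee7154ee-7bbe-4292-965a-42c0c38c4c94"
IPV6INIT=yes
BOOTPROTO=dhcp
TYPE=Bridge
NAME="br0"
'''


def parse_ifcfg(text):
    """Return (settings, duplicate_keys) for shell-style KEY=VALUE text."""
    settings = {}
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split removes surrounding quotes roughly the way a shell would.
        parts = shlex.split(value)
        settings[key] = parts[0] if parts else ""
        counts[key] += 1
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    return settings, duplicates


settings, duplicates = parse_ifcfg(IFCFG_BR0)
print(settings["TYPE"], settings["BOOTPROTO"], duplicates)
```

Both NAME entries carry the same value here, so the duplicate looks harmless on its own, but it is worth noting when comparing against a file written by NM.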

The network comes up just fine:

Sep 10 07:10:40 barry.cora.nwra.com dracut-initqueue[335]: dhcp: PREINIT br0 up
Sep 10 07:10:40 barry.cora.nwra.com dhclient[518]: DHCPDISCOVER on br0 to 255.255.255.255 port 67 interval 5 (xid=0x5c3313fa)
Sep 10 07:10:40 barry.cora.nwra.com dhclient[518]: DHCPREQUEST on br0 to 255.255.255.255 port 67 (xid=0x5c3313fa)
Sep 10 07:10:40 barry.cora.nwra.com dhclient[518]: DHCPOFFER from 10.10.10.2
Sep 10 07:10:40 barry.cora.nwra.com dhclient[518]: DHCPACK from 10.10.10.2 (xid=0x5c3313fa)
Sep 10 07:10:40 barry.cora.nwra.com dracut-initqueue[335]: dhcp: BOND setting br0
Sep 10 07:10:42 barry.cora.nwra.com dhclient[518]: bound to 10.10.20.7 -- renewal in 27368 seconds.

But eventually gets shut down:

Sep 11 01:10:41 barry.cora.nwra.com avahi-daemon[1193]: Withdrawing address record for 10.10.20.7 on br0.
Sep 11 01:10:41 barry.cora.nwra.com avahi-daemon[1193]: Leaving mDNS multicast group on interface br0.IPv4 with address 10.10.20.7.
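
A quick bit of arithmetic on the timestamps above, as a Python sketch. It assumes the year 2018 (syslog lines omit the year; 2018 is taken from the report date) and simply measures the time between the "bound to 10.10.20.7" line and the avahi withdrawal of the same address.

```python
from datetime import datetime

YEAR = 2018  # assumption: not present in the syslog lines, taken from the report date
RENEWAL_SECONDS = 27368  # from "bound to 10.10.20.7 -- renewal in 27368 seconds."


def parse_syslog_ts(stamp, year=YEAR):
    """Parse a 'Sep 10 07:10:42' style timestamp into a datetime."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


bound = parse_syslog_ts("Sep 10 07:10:42")      # dhclient: bound to 10.10.20.7
withdrawn = parse_syslog_ts("Sep 11 01:10:41")  # avahi: withdrawing 10.10.20.7

elapsed = withdrawn - bound
elapsed_seconds = int(elapsed.total_seconds())

print(elapsed_seconds, elapsed)
print("renewal time passed before withdrawal:", elapsed_seconds > RENEWAL_SECONDS)
```

The address disappears one second short of 18 hours after it was bound, long after the scheduled renewal time. That is consistent with a lease of roughly 18 hours expiring without a renewal, but the lease length itself is not in these logs, so treat that as a guess.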

Also strange:
Sep 10 10:43:49 barry.cora.nwra.com avahi-daemon[1193]: Registering new address record for 10.11.0.1 on br0.IPv4.
Sep 11 01:10:41 barry.cora.nwra.com avahi-daemon[1193]: Joining mDNS multicast group on interface br0.IPv4 with address 10.11.0.1.
Sep 11 08:37:06 barry.cora.nwra.com avahi-daemon[1193]: Withdrawing address record for 10.11.0.1 on br0.
Sep 11 08:37:06 barry.cora.nwra.com avahi-daemon[1193]: Leaving mDNS multicast group on interface br0.IPv4 with address 10.11.0.1.

No idea where 10.11.0.1 is coming from.

NetworkManager seems to recognize br0 at boot:
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3026] device (br0): carrier: link connected
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3036] manager: (br0): new Bridge device (/org/freedesktop/NetworkManager/Devices/2)
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3089] ifcfg-rh: add connection in-memory (f1815986-8a35-4b00-8fe7-103e5dfab869,"br0")
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3105] device (br0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3113] device (br0): state change: unavailable -> disconnected (reason 'connection-assumed', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3131] device (br0): Activation: starting connection 'br0' (f1815986-8a35-4b00-8fe7-103e5dfab869)
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3143] device (eth0): carrier: link connected
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3159] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/3)
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3188] ifcfg-rh: add connection in-memory (104263af-4b1e-4503-9a55-a529055fd24b,"eth0")
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3265] device (eth0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3277] device (eth0): state change: unavailable -> disconnected (reason 'connection-assumed', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3294] device (eth0): Activation: starting connection 'eth0' (104263af-4b1e-4503-9a55-a529055fd24b)
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3668] modem-manager: ModemManager available
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3708] device (br0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3766] device (eth0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3789] device (br0): state change: prepare -> config (reason 'none', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3794] device (eth0): state change: prepare -> config (reason 'none', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3807] device (br0): state change: config -> ip-config (reason 'none', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3815] device (br0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3821] device (eth0): state change: config -> ip-config (reason 'none', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3823] device (br0): bridge port eth0 was attached
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3823] device (eth0): Activation: connection 'eth0' enslaved, continuing activation
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3824] device (eth0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3844] device (br0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3849] device (eth0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3857] device (br0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3861] manager: NetworkManager state is now CONNECTED_LOCAL
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.4516] device (br0): Activation: successful, device activated.

but it doesn't appear to keep it going.
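
To make the NetworkManager log above easier to follow, here is a small Python sketch that extracts the "state change" transitions per device. The regular expression is written against the line format seen in this log only, and the sample contains just four of the quoted lines, so the result reflects that excerpt rather than the full boot.

```python
import re

# Matches lines such as:
#   device (br0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'external')
STATE_RE = re.compile(
    r"device \((?P<dev>[^)]+)\): state change: "
    r"(?P<old>\S+) -> (?P<new>\S+) "
    r"\(reason '(?P<reason>[^']*)', sys-iface-state: '(?P<sys>[^']*)'\)"
)

# Four lines copied from the log in this report.
SAMPLE = """\
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3105] device (br0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3844] device (br0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3849] device (eth0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'external')
Sep 10 07:10:50 barry.cora.nwra.com NetworkManager[1390]: <info>  [1536585050.3857] device (br0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'external')
"""


def collect_transitions(log_text):
    """Return {device: [(old_state, new_state), ...]} in log order."""
    result = {}
    for match in STATE_RE.finditer(log_text):
        result.setdefault(match.group("dev"), []).append(
            (match.group("old"), match.group("new"))
        )
    return result


transitions = collect_transitions(SAMPLE)
final_states = {dev: steps[-1][1] for dev, steps in transitions.items()}
sys_iface_states = {m.group("sys") for m in STATE_RE.finditer(SAMPLE)}

print(final_states)
print(sys_iface_states)
```

Every transition in the excerpt carries sys-iface-state 'external', which, as far as I understand NM, means the device was treated as configured outside NetworkManager.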

Version-Release number of selected component (if applicable):
NetworkManager-1.10.2-16.el7_5.x86_64
dracut-033-535.el7_5.1.x86_64

How reproducible:
Every boot.

Comment 2 Orion Poplawski 2018-09-11 15:00:58 UTC
It also seems like this problem started when I shifted to a bridged interface.  When I just had eth0, the network stayed up fine.

Comment 3 Orion Poplawski 2018-09-19 14:32:05 UTC
Another interesting difference/symptom - in /etc/resolv.conf it only adds the domain to the search list, not the extra search lists given by the DHCP server.  So I get in /etc/resolv.conf:

# Generated by NetworkManager
search cora.nwra.com

instead of:

# Generated by NetworkManager
search cora.nwra.com nwra.com ad.nwra.com

which is what I normally get when the interface is brought up by NM.
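
The difference between the two resolv.conf variants can be checked mechanically. A minimal Python sketch follows; the helper name `search_domains` is made up for illustration, and it relies on the resolv.conf rule that the last `search` line wins.

```python
OBSERVED = """\
# Generated by NetworkManager
search cora.nwra.com
"""

EXPECTED = """\
# Generated by NetworkManager
search cora.nwra.com nwra.com ad.nwra.com
"""


def search_domains(resolv_conf_text):
    """Return the search list of a resolv.conf text (last 'search' line wins)."""
    domains = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "search":
            domains = fields[1:]
    return domains


observed = search_domains(OBSERVED)
expected = search_domains(EXPECTED)
missing = [d for d in expected if d not in observed]

print("missing search domains:", missing)
```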

Comment 5 sushil kulkarni 2019-07-30 15:19:58 UTC
This depends on dracut patches (https://bugzilla.redhat.com/show_bug.cgi?id=1715493) that are not in yet. Moving this to 8.2. Taking it out of the RPL.

-Sushil

Comment 8 Lubomir Rintel 2019-09-02 15:12:43 UTC
Going to be solved with RHEL 8.2

Comment 9 sushil kulkarni 2019-10-01 15:20:37 UTC
Hello,

Just FYI: we do not plan to fix this issue in RHEL 7. Fixing it there would be a large effort and an undesirable change in behavior.

Thanks!
Sushil

Comment 10 Beniamino Galvani 2019-12-09 16:15:28 UTC
I tried to boot RHEL 7.7 and RHEL 8.1 systems with the 'bridge=br0:eth0 rd.neednet=1' kernel commandline and in both cases NetworkManager picked up the connections generated by dracut in /etc/sysconfig/network-scripts/ifcfg-{br0,eth0}. After the boot the DHCP lease was renewed upon expiry and resolv.conf contained the search domain pushed by DHCP. So, I'm unable to reproduce the problem reported in comment 0.

I also tried with static addresses on RHEL 8.1 following the procedure to hardcode the address into the initrd:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/configuring-automated-unlocking-of-encrypted-volumes-using-policy-based-decryption_system-design-guide

The system came up with the configured addresses and NM correctly took over that configuration. I didn't see any loss of connectivity (which would be a bit surprising because the addresses are static and there is no action needed by NM to maintain network connectivity).

Akhil, can you please set the logging level of NM to 'trace', reproduce the problem and attach journal logs of the boot (journalctl -b)? Please also disable systemd-journald ratelimiting by setting RateLimitIntervalSec=0 in /etc/systemd/journald.conf. Thanks.
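
For reference, the two settings requested above could look like the fragments printed by this Python sketch. The NetworkManager drop-in file name is hypothetical (any *.conf file under /etc/NetworkManager/conf.d should do), and the script only prints the fragments rather than writing to /etc. As far as I know, older systemd releases such as the one in RHEL 7 spell the journald option RateLimitInterval, so check journald.conf(5) on the affected system.

```python
# Hypothetical drop-in file name; the directory is the standard NM conf.d location.
NM_DROPIN_PATH = "/etc/NetworkManager/conf.d/95-trace-logging.conf"
JOURNALD_CONF_PATH = "/etc/systemd/journald.conf"


def nm_trace_fragment():
    """NetworkManager config fragment that raises the log level to TRACE."""
    return "[logging]\nlevel=TRACE\n"


def journald_fragment():
    """journald.conf fragment that disables rate limiting."""
    return "[Journal]\nRateLimitIntervalSec=0\n"


fragments = {
    NM_DROPIN_PATH: nm_trace_fragment(),
    JOURNALD_CONF_PATH: journald_fragment(),
}

for path, body in fragments.items():
    print(f"# {path}")
    print(body)
```

After applying the settings, NetworkManager and systemd-journald need to be restarted (or the machine rebooted) before reproducing the problem and collecting `journalctl -b`.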

Comment 11 Orion Poplawski 2019-12-09 21:54:48 UTC
FWIW - it seems that if I just boot my EL7.7 system but do not log into the desktop session (KDE in my case), the network stays up.  I'll try to reproduce on EL7.7 with the trace debugging on overnight.  I don't yet have an EL8 desktop machine to test with.

Comment 13 Orion Poplawski 2019-12-10 17:29:47 UTC
Well, the connection stayed up overnight, so perhaps this has been resolved.

Comment 16 sushil kulkarni 2020-02-06 15:13:31 UTC
Taking this off of the RPL. We will revisit in 8.3 once we hear back and if it needs any action from us.

-Sushil

Comment 17 Thomas Haller 2020-04-07 15:08:33 UTC
With comment 10, this might already be fixed on rhel-8.2 (with the legacy dracut network module).

In rhel-8.3, we will enable NetworkManager in initrd by default (bug 1626348), so the backstory of this issue changes entirely (but it should also be fixed).


Moving bug to MODIFIED, so that we can add it to rhel-8.3 errata (and test it).

Comment 18 Thomas Haller 2020-04-08 07:31:29 UTC
Dropping from RPL-8.3. This is already fixed upstream, and thus will automatically get in with the rebase in RHEL-8.3.

Comment 23 errata-xmlrpc 2020-11-04 01:48:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4499