Bug 1162636 - bridged interface not coming up after suspend/resume
Summary: bridged interface not coming up after suspend/resume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 21
Hardware: All
OS: Linux
unspecified
unspecified
Target Milestone: ---
Assignee: Dan Williams
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-11-11 11:57 UTC by Jeff Layton
Modified: 2015-05-11 19:04 UTC (History)
8 users (show)

Fixed In Version: NetworkManager-1.0.2-1.fc22
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-05-11 19:04:28 UTC
Type: Bug


Attachments (Terms of Use)
debug log from a suspend/resume cycle affected by the bug (91.27 KB, text/plain)
2015-01-09 01:36 UTC, Adam Williamson
no flags Details
Another debug log for suspend/resume (127.59 KB, text/plain)
2015-01-09 12:56 UTC, Soeren Grunewald
no flags Details
Debug log for suspend/resume with debug-patch (0.9.10.1-1.1.git20150105) (57.90 KB, text/plain)
2015-01-09 16:16 UTC, Soeren Grunewald
no flags Details


Links
System ID Priority Status Summary Last Updated
GNOME Bugzilla 741687 Normal RESOLVED Bridge slave connection not nrought up on resume / wake from sleep 2020-03-25 19:55:06 UTC
GNOME Bugzilla 742675 Normal RESOLVED [PATCH] core: ensure manager state is updated on resume with connectivity checking enabled 2020-03-25 19:55:06 UTC

Description Jeff Layton 2014-11-11 11:57:56 UTC
On my main workstation, I have a the ethernet interface sitting inside a bridge, primarily so that my KVM guests can use it:

$ nmcli d
DEVICE  TYPE      STATE      CONNECTION  
br0     bridge    connected  br0         
eno1    ethernet  connected  br0 slave 1 
lo      loopback  unmanaged  --          

...whenever I suspend the box and later resume it, the interface always ends up staying down after the resume completes. Here are the NetworkManager logs from a suspend/resume cycle:

$ journalctl -t NetworkManager --since 6:45
-- Logs begin at Sat 2014-11-01 06:50:43 EDT, end at Tue 2014-11-11 06:49:46 EST. --
Nov 11 06:48:46 tlielax.poochiereds.net NetworkManager[1255]: <info> sleep requested (sleeping: no  enabled: yes)
Nov 11 06:48:46 tlielax.poochiereds.net NetworkManager[1255]: <info> sleeping...
Nov 11 06:48:46 tlielax.poochiereds.net NetworkManager[1255]: <info> NetworkManager state is now ASLEEP
Nov 11 06:48:54 tlielax.poochiereds.net NetworkManager[1255]: <info> (eno1): link disconnected (deferring action for 4 seconds)
Nov 11 06:48:54 tlielax.poochiereds.net NetworkManager[1255]: <info> (br0): link disconnected (deferring action for 4 seconds)
Nov 11 06:48:54 tlielax.poochiereds.net NetworkManager[1255]: <info> wake requested (sleeping: yes  enabled: yes)
Nov 11 06:48:54 tlielax.poochiereds.net NetworkManager[1255]: <info> waking up...
Nov 11 06:48:54 tlielax.poochiereds.net NetworkManager[1255]: <info> (eno1): device state change: activated -> unmanaged (reason 'sleeping') [100 10 37]
Nov 11 06:48:54 tlielax.poochiereds.net NetworkManager[1255]: <info> (eno1): deactivating device (reason 'sleeping') [37]
Nov 11 06:48:54 tlielax.poochiereds.net NetworkManager[1255]: <info> (eno1): enslaved to br0
Nov 11 06:49:05 tlielax.poochiereds.net NetworkManager[1255]: <info> (br0): detached bridge port eno1
Nov 11 06:49:05 tlielax.poochiereds.net NetworkManager[1255]: <info> (eno1): released from master br0
Nov 11 06:49:05 tlielax.poochiereds.net NetworkManager[1255]: <info> (eno1): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
Nov 11 06:49:06 tlielax.poochiereds.net NetworkManager[1255]: <info> (eno1): preparing device
Nov 11 06:49:06 tlielax.poochiereds.net NetworkManager[1255]: <info> (eno1): link disconnected (calling deferred action)
Nov 11 06:49:06 tlielax.poochiereds.net NetworkManager[1255]: <info> (br0): link disconnected (calling deferred action)
Nov 11 06:49:09 tlielax.poochiereds.net NetworkManager[1255]: <info> (eno1): link connected
Nov 11 06:49:09 tlielax.poochiereds.net NetworkManager[1255]: <info> (eno1): device state change: unavailable -> disconnected (reason 'carrier-changed') [20 30 40]
Nov 11 06:49:46 tlielax.poochiereds.net NetworkManager[1255]: <info> Connectivity check for uri 'https://fedoraproject.org/static/hotspot.txt' failed with 'Error resolving 'fedoraproject.org': Name or service not known'.
Nov 11 06:49:46 tlielax.poochiereds.net NetworkManager[1255]: <info> Connectivity check for uri 'https://fedoraproject.org/static/hotspot.txt' failed with 'Error resolving 'fedoraproject.org': Name or service not known'.

...and after that:

$ nmcli d
DEVICE  TYPE      STATE         CONNECTION 
br0     bridge    connected     br0        
eno1    ethernet  disconnected  --         
lo      loopback  unmanaged     --         

Do I need to configure something differently so that this comes up automatically after resume?

Comment 1 Jeff Layton 2014-11-11 12:02:45 UTC
Oh, and after resume if I run:

$ nmcli d connect eno1

...then the interface comes back. So I think the problem is that the slaved physical interface isn't coming back properly on resume?

Comment 2 ernest.beinrohr 2014-12-16 20:33:31 UTC
Same here, but on Fedora 20/64b. All connections are NM, even the bridge.

Comment 3 Luca Villa 2015-01-03 23:52:54 UTC
Same issue seen here after upgrading to F21 (NetworkManager-0.9.10.0-14.git20140704.fc21.x86_64).
This issue wasn't existing in F20 (NetworkManager-0.9.9.0-46.git20131003.fc20.x86_64).

My bridge master conf is:

$nmcli c show f9a8d75c-c4d9-459b-82df-4257c76d034f
connection.id:                          Bridge connection 1
connection.uuid:                        f9a8d75c-c4d9-459b-82df-4257c76d034f
connection.interface-name:              br0
connection.type:                        bridge
connection.autoconnect:                 yes
connection.timestamp:                   1420328674
connection.read-only:                   no
connection.permissions:                 
connection.zone:                        --
connection.master:                      --
connection.slave-type:                  --
connection.secondaries:                 
connection.gateway-ping-timeout:        0
...
bridge.interface-name:                  br0
bridge.mac-address:                     --
bridge.stp:                             no
bridge.priority:                        128
bridge.forward-delay:                   15
bridge.hello-time:                      2
bridge.max-age:                         20
bridge.ageing-time:                     300
...

It's slave ethernet device is:

$ nmcli c show a1dd88c4-31fa-4d19-bc58-3fdb3bf5229f
connection.id:                          br0 slave 1
connection.uuid:                        a1dd88c4-31fa-4d19-bc58-3fdb3bf5229f
connection.interface-name:              em1
connection.type:                        802-3-ethernet
connection.autoconnect:                 yes
connection.timestamp:                   1420328674
connection.read-only:                   no
connection.permissions:                 
connection.zone:                        --
connection.master:                      f9a8d75c-c4d9-459b-82df-4257c76d034f
connection.slave-type:                  bridge
connection.secondaries:                 
connection.gateway-ping-timeout:        0
802-3-ethernet.port:                    --
802-3-ethernet.speed:                   0
802-3-ethernet.duplex:                  --
802-3-ethernet.auto-negotiate:          yes
802-3-ethernet.mac-address:             54:EE:75:xx:yy:zz
802-3-ethernet.cloned-mac-address:      --
802-3-ethernet.mac-address-blacklist:   
802-3-ethernet.mtu:                     auto
...

Comment 4 Soeren Grunewald 2015-01-08 07:43:51 UTC
Same here using NetworkManager-0.9.10.1-1.git20150105.fc21.x86_64

$ brctl show 
bridge name	bridge id		STP enabled	interfaces
bridge0		8000.00224d99c72f	no		em1

$ nmcli c show f424c59f-e02c-4766-a3be-af9663a3f05e
connection.id:                          Bridge bridge0
connection.uuid:                        f424c59f-e02c-4766-a3be-af9663a3f05e
connection.interface-name:              bridge0
connection.type:                        bridge
connection.autoconnect:                 yes
connection.timestamp:                   1420702851
connection.read-only:                   no
connection.permissions:                 
connection.zone:                        --
connection.master:                      --
connection.slave-type:                  --
connection.secondaries:                 
connection.gateway-ping-timeout:        0
ipv4.method:                            auto
ipv4.dns:                               
ipv4.dns-search:                        
ipv4.addresses:                         
ipv4.routes:                            
ipv4.ignore-auto-routes:                no
ipv4.ignore-auto-dns:                   no
ipv4.dhcp-client-id:                    --
ipv4.dhcp-send-hostname:                yes
ipv4.dhcp-hostname:                     --
ipv4.never-default:                     no
ipv4.may-fail:                          yes
ipv6.method:                            ignore
ipv6.dns:                               
ipv6.dns-search:                        
ipv6.addresses:                         
ipv6.routes:                            
ipv6.ignore-auto-routes:                no
ipv6.ignore-auto-dns:                   no
ipv6.never-default:                     no
ipv6.may-fail:                          yes
ipv6.ip6-privacy:                       -1 (unknown)
ipv6.dhcp-hostname:                     --
bridge.interface-name:                  bridge0
bridge.mac-address:                     --
bridge.stp:                             no
bridge.priority:                        32768
bridge.forward-delay:                   15
bridge.hello-time:                      2
bridge.max-age:                         20
bridge.ageing-time:                     300
...

$ nmcli c show c42c7c12-b7a0-4642-9d7f-0c2203d3f491
connection.id:                          em1
connection.uuid:                        c42c7c12-b7a0-4642-9d7f-0c2203d3f491
connection.interface-name:              --
connection.type:                        802-3-ethernet
connection.autoconnect:                 yes
connection.timestamp:                   1420702851
connection.read-only:                   no
connection.permissions:                 
connection.zone:                        --
connection.master:                      bridge0
connection.slave-type:                  bridge
connection.secondaries:                 
connection.gateway-ping-timeout:        0
802-3-ethernet.port:                    --
802-3-ethernet.speed:                   0
802-3-ethernet.duplex:                  --
802-3-ethernet.auto-negotiate:          yes
802-3-ethernet.mac-address:             00:22:4D:99:C7:2F
802-3-ethernet.cloned-mac-address:      --
802-3-ethernet.mac-address-blacklist:   
802-3-ethernet.mtu:                     auto
802-3-ethernet.s390-subchannels:        
802-3-ethernet.s390-nettype:            --
802-3-ethernet.s390-options:

Comment 5 Soeren Grunewald 2015-01-08 07:45:19 UTC
$ sudo systemctl restart NetworkManager.service

Solves the issue after resume

Comment 6 Adam Williamson 2015-01-08 19:08:12 UTC
I'm seeing this too and have been for some time, I'd reported it privately to dcbw but he's been too busy to fully look into it yet.

Comment 7 Dan Williams 2015-01-08 22:49:47 UTC
Thanks for testing the latest version Soeren.  I tested both 1.0 and 0.9.10 and couldn't easily reproduce, so there's something else going on here.

Could somebody with this issue run:

sudo nmcli g log level debug

and then try to reproduce?  Then grab 'journalctl -b -u NetworkManager' and lets see what's going on.  Adam sent me a log from a while back but it doesn't appear detailed enough to debug what's really going on.

To turn off debugging, "sudo nmcli g log level info".

Comment 8 Dan Williams 2015-01-08 23:39:27 UTC
Alternatively, try this scratch build with extra debugging information:

http://koji.fedoraproject.org/koji/taskinfo?taskID=8564355

and grab the output of 'journalctl -b -u NetworkManager' when the problem occurs.  Thanks!

Comment 9 Adam Williamson 2015-01-09 01:36:20 UTC
Created attachment 977982 [details]
debug log from a suspend/resume cycle affected by the bug

I went for the "sudo nmcli g log level debug" option as I'm on Rawhide. I ran that command, then suspended at 17:29, resumed at 17:31, and did 'sudo nmcli con up "System em1"' - which brings the connection up, for me - at 17:33. Then I turned debugging off again. Here's the log extract covering that time.

Comment 10 Dan Williams 2015-01-09 04:35:50 UTC
Scratch build archived here in case it gets garbage collected:

http://people.redhat.com/dcbw/NetworkManager/rh1162636/

Comment 11 Soeren Grunewald 2015-01-09 12:56:59 UTC
Created attachment 978136 [details]
Another debug log for suspend/resume

1) Enable debug
2) suspend
3) resume
4) use "nmcli c up "Bridge brigde0""
5) use "nmcli c up em1"

Comment 12 Dan Williams 2015-01-09 15:24:00 UTC
It looks like the debug logs won't have all the information I need.  Would anyone mind installing the RPMs linked above (which are the latest version in F21 + the debug patch) and reproduce the issue?

Comment 13 Soeren Grunewald 2015-01-09 16:16:56 UTC
Created attachment 978263 [details]
Debug log for suspend/resume with debug-patch (0.9.10.1-1.1.git20150105)

a) Install packages
   $ sudo yum localinstall NetworkManager-0.9.10.1-1.1.git20150105.fc21.x86_64.rpm  NetworkManager-config-connectivity-fedora-0.9.10.1-1.1.git20150105.fc21.x86_64.rpm NetworkManager-glib-0.9.10.1-1.1.git20150105.fc21.x86_64.rpm NetworkManager-wifi-0.9.10.1-1.1.git20150105.fc21.x86_64.rpm
   $ rpm -qa | grep NetworkManager | sort
   NetworkManager-0.9.10.1-1.1.git20150105.fc21.x86_64
   NetworkManager-config-connectivity-fedora-0.9.10.1-1.1.git20150105.fc21.x86_64
   NetworkManager-glib-0.9.10.1-1.1.git20150105.fc21.x86_64
   NetworkManager-openvpn-0.9.9.0-3.git20140128.fc21.x86_64
   NetworkManager-openvpn-gnome-0.9.9.0-3.git20140128.fc21.x86_64
   NetworkManager-vpnc-0.9.9.0-6.git20140428.fc21.x86_64
   NetworkManager-vpnc-gnome-0.9.9.0-6.git20140428.fc21.x86_64
   NetworkManager-wifi-0.9.10.1-1.1.git20150105.fc21.x86_64

b) Restart NetworkManager
   $ sudo systemctl restart NetworkManager

c) Do the test
   1) sudo nmcli g log level debug
   2) suspend
   3) resume
   4) nmcli c up em1
   5) sudo nmcli g log level warn

Comment 14 Dan Williams 2015-01-09 17:31:26 UTC
Thanks, the issue is:

NetworkManager[21238]: <info>  (em1): device state change: unavailable -> disconnected (reason 'carrier-changed') [20 30 40]
NetworkManager[21238]: <info>  #### (em1): state 20 -> 30 (reason 40)
NetworkManager[21238]: <info>  #### (em1): processing DISCONNECTED state
NetworkManager[21238]: <debug> [1420819973.802712] [nm-policy.c:1179] reset_autoconnect_all(): Re-enabling autoconnect for all connections on em1
NetworkManager[21238]: <info>  #### (em1): scheduling activate check
NetworkManager[21238]: <info>  #### (em1): trying to schedule activation
NetworkManager[21238]: <info>  #### (em1):   manager STATE_ASLEEP

Comment 15 Dan Williams 2015-01-09 22:03:03 UTC
Koji build with candidate fix:

http://koji.fedoraproject.org/koji/taskinfo?taskID=8575860

Comment 16 Adam Williamson 2015-01-09 22:09:55 UTC
can you do a Rawhide build too? Thanks!

Comment 17 Dan Williams 2015-01-09 22:17:57 UTC
Also uploaded here:

http://people.redhat.com/dcbw/NetworkManager/rh1162636/

(release is -1.2)

Comment 18 Dan Williams 2015-01-09 22:21:11 UTC
Rawhide koji scratch build: http://koji.fedoraproject.org/koji/taskinfo?taskID=8575939

Comment 19 Adam Williamson 2015-01-09 22:35:18 UTC
first try with the rawhide scratch build works, shipit!

Comment 20 Luca Villa 2015-01-12 08:54:17 UTC
A quick test on F21 here seems to confirm the fix works. Thanks!

Comment 21 Adam Williamson (Fedora) 2015-01-12 16:07:06 UTC
It's a lot better, but I do seem to be running into a slightly different variant where I have to restart NetworkManager.service on resume to make the network come up, *sometimes* (maybe only after a long suspend, like overnight). I'll try and keep an eye on that and open a new bug.

Comment 22 Soeren Grunewald 2015-01-14 07:58:09 UTC
Works for me as well.

Comment 23 Jeff Layton 2015-01-14 12:34:09 UTC
Sorry for the delay on testing this, but I was travelling last week and didn't have access to the machine where I was seeing this. Yes, this package also seems to fix the problem for me. That said, I haven't tested it with a longer suspend cycle yet, so I can't confirm whether I've seen the same problem that Adam has.

Comment 24 Adam Williamson 2015-01-29 17:17:25 UTC
Dan: can you send the fix to Rawhide, at least, since it's now well-tested? -3 has since come out and superseded the scratch build...

Comment 25 Adam Williamson 2015-01-29 22:33:25 UTC
The fix seems to have been committed to NM upstream:

http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=a687d1f9e0f75b987f40335934b54aa748f6724b

I'm gonna go ahead and backport it to the Rawhide package at least, as it's annoying the pants off me. Dan, it'd be good if it could be backported to F21 too?

Comment 27 Adam Williamson 2015-01-29 22:42:03 UTC
oops, this is filed for f21, so I'll just leave it ASSIGNED for now.

Comment 28 Fedora Update System 2015-05-05 13:13:40 UTC
NetworkManager-1.0.2-1.fc22,network-manager-applet-1.0.2-1.fc22,NetworkManager-openconnect-1.0.2-1.fc22,NetworkManager-openvpn-1.0.2-1.fc22,NetworkManager-vpnc-1.0.2-1.fc22,NetworkManager-openswan-1.0.2-1.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/NetworkManager-1.0.2-1.fc22,network-manager-applet-1.0.2-1.fc22,NetworkManager-openconnect-1.0.2-1.fc22,NetworkManager-openvpn-1.0.2-1.fc22,NetworkManager-vpnc-1.0.2-1.fc22,NetworkManager-openswan-1.0.2-1.fc22

Comment 29 Lubomir Rintel 2015-05-06 09:22:22 UTC
That this is fixed in f21, the patch is present in NetworkManager-0.9.10.2-2.fc21

Comment 30 Fedora Update System 2015-05-10 23:50:38 UTC
Package NetworkManager-1.0.2-1.fc22, NetworkManager-openconnect-1.0.2-1.fc22, NetworkManager-vpnc-1.0.2-1.fc22, network-manager-applet-1.0.2-1.fc22, NetworkManager-openvpn-1.0.2-1.fc22, NetworkManager-openswan-1.0.2-1.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing NetworkManager-1.0.2-1.fc22 NetworkManager-openconnect-1.0.2-1.fc22 NetworkManager-vpnc-1.0.2-1.fc22 network-manager-applet-1.0.2-1.fc22 NetworkManager-openvpn-1.0.2-1.fc22 NetworkManager-openswan-1.0.2-1.fc22'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-7767/NetworkManager-1.0.2-1.fc22,network-manager-applet-1.0.2-1.fc22,NetworkManager-openconnect-1.0.2-1.fc22,NetworkManager-openvpn-1.0.2-1.fc22,NetworkManager-vpnc-1.0.2-1.fc22,NetworkManager-openswan-1.0.2-1.fc22
then log in and leave karma (feedback).

Comment 31 Fedora Update System 2015-05-11 19:04:28 UTC
NetworkManager-1.0.2-1.fc22, NetworkManager-openconnect-1.0.2-1.fc22, NetworkManager-vpnc-1.0.2-1.fc22, network-manager-applet-1.0.2-1.fc22, NetworkManager-openvpn-1.0.2-1.fc22, NetworkManager-openswan-1.0.2-1.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.