Bug 1242171 - Network not fully re-established after reboot
Summary: Network not fully re-established after reboot
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: openvswitch
Version: 22
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Panu Matilainen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-07-11 21:15 UTC by Carlos Guidugli
Modified: 2016-07-19 20:32 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-19 20:32:47 UTC
Type: Bug
Embargoed:


Attachments
Interfaces configuration (10.00 KB, application/x-tar)
2015-07-11 21:15 UTC, Carlos Guidugli
journalctl --full -a -b -1 (657.25 KB, text/plain)
2015-08-08 23:44 UTC, Carlos Guidugli
More troubleshoot steps (54.88 KB, text/plain)
2015-09-26 18:19 UTC, Carlos Guidugli

Description Carlos Guidugli 2015-07-11 21:15:24 UTC
Created attachment 1050918 [details]
Interfaces configuration

Description of problem:
When I reboot my desktop or server (both running Fedora 22), the network is not 100% functional.
I noticed, on shutdown, messages like this:
=============Start==============
Jul 11 17:42:54 matrix.creuzo.net network[4012]: Shutting down interface bond0:  Failed to start openvswitch-nonetwork.service: Transaction is destructive.
Jul 11 17:42:54 matrix.creuzo.net python[4123]: detected unhandled Python exception in '/usr/bin/firewall-cmd'
Jul 11 17:42:54 matrix.creuzo.net python[4123]: can't communicate with ABRT daemon, is it running? [Errno 111] Connection refused
Jul 11 17:42:54 matrix.creuzo.net ovs-vsctl[4132]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-port ovsbr0 bond0
Jul 11 17:42:54 matrix.creuzo.net network[4012]: ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
Jul 11 17:42:54 matrix.creuzo.net network[4012]: [  OK  ]
Jul 11 17:42:55 matrix.creuzo.net python[4208]: detected unhandled Python exception in '/usr/bin/firewall-cmd'
Jul 11 17:42:55 matrix.creuzo.net python[4208]: can't communicate with ABRT daemon, is it running? [Errno 111] Connection refused
==============End===============

The next startup is then messy because the system did not properly clean up the interfaces on shutdown:

=============Start==============
Jul 11 17:43:49 matrix.creuzo.net network[1266]: Bringing up loopback interface:  [  OK  ]
Jul 11 17:43:49 matrix.creuzo.net network[1266]: Bringing up interface bond0:  Error: either "dev" is duplicate, or "ovsbr0" is a garbage.
Jul 11 17:43:49 matrix.creuzo.net network[1266]: Error: either "dev" is duplicate, or "ovsbr0" is a garbage.
Jul 11 17:43:49 matrix.creuzo.net network[1266]: cat: /sys/class/net/enp0s25: Is a directory
Jul 11 17:43:49 matrix.creuzo.net network[1266]: cat: ovsbr0/ifindex: No such file or directory
Jul 11 17:43:49 matrix.creuzo.net network[1266]: /etc/sysconfig/network-scripts/ifup-eth: line 296: 1000 + : syntax error: operand expected (error token is "+ ")
Jul 11 17:43:50 matrix.creuzo.net network[1266]: ERROR    : [/etc/sysconfig/network-scripts/ifup-aliases] Missing config file ovsbr0.
Jul 11 17:43:50 matrix.creuzo.net /etc/sysconfig/network-scripts/ifup-aliases[1938]: Missing config file ovsbr0.
Jul 11 17:43:50 matrix.creuzo.net ovs-vsctl[1944]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --fake-iface add-bond ovsbr0 bond0 enp5s0 enp0s25 bond_mode=balance-tcp lacp=active
Jul 11 17:43:50 matrix.creuzo.net ovs-vsctl[1944]: ovs|00002|vsctl|ERR|cannot create a port named bond0 because a port named bond0 already exists on bridge ovsbr0
Jul 11 17:43:50 matrix.creuzo.net network[1266]: ovs-vsctl: cannot create a port named bond0 because a port named bond0 already exists on bridge ovsbr0
==============End===============

Version-Release number of selected component (if applicable):

After rebooting, when I run systemctl restart network, everything works fine and no error messages are shown.

How reproducible:
100%

Steps to Reproduce:
1. Use the attached bond configuration and reboot. The problem happens on both my workstation and my server.


Additional info:

Comment 1 Flavio Leitner 2015-07-31 00:52:09 UTC
Could you delete bond0 port from the bridge and reboot?

Comment 2 Carlos Guidugli 2015-08-02 18:07:11 UTC
Hi Flavio, 

Deleting the bond0 port (ovs-vsctl del-port bond0) and rebooting makes the network come up fine after the reboot. Similarly, if I run "systemctl stop network" and then reboot, the network is fine afterwards.

I imagine that, during shutdown, the openvswitch service cannot delete the necessary ports because some service is already down, which then causes the errors after reboot.

Thanks

Comment 3 Flavio Leitner 2015-08-04 14:14:00 UTC
It seems there is something else happening on your system, and I don't understand why openvswitch is unable to delete ports.  The only service required to delete ports is ovsdb-server, which is the openvswitch service.

Can you still reproduce the issue?

Comment 4 Carlos Guidugli 2015-08-06 03:47:55 UTC
Unfortunately, this has always happened since Fedora 21, when I started to use bonding. It happens on my server and desktop (both have similar configurations), but I did not have the problem when I tried Debian a few months ago.

These are the messages during shutdown:

Aug 02 14:58:02 matrix.creuzo.net network[4378]: Shutting down interface enp5s0:  [  OK  ]
Aug 02 14:58:03 matrix.creuzo.net network[4378]: Shutting down interface ovsbr0:  Failed to start openvswitch-nonetwork.service: Transaction is destructive.
Aug 02 14:58:03 matrix.creuzo.net python[4559]: detected unhandled Python exception in '/usr/bin/firewall-cmd'
Aug 02 14:58:03 matrix.creuzo.net python[4559]: can't communicate with ABRT daemon, is it running? [Errno 111] Connection refused
Aug 02 14:58:03 matrix.creuzo.net network[4378]: ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
Aug 02 14:58:03 matrix.creuzo.net network[4378]: [  OK  ]
Aug 02 14:58:03 matrix.creuzo.net network[4378]: Shutting down interface ovsbr1:  Failed to start openvswitch-nonetwork.service: Transaction is destructive.
Aug 02 14:58:03 matrix.creuzo.net python[4635]: detected unhandled Python exception in '/usr/bin/firewall-cmd'
Aug 02 14:58:03 matrix.creuzo.net python[4635]: can't communicate with ABRT daemon, is it running? [Errno 111] Connection refused
Aug 02 14:58:03 matrix.creuzo.net ovs-vsctl[4639]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-br ovsbr11
Aug 02 14:58:03 matrix.creuzo.net network[4378]: ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
Aug 02 14:58:03 matrix.creuzo.net network[4378]: [  OK  ]
Aug 02 14:58:03 matrix.creuzo.net network[4378]: Shutting down interface ovsbr50:  Failed to start openvswitch-nonetwork.service: Transaction is destructive.
Aug 02 14:58:03 matrix.creuzo.net python[4714]: detected unhandled Python exception in '/usr/bin/firewall-cmd'
Aug 02 14:58:03 matrix.creuzo.net python[4714]: can't communicate with ABRT daemon, is it running? [Errno 111] Connection refused
Aug 02 14:58:03 matrix.creuzo.net ovs-vsctl[4718]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-br ovsbr50
Aug 02 14:58:03 matrix.creuzo.net network[4378]: ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
Aug 02 14:58:03 matrix.creuzo.net network[4378]: [  OK  ]
Aug 02 14:58:03 matrix.creuzo.net network[4378]: Shutting down interface ovsbr55:  Failed to start openvswitch-nonetwork.service: Transaction is destructive.
Aug 02 14:58:03 matrix.creuzo.net python[4790]: detected unhandled Python exception in '/usr/bin/firewall-cmd'
Aug 02 14:58:03 matrix.creuzo.net python[4790]: can't communicate with ABRT daemon, is it running? [Errno 111] Connection refused
Aug 02 14:58:03 matrix.creuzo.net ovs-vsctl[4794]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-br ovsbr55
Aug 02 14:58:03 matrix.creuzo.net network[4378]: ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
Aug 02 14:58:03 matrix.creuzo.net network[4378]: [  OK  ]
Aug 02 14:58:03 matrix.creuzo.net network[4378]: Shutting down loopback interface:  [  OK  ]

Comment 5 Flavio Leitner 2015-08-06 14:24:12 UTC
The ifdown executed during shutdown makes sure that the openvswitch service is running, but for some reason it is not running on your system and it cannot be started: "Failed to start openvswitch-nonetwork.service: Transaction is destructive".

The errors seem to have started before the beginning of the posted log, so I can't really see what is going on.

I also see:
python[4559]: detected unhandled Python exception in '/usr/bin/firewall-cmd'
which isn't related to OVS or bond apparently

and this:
python[4635]: can't communicate with ABRT daemon, is it running? [Errno 111] Connection refused
The same thing.

Can you provide the full journalctl output from the previous boot reproducing the issue at the shutdown?

Thanks
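
To narrow a full journal down to the failing ifdown calls, a small filter can help. This is only a sketch: the function name failed_ifdown is made up for illustration, and it assumes the message wording seen in the logs in this report.

```shell
#!/bin/sh
# Print each interface whose shutdown hit the "Transaction is destructive"
# error. Reads journal text on stdin. The function name is illustrative.
failed_ifdown() {
    sed -n 's/.*Shutting down interface \([^:]*\):.*Failed to start openvswitch-nonetwork\.service: Transaction is destructive.*/\1/p' | sort -u
}

# Typical use against the previous boot:
#   journalctl --full -a -b -1 | failed_ifdown
```

If the list covers every OVS bridge and port, that points at openvswitch already being stopped by the time the network scripts run.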

Comment 6 Carlos Guidugli 2015-08-08 23:44:33 UTC
Created attachment 1060662 [details]
journalctl --full -a -b -1

Comment 7 Flavio Leitner 2015-08-26 17:44:00 UTC
Unfortunately that output didn't provide anything new.
Could you also provide the outputs of:
# systemctl status -l openvswitch
# systemctl status -l openvswitch-nonetwork 
# systemctl
# rpm -Va

and what happens if you:
// stop networking
# systemctl stop network
// stop openvswitch
# systemctl stop openvswitch
// start openvswitch
# systemctl start openvswitch
// start network again
# systemctl start network

The above should reproduce the issue and then we might be able to get info from it.

Thanks

Comment 8 Carlos Guidugli 2015-09-26 18:19:51 UTC
Created attachment 1077545 [details]
More troubleshoot steps

Comment 9 Carlos Guidugli 2015-09-26 18:22:20 UTC
Hi Flavio, 

The last uploaded file shows the output of the commands you asked me to run. Please note that the problem only happens after a reboot. You can see that the first time I run ifconfig, it shows only one physical interface active for the bond. After restarting the network, ifconfig shows both interfaces. So, after each reboot, I manually run systemctl restart network to fix the problem.

Comment 10 Carlos Guidugli 2015-09-27 22:06:40 UTC
Hi,

Apparently I was able to solve the problem by editing  /usr/lib/systemd/system/openvswitch-nonetwork.service and adding the following line:
Before=network.service

Adding this option eliminates errors (like sample below) during shutdown.

Sep 27 17:34:40 matrix.creuzo.net network[4630]: Shutting down interface enp5s0:  Failed to start openvswitch-nonetwork.service: Transaction is destructive.
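
Files under /usr/lib/systemd/system belong to the package and are replaced on update, so the same ordering line could instead live in a systemd drop-in. The sketch below is one possible way to do that, not a tested fix for this bug: the function name and the file name 10-before-network.conf are made up for illustration, and /etc/systemd/system is assumed as the administrator's unit directory.

```shell
#!/bin/sh
# Write a systemd drop-in that orders openvswitch-nonetwork.service before
# network.service. The function name and the file name 10-before-network.conf
# are illustrative and not part of any package.
write_ovs_ordering_dropin() {
    # $1: systemd unit directory (assumed default: /etc/systemd/system)
    unit_dir=${1:-/etc/systemd/system}
    dropin_dir="$unit_dir/openvswitch-nonetwork.service.d"
    mkdir -p "$dropin_dir" || return 1
    printf '[Unit]\nBefore=network.service\n' > "$dropin_dir/10-before-network.conf"
}

# As root, then reload systemd so it picks up the new file:
#   write_ovs_ordering_dropin && systemctl daemon-reload
```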

Comment 11 Flavio Leitner 2015-10-08 19:16:05 UTC
I see some errors in the start up script:
Sep 25 11:54:33 matrix.creuzo.net systemd[1]: Starting LSB: Bring up/down networking...
Sep 25 11:54:33 matrix.creuzo.net network[1300]: Bringing up loopback interface:  [  OK  ]
Sep 25 11:54:34 matrix.creuzo.net network[1300]: Bringing up interface bond0:  Error: either "dev" is duplicate, or "ovsbr0@NONE" is a garbage.
Sep 25 11:54:34 matrix.creuzo.net network[1300]: Error: either "dev" is duplicate, or "ovsbr0@NONE" is a garbage.
Sep 25 11:54:34 matrix.creuzo.net network[1300]: cat: /sys/class/net/enp0s25: Is a directory

This indicates there is a problem in your ifcfg-bond0 file.  I looked at the one attached here and I don't see any issues. Have you changed the file?

I don't see why you need to add that line.  The openvswitch-nonetwork has 'nonetwork' in its name exactly because it doesn't depend on 'network' service.

Also, when 'network' tries to bring up the bond0 interface, it will first bring up the OVS bridge, then its ports.  In any ifup, 'ifup-ovs' will start openvswitch-nonetwork.service.

Maybe the error above leaves the service in a failed state?

If you think the ifcfg- files are okay, then add the following line:
set -x
to the second line of these scripts:
/etc/sysconfig/network-scripts/ifup
/etc/sysconfig/network-scripts/ifup-ovs
/etc/sysconfig/network-scripts/ifup-eth

That will output debug info, which might be recorded in the journal for us to understand better what is going on.

fbl
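
The set -x edit to the three scripts named above can be scripted so that it is easy to undo. A minimal sketch, assuming each script starts with a shebang line; the function name add_trace is made up for illustration.

```shell
#!/bin/sh
# Insert "set -x" as the second line of a script, keeping a .bak copy.
# The function name is illustrative.
add_trace() {
    script=$1
    # Skip files that already have tracing on line 2.
    [ "$(sed -n '2p' "$script")" = "set -x" ] && return 0
    cp -p "$script" "$script.bak" || return 1
    # Redirecting into the existing file keeps its mode and ownership.
    awk 'NR == 1 { print; print "set -x"; next } { print }' "$script.bak" > "$script"
}

# As root:
#   for s in ifup ifup-ovs ifup-eth; do
#       add_trace "/etc/sysconfig/network-scripts/$s"
#   done
```

Copying each .bak file back over its script removes the tracing again.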

Comment 12 Carlos Guidugli 2015-10-16 22:28:37 UTC
The problem was related to service ordering. During shutdown, the openvswitch service was already down, so the ifdown scripts could not clean up or remove the interfaces. Somehow, entries with strange names were created (for example, ovsbr0@NONE).

If I run systemctl stop network before shutdown, the startup scripts run fine. If the system is up and I restart the network service, it also works.

When I added the line "Before=network.service" to /usr/lib/systemd/system/openvswitch-nonetwork.service, openvswitch-nonetwork shuts down only after the network scripts have been executed.

After that, I no longer see any errors during shutdown or startup. Would it be possible to update the package to reflect this solution?
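
This works because systemd stops units in the reverse of their start order: a unit listed as Before=network.service is stopped after network.service. Whether the ordering is actually in effect can be checked from the unit's properties. The sketch below parses the output of systemctl show; the function name ordered_before_network is made up for illustration.

```shell
#!/bin/sh
# Succeed if the "Before=" line on stdin lists network.service.
# The function name is illustrative.
ordered_before_network() {
    grep '^Before=' | tr '= ' '\n\n' | grep -qx 'network\.service'
}

# Typical use:
#   systemctl show -p Before openvswitch-nonetwork.service | ordered_before_network \
#       && echo "ordering in place"
```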

Comment 13 Fedora End Of Life 2016-07-19 20:32:47 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

