Bug 1402960 - Upgrading OVS package or stopping/restarting OVS service makes host non-responsive
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-provider-ovn
Classification: oVirt
Component: driver
Version: unspecified
Hardware: x86_64
OS: Linux
low
high
Target Milestone: ---
: ---
Assignee: Marcin Mirecki
QA Contact: Mor
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-12-08 18:40 UTC by Mor
Modified: 2017-03-27 04:44 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-11 13:42:39 UTC
oVirt Team: Network


Attachments
logs (700.73 KB, application/octet-stream)
2016-12-11 08:41 UTC, Mor

Description Mor 2016-12-08 18:40:25 UTC
Description of problem:
As part of writing my automation, I install the OVN feature and its required dependencies in a deployed oVirt environment, run the OVN test cases, then remove the OVN feature and clean up any leftovers.

I use the following steps in order to deploy OVN in our environment:

On OVN central server:
----------------------
1. Stop firewall services: firewalld and iptables (BZ ticket: 1390938).
2. Install OVN dependencies: "openvswitch", "openvswitch-ovn-common", "python-openvswitch" (if already installed, try to upgrade them to the latest version).
3. Install OVN packages: "openvswitch-ovn-central", "ovirt-provider-ovn" (if already installed, try to upgrade them to the latest version).
4. Reload systemd daemon (I saw cases where systemd does not refresh the services list).
5. Start OVN provider service (ovirt-provider-ovn).
6. ***** Run tests *****
7. Stop OVN provider service and openvswitch (to avoid OVS package leftovers).
8. Remove all OVN related packages (as listed in step 3).

On OVN driver server:
---------------------
1. Stop firewall services: firewalld and iptables.
2. Install OVS related packages: "openvswitch", "openvswitch-ovn-common", "python-openvswitch" (if already installed, try to upgrade them to the latest version).
3. Install OVN related packages: "openvswitch-ovn-host", "ovirt-provider-ovn-driver" (if already installed, try to upgrade them).
4. Same as step 4 from OVN central.
5. Start OVN provider driver service (ovn-controller).
6. Configure OVN with vdsm-tool.
7. ***** Run tests *****
8. Stop OVN provider driver service and ovsdb-server (to avoid OVS package leftovers).
9. Remove all OVN related packages (as listed in step 3).
10. Remove OVN bridge interfaces from the host (the RPM does not remove the bridge on removal).
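
For reference, the driver-side steps above can be written down as an ordered command plan. This is only a sketch of my reading of the steps, not the actual automation: the helper names are made up, the exact systemctl/yum invocations are assumptions, and the "vdsm-tool ovn-config" arguments (central IP, local tunnel IP) are an assumption about how step 6 is invoked. Step 10 (bridge removal) is left out because the steps do not say how it is done. The code only builds the plan; it does not run anything.

```python
from typing import List

# Package names are taken from the steps above.
OVS_PACKAGES = ["openvswitch", "openvswitch-ovn-common", "python-openvswitch"]
OVN_DRIVER_PACKAGES = ["openvswitch-ovn-host", "ovirt-provider-ovn-driver"]


def driver_setup_plan(central_ip: str, tunnel_ip: str) -> List[List[str]]:
    """Return driver steps 1-6 as argv lists (hypothetical helper)."""
    return [
        ["systemctl", "stop", "firewalld", "iptables"],      # step 1
        ["yum", "-y", "install"] + OVS_PACKAGES,             # step 2
        ["yum", "-y", "upgrade"] + OVS_PACKAGES,             # step 2, upgrade if present
        ["yum", "-y", "install"] + OVN_DRIVER_PACKAGES,      # step 3
        ["yum", "-y", "upgrade"] + OVN_DRIVER_PACKAGES,      # step 3, upgrade if present
        ["systemctl", "daemon-reload"],                      # step 4
        ["systemctl", "start", "ovn-controller"],            # step 5
        # Step 6: argument order is an assumption.
        ["vdsm-tool", "ovn-config", central_ip, tunnel_ip],
    ]


def driver_teardown_plan() -> List[List[str]]:
    """Return driver steps 8-9 as argv lists (hypothetical helper)."""
    return [
        ["systemctl", "stop", "ovn-controller", "ovsdb-server"],  # step 8
        ["yum", "-y", "remove"] + OVN_DRIVER_PACKAGES,            # step 9
    ]


if __name__ == "__main__":
    # Placeholder addresses from the documentation range, not real hosts.
    for argv in driver_setup_plan("192.0.2.10", "192.0.2.20") + driver_teardown_plan():
        print(" ".join(argv))
```

Note that steps 2, 3 and 8 are the ones that touch the openvswitch packages or services, so those are the points where the host is expected to go non-responsive.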

Using the steps described above, oVirt hosts become non-responsive. I suspect this is related to the openvswitch service. I also verified it manually: stopping or restarting openvswitch makes the host non-responsive, and reactivating the host makes it responsive again.

Version-Release number of selected component (if applicable):
oVirt Engine Version: 4.1.0-0.2.master.20161201131309.git6c02a32.el7.centos

How reproducible:
100%

Steps to Reproduce:
Case 1:
1. Stop or restart openvswitch 
Case 2:
1. yum -y upgrade openvswitch

Actual results:
Hosts become non-responsive.

Expected results:
Host should remain responsive.

Additional info:
Mburman reported a similar issue in the past with openvswitch-2.4: https://bugzilla.redhat.com/show_bug.cgi?id=1371840

Comment 1 Dan Kenigsberg 2016-12-10 16:03:46 UTC
Could you attach yum.log and /var/log/messages?

Comment 2 Mor 2016-12-11 08:41:48 UTC
Created attachment 1230522 [details]
logs

Dec 11 10:36:26 vega04 systemd: Stopping Open vSwitch...
Dec 11 10:36:26 vega04 systemd: Stopped Open vSwitch.

Comment 3 Dan Kenigsberg 2016-12-11 09:12:31 UTC
yum.log may be useful too, to understand which versions were updated and when.

Comment 4 Yaniv Lavi 2016-12-11 11:54:27 UTC
Was the host on maintenance?

Comment 5 Mor 2016-12-11 12:56:01 UTC
(In reply to Dan Kenigsberg from comment #3)
> yum.log may be useful too, to understand which versions were updated and
> when.

I need to rebuild the env again in order to get the upgrade scenario. I will do it and update with the yum.log.

Comment 6 Mor 2016-12-11 12:57:24 UTC
(In reply to Yaniv Dary from comment #4)
> Was the host on maintenance?

No. Does it need to be in maintenance mode to upgrade openvswitch?

Comment 7 Yaniv Lavi 2016-12-11 13:30:53 UTC
(In reply to Mor from comment #6)
> (In reply to Yaniv Dary from comment #4)
> > Was the host on maintenance?
> 
> No. Does it need to be in maintenance mode to upgrade openvswitch?

Any package update or service restart requires the host to be in maintenance mode.
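
For test automation, one way to respect this rule is to wrap the disruptive step in a guard that moves the host to maintenance first and reactivates it afterwards. The following is a minimal sketch only: host_in_maintenance, deactivate and activate are hypothetical names, and the two callables stand in for whatever the automation uses to change host state in the engine.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def host_in_maintenance(deactivate: Callable[[], None],
                        activate: Callable[[], None]) -> Iterator[None]:
    """Run the wrapped block with the host in maintenance (hypothetical helper).

    deactivate/activate are supplied by the caller; this sketch does not
    talk to the engine itself.
    """
    deactivate()
    try:
        yield
    finally:
        # Reactivate even if the wrapped step fails, so the host is not
        # left in maintenance by a broken test run.
        activate()


if __name__ == "__main__":
    calls = []
    with host_in_maintenance(lambda: calls.append("deactivate"),
                             lambda: calls.append("activate")):
        calls.append("yum -y upgrade openvswitch")  # placeholder for the real step
    print(calls)
```

Reactivating in the finally clause is a design choice for automation; a person doing this by hand may prefer to leave the host in maintenance when the upgrade fails.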
Please consider closing this bug as not a bug.

Comment 8 Dan Kenigsberg 2016-12-11 13:42:39 UTC
Dec 11 10:36:21 vega04 systemd: Stopped Open vSwitch Database Unit.
Dec 11 10:36:23 vega04 journal: vdsm vds.dispatcher ERROR SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 35314, 0, 0) at 0x29e8ab8>: unexpected eof
Dec 11 10:36:23 vega04 systemd: Stopped MOM instance configured for VDSM purposes.
Dec 11 10:36:23 vega04 systemd: Stopping Virtual Desktop Server Manager...

I'm afraid Dary is right - Vdsm currently depends on OvS, which means that systemd stops Vdsm when OvS is stopped. This explains the temporary non-responsiveness, and indeed is not a bug.
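
To illustrate how the stop propagates: when a unit is stopped explicitly, systemd also stops the units that declare Requires= on it, and then the units that require those. A small model of that propagation is sketched below. The dependency edges in REQUIRES are assumptions inferred from the log lines above, not copied from the actual unit files, and units_stopped_with is a made-up helper.

```python
from collections import deque
from typing import Dict, List

# unit -> units it requires. These edges are ASSUMED from the log above
# (OvS stopped, then MOM and Vdsm stopped), not read from real unit files.
REQUIRES: Dict[str, List[str]] = {
    "vdsmd.service": ["openvswitch.service"],
    "mom-vdsm.service": ["vdsmd.service"],
}


def units_stopped_with(unit: str, requires: Dict[str, List[str]]) -> List[str]:
    """Return the units that get stopped when `unit` is stopped explicitly."""
    # Invert the map: unit -> units that require it.
    required_by: Dict[str, List[str]] = {}
    for requirer, deps in requires.items():
        for dep in deps:
            required_by.setdefault(dep, []).append(requirer)

    stopped: List[str] = []
    seen = {unit}
    queue = deque([unit])
    while queue:  # breadth-first walk over reverse dependencies
        current = queue.popleft()
        for requirer in sorted(required_by.get(current, [])):
            if requirer not in seen:
                seen.add(requirer)
                stopped.append(requirer)
                queue.append(requirer)
    return stopped


if __name__ == "__main__":
    print(units_stopped_with("openvswitch.service", REQUIRES))
```

On a real host, "systemctl list-dependencies --reverse openvswitch.service" should show which units actually depend on it.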

Thanks for opening this bug - I did not expect this myself.

