Description of problem:
As part of my automation, I install the OVN feature and its required dependencies in a deployed oVirt environment, run the OVN test cases, remove the OVN feature, and clean up any leftovers. I use the following steps to deploy OVN in our environment:

On OVN central server:
----------------------
1. Stop firewall services: firewalld and iptables (BZ ticket: 1390938).
2. Install OVN dependencies: "openvswitch", "openvswitch-ovn-common", "python-openvswitch" (if already installed, try to upgrade them to the latest version).
3. Install OVN packages: "openvswitch-ovn-central", "ovirt-provider-ovn" (if already installed, try to upgrade them to the latest version).
4. Reload the systemd daemon (I saw cases where systemd does not refresh the services list).
5. Start the OVN provider service (ovirt-provider-ovn).
6. ***** Run tests *****
7. Stop the OVN provider service and openvswitch (to avoid OVS package leftovers).
8. Remove all OVN related packages (as listed in step 3).

On OVN driver server:
---------------------
1. Stop firewall services: firewalld and iptables.
2. Install OVS related packages: "openvswitch", "openvswitch-ovn-common", "python-openvswitch" (if already installed, try to upgrade them to the latest version).
3. Install OVN related packages: "openvswitch-ovn-host", "ovirt-provider-ovn-driver" (if already installed, try to upgrade them).
4. Same as step 4 from OVN central.
5. Start the OVN provider driver service (ovn-controller).
6. Configure OVN with vdsm-tool.
7. ***** Run tests *****
8. Stop the OVN provider driver service and ovsdb-server (to avoid OVS package leftovers).
9. Remove all OVN related packages (as listed in step 3).
10. Remove the OVN bridge interfaces from the host (the RPM does not remove the bridge on removal).

Using the described steps, I experience problems with oVirt hosts becoming non-responsive. I suspect that this is related to the openvswitch service.
I also verified this manually: stopping or restarting openvswitch makes the host non-responsive. Reactivating the host makes it responsive again.

Version-Release number of selected component (if applicable):
oVirt Engine Version: 4.1.0-0.2.master.20161201131309.git6c02a32.el7.centos

How reproducible:
100%

Steps to Reproduce:
Case 1:
1. Stop or restart openvswitch
Case 2:
1. yum -y upgrade openvswitch

Actual results:
Hosts become non-responsive.

Expected results:
Host should remain responsive.

Additional info:
Mburman reported a similar issue in the past with openvswitch-2.4:
https://bugzilla.redhat.com/show_bug.cgi?id=1371840
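For reference, the OVN central steps from the description could be written down roughly as the following dry-run sketch. It only builds the command lists and executes nothing. The use of yum and systemctl, the function names, and the assumption that "yum install" also covers the "upgrade if already installed" case are illustrative guesses, not taken from the reporter's actual automation.

```python
# Dry-run sketch of the OVN central steps: builds command lists, runs nothing.
# Assumptions (not from the report): yum and systemctl are the tools used,
# and "yum install" is relied on to upgrade packages that are already present.

OVS_DEPS = ["openvswitch", "openvswitch-ovn-common", "python-openvswitch"]
OVN_CENTRAL = ["openvswitch-ovn-central", "ovirt-provider-ovn"]


def central_setup_commands():
    """Steps 1-5 on the OVN central server, in order."""
    return [
        ["systemctl", "stop", "firewalld", "iptables"],   # step 1
        ["yum", "-y", "install"] + OVS_DEPS,              # step 2
        ["yum", "-y", "install"] + OVN_CENTRAL,           # step 3
        ["systemctl", "daemon-reload"],                   # step 4
        ["systemctl", "start", "ovirt-provider-ovn"],     # step 5
    ]


def central_teardown_commands():
    """Steps 7-8 on the OVN central server, in order."""
    return [
        ["systemctl", "stop", "ovirt-provider-ovn", "openvswitch"],  # step 7
        ["yum", "-y", "remove"] + OVN_CENTRAL,                       # step 8
    ]


if __name__ == "__main__":
    for cmd in central_setup_commands() + central_teardown_commands():
        print(" ".join(cmd))
```

Note that step 7 stops openvswitch, which is the same action as Case 1 in the steps to reproduce.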
Could you attach yum.log and /var/log/messages?
Created attachment 1230522 [details]
logs

Dec 11 10:36:26 vega04 systemd: Stopping Open vSwitch...
Dec 11 10:36:26 vega04 systemd: Stopped Open vSwitch.
yum.log may be useful too, to understand which versions were updated and when.
Was the host on maintenance?
(In reply to Dan Kenigsberg from comment #3)
> yum.log may be useful too, to understand which versions were updated and
> when.

I need to rebuild the environment to reproduce the upgrade scenario. I will do that and update the bug with the yum.log.
(In reply to Yaniv Dary from comment #4)
> Was the host on maintenance?

No. Does it need to be in maintenance mode to upgrade openvswitch?
(In reply to Mor from comment #6)
> (In reply to Yaniv Dary from comment #4)
> > Was the host on maintenance?
>
> No. Does it need to be in maintenance mode to upgrade openvswitch?

Any package update or service restart requires the host to be in maintenance mode. Please consider closing this bug as not a bug.
Dec 11 10:36:21 vega04 systemd: Stopped Open vSwitch Database Unit.
Dec 11 10:36:23 vega04 journal: vdsm vds.dispatcher ERROR SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 35314, 0, 0) at 0x29e8ab8>: unexpected eof
Dec 11 10:36:23 vega04 systemd: Stopped MOM instance configured for VDSM purposes.
Dec 11 10:36:23 vega04 systemd: Stopping Virtual Desktop Server Manager...

I'm afraid Dary is right - Vdsm currently depends on OvS, which means that systemd stops Vdsm when OvS is stopped. This explains the temporary non-responsiveness, and indeed it is not a bug.

Thanks for opening this bug - I did not expect this myself.
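One way to check the dependency described above on a host is to look at the unit properties that make systemd propagate a stop. This is a minimal sketch, not a verified diagnosis: the unit names vdsmd.service and openvswitch.service, the helper names, and the sample output are assumptions, and the thread does not say which directive Vdsm actually uses.

```python
import subprocess

# systemd properties through which stopping a dependency also stops this unit.
STOP_PROPAGATING = ("Requires", "BindsTo", "PartOf")


def parse_show(text):
    """Parse 'Key=a.service b.service' lines from `systemctl show` output."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value.split()
    return props


def stop_links(show_output, dependency):
    """Return the properties that tie the queried unit to `dependency`."""
    props = parse_show(show_output)
    return [p for p in STOP_PROPAGATING if dependency in props.get(p, [])]


def query_unit(unit="vdsmd.service"):
    """Run systemctl on the host (unit name is an assumption)."""
    args = ["systemctl", "show", unit]
    args += ["--property=" + p for p in STOP_PROPAGATING]
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical output for illustration; use query_unit() on a real host.
    sample = "Requires=openvswitch.service basic.target\nBindsTo=\nPartOf=\n"
    print(stop_links(sample, "openvswitch.service"))
```

A non-empty result means that explicitly stopping or restarting the dependency is expected to stop the queried unit as well, which matches the "Stopping Virtual Desktop Server Manager..." line in the log.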