Bug 1401306 - [DOC] OSP 10 OVS DPDK is out of date when upgrading openvswitch from 2.5.0-14 to 2.5.0-22, breaking the environment
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: async
Target Release: 10.0 (Newton)
Assignee: Angus Thomas
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-12-04 14:23 UTC by Maxim Babushkin
Modified: 2017-01-24 00:42 UTC (History)
CC List: 19 users

Fixed In Version:
Doc Type: Technology Preview
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-05 18:19:06 UTC
Target Upstream Version:


Attachments
first-boot.yaml (3.28 KB, text/plain)
2016-12-04 14:28 UTC, Maxim Babushkin
Updated first-boot.yaml for deployment with openvswitch-2.5.0-22 version (3.39 KB, text/plain)
2016-12-04 21:22 UTC, Maxim Babushkin
Version 2 - Updated first-boot.yaml for deployment with openvswitch-2.5.0-22 version (3.66 KB, text/plain)
2016-12-05 10:34 UTC, Maxim Babushkin
Version 3 - Updated first-boot.yaml for deployment with openvswitch-2.5.0-22 version (3.54 KB, text/plain)
2016-12-08 00:55 UTC, Maxim Babushkin

Description Maxim Babushkin 2016-12-04 14:23:34 UTC
Description of problem:
The RHOS 10 OVS DPDK environment breaks when openvswitch is updated from 2.5.0-14 to 2.5.0-22.

The current RHOS 10 OVS DPDK deployment installs openvswitch-2.5.0-14.
During the deployment, first-boot.yaml applies a fix to the vhu (vhost-user) socket permissions that allows an instance to bind to the openvswitch socket when it is created.
The fix modifies two files:
* /usr/share/openvswitch/scripts/ovs-ctl
    * Adding 'umask 0002' to the 'start_daemon "$OVS_VSWITCHD_PRIORITY" "$OVS_VSWITCHD_WRAPPER" "$@"' line.
    
* /usr/lib/systemd/system/openvswitch-nonetwork.service
    * Adding the following entries.
        - RuntimeDirectoryMode=0775
        - Group=qemu
        - UMask=0002
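
A minimal sketch of how the two edits listed above could be scripted. This is not the attached first-boot.yaml: the function names are hypothetical, the "umask 0002 &&" prefix is only one assumed way of adding the umask to the start_daemon line, and appending to the end of the unit file assumes that [Service] is its last section.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not taken from first-boot.yaml.

# Set KEY=VALUE in a systemd unit file: rewrite an existing KEY= line,
# otherwise append the setting (assumes [Service] is the last section).
ensure_unit_setting() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        echo "${key}=${value}" >> "$file"
    fi
}

# Apply the three settings named in this report to one unit file.
apply_vhu_unit_fix() {
    local file="$1"
    ensure_unit_setting "$file" RuntimeDirectoryMode 0775
    ensure_unit_setting "$file" Group qemu
    ensure_unit_setting "$file" UMask 0002
}

# Prefix the ovs-vswitchd start_daemon line in ovs-ctl with "umask 0002 &&"
# (assumed form). Skipped when already present, so re-running is harmless.
apply_ovs_ctl_umask() {
    local file="$1"
    if ! grep -qF 'umask 0002 && start_daemon "$OVS_VSWITCHD_PRIORITY"' "$file"; then
        sed -i 's/start_daemon "$OVS_VSWITCHD_PRIORITY"/umask 0002 \&\& start_daemon "$OVS_VSWITCHD_PRIORITY"/' "$file"
    fi
}
```

On a deployed node this would be called as `apply_ovs_ctl_umask /usr/share/openvswitch/scripts/ovs-ctl` and `apply_vhu_unit_fix /usr/lib/systemd/system/openvswitch-nonetwork.service`. Both functions are written to be idempotent so that the script can be run more than once.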
        

I tested openvswitch 2.5.0-22 by installing it manually.
Openvswitch 2.5.0-22 removes openvswitch-nonetwork.service, which means that the first-boot.yaml fix is not applied and instances do not boot.

Since the fix is not applied, the openvswitch directory doesn't get qemu group ownership, and creating an instance fails with the following error:
ERROR nova.virt.libvirt.guest [req-7707db9e-9b2f-491d-83a5-bb6c9e32b94e 03bf1de786ea40be8f578ad015a23375 e5e73346d04f429ca8e17ff2be409841 - - -] Error launching a defined domain with XML: <domain type='kvm'>

ERROR nova.compute.manager [instance: 284458f5-0ad4-47f1-b522-1b2749df95f6] libvirtError: internal error: process exited while connecting to monitor: 2016-11-30T13:49:11.222932Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhuca1793cc-89: Failed to connect socket: Permission denied
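
One way to confirm that this "Permission denied" comes from group ownership or mode rather than SELinux is to inspect the socket directory and the vhu socket directly. The helper below is a hypothetical sketch (its name and messages are not part of OVS or OSP) and assumes GNU stat.

```shell
#!/bin/bash
# Sketch only: check_group_access is a hypothetical helper.

# Print "ok" when PATH belongs to WANT_GROUP and is group-writable;
# otherwise print the reason and return 1.
check_group_access() {
    local path="$1" want_group="$2" group perms
    group=$(stat -c '%G' "$path") || return 1
    perms=$(stat -c '%A' "$path") || return 1
    if [ "$group" != "$want_group" ]; then
        echo "group is $group, expected $want_group"
        return 1
    fi
    # %A prints e.g. "srwxrwxr-x"; the sixth character is the group write bit.
    case "$perms" in
        ?????w*) echo "ok" ;;
        *) echo "not group-writable ($perms)"; return 1 ;;
    esac
}
```

For the failure above it could be run as `check_group_access /var/run/openvswitch qemu` and again on the vhu socket path reported in the error.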

Version-Release number of selected component (if applicable):
RHOS10
openvswitch-2.5.0-22.git20160727.el7fdp.x86_64.rpm

Steps to Reproduce:
1. Deploy a RHOS 10 OVS DPDK environment.
2. Manually upgrade openvswitch from 2.5.0-14 to 2.5.0-22.
3. Boot an instance.

Actual results:
The instance goes into the error state.

Expected results:
The instance should boot successfully.

Additional info:
I'm attaching the email thread regarding this issue in the comments.

Comment 1 Maxim Babushkin 2016-12-04 14:25:11 UTC
Maxim Babushkin wrote:
----------------------

Hi Flavio,

I tested openvswitch-2.5.0-22.git20160727.el7fdp.x86_64.rpm with a manual upgrade.
The rpm was taken from brew.

I was not able to boot an instance. Got the following error:

ERROR nova.virt.libvirt.guest [req-7707db9e-9b2f-491d-83a5-bb6c9e32b94e 03bf1de786ea40be8f578ad015a23375 e5e73346d04f429ca8e17ff2be409841 - - -] Error launching a defined domain with XML: <domain type='kvm'>

ERROR nova.compute.manager [instance: 284458f5-0ad4-47f1-b522-1b2749df95f6] libvirtError: internal error: process exited while connecting to monitor: 2016-11-30T13:49:11.222932Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhuca1793cc-89: Failed to connect socket: Permission denied

Thanks,
Maxim.

Comment 2 Maxim Babushkin 2016-12-04 14:26:27 UTC
Flavio Leitner wrote:
---------------------

There is a long history behind this, but basically OVS can't fix that.  So,
OSP has a hack to fix the vhu permission and I don't know why it
didn't work when you manually updated the package.

It could be that or a SELinux issue which is also being fixed by
an OSP selinux package.

Comment 3 Maxim Babushkin 2016-12-04 14:28:09 UTC
Maxim Babushkin wrote:
----------------------

It is not a SELinux issue, as that is the first thing I checked.
The hack to fix the vhu permission is the following [0]?

First-boot.yaml attached.

Comment 4 Maxim Babushkin 2016-12-04 14:28:58 UTC
Created attachment 1227936 [details]
first-boot.yaml

Comment 5 Maxim Babushkin 2016-12-04 14:29:49 UTC
Terry Wilson wrote:
-------------------

That looks like the last accepted solution that I've seen, anyway.

Comment 6 Maxim Babushkin 2016-12-04 14:30:34 UTC
Flavio Leitner wrote:
---------------------

From the OVS perspective we will be able to "fix" when vhost changes
to the client mode, then libvirt will manage the sockets properly.

Comment 7 Maxim Babushkin 2016-12-04 14:31:15 UTC
Maxim Babushkin wrote:
----------------------

In addition, I noticed that when I upgrade the openvswitch from 2.5.0-14 to 2.5.0-22, the openvswitch-nonetwork service disappears.
When I removed the 2.5.0-22 and installed 2.5.0-14 again, the openvswitch-nonetwork service appeared again.

Comment 8 Maxim Babushkin 2016-12-04 14:31:45 UTC
Flavio Leitner wrote:
---------------------

Well, openvswitch-nonetwork is internal to OVS, so you should not need
to use it.

Now we provide one systemd service for each daemon and the main
openvswitch service remains as before.

Flavio

Comment 9 Maxim Babushkin 2016-12-04 14:32:13 UTC
Maxim Babushkin wrote:
----------------------

Ok.

But one of the hacks to fix the vhu permission requires me to configure the following settings:

RuntimeDirectoryMode=0775
Group=qemu
UMask=0002

Now, as the service no longer exists, the fix will not work.

Comment 10 Maxim Babushkin 2016-12-04 14:32:46 UTC
Flavio Leitner wrote:
---------------------

Yes, but the same fix could be applied to the new services.

Flavio

Comment 11 Maxim Babushkin 2016-12-04 14:33:29 UTC
Aaron Conole wrote:
-------------------

It should work if applied to ovs-vswitchd;  there are other problems
you'll run into, though.   This will make all files created by vswitchd
(which includes the monitors, etc) owned by qemu group.  If that's what
is acceptable, it should work.

The service is now called ovs-vswitchd, so no crazy obfuscation.  Just
reference it as it is.

Comment 12 Maxim Babushkin 2016-12-04 14:34:07 UTC
Maxim Babushkin wrote:
----------------------

Yes, the file should be owned by qemu group.

I will check this.

Thanks,
Maxim.

Comment 13 Maxim Babushkin 2016-12-04 21:21:41 UTC
I have verified that changing the file as suggested by Aaron works, and the instance boots successfully.

The following values should be set in /usr/lib/systemd/system/ovs-vswitchd.service instead of /usr/lib/systemd/system/openvswitch-nonetwork.service.

RuntimeDirectoryMode=0775
Group=qemu
UMask=0002

An updated first-boot.yaml file for environments with openvswitch-2.5.0-22 is attached.

Comment 14 Maxim Babushkin 2016-12-04 21:22:56 UTC
Created attachment 1227945 [details]
Updated first-boot.yaml for deployment with openvswitch-2.5.0-22 version

Comment 16 Maxim Babushkin 2016-12-05 10:34:19 UTC
Created attachment 1228023 [details]
Version 2 - Updated first-boot.yaml for deployment with openvswitch-2.5.0-22 version

Comment 17 Maxim Babushkin 2016-12-05 10:37:18 UTC
I have attached an updated first-boot.yaml file which is relevant for both openvswitch versions.
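
A script can cover both openvswitch versions by picking whichever unit file the installed package ships. The sketch below is an assumption about how that selection could look, not the content of the attachment; pick_ovs_unit is a hypothetical name, and the optional directory argument exists only to make the function easy to try outside a real node.

```shell
#!/bin/bash
# Sketch only: pick_ovs_unit is a hypothetical helper.

# Print the unit file that should receive the vhu permission settings.
# Prefers ovs-vswitchd.service (2.5.0-22 layout) and falls back to
# openvswitch-nonetwork.service (2.5.0-14 layout). Returns 1 if neither exists.
pick_ovs_unit() {
    local unit_dir="${1:-/usr/lib/systemd/system}"
    if [ -f "$unit_dir/ovs-vswitchd.service" ]; then
        echo "$unit_dir/ovs-vswitchd.service"
    elif [ -f "$unit_dir/openvswitch-nonetwork.service" ]; then
        echo "$unit_dir/openvswitch-nonetwork.service"
    else
        return 1
    fi
}
```

Called without arguments, it looks in /usr/lib/systemd/system, the directory named earlier in this report.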

Comment 18 Maxim Babushkin 2016-12-05 10:41:19 UTC
An important thing to mention is that if openvswitch is upgraded manually, the bash script from the first-boot.yaml file must be run again manually.
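
A package upgrade replaces the package-owned unit files, so a quick check after upgrading can show whether the settings are still in place. The function below is a hypothetical sketch, not part of first-boot.yaml; it only tests for the three lines named in this report.

```shell
#!/bin/bash
# Sketch only: unit_fix_applied is a hypothetical helper.

# Return 0 when all three vhu permission settings are present as exact
# lines in the given unit file, 1 otherwise.
unit_fix_applied() {
    local file="$1" line
    for line in RuntimeDirectoryMode=0775 Group=qemu UMask=0002; do
        grep -Fxq "$line" "$file" || return 1
    done
}

# Example use after a manual upgrade (path as named in this report):
#   unit_fix_applied /usr/lib/systemd/system/ovs-vswitchd.service \
#       || echo "fix missing: re-run the first-boot script"
# After re-applying the settings, systemd has to re-read the unit file
# (systemctl daemon-reload) and openvswitch has to be restarted.
```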

Comment 25 Maxim Babushkin 2016-12-06 15:41:35 UTC
Please, be aware of the following new bug.

https://bugzilla.redhat.com/show_bug.cgi?id=1402032

Unable to start an instance when openvswitch-2.5.0-22 installed within overcloud-full image.

Comment 27 Assaf Muller 2016-12-07 13:07:30 UTC
Unsetting the blocker flag as it was set before we had a full picture. Moving from GA to async as this only affects OVS DPDK, which is not a GA blocker. As first-boot.yaml is not part of RH-OSP or upstream, I'm still not sure what is expected as far as patches to Director; still trying to find that out.

Comment 30 Maxim Babushkin 2016-12-08 00:55:55 UTC
Created attachment 1229283 [details]
Version 3 - Updated first-boot.yaml for deployment with openvswitch-2.5.0-22 version

Comment 31 Maxim Babushkin 2016-12-08 00:58:22 UTC
Updated the first-boot.yaml.
The SELinux disable section was removed since SELinux has been verified with the new OVS version 2.5.0-22.

Comment 33 Franck Baudin 2017-01-05 18:19:06 UTC
Closed, superseded by https://bugzilla.redhat.com/show_bug.cgi?id=1402997

Comment 34 Deepti Navale 2017-01-24 00:42:07 UTC
Cancelling the needinfo request.

