| Summary: | [DOC] OSP 10 OVS DPDK is out of date when upgrading openvswitch from 2.5.0-14 to 2.5.0-22, breaking the environment | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Maxim Babushkin <mbabushk> |
| Component: | rhosp-director | Assignee: | Angus Thomas <athomas> |
| Status: | CLOSED WONTFIX | QA Contact: | Omri Hochman <ohochman> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 10.0 (Newton) | CC: | aconole, amuller, apevec, atelang, chrisw, dbecker, dnavale, fbaudin, fleitner, mbabushk, mburns, morazi, oblaut, pmyers, rhel-osp-director-maint, rhos-maint, srevivo, vchundur, yrachman |
| Target Milestone: | async | Keywords: | ZStream |
| Target Release: | 10.0 (Newton) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Technology Preview | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-01-05 18:19:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | |||
Maxim Babushkin wrote:
----------------------
Hi Flavio,

I tested openvswitch-2.5.0-22.git20160727.el7fdp.x86_64.rpm with a manual upgrade. The rpm was taken from brew. I was not able to boot an instance and got the following error:

ERROR nova.virt.libvirt.guest [req-7707db9e-9b2f-491d-83a5-bb6c9e32b94e 03bf1de786ea40be8f578ad015a23375 e5e73346d04f429ca8e17ff2be409841 - - -] Error launching a defined domain with XML: <domain type='kvm'>
ERROR nova.compute.manager [instance: 284458f5-0ad4-47f1-b522-1b2749df95f6] libvirtError: internal error: process exited while connecting to monitor: 2016-11-30T13:49:11.222932Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhuca1793cc-89: Failed to connect socket: Permission denied

Thanks,
Maxim.

Flavio Leitner wrote:
---------------------
There is a long history behind this, but basically OVS can't fix that. So OSP has a hack to fix the vhu permission, and I don't know why it didn't work when you manually updated the package. It could be that, or a SELinux issue, which is also being fixed by an OSP selinux package.

Maxim Babushkin wrote:
----------------------
It is not a SELinux issue, as that is the first thing I checked. Is the hack to fix the vhu permission the following [0]?

First-boot.yaml attached.

Created attachment 1227936 [details]
first-boot.yaml
Terry Wilson wrote:
-------------------
That looks like the last accepted solution that I've seen, anyway.

Flavio Leitner wrote:
---------------------
From the OVS perspective, we will be able to "fix" this when vhost changes to client mode; libvirt will then manage the sockets properly.

Maxim Babushkin wrote:
----------------------
In addition, I noticed that when I upgrade openvswitch from 2.5.0-14 to 2.5.0-22, the openvswitch-nonetwork service disappears. When I removed 2.5.0-22 and installed 2.5.0-14 again, the openvswitch-nonetwork service reappeared.

Flavio Leitner wrote:
---------------------
Well, openvswitch-nonetwork is internal to OVS, so you should not need to use it. Now we provide one systemd service for each daemon, and the main openvswitch service remains as before.

Flavio

Maxim Babushkin wrote:
----------------------
OK. But one of the hacks to fix the vhu permission requires me to configure the following settings:

RuntimeDirectoryMode=0775
Group=qemu
UMask=0002

Now, as the service does not exist, the fix will not work.

Flavio Leitner wrote:
---------------------
Yes, but the same fix could be applied to the new services.

Flavio

Aaron Conole wrote:
-------------------
It should work if applied to ovs-vswitchd; there are other problems you'll run into, though. This will make all files created by vswitchd (which includes the monitors, etc.) owned by the qemu group. If that is acceptable, it should work. The service is now called ovs-vswitchd, so no crazy obfuscation. Just reference it as it is.

Maxim Babushkin wrote:
----------------------
Yes, the file should be owned by the qemu group. I will check this.

Thanks,
Maxim.

I have verified that the change suggested by Aaron works and the instance boots successfully. The following values should be set in /usr/lib/systemd/system/ovs-vswitchd.service instead of /usr/lib/systemd/system/openvswitch-nonetwork.service:

RuntimeDirectoryMode=0775
Group=qemu
UMask=0002

An updated first-boot.yaml file for environments with openvswitch-2.5.0-22 is attached.

Created attachment 1227945 [details]
Updated first-boot.yaml for deployment with openvswitch-2.5.0-22 version
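The attached first-boot.yaml is not reproduced on this page, so the following is only a minimal sketch of the kind of edit discussed in the thread: setting RuntimeDirectoryMode, Group and UMask in the [Service] section of a unit file. The function name `patch_service_section` and the `REQUIRED` mapping are hypothetical and are not taken from the attachment; the function works on the text of the unit file and leaves reading and writing the file to the caller.

```python
# Hypothetical helper, not the script from the attached first-boot.yaml.
# The three values are the ones quoted in the thread.
REQUIRED = {"RuntimeDirectoryMode": "0775", "Group": "qemu", "UMask": "0002"}


def patch_service_section(unit_text, settings=REQUIRED):
    """Return unit_text with every key in settings set inside [Service]."""
    out = []
    seen = set()
    in_service = False
    found_service = False

    def append_missing():
        # Add any setting that was not already present in the section.
        for key, value in settings.items():
            if key not in seen:
                out.append(f"{key}={value}")
                seen.add(key)

    for line in unit_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if in_service:
                append_missing()  # leaving [Service]: add what is missing
            in_service = stripped == "[Service]"
            found_service = found_service or in_service
            out.append(line)
            continue
        if in_service and "=" in stripped and not stripped.startswith(("#", ";")):
            key = stripped.split("=", 1)[0].strip()
            if key in settings:
                # Overwrite an existing value instead of duplicating the key.
                out.append(f"{key}={settings[key]}")
                seen.add(key)
                continue
        out.append(line)

    if in_service:
        append_missing()  # [Service] was the last section in the file
    if not found_service:
        raise ValueError("unit file has no [Service] section")
    return "\n".join(out) + "\n"
```

Because the function overwrites existing keys instead of appending duplicates, running it twice on the same text gives the same result. That matters here, since the thread notes that the fix has to be re-applied after a manual package upgrade.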
Created attachment 1228023 [details]
Version 2 - Updated first-boot.yaml for deployment with openvswitch-2.5.0-22 version
I have attached an updated first-boot.yaml file which is relevant for both openvswitch versions. An important point: if openvswitch is upgraded manually, the bash script from the first-boot.yaml file must be run again manually.

Please be aware of the following new bug: https://bugzilla.redhat.com/show_bug.cgi?id=1402032 Unable to start an instance when openvswitch-2.5.0-22 installed within overcloud-full image.

Unsetting the blocker flag, as it was set before we had a full picture. Moving from GA to async, as this only affects OVS DPDK, which is not a GA blocker.

As first-boot.yaml is not part of RH-OSP or upstream, I'm still not sure what is expected as far as patches to Director go; still trying to find that out.

Created attachment 1229283 [details]
Version 3 - Updated first-boot.yaml for deployment with openvswitch-2.5.0-22 version
Updated the first-boot.yaml. The SELinux disable section was removed, since SELinux has been verified for the new OVS version 2.5.0-22.

Closed, superseded by https://bugzilla.redhat.com/show_bug.cgi?id=1402997

Cancelling the needinfo request.
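The comments mention a first-boot.yaml that works with both openvswitch versions. How the attachment does this is not shown here; one plausible approach, sketched below as an assumption, is to patch whichever of the two unit files named in the thread is present on the node. `CANDIDATE_UNITS` and `pick_unit` are hypothetical names used only for illustration.

```python
from typing import Iterable, Optional

# Unit paths quoted in the thread: 2.5.0-22 ships ovs-vswitchd.service,
# while 2.5.0-14 ships openvswitch-nonetwork.service.
CANDIDATE_UNITS = (
    "/usr/lib/systemd/system/ovs-vswitchd.service",
    "/usr/lib/systemd/system/openvswitch-nonetwork.service",
)


def pick_unit(existing_paths: Iterable[str]) -> Optional[str]:
    """Return the first candidate unit that exists, preferring the newer name."""
    existing = set(existing_paths)
    for candidate in CANDIDATE_UNITS:
        if candidate in existing:
            return candidate
    return None  # neither unit is installed; nothing to patch
```

On a real node the argument could be built with `os.path.exists`, for example `pick_unit(p for p in CANDIDATE_UNITS if os.path.exists(p))`. Passing the paths in keeps the selection logic testable without touching the filesystem.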
Description of problem:

The RHOS 10 OVS DPDK environment breaks when openvswitch is updated from 2.5.0-14 to 2.5.0-22.

The current RHOS 10 OVS DPDK deployment installs openvswitch-2.5.0-14. During the deployment, first-boot.yaml applies a fix to the vhu permissions that allows an instance to bind to the openvswitch socket when it is created. The fix modifies two files:

* /usr/share/openvswitch/scripts/ovs-ctl
  * Adding 'umask 0002' to the 'start_daemon "$OVS_VSWITCHD_PRIORITY" "$OVS_VSWITCHD_WRAPPER" "$@"' line.
* /usr/lib/systemd/system/openvswitch-nonetwork.service
  * Adding the following entries:
    - RuntimeDirectoryMode=0775
    - Group=qemu
    - UMask=0002

I have tested openvswitch 2.5.0-22 by installing it manually. Openvswitch 2.5.0-22 removes openvswitch-nonetwork.service, which means that the first-boot.yaml fix is not applied and instances do not boot. Since the fix is not applied, the openvswitch directory does not get qemu group ownership, and a newly created instance fails with the following error:

ERROR nova.virt.libvirt.guest [req-7707db9e-9b2f-491d-83a5-bb6c9e32b94e 03bf1de786ea40be8f578ad015a23375 e5e73346d04f429ca8e17ff2be409841 - - -] Error launching a defined domain with XML: <domain type='kvm'>
ERROR nova.compute.manager [instance: 284458f5-0ad4-47f1-b522-1b2749df95f6] libvirtError: internal error: process exited while connecting to monitor: 2016-11-30T13:49:11.222932Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhuca1793cc-89: Failed to connect socket: Permission denied

Version-Release number of selected component (if applicable):
RHOS10
openvswitch-2.5.0-22.git20160727.el7fdp.x86_64.rpm

Steps to Reproduce:
1. Deploy a RHOS 10 OVS DPDK environment.
2. Manually upgrade openvswitch from 2.5.0-14 to 2.5.0-22.
3. Boot an instance.

Actual results:
The instance goes into the error state.

Expected results:
The instance should boot successfully.

Additional info:
I'm attaching the email thread regarding this issue in the comments.
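The "Permission denied" error and the UMask=0002 / RuntimeDirectoryMode=0775 settings in the description can be related with a little mode arithmetic. The sketch below is illustrative only and rests on three assumptions that the report does not state: that the vhu socket is created with mode 0777 masked by the daemon's umask (the usual Linux behavior for pathname sockets), that the unpatched daemon runs with the common default umask of 0022, and that qemu reaches the socket only through its group permissions. The function names are hypothetical.

```python
def socket_mode(umask, requested=0o777):
    """Permission bits of a newly created socket file under the given umask."""
    return requested & ~umask & 0o777


def group_can_connect(dir_mode, sock_mode):
    """True if a group member can connect to the socket.

    Connecting to a pathname UNIX socket needs search (x) permission on the
    containing directory and write (w) permission on the socket itself.
    """
    group_search = bool(dir_mode & 0o010)
    group_write = bool(sock_mode & 0o020)
    return group_search and group_write


# With the fix: UMask=0002 and RuntimeDirectoryMode=0775.
patched = group_can_connect(0o775, socket_mode(0o002))

# Without the fix (assumed default umask 0022): the socket has no group write bit.
unpatched = group_can_connect(0o775, socket_mode(0o022))
```

Under these assumptions `patched` is true and `unpatched` is false, which is consistent with qemu-kvm failing to connect to /var/run/openvswitch/vhuca1793cc-89 when the settings are missing. Group=qemu is the third piece: the group bits only help if the group that owns the directory and socket is the one qemu belongs to.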