Bug 1447112 - RHOS 11 DPDK vhost_sockets directory wrong
Summary: RHOS 11 DPDK vhost_sockets directory wrong
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 11.0 (Ocata)
Assignee: Karthik Sundaravel
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-05-01 19:02 UTC by Lon Hohberger
Modified: 2017-12-11 12:10 UTC (History)
35 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1431556
Clones: 1496700
Environment:
Last Closed: 2017-12-11 12:10:23 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1675690 0 None None None 2017-05-01 19:02:48 UTC
OpenStack gerrit 449530 0 None None None 2017-05-01 19:02:48 UTC

Description Lon Hohberger 2017-05-01 19:02:48 UTC
+++ This bug was initially created as a clone of Bug #1431556 +++

Description of problem:
RHOS 11 OVS DPDK setup is unable to boot a DPDK instance.

Version-Release number of selected component (if applicable):
RHOS 11
2017-03-08.3 puddle

[root@compute-0 ~]# rpm -qa |grep openvswitch
openvswitch-ovn-central-2.6.1-8.git20161206.el7fdb.x86_64
openstack-neutron-openvswitch-10.0.0-4.el7ost.noarch
openvswitch-2.6.1-8.git20161206.el7fdb.x86_64
openvswitch-ovn-common-2.6.1-8.git20161206.el7fdb.x86_64
python-openvswitch-2.6.1-8.git20161206.el7fdb.noarch
openvswitch-ovn-host-2.6.1-8.git20161206.el7fdb.x86_64

How reproducible:
Deploy RHOS 11 with OVS DPDK.
Try to boot a DPDK instance (a hedged example is sketched below).
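
A hedged example of the second step, with placeholder flavor/image/network names; the only DPDK-relevant piece is the hugepage extra spec that vhost-user ports need:

# Hypothetical reproducer; names are placeholders, not taken from this bug.
openstack flavor create --ram 4096 --vcpus 2 --disk 20 m1.dpdk
openstack flavor set m1.dpdk --property hw:mem_page_size=large
openstack server create --flavor m1.dpdk --image cirros --network dpdk-net test-dpdk-vm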

Actual results:
The instance goes to ERROR state on boot.

Expected results:
The instance should boot successfully.

Additional info:
Instance error:
{u'message': u'internal error: process exited while connecting to monitor: t=1 -vnc 10.10.111.107:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on\n2017-03-13T08:40:47.885506Z qemu-kvm:', u'code': 500, u'details': u'  File
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1780, in _do_build_and_run_instance\n    filter_properties)\n  File "/usr/lib/python2.7/site-
packages/nova/compute/manager.py", line 2016, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'created':
u'2017-03-13T08:40:49Z'}


Libvirt instance log error:
2017-03-13T08:40:47.885506Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhu003c709e-28,server: Failed to bind socket to /var/run/openvswitch/vhu003c709e-28: Permission denied
2017-03-13 08:40:47.922+0000: shutting down


--- Additional comment from Maxim Babushkin on 2017-03-17 18:00:59 EDT ---

Hi Terry,

Thanks for the location suggestion.

I have verified the deployment and it works like a charm.
No permission workarounds are needed.
Just changing NeutronVhostuserSocketDir to "/var/lib/libvirt/qemu" in the network-environment file did the trick.

I'm sure we could use it in RHOS 10 OVS DPDK deployments as well.
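
For reference, a minimal sketch of the change described above, assuming a standalone environment file passed to the deploy command; the file name and deploy invocation are illustrative only, and note that the next comment recommends a dedicated directory instead:

# Hypothetical environment file; only the NeutronVhostuserSocketDir value
# comes from this comment.
cat > ~/templates/vhostuser-socket-dir.yaml <<'EOF'
parameter_defaults:
  NeutronVhostuserSocketDir: "/var/lib/libvirt/qemu"
EOF

# Include it in the overcloud deploy, e.g.:
# openstack overcloud deploy --templates -e ~/templates/vhostuser-socket-dir.yaml ...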

--- Additional comment from Michal Privoznik on 2017-03-18 02:19:35 EDT ---

(In reply to Terry Wilson from comment #18)
> I would like to see whether just switching vhostuser_socket_dir to
> /var/lib/libvirt/qemu would work since it is owned by qemu:qemu already. It
> would be a simple solution.

This is a private libvirt directory, so this is just a temporary workaround (and not one I'm a big fan of either). The problem here is that qemu creates the socket, so libvirt has no option, no action to take. It can't chown the parent directory - other domains (with different uid:gid) might use the directory as well. So here we simply surrender, leave it up to the system admin to create a directory with proper security labels, and promise not to touch them (the labels, I mean). The ideal solution would be for libvirt to create the socket (along with proper security labels) and then just pass it to qemu to use. But we are a long way from there.

IOW, I'd suggest creating your own directory and setting the proper owner on it instead of misusing libvirt's internal directory.
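
A minimal sketch of that suggestion, run on the compute node; the path and the qemu:qemu / g+w ownership follow the values discussed in the following comments, and SELinux labelling is deliberately left to openstack-selinux/restorecon rather than hard-coding a type here:

# Dedicated directory for the vhost-user sockets, not shared with other processes.
mkdir -p /var/lib/vhost_sockets
chown qemu:qemu /var/lib/vhost_sockets
chmod 0775 /var/lib/vhost_sockets
# Apply whatever label the installed SELinux policy defines for this path.
restorecon -Rv /var/lib/vhost_sockets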

--- Additional comment from Karthik Sundaravel on 2017-03-20 07:37:08 EDT ---

(In reply to Michal Privoznik from comment #20)
Is it OK to use /usr/local/openvswitch with permissions qemu:qemu and g+w?

--- Additional comment from Terry Wilson on 2017-03-20 19:48:18 EDT ---

Since qemu is in charge of creating the sockets, it seems weird to me to put the sockets in a directory named "openvswitch" that is owned by qemu. It also seems a little weird for /usr/local to be used for storing data for programs that are installed under /usr.

What are the permissions on the sockets created by qemu? If they are restrictive enough, could /tmp be used, much like pgsql sockets, etc.? In any case, we know what works, so someone needs to decide.

--- Additional comment from Yariv on 2017-03-21 12:09:27 EDT ---

Please decide on an agreed solution and document it in the BZ.
Thanks

--- Additional comment from Karthik Sundaravel on 2017-03-22 03:52:17 EDT ---

Terry, agreed regarding /usr/local.
I think we shall go ahead with /var/lib/vhost_sockets.
/var/lib is meant for dynamic data, libraries, and files. Does it make sense to use this?


---[ SNIP ]---

--- Additional comment from Maxim Babushkin on 2017-04-19 06:38:38 EDT ---

Hi Lon, Lukas,

Thank you for the help.
The rpm is working.

Lukas, thank you for the DAC_OVERRIDE article. It helped me to understand what should be added to the deployment.

The OVS permission workaround used in RHOS 10 should be implemented in RHOS 11 as well. An updated first-boot is attached.


Franck, is the OVS vhu permission workaround acceptable for us in RHOS 11? I thought we would not be using it in OVS 2.6.

--- Additional comment from Lon Hohberger on 2017-05-01 12:53:27 EDT ---

The following error seems to occur even with SELinux disabled; is this potentially a red herring?


2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320] Traceback (most recent call last):
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2125, in _build_resources
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     yield resources
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1930, in _build_and_run_instance
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     block_device_info=block_device_info)
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2698, in spawn
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     destroy_disks_on_failure=True)
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5112, in _create_domain_and_network
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     destroy_disks_on_failure)
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     self.force_reraise()
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     six.reraise(self.type_, self.value, self.tb)
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5084, in _create_domain_and_network
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     post_xml_callback=post_xml_callback)
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5002, in _create_domain
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     guest.launch(pause=pause)
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 145, in launch
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     self._encoded_xml, errors='ignore')
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     self.force_reraise()
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     six.reraise(self.type_, self.value, self.tb)
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 140, in launch
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     return self._domain.createWithFlags(flags)
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     rv = execute(f, *args, **kwargs)
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     six.reraise(c, e, tb)
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     rv = meth(*args, **kwargs)
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1065, in createWithFlags
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320] libvirtError: internal error: process exited while connecting to monitor: 2017-04-30T19:30:55.102672Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhu43f166c3-e9,server: Failed to bind socket to /var/run/openvswitch/vhu43f166c3-e9: Permission denied
2017-04-30 19:30:55.605 30855 ERROR nova.compute.manager [instance: 69492f63-a1f2-4dcf-bc7a-09c68ae76320]

Here's my attempt to create servers on the same host:

2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [req-4c5588f2-fc1e-4e27-baec-4f2fa67a72bf 4423f30fd4764b49a5df5b97de52a139 cd143ac333e94ad4b956b7d78ff1652b - - -] [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51] Instance failed to spawn
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51] Traceback (most recent call last):
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2125, in _build_resources
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     yield resources
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1930, in _build_and_run_instance
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     block_device_info=block_device_info)
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2698, in spawn
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     destroy_disks_on_failure=True)
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5112, in _create_domain_and_network
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     destroy_disks_on_failure)
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     self.force_reraise()
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     six.reraise(self.type_, self.value, self.tb)
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5084, in _create_domain_and_network
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     post_xml_callback=post_xml_callback)
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5002, in _create_domain
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     guest.launch(pause=pause)
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 145, in launch
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     self._encoded_xml, errors='ignore')
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     self.force_reraise()
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     six.reraise(self.type_, self.value, self.tb)
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 140, in launch
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     return self._domain.createWithFlags(flags)
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     rv = execute(f, *args, **kwargs)
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     six.reraise(c, e, tb)
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     rv = meth(*args, **kwargs)
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1065, in createWithFlags
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51] libvirtError: internal error: qemu unexpectedly closed the monitor: 2017-05-01T16:48:28.307048Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhuce684eb9-15,server: Failed to bind socket to /var/run/openvswitch/vhuce684eb9-15: Permission denied
2017-05-01 16:48:28.745 30855 ERROR nova.compute.manager [instance: 22fac6c6-a8d6-4585-a398-d05a1d0dde51]


This happens even with the entire overcloud running SELinux in permissive mode.

--- Additional comment from Lon Hohberger on 2017-05-01 13:16:15 EDT ---

A non-permitted user is trying to write to /var/run/openvswitch (mode 755); this is one problem.

There are also some AVCs that can be observed in permissive mode.
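
A quick way to confirm both problems on the compute node (standard commands; the exact AVC output will vary per deployment):

# Mode 755 on the socket directory means only its owner can create sockets
# there, so qemu gets "Permission denied" when binding.
ls -ldZ /var/run/openvswitch

# With SELinux in permissive mode, list the recent AVC denials.
ausearch -m avc -ts recent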

--- Additional comment from Lon Hohberger on 2017-05-01 13:35:12 EDT ---

With SELinux in permissive mode and having changed /var/run/openvswitch to mode 0777 (a+rw), I was able to launch instances.

It's almost as though something changed locations recently; I'd like to understand that prior to using /var/run/openvswitch.

--- Additional comment from Lon Hohberger on 2017-05-01 13:38:58 EDT ---

The deployment appears to have changed from using "/var/lib/vhost_sockets" to "/var/run/openvswitch", which is the likely cause for these AVCs.

--- Additional comment from Lon Hohberger on 2017-05-01 13:45:32 EDT ---

Maxim, do you have any idea why /var/run/openvswitch would be used for vhost_sockets instead of /var/lib/vhost_sockets, which we built the code for (and/or do you know what sets NeutronVhostuserSocketDir)?

This should be working if /var/lib/vhost_sockets is the value used for NeutronVhostuserSocketDir, I think...
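
For completeness, a sketch of how to confirm what the deployment actually uses, assuming the heat parameter ends up in the neutron OVS agent's vhostuser_socket_dir option (the config path is the usual OSP default, shown here as an assumption):

# On the compute node: where does neutron tell nova to place vhost-user sockets?
grep -r vhostuser_socket_dir /etc/neutron/plugins/ml2/

# And does that directory exist with permissions qemu can use?
ls -ldZ /var/lib/vhost_sockets /var/run/openvswitch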

--- Additional comment from Lon Hohberger on 2017-05-01 13:49:51 EDT ---

It looks like https://review.openstack.org/#/c/449530/3 was never merged.


=========================

TL;DR: In solving bug 1431556, the expected location for vhost_sockets was /var/lib/vhost_sockets. /var/run/openvswitch can't be used; it would require the directory to be writable by both the qemu and openvswitch users.

Other directories besides /var/lib/vhost_sockets can be used as long as they are not shared with other processes (but changing the path requires further changes to openstack-selinux).

See bug 1431556, comment #24

Comment 1 Lon Hohberger 2017-05-01 19:04:27 UTC
Basically, a fix was proposed - but never finalized upstream.

Comment 2 Lon Hohberger 2017-05-04 13:22:42 UTC
Dropping priority - this is solved with an updated first_boot.yaml in bug 1431556.

Comment 4 Saravanan KR 2017-08-04 07:05:06 UTC
The upstream issue is fixed, and for OSP10 and OSP11 documentation is provided with the first-boot changes.

Comment 5 Edu Alcaniz 2017-09-27 15:22:56 UTC
Hi, we are having issues with one customer using OVS DPDK.
Version OSP10Z4

[ealcaniz@ealcaniz systemd]$ grep vhue49fc191-70 *
systemctl_status_--all:Sep 25 11:37:15 cpt1-dpdk-totp.nfv.cselt.it ovs-vsctl[16983]: ovs|00001|db_ctl_base|ERR|no row "/vhue49fc191-70" in table Interface
systemctl_status_--all:Sep 25 11:37:15 cpt1-dpdk-totp.nfv.cselt.it ovs-vsctl[17001]: ovs|00001|db_ctl_base|ERR|no row "/vhue49fc191-70" in table Interface
systemctl_status_--all:Sep 25 11:37:15 cpt1-dpdk-totp.nfv.cselt.it libvirtd[2789]: 2017-09-25 09:37:15.824+0000: 2789: error : qemuProcessReportLogError:1862 : internal error: qemu unexpectedly closed the monitor: 2017-09-25T09:37:15.813841Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhue49fc191-70: Failed to connect socket: Permission denied
systemctl_status_--all:Sep 25 11:37:15 cpt1-dpdk-totp.nfv.cselt.it ovs-vsctl[16972]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=120 -- --if-exists del-port vhue49fc191-70 -- add-port br-int vhue49fc191-70 -- set Interface vhue49fc191-70 external-ids:iface-id=e49fc191-70bd-4edf-b096-cef609c8b7d4 external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:90:c4:62 external-ids:vm-uuid=8f1be557-ea9a-4d5e-b31c-93e06666ed08 type=dpdkvhostuser
systemctl_status_--all:Sep 25 11:37:15 cpt1-dpdk-totp.nfv.cselt.it ovs-vsctl[16975]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=120 -- set interface vhue49fc191-70 mtu_request=9000
systemctl_status_--all:Sep 25 11:37:16 cpt1-dpdk-totp.nfv.cselt.it ovs-vsctl[17031]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=120 -- --if-exists del-port br-int vhue49fc191-70
systemctl_status_--all:Sep 25 11:37:16 cpt1-dpdk-totp.nfv.cselt.it ovs-vsctl[17033]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=120 -- --if-exists del-port br-int vhue49fc191-70
systemctl_status_--all:Sep 25 11:37:15 cpt1-dpdk-totp.nfv.cselt.it ovs-vswitchd[1577]: VHOST_CONFIG: bind to /var/run/openvswitch/vhue49fc191-70




2017-09-25 09:37:15.660+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-08-22-08:54:01, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-10.el7), hostname: cpt1-dpdk-totp.nfv.cselt.it
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-000010a4,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-instance-000010a4/master-key.aes -machine pc-i440fx-rhel7.4.0,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Client,ss=on,hypervisor=on,tsc_adjust=on,pdpe1gb=on,mpx=off,xsavec=off,xgetbv1=off -m 4096 -realtime mlock=off -smp 2,sockets=1,cores=1,threads=2 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/3-instance-000010a4,share=yes,size=4294967296,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -uuid 8f1be557-ea9a-4d5e-b31c-93e06666ed08 -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=14.0.7-11.el7ost,serial=38873f97-1eae-4dc9-a74a-71b078bb59c9,uuid=8f1be557-ea9a-4d5e-b31c-93e06666ed08,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-instance-000010a4/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/8f1be557-ea9a-4d5e-b31c-93e06666ed08/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charnet0,path=/var/run/openvswitch/vhue49fc191-70 -netdev vhost-user,chardev=charnet0,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:90:c4:62,bus=pci.0,addr=0x3 -add-fd set=0,fd=28 -chardev file,id=charserial0,path=/dev/fdset/0,append=on -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 10.20.0.181:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2017-09-25T09:37:15.813841Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhue49fc191-70: Failed to connect socket: Permission denied
2017-09-25 09:37:15.824+0000: shutting down, reason=failed



**nova-compute.log**

2017-09-25 11:37:16.038 5296 ERROR nova.virt.libvirt.driver [req-dadd0f0e-93b9-4700-be40-d44dd62dfbfe df9893ff6fe042dd955337ea04279f0d dfd9ac55feec4b7795208bfa9415955d - - -] [instance: 8f1be557-ea9a-4d5e-b31c-93e06666ed08] Failed to start libvirt guest
2017-09-25 11:37:16.050 5296 INFO os_vif [req-dadd0f0e-93b9-4700-be40-d44dd62dfbfe df9893ff6fe042dd955337ea04279f0d dfd9ac55feec4b7795208bfa9415955d - - -] Successfully unplugged vif VIFVHostUser(active=False,address=fa:16:3e:90:c4:62,has_traffic_filtering=False,id=e49fc191-70bd-4edf-b096-cef609c8b7d4,mode='client',network=Network(3d557637-a9e5-40ab-849f-2e4dbd842a51),path='/var/run/openvswitch/vhue49fc191-70',plugin='ovs',port_profile=VIFPortProfileBase,preserve_on_delete=False,vif_name=<?>)
2017-09-25 11:37:16.058 5296 INFO nova.virt.libvirt.driver [req-dadd0f0e-93b9-4700-be40-d44dd62dfbfe df9893ff6fe042dd955337ea04279f0d dfd9ac55feec4b7795208bfa9415955d - - -] [instance: 8f1be557-ea9a-4d5e-b31c-93e06666ed08] Deleting instance files /var/lib/nova/instances/8f1be557-ea9a-4d5e-b31c-93e06666ed08_del
2017-09-25 11:37:16.059 5296 INFO nova.virt.libvirt.driver [req-dadd0f0e-93b9-4700-be40-d44dd62dfbfe df9893ff6fe042dd955337ea04279f0d dfd9ac55feec4b7795208bfa9415955d - - -] [instance: 8f1be557-ea9a-4d5e-b31c-93e06666ed08] Deletion of /var/lib/nova/instances/8f1be557-ea9a-4d5e-b31c-93e06666ed08_del complete
2017-09-25 11:37:16.206 5296 ERROR nova.compute.manager [req-dadd0f0e-93b9-4700-be40-d44dd62dfbfe df9893ff6fe042dd955337ea04279f0d dfd9ac55feec4b7795208bfa9415955d - - -] [instance: 8f1be557-ea9a-4d5e-b31c-93e06666ed08] Instance failed to spawn

.....

017-09-25 11:37:16.206 5296 ERROR nova.compute.manager [instance: 8f1be557-ea9a-4d5e-b31c-93e06666ed08]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1069, in createWithFlags
2017-09-25 11:37:16.206 5296 ERROR nova.compute.manager [instance: 8f1be557-ea9a-4d5e-b31c-93e06666ed08]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2017-09-25 11:37:16.206 5296 ERROR nova.compute.manager [instance: 8f1be557-ea9a-4d5e-b31c-93e06666ed08] libvirtError: internal error: qemu unexpectedly closed the monitor: 2017-09-25T09:37:15.813841Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhue49fc191-70: Failed to connect socket: Permission denied
2017-09-25 11:37:16.206 5296 ERROR nova.compute.manager [instance: 8f1be557-ea9a-4d5e-b31c-93e06666ed08]


Comment 11 Karthik Sundaravel 2017-09-28 07:22:05 UTC
1. Please check if the file /usr/lib/systemd/system/ovs-vswitchd.service on the compute node has
RuntimeDirectoryMode=0775
Group=qemu
UMask=0002
2. Check if the file /usr/share/openvswitch/scripts/ovs-ctl on the compute node has
umask 0002 && start_daemon "$OVS_VSWITCHD_PRIORITY" "$OVS_VSWITCHD_WRAPPER" "$@" ||
in the function do_start_forwarding()

3. Please check if ovs-vsctl show throws any errors on the compute node with DPDK (check commands for items 1-3 are sketched after this list)

4. Please add the SOS reports
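
A quick way to run checks 1-3 on the compute node (a sketch; the file paths are the ones named above):

# 1. systemd unit settings for ovs-vswitchd
grep -E 'RuntimeDirectoryMode|Group|UMask' /usr/lib/systemd/system/ovs-vswitchd.service

# 2. umask set before ovs-vswitchd is started
grep -n 'umask 0002' /usr/share/openvswitch/scripts/ovs-ctl

# 3. any errors reported by the switch itself
ovs-vsctl show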

Please note that this BZ is reported against OSP11.
OSP11 and later work with mode=server, while OSP10 works with mode=client, and the vhostuser socket directories differ between the two cases.
So I think it's appropriate to raise a new BZ.

Comment 12 Edu Alcaniz 2017-09-28 07:35:33 UTC
Moved to bz https://bugzilla.redhat.com/show_bug.cgi?id=1496700 for OSP10.

