Summary: Fixing the permission mismatch for DPDK vhost user ports with openvswitch and qemu
Product: Red Hat OpenStack
Reporter: Saravanan KR <skramaja>
Component: openstack-tripleo-heat-templates
Assignee: Saravanan KR <skramaja>
Status: CLOSED ERRATA
QA Contact: Yariv <yrachman>
Version: 12.0 (Pike)
CC: aloughla, apevec, atelang, atragler, chrisw, dbecker, jschluet, ksundara, mburns, mleitner, morazi, mprivozn, nyechiel, rhel-osp-director-maint, rhos-maint, skramaja, srevivo
Target Release: 13.0 (Queens)
Fixed In Version: openstack-tripleo-heat-templates-8.0.2-0.20180327213843.f25e2d8.el7ost
Doc Type: If docs needed, set a value
Last Closed: 2018-06-27 13:33:53 UTC
Type: Bug
Bug Depends On: 1515269, 1568360, 1573068
Description Saravanan KR 2017-08-07 05:57:56 UTC
Description of problem: TripleO currently uses a workaround that modifies the permissions so that ovs runs with the qemu group. This is an interim solution: https://review.openstack.org/#/c/478163/ The actual solution has been worked out by the ovs team: https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/333423.html This BZ tracks the upstream progress.
Comment 1 Aaron Conole 2017-08-11 14:32:42 UTC
Note that this solution has been accepted upstream, and requires that QEMU advertise the sockets with group permissions of +rw, and group ownership of hugetlbfs.
Comment 2 Saravanan KR 2017-08-22 10:30:47 UTC
(In reply to Aaron Conole from comment #1)
> Note that this solution has been accepted upstream, and requires that QEMU
> advertise the sockets with group permissions of +rw, and group ownership of
> hugetlbfs.

Could you elaborate on QEMU advertising sockets with the required permissions? Are you expecting a particular format, or an existing one? We need to add the respective teams to continue the discussion.
Comment 3 Aaron Conole 2017-08-23 19:28:25 UTC
By advertise, I mean just making sure that the file is group-owned by hugetlbfs and has group permissions +rw. Nothing else should be needed from discretionary access control. Mandatory access control (SELinux) is a different matter, and I am working with QE to figure out those issues now.
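As a rough illustration of the discretionary-access check described above, the following Python sketch stats a vhost-user socket and reports whether it is group-owned by `hugetlbfs` and group read/write. The function name `check_vhost_socket` and its return convention (a list of problem strings, empty when everything is fine) are hypothetical and not part of OVS, QEMU or TripleO; SELinux labels are deliberately not checked.

```python
import grp
import os
import stat
from typing import List


def check_vhost_socket(path: str, group: str = "hugetlbfs") -> List[str]:
    """Return DAC problems found on a vhost-user socket; an empty list means OK.

    Only ownership and mode bits are checked; SELinux labels are not.
    """
    problems: List[str] = []
    st = os.stat(path)
    if not stat.S_ISSOCK(st.st_mode):
        problems.append(f"{path} is not a UNIX socket")

    # Resolve the expected group name to a numeric gid on this host.
    try:
        expected_gid = grp.getgrnam(group).gr_gid
    except KeyError:
        problems.append(f"group {group!r} does not exist on this host")
    else:
        if st.st_gid != expected_gid:
            problems.append(f"{path} is group-owned by gid {st.st_gid}, not {group}")

    # The peer process (ovs or qemu) needs both read and write via the group.
    needed = stat.S_IRGRP | stat.S_IWGRP
    if (st.st_mode & needed) != needed:
        problems.append(f"{path} lacks group read/write permission")
    return problems
```

Calling it on a socket path returns an empty list when both conditions from the comment hold; each entry otherwise names one missing condition.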
Comment 4 Saravanan KR 2017-08-24 04:51:36 UTC
Thanks Aaron for the clarification. There is an option in qemu.conf (https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu.conf#L372) to apply a group ID to the qemu processes and the files they create. I couldn't find an option to specify the vhost socket file permissions. Adding the libvirt team to confirm whether this "group" option could be set to "hugetlbfs" for a DPDK OpenStack deployment, with "+rw" permissions.
Comment 5 Michal Privoznik 2017-08-31 11:23:13 UTC
Libvirt allows setting per-device DAC labels. However, this has not been implemented for vhostuser netdevs, so it is not supported there. Ideally, the XML config would look like this:

  <interface type='vhostuser'>
    <mac address='52:54:00:ee:96:6c'/>
    <source type='unix' path='/tmp/vhost1.sock' mode='server'/>
    <model type='virtio'/>
    <seclabel type='static' model='dac' relabel='yes'>
      <label>myUser:myGroup</label>
    </seclabel>
  </interface>

so that the socket can be owned by the correct owner. However, libvirt starts qemu with umask(0x002), and there is no way to specify the mode of files created by libvirt or qemu, either in the domain XML or in a config file.
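The umask point can be seen directly. On Linux, a UNIX socket created by bind() gets mode 0777 masked by the process umask, so under a umask of 002 a socket that qemu creates is already group read/write, and only its group ownership is left to fix. The sketch below (hypothetical helper `socket_mode_under_umask`, Linux behaviour assumed) binds a throwaway socket under a given umask and returns its permission bits.

```python
import os
import socket
import stat
import tempfile


def socket_mode_under_umask(mask: int) -> int:
    """Bind a throwaway UNIX socket under `mask` and return its permission bits.

    Note: os.umask() is process-wide, so this is not safe to call while other
    threads are creating files.
    """
    previous = os.umask(mask)  # set the new umask, remember the old one
    try:
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "probe.sock")
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
                # On Linux, bind() creates the socket inode as 0777 & ~umask.
                sock.bind(path)
                return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(previous)  # restore the caller's umask
```

With a umask of 002 this should report 0o775 (group read/write present); with the common default of 022 it should report 0o755, which lacks group write.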
Comment 6 Aaron Conole 2017-09-01 18:41:29 UTC
Is anything still needed from the OvS side for this?
Comment 7 Saravanan KR 2017-09-04 05:54:04 UTC
From comment #5, it looks like libvirt (qemu) may not be able to change the group ownership of the vhost-user sockets to hugetlbfs. I would prefer that we agree on the way forward across components: ovs, libvirt, nova, and tripleo (nfv). Let me know if a call is required, or whether we can continue the discussion on this BZ.
Comment 8 Aaron Conole 2017-09-10 12:34:52 UTC
Let's have a call to discuss. Please schedule it.
Comment 10 Saravanan KR 2017-10-31 13:13:29 UTC
Minutes of the discussion between Saravanan, Aaron, Michal and Karthik on the agreed approach and next steps:

ovs 2.8 changed the identity of the ovs process by default: it runs as user openvswitch and group hugetlbfs. All vhost sockets it creates (server mode) or opens (client mode) therefore require hugetlbfs group permissions.

From libvirt's perspective, there are two approaches:
1) Create the domain XML with the specific permission.
2) Set the group value in qemu.conf to hugetlbfs, so that vhost sockets are created with that group ID. Aaron commented that QE has tried this standalone and found it to work.

Both changes have the same effect on the vhost user sockets.

In OSP12, the kolla containers already change the qemu user ID from the system default to 42427, so qemu effectively runs with uid/gid 42427 (this is overridden for DPDK deployments in OSP12 because of the permission mismatch). I will try approach (2) for qemu/libvirt, with ovs running as group hugetlbfs, and we will conclude on the approach based on the result. Additionally, Aaron is working on configuring the user and group for ovs from /etc/sysconfig/openvswitch (work in progress?), which could be used if the above approach fails.

We also have to analyse the following upgrade scenarios, to see how the change of user and group ID in ovs 2.8 and OSP13 should be handled:
a) OSP12 (baremetal) > OSP13 (baremetal)
b) OSP12 (containers) > OSP13 (containers)
c) OSP12 (baremetal) > OSP13 (containers)
d) OSP10 (baremetal) > OSP13 (baremetal)
e) OSP10 (baremetal) > OSP13 (containers)

Links:
https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu.conf#L412
https://github.com/openstack/kolla/blob/187b1f08f586327e5c47a0bed3760a575daa1287/kolla/common/config.py#L750
https://bugzilla.redhat.com/show_bug.cgi?id=1486127
https://bugzilla.redhat.com/show_bug.cgi?id=1489631
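Approach (2) amounts to a one-line change in qemu.conf. The sketch below shows it as a pure text transform: the first `group = ...` entry (commented or not) becomes `group = "hugetlbfs"`, any later active `group` entries are commented out so only one remains, and the setting is appended if no entry exists. The helper `set_qemu_group` is hypothetical; in a real deployment the value is expected to come from the deployment tooling rather than a hand edit, and libvirtd only reads qemu.conf at startup.

```python
import re

# An active or commented-out top-level `group = ...` line in qemu.conf.
# [ \t]* is used instead of \s* so a match never spans several lines, and the
# key must be exactly "group" (so "cgroup_controllers" is not matched).
_GROUP_LINE = re.compile(r"^[ \t]*#?[ \t]*group[ \t]*=.*$", re.MULTILINE)


def set_qemu_group(conf_text: str, group: str = "hugetlbfs") -> str:
    """Return qemu.conf text with exactly one active `group = "<group>"` line."""
    wanted = f'group = "{group}"'
    replaced = False

    def _rewrite(match: "re.Match[str]") -> str:
        nonlocal replaced
        current = match.group(0)
        if not replaced:
            replaced = True
            return wanted  # first entry (commented or not) becomes the setting
        # Later entries are commented out so only one active entry remains.
        return current if current.lstrip().startswith("#") else "#" + current

    new_text = _GROUP_LINE.sub(_rewrite, conf_text)
    if not replaced:
        # No existing entry at all: append the setting at the end of the file.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += wanted + "\n"
    return new_text
```

For example, a stock file containing `#group = "root"` comes back with that line replaced by `group = "hugetlbfs"`, and running the function a second time leaves the text unchanged.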
Comment 11 Saravanan KR 2017-11-20 13:19:45 UTC
This bug tracks the removal of the THT (tripleo-heat-templates) change that patches the ovs service. It depends on https://bugzilla.redhat.com/show_bug.cgi?id=1515269
Comment 19 Yariv 2018-06-14 17:17:02 UTC
TripleO (network-environment.yaml):
  VhostuserSocketGroup: "hugetlbfs"

On the Compute node:
  grep OVS_ /etc/sysconfig/openvswitch
  OVS_USER_ID="openvswitch:hugetlbfs"

  grep hugetlbfs /etc/libvirt/qemu.conf
  group = "hugetlbfs"

  ll /var/run/openvswitch -d
  drwxr-xr-x. 2 openvswitch hugetlbfs 340 Jun 11 15:57 /var/run/openvswitch

With the following RPM: openvswitch-2.9.0-19.el7fdp.1.x86_64
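The same verification can be scripted. This Python sketch takes the contents of /etc/sysconfig/openvswitch and /etc/libvirt/qemu.conf as strings (reading the files is left to the caller) and checks that OVS and qemu agree on the hugetlbfs group. The function names are hypothetical, and the parsing assumes the simple `KEY="value"` forms shown above rather than full shell or libvirt config syntax.

```python
import re
from typing import Optional, Tuple

# OVS_USER_ID="user:group" in /etc/sysconfig/openvswitch (quotes optional).
_OVS_USER_ID = re.compile(
    r'^[ \t]*OVS_USER_ID[ \t]*=[ \t]*"?([^:"\s]+):([^"\s]+)"?', re.MULTILINE
)
# An active (uncommented) group = "..." line in /etc/libvirt/qemu.conf.
_QEMU_GROUP = re.compile(r'^[ \t]*group[ \t]*=[ \t]*"([^"]*)"', re.MULTILINE)


def parse_ovs_user_id(sysconfig_text: str) -> Tuple[str, str]:
    """Return (user, group) from OVS_USER_ID; raise ValueError if it is unset."""
    m = _OVS_USER_ID.search(sysconfig_text)
    if m is None:
        raise ValueError("OVS_USER_ID is not set")
    return m.group(1), m.group(2)


def parse_qemu_group(qemu_conf_text: str) -> Optional[str]:
    """Return the first active qemu.conf group value, or None if none is active."""
    m = _QEMU_GROUP.search(qemu_conf_text)
    return m.group(1) if m else None


def groups_match(sysconfig_text: str, qemu_conf_text: str,
                 expected: str = "hugetlbfs") -> bool:
    """True when both OVS and qemu are configured with the expected group."""
    _, ovs_group = parse_ovs_user_id(sysconfig_text)
    return ovs_group == expected and parse_qemu_group(qemu_conf_text) == expected
```

A caller would read both files and pass their contents to `groups_match`; a commented-out `#group = ...` line in qemu.conf counts as not configured.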
Comment 21 errata-xmlrpc 2018-06-27 13:33:53 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086