Bug 1550398

Summary: failed to start vhost interface after updating ovs to 2.9 in osp10
Product: Red Hat OpenStack Reporter: zenghui.shi <zshi>
Component: openvswitchAssignee: Aaron Conole <aconole>
Status: CLOSED ERRATA QA Contact: Ofer Blaut <oblaut>
Severity: high Docs Contact:
Priority: high    
Version: 10.0 (Newton)CC: aconole, amuller, apevec, atelang, chrisw, fbaudin, lhh, pliu, rhos-maint, skramaja, srevivo, yroblamo, zshi
Target Milestone: ---Keywords: SELinux, Triaged, ZStream
Target Release: 10.0 (Newton)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-selinux-0.8.14, openvswitch-2.9.0-15 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-27 23:33:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
audit.log
none
audit-2018-4-2.log none

Description zenghui.shi 2018-03-01 08:05:22 UTC
Created attachment 1402281 [details]
audit.log

Description of problem:

Note: this is not a fresh installation of osp13 with ovs2.9, instead it updates ovs from 2.6 to 2.9 in a osp10 dpdk environment and manually apply the necessary ovs configuration[1] needed for running ovs2.9(e.g user&group permission, dpdk-devargs etc etc), and finally reboot the compute node.

after the compute node is rebooted, use 'nova start' to bring up vm instance(with vhost interface), vm failed to boot and stay in powering-on state, error messages in /var/log/libvirt/qemu/instance.log as beow:

---------------------------------------------------
2018-03-01T07:39:11.887429Z qemu-kvm: -netdev vhost-user,chardev=charnet0,id=hostnet0: Failed to read msg header. Read -1 instead of 12. Original request 1.
2018-03-01T07:39:11.887440Z qemu-kvm: -netdev vhost-user,chardev=charnet0,id=hostnet0: vhost_get_features failed: Input/output error (5)
2018-03-01T07:39:11.887452Z qemu-kvm: -netdev vhost-user,chardev=charnet0,id=hostnet0: failed to init vhost_net for queue 0
2018-03-01T07:39:11.887937Z qemu-kvm: -netdev vhost-user,chardev=charnet0,id=hostnet0: Failed to read msg header. Read -1 instead of 12. Original request 1.
...
---------------------------------------------------

if disabling selinux with 'senforce 0', vm will then become active and running.
which looks like a selinux policy issue blocking the connection/read of vhost socket.(the selinux-policy package version is at 3.13.1-192). the audit.log file is attached.


log in /var/log/openvswitch/ovs-vswitchd.log shows:
---------------------------------------------------
2018-03-01T07:44:47.322Z|14530105|dpdk|ERR|VHOST_CONFIG: truncted msg
2018-03-01T07:44:47.322Z|14530106|dpdk|ERR|VHOST_CONFIG: vhost read message failed
2018-03-01T07:44:47.323Z|14530107|dpdk|INFO|VHOST_CONFIG: new vhost user connection is 402
2018-03-01T07:44:47.323Z|14530108|dpdk|INFO|VHOST_CONFIG: new device, handle is 0
2018-03-01T07:44:47.323Z|14530109|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
2018-03-01T07:44:47.323Z|14530110|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_GET_PROTOCOL_FEATURES
2018-03-01T07:44:47.323Z|14530111|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_PROTOCOL_FEATURES
2018-03-01T07:44:47.323Z|14530112|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_GET_QUEUE_NUM
2018-03-01T07:44:47.323Z|14530113|dpdk|ERR|VHOST_CONFIG: truncted msg
2018-03-01T07:44:47.323Z|14530114|dpdk|ERR|VHOST_CONFIG: vhost read message failed
2018-03-01T07:44:47.323Z|14530115|dpdk|INFO|VHOST_CONFIG: new vhost user connection is 402
2018-03-01T07:44:47.323Z|14530116|dpdk|INFO|VHOST_CONFIG: new device, handle is 0
2018-03-01T07:44:47.323Z|14530117|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
2018-03-01T07:44:47.323Z|14530118|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_GET_PROTOCOL_FEATURES
2018-03-01T07:44:47.323Z|14530119|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_PROTOCOL_FEATURES
2018-03-01T07:44:47.323Z|14530120|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_GET_QUEUE_NUM
2018-03-01T07:44:47.323Z|14530121|dpdk|ERR|VHOST_CONFIG: truncted msg
---------------------------------------------------

Version-Release number of selected component (if applicable):

[root@overcloud-compute-0 ~]# rpm -qa | grep 'qemu\|selinux\|openvswitch\|dpdk\|kernel'
kernel-devel-3.10.0-514.21.1.el7.x86_64
kernel-3.10.0-693.17.1.el7.x86_64
libselinux-python-2.5-12.el7.x86_64
kernel-headers-3.10.0-858.el7.x86_64
container-selinux-2.48-1.el7.noarch
ceph-selinux-10.2.10-17.el7cp.x86_64
erlang-kernel-18.3.4.7-1.el7ost.x86_64
kernel-3.10.0-514.21.1.el7.x86_64
python-openvswitch-2.6.1-16.git20161206.el7ost.noarch
dpdk-17.11-7.el7.x86_64
openstack-neutron-openvswitch-9.4.1-12.el7ost.noarch
libselinux-ruby-2.5-12.el7.x86_64
qemu-kvm-common-rhev-2.9.0-16.el7_4.13.x86_64
libselinux-utils-2.5-12.el7.x86_64
libselinux-2.5-12.el7.x86_64
selinux-policy-targeted-3.13.1-192.el7.noarch
openvswitch-2.9.0-3.el7fdp.x86_64
qemu-img-rhev-2.9.0-16.el7_4.13.x86_64
libvirt-daemon-driver-qemu-3.9.0-13.el7.x86_64
libselinux-2.5-12.el7.i686
kernel-3.10.0-858.el7.x86_64
openstack-selinux-0.8.13-1.el7ost.noarch
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
kernel-tools-libs-3.10.0-858.el7.x86_64
kernel-devel-3.10.0-858.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64
kernel-devel-3.10.0-693.17.1.el7.x86_64
kernel-tools-3.10.0-858.el7.x86_64
selinux-policy-3.13.1-192.el7.noarch


[root@overcloud-compute-0 ~]# cat /etc/sysconfig/openvswitch
OPTIONS=""
DPDK_OPTIONS = "-l 4,6,20,22 -n 4 --socket-mem 20482048 -w 0000:06:00.1"

[root@overcloud-compute-0 ~]# cat /etc/openvswitch/default.conf 
OVS_USER_ID="openvswitch:hugetlbfs"

[root@overcloud-compute-0 ~]# cat /etc/libvirt/qemu.conf
max_files = 32768
max_processes = 131072
user = "openvswitch"
group = "hugetlbfs"

[root@overcloud-compute-0 ~]# ls -al /var/run/openvswitch/
total 344
drwxr-xr-x.  2 openvswitch hugetlbfs    480 Mar  1 07:28 .
drwxr-xr-x. 46 root        root        1400 Mar  1 07:30 ..
srwxr-x---.  1 openvswitch hugetlbfs      0 Mar  1 07:27 br-ex.mgmt
srwxr-x---.  1 openvswitch hugetlbfs      0 Mar  1 07:27 br-ex.snoop
srwxr-x---.  1 openvswitch hugetlbfs      0 Mar  1 07:27 br-int.mgmt
srwxr-x---.  1 openvswitch hugetlbfs      0 Mar  1 07:27 br-int.snoop
srwxr-x---.  1 openvswitch hugetlbfs      0 Mar  1 07:27 br-isolation.mgmt
srwxr-x---.  1 openvswitch hugetlbfs      0 Mar  1 07:27 br-isolation.snoop
srwxr-x---.  1 openvswitch hugetlbfs      0 Mar  1 07:28 br-link.mgmt
srwxr-x---.  1 openvswitch hugetlbfs      0 Mar  1 07:27 br-link.snoop
srwxr-x---.  1 openvswitch hugetlbfs      0 Mar  1 07:27 br-tun.mgmt
srwxr-x---.  1 openvswitch hugetlbfs      0 Mar  1 07:27 br-tun.snoop
srwxr-x---.  1 openvswitch hugetlbfs      0 Mar  1 07:27 db.sock
srwxr-x---.  1 openvswitch hugetlbfs      0 Mar  1 07:27 ovsdb-server.1436.ctl
-rw-r--r--.  1 openvswitch hugetlbfs      5 Mar  1 07:27 ovsdb-server.pid
srwxr-x---.  1 openvswitch hugetlbfs      0 Mar  1 07:27 ovs-vswitchd.1529.ctl
-rw-r--r--.  1 openvswitch hugetlbfs      5 Mar  1 07:27 ovs-vswitchd.pid
-rw-r-----.  1 openvswitch hugetlbfs 208420 Mar  1 07:27 .rte_config
-rw-r--r--.  1 openvswitch hugetlbfs 132608 Mar  1 07:27 .rte_hugepage_info
srwxr-xr-x.  1 openvswitch hugetlbfs      0 Mar  1 07:27 .rte_mp_socket
srwxr-xr-x.  1 openvswitch hugetlbfs      0 Mar  1 07:28 vhu04394507-c7
srwxr-xr-x.  1 openvswitch hugetlbfs      0 Mar  1 07:28 vhu46786d43-cf
srwxr-xr-x.  1 openvswitch hugetlbfs      0 Mar  1 07:28 vhu6530141e-3c
srwxr-xr-x.  1 openvswitch hugetlbfs      0 Mar  1 07:28 vhue12a098b-b1

[root@overcloud-compute-0 ~]# ls -al /etc/openvswitch/
total 180
drwxr-xr-x.   2 openvswitch hugetlbfs    152 Mar  1 07:27 .
drwxr-xr-x. 143 root        root        8192 Mar  1 07:30 ..
-rw-r--r--.   1 openvswitch hugetlbfs  61236 Mar  1 07:29 conf.db
-rw-r--r--.   1 root        root      101880 Mar  1 07:27 conf.db.backup7.14.0-3974332717
-rw-------.   1 openvswitch hugetlbfs      0 Feb 28 11:18 .conf.db.~lock~
-rw-------.   1 openvswitch hugetlbfs      0 Mar  1 07:27 .conf.db.tmp.~lock~
-rw-r--r--.   1 openvswitch hugetlbfs    175 Mar  1 07:23 default.conf
-rw-r--r--.   1 openvswitch hugetlbfs     37 Feb 28 11:18 system-id.conf

[root@overcloud-compute-0 ~]# ls -al /var/log/openvswitch/
total 2886804
drwxr-xr-x.  2 openvswitch hugetlbfs        126 Mar  1 07:44 .
drwxr-xr-x. 41 root        root            4096 Mar  1 07:27 ..
-rw-r--r--.  1 openvswitch hugetlbfs        674 Mar  1 07:27 ovsdb-server.log
-rw-r--r--.  1 openvswitch hugetlbfs        381 Feb 28 11:23 ovsdb-server.log-20180301.gz
-rw-r--r--.  1 openvswitch hugetlbfs 2806580006 Mar  1 08:02 ovs-vswitchd.log
-rw-r--r--.  1 openvswitch hugetlbfs       4521 Mar  1 03:02 ovs-vswitchd.log-20180301.gz

[root@overcloud-compute-0 ~]# ovs-vsctl show
84a7a13b-f8e8-4212-90cd-b52ebffe5492
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port br-tun
            Interface br-tun
                type: internal
        Port "vxlan-ac11020f"
            Interface "vxlan-ac11020f"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.17.2.22", out_key=flow, remote_ip="172.17.2.15"}
        Port "vxlan-ac11020d"
            Interface "vxlan-ac11020d"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.17.2.22", out_key=flow, remote_ip="172.17.2.13"}
        Port "vxlan-ac110211"
            Interface "vxlan-ac110211"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.17.2.22", out_key=flow, remote_ip="172.17.2.17"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-ac110213"
            Interface "vxlan-ac110213"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.17.2.22", out_key=flow, remote_ip="172.17.2.19"}
    Bridge br-ex
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-isolation
        fail_mode: standalone
        Port "vlan202"
            tag: 202
            Interface "vlan202"
                type: internal
        Port "p5p1"
            Interface "p5p1"
        Port "vlan201"
            tag: 201
            Interface "vlan201"
                type: internal
        Port br-isolation
            Interface br-isolation
                type: internal
        Port "vlan203"
            tag: 203
            Interface "vlan203"
                type: internal
    Bridge br-link
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port phy-br-link
            Interface phy-br-link
                type: patch
                options: {peer=int-br-link}
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:06:00.1"}
        Port br-link
            Interface br-link
                type: internal
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port br-int
            Interface br-int
                type: internal
        Port "vhu46786d43-cf"
            tag: 1
            Interface "vhu46786d43-cf"
                type: dpdkvhostuser
        Port "vhu6530141e-3c"
            tag: 2
            Interface "vhu6530141e-3c"
                type: dpdkvhostuser
        Port int-br-link
            Interface int-br-link
                type: patch
                options: {peer=phy-br-link}
        Port "vhue12a098b-b1"
            tag: 1
            Interface "vhue12a098b-b1"
                type: dpdkvhostuser
        Port "vhu04394507-c7"
            tag: 2
            Interface "vhu04394507-c7"
                type: dpdkvhostuser
    ovs_version: "2.9.0"

[root@overcloud-compute-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-dpdk0 
# This file is autogenerated by os-net-config
DEVICE=dpdk0
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
PEERDNS=no
DEVICETYPE=ovs
TYPE=OVSDPDKPort
OVS_BRIDGE=br-link
OVS_EXTRA="set Interface dpdk0 options:dpdk-devargs=0000:06:00.1"


[1]: https://github.com/redhat-nfvpe/infrared_upgrade_config/blob/master/ovs_upgrade/tasks/main.yml

Comment 1 Saravanan KR 2018-03-01 08:41:39 UTC
This issue is in OSP10 with ovs2.9, which means the default vhost-user mode is dpdkvhostuser (ovs in the server mode). And the vhost sockets will be created on the /var/run/openvswitch/ directory.

Comment 2 zenghui.shi 2018-03-01 09:47:41 UTC
The config of --socket-mem looks odd '20482048', but I didn't configure or change it manually, is it auto-generated?

[root@overcloud-compute-0 log]# cat /etc/sysconfig/openvswitch
OPTIONS=""
DPDK_OPTIONS = "-l 4,6,20,22 -n 4 --socket-mem 20482048 -w 0000:06:00.1"

Getting other_config returns good value:

[root@overcloud-compute-0 log]# ovs-vsctl get Open_vSwitch . other_config
{dpdk-init="true", dpdk-lcore-mask="a800a8", dpdk-socket-mem="2048,2048", pmd-cpu-mask="500050"}

Comment 3 Saravanan KR 2018-03-01 10:11:08 UTC
(In reply to zenghui.shi from comment #2)
> The config of --socket-mem looks odd '20482048', but I didn't configure or
> change it manually, is it auto-generated?
> 
> [root@overcloud-compute-0 log]# cat /etc/sysconfig/openvswitch
> OPTIONS=""
> DPDK_OPTIONS = "-l 4,6,20,22 -n 4 --socket-mem 20482048 -w 0000:06:00.1"
> 
DPDK_OPTIONS is not used from ovs2.6 onwards. other_config:dpdk-socket-mem is only used. There some mismatch in the puppet configuration because of changed parameter format, but you can ignore it as it is not used.

Comment 13 zenghui.shi 2018-04-02 13:39:13 UTC
Created attachment 1416255 [details]
audit-2018-4-2.log

Comment 17 Lon Hohberger 2018-04-06 16:01:12 UTC
type=USER_AVC msg=audit(1522670657.993:1217): pid=1282 uid=81 auid=4294967295 ses=4294967295 subj=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 msg='avc:  denied  { send_msg } for msgtype=method_call interface=org.freedesktop.login1.Manager member=Inhibit dest=org.freedesktop.login1 spid=242890 tpid=1293 scontext=system_u:system_r:virtlogd_t:s0-s0:c0.c1023 tcontext=system_u:system_r:systemd_logind_t:s0 tclass=dbus  exe=2F7573722F62696E2F646275732D6461656D6F6E3B3561633165373430202864656C6574656429 sauid=81 hostname=? addr=? terminal=?'

This looks like:

https://bugzilla.redhat.com/show_bug.cgi?id=1561729

Comment 18 Lon Hohberger 2018-04-06 16:02:08 UTC
Next update for OSP10 should have the fix for that USER_AVC (for now), which will be resolved in a different way by libvirt at a later time.

Comment 42 errata-xmlrpc 2018-06-27 23:33:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2102