Bug 1489631 - OVS-dpdk: A vhostuser server guest hangs on start
Summary: OVS-dpdk: A vhostuser server guest hangs on start
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Aaron Conole
QA Contact: Jean-Tsung Hsiao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-09-08 00:49 UTC by Jean-Tsung Hsiao
Modified: 2017-10-13 18:02 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-13 18:02:51 UTC
Target Upstream Version:
Embargoed:


Attachments: none

Description Jean-Tsung Hsiao 2017-09-08 00:49:28 UTC
Description of problem: OVS-dpdk: A vhostuser server guest hangs on start

A VM was configured with a vhostuser server port:
<source type='unix' path='/tmp/vhost0' mode='server'/>
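
For reference, the surrounding libvirt interface stanza for such a port would look like the sketch below (only the <source> line above is from this report; the MAC address and queue count match the qemu command line in comment 4, and the rest is illustrative):

<interface type='vhostuser'>
  <mac address='52:54:00:32:0f:71'/>
  <source type='unix' path='/tmp/vhost0' mode='server'/>
  <model type='virtio'/>
  <driver queues='2'/>
</interface>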
 
When started, it hung after qemu-kvm created /tmp/vhost0.

Version-Release number of selected component (if applicable):

[root@netqe19 ~]# uname -a
Linux netqe19.knqe.lab.eng.bos.redhat.com 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@netqe19 ~]# rpm -q openvswitch
openvswitch-2.8.0-0.2.20170810git3631ed2.el7fdb.x86_64

[root@netqe19 ~]# rpm -q qemu-kvm-rhev
qemu-kvm-rhev-2.6.0-25.el7.x86_64
[root@netqe19 ~]# 

*** NOTE: When OVS 2.7.0-10 was deployed, the VM started successfully. ***

How reproducible: Reproducible


Steps to Reproduce:
1. Configure an OVS-dpdk bridge with a dpdkvhostuserclient port (see the example commands after this list)
2. Configure the corresponding VM with a vhostuser server port
3. Start the bridge
4. Start the VM
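
For step 1, the OVS side can be set up with commands like this sketch (the bridge and port names "ovsbr0"/"vhost0" are illustrative; the socket path /tmp/vhost0 is from this report):

ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 vhost0 -- set Interface vhost0 \
    type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhost0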

Actual results:
VM hangs on start

Expected results:
VM should start successfully.

Additional info:

Comment 2 Kevin Traynor 2017-09-22 09:28:54 UTC
Hi, This works for me using qemu. I've tested with
qemu-kvm-rhev-2.6.0-25.el7.x86_64.rpm
using both 
openvswitch-2.7.0-10.git20170530.el7fdb.x86_64.rpm and
openvswitch-2.8.0-1.el7fdb.x86_64.rpm

The package openvswitch-2.8.0-0.2.20170810git3631ed2.el7fdb.x86_64 was a pre-release and is not in brew.
Can you retest with openvswitch-2.8.0-1.el7fdb.x86_64.rpm?

If still not working then please give virsh dumpxml, exact steps to reproduce, last log message from qemu, and numactl -H

thanks,
Kevin.

Comment 3 Jean-Tsung Hsiao 2017-09-22 13:23:07 UTC
(In reply to Kevin Traynor from comment #2)
> Hi, This works for me using qemu. I've tested with
> qemu-kvm-rhev-2.6.0-25.el7.x86_64.rpm
> using both 
> openvswitch-2.7.0-10.git20170530.el7fdb.x86_64.rpm and
> openvswitch-2.8.0-1.el7fdb.x86_64.rpm
> 
> The package openvswitch-2.8.0-0.2.20170810git3631ed2.el7fdb.x86_64 was a
> pre-release and is not in brew.
> Can you retest with openvswitch-2.8.0-1.el7fdb.x86_64.rpm?
> 
> If still not working then please give virsh dumpxml, exact steps to
> reproduce, last log message from qemu, and numactl -H
> 
> thanks,
> Kevin.

Yes, I will.
Thanks!
Jean

Comment 4 Jean-Tsung Hsiao 2017-09-23 17:36:18 UTC
Hi Kevin,
I tried with openvswitch-2.8.0-1.el7fdb.x86_64.rpm, but the same issue still remains.
Please see attached below.
What else needs to be tried?
Thanks!
Jean

*** kernel and OVS ***
[root@netqe5 vxlan-tunnel]# uname -a
Linux netqe5.knqe.lab.eng.bos.redhat.com 4.11.0-16.el7a.x86_64 #1 SMP Mon Jul 24 22:25:46 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@netqe5 vxlan-tunnel]# rpm -q openvswitch
openvswitch-2.8.0-1.el7fdb.x86_64

*** virsh start mq-vhu-tunnel-server hangs here ***
virsh # start mq-vhu-tunnel-server

*** It's now in paused state ***
[root@netqe5 vxlan-tunnel]# virsh dominfo mq-vhu-tunnel-server
Id:             5
Name:           mq-vhu-tunnel-server
UUID:           599be106-543c-4c3a-b65f-4e11f5bd8d79
OS Type:        hvm
State:          paused
CPU(s):         5
CPU time:       0.1s
Max memory:     8388608 KiB
Used memory:    8388608 KiB
Persistent:     yes
Autostart:      disable
Managed save:   no
Security model: selinux
Security DOI:   0
Security label: system_u:system_r:svirt_t:s0:c416,c1017 (permissive)

*** socket created ***
[root@netqe5 vxlan-tunnel]# ls -l /tmp/vhost0 
srwxrwxr-x. 1 qemu qemu 0 Sep 23 13:24 /tmp/vhost0

*** qemu task ***
[root@netqe5 vxlan-tunnel]# ps -elf | grep qemu-kvm
6 S qemu      2412     1  0  80   0 - 77917 skb_wa 13:24 ?        00:00:00 /usr/libexec/qemu-kvm -name guest=mq-vhu-tunnel-server,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-5-mq-vhu-tunnel-server/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-noTSX -m 8192 -realtime mlock=off -smp 5,sockets=5,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/5-mq-vhu-tunnel-server,share=yes,size=8589934592 -numa node,nodeid=0,cpus=0,memdev=ram-node0 -uuid 599be106-543c-4c3a-b65f-4e11f5bd8d79 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-5-mq-vhu-tunnel-server/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/home/images/mq_vhu_tunnel_server.img,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charnet0,path=/tmp/vhost0,server -netdev vhost-user,chardev=charnet0,queues=2,id=hostnet0 -device virtio-net-pci,mq=on,vectors=6,netdev=hostnet0,id=net0,mac=52:54:00:32:0f:71,bus=pci.0,addr=0x3 -netdev tap,fd=25,id=hostnet1,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:77:96:95,bus=pci.0,addr=0xb -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-5-mq-vhu-tunnel-server/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

Comment 5 Kevin Traynor 2017-10-03 18:25:51 UTC
Hanging after the socket /tmp/vhost0 is created can indicate it is waiting for ovs-dpdk to connect to the socket. Is ovs-dpdk running and trying to connect to the right socket location?
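
A quick way to check both (sketch; "vhost0" is an assumed OVS port name for the dpdkvhostuserclient interface):

systemctl status ovs-vswitchd                              # is ovs-dpdk running?
ovs-vsctl get Open_vSwitch . other_config:dpdk-init        # should be "true"
ovs-vsctl get Interface vhost0 options:vhost-server-path   # should print "/tmp/vhost0"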

Can you run an sosreport when the hang happens?

Comment 6 Jean-Tsung Hsiao 2017-10-04 11:30:27 UTC
(In reply to Kevin Traynor from comment #5)
> Hanging after the socket /tmp/vhost0 is created can indicate it is waiting
> for ovs-dpdk to connect to the socket. Is ovs-dpdk running and

The guest is the server now. "virsh start" should just return whether ovs-dpdk is running or not. But the domain state of the guest is paused --- see comment #4.

NOTE: The same configuration and process works for 2.7. This is a 2.8-specific issue.

> trying to connect to the right socket location?
> 
> Can you run an sosreport when the hang happens?

I will when I have a chance to reproduce the issue.

Comment 7 Jean-Tsung Hsiao 2017-10-04 12:57:22 UTC
(In reply to Jean-Tsung Hsiao from comment #6)
> (In reply to Kevin Traynor from comment #5)
> > Hanging after the socket /tmp/vhost0 is created can indicate it is waiting
> > for ovs-dpdk to connect to the socket. Is ovs-dpdk running and
> 
> The guest is the server now. "virsh start" should just return whether
> ovs-dpdk is running or not. But the domain state of the guest is paused
> --- see comment #4.
> 
> NOTE: The same configuration and process works for 2.7. This is a
> 2.8-specific issue.
> 
> > trying to connect to the right socket location?
> > 
> > Can you run an sosreport when the hang happens?
> 
> I will when I have a chance to reproduce the issue.

http://file.bos.redhat.com/~jhsiao/sosreport-JHsiao.1489631-20171004083450.tar.xz

Hi Kevin,

Reproduced the issue again with ovs-2.8.0-1. Please get the sosreport from the link above.

Here are the steps:

* Start OVS-dpdk first, then
* start the guest

Following the same steps, there is no such issue for 2.7.0-10 and 2.7.2-10.

More questions?

Thanks!

Jean

Comment 8 Jean-Tsung Hsiao 2017-10-04 13:30:05 UTC
Hi Kevin,

Here is an update.

If the guest is started first, it will be in a paused state. This is true for both 2.7 and 2.8.

Once OVS-dpdk gets started, the state becomes active. This is true for 2.7, but not for 2.8.
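
In other words (sketch; the domain name is from comment 4, the service name from comment 12):

virsh start mq-vhu-tunnel-server       # guest waits for the vhost connection
virsh domstate mq-vhu-tunnel-server    # reports "paused"
systemctl start ovs-vswitchd           # ovs-dpdk should connect to /tmp/vhost0
virsh domstate mq-vhu-tunnel-server    # 2.7 reports "running"; 2.8 stays "paused"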

Thanks!

Jean

Comment 9 Kevin Traynor 2017-10-05 17:01:35 UTC
> 
> http://file.bos.redhat.com/~jhsiao/sosreport-JHsiao.1489631-20171004083450.
> tar.xz

Hi Jean, Can you put this on netqe5? I don't seem to have permission to pick it up in the link above.

Comment 10 Jean-Tsung Hsiao 2017-10-05 23:53:53 UTC
(In reply to Kevin Traynor from comment #9)
> > 
> > http://file.bos.redhat.com/~jhsiao/sosreport-JHsiao.1489631-20171004083450.
> > tar.xz
> 
> Hi Jean, Can you put this on netqe5? I don't seem to have permission to pick
> it up in the link above.

[root@netqe5 ~]# ll
total 11608
-rw-------. 1 root root    15689 Jul  6 12:16 anaconda-ks.cfg
drwxr-xr-x. 7 root root      226 Jul 26 09:01 kev
-rw-------. 1 root root    14099 Jul  6 12:16 original-ks.cfg
-rw-------. 1 root root 11849948 Oct  4 08:35 sosreport-JHsiao.1489631-20171004083450.tar.xz
[root@netqe5 ~]#

NOTE: I also made the following file readable.

http://file.bos.redhat.com/~jhsiao/sosreport-JHsiao.1489631-20171004083450.tar.xz

Comment 11 Kevin Traynor 2017-10-09 16:50:24 UTC
I tested on Jean's board and my own. I think the issue is related to permissions when accessing memory while running under systemd. Below, with OVS run under systemd, is a comparison of OVS 2.7 and OVS 2.8 when DPDK is initialized:

[root@netqe5 qemu_hang]# ovs-vsctl -t 5 get Open_vSwitch . other_config
{dpdk-extra="--log-level 8", dpdk-lcore-mask="0x4", dpdk-socket-mem="1024"}
[root@netqe5 qemu_hang]# cat /proc/meminfo | grep Huge
AnonHugePages:      4096 kB
ShmemHugePages:        0 kB
HugePages_Total:      32
HugePages_Free:       31
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
[root@netqe5 qemu_hang]# grep huge /proc/mounts 
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,seclabel,relatime 0 0
[root@netqe5 qemu_hang]# cat /sys/devices/system/node/node*/meminfo | fgrep Huge
Node 0 AnonHugePages:      4096 kB
Node 0 ShmemHugePages:        0 kB
Node 0 HugePages_Total:    16
Node 0 HugePages_Free:     15
Node 0 HugePages_Surp:      0
Node 1 AnonHugePages:         0 kB
Node 1 ShmemHugePages:        0 kB
Node 1 HugePages_Total:    16
Node 1 HugePages_Free:     16
Node 1 HugePages_Surp:      0
[root@netqe5 qemu_hang]# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true



######openvswitch-2.8.0-1.el7fdb.x86_64:

2017-10-09T14:58:21.226Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2017-10-09T14:58:21.230Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0
2017-10-09T14:58:21.230Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1
2017-10-09T14:58:21.230Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores
2017-10-09T14:58:21.230Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2017-10-09T14:58:21.230Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2017-10-09T14:58:21.231Z|00007|dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable
2017-10-09T14:58:21.232Z|00008|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.8.0
2017-10-09T14:59:00.127Z|00009|memory|INFO|18020 kB peak resident set size after 38.9 seconds

2017-10-09T15:01:22.683Z|00010|dpdk|INFO|DPDK Enabled - initializing...
2017-10-09T15:01:22.683Z|00011|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2017-10-09T15:01:22.683Z|00012|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8
2017-10-09T15:01:22.684Z|00013|dpdk|INFO|EAL: Detected 24 lcore(s)
2017-10-09T15:01:22.685Z|00014|dpdk|INFO|EAL: Probing VFIO support...
2017-10-09T15:01:22.685Z|00015|dpdk|INFO|EAL: VFIO support initialized
2017-10-09T15:01:22.687Z|00016|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2017-10-09T15:01:22.687Z|00017|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
2017-10-09T15:01:22.687Z|00018|dpdk|ERR|EAL: Not enough memory available on socket 0! Requested: 1024MB, available: 0MB
2017-10-09T15:01:22.687Z|00019|dpdk|ERR|EAL: Cannot init memory
2017-10-09T15:01:23.059Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2017-10-09T15:01:23.063Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0
2017-10-09T15:01:23.063Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1
2017-10-09T15:01:23.063Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores
2017-10-09T15:01:23.063Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2017-10-09T15:01:23.063Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2017-10-09T15:01:23.064Z|00007|dpdk|INFO|DPDK Enabled - initializing...
2017-10-09T15:01:23.064Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2017-10-09T15:01:23.064Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8
2017-10-09T15:01:23.065Z|00010|dpdk|INFO|EAL: Detected 24 lcore(s)
2017-10-09T15:01:23.065Z|00011|dpdk|INFO|EAL: Probing VFIO support...
2017-10-09T15:01:23.065Z|00012|dpdk|INFO|EAL: VFIO support initialized
2017-10-09T15:01:23.068Z|00013|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2017-10-09T15:01:23.068Z|00014|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
2017-10-09T15:01:23.068Z|00015|dpdk|ERR|EAL: Not enough memory available on socket 0! Requested: 1024MB, available: 0MB
2017-10-09T15:01:23.068Z|00016|dpdk|ERR|EAL: Cannot init memory
2017-10-09T15:01:23.073Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted))
2017-10-09T15:01:23.073Z|00003|daemon_unix|EMER|could not detach from foreground session
2017-10-09T15:01:23.379Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2017-10-09T15:01:23.382Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0
2017-10-09T15:01:23.382Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1
2017-10-09T15:01:23.382Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores
2017-10-09T15:01:23.382Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2017-10-09T15:01:23.382Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2017-10-09T15:01:23.384Z|00007|dpdk|INFO|DPDK Enabled - initializing...
2017-10-09T15:01:23.384Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2017-10-09T15:01:23.384Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8
2017-10-09T15:01:23.385Z|00010|dpdk|INFO|EAL: Detected 24 lcore(s)
2017-10-09T15:01:23.385Z|00011|dpdk|INFO|EAL: Probing VFIO support...
2017-10-09T15:01:23.385Z|00012|dpdk|INFO|EAL: VFIO support initialized
2017-10-09T15:01:23.388Z|00013|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2017-10-09T15:01:23.388Z|00014|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
2017-10-09T15:01:23.388Z|00015|dpdk|ERR|EAL: Not enough memory available on socket 0! Requested: 1024MB, available: 0MB
2017-10-09T15:01:23.388Z|00016|dpdk|ERR|EAL: Cannot init memory
2017-10-09T15:01:23.392Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted))
2017-10-09T15:01:23.392Z|00003|daemon_unix|EMER|could not detach from foreground session
2017-10-09T15:01:23.629Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2017-10-09T15:01:23.632Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0
2017-10-09T15:01:23.632Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1
2017-10-09T15:01:23.632Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores
2017-10-09T15:01:23.632Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2017-10-09T15:01:23.632Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2017-10-09T15:01:23.634Z|00007|dpdk|INFO|DPDK Enabled - initializing...
2017-10-09T15:01:23.634Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2017-10-09T15:01:23.634Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8
2017-10-09T15:01:23.635Z|00010|dpdk|INFO|EAL: Detected 24 lcore(s)
2017-10-09T15:01:23.635Z|00011|dpdk|INFO|EAL: Probing VFIO support...
2017-10-09T15:01:23.635Z|00012|dpdk|INFO|EAL: VFIO support initialized
2017-10-09T15:01:23.638Z|00013|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2017-10-09T15:01:23.638Z|00014|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
2017-10-09T15:01:23.638Z|00015|dpdk|ERR|EAL: Not enough memory available on socket 0! Requested: 1024MB, available: 0MB
2017-10-09T15:01:23.638Z|00016|dpdk|ERR|EAL: Cannot init memory
2017-10-09T15:01:23.642Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted))
2017-10-09T15:01:23.643Z|00003|daemon_unix|EMER|could not detach from foreground session
2017-10-09T15:01:23.879Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2017-10-09T15:01:23.882Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0
2017-10-09T15:01:23.882Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1
2017-10-09T15:01:23.882Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores
2017-10-09T15:01:23.882Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2017-10-09T15:01:23.883Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2017-10-09T15:01:23.884Z|00007|dpdk|INFO|DPDK Enabled - initializing...
2017-10-09T15:01:23.884Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2017-10-09T15:01:23.884Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8
2017-10-09T15:01:23.885Z|00010|dpdk|INFO|EAL: Detected 24 lcore(s)
2017-10-09T15:01:23.885Z|00011|dpdk|INFO|EAL: Probing VFIO support...
2017-10-09T15:01:23.885Z|00012|dpdk|INFO|EAL: VFIO support initialized
2017-10-09T15:01:23.888Z|00013|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2017-10-09T15:01:23.888Z|00014|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
2017-10-09T15:01:23.888Z|00015|dpdk|ERR|EAL: Not enough memory available on socket 0! Requested: 1024MB, available: 0MB
2017-10-09T15:01:23.888Z|00016|dpdk|ERR|EAL: Cannot init memory
2017-10-09T15:01:23.893Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted))
2017-10-09T15:01:23.893Z|00003|daemon_unix|EMER|could not detach from foreground session
2017-10-09T15:01:24.129Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2017-10-09T15:01:24.132Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0
2017-10-09T15:01:24.132Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1
2017-10-09T15:01:24.132Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores
2017-10-09T15:01:24.132Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2017-10-09T15:01:24.132Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2017-10-09T15:01:24.134Z|00007|dpdk|INFO|DPDK Enabled - initializing...
2017-10-09T15:01:24.134Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2017-10-09T15:01:24.134Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8
2017-10-09T15:01:24.135Z|00010|dpdk|INFO|EAL: Detected 24 lcore(s)
2017-10-09T15:01:24.135Z|00011|dpdk|INFO|EAL: Probing VFIO support...
2017-10-09T15:01:24.135Z|00012|dpdk|INFO|EAL: VFIO support initialized
2017-10-09T15:01:24.138Z|00013|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2017-10-09T15:01:24.138Z|00014|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
2017-10-09T15:01:24.138Z|00015|dpdk|ERR|EAL: Not enough memory available on socket 0! Requested: 1024MB, available: 0MB
2017-10-09T15:01:24.138Z|00016|dpdk|ERR|EAL: Cannot init memory
2017-10-09T15:01:24.142Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted))
2017-10-09T15:01:24.142Z|00003|daemon_unix|EMER|could not detach from foreground session



######openvswitch-2.7.2-10.git20170914.el7fdp.x86_64:

2017-10-09T15:14:58.928Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2017-10-09T15:14:58.931Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0
2017-10-09T15:14:58.931Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1
2017-10-09T15:14:58.931Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores
2017-10-09T15:14:58.931Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2017-10-09T15:14:58.931Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2017-10-09T15:14:58.932Z|00007|dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable
2017-10-09T15:14:58.933Z|00008|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.7.2
2017-10-09T15:15:12.043Z|00009|memory|INFO|15284 kB peak resident set size after 13.1 seconds
2017-10-09T15:15:22.211Z|00010|dpdk|INFO|DPDK Enabled - initializing...
2017-10-09T15:15:22.211Z|00011|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2017-10-09T15:15:22.211Z|00012|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8
2017-10-09T15:15:22.212Z|00013|dpdk|INFO|EAL: Detected 24 lcore(s)
2017-10-09T15:15:22.235Z|00014|dpdk|INFO|EAL: Probing VFIO support...
2017-10-09T15:15:22.235Z|00015|dpdk|INFO|EAL: VFIO support initialized
2017-10-09T15:15:30.851Z|00016|dpdk|INFO|EAL: PCI device 0000:81:00.0 on NUMA socket 1
2017-10-09T15:15:30.851Z|00017|dpdk|INFO|EAL:   probe driver: 8086:154d net_ixgbe
2017-10-09T15:15:30.851Z|00018|dpdk|INFO|EAL:   using IOMMU type 1 (Type 1)
2017-10-09T15:15:31.021Z|00019|dpdk|INFO|EAL: Ignore mapping IO port bar(2) addr: 8021
2017-10-09T15:15:31.277Z|00020|dpdk|INFO|EAL: PCI device 0000:81:00.1 on NUMA socket 1
2017-10-09T15:15:31.277Z|00021|dpdk|INFO|EAL:   probe driver: 8086:154d net_ixgbe
2017-10-09T15:15:31.441Z|00022|dpdk|INFO|EAL: Ignore mapping IO port bar(2) addr: 8001
2017-10-09T15:15:31.697Z|00023|dpdk|INFO|EAL: PCI device 0000:83:00.0 on NUMA socket 1
2017-10-09T15:15:31.697Z|00024|dpdk|INFO|EAL:   probe driver: 8086:1528 net_ixgbe
2017-10-09T15:15:31.697Z|00025|dpdk|INFO|EAL: PCI device 0000:83:00.1 on NUMA socket 1
2017-10-09T15:15:31.697Z|00026|dpdk|INFO|EAL:   probe driver: 8086:1528 net_ixgbe
2017-10-09T15:15:31.697Z|00027|dpdk|INFO|EAL: PCI device 0000:84:00.0 on NUMA socket 1
2017-10-09T15:15:31.697Z|00028|dpdk|INFO|EAL:   probe driver: 8086:1583 net_i40e
2017-10-09T15:15:31.994Z|00029|dpdk|INFO|PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.04 eetrack 80002505
2017-10-09T15:15:32.097Z|00030|dpdk|INFO|EAL: PCI device 0000:84:00.1 on NUMA socket 1
2017-10-09T15:15:32.097Z|00031|dpdk|INFO|EAL:   probe driver: 8086:1583 net_i40e
2017-10-09T15:15:32.102Z|00032|dpdk|INFO|DPDK Enabled - initialized

Comment 12 Kevin Traynor 2017-10-09 16:55:29 UTC
I've also seen permission issues when running the openvswitch service:

-- Unit ovs-vswitchd.service has begun starting up.
Oct 09 06:28:25 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00013|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
Oct 09 06:28:25 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00014|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00017|dpdk|ERR|EAL: Cannot open /dev/vfio/17: Permission denied
Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00018|dpdk|ERR|EAL: Requested device 0000:04:00.0 cannot be used
Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00021|dpdk|ERR|EAL: Cannot open /dev/vfio/17: Permission denied
Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00022|dpdk|ERR|EAL: Requested device 0000:04:00.1 cannot be used
Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00025|dpdk|ERR|EAL: Cannot open /dev/vfio/40: Permission denied
Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00026|dpdk|ERR|EAL: Requested device 0000:81:00.0 cannot be used
Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00029|dpdk|ERR|EAL: Cannot open /dev/vfio/41: Permission denied
Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00030|dpdk|ERR|EAL: Requested device 0000:81:00.1 cannot be used
Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00037|dpdk|ERR|EAL: Cannot open /dev/vfio/45: Permission denied
Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00038|dpdk|ERR|EAL: Requested device 0000:84:00.0 cannot be used

Comment 13 Kevin Traynor 2017-10-09 16:56:43 UTC
and a similar error when dpdk-init=true is set before starting ovs-vswitchd:

2017-10-09T16:45:08.473Z|00007|dpdk|INFO|DPDK Enabled - initializing...
2017-10-09T16:45:08.473Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2017-10-09T16:45:08.473Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024,1024 --log-level 8
2017-10-09T16:45:08.474Z|00010|dpdk|INFO|EAL: Detected 28 lcore(s)
2017-10-09T16:45:08.474Z|00011|dpdk|INFO|EAL: Probing VFIO support...
2017-10-09T16:45:08.474Z|00012|dpdk|INFO|EAL: VFIO support initialized
2017-10-09T16:45:16.922Z|00013|dpdk|INFO|EAL: PCI device 0000:01:00.0 on NUMA socket 0
2017-10-09T16:45:16.922Z|00014|dpdk|INFO|EAL:   probe driver: 8086:10fb net_ixgbe
2017-10-09T16:45:16.922Z|00015|dpdk|ERR|EAL: Cannot open /dev/vfio/15: Permission denied
2017-10-09T16:45:16.922Z|00016|dpdk|ERR|EAL: Requested device 0000:01:00.0 cannot be used
2017-10-09T16:45:16.922Z|00017|dpdk|INFO|EAL: PCI device 0000:01:00.1 on NUMA socket 0
2017-10-09T16:45:16.922Z|00018|dpdk|INFO|EAL:   probe driver: 8086:10fb net_ixgbe
2017-10-09T16:45:16.922Z|00019|dpdk|ERR|EAL: Cannot open /dev/vfio/16: Permission denied
2017-10-09T16:45:16.922Z|00020|dpdk|ERR|EAL: Requested device 0000:01:00.1 cannot be used
2017-10-09T16:45:16.922Z|00021|dpdk|INFO|EAL: PCI device 0000:07:00.0 on NUMA socket 0
2017-10-09T16:45:16.922Z|00022|dpdk|INFO|EAL:   probe driver: 8086:1521 net_e1000_igb
2017-10-09T16:45:16.922Z|00023|dpdk|INFO|EAL: PCI device 0000:07:00.1 on NUMA socket 0
2017-10-09T16:45:16.922Z|00024|dpdk|INFO|EAL:   probe driver: 8086:1521 net_e1000_igb
2017-10-09T16:45:16.923Z|00025|dpdk|INFO|DPDK pdump packet capture enabled
2017-10-09T16:45:16.927Z|00026|dpdk|INFO|DPDK Enabled - initialized
2017-10-09T16:45:16.929Z|00027|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.8.0

Comment 14 Kevin Traynor 2017-10-09 17:25:55 UTC
Jean, I spoke to Aaron about this issue. He says there should be a reboot when upgrading from OVS 2.7 to OVS 2.8. Can you test with a reboot between them?

Comment 15 Jean-Tsung Hsiao 2017-10-10 02:39:26 UTC
Hi Kevin,

I have provisioned a new system, netqe19 (10.19.15.69), and installed OVS 2.8.0-1. Still, I encountered the same issue.

Please log in and debug there.

Thanks!

Jean

Comment 16 Kevin Traynor 2017-10-10 16:57:00 UTC
I tested today on my lab machine with reboots. What I am seeing is that when ovs-vswitchd is run through systemd as user openvswitch, it can't connect to the qemu- or libvirt/qemu-created vhost sockets.
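
The mismatch can be seen directly (sketch; the socket listing is from comment 4):

ps -o user= -C ovs-vswitchd    # under systemd this prints "openvswitch"
ls -l /tmp/vhost0              # srwxrwxr-x. 1 qemu qemu 0 Sep 23 13:24 /tmp/vhost0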

Comment 17 Aaron Conole 2017-10-10 20:07:31 UTC
Here's the deal:

1. QEMU, when started by libvirt, spawns the vhost chardev as a socket in server mode.  Whenever a socket chardev is spawned, if it is a listen socket (which a server is) it is defaulted to 'wait for connection'.  This means that the qemu process will *always* wait for a connection on a server socket for vhostuser unless a chardev option directs it not to.  Hence the 'stuck' observation.

2. When libvirt starts QEMU, it first checks in /etc/libvirt/qemu.conf for a user/group to spawn the vm process (qemu).  This defaults to qemu:qemu.

To get rid of this, edit /etc/libvirt/qemu.conf and change the line:

#group = "root"

to 

group = "hugetlbfs"

Please try this, and see if it gets you past the problem.  I don't think this is a bug in the code, but it will need to be documented somewhere for client mode support.
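
The waiting behavior from point 1 is visible in the qemu command line in comment 4: the vhost chardev is created as "-chardev socket,id=charnet0,path=/tmp/vhost0,server" with no "nowait" option, so qemu blocks until something connects. A sketch of applying and verifying the suggested change (commands are illustrative; the paths and domain name are from this report):

sed -i 's/^#group = "root"/group = "hugetlbfs"/' /etc/libvirt/qemu.conf
systemctl restart libvirtd
virsh start mq-vhu-tunnel-server
ls -l /tmp/vhost0    # the socket's group should now be hugetlbfs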

Comment 18 Jean-Tsung Hsiao 2017-10-10 22:26:55 UTC
(In reply to Aaron Conole from comment #17)
> Here's the deal:
> 
> 1. QEMU, when started by libvirt, spawns the vhost chardev as a socket in
> server mode.  Whenever a socket chardev is spawned, if it is a listen socket
> (which a server is) it is defaulted to 'wait for connection'.  This means
> that the qemu process will *always* wait for a connection on a server socket
> for vhostuser unless a chardev option directs it not to.  Hence the 'stuck'
> observation.
> 
> 2. When libvirt starts QEMU, it first checks in /etc/libvirt/qemu.conf for a
> user/group to spawn the vm process (qemu).  This defaults to qemu:qemu.
> 
> To get rid of this, edit /etc/libvirt/qemu.conf and change the line:
> 
> #group = "root"
> 
> to 
> 
> group = "hugetlbfs"
> 
> Please try this, and see if it gets you past the problem.  I don't think
> this is a bug in the code, but it will need to be documented somewhere
> for client mode support.

Hi Aaron,

I tried as you instructed on netqe19, but still encountered the same issue. I even changed the user to openvswitch --- same issue.

Feel free to log in to netqe19 and check.

Thanks!

Jean

Comment 21 Aaron Conole 2017-10-13 18:02:51 UTC
This is something that needs to be changed in libvirtd.  We have bug 1478791 to address this, but it has to be done from the TripleO / orchestration side.

Thanks for the confirmation, Jean.

