Bug 1489631

| | | | |
|---|---|---|---|
| Summary: | OVS-dpdk: A vhostuser server guest hang on start | | |
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jean-Tsung Hsiao <jhsiao> |
| Component: | openvswitch | Assignee: | Aaron Conole <aconole> |
| Status: | CLOSED NOTABUG | QA Contact: | Jean-Tsung Hsiao <jhsiao> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.4 | CC: | aconole, atragler, echaudro, fleitner, jhsiao, ktraynor, kzhang, rcain |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-10-13 18:02:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jean-Tsung Hsiao
2017-09-08 00:49:28 UTC
Kevin Traynor (comment #2):

Hi, this works for me using qemu. I've tested with qemu-kvm-rhev-2.6.0-25.el7.x86_64.rpm using both openvswitch-2.7.0-10.git20170530.el7fdb.x86_64.rpm and openvswitch-2.8.0-1.el7fdb.x86_64.rpm.

The package openvswitch-2.8.0-0.2.20170810git3631ed2.el7fdb.x86_64 was a pre-release and is not in brew. Can you retest with openvswitch-2.8.0-1.el7fdb.x86_64.rpm?

If it is still not working, please provide virsh dumpxml output, exact steps to reproduce, the last log message from qemu, and numactl -H.

thanks,
Kevin.

Jean-Tsung Hsiao:

(In reply to Kevin Traynor from comment #2)
> Hi, this works for me using qemu. I've tested with
> qemu-kvm-rhev-2.6.0-25.el7.x86_64.rpm using both
> openvswitch-2.7.0-10.git20170530.el7fdb.x86_64.rpm and
> openvswitch-2.8.0-1.el7fdb.x86_64.rpm.
>
> The package openvswitch-2.8.0-0.2.20170810git3631ed2.el7fdb.x86_64 was a
> pre-release and is not in brew.
> Can you retest with openvswitch-2.8.0-1.el7fdb.x86_64.rpm?
>
> If it is still not working, please provide virsh dumpxml output, exact steps to
> reproduce, the last log message from qemu, and numactl -H.
>
> thanks,
> Kevin.

Yes, I will. Thanks! Jean

Jean-Tsung Hsiao (comment #4):

Hi Kevin, I tried with openvswitch-2.8.0-1.el7fdb.x86_64.rpm, but the same issue still remains. Please see the details below. What else needs to be tried? Thanks! Jean

*** kernel and OVS ***
[root@netqe5 vxlan-tunnel]# uname -a
Linux netqe5.knqe.lab.eng.bos.redhat.com 4.11.0-16.el7a.x86_64 #1 SMP Mon Jul 24 22:25:46 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@netqe5 vxlan-tunnel]# rpm -q openvswitch
openvswitch-2.8.0-1.el7fdb.x86_64

*** virsh start mq-vhu-tunnel-server hangs here ***
virsh # start mq-vhu-tunnel-server

*** It's now in paused state ***
[root@netqe5 vxlan-tunnel]# virsh dominfo mq-vhu-tunnel-server
Id: 5
Name: mq-vhu-tunnel-server
UUID: 599be106-543c-4c3a-b65f-4e11f5bd8d79
OS Type: hvm
State: paused
CPU(s): 5
CPU time: 0.1s
Max memory: 8388608 KiB
Used memory: 8388608 KiB
Persistent: yes
Autostart: disable
Managed save: no
Security model: selinux
Security DOI: 0
Security label: system_u:system_r:svirt_t:s0:c416,c1017 (permissive)

*** socket created ***
[root@netqe5 vxlan-tunnel]# ls -l /tmp/vhost0
srwxrwxr-x. 1 qemu qemu 0 Sep 23 13:24 /tmp/vhost0

*** qemu task ***
[root@netqe5 vxlan-tunnel]# ps -elf | grep qemu-kvm
6 S qemu 2412 1 0 80 0 - 77917 skb_wa 13:24 ?
00:00:00 /usr/libexec/qemu-kv -name guest=mq-vhu-tunnel-server,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-5-mq-vhu-tunnel-server/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-noTSX -m 8192 -realtime mlock=off -smp 5,sockets=5,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/5-mq-vhu-tunnel-server,share=yes,size=8589934592 -numa node,nodeid=0,cpus=0,memdev=ram-node0 -uuid 599be106-543c-4c3a-b65f-4e11f5bd8d79 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-5-mq-vhu-tunnel-server/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/home/images/mq_vhu_tunnel_server.img,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charnet0,path=/tmp/vhost0,server -netdev vhost-user,chardev=charnet0,queues=2,id=hostnet0 -device virtio-net-pci,mq=on,vectors=6,netdev=hostnet0,id=net0,mac=52:54:00:32:0f:71,bus=pci.0,addr=0x3 -netdev tap,fd=25,id=hostnet1,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:77:96:95,bus=pci.0,addr=0xb -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-5-mq-vhu-tunnel-server/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

Kevin Traynor (comment #5):

Hanging after the socket /tmp/vhost0 is created can indicate that it is waiting for ovs-dpdk to connect to the socket. Is ovs-dpdk running and trying to connect to the right socket location?

Can you run an sosreport when the hang happens?
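(For context: with the guest-side chardev in server mode, as the `-chardev socket,...,path=/tmp/vhost0,server` option above shows, the OVS side is normally added as a vhost-user *client* port that connects out to that socket. A minimal sketch of that configuration is below; the bridge name br0 and port name vhu0 are illustrative and not taken from this bug, only the /tmp/vhost0 path is.)

```
# Sketch only: attach an OVS-DPDK port in vhost-user client mode so that
# ovs-vswitchd connects to the server socket QEMU created at /tmp/vhost0.
# Bridge/port names are examples, not taken from this report.
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 vhu0 -- set Interface vhu0 \
    type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhost0
```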
Jean-Tsung Hsiao (comment #6):

(In reply to Kevin Traynor from comment #5)
> Hanging after the socket /tmp/vhost0 is created can indicate that it is
> waiting for ovs-dpdk to connect to the socket. Is ovs-dpdk running and
> trying to connect to the right socket location?

The guest is the server now. "virsh start" should just return whether ovs-dpdk is running or not. But the domain state of the guest is paused --- see comment #4.

NOTE: The same configuration and the same process work for 2.7. This is a 2.8-specific issue.

> Can you run an sosreport when the hang happens?

I will when I have a chance to reproduce the issue.

Jean-Tsung Hsiao:

(In reply to Jean-Tsung Hsiao from comment #6)
> I will when I have a chance to reproduce the issue.

http://file.bos.redhat.com/~jhsiao/sosreport-JHsiao.1489631-20171004083450.tar.xz

Hi Kevin,

Re-produced the issue again with ovs-2.8.0-1. Please get the sosreport from the link above. Here are the steps:

* Start OVS-dpdk first, then
* start the guest.

Following the same steps there is no such issue for 2.7.0-10 and 2.7.2-10. More questions? Thanks! Jean

Jean-Tsung Hsiao:

Hi Kevin, here is an update. If the guest is started first, it will be in the paused state. This is true for 2.7 and 2.8. Once OVS-dpdk gets started, the state becomes active. This is true for 2.7, but not for 2.8. Thanks! Jean
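(The ordering Jean describes can be written out roughly as follows. This is a sketch only; it assumes the stock openvswitch systemd unit and reuses the domain name and the dpdk-init setting that appear elsewhere in this report.)

```
# 1. Enable DPDK in OVS and (re)start the service so ovs-vswitchd initializes EAL.
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
systemctl restart openvswitch

# 2. Then start the vhost-user server guest and check its state.
virsh start mq-vhu-tunnel-server
virsh domstate mq-vhu-tunnel-server   # "running" is expected; "paused" indicates the hang
```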
Kevin Traynor (comment #9):

> http://file.bos.redhat.com/~jhsiao/sosreport-JHsiao.1489631-20171004083450.tar.xz

Hi Jean, can you put this on netqe5? I don't seem to have permission to pick it up in the link above.
Jean-Tsung Hsiao:

(In reply to Kevin Traynor from comment #9)
> Hi Jean, can you put this on netqe5? I don't seem to have permission to
> pick it up in the link above.

[root@netqe5 ~]# ll
total 11608
-rw-------. 1 root root 15689 Jul 6 12:16 anaconda-ks.cfg
drwxr-xr-x. 7 root root 226 Jul 26 09:01 kev
-rw-------. 1 root root 14099 Jul 6 12:16 original-ks.cfg
-rw-------. 1 root root 11849948 Oct 4 08:35 sosreport-JHsiao.1489631-20171004083450.tar.xz
[root@netqe5 ~]#

NOTE: I also made the following file readable.
http://file.bos.redhat.com/~jhsiao/sosreport-JHsiao.1489631-20171004083450.tar.xz

I tested on Jean's board and my own. I think the issue is related to permissions when accessing memory while running with systemd.

After running with systemd, comparing OVS 2.7 and OVS 2.8 when DPDK is initialized:

[root@netqe5 qemu_hang]# ovs-vsctl -t 5 get Open_vSwitch . other_config
{dpdk-extra="--log-level 8", dpdk-lcore-mask="0x4", dpdk-socket-mem="1024"}
[root@netqe5 qemu_hang]# cat /proc/meminfo | grep Huge
AnonHugePages: 4096 kB
ShmemHugePages: 0 kB
HugePages_Total: 32
HugePages_Free: 31
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
[root@netqe5 qemu_hang]# grep huge /proc/mounts
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,seclabel,relatime 0 0
[root@netqe5 qemu_hang]# cat /sys/devices/system/node/node*/meminfo | fgrep Huge
Node 0 AnonHugePages: 4096 kB
Node 0 ShmemHugePages: 0 kB
Node 0 HugePages_Total: 16
Node 0 HugePages_Free: 15
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 0 kB
Node 1 ShmemHugePages: 0 kB
Node 1 HugePages_Total: 16
Node 1 HugePages_Free: 16
Node 1 HugePages_Surp: 0
[root@netqe5 qemu_hang]# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true

###### openvswitch-2.8.0-1.el7fdb.x86_64:
2017-10-09T14:58:21.226Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2017-10-09T14:58:21.230Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0
2017-10-09T14:58:21.230Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1
2017-10-09T14:58:21.230Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores
2017-10-09T14:58:21.230Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2017-10-09T14:58:21.230Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2017-10-09T14:58:21.231Z|00007|dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable
2017-10-09T14:58:21.232Z|00008|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.8.0
2017-10-09T14:59:00.127Z|00009|memory|INFO|18020 kB peak resident set size after 38.9 seconds
2017-10-09T15:01:22.683Z|00010|dpdk|INFO|DPDK Enabled - initializing...
2017-10-09T15:01:22.683Z|00011|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2017-10-09T15:01:22.683Z|00012|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8
2017-10-09T15:01:22.684Z|00013|dpdk|INFO|EAL: Detected 24 lcore(s)
2017-10-09T15:01:22.685Z|00014|dpdk|INFO|EAL: Probing VFIO support...
2017-10-09T15:01:22.685Z|00015|dpdk|INFO|EAL: VFIO support initialized
2017-10-09T15:01:22.687Z|00016|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2017-10-09T15:01:22.687Z|00017|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
2017-10-09T15:01:22.687Z|00018|dpdk|ERR|EAL: Not enough memory available on socket 0!
Requested: 1024MB, available: 0MB 2017-10-09T15:01:22.687Z|00019|dpdk|ERR|EAL: Cannot init memory 2017-10-09T15:01:23.059Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log 2017-10-09T15:01:23.063Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0 2017-10-09T15:01:23.063Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1 2017-10-09T15:01:23.063Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores 2017-10-09T15:01:23.063Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting... 2017-10-09T15:01:23.063Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected 2017-10-09T15:01:23.064Z|00007|dpdk|INFO|DPDK Enabled - initializing... 2017-10-09T15:01:23.064Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch 2017-10-09T15:01:23.064Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8 2017-10-09T15:01:23.065Z|00010|dpdk|INFO|EAL: Detected 24 lcore(s) 2017-10-09T15:01:23.065Z|00011|dpdk|INFO|EAL: Probing VFIO support... 2017-10-09T15:01:23.065Z|00012|dpdk|INFO|EAL: VFIO support initialized 2017-10-09T15:01:23.068Z|00013|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied 2017-10-09T15:01:23.068Z|00014|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function. 2017-10-09T15:01:23.068Z|00015|dpdk|ERR|EAL: Not enough memory available on socket 0! Requested: 1024MB, available: 0MB 2017-10-09T15:01:23.068Z|00016|dpdk|ERR|EAL: Cannot init memory 2017-10-09T15:01:23.073Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted)) 2017-10-09T15:01:23.073Z|00003|daemon_unix|EMER|could not detach from foreground session 2017-10-09T15:01:23.379Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log 2017-10-09T15:01:23.382Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0 2017-10-09T15:01:23.382Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1 2017-10-09T15:01:23.382Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores 2017-10-09T15:01:23.382Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting... 2017-10-09T15:01:23.382Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected 2017-10-09T15:01:23.384Z|00007|dpdk|INFO|DPDK Enabled - initializing... 2017-10-09T15:01:23.384Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch 2017-10-09T15:01:23.384Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8 2017-10-09T15:01:23.385Z|00010|dpdk|INFO|EAL: Detected 24 lcore(s) 2017-10-09T15:01:23.385Z|00011|dpdk|INFO|EAL: Probing VFIO support... 2017-10-09T15:01:23.385Z|00012|dpdk|INFO|EAL: VFIO support initialized 2017-10-09T15:01:23.388Z|00013|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied 2017-10-09T15:01:23.388Z|00014|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function. 2017-10-09T15:01:23.388Z|00015|dpdk|ERR|EAL: Not enough memory available on socket 0! 
Requested: 1024MB, available: 0MB 2017-10-09T15:01:23.388Z|00016|dpdk|ERR|EAL: Cannot init memory 2017-10-09T15:01:23.392Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted)) 2017-10-09T15:01:23.392Z|00003|daemon_unix|EMER|could not detach from foreground session 2017-10-09T15:01:23.629Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log 2017-10-09T15:01:23.632Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0 2017-10-09T15:01:23.632Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1 2017-10-09T15:01:23.632Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores 2017-10-09T15:01:23.632Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting... 2017-10-09T15:01:23.632Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected 2017-10-09T15:01:23.634Z|00007|dpdk|INFO|DPDK Enabled - initializing... 2017-10-09T15:01:23.634Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch 2017-10-09T15:01:23.634Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8 2017-10-09T15:01:23.635Z|00010|dpdk|INFO|EAL: Detected 24 lcore(s) 2017-10-09T15:01:23.635Z|00011|dpdk|INFO|EAL: Probing VFIO support... 2017-10-09T15:01:23.635Z|00012|dpdk|INFO|EAL: VFIO support initialized 2017-10-09T15:01:23.638Z|00013|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied 2017-10-09T15:01:23.638Z|00014|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function. 2017-10-09T15:01:23.638Z|00015|dpdk|ERR|EAL: Not enough memory available on socket 0! Requested: 1024MB, available: 0MB 2017-10-09T15:01:23.638Z|00016|dpdk|ERR|EAL: Cannot init memory 2017-10-09T15:01:23.642Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted)) 2017-10-09T15:01:23.643Z|00003|daemon_unix|EMER|could not detach from foreground session 2017-10-09T15:01:23.879Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log 2017-10-09T15:01:23.882Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0 2017-10-09T15:01:23.882Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1 2017-10-09T15:01:23.882Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores 2017-10-09T15:01:23.882Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting... 2017-10-09T15:01:23.883Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected 2017-10-09T15:01:23.884Z|00007|dpdk|INFO|DPDK Enabled - initializing... 2017-10-09T15:01:23.884Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch 2017-10-09T15:01:23.884Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8 2017-10-09T15:01:23.885Z|00010|dpdk|INFO|EAL: Detected 24 lcore(s) 2017-10-09T15:01:23.885Z|00011|dpdk|INFO|EAL: Probing VFIO support... 2017-10-09T15:01:23.885Z|00012|dpdk|INFO|EAL: VFIO support initialized 2017-10-09T15:01:23.888Z|00013|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied 2017-10-09T15:01:23.888Z|00014|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function. 2017-10-09T15:01:23.888Z|00015|dpdk|ERR|EAL: Not enough memory available on socket 0! 
Requested: 1024MB, available: 0MB 2017-10-09T15:01:23.888Z|00016|dpdk|ERR|EAL: Cannot init memory 2017-10-09T15:01:23.893Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted)) 2017-10-09T15:01:23.893Z|00003|daemon_unix|EMER|could not detach from foreground session 2017-10-09T15:01:24.129Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log 2017-10-09T15:01:24.132Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0 2017-10-09T15:01:24.132Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1 2017-10-09T15:01:24.132Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores 2017-10-09T15:01:24.132Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting... 2017-10-09T15:01:24.132Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected 2017-10-09T15:01:24.134Z|00007|dpdk|INFO|DPDK Enabled - initializing... 2017-10-09T15:01:24.134Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch 2017-10-09T15:01:24.134Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8 2017-10-09T15:01:24.135Z|00010|dpdk|INFO|EAL: Detected 24 lcore(s) 2017-10-09T15:01:24.135Z|00011|dpdk|INFO|EAL: Probing VFIO support... 2017-10-09T15:01:24.135Z|00012|dpdk|INFO|EAL: VFIO support initialized 2017-10-09T15:01:24.138Z|00013|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied 2017-10-09T15:01:24.138Z|00014|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function. 2017-10-09T15:01:24.138Z|00015|dpdk|ERR|EAL: Not enough memory available on socket 0! Requested: 1024MB, available: 0MB 2017-10-09T15:01:24.138Z|00016|dpdk|ERR|EAL: Cannot init memory 2017-10-09T15:01:24.142Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted)) 2017-10-09T15:01:24.142Z|00003|daemon_unix|EMER|could not detach from foreground session ######openvswitch-2.7.2-10.git20170914.el7fdp.x86_64: 2017-10-09T15:14:58.928Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log 2017-10-09T15:14:58.931Z|00002|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 0 2017-10-09T15:14:58.931Z|00003|ovs_numa|INFO|Discovered 12 CPU cores on NUMA node 1 2017-10-09T15:14:58.931Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 24 CPU cores 2017-10-09T15:14:58.931Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting... 2017-10-09T15:14:58.931Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected 2017-10-09T15:14:58.932Z|00007|dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable 2017-10-09T15:14:58.933Z|00008|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.7.2 2017-10-09T15:15:12.043Z|00009|memory|INFO|15284 kB peak resident set size after 13.1 seconds 2017-10-09T15:15:22.211Z|00010|dpdk|INFO|DPDK Enabled - initializing... 2017-10-09T15:15:22.211Z|00011|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch 2017-10-09T15:15:22.211Z|00012|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024 --log-level 8 2017-10-09T15:15:22.212Z|00013|dpdk|INFO|EAL: Detected 24 lcore(s) 2017-10-09T15:15:22.235Z|00014|dpdk|INFO|EAL: Probing VFIO support... 
2017-10-09T15:15:22.235Z|00015|dpdk|INFO|EAL: VFIO support initialized 2017-10-09T15:15:30.851Z|00016|dpdk|INFO|EAL: PCI device 0000:81:00.0 on NUMA socket 1 2017-10-09T15:15:30.851Z|00017|dpdk|INFO|EAL: probe driver: 8086:154d net_ixgbe 2017-10-09T15:15:30.851Z|00018|dpdk|INFO|EAL: using IOMMU type 1 (Type 1) 2017-10-09T15:15:31.021Z|00019|dpdk|INFO|EAL: Ignore mapping IO port bar(2) addr: 8021 2017-10-09T15:15:31.277Z|00020|dpdk|INFO|EAL: PCI device 0000:81:00.1 on NUMA socket 1 2017-10-09T15:15:31.277Z|00021|dpdk|INFO|EAL: probe driver: 8086:154d net_ixgbe 2017-10-09T15:15:31.441Z|00022|dpdk|INFO|EAL: Ignore mapping IO port bar(2) addr: 8001 2017-10-09T15:15:31.697Z|00023|dpdk|INFO|EAL: PCI device 0000:83:00.0 on NUMA socket 1 2017-10-09T15:15:31.697Z|00024|dpdk|INFO|EAL: probe driver: 8086:1528 net_ixgbe 2017-10-09T15:15:31.697Z|00025|dpdk|INFO|EAL: PCI device 0000:83:00.1 on NUMA socket 1 2017-10-09T15:15:31.697Z|00026|dpdk|INFO|EAL: probe driver: 8086:1528 net_ixgbe 2017-10-09T15:15:31.697Z|00027|dpdk|INFO|EAL: PCI device 0000:84:00.0 on NUMA socket 1 2017-10-09T15:15:31.697Z|00028|dpdk|INFO|EAL: probe driver: 8086:1583 net_i40e 2017-10-09T15:15:31.994Z|00029|dpdk|INFO|PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.04 eetrack 80002505 2017-10-09T15:15:32.097Z|00030|dpdk|INFO|EAL: PCI device 0000:84:00.1 on NUMA socket 1 2017-10-09T15:15:32.097Z|00031|dpdk|INFO|EAL: probe driver: 8086:1583 net_i40e 2017-10-09T15:15:32.102Z|00032|dpdk|INFO|DPDK Enabled - initialized I've also seen permission issues with running the openvswitch service -- Unit ovs-vswitchd.service has begun starting up. Oct 09 06:28:25 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00013|dpdk|ERR|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied Oct 09 06:28:25 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00014|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function. Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00017|dpdk|ERR|EAL: Cannot open /dev/vfio/17: Permission denied Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00018|dpdk|ERR|EAL: Requested device 0000:04:00.0 cannot be used Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00021|dpdk|ERR|EAL: Cannot open /dev/vfio/17: Permission denied Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00022|dpdk|ERR|EAL: Requested device 0000:04:00.1 cannot be used Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00025|dpdk|ERR|EAL: Cannot open /dev/vfio/40: Permission denied Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00026|dpdk|ERR|EAL: Requested device 0000:81:00.0 cannot be used Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00029|dpdk|ERR|EAL: Cannot open /dev/vfio/41: Permission denied Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00030|dpdk|ERR|EAL: Requested device 0000:81:00.1 cannot be used Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00037|dpdk|ERR|EAL: Cannot open /dev/vfio/45: Permission denied Oct 09 06:28:33 netqe5.knqe.lab.eng.bos.redhat.com ovs-vswitchd[4996]: ovs|00038|dpdk|ERR|EAL: Requested device 0000:84:00.0 cannot be used and similar error, when having dpdk-init=true before starting ovs-vswitchd: 2017-10-09T16:45:08.473Z|00007|dpdk|INFO|DPDK Enabled - initializing... 
2017-10-09T16:45:08.473Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2017-10-09T16:45:08.473Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x4 --socket-mem 1024,1024 --log-level 8
2017-10-09T16:45:08.474Z|00010|dpdk|INFO|EAL: Detected 28 lcore(s)
2017-10-09T16:45:08.474Z|00011|dpdk|INFO|EAL: Probing VFIO support...
2017-10-09T16:45:08.474Z|00012|dpdk|INFO|EAL: VFIO support initialized
2017-10-09T16:45:16.922Z|00013|dpdk|INFO|EAL: PCI device 0000:01:00.0 on NUMA socket 0
2017-10-09T16:45:16.922Z|00014|dpdk|INFO|EAL: probe driver: 8086:10fb net_ixgbe
2017-10-09T16:45:16.922Z|00015|dpdk|ERR|EAL: Cannot open /dev/vfio/15: Permission denied
2017-10-09T16:45:16.922Z|00016|dpdk|ERR|EAL: Requested device 0000:01:00.0 cannot be used
2017-10-09T16:45:16.922Z|00017|dpdk|INFO|EAL: PCI device 0000:01:00.1 on NUMA socket 0
2017-10-09T16:45:16.922Z|00018|dpdk|INFO|EAL: probe driver: 8086:10fb net_ixgbe
2017-10-09T16:45:16.922Z|00019|dpdk|ERR|EAL: Cannot open /dev/vfio/16: Permission denied
2017-10-09T16:45:16.922Z|00020|dpdk|ERR|EAL: Requested device 0000:01:00.1 cannot be used
2017-10-09T16:45:16.922Z|00021|dpdk|INFO|EAL: PCI device 0000:07:00.0 on NUMA socket 0
2017-10-09T16:45:16.922Z|00022|dpdk|INFO|EAL: probe driver: 8086:1521 net_e1000_igb
2017-10-09T16:45:16.922Z|00023|dpdk|INFO|EAL: PCI device 0000:07:00.1 on NUMA socket 0
2017-10-09T16:45:16.922Z|00024|dpdk|INFO|EAL: probe driver: 8086:1521 net_e1000_igb
2017-10-09T16:45:16.923Z|00025|dpdk|INFO|DPDK pdump packet capture enabled
2017-10-09T16:45:16.927Z|00026|dpdk|INFO|DPDK Enabled - initialized
2017-10-09T16:45:16.929Z|00027|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.8.0

Jean, I spoke to Aaron about this issue. He says there should be a reboot when upgrading from OVS 2.7 to OVS 2.8. Can you test with a reboot between them?

Jean-Tsung Hsiao:

Hi Kevin, I have provisioned a new system, netqe19 (10.19.15.69), and installed OVS 2.8.0-1. I still encountered the same issue. Please log in and debug there. Thanks! Jean

Aaron Conole (comment #17):

I tested today on my lab machine with reboots. What I am seeing is that when ovs-vswitchd is run through systemd as user openvswitch, ovs-vswitchd can't connect to the qemu or libvirt/qemu created vhost sockets.

Here's the deal:

1. QEMU, when started by libvirt, spawns the vhost chardev as a socket in server mode. Whenever a socket chardev is spawned, if it is a listen socket (which a server is), it defaults to 'wait for connection'. This means that the qemu process will *always* wait for a connection on a server socket for vhostuser unless a chardev option directs it not to. Hence the 'stuck' observation.

2. When libvirt starts QEMU, it first checks /etc/libvirt/qemu.conf for a user/group to spawn the vm process (qemu) as. This defaults to qemu:qemu.

To get rid of this, edit /etc/libvirt/qemu.conf and change the line:

#group = "root"

to

group = "hugetlbfs"

Please try this, and see if it gets you past the problem. I don't think this is a bug in the code, but it will need to be documented somewhere for client mode support.
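(A rough sketch of the change Aaron describes is below. It assumes a "hugetlbfs" group already exists on the host and owns the hugepage mount; both are assumptions, not something confirmed in this report.)

```
# Sketch of the /etc/libvirt/qemu.conf change suggested above.
# Assumes a "hugetlbfs" group already exists and owns the hugepage mount.
sed -i 's/^#group = "root"/group = "hugetlbfs"/' /etc/libvirt/qemu.conf

# Restart libvirtd so newly started guests pick up the new group,
# then start the vhost-user server guest again.
systemctl restart libvirtd
virsh start mq-vhu-tunnel-server

# Optional sanity checks: group ownership of the hugepage mount and the socket.
ls -ld /dev/hugepages
ls -l /tmp/vhost0
```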
Jean-Tsung Hsiao:

(In reply to Aaron Conole from comment #17)
> To get rid of this, edit /etc/libvirt/qemu.conf and change the line:
>
> #group = "root"
>
> to
>
> group = "hugetlbfs"
>
> Please try this, and see if it gets you past the problem.

Hi Aaron, I tried as you instructed on netqe19, but still encountered the same issue. I even changed the user to openvswitch --- same issue. Feel free to log into netqe19 and check. Thanks! Jean

This is something that needs to be changed in libvirtd. We have bug 1478791 to address this, but it has to be done from a triple-o / orchestration side.

Thanks for the confirmation, Jean.
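(Closing diagnostic note, tied to Aaron's point 1 above: since the guest-side server socket simply waits for a peer, a quick way to see whether ovs-vswitchd ever connected is to inspect the unix socket and the vswitchd log. This is a sketch; the socket and log paths are the ones used earlier in this report.)

```
# Sketch: check whether anything has connected to the guest's vhost-user
# server socket, and what the OVS side logged about vhost-user.
ss -xp | grep /tmp/vhost0                                  # peers holding the unix socket
grep -i vhost /var/log/openvswitch/ovs-vswitchd.log | tail -n 20
```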