When the DPDK (version 2.2) application (6WIND) in the guest (RHEL 7.2) starts, it causes the host's ovs-vswitchd process to segfault. From sos_commands/logs/journalctl_--no-pager_--catalog_--boot:
~~~
Jul 20 09:20:40 compute-6 kernel: ovs-vswitchd[629694]: segfault at 0 ip 00007fae81105016 sp 00007ffd87853608 error 4 in libc-2.17.so[7fae80fc7000+1c2000]
Jul 20 09:20:41 compute-6 systemd[1]: ovs-vswitchd.service: main process exited, code=killed, status=11/SEGV
~~~
With 4 PMDs in the guest it breaks; with 8 it works.
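The kernel segfault line can be decoded mechanically. A minimal Python sketch follows, assuming the standard x86 page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write, bit 2: user mode); `decode_segfault` is a hypothetical helper, not part of any existing tool. Read this way, "segfault at 0 ... error 4" is a user-mode read of an unmapped page at address 0 (a NULL-pointer read) inside libc, with the instruction pointer at offset 0x13e016 into the reported libc-2.17.so mapping.

```python
import re

# Kernel segfault report: "comm[pid]: segfault at ADDR ip IP sp SP error ERR in OBJ[BASE+SIZE]"
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: decode one kernel segfault line from the journal."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("no kernel segfault report found in line")
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    return {
        "process": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        # x86 page-fault error code bits
        "present": bool(err & 0x1),  # 0: page not present, 1: protection violation
        "write": bool(err & 0x2),    # 0: read access, 1: write access
        "user": bool(err & 0x4),     # 1: fault happened in user mode
        "object": m["obj"],
        # Offset of the faulting instruction within the reported mapping
        "ip_offset": int(m["ip"], 16) - int(m["base"], 16),
    }
```

The offset is a possible starting point for symbolizing the crash with gdb or addr2line against the matching libc debuginfo.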
Reminder:
* The issue is with 2.9.0-103.
* The same test works with openvswitch 2.9.0-56 and openvswitch 2.6.

Nothing crashes; the port simply goes down once the DPDK application in the VM starts. When I unbind the igb_uio driver, the port comes back up.

The issue occurs when the guest has fewer than 8 PMD CPUs while the host has 8 queues: e.g. with 4 guest PMDs and 8 host queues, the port goes down. If the host has 4 queues, 4 guest PMDs work fine.
2.9.0-97 is the earliest version where the customer noticed it.
We downgraded this system to -56 and could still reproduce the issue, so it does not seem to be tied to a specific OVS version. The customer has another cluster running minor version -56 on which they cannot reproduce the issue.
This is from my lab; qemu-kvm-rhev runs from the nova_libvirt container:
~~~
[root@computeovsdpdk-0 qemu-test-rpm]# cat /proc/$(pgrep -f 4d498516-2e6a-473e-8595-310319bc5d54)/mountinfo
463 424 0:44 / / rw,relatime - overlay overlay rw,seclabel,lowerdir=/var/lib/docker/overlay2/l/6IWB576M62AOC35CXPKRMC4LAC:/var/lib/docker/overlay2/l/4VYJWYYFGPYEBOALNVSHLJ6VHA:/var/lib/docker/overlay2/l/BPAHILLCHSX2LLAZ6CGWHDKGAH:/var/lib/docker/overlay2/l/UFF6YDIKC7OBCFDYH2QKXVTAWI:/var/lib/docker/overlay2/l/BSNQ34QHDDH4Q4JRCHGNNXSRT7,upperdir=/var/lib/docker/overlay2/61883b7bd6fc3bea9afbc640de3e8b985a9cfd0c47aadafebf2eeb818bec7c7d/diff,workdir=/var/lib/docker/overlay2/61883b7bd6fc3bea9afbc640de3e8b985a9cfd0c47aadafebf2eeb818bec7c7d/work
464 463 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
465 463 0:17 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw,seclabel
466 465 0:15 / /sys/fs/selinux rw,relatime - selinuxfs selinuxfs rw
467 465 0:20 / /sys/fs/cgroup ro,nosuid,nodev,noexec - tmpfs tmpfs ro,seclabel,mode=755
468 467 0:21 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
469 467 0:23 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,cpuset
470 467 0:24 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,perf_event
471 467 0:25 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,net_prio,net_cls
472 467 0:26 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,devices
473 467 0:27 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,cpuacct,cpu
474 467 0:28 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory
475 467 0:29 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,pids
476 467 0:30 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,freezer
477 467 0:31 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,hugetlb
478 467 0:32 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,blkio
479 463 0:5 / /dev rw,nosuid - devtmpfs devtmpfs rw,seclabel,size=49061156k,nr_inodes=12265289,mode=755
480 479 0:18 / /dev/shm rw,nosuid,nodev - tmpfs tmpfs rw,seclabel
481 642 0:45 / /dev/shm rw,nosuid,nodev,noexec,relatime - tmpfs shm rw,seclabel,size=65536k
482 642 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
483 642 0:37 / /dev/hugepages rw,relatime - hugetlbfs hugetlbfs rw,seclabel
484 642 0:14 / /dev/mqueue rw,relatime - mqueue mqueue rw,seclabel
485 642 0:5 /log /dev/log rw,nosuid - devtmpfs devtmpfs rw,seclabel,size=49061156k,nr_inodes=12265289,mode=755
486 463 0:19 / /run rw,nosuid,nodev - tmpfs tmpfs rw,seclabel,mode=755
526 486 0:39 / /run/user/975 rw,nosuid,nodev,relatime - tmpfs tmpfs rw,seclabel,size=13180676k,mode=700,uid=975,gid=971
527 486 0:38 / /run/user/0 rw,nosuid,nodev,relatime - tmpfs tmpfs rw,seclabel,size=13180676k,mode=700
528 486 0:3 / /run/docker/netns/default rw,nosuid,nodev,noexec,relatime - proc proc rw
529 486 8:2 /var/lib/docker/containers/0caf80934ad90f075ce9ccac7a99d755bf8244f07a28145aa5e5843973931932/secrets//deleted /run/secrets rw,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
530 486 0:19 /libvirt /run/libvirt rw,nosuid,nodev - tmpfs tmpfs rw,seclabel,mode=755
531 463 8:2 /usr/lib/modules /usr/lib/modules ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
532 463 8:2 /etc/puppet /etc/puppet ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
533 463 8:2 /etc/libvirt /etc/libvirt rw,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
534 463 8:2 /usr/share/zoneinfo/UTC /usr/share/zoneinfo/UTC ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
535 463 8:2 /etc/hosts /etc/hosts ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
536 463 8:2 /var/lib/docker/containers/0caf80934ad90f075ce9ccac7a99d755bf8244f07a28145aa5e5843973931932/hostname /etc/hostname rw,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
537 463 8:2 /var/lib/docker/containers/0caf80934ad90f075ce9ccac7a99d755bf8244f07a28145aa5e5843973931932/resolv.conf /etc/resolv.conf rw,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
589 463 8:2 /var/log/containers/libvirt /var/log/libvirt rw,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
590 589 8:2 /var/log/libvirt/qemu /var/log/libvirt/qemu ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
591 463 8:2 /var/lib/nova /var/lib/nova rw,relatime master:1 - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
592 463 8:2 /etc/ssh/ssh_known_hosts /etc/ssh/ssh_known_hosts ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
593 463 8:2 /var/lib/libvirt /var/lib/libvirt rw,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
594 463 8:2 /var/lib/vhost_sockets /var/lib/vhost_sockets rw,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
595 463 8:2 /etc/pki/ca-trust/extracted /etc/pki/ca-trust/extracted ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
596 595 8:2 /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
597 596 8:2 /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
598 595 8:2 /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
599 463 8:2 /var/lib/config-data/puppet-generated/nova_libvirt /var/lib/kolla/config_files/src ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
600 463 8:2 /etc/pki/ca-trust/source/anchors /etc/pki/ca-trust/source/anchors ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
601 463 8:2 /var/lib/kolla/config_files/nova_libvirt.json /var/lib/kolla/config_files/config.json ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
641 463 8:2 /etc/ceph /var/lib/kolla/config_files/src-ceph ro,relatime - xfs /dev/sda2 rw,seclabel,attr2,inode64,noquota
642 479 0:85 / /dev rw,nosuid,relatime - tmpfs devfs rw,seclabel,size=64k,mode=755
[root@computeovsdpdk-0 qemu-test-rpm]#
[root@computeovsdpdk-0 qemu-test-rpm]# /var/lib/docker/overlay2/l/4VYJWYYFGPYEBOALNVSHLJ6VHA/usr/libexec/qemu-kvm --version
QEMU emulator version 2.12.0 (qemu-kvm-rhev-2.12.0-18.el7_6.1)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
[root@computeovsdpdk-0 qemu-test-rpm]#
~~~
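The manual step in the lab session (picking a lowerdir layer out of the container's root overlay mount and running qemu-kvm from it) can be scripted. A minimal Python sketch, assuming the `/proc/<pid>/mountinfo` format described in proc(5): per-mount fields up to the ` - ` separator, then filesystem type, source, and superblock options. `overlay_lowerdirs` and `find_in_layers` are hypothetical helper names. The lookup only checks lowerdir layers and ignores upperdir and whiteouts, so it is a convenience, not a faithful overlayfs resolution.

```python
import os


def overlay_lowerdirs(mountinfo_text, mount_point="/"):
    """Return the lowerdir layers (top-most first) of the overlay mounted
    at mount_point, parsed from /proc/<pid>/mountinfo text."""
    for line in mountinfo_text.splitlines():
        # Optional fields end at the first " - "; after it: fstype, source, super options.
        pre, sep, post = line.partition(" - ")
        if not sep:
            continue
        fields = pre.split()
        # fields: mount ID, parent ID, major:minor, root, mount point, mount options, ...
        if len(fields) < 6 or fields[4] != mount_point:
            continue
        parts = post.split(" ", 2)
        if len(parts) < 3 or parts[0] != "overlay":
            continue
        for opt in parts[2].split(","):
            key, _, value = opt.partition("=")
            if key == "lowerdir":
                return value.split(":")
    return []


def find_in_layers(layers, relpath="usr/libexec/qemu-kvm"):
    """Return the first lowerdir layer path containing relpath, or None.
    Ignores upperdir and whiteouts, unlike a real overlay lookup."""
    for layer in layers:
        candidate = os.path.join(layer, relpath)
        if os.path.exists(candidate):
            return candidate
    return None
```

Run as root, reading `/proc/<pid>/mountinfo` for the qemu process supplies the text. A simpler route to the same binary is `/proc/<pid>/root/usr/libexec/qemu-kvm`, which resolves the path through the container's root directly.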
Hi Marc and Andreas,

My understanding is that 6WIND has backported the DPDK patches mentioned in Comment 51 into their VNF, and the issue is no longer reproduced.

Regarding the QEMU/OVS-DPDK solution mentioned in Comment 40:

" Regarding the solutions, the ideal one would be that we have a new vhost-user protocol feature to notify the backend that the driver set DRIVER_OK. But it would take time to get the spec accepted upstream and also get the implementation done, merged upstream and implemented. "

This is addressed in Bz1548112 and will be available in OVS-DPDK v2.14. I propose closing this bug as a duplicate of Bz1548112 for the QEMU/OVS-DPDK part.

*** This bug has been marked as a duplicate of bug 1548112 ***
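For context on the quoted proposal: DRIVER_OK is a bit in the virtio device status byte that the guest driver sets once it has finished configuring the device. A minimal Python sketch of the kind of check a backend could make if it were told that status byte; the bit values come from the virtio 1.x specification ("Device Status Field"), while `backend_may_start` is a hypothetical helper, and the sketch does not model how the byte would be carried over vhost-user.

```python
# Virtio 1.x device status bits (virtio spec, "Device Status Field").
ACKNOWLEDGE = 1
DRIVER = 2
DRIVER_OK = 4
FEATURES_OK = 8
DEVICE_NEEDS_RESET = 64
FAILED = 128


def backend_may_start(status):
    """Hypothetical gate: treat the device as live only once the guest
    driver has set DRIVER_OK and has not flagged FAILED or NEEDS_RESET."""
    if status & (FAILED | DEVICE_NEEDS_RESET):
        return False
    return bool(status & DRIVER_OK)
```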