Description of problem:
Update openvswitch from 2.9 to 2.11 on a node where OvS-DPDK is enabled. After the package update, restarting openvswitch fails.

Version-Release number of selected component (if applicable):
RHEL 7.7
Kernel - 3.10.0-1062.el7.x86_64
openvswitch2.11-2.11.0-14.el7fdp.x86_64
python-openvswitch2.11-2.11.0-14.el7fdp.x86_64

Steps to Reproduce:
1. Deploy OSP13 with OvS-DPDK enabled
2. Ensure DPDK is enabled and the ovs-vswitchd service is running
3. Remove openvswitch and python-openvswitch with the command "rpm -e --noscripts --nopreun --nopostun --notriggers --nodeps openvswitch python-openvswitch"
4. Install openvswitch2.11 from the FDP channel
5. Restart openvswitch

Actual results:
ovs-vswitchd fails to start

Expected results:
ovs-vswitchd should restart successfully

Additional info:
[root@computesriov-0 ~]# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-1062.el7.x86_64 root=UUID=607531d3-71b1-4b48-aa56-7f0ecbcdafa5 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on isolcpus=2-19,22-39 skew_tick=1 nohz=on nohz_full=2-19,22-39 rcu_nocbs=2-19,22-39 tuned.non_isolcpus=00300003 intel_pstate=disable nosoftlockup

[root@computesriov-0 ~]# cat /proc/meminfo | grep -i hugepage
AnonHugePages:      8192 kB
HugePages_Total:      32
HugePages_Free:       32
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB

ovs-vswitchd log
----------------
2019-08-06T06:42:33.209Z|00008|dpdk|INFO|DPDK Enabled - initializing...
2019-08-06T06:42:33.209Z|00009|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2019-08-06T06:42:33.209Z|00010|dpdk|INFO|IOMMU support for vhost-user-client disabled.
2019-08-06T06:42:33.209Z|00011|dpdk|INFO|Per port memory for DPDK devices disabled.
2019-08-06T06:42:33.209Z|00012|dpdk|INFO|EAL ARGS: ovs-vswitchd --socket-mem 1024,1024 --socket-limit 1024,1024 -l 0.
2019-08-06T06:42:33.214Z|00013|dpdk|INFO|EAL: Detected 40 lcore(s)
2019-08-06T06:42:33.214Z|00014|dpdk|INFO|EAL: Detected 2 NUMA nodes
2019-08-06T06:42:33.216Z|00015|dpdk|INFO|EAL: Multi-process socket /var/run/openvswitch/dpdk/rte/mp_socket
2019-08-06T06:42:33.255Z|00016|dpdk|INFO|EAL: Probing VFIO support...
2019-08-06T06:42:46.790Z|00017|dpdk|ERR|EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.
2019-08-06T06:42:47.170Z|00018|dpdk|ERR|EAL: Cannot init memory
2019-08-06T06:42:47.170Z|00019|dpdk|EMER|Unable to initialize DPDK: Cannot allocate memory
2019-08-06T06:42:50.939Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted))
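Since the failure ends in "EAL: Cannot init memory", it helps to rule out hugepage exhaustion before digging into the IOVA issue. A minimal check sketch (the threshold of 2 pages is an assumption, matching --socket-mem 1024,1024, i.e. one 1 GiB page per NUMA node):

```shell
# Sketch: verify that enough free 1G hugepages exist before restarting
# ovs-vswitchd. The "needed" count is an assumption based on the EAL args
# above (--socket-mem 1024,1024 => at least 2 x 1 GiB pages).
check_hugepages() {
    meminfo=$1
    needed=$2
    free=$(awk '/^HugePages_Free:/ {print $2}' "$meminfo")
    if [ "$free" -ge "$needed" ]; then
        echo "ok: $free free hugepages (need $needed)"
        return 0
    else
        echo "insufficient: $free free hugepages (need $needed)" >&2
        return 1
    fi
}

# On a live node: check_hugepages /proc/meminfo 2
```

In this report HugePages_Free is 32, so memory pressure is not the cause here; the failure points at the address-translation (IOVA) path instead.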
We are currently tracking problems in DPDK initialisation in bz1711739. The problem manifests when no PCI device is configured for DPDK to use.

Can you list the PCI network devices on this system?
- lspci | grep Ethernet

Can you list which devices are bound to vfio-pci?
- driverctl list-overrides
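To make the driverctl answer easy to read at a glance, the overrides can be filtered down to just the vfio-pci bindings. A small sketch (parsing only; the "PCI-address driver" output format of driverctl list-overrides is assumed):

```shell
# Sketch: print only the devices whose driver override is vfio-pci,
# given `driverctl list-overrides` output like "0000:05:00.0 vfio-pci"
# (output format assumed).
vfio_devices() {
    awk '$2 == "vfio-pci" {print $1}'
}

# On a live node: driverctl list-overrides | vfio_devices
```

An empty result would confirm the no-PCI-device scenario described above.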
Another thing to check, to confirm the issue is the same as bz1711739, is to set the following workaround DPDK configuration in the OVS DB:

ovs-vsctl set Open_vSwitch . other_config:dpdk-extra="--iova-mode=va"
systemctl restart openvswitch
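After restarting, it is worth confirming the workaround actually reached the EAL. A verification sketch (it assumes the extra args are echoed on the "EAL ARGS:" line of the log, and that the log lives at the default /var/log/openvswitch/ovs-vswitchd.log):

```shell
# Sketch: check whether the most recent "EAL ARGS:" line logged by
# ovs-vswitchd includes the --iova-mode=va workaround.
# Log path and log format are assumptions (defaults on these nodes).
iova_va_active() {
    logfile=$1
    grep 'EAL ARGS:' "$logfile" | tail -n1 | grep -q -- '--iova-mode=va'
}

# On a live node:
# iova_va_active /var/log/openvswitch/ovs-vswitchd.log && echo "workaround active"
```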
sosreport - http://rhos-release.virt.bos.redhat.com/log/bz1737713/

> - driverctl list-overrides
I don't have any ports added. I enabled DPDK on an existing regular Compute node (with hugepages), then removed it as the deployment was failing.

> Did cleaning up /dev/hugepages/rte_* before restarting the new ovs-vswitchd help?
The same issue occurs in a fresh deployment, when I have ovs2.11 in the overcloud-full image itself.

> Another thing to check to confirm the issue is the same as bz1711739, is to set the following workaround dpdk configuration in ovs db:
After this workaround, ovs-vswitchd starts successfully.
Please, could you try the following test packages:
http://brew-task-repos.usersys.redhat.com/repos/scratch/dmarchan/openvswitch2.11/2.11.0/20.el7fdn.bz1711739/
It worked.

[root@computeovsdpdksriov-0 ~]# ovs-vsctl get Open_vSwitch . other_config
{dpdk-init="true"}

[root@computeovsdpdksriov-0 ~]# systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
   Active: active (running) since Tue 2019-08-06 12:19:53 UTC; 3min 44s ago
  Process: 522478 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop (code=exited, status=0/SUCCESS)
  Process: 522639 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server --no-monitor --system-id=random ${OVSUSER} start $OPTIONS (code=exited, status=0/SUCCESS)
  Process: 522636 ExecStartPre=/usr/bin/chmod 0775 /dev/hugepages (code=exited, status=0/SUCCESS)
  Process: 522634 ExecStartPre=/bin/sh -c /usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages (code=exited, status=0/SUCCESS)
 Main PID: 522678 (ovs-vswitchd)
    Tasks: 9
   Memory: 33.2M
   CGroup: /system.slice/ovs-vswitchd.service
           └─522678 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/...

Aug 06 12:19:39 computeovsdpdksriov-0 systemd[1]: Starting Open vSwitch Forwarding Unit...
Aug 06 12:19:53 computeovsdpdksriov-0 ovs-ctl[522639]: Starting ovs-vswitchd [  OK  ]
Aug 06 12:19:53 computeovsdpdksriov-0 ovs-vsctl[522854]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=computeovsdp...caldomain
Aug 06 12:19:53 computeovsdpdksriov-0 ovs-ctl[522639]: Enabling remote OVSDB managers [  OK  ]
Aug 06 12:19:53 computeovsdpdksriov-0 systemd[1]: Started Open vSwitch Forwarding Unit.
Hint: Some lines were ellipsized, use -l to show in full.

[root@computeovsdpdksriov-0 ~]# yum list openvswitch2.11
Loaded plugins: product-id, search-disabled-repos, subscription-manager
Installed Packages
openvswitch2.11.x86_64    2.11.0-20.el7fdn.bz1711739    @/openvswitch2.11-2.11.0-20.el7fdn.bz1711739.x86_64
Fixes are being pushed for 19.F in the Fast Datapath channel. What should I do with this bz? Reassign it to your team for when you cross-tag 19.F?