Bug 1737713 - OvS-DPDK with OvS2.11 is failing
Summary: OvS-DPDK with OvS2.11 is failing
Keywords:
Status: CLOSED DUPLICATE of bug 1711739
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: David Marchand
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-06 06:44 UTC by Saravanan KR
Modified: 2019-10-22 07:24 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-07 13:43:00 UTC
Target Upstream Version:
Embargoed:


Description Saravanan KR 2019-08-06 06:44:07 UTC
Description of problem:
Updating openvswitch from version 2.9 to 2.11 on a node where OvS-DPDK is enabled breaks the service: after the package update, restarting openvswitch fails.

Version-Release number of selected component (if applicable):
RHEL 7.7
Kernel - 3.10.0-1062.el7.x86_64
openvswitch2.11-2.11.0-14.el7fdp.x86_64
python-openvswitch2.11-2.11.0-14.el7fdp.x86_64


Steps to Reproduce:
1. Deploy OSP13 with OvS-DPDK enabled
2. Ensure DPDK is enabled and the ovs-vswitchd service is running
3. Remove openvswitch and python-openvswitch with command "rpm -e --noscripts --nopreun --nopostun --notriggers --nodeps openvswitch python-openvswitch"
4. Install openvswitch2.11 from FDP channel
5. Restart openvswitch

Actual results:
ovs-vswitchd fails to start

Expected results:
ovs-vswitchd should restart successfully

Additional info:

[root@computesriov-0 ~]# cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-3.10.0-1062.el7.x86_64 root=UUID=607531d3-71b1-4b48-aa56-7f0ecbcdafa5 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on isolcpus=2-19,22-39 skew_tick=1 nohz=on nohz_full=2-19,22-39 rcu_nocbs=2-19,22-39 tuned.non_isolcpus=00300003 intel_pstate=disable nosoftlockup

[root@computesriov-0 ~]# cat /proc/meminfo | grep -i hugepage
AnonHugePages:      8192 kB
HugePages_Total:      32
HugePages_Free:       32
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
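
For reference, the hugepage figures above can be reduced to one line with a small awk filter. This is a minimal sketch and not part of the original report; `hugepage_summary` is a hypothetical helper name. With the values shown, 32 pages of 1048576 kB come to 32 GiB in total, all free, so the node as a whole has far more hugepage memory than the 1024 MB per NUMA node requested through --socket-mem in the log below.

```shell
# Sketch: summarise hugepage capacity from /proc/meminfo-style input.
# hugepage_summary is a hypothetical helper name, not part of any tool.
hugepage_summary() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END {
            # 1048576 kB == 1 GiB
            printf "total=%d free=%d size_kb=%d total_gib=%d\n",
                   total, free, size_kb, total * size_kb / 1048576
        }
    '
}

# Usage on a live host:
#   hugepage_summary < /proc/meminfo
```

Note that /proc/meminfo reports system-wide totals only; it does not show how the pages are split across NUMA nodes.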

ovs-vswitchd log
----------------
2019-08-06T06:42:33.209Z|00008|dpdk|INFO|DPDK Enabled - initializing...
2019-08-06T06:42:33.209Z|00009|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2019-08-06T06:42:33.209Z|00010|dpdk|INFO|IOMMU support for vhost-user-client disabled.
2019-08-06T06:42:33.209Z|00011|dpdk|INFO|Per port memory for DPDK devices disabled.
2019-08-06T06:42:33.209Z|00012|dpdk|INFO|EAL ARGS: ovs-vswitchd --socket-mem 1024,1024 --socket-limit 1024,1024 -l 0.
2019-08-06T06:42:33.214Z|00013|dpdk|INFO|EAL: Detected 40 lcore(s)
2019-08-06T06:42:33.214Z|00014|dpdk|INFO|EAL: Detected 2 NUMA nodes
2019-08-06T06:42:33.216Z|00015|dpdk|INFO|EAL: Multi-process socket /var/run/openvswitch/dpdk/rte/mp_socket
2019-08-06T06:42:33.255Z|00016|dpdk|INFO|EAL: Probing VFIO support...
2019-08-06T06:42:46.790Z|00017|dpdk|ERR|EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.
2019-08-06T06:42:47.170Z|00018|dpdk|ERR|EAL: Cannot init memory
2019-08-06T06:42:47.170Z|00019|dpdk|EMER|Unable to initialize DPDK: Cannot allocate memory
2019-08-06T06:42:50.939Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted))
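
To pull only the DPDK failures out of a longer ovs-vswitchd log, a filter along these lines may help. It is a sketch that assumes the pipe-separated layout seen above (timestamp|sequence|module|level|message); `dpdk_errors` is a hypothetical helper name, and the log path in the usage comment is the usual default, which may differ on a given system.

```shell
# Sketch: print only ERR and EMER lines logged by the dpdk module.
# Assumes fields are separated by "|" as in the excerpt above; a message
# that itself contains "|" would be cut at that character.
dpdk_errors() {
    awk -F'|' '$3 == "dpdk" && ($4 == "ERR" || $4 == "EMER") {
        print $4 ": " $5
    }'
}

# Usage (log path is an assumption):
#   dpdk_errors < /var/log/openvswitch/ovs-vswitchd.log
```

Lines from other modules, such as the daemon_unix entry above, are deliberately skipped so that only the DPDK initialisation errors remain.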

Comment 2 David Marchand 2019-08-06 07:01:18 UTC
We are currently tracking problems with DPDK initialisation in bz1711739.
The problem manifests when no PCI device is configured for DPDK to use.

Can you list the PCI network devices on this system?
- lspci |grep Ethernet

Can you list which devices are bound to vfio-pci?
- driverctl list-overrides

Comment 4 David Marchand 2019-08-06 07:55:47 UTC
Another way to confirm that the issue is the same as bz1711739 is to set the following workaround DPDK configuration in the OVS database:

ovs-vsctl set Open_vSwitch . other_config:dpdk-extra="--iova-mode=va"
systemctl restart openvswitch
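
Whether the workaround is actually present in the database can be checked from the other_config column. The following is a minimal sketch: `has_iova_va` is a hypothetical helper that only inspects a string, and the usage comment assumes `ovs-vsctl get Open_vSwitch . other_config` prints the map in the {key="value"} form.

```shell
# Sketch: succeed if the given other_config text carries the
# --iova-mode=va workaround. has_iova_va is a hypothetical helper name.
has_iova_va() {
    case "$1" in
        *--iova-mode=va*) return 0 ;;
        *)                return 1 ;;
    esac
}

# Usage on a live host (requires a running ovsdb-server):
#   if has_iova_va "$(ovs-vsctl get Open_vSwitch . other_config)"; then
#       echo "workaround is set"
#   else
#       echo "workaround is not set"
#   fi
```

Checking this before and after a package change makes it clear whether a successful start is due to the workaround or to the packages themselves.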

Comment 5 Saravanan KR 2019-08-06 08:42:03 UTC
sosreport - http://rhos-release.virt.bos.redhat.com/log/bz1737713/

> - driverctl list-overrides

I don't have any ports added. I enabled DPDK on an existing regular Compute node (with hugepages). I have removed it as the deployment was failing.

> Did cleaning up /dev/hugepages/rte_* before restarting new ovs-vswitchd help?

The same issue occurs in a fresh deployment, when I have ovs2.11 in the overcloud-full image itself.

> Another way to confirm that the issue is the same as bz1711739 is to set the following workaround DPDK configuration in the OVS database:

With this workaround applied, ovs-vswitchd starts successfully.

Comment 6 David Marchand 2019-08-06 10:10:27 UTC
Could you please try the following test packages:
http://brew-task-repos.usersys.redhat.com/repos/scratch/dmarchan/openvswitch2.11/2.11.0/20.el7fdn.bz1711739/

Comment 7 Saravanan KR 2019-08-06 12:25:53 UTC
It worked.

[root@computeovsdpdksriov-0 ~]# ovs-vsctl get Open_vSwitch . other_config 
{dpdk-init="true"}

[root@computeovsdpdksriov-0 ~]# systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
   Active: active (running) since Tue 2019-08-06 12:19:53 UTC; 3min 44s ago
  Process: 522478 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop (code=exited, status=0/SUCCESS)
  Process: 522639 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server --no-monitor --system-id=random ${OVSUSER} start $OPTIONS (code=exited, status=0/SUCCESS)
  Process: 522636 ExecStartPre=/usr/bin/chmod 0775 /dev/hugepages (code=exited, status=0/SUCCESS)
  Process: 522634 ExecStartPre=/bin/sh -c /usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages (code=exited, status=0/SUCCESS)
 Main PID: 522678 (ovs-vswitchd)
    Tasks: 9
   Memory: 33.2M
   CGroup: /system.slice/ovs-vswitchd.service
           └─522678 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/...

Aug 06 12:19:39 computeovsdpdksriov-0 systemd[1]: Starting Open vSwitch Forwarding Unit...
Aug 06 12:19:53 computeovsdpdksriov-0 ovs-ctl[522639]: Starting ovs-vswitchd [  OK  ]
Aug 06 12:19:53 computeovsdpdksriov-0 ovs-vsctl[522854]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=computeovsdp...caldomain
Aug 06 12:19:53 computeovsdpdksriov-0 ovs-ctl[522639]: Enabling remote OVSDB managers [  OK  ]
Aug 06 12:19:53 computeovsdpdksriov-0 systemd[1]: Started Open vSwitch Forwarding Unit.
Hint: Some lines were ellipsized, use -l to show in full.

[root@computeovsdpdksriov-0 ~]# yum list openvswitch2.11
Loaded plugins: product-id, search-disabled-repos, subscription-manager
Installed Packages
openvswitch2.11.x86_64                                     2.11.0-20.el7fdn.bz1711739                                     @/openvswitch2.11-2.11.0-20.el7fdn.bz1711739.x86_64

Comment 9 David Marchand 2019-08-07 12:37:40 UTC
Fixes are being pushed for 19.F in the Fast Datapath channel.

What should I do with this bz?
Reassign it to your team for when you guys cross tag 19.F?

