Bug 2189684 - RHEL 8.8 XDP xdp_tools hangs on xdp-tools-dump-native-*
Summary: RHEL 8.8 XDP xdp_tools hangs on xdp-tools-dump-native-*
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: xdp-tools
Version: 8.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Toke Høiland-Jørgensen
QA Contact: Christian Trautman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-25 21:46 UTC by Jon Trossbach
Modified: 2023-07-27 09:29 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:


Links
System                  ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   RHELPLAN-155779  0        None      None    None     2023-04-25 21:46:41 UTC

Description Jon Trossbach 2023-04-25 21:46:06 UTC
Setting this to urgent because it is happening in spotcheck. Will update with verbose output when I have it.


Description of problem:
The xdp_tools test on mlx5-cx5 hangs in xdp-tools-dump-native-promis, the promiscuous mode testcase.

Version-Release number of selected component (if applicable):
xdp-tools       x86_64       1.2.10-1.el8

How reproducible:
Always

Steps to Reproduce:
:: [ 22:45:00 ] :: [   LOG    ] :: Start recording CPU usage
:: [ 22:45:00 ] :: [  BEGIN   ] :: Running './CpuReporter -f xdp_tools_dump_native_snap.html &'
:: [ 22:45:00 ] :: [   PASS   ] :: Command './CpuReporter -f xdp_tools_dump_native_snap.html &' (Expected 0,1, got 0)
:: [ 22:45:00 ] :: [  BEGIN   ] :: Running 'sleep 10'
:: [ 22:45:10 ] :: [   PASS   ] :: Command 'sleep 10' (Expected 0, got 0)
x86_64
:: [ 22:45:10 ] :: [  BEGIN   ] :: Running 'xdp_tools_dump_native_snap'

Actual results:

8: ens1f0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 xdp qdisc mq state UP mode DEFAULT group default qlen 1000
8: ens1f0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 xdp qdisc mq state UP mode DEFAULT group default qlen 1000
wait for ens1f0 sec 0
8: ens1f0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 xdp qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:de:ad:de:ad:01 brd ff:ff:ff:ff:ff:ff permaddr 0c:42:a1:9d:04:52
    prog/xdp id 83 name xdp_dispatcher tag 94d5f00c20184d17 jited 
    altname enp59s0f0
Wait 0 secs until port becomes UP
SYNC_NC: sync_set client snapshot_and_write_test_base
SYNC_NC: sent "snapshot_and_write_test_base" to netqe1.knqe.lab.eng.bos.redhat.com
SYNC_NC: sync_wait client snapshot_and_write_test_base
SYNC_NC: waiting "netqe1.knqe.lab.eng.bos.redhat.com"
SYNC_NC: got "snapshot_and_write_test_base" from netqe1.knqe.lab.eng.bos.redhat.com
listening on ens1f0, ingress XDP program ID 90 func xdpfilt_alw_all, capture mode entry, capture size 20 bytes


Expected results:
Testcase passes

Additional info:
https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/04/77650/7765031/13770441/159149284/taskout.log
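
Because the testcase hangs rather than fails, a small watchdog around the dump command can help tell a user-space hang from a process stuck in the kernel. The following is only a sketch under stated assumptions: `run_with_watchdog` is a hypothetical helper, not part of xdp-tools or the test suite, and the exact command the testcase runs is not shown in this report, so the command is passed in by the caller.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s, kill_grace_s=5):
    """Run cmd and classify how it ended.

    Hypothetical helper for illustration only.
    Returns one of:
      ("exited", returncode)  the command finished within timeout_s
      ("hung", None)          it timed out but died after SIGKILL
      ("unkillable", None)    it timed out and did not die after SIGKILL,
                              which suggests it is blocked inside the kernel
    """
    proc = subprocess.Popen(cmd)
    try:
        return ("exited", proc.wait(timeout=timeout_s))
    except subprocess.TimeoutExpired:
        pass

    # The command exceeded its time budget: try to kill it.
    proc.kill()
    try:
        proc.wait(timeout=kill_grace_s)
        return ("hung", None)
    except subprocess.TimeoutExpired:
        # SIGKILL has not taken effect yet; the process is likely in
        # uninterruptible sleep.
        return ("unkillable", None)
```

For example, `run_with_watchdog([sys.executable, "-c", "pass"], 5)` returns `("exited", 0)`. An `"unkillable"` result for the dump command would point toward the driver or kernel side rather than the tool itself.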

Comment 2 Jon Trossbach 2023-05-09 15:04:01 UTC
Okay, weird update on this: it could be my machines. They sometimes have trouble with xdp-tools-dump-native-promis on different cards. The hang is intermittent and I cannot reproduce it reliably. I can't rule out a general x86 issue now.

Changing title as I have now seen this on i40e: https://beaker.engineering.redhat.com/jobs/7829687

There is also what seems like a related xdp-tools-dump-native-use-pcap issue: https://beaker.engineering.redhat.com/recipes/13865268#task159874824

Taken together, these point to a wider xdp-tools dumping issue, possibly architecture-related. The machines are PowerEdge R740.

