Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
The FDP team is no longer accepting new bugs in Bugzilla. Please report your issues under FDP project in Jira. Thanks.

Bug 1767678

Summary: [mlx5_core] OVS offload: mlx5_core 0000:04:00.0: cmd_work_handler:855:(pid 87647): failed to allocate command entry
Product: Red Hat Enterprise Linux Fast Datapath Reporter: qding
Component: openvswitchAssignee: Alaa Hleihel (NVIDIA Mellanox) <ahleihel>
openvswitch sub component: ovs-hw-offload QA Contact: Amit Supugade <asupugad>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: high    
Priority: high CC: ahleihel, atragler, ctrautma, jfargen, knweiss, mleitner, qding
Version: FDP 19.G   
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-20 10:05:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
testout for log
none
console log
none
console2.log
none
taskout2.log none

Description qding 2019-11-01 02:06:04 UTC
Created attachment 1631299 [details]
testout for log

Description of problem:



[17144.914469] mlx5_core 0000:04:00.0: wait_func:976:(pid 55347): DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command resource 
[17144.928744] mlx5_core 0000:04:00.0: del_hw_fte:497:(pid 55347): flow steering can't delete fte in index 0 of flow group id 49 
[17204.936184] mlx5_core 0000:04:00.0: wait_func:976:(pid 55347): DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command resource 
[17204.950456] mlx5_core 0000:04:00.0: del_hw_fte:497:(pid 55347): flow steering can't delete fte in index 2 of flow group id 49 
[17204.963557] mlx5_core 0000:04:00.0: mlx5_cmd_check:745:(pid 55347): DEALLOC_PACKET_REFORMAT_CONTEXT(0x93e) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x179e84) 
[17264.979916] mlx5_core 0000:04:00.0: wait_func:976:(pid 55347): DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command resource 
[17264.994189] mlx5_core 0000:04:00.0: del_hw_fte:497:(pid 55347): flow steering can't delete fte in index 3 of flow group id 49 
[17325.001587] mlx5_core 0000:04:00.0: wait_func:976:(pid 55347): DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command resource 
[17325.015862] mlx5_core 0000:04:00.0: del_hw_fte:497:(pid 55347): flow steering can't delete fte in index 1 of flow group id 49 
[17325.028629] mlx5_core 0000:04:00.0: mlx5_cmd_check:745:(pid 55347): DESTROY_FLOW_GROUP(0x934) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x563e2f) 
[17325.045131] mlx5_core 0000:04:00.0: del_hw_flow_group:535:(pid 55347): flow steering can't destroy fg 49 of ft 3145728 
[17325.061441] mlx5_core 0000:04:00.0: mlx5_cmd_check:745:(pid 55347): DESTROY_FLOW_TABLE(0x931) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x66d6c3) 
[17325.077457] mlx5_core 0000:04:00.0: del_hw_flow_table:406:(pid 55347): flow steering can't destroy ft 
[17325.087939] mlx5_core 0000:04:00.0: mlx5_cmd_check:745:(pid 87647): DEALLOC_FLOW_COUNTER(0x93a) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x4bc68e) 


beaker jobs:
https://beaker.engineering.redhat.com/jobs/3869652
https://beaker.engineering.redhat.com/jobs/3867011

Version-Release number of selected component (if applicable):

[root@dell-per730-04 tc_offload]# uname -r
3.10.0-1099.el7.x86_64
[root@dell-per730-04 tc_offload]# rpm -q openvswitch2.11
openvswitch2.11-2.11.0-26.el7fdp.x86_64
[root@dell-per730-04 tc_offload]# 

How reproducible: two beaker jobs, but cannot reproduce it manually


Steps to Reproduce:
1.
2.
3.

Actual results:
system crashes

Expected results:


Additional info:

[root@dell-per730-04 tc_offload]# ethtool -i p6p1
driver: mlx5_core
version: 5.0-0
firmware-version: 16.26.1040 (MT_0000000009)
expansion-rom-version: 
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[root@dell-per730-04 tc_offload]# lspci -m -s 04:00.0
04:00.0 "Ethernet controller" "Mellanox Technologies" "MT28800 Family [ConnectX-5 Ex]" "Mellanox Technologies" "Device 0002"
[root@dell-per730-04 tc_offload]#

Comment 1 qding 2019-11-01 02:06:40 UTC
Created attachment 1631311 [details]
console log

Comment 2 Marcelo Ricardo Leitner 2019-11-01 02:42:56 UTC
Sounds like a FW issue.

Comment 3 Marcelo Ricardo Leitner 2019-11-14 14:15:25 UTC
Ping Alaa.

Comment 4 Alaa Hleihel (NVIDIA Mellanox) 2019-11-18 16:41:33 UTC
Hi qding

does this still reproduce ?
if so, were you able to find the steps for reproducing it ?

Thanks,
Alaa

Comment 5 qding 2019-11-21 01:27:16 UTC
(In reply to Alaa Hleihel (Mellanox) from comment #4)
> Hi qding
> 
> does this still reproduce ?

Still reproduced with the latest rhel7.8 distro RHEL-7.8-20191119.0 in which the kernel is  kernel-3.10.0-1111.el7. Please see the beaker job https://beaker.engineering.redhat.com/jobs/3906372.

> if so, were you able to find the steps for reproducing it ?
> 

Unfortunately I haven't found steps to reproduce it but beaker jobs.

Thanks
Qijun

Comment 6 qding 2019-11-21 02:06:53 UTC
Created attachment 1638270 [details]
console2.log

Comment 7 qding 2019-11-21 02:07:26 UTC
Created attachment 1638271 [details]
taskout2.log

Comment 8 Alaa Hleihel (NVIDIA Mellanox) 2019-11-25 14:22:26 UTC
(In reply to qding from comment #5)
> (In reply to Alaa Hleihel (Mellanox) from comment #4)
> > Hi qding
> > 
> > does this still reproduce ?
> 
> Still reproduced with the latest rhel7.8 distro RHEL-7.8-20191119.0 in which
> the kernel is  kernel-3.10.0-1111.el7. Please see the beaker job
> https://beaker.engineering.redhat.com/jobs/3906372.
> 
> > if so, were you able to find the steps for reproducing it ?
> > 
> 
> Unfortunately I haven't found steps to reproduce it but beaker jobs.
> 
> Thanks
> Qijun

I see, thanks for the update.

Any chance to add the name of the test to dmesg output? 
maybe this will help us find the test that is triggering the issue,
something like this at the start and end of each test:
# echo "START: $0" > /dev/kmsg
# echo "END: $0" > /dev/kmsg

Comment 9 qding 2020-02-04 01:34:43 UTC
(In reply to Alaa Hleihel (Mellanox) from comment #8)
> (In reply to qding from comment #5)
> > (In reply to Alaa Hleihel (Mellanox) from comment #4)
> > > Hi qding
> > > 
> > > does this still reproduce ?
> > 
> > Still reproduced with the latest rhel7.8 distro RHEL-7.8-20191119.0 in which
> > the kernel is  kernel-3.10.0-1111.el7. Please see the beaker job
> > https://beaker.engineering.redhat.com/jobs/3906372.
> > 
> > > if so, were you able to find the steps for reproducing it ?
> > > 
> > 
> > Unfortunately I haven't found steps to reproduce it but beaker jobs.
> > 
> > Thanks
> > Qijun
> 
> I see, thanks for the update.
> 
> Any chance to add the name of the test to dmesg output? 
> maybe this will help us find the test that is triggering the issue,
> something like this at the start and end of each test:
> # echo "START: $0" > /dev/kmsg
> # echo "END: $0" > /dev/kmsg

The log file is as https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2020/02/40487/4048746/7860482/console.log
and the corresponding beaker task is https://beaker.engineering.redhat.com/jobs/4048746,
and this time is found with rhel8 latest distro and the latest openvswitch2.12-2.12.0-21.el8fdp.

the log when the problem happens as below. the test is for mirror to IPv4 GRE.

[23001.674455] START ovs_test_tc_offload_mirror_VF1_VF2_to_gre4()... 
[23037.002941] restraintd[4906]: *** Current Time: Mon Feb 03 10:54:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[23036.624970] device ens1f0_0 entered promiscuous mode 
[23036.645669] device ens1f0_1 entered promiscuous mode 
[23036.665516] device gre_sys entered promiscuous mode 
[-- MARK -- Mon Feb  3 15:55:00 2020] 
[23097.010345] restraintd[4906]: *** Current Time: Mon Feb 03 10:55:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[23116.163175] device gre_sys left promiscuous mode 
[23116.205363] device gre_sys entered promiscuous mode 
[23157.031139] restraintd[4906]: *** Current Time: Mon Feb 03 10:56:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[23186.368626] mlx5_core 0000:3b:00.0: wait_func:985:(pid 47863): DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command resource 
[23186.381404] mlx5_core 0000:3b:00.0: del_hw_fte:499:(pid 47863): flow steering can't delete fte in index 838860 of flow group id 119 
[23217.030082] restraintd[4906]: *** Current Time: Mon Feb 03 10:57:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[23247.802778] mlx5_core 0000:3b:00.0: wait_func:985:(pid 47872): DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command resource 
[23247.815557] mlx5_core 0000:3b:00.0: del_hw_fte:499:(pid 47872): flow steering can't delete fte in index 1 of flow group id 118 
[23277.029276] restraintd[4906]: *** Current Time: Mon Feb 03 10:58:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[23309.237334] mlx5_core 0000:3b:00.0: wait_func:985:(pid 47872): DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command resource 
[23309.250114] mlx5_core 0000:3b:00.0: del_hw_fte:499:(pid 47872): flow steering can't delete fte in index 3 of flow group id 118 
[23309.261870] mlx5_core 0000:3b:00.0: mlx5_cmd_check:752:(pid 47872): DEALLOC_PACKET_REFORMAT_CONTEXT(0x93e) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x179e84) 
[23337.027635] restraintd[4906]: *** Current Time: Mon Feb 03 10:59:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[-- MARK -- Mon Feb  3 16:00:00 2020] 
[23370.671726] mlx5_core 0000:3b:00.0: wait_func:985:(pid 47873): DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command resource 
[23370.684508] mlx5_core 0000:3b:00.0: del_hw_fte:499:(pid 47873): flow steering can't delete fte in index 0 of flow group id 118 
[23397.062166] restraintd[4906]: *** Current Time: Mon Feb 03 11:00:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[23432.106215] mlx5_core 0000:3b:00.0: wait_func:985:(pid 47873): DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command resource 
[23432.119003] mlx5_core 0000:3b:00.0: del_hw_fte:499:(pid 47873): flow steering can't delete fte in index 2 of flow group id 118 
[23432.130555] mlx5_core 0000:3b:00.0: mlx5_cmd_check:752:(pid 47873): DESTROY_FLOW_GROUP(0x934) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x563e2f) 
[23432.145319] mlx5_core 0000:3b:00.0: del_hw_flow_group:537:(pid 47873): flow steering can't destroy fg 118 of ft 3145728 
[23457.025603] restraintd[4906]: *** Current Time: Mon Feb 03 11:01:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[23493.540964] mlx5_core 0000:3b:00.0: wait_func:985:(pid 47873): DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command resource 
[23493.553745] mlx5_core 0000:3b:00.0: del_hw_fte:499:(pid 47873): flow steering can't delete fte in index 838861 of flow group id 119 
[23493.565669] mlx5_core 0000:3b:00.0: mlx5_cmd_check:752:(pid 47873): DESTROY_FLOW_GROUP(0x934) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x563e2f) 
[23493.580432] mlx5_core 0000:3b:00.0: del_hw_flow_group:537:(pid 47873): flow steering can't destroy fg 119 of ft 3145728 
[23493.596152] mlx5_core 0000:3b:00.0: mlx5_cmd_check:752:(pid 47873): DESTROY_FLOW_TABLE(0x931) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x66d6c3) 
[23493.610497] mlx5_core 0000:3b:00.0: del_hw_flow_table:408:(pid 47873): flow steering can't destroy ft 
[23493.619945] mlx5_core 0000:3b:00.0: mlx5_cmd_check:752:(pid 23948): DEALLOC_FLOW_COUNTER(0x93a) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x4bc68e) 
[23517.003766] restraintd[4906]: *** Current Time: Mon Feb 03 11:02:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[23577.014488] restraintd[4906]: *** Current Time: Mon Feb 03 11:03:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[23597.785217] device gre_sys left promiscuous mode 
[23597.820456] device gre_sys entered promiscuous mode 
[23637.009296] restraintd[4906]: *** Current Time: Mon Feb 03 11:04:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[-- MARK -- Mon Feb  3 16:05:00 2020] 
[23697.028572] restraintd[4906]: *** Current Time: Mon Feb 03 11:05:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[23757.036548] restraintd[4906]: *** Current Time: Mon Feb 03 11:06:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[23817.027052] restraintd[4906]: *** Current Time: Mon Feb 03 11:07:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[23877.043372] restraintd[4906]: *** Current Time: Mon Feb 03 11:08:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[23937.038729] restraintd[4906]: *** Current Time: Mon Feb 03 11:09:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[-- MARK -- Mon Feb  3 16:10:00 2020] 
[23997.040096] restraintd[4906]: *** Current Time: Mon Feb 03 11:10:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[24057.038179] restraintd[4906]: *** Current Time: Mon Feb 03 11:11:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[24117.037448] restraintd[4906]: *** Current Time: Mon Feb 03 11:12:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[24177.035836] restraintd[4906]: *** Current Time: Mon Feb 03 11:13:49 2020  Localwatchdog at: Tue Feb 04 04:43:49 2020 
[24208.253102] device ens1f0_0 left promiscuous mode 
[24208.271561] device ens1f0_1 left promiscuous mode 
[24208.285492] device gre_sys left promiscuous mode 
[24208.319878] END ovs_test_tc_offload_mirror_VF1_VF2_to_gre4() 
[24214.096662] START ovs_test_tc_offload_mirror_VF1_VF2_to_vxlan4()...

Comment 10 Alaa Hleihel (NVIDIA Mellanox) 2020-02-04 12:37:31 UTC
(In reply to qding from comment #9)
> (In reply to Alaa Hleihel (Mellanox) from comment #8)
> > (In reply to qding from comment #5)
> > > (In reply to Alaa Hleihel (Mellanox) from comment #4)
> > > > Hi qding
> > > > 
> > > > does this still reproduce ?
> > > 
> > > Still reproduced with the latest rhel7.8 distro RHEL-7.8-20191119.0 in which
> > > the kernel is  kernel-3.10.0-1111.el7. Please see the beaker job
> > > https://beaker.engineering.redhat.com/jobs/3906372.
> > > 
> > > > if so, were you able to find the steps for reproducing it ?
> > > > 
> > > 
> > > Unfortunately I haven't found steps to reproduce it but beaker jobs.
> > > 
> > > Thanks
> > > Qijun
> > 
> > I see, thanks for the update.
> > 
> > Any chance to add the name of the test to dmesg output? 
> > maybe this will help us find the test that is triggering the issue,
> > something like this at the start and end of each test:
> > # echo "START: $0" > /dev/kmsg
> > # echo "END: $0" > /dev/kmsg
> 
> The log file is as
> https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2020/02/
> 40487/4048746/7860482/console.log
> and the corresponding beaker task is
> https://beaker.engineering.redhat.com/jobs/4048746,
> and this time is found with rhel8 latest distro and the latest
> openvswitch2.12-2.12.0-21.el8fdp.
> 
> the log when the problem happens as below. the test is for mirror to IPv4
> GRE.
> 
> [23001.674455] START ovs_test_tc_offload_mirror_VF1_VF2_to_gre4()... 
> [23037.002941] restraintd[4906]: *** Current Time: Mon Feb 03 10:54:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [23036.624970] device ens1f0_0 entered promiscuous mode 
> [23036.645669] device ens1f0_1 entered promiscuous mode 
> [23036.665516] device gre_sys entered promiscuous mode 
> [-- MARK -- Mon Feb  3 15:55:00 2020] 
> [23097.010345] restraintd[4906]: *** Current Time: Mon Feb 03 10:55:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [23116.163175] device gre_sys left promiscuous mode 
> [23116.205363] device gre_sys entered promiscuous mode 
> [23157.031139] restraintd[4906]: *** Current Time: Mon Feb 03 10:56:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [23186.368626] mlx5_core 0000:3b:00.0: wait_func:985:(pid 47863):
> DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command
> resource 
> [23186.381404] mlx5_core 0000:3b:00.0: del_hw_fte:499:(pid 47863): flow
> steering can't delete fte in index 838860 of flow group id 119 
> [23217.030082] restraintd[4906]: *** Current Time: Mon Feb 03 10:57:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [23247.802778] mlx5_core 0000:3b:00.0: wait_func:985:(pid 47872):
> DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command
> resource 
> [23247.815557] mlx5_core 0000:3b:00.0: del_hw_fte:499:(pid 47872): flow
> steering can't delete fte in index 1 of flow group id 118 
> [23277.029276] restraintd[4906]: *** Current Time: Mon Feb 03 10:58:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [23309.237334] mlx5_core 0000:3b:00.0: wait_func:985:(pid 47872):
> DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command
> resource 
> [23309.250114] mlx5_core 0000:3b:00.0: del_hw_fte:499:(pid 47872): flow
> steering can't delete fte in index 3 of flow group id 118 
> [23309.261870] mlx5_core 0000:3b:00.0: mlx5_cmd_check:752:(pid 47872):
> DEALLOC_PACKET_REFORMAT_CONTEXT(0x93e) op_mod(0x0) failed, status bad
> resource state(0x9), syndrome (0x179e84) 
> [23337.027635] restraintd[4906]: *** Current Time: Mon Feb 03 10:59:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [-- MARK -- Mon Feb  3 16:00:00 2020] 
> [23370.671726] mlx5_core 0000:3b:00.0: wait_func:985:(pid 47873):
> DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command
> resource 
> [23370.684508] mlx5_core 0000:3b:00.0: del_hw_fte:499:(pid 47873): flow
> steering can't delete fte in index 0 of flow group id 118 
> [23397.062166] restraintd[4906]: *** Current Time: Mon Feb 03 11:00:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [23432.106215] mlx5_core 0000:3b:00.0: wait_func:985:(pid 47873):
> DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command
> resource 
> [23432.119003] mlx5_core 0000:3b:00.0: del_hw_fte:499:(pid 47873): flow
> steering can't delete fte in index 2 of flow group id 118 
> [23432.130555] mlx5_core 0000:3b:00.0: mlx5_cmd_check:752:(pid 47873):
> DESTROY_FLOW_GROUP(0x934) op_mod(0x0) failed, status bad resource
> state(0x9), syndrome (0x563e2f) 
> [23432.145319] mlx5_core 0000:3b:00.0: del_hw_flow_group:537:(pid 47873):
> flow steering can't destroy fg 118 of ft 3145728 
> [23457.025603] restraintd[4906]: *** Current Time: Mon Feb 03 11:01:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [23493.540964] mlx5_core 0000:3b:00.0: wait_func:985:(pid 47873):
> DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command
> resource 
> [23493.553745] mlx5_core 0000:3b:00.0: del_hw_fte:499:(pid 47873): flow
> steering can't delete fte in index 838861 of flow group id 119 
> [23493.565669] mlx5_core 0000:3b:00.0: mlx5_cmd_check:752:(pid 47873):
> DESTROY_FLOW_GROUP(0x934) op_mod(0x0) failed, status bad resource
> state(0x9), syndrome (0x563e2f) 
> [23493.580432] mlx5_core 0000:3b:00.0: del_hw_flow_group:537:(pid 47873):
> flow steering can't destroy fg 119 of ft 3145728 
> [23493.596152] mlx5_core 0000:3b:00.0: mlx5_cmd_check:752:(pid 47873):
> DESTROY_FLOW_TABLE(0x931) op_mod(0x0) failed, status bad parameter(0x3),
> syndrome (0x66d6c3) 
> [23493.610497] mlx5_core 0000:3b:00.0: del_hw_flow_table:408:(pid 47873):
> flow steering can't destroy ft 
> [23493.619945] mlx5_core 0000:3b:00.0: mlx5_cmd_check:752:(pid 23948):
> DEALLOC_FLOW_COUNTER(0x93a) op_mod(0x0) failed, status bad resource
> state(0x9), syndrome (0x4bc68e) 
> [23517.003766] restraintd[4906]: *** Current Time: Mon Feb 03 11:02:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [23577.014488] restraintd[4906]: *** Current Time: Mon Feb 03 11:03:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [23597.785217] device gre_sys left promiscuous mode 
> [23597.820456] device gre_sys entered promiscuous mode 
> [23637.009296] restraintd[4906]: *** Current Time: Mon Feb 03 11:04:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [-- MARK -- Mon Feb  3 16:05:00 2020] 
> [23697.028572] restraintd[4906]: *** Current Time: Mon Feb 03 11:05:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [23757.036548] restraintd[4906]: *** Current Time: Mon Feb 03 11:06:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [23817.027052] restraintd[4906]: *** Current Time: Mon Feb 03 11:07:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [23877.043372] restraintd[4906]: *** Current Time: Mon Feb 03 11:08:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [23937.038729] restraintd[4906]: *** Current Time: Mon Feb 03 11:09:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [-- MARK -- Mon Feb  3 16:10:00 2020] 
> [23997.040096] restraintd[4906]: *** Current Time: Mon Feb 03 11:10:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [24057.038179] restraintd[4906]: *** Current Time: Mon Feb 03 11:11:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [24117.037448] restraintd[4906]: *** Current Time: Mon Feb 03 11:12:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [24177.035836] restraintd[4906]: *** Current Time: Mon Feb 03 11:13:49 2020 
> Localwatchdog at: Tue Feb 04 04:43:49 2020 
> [24208.253102] device ens1f0_0 left promiscuous mode 
> [24208.271561] device ens1f0_1 left promiscuous mode 
> [24208.285492] device gre_sys left promiscuous mode 
> [24208.319878] END ovs_test_tc_offload_mirror_VF1_VF2_to_gre4() 
> [24214.096662] START ovs_test_tc_offload_mirror_VF1_VF2_to_vxlan4()...


Great, thanks for the update!

Comment 12 Alaa Hleihel (NVIDIA Mellanox) 2021-02-18 09:17:00 UTC
Hi,

> [17144.914469] mlx5_core 0000:04:00.0: wait_func:976:(pid 55347): DELETE_FLOW_TABLE_ENTRY(0x938) timeout. Will cause a leak of a command resource 

Firmware commands are hitting timeout, so likely it's a FW bug.
We fixed a similar issue just recently in BZ #1916563.

Can you retest using the latest FW version (xx.29.2002)?

Thanks,
Alaa

Comment 13 qding 2021-02-20 10:05:29 UTC
No such issue with the lastest firmware-version: 16.29.2002 (MT_0000000012). So to close the bug.
Beaker job:
CX-5: https://beaker.engineering.redhat.com/jobs/5111774
CX-6 Dx: https://beaker.engineering.redhat.com/jobs/5109347

Comment 14 Alaa Hleihel (NVIDIA Mellanox) 2021-02-22 10:05:23 UTC
(In reply to qding from comment #13)
> No such issue with the lastest firmware-version: 16.29.2002 (MT_0000000012).
> So to close the bug.
> Beaker job:
> CX-5: https://beaker.engineering.redhat.com/jobs/5111774
> CX-6 Dx: https://beaker.engineering.redhat.com/jobs/5109347

Great, thanks a lot !