Bug 1798814 - [rhel-7.6.z] dat_ia_close() does not release the virtual function contexts for Mellanox ROCE ports
Summary: [rhel-7.6.z] dat_ia_close() does not release the virtual function contexts fo...
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: dapl
Version: 7.6
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Honggang LI
QA Contact: Infiniband QE
URL:
Whiteboard:
Depends On: 1784193
Blocks:
Reported: 2020-02-06 04:37 UTC by Honggang LI
Modified: 2020-02-12 08:30 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1784193
Environment:
Last Closed: 2020-02-12 08:30:26 UTC
Target Upstream Version:
Embargoed:


Description Honggang LI 2020-02-06 04:37:09 UTC
+++ This bug was initially created as a clone of Bug #1784193 +++

Description of problem:
Sequential execution of UDAPL API calls (open/close of a ROCE port) breaks after 28 iterations. This indicates that the close call does not actually release the connection. Tested and observed on IBM Z (s390x). However, the connection leak does not seem to be architecture-specific and must exist on x86 as well.

A similar test was performed with verbs API calls using ibv_open_device / ibv_close_device. No error was observed with 60 iterations.

Version-Release number of selected component (if applicable):
dapl 2.1.5-2.el7

How reproducible:
UDAPL code fails after 28 open/close iterations
 
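  for( int i = 0 ; i < 60 ; i++ )
  {
     DAT_IA_HANDLE  iaHandle  = DAT_HANDLE_NULL;
     DAT_EVD_HANDLE evdHandle = DAT_HANDLE_NULL;
     cout << "open number " << i << endl ;
     // Open the interface adapter; stop on failure rather than closing an invalid handle.
     if (DAT_SUCCESS != (status = dat_ia_open(gDevName, SVR_EVD_QLEN, &evdHandle, &iaHandle) ))
     {
         printError("dat_ia_open", status);
         return 1;
     }
     // Close it again; this should release everything the open acquired.
     if (DAT_SUCCESS != (status = dat_ia_close(iaHandle, DAT_CLOSE_GRACEFUL_FLAG) ))
     {
         printError("dat_ia_close", status);
         return 1;
     }
  }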
 
 
./UdaplUtility ofa-v2-roe0
open number 0
open number 1
open number 2
open number 3
...
open number 27
open number 28
open number 29
host1:CMA:747b:a4377720: 3452 us(3452 us):  open_hca: rdma_bind ERR No such device. Is enP303p0s0.66 configured as IPoIB?
failure: dat_ia_open 0x120000

Steps to Reproduce:
1. Start the process
2. Open ROCE port via dat_ia_open() call
3. Close ROCE port via dat_ia_close() call
4. Repeat steps 2 and 3 for 60 iterations

Actual results:
UDAPL code fails after 28 open/close iterations

Expected results:
Since the connection is closed, there should be no limit on how many consecutive open/close cycles can be executed successfully.

Additional info:
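The reasoning in the description (a fixed number of successful open/close cycles followed by a failed open points at a close that does not release its resource) can be illustrated without any RDMA hardware. The sketch below is not dapl code: MockDevice, its capacity, and countSuccessfulCycles are hypothetical names made up for this illustration, and the capacity value is arbitrary rather than the real limit on the adapter.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a device with a fixed number of contexts.
// It only models "close does not release"; it is not how dapl works internally.
class MockDevice {
public:
    MockDevice(int capacity, bool closeReleases)
        : capacity_(capacity), closeReleases_(closeReleases) {}

    // Returns true on success, false when no free context remains.
    bool open() {
        if (inUse_ >= capacity_) return false;
        ++inUse_;
        return true;
    }

    // A correct close returns the context; a leaky one keeps it allocated.
    void close() {
        if (closeReleases_ && inUse_ > 0) --inUse_;
    }

    int inUse() const { return inUse_; }

private:
    int capacity_;
    bool closeReleases_;
    int inUse_ = 0;
};

// Runs up to maxIterations open/close cycles and returns the number of
// cycles that completed before the first failed open.
int countSuccessfulCycles(const std::function<bool()>& open,
                          const std::function<void()>& close,
                          int maxIterations) {
    assert(maxIterations >= 0);
    for (int i = 0; i < maxIterations; ++i) {
        if (!open()) return i;   // open failed: report how far we got
        close();
    }
    return maxIterations;
}
```

With an assumed capacity of 8, a MockDevice whose close does not release stops after 8 cycles, while one whose close releases completes all 60. The same harness could wrap dat_ia_open()/dat_ia_close() or ibv_open_device()/ibv_close_device() in the two callables to compare both APIs on real hardware.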

Comment 2 Honggang LI 2020-02-06 04:39:28 UTC
IBM asks for fixes for rhel-7.7 and rhel-7.6. This bug was opened for rhel-7.6.z.

Comment 3 Michal Schmidt 2020-02-12 08:30:26 UTC
To fix this issue in 7.6.z we must follow the z-stream process from bug 1784193.
This BZ is an improper clone. Closing.

