Bug 1353076 - [RFE] adapt to modules changes in upstream kernel 4.7
Summary: [RFE] adapt to modules changes in upstream kernel 4.7
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: rdma
Version: 7.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jarod Wilson
QA Contact: zguo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-07-06 02:46 UTC by JianHong Yin
Modified: 2021-03-11 14:36 UTC (History)

Fixed In Version: rdma-7.3_4.7_rc2-4.el7.noarch
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-09 12:53:31 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Article) | 3353581 | 0 | None | None | None | 2018-02-14 16:18:48 UTC

Description JianHong Yin 2016-07-06 02:46:59 UTC
Description of problem:
When we run early rdma tests on the upstream kernel, "service rdma start" fails.

Version-Release number of selected component (if applicable):
all

How reproducible:
always

Steps to Reproduce:
1. update kernel to upstream 4.7
2. restart rdma service

Actual results:
[root@hp-dl380pgen8-01 ~]# rpm -qf /usr/libexec/rdma-init-kernel^C
[root@hp-dl380pgen8-01 ~]# uname -r
4.7.0-rc6
[root@hp-dl380pgen8-01 ~]# LANG=C service rdma start 
Redirecting to /bin/systemctl start  rdma.service
Job for rdma.service failed because the control process exited with error code. See "systemctl status rdma.service" and "journalctl -xe" for details.
[root@hp-dl380pgen8-01 ~]# LANG=C service rdma status
Redirecting to /bin/systemctl status  rdma.service
* rdma.service - Initialize the iWARP/InfiniBand/RDMA stack in the kernel
   Loaded: loaded (/usr/lib/systemd/system/rdma.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2016-07-05 22:46:05 EDT; 6s ago
     Docs: file:/etc/rdma/rdma.conf
  Process: 33684 ExecStart=/usr/libexec/rdma-init-kernel (code=exited, status=3)
 Main PID: 33684 (code=exited, status=3)

Jul 05 22:46:05 hp-dl380pgen8-01.khw.lab.eng.bos.redhat.com rdma-init-kernel[33684]: modprobe: FATAL: Module ib_mad not found.
Jul 05 22:46:05 hp-dl380pgen8-01.khw.lab.eng.bos.redhat.com rdma-init-kernel[33684]: Failed to load module ib_mad
Jul 05 22:46:05 hp-dl380pgen8-01.khw.lab.eng.bos.redhat.com rdma-init-kernel[33684]: modprobe: FATAL: Module ib_sa not found.
Jul 05 22:46:05 hp-dl380pgen8-01.khw.lab.eng.bos.redhat.com rdma-init-kernel[33684]: Failed to load module ib_sa
Jul 05 22:46:05 hp-dl380pgen8-01.khw.lab.eng.bos.redhat.com rdma-init-kernel[33684]: modprobe: FATAL: Module ib_addr not found.
Jul 05 22:46:05 hp-dl380pgen8-01.khw.lab.eng.bos.redhat.com rdma-init-kernel[33684]: Failed to load module ib_addr
Jul 05 22:46:05 hp-dl380pgen8-01.khw.lab.eng.bos.redhat.com systemd[1]: rdma.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
Jul 05 22:46:05 hp-dl380pgen8-01.khw.lab.eng.bos.redhat.com systemd[1]: Failed to start Initialize the iWARP/InfiniBand/RDMA stack in the kernel.
Jul 05 22:46:05 hp-dl380pgen8-01.khw.lab.eng.bos.redhat.com systemd[1]: Unit rdma.service entered failed state.
Jul 05 22:46:05 hp-dl380pgen8-01.khw.lab.eng.bos.redhat.com systemd[1]: rdma.service failed.

Expected results:
The rdma service starts successfully.

Additional info:

Comment 1 JianHong Yin 2016-07-06 02:53:32 UTC
see also https://bugzilla.kernel.org/show_bug.cgi?id=121501

Comment 2 JianHong Yin 2016-07-06 03:05:34 UTC
this patch works fine for me

--- /usr/libexec/rdma-init-kernel.orig	2015-06-24 05:11:00.000000000 -0400
+++ /usr/libexec/rdma-init-kernel	2016-07-05 23:01:23.089611452 -0400
@@ -14,7 +14,8 @@ MTRR_SCRIPT=/usr/libexec/rdma-fixup-mtrr
 LOAD_ULP_MODULES=""
 LOAD_CORE_USER_MODULES="ib_umad ib_uverbs ib_ucm rdma_ucm"
 LOAD_CORE_CM_MODULES="iw_cm ib_cm rdma_cm"
-LOAD_CORE_MODULES="ib_core ib_mad ib_sa ib_addr"
+LOAD_CORE_MODULES="ib_core"
+LOAD_CORE_MODULES_OPT="ib_mad ib_sa ib_addr"
 
 if [ -f $CONFIG ]; then
     . $CONFIG
@@ -233,6 +234,7 @@ load_hardware_modules
 RC=$[ $RC + $? ]
 load_modules $LOAD_CORE_MODULES
 RC=$[ $RC + $? ]
+load_modules $LOAD_CORE_MODULES_OPT
 load_modules $LOAD_CORE_CM_MODULES
 RC=$[ $RC + $? ]
 load_modules $LOAD_CORE_USER_MODULES
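
The split between required and optional modules in the patch above can also be combined with an explicit presence check, so that modules which no longer exist in a given kernel are skipped rather than attempted. The following is only a minimal sketch of that idea, not the change that shipped in the rdma package: filter_present_modules and MODULE_CHECK are hypothetical names, and it assumes that modinfo exits non-zero for a module it cannot find for the running kernel.

```shell
#!/bin/sh
# Sketch only: filter_present_modules and MODULE_CHECK are hypothetical
# names used for illustration; they are not part of rdma-init-kernel.

# Command used to test whether a module exists for the running kernel.
# Assumption: modinfo exits non-zero when it cannot find the module.
MODULE_CHECK=${MODULE_CHECK:-modinfo}

# Print, space-separated, only those of the given modules that
# $MODULE_CHECK reports as present.
filter_present_modules() {
    present=""
    for mod in "$@"; do
        if $MODULE_CHECK "$mod" >/dev/null 2>&1; then
            present="$present $mod"
        fi
    done
    # Drop the leading space left by the concatenation above
    echo "${present# }"
}

# Same split as in the patch: ib_core is required, the rest are optional.
LOAD_CORE_MODULES="ib_core"
LOAD_CORE_MODULES_OPT="ib_mad ib_sa ib_addr"

# Required modules are always listed; optional ones only when present.
TO_LOAD="$LOAD_CORE_MODULES $(filter_present_modules $LOAD_CORE_MODULES_OPT)"
echo "would load: $TO_LOAD"
```

The sketch only prints the resulting list; in the init script the list would be handed to load_modules instead. Keeping the check command in a variable makes it possible to substitute a different test (or a stub) without touching the loop.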

Comment 4 Doug Ledford 2016-07-06 13:24:35 UTC
I made changes to the rawhide RDMA package to resolve this issue.

Comment 5 Jarod Wilson 2016-07-07 18:18:50 UTC
(In reply to Doug Ledford from comment #4)
> I made changes to the rawhide RDMA package to resolve this issue.

Which change is that? We already have Yaakov's 'check module presence' patch in the latest RHEL7 rdma package, which so far as I can tell, should have kicked in to prevent this particular bug, unless testing was done with an older rdma package...

jiyin:

$ rpm -q rdma?

Please use at least rdma-7.3_4.7_rc2-4, which is the current latest package version.

Comment 6 JianHong Yin 2016-07-08 01:28:44 UTC
(In reply to Jarod Wilson from comment #5)
> (In reply to Doug Ledford from comment #4)
> > I made changes to the rawhide RDMA package to resolve this issue.
> 
> Which change is that?

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/infiniband/core/Makefile?id=e3f20f02864f6da1509c523bfa1e928619e59095

> We already have Yaakov's 'check module presence' patch
> in the latest RHEL7 rdma package, which so far as I can tell, should have
> kicked in to prevent this particular bug, unless testing was done with an
> older rdma package...
> 
> jiyin:
> 
> $ rpm -q rdma?
> 
> Please use at least rdma-7.3_4.7_rc2-4, which is the current latest package
> version.

Comment 7 Don Dutile (Red Hat) 2016-07-08 02:33:38 UTC
(In reply to Yin.JianHong from comment #6)
> (In reply to Jarod Wilson from comment #5)
> > (In reply to Doug Ledford from comment #4)
> > > I made changes to the rawhide RDMA package to resolve this issue.
> > 
> > Which change is that?
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
> drivers/infiniband/core/Makefile?id=e3f20f02864f6da1509c523bfa1e928619e59095
> 

This is a kernel change, not RDMA pkg change.

It resolves the issue by avoiding the race that appears to occur when multiple module loads are done, by lumping two modules into one.

*but*, this bz is un-resolvable, as the pkg's module load specification is based on knowing what the .ko's are for a given kernel.  As this patch shows, this will break the RDMA pkg again, where a module it tries to load does not exist.  what needs to occur is that a given kernel version's .spec file will have to add a Requires: > rdma.<version>, so RDMA pkg is kept in sync with a kernel when its modules content changes.

Comment 8 JianHong Yin 2016-07-08 02:55:12 UTC
(In reply to Don Dutile from comment #7)
> (In reply to Yin.JianHong from comment #6)
> > (In reply to Jarod Wilson from comment #5)
> > > (In reply to Doug Ledford from comment #4)
> > > > I made changes to the rawhide RDMA package to resolve this issue.
> > > 
> > > Which change is that?
> > 
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
> > drivers/infiniband/core/Makefile?id=e3f20f02864f6da1509c523bfa1e928619e59095
> > 
> 
> This is a kernel change, not RDMA pkg change.
> 
> It resolves the issue by avoiding the race that appears to occur when
> multiple module loads are done, by lumping two modules into one.
> 
> *but*, this bz is un-resolvable, as the pkg's module load specification is
> based on knowing what the .ko's are for a given kernel.  As this patch
> shows, this will break the RDMA pkg again, where a module it tries to load
> does not exist.  what needs to occur is that a given kernel version's .spec
> file will have to add a Requires: > rdma.<version>, so RDMA pkg is kept in
> sync with a kernel when its modules content changes.

I know.

We are just hoping there is a smart way to achieve forward compatibility,
  so that rdma still works when a user/customer upgrades or downgrades to a newer or older kernel.

Comment 9 Doug Ledford 2016-07-08 03:33:16 UTC
(In reply to Jarod Wilson from comment #5)
> (In reply to Doug Ledford from comment #4)
> > I made changes to the rawhide RDMA package to resolve this issue.
> 
> Which change is that?

I pulled in everything of value from rhel7.3 to fedora rawhide.  I think the only difference between what I did in Fedora and what was in rhel7.3 is that I made sure the mlx4 modprobe file would not fail if either mlx4_en or mlx4_ib was missing, whereas Yaakov's original patch only worked around the currently missing mlx4_ib.

> We already have Yaakov's 'check module presence' patch
> in the latest RHEL7 rdma package, which so far as I can tell, should have
> kicked in to prevent this particular bug, unless testing was done with an
> older rdma package...
> 
> jiyin:
> 
> $ rpm -q rdma?
> 
> Please use at least rdma-7.3_4.7_rc2-4, which is the current latest package
> version.

Comment 10 Jarod Wilson 2016-07-08 16:05:23 UTC
(In reply to Jarod Wilson from comment #5)
...
> jiyin:
> 
> $ rpm -q rdma?
> 
> Please use at least rdma-7.3_4.7_rc2-4, which is the current latest package
> version.

Still looking for this info.

Comment 11 JianHong Yin 2016-07-09 12:53:31 UTC
(In reply to Jarod Wilson from comment #10)
> (In reply to Jarod Wilson from comment #5)
> ...
> > jiyin:
> > 
> > $ rpm -q rdma?
> > 
> > Please use at least rdma-7.3_4.7_rc2-4, which is the current latest package
> > version.
> 
> Still looking for this info.

Yes, the latest version (rdma-7.3_4.7_rc2-4.el7.noarch) works fine.

Comment 12 pillala 2017-02-13 14:51:55 UTC
We have deployed with the Patch in our Environment. We are not seeing the KERNEL Tainting Issues anymore on the RHEL 7.3

Comment 13 JianHong Yin 2017-02-13 15:08:15 UTC
(In reply to pillala from comment #12)
> We have deployed with the Patch in our Environment. We are not seeing the
> KERNEL Tainting Issues anymore on the RHEL 7.3

Got it. Thank you for the report :)

