Red Hat Bugzilla – Bug 613437
RDMA/OpenIB shutdown hangs if using NFSoRDMA
Last modified: 2011-05-02 12:49:47 EDT
Created attachment 431006
Patch to shut down and unload NFSoRDMA modules during RDMA stop
Description of problem:
When NFSoRDMA is in use, a reboot hangs while trying to stop the rdma service. The cause is that the NFSoRDMA modules are not unloaded before the init script tries to unload the rest of the RDMA stack. It doesn't matter whether NFSoRDMA is used as a client or as a server.
Version-Release number of selected component (if applicable):
The attached patch fixes this issue for me during shutdown.
Well, it is only a partial fix -- I thought I had tested the patch with clients mounted, but it appears I had not. With clients mounted I still get the hangs, because svcrdma/xprtrdma do not unload.
So, more work to be done here.
I'm creating a new service, nfs-rdma, that enables and disables rdma support.
It is intended to be started after nfs and stopped before nfs. The rdma init
script now checks that nfs-rdma support is disabled before stopping the rdma
service. If the nfs-rdma service fails to stop (because things are in use),
you can stop the nfs service first and then stop the nfs-rdma service. That is
the reverse of the intended order, but it should work, as stopping the nfs
service will free up the kernel modules to be unloaded. Will be present in
rdma-1.0-8 or later.
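For illustration only, here is a rough sketch of the kind of check described above, written in Python rather than as the actual init script. It assumes that "nfs-rdma support is disabled" can be approximated by neither svcrdma nor xprtrdma appearing in /proc/modules; the function names are hypothetical and this is not the code that ships in the rdma package.

```python
# Hypothetical sketch: refuse to stop the rdma stack while NFSoRDMA
# transport modules are still loaded. Not the real init script logic.

NFS_RDMA_MODULES = ("svcrdma", "xprtrdma")


def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first field of each /proc/modules line is the module name.
            names.add(fields[0])
    return names


def can_stop_rdma(proc_modules_text):
    """True only if no NFSoRDMA transport module is currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return not any(name in loaded for name in NFS_RDMA_MODULES)
```

On a live system the text would come from reading /proc/modules, and a False result would mean the nfs-rdma service (or nfs itself) still has to be stopped first.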
That will let me remove my modifications to get NFSoRDMA starting, but I don't think it is going to help when stopping the rdma service -- I've looked around a bit, but I don't see a way to forcibly disconnect clients from the NFS server, which keeps the use count non-zero on the nfsd and svcrdma modules, preventing their removal. There may be a way; I just haven't found it yet -- or it may be a different issue than I think.
I've also seen a similar problem with the NFS client, as even after unmounting all of the NFS filesystems, I still cannot rmmod xprtrdma.
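To make the use-count problem easier to see, something like the following could report which of the two modules are still held. This is a minimal sketch in Python, assuming the usual /proc/modules layout (name, size, use count, dependents, state, address); the helper names are made up for this example.

```python
# Hypothetical sketch: list NFSoRDMA modules whose use count is non-zero,
# based on /proc/modules-style text (name size refcount deps state addr).


def module_use_counts(proc_modules_text):
    """Map module name -> use count for every parsable line."""
    counts = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # Skip short lines and kernels that print "-" instead of a count.
        if len(fields) >= 3 and fields[2].isdigit():
            counts[fields[0]] = int(fields[2])
    return counts


def busy_modules(proc_modules_text, names=("svcrdma", "xprtrdma")):
    """Return the named modules that are loaded with a non-zero use count."""
    counts = module_use_counts(proc_modules_text)
    return [name for name in names if counts.get(name, 0) > 0]
```

A module that shows up in the result cannot be removed with rmmod until whatever holds the reference lets go, which matches what I am seeing with xprtrdma after unmounting.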
I'm starting to wonder if the best workaround at the moment is to see if the scripts can detect we're going to runlevel 0, 1, or 6 and just not try to remove the modules.
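As a sketch of what that detection could look like (in Python for illustration, not the real init script), the decision could be driven by the output of the `runlevel` command, which prints the previous and current runlevel separated by a space. The function names are hypothetical, and treating an unparsable runlevel as "go ahead and unload" is an assumption on my part.

```python
# Hypothetical sketch: skip module removal when heading to halt (0),
# single-user (1), or reboot (6). Input is the text printed by `runlevel`,
# e.g. "3 6" or "N 5".

SKIP_RUNLEVELS = {"0", "1", "6"}


def current_runlevel(runlevel_output):
    """Return the current runlevel field, or None if the text is unexpected."""
    fields = runlevel_output.split()
    return fields[1] if len(fields) == 2 else None


def should_unload_modules(runlevel_output):
    """False when the target runlevel is 0, 1, or 6; True otherwise."""
    return current_runlevel(runlevel_output) not in SKIP_RUNLEVELS
```

With this, a plain "service rdma stop" at runlevel 3 or 5 would still try to unload everything, while a shutdown or reboot would leave the modules alone and avoid the hang.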
Fixed long ago but bug was not autoclosed.