Bug 490571 - modprobe -r (or rmmod) cannot unload the tg3 module after the module has been loaded and unloaded many times.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Prarit Bhargava
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-03-17 02:27 UTC by Zhenyong(Jerry) Jiang
Modified: 2013-07-29 00:44 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-04-02 15:37:20 UTC
Target Upstream Version:
Embargoed:



Description Zhenyong(Jerry) Jiang 2009-03-17 02:27:00 UTC
Description of problem:

OS: RHEL5.3 i386

If I run "modprobe tg3;rmmod tg3" or "modprobe tg3;modprobe -r tg3" repeatedly for several minutes, then sometimes, after "modprobe -r tg3" or "rmmod tg3" has run, strace shows the "delete_module" call returning success while the tg3 module is still loaded.

munmap(0xb7f4a000, 4096)                = 0
write(2, "rmmod tg3, wait=no\n", 19)    = 19
delete_module("tg3", O_RDONLY|O_EXCL|O_NONBLOCK) = 0
exit_group(0)     

This issue can be worked around by inserting a delay (e.g. 10 seconds) between loading and unloading (for example "modprobe tg3; sleep 10; rmmod tg3").

The customer can reproduce this on HP/DELL servers that ship with a BMC NIC controller, and I can reproduce it on my DELL workstation, which has a BCM5751 controller.

The customer has tried the latest version of the tg3 module downloaded from the Broadcom website, and the issue persists.

How reproducible:

Often.

Running the attached scripts helps to reproduce this issue.

Steps to Reproduce:

Run the attached scripts.

Actual results:

The tg3 module is still loaded after rmmod has been executed.

Expected results:

The tg3 module should be removed after rmmod has been executed.
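
The attached scripts are not reproduced in this report. A minimal sketch of what such a reproduction loop might look like is given here, assuming the failure can be detected by checking /proc/modules after rmmod returns success. The names module_loaded, repro_loop and MODULES_FILE are hypothetical and are not taken from the attachment.

```shell
#!/bin/sh
# Hypothetical reproduction loop; NOT the script attached to this bug.
# MODULES_FILE is overridable so the check can be exercised without root.
MODULES_FILE=${MODULES_FILE:-/proc/modules}

# Return 0 if the named module is listed in $MODULES_FILE.
module_loaded() {
    # The first whitespace-separated field of each line is the module name.
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "$MODULES_FILE"
}

# Load and unload module $1 up to $2 times (default 1000). Stop at the first
# iteration where rmmod reports success but the module is still listed.
repro_loop() {
    mod=$1
    max=${2:-1000}
    i=1
    while [ "$i" -le "$max" ]; do
        modprobe "$mod" || return 2
        if rmmod "$mod" && module_loaded "$mod"; then
            echo "iteration $i: rmmod returned 0 but $mod is still loaded"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max iterations"
    return 0
}

# Example (requires root): repro_loop tg3 1000
```

The loop only defines functions; nothing is loaded or unloaded until repro_loop is called explicitly as root.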

Comment 7 Prarit Bhargava 2009-03-26 13:10:55 UTC
Hmm.  I can't seem to reproduce this on any system.  Jerry, could you run a sosreport on your Dell box?  I'll see if there is one in RHTS.

P.

Comment 8 Prarit Bhargava 2009-03-26 22:14:30 UTC
Jerry, I was finally able to reproduce this on a system in RHTS.

AFAICT, this only happens if the network interface is active when the module unload occurs.  If I do a 

service network stop

followed by the test, I never see a problem.

P.

Comment 9 Prarit Bhargava 2009-03-30 10:52:27 UTC
What I think is going on (but having a fun time trying to prove it :) ):

This doesn't happen if the network service is disabled.  I can run a test script which repeatedly runs modprobe and 'modprobe -r' on the tg3 module over an entire weekend and I do not see this issue.

However, if the network service is enabled, I see this usually within about 5-10 minutes.

When issuing the modprobe -r, the network service does an "ifdown" on the interface.

When issuing the modprobe, the network service does an "ifup" on the interface.

These events are unsynchronized, so when doing many module adds and removes it is possible to end up in this situation:

ifup ethX (immediately followed by)
rmmod tg3 (which ethX is using)

But ... proving this is a bit harder than I anticipated.  Adding systemtap scripts or debug output to the module remove code seems to perturb the system just enough that the problem doesn't occur.

I'm thinking about this one ... I'm beginning to wonder if the test is valid?

All I am doing is adding and removing the module repeatedly.  The real "bug" here is that the network service is unsynchronized with the module add and remove.

gospo -- any thoughts?

P.

Comment 11 Prarit Bhargava 2009-04-02 15:37:20 UTC
It turns out that this issue is a known issue within the module add and remove
code within the kernel.  The network service angle that I pointed out a few
days ago seems to be entirely coincidental.

When loading multiple modules there exists a small window where the first
module hasn't completed linking and the second module starts loading, or the
first module is unloading, and the second starts unloading.

This overlapping behavior has been resolved upstream and will be fixed in
RHEL6.
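
As a userspace illustration of the window described above, the sketch below reads the state field that /proc/modules reports for each module (Live, Loading or Unloading) and waits for "Live" before an unload is attempted. This is not the upstream fix, and it is only an assumption that such a check would narrow the window; the names module_state, wait_until_live and MODULES_FILE are hypothetical.

```shell
#!/bin/sh
# Hypothetical mitigation sketch; the real fix is in the kernel module code.
# MODULES_FILE is overridable so the parsing can be exercised without root.
MODULES_FILE=${MODULES_FILE:-/proc/modules}

# Print the state field (Live, Loading or Unloading) for module $1,
# or nothing if the module is not listed.
module_state() {
    # Fields: name size refcount dependents state address
    awk -v m="$1" '$1 == m { print $5 }' "$MODULES_FILE"
}

# Poll once per second until module $1 reports "Live".
# Return 1 if that has not happened after $2 seconds (default 10).
wait_until_live() {
    mod=$1
    timeout=${2:-10}
    waited=0
    while [ "$(module_state "$mod")" != "Live" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example (requires root):
#   modprobe tg3 && wait_until_live tg3 10 && rmmod tg3
```

The fixed delay suggested in the description ("sleep 10") remains the only workaround confirmed in this report.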

