Description of problem:
OS: RHEL5.3 i386
If I run "modprobe tg3; rmmod tg3" or "modprobe tg3; modprobe -r tg3" repeatedly for several minutes, then sometimes, after "modprobe -r tg3" or "rmmod tg3" has run, strace shows the delete_module call returning success but the tg3 module is still present:

    munmap(0xb7f4a000, 4096) = 0
    write(2, "rmmod tg3, wait=no\n", 19) = 19
    delete_module("tg3", O_RDONLY|O_EXCL|O_NONBLOCK) = 0
    exit_group(0)

The issue can be worked around by putting an interval (e.g. 10s) between loading and unloading, for example "modprobe tg3; sleep 10; rmmod tg3".

The customer can reproduce this on HP/DELL servers that ship with a BMC NIC controller, and I can reproduce it on my DELL workstation, which has a BCM5751 controller. The customer also tried the latest version of the tg3 module downloaded from the Broadcom website, and the issue persists.

How reproducible: Often.

Steps to Reproduce: Running the attached scripts can help to reproduce this issue.

Actual results: The tg3 module is still present after rmmod is executed.

Expected results: The tg3 module should be removed after rmmod is executed.
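For readers without access to the attachment, the load/unload loop described above can be sketched roughly as follows. This is not the attached script, only an illustration. It assumes a Python 3 interpreter on the test box, root privileges, and that "still present" means the module name still appears in /proc/modules. The names module_listed, read_proc_modules and reproduce are made up for this sketch.

```python
import subprocess
import time


def module_listed(name, modules_text):
    """Return True if `name` is the first field of any /proc/modules line."""
    return any(line.split()[:1] == [name] for line in modules_text.splitlines())


def read_proc_modules():
    """Read the current module list from the kernel."""
    with open("/proc/modules") as f:
        return f.read()


def reproduce(name="tg3", max_cycles=1000, delay=0.0,
              run=subprocess.call, read_modules=read_proc_modules):
    """Load and unload `name` repeatedly.

    Returns the 1-based cycle at which the module was still listed after
    rmmod exited with status 0, or None if that never happened.
    `run` and `read_modules` are injectable so the loop can be exercised
    without touching real modules.
    """
    for cycle in range(1, max_cycles + 1):
        if run(["modprobe", name]) != 0:
            raise RuntimeError("modprobe failed on cycle %d" % cycle)
        # delay=10 corresponds to the "sleep 10" workaround in the report.
        time.sleep(delay)
        if run(["rmmod", name]) != 0:
            raise RuntimeError("rmmod failed on cycle %d" % cycle)
        if module_listed(name, read_modules()):
            return cycle
    return None
```

Calling reproduce() as root runs up to 1000 cycles and returns the first cycle where tg3 survived a successful rmmod. Calling reproduce(delay=10) exercises the workaround instead.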
Hmm. I can't seem to reproduce this on any system. Jerry, could you run a sosreport on your Dell box? I'll see if there is one in RHTS. P.
Jerry, I was finally able to reproduce this on a system in RHTS. AFAICT, this only happens if the network interface is active when the module unload occurs. If I do a service network stop followed by the test, I never see a problem. P.
What I think is going on (but having a fun time trying to prove it :) ): This doesn't happen if the network service is disabled. I can run a test script that repeatedly does "modprobe tg3" and "modprobe -r tg3" over an entire weekend and not see this issue. However, if the network service is enabled, I usually see it within about 5-10 minutes. When issuing the modprobe -r, the network service does an "ifdown" on the interface. When issuing the modprobe, the network service does an "ifup" on the interface. These events are unsynchronized, so when doing many module adds and removes it is possible to end up in this situation: "ifup ethX" is immediately followed by "rmmod tg3", while ethX is still using tg3. But ... proving this is a bit harder than I anticipated. Adding systemtap scripts or debug output to the module remove code seems to perturb the system just enough that the problem doesn't occur. I'm thinking about this one ... I'm beginning to wonder if the test is valid? All I am doing is adding and removing the module repeatedly. The real "bug" here is that the network service is unsynchronized with the module add and remove. gospo, any thoughts? P.
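Since instrumenting the unload path hides the problem, a less intrusive check is to read the leftover module's /proc/modules entry after the fact and look at its use count and state. The sketch below assumes the usual 2.6-era column layout (name, size, use count, comma-separated users or "-", state, address) and a Python 3 interpreter. ModuleEntry, parse_modules and describe_leftover are names invented for this illustration, and the sample values in the usage note are made up.

```python
from collections import namedtuple

ModuleEntry = namedtuple("ModuleEntry", "name size refcount users state")


def parse_modules(modules_text):
    """Parse /proc/modules text into a dict of name -> ModuleEntry."""
    entries = {}
    for line in modules_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or unexpected lines
        name, size, refcount, users, state = fields[:5]
        # The use count is "-" on kernels built without module unloading.
        refcount = None if refcount == "-" else int(refcount)
        # Users are printed as "a,b," with a trailing comma, or "-" if none.
        users = [] if users == "-" else [u for u in users.split(",") if u]
        entries[name] = ModuleEntry(name, int(size), refcount, users, state)
    return entries


def describe_leftover(name, modules_text):
    """Summarize what the kernel reports for `name`, if it is still listed."""
    entry = parse_modules(modules_text).get(name)
    if entry is None:
        return "%s: not listed" % name
    refcount = "-" if entry.refcount is None else entry.refcount
    used_by = ",".join(entry.users) or "-"
    return "%s: state=%s refcount=%s used_by=%s" % (
        name, entry.state, refcount, used_by)
```

For example, describe_leftover("tg3", open("/proc/modules").read()) run right after a failed removal would show whether the entry is reported as Live, Loading or Unloading, and whether anything still holds a reference to it.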
It turns out that this is a known issue in the kernel's module add and remove code. The network service angle that I pointed out a few days ago seems to be entirely coincidental. When loading multiple modules, there is a small window in which the first module hasn't finished linking when the second module starts loading, or in which the first module is still unloading when the second starts unloading. This overlapping behavior has been resolved upstream and will be fixed in RHEL6.