Red Hat Bugzilla – Bug 22583
Default kernels cannot be used in production environments
Last modified: 2005-10-31 17:00:50 EST
When running unattended, the default Redhat kernels, which all use loadable
modules, cannot be used.
The problem occurred setting up a new ISP's Linux servers in their
Symptom: after a short while, connections cannot be established to the
Prerequisites: no traffic at all .. this is a NEW network.
Cause: inactivity caused the kernel modules for the network drivers to be
released, effectively removing all network support.
Workaround: run ping at one packet every few minutes to prevent modules
Interim solution: build kernels without module support.
Long-term solution: modules must be able to mark themselves unavailable
for removal. Critical modules, such as network support, must make use of
Hardware in this case: Compaq Proliant (rackmount) servers
Problem occurred on both the built-in (Intel) and add-in (3c905) network
Remove /etc/cron.d/kmod, and your problem should go away.
You're right, it's not a bug. It's a design oversight.
Consider: The intention of loadable modules is to support infrequently used
devices. While the Proliant does not have PCMCIA cards, I could easily purchase
a controller to add them to it. In this case, I would find my kernel growing
as devices are added, but the modules would never be freed since you would have
me remove the job which does that.
The design oversight is that some modules (either always or, better still,
under the control of the administrator building the kernel) must NEVER be
removed. ISTM when building the kernel, you should (don't remember if you
currently can) be able to select certain features which are NOT to be built as
In the default install case, however, using the Redhat prebuilt kernels,
everything is a module (and for good reason). What happens is the system
mysteriously ceases to function. If, for instance, there was an option (enabled
by default) which told the module-removal job to NOT remove keyboard, display,
floppy, CDROM, hard drive, and network controller modules, your system would
appear more stable right out of the box.
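
To make the proposal above concrete, here is a rough sketch of a selective cleanup filter. It is only an illustration, not what /etc/cron.d/kmod actually does: the function name removable_modules and the keep list are hypothetical, and it assumes lsmod prints a header line followed by "name size use-count" columns, with "(autoclean)" shown for auto-loaded modules.

```shell
#!/bin/sh
# Hypothetical sketch: list the modules a cleanup job could safely remove.
# Assumes lsmod-style text on stdin: one header line, then
# "name size use-count [flags]" with "(autoclean)" marking auto-loaded modules.
# $1 = space-separated list of module names that must never be removed.
removable_modules() {
    awk -v keep="$1" '
        BEGIN {
            # Build a lookup table of protected module names.
            n = split(keep, k, " ")
            for (i = 1; i <= n; i++) protect[k[i]] = 1
        }
        # Skip the header; print unused autoclean modules not on the keep list.
        NR > 1 && $3 == 0 && /\(autoclean\)/ && !($1 in protect) { print $1 }
    '
}

# Possible use from a cron job (module names are examples only):
#   /sbin/lsmod | removable_modules "3c59x eepro100" |
#       while read m; do /sbin/rmmod "$m"; done
```

With a filter like this, the job would still free rarely used modules while leaving the listed network drivers alone.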
As things now stand, things are good for me and bad for my customer. Bad for
my customer because they wasted several weeks trying to determine why their
servers were not working reliably. Good for me because I get a nice, hefty
So, if you really think Redhat Linux should frustrate new users, and increase
revenues for consultants, go ahead and re-close this bug.
But if, like me, you think Linux should be reliable and stable, even when used
by neophytes, please leave it open until you actually RESOLVE it.
I agree that there must be some magic for devices whose "use" can be externally
triggered (like NICs etc.).
But I think an ifconfig up'ed network device should always raise the use count
of the driver module. Stinks like a kernel/driver problem to me.
Disabling kmod is not actually a reasonable resolution to the problem.
I agree with droesen's comment.
I have commented out 'rmmod -as' in /etc/cron.d/kmod,
but that did not resolve my problem.
My Linux box still has one 'ping' process running.
Shouldn't the fact that the interfaces are 'up' keep the modules from being
removed? Isn't this the real bug here: these modules are 'in use' not because
there is traffic, but because the interfaces are 'up'. (The module use counters
come to mind here...).
Only modules that are marked as 'autoclean' are removed by rmmod -as. If you
perform a modprobe directly, this doesn't happen. Only modules that are
autoloaded by the kernel module loader are marked as 'autoclean'. The sound
initialization scripts do a modprobe to load the modules, and they are not
autoclean. If an ifup-eth script was added in /etc/sysconfig/network-scripts,
that checked $(/sbin/lsmod) for the module listed in /etc/modules.conf, and did
a modprobe if it wasn't in the kernel, that would solve the problem.
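
The check described above might look roughly like the following. This is a sketch under assumptions, not the actual Red Hat ifup script: the helper names module_for_iface, is_loaded and ensure_module are made up for illustration, and it assumes the driver is listed in /etc/modules.conf as a line of the form "alias eth0 3c59x".

```shell
#!/bin/sh
# Hypothetical helper: print the driver module aliased to an interface.
# $1 = interface name, $2 = path to a modules.conf-style file.
module_for_iface() {
    awk -v ifc="$1" '$1 == "alias" && $2 == ifc { print $3; exit }' "$2"
}

# Hypothetical helper: succeed if module $1 appears in the first column
# of lsmod-style output read from stdin (the header line is skipped).
is_loaded() {
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Load the driver with an explicit modprobe if it is not already loaded,
# so that it is not marked autoclean.
# $1 = interface name, $2 = optional config path (default /etc/modules.conf).
ensure_module() {
    mod=$(module_for_iface "$1" "${2:-/etc/modules.conf}")
    [ -n "$mod" ] || return 0          # no alias found, nothing to do
    /sbin/lsmod | is_loaded "$mod" || /sbin/modprobe "$mod"
}
```

An ifup-eth script would then call something like ensure_module eth0 before bringing the interface up.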
I have a very similar problem also using Redhat 7.0 with the 2.2.16-35 smp kernel and a 3c905 netcard. The network hangs after a varying time without
error messages (also very similar to bug 22717). The way to get it working again is to stop the network, unload the netcard module, load it again and
restart the network. I agree that the time until the network hangs seems to depend on how much you use the network. Usage seems to prolong the time
and vice versa. I don't think it has to do with auto-cleaned modules though, but rather a kernel/driver problem. I get the same problem when I load the
modules by hand without the "auto clean" flag set. Removing the kmod script does not change the situation either. Also, when the network is hanging,
lsmod still reports the netcard module as loaded and used.
The problem still needs a solution though, since the network always hangs after 1-3 days (depending on usage) of uptime on all my RedHat 7.0
machines. I would be grateful for hints on a solution. The ping trick (every 5 minutes) doesn't work for me.
This seems like a different problem. If the 3c59x module shows the same, please
open a separate bug.