Bug 22583 - Default kernels cannot be used in production environments
Summary: Default kernels cannot be used in production environments
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Michael K. Johnson
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks: 22717
Reported: 2000-12-20 15:41 UTC by Need Real Name
Modified: 2005-10-31 22:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-06-05 22:50:18 UTC
Embargoed:



Description Need Real Name 2000-12-20 15:41:08 UTC
When running unattended, the default Redhat kernels, which all use loadable
modules, cannot be used.

The problem occurred setting up a new ISP's Linux servers in their
co-location closet.

Symptom: after a short while, connections cannot be established to the
servers.

Prerequisites: no traffic at all .. this is a NEW network.

Cause: inactivity caused the kernel modules for the network drivers to be
released, effectively removing all network support.

Workaround: run ping at one packet every few minutes to prevent modules
being released.

Interim solution: build kernels without modules support.

Long-term solution: modules must be able to mark themselves un-available
for removal.  Critical modules, such as network support, must make use of
this feature.

Hardware in this case: Compaq Proliant (rackmount) servers

Problem occurred on both the built-in (Intel) and add-in (3c905) network
cards.

Comment 1 Bill Nottingham 2000-12-20 16:54:35 UTC
Remove /etc/cron.d/kmod, and your problem should go away.

Comment 2 Need Real Name 2000-12-21 00:02:50 UTC
You're right, it's not a bug.  It's a design oversight.

Consider: The intention of loadable modules is to support infrequently used
devices.  While the Proliant does not have PCMCIA cards, I could easily purchase
a controller to add them to it.  In this case, I would find my kernel growing 
as devices are added, but the modules would never be freed since you would have 
me remove the job which does that.

The design oversight is that some modules (either always or, better still, 
under the control of the administrator building the kernel) must NEVER be 
removed.  ISTM when building the kernel, you should (don't remember if you 
currently can) be able to select certain features which are NOT to be built as 
modules.

In the default install case, however, using the Redhat prebuilt kernels, 
everything is a module (and for good reason).  What happens is the system 
mysteriously ceases to function.  If, for instance, there was an option (enabled
by default) which told the module-removal job to NOT remove keyboard, display, 
floppy, CDROM, hard drive, and network controller modules, your system would 
appear more stable right out of the box.

As things now stand, things are good for me and bad for my customer.  Bad for 
my customer because they wasted several weeks trying to determine why their 
servers were not working reliably.  Good for me because I get a nice, hefty 
consulting fee.

So, if you really think Redhat Linux should frustrate new users, and increase
revenues for consultants, go ahead and re-close this bug.

But if, like me, you think Linux should be reliable and stable, even when used
by neophytes, please leave it open until you actually RESOLVE it.

Comment 3 Daniel Roesen 2000-12-21 13:50:47 UTC
I agree that there must be some magic for devices whose "use" can be externally
triggered (like NICs etc.).

But I think an ifconfig up'ed network device should always raise the use count
of the driver module. Stinks like a kernel/driver problem to me.

Disabling kmod is not actually a reasonable resolution to the problem.

Comment 4 Need Real Name 2001-01-03 14:12:59 UTC
I agree with droesen's comment.
I have commented out 'rmmod -as' in /etc/cron.d/kmod,
but that did not resolve my problem.
My Linux box still has one 'ping' process.


Comment 5 Andrew Bartlett 2001-01-04 07:58:02 UTC
Shouldn't the fact that the interfaces are 'up' keep the modules from being
removed?  Isn't this the real bug here: these modules are 'in use' not because
there is traffic, but because the interfaces are 'up'.  (The module use counters
come to mind here...).

Comment 6 Perry Harrington 2001-03-15 02:55:35 UTC
Only modules that are marked as 'autoclean' are removed by rmmod -as.  If you
perform a modprobe directly, this doesn't happen.  Only modules that are
autoloaded by the kernel module loader are marked as 'autoclean'.  The sound
initialization scripts do a modprobe to load the modules, and they are not
autoclean.  If an ifup-eth script was added in /etc/sysconfig/network-scripts,
that checked $(/sbin/lsmod) for the module listed in /etc/modules.conf, and did
a modprobe if it wasn't in the kernel, that would solve the problem.
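
A minimal sketch of the check described in this comment, assuming
/etc/modules.conf holds one "alias <interface> <driver>" line per NIC. The
function names (driver_for, is_loaded, ensure_driver) are hypothetical and are
not part of Red Hat's network-scripts; only lsmod and modprobe are existing
commands.

```shell
#!/bin/sh
# Hypothetical helpers for an ifup-eth style script (illustration only).

# Print the driver that modules.conf aliases to an interface,
# e.g. a line "alias eth0 3c59x" yields "3c59x" for eth0.
driver_for() {
    # $1 = interface name, $2 = path to modules.conf
    awk -v dev="$1" '$1 == "alias" && $2 == dev { print $3; exit }' "$2"
}

# Succeed if the module is named in the first column of lsmod-style
# output read from stdin (the header line is skipped).
is_loaded() {
    # $1 = module name
    awk -v mod="$1" 'NR > 1 && $1 == mod { found = 1 } END { exit !found }'
}

# Load the driver with an explicit modprobe when it is not in the kernel.
ensure_driver() {
    # $1 = interface name, $2 = modules.conf path (assumed default below)
    drv=$(driver_for "$1" "${2:-/etc/modules.conf}")
    [ -n "$drv" ] || return 0    # no alias for this interface: nothing to do
    if ! /sbin/lsmod | is_loaded "$drv"; then
        /sbin/modprobe "$drv"
    fi
}
```

Calling "ensure_driver eth0" before the interface is brought up would, under
these assumptions, load the driver by explicit modprobe rather than through
the kernel module loader.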

Comment 7 Need Real Name 2001-04-03 09:00:55 UTC
I have a very similar problem, also using Redhat 7.0, with the 2.2.16-35 smp
kernel and a 3c905 netcard. The network hangs after a varying time without
error messages (also very similar to bug 22717). The way to get it working
again is to stop the network, unload the netcard module, load it again, and
restart the network. I agree that the time until the network hangs seems to
depend on how much you use the network: usage seems to prolong the time, and
vice versa. I don't think it has to do with auto-cleaned modules, though, but
rather a kernel/driver problem. I get the same problem when I load the modules
by hand without the "auto clean" flag set. Removing the kmod script does not
change the situation either. Also, when the network is hanging, lsmod still
reports the netcard module as loaded and used.

The problem still needs a solution, though, since the network always hangs
after 1-3 days (depending on usage) of uptime on all my RedHat 7.0 machines.
I would be grateful for hints on a solution. The ping trick (every 5 minutes)
doesn't work for me.
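
The recovery sequence described above could be sketched roughly as follows.
The function name reload_nic and the DRY_RUN switch are hypothetical, the
init script path /etc/rc.d/init.d/network is an assumption about the install,
and the driver name has to be supplied by the caller (check lsmod for the
module actually in use).

```shell
#!/bin/sh
# Hypothetical helper: stop networking, reload the NIC driver, start again.
# With DRY_RUN set to a non-empty value the commands are printed, not run.
reload_nic() {
    # $1 = driver module name, as shown in the first column of lsmod
    run=${DRY_RUN:+echo}
    $run /etc/rc.d/init.d/network stop &&
    $run /sbin/rmmod "$1" &&
    $run /sbin/modprobe "$1" &&
    $run /etc/rc.d/init.d/network start
}
```

Running it as "DRY_RUN=1 reload_nic <module>" first shows the four commands
without touching the running system.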

Comment 8 Arjan van de Ven 2001-04-03 13:26:32 UTC
This seems like a different problem. If the 3c59x module shows the same, please
open a separate bug.

