Bug 147750 - Kernel panic - not syncing ... net
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 3
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Dave Jones
QA Contact: Brian Brock
Duplicates: 147867 147898
Reported: 2005-02-10 17:24 EST by Frode Tennebø
Modified: 2015-01-04 17:16 EST
CC List: 4 users

Doc Type: Bug Fix
Last Closed: 2005-07-25 20:09:17 EDT


Description Frode Tennebø 2005-02-10 17:24:41 EST
Description of problem:
It appears that there is a problem with the networking subsystem. After what may
well have been heavy network activity, I get:

"Kernel panic - not syncing: net/ipv4/tcp_timer.c:213: spin_lock (net/core/sock.c:C9E41060) already locked by net/ipv4/tcp_ipv4.c/1793"
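For readers less familiar with this message, here is a toy Python model of the bookkeeping behind it. As far as we understand it, the format matches the uniprocessor spinlock-debugging `spin_lock()` macro in 2.6-era `include/linux/spinlock.h`: it names the call site trying to take the lock, the lock itself (the file where it was initialised plus its address), and the file/line of the `spin_lock()` call that last took it. On a uniprocessor build no other CPU can hold the lock, so seeing it held means a code path took it again before the holder released it. The class and function names are hypothetical, this is not kernel code, and it only replays the two sites named in the report; it says nothing about the actual root cause.

```python
class AlreadyLocked(Exception):
    """Raised where the (hypothetical) debug check would report the message."""


class DebugSpinlock:
    """Hypothetical stand-in for a debug spinlock on a uniprocessor kernel."""

    def __init__(self, init_file: str, address: int) -> None:
        self.locked = False
        self.init_file = init_file  # file where the lock was initialised
        self.address = address      # stands in for the lock's kernel address
        self.owner_file = None      # call site of the last successful spin_lock()
        self.owner_line = None

    def spin_lock(self, file: str, line: int) -> None:
        if self.locked:
            # On a uniprocessor no other CPU can be holding the lock, so finding
            # it held means it was re-entered before the holder released it.
            raise AlreadyLocked(
                f"{file}:{line}: spin_lock({self.init_file}:{self.address:08x}) "
                f"already locked by {self.owner_file}/{self.owner_line}"
            )
        self.locked = True
        self.owner_file, self.owner_line = file, line

    def spin_unlock(self) -> None:
        self.locked = False


def replay_report() -> str:
    """Replay the two call sites named in the report against one lock."""
    lock = DebugSpinlock("net/core/sock.c", 0xC9E41060)
    lock.spin_lock("net/ipv4/tcp_ipv4.c", 1793)      # becomes the recorded owner
    try:
        # Second acquisition with no spin_unlock() in between.
        lock.spin_lock("net/ipv4/tcp_timer.c", 213)
    except AlreadyLocked as exc:
        return str(exc)
    return "no conflict"


if __name__ == "__main__":
    print(replay_report())
```

Running it prints the same three sites as the report, with the address in lower-case hex.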

Besides being my workstation, the machine is also an iptables firewall that
forwards traffic for several machines on my network.

Version-Release number of selected component (if applicable):
kernel-2.6.10-1.760_FC3

However, because the stock kernel lacks the EATA SCSI driver, I rebuilt it from
the .src package with EATA enabled and made a few other minor changes:

[ft@leia linux-2.6.10]$ diff .config configs/kernel-2.6.10-i686.config |grep ^\<
< # Linux kernel version: 2.6.10-prep
< # Sun Feb  6 01:25:19 2005
< # CONFIG_M686 is not set
< CONFIG_MPENTIUMII=y
< # CONFIG_PM is not set
< # CONFIG_ACPI is not set
< # CONFIG_CPU_FREQ is not set
< CONFIG_PCI_NAMES=y
< CONFIG_HOTPLUG_PCI=m
< # CONFIG_HOTPLUG_PCI_SHPC_PHPRM_LEGACY is not set
< CONFIG_SCSI_EATA=m
< CONFIG_SCSI_EATA_TAGGED_QUEUE=y
< CONFIG_SCSI_EATA_LINKED_COMMANDS=y
< CONFIG_SCSI_EATA_MAX_TAGS=16
[ft@leia linux-2.6.10]$         
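The `diff ... |grep ^\<` pipeline above lists only the lines of the custom `.config`, so it does not show what the stock config had for the same options (for example, what `CONFIG_M686` was before it became "not set"). A minimal Python sketch that reports both sides, assuming the usual `.config` syntax (`CONFIG_X=value` and `# CONFIG_X is not set`); the function names and the `<absent>` marker are hypothetical:

```python
import re
import sys

# "CONFIG_FOO=value" and "# CONFIG_FOO is not set"; other comments are ignored.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> dict:
    """Map each CONFIG_ option to its value; explicitly unset options map to None."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = None
    return options


def config_changes(custom: str, stock: str) -> dict:
    """Return {option: (stock_value, custom_value)} for every option that differs.

    Options missing from a file are reported as "<absent>" so they are not
    confused with options explicitly marked "is not set" (None).
    """
    mine, theirs = parse_config(custom), parse_config(stock)
    missing = "<absent>"
    changes = {}
    for key in sorted(mine.keys() | theirs.keys()):
        old, new = theirs.get(key, missing), mine.get(key, missing)
        if old != new:
            changes[key] = (old, new)
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    # Same argument order as the diff above: custom .config first, stock second.
    with open(sys.argv[1]) as f_custom, open(sys.argv[2]) as f_stock:
        for opt, (old, new) in config_changes(f_custom.read(), f_stock.read()).items():
            print(f"{opt}: {old} -> {new}")
```

Distinguishing "absent" from "is not set" matters because a stock config that never mentions an option and one that explicitly disables it are not the same change.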

How reproducible:
quite often

Steps to Reproduce:
1. It happens in about 25% of apt-get update/upgrade runs, and occasionally at
other times.
  
Actual results:
kernel panic

Expected results:
no kernel panic

Additional info:
Comment 1 Frode Tennebø 2005-02-12 08:06:33 EST
*** Bug 147898 has been marked as a duplicate of this bug. ***
Comment 2 Frode Tennebø 2005-02-12 08:11:19 EST
*** Bug 147867 has been marked as a duplicate of this bug. ***
Comment 3 Jacco Ligthart 2005-03-04 08:27:15 EST
Just a "me too".

I've got exactly the same error with an unmodified kernel. It only happens when
I am working interactively on the machine over ssh. This machine never has heavy
network load.

iptables is enabled with my personal ruleset.

Comment 4 Ian Neubert 2005-03-31 18:55:12 EST
Me too, but I am on RHEL4 and the line numbers are slightly different:

Kernel panic - not syncing: net/ipv4/tcp_timer.c:422: 
spin_lock(net/ipv4/tcp_minisocks.c:dc5dd540) already
locked by net/ipv4/tcp_ipv4.c/1790

I'm running kernel 2.6.9-5.0.3.EL.
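The two panic messages differ in more than line numbers. A small Python sketch that splits such a message into its three sites, assuming the layout `<file>:<line>: spin_lock(<lock file>:<address>) already locked by <owner file>/<owner line>` seen in both reports; `parse_report` and `SpinlockReport` are hypothetical names. Applied to the two reports, it shows the same caller file (`net/ipv4/tcp_timer.c`) and owner file (`net/ipv4/tcp_ipv4.c`), but a different lock file (`net/core/sock.c` versus `net/ipv4/tcp_minisocks.c`).

```python
import re
from typing import NamedTuple, Optional

# Assumed layout; whitespace (including line breaks from wrapped console
# output) is tolerated between the parts.
_PATTERN = re.compile(
    r"(?P<site_file>[\w./-]+):(?P<site_line>\d+):\s*"
    r"spin_lock\s*\((?P<lock_file>[\w./-]+):(?P<address>[0-9A-Fa-f]+)\)\s*"
    r"already\s+locked\s+by\s+(?P<owner_file>[\w./-]+)/(?P<owner_line>\d+)"
)


class SpinlockReport(NamedTuple):
    site: str   # call site that tried to take the lock
    lock: str   # "<file>:<address>" identifying the lock
    owner: str  # call site recorded as the current holder


def parse_report(text: str) -> Optional[SpinlockReport]:
    """Extract the three sites from a panic line, or return None if absent."""
    m = _PATTERN.search(text)
    if m is None:
        return None
    return SpinlockReport(
        site=f"{m['site_file']}:{m['site_line']}",
        lock=f"{m['lock_file']}:{m['address'].lower()}",
        owner=f"{m['owner_file']}:{m['owner_line']}",
    )


if __name__ == "__main__":
    fc3 = ("Kernel panic - not syncing: net/ipv4/tcp_timer.c:213: "
           "spin_lock (net/core/sock.c:C9E41060) already locked by "
           "net/ipv4/tcp_ipv4.c/1793")
    rhel4 = ("Kernel panic - not syncing: net/ipv4/tcp_timer.c:422:\n"
             "spin_lock(net/ipv4/tcp_minisocks.c:dc5dd540) already\n"
             "locked by net/ipv4/tcp_ipv4.c/1790")
    for report in (parse_report(fc3), parse_report(rhel4)):
        print(report)
```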
Comment 5 Ian Neubert 2005-03-31 19:05:23 EST
I should note that I am using iptables as well and passing packets to a
userspace program via QUEUE.
Comment 6 Dan Singletary 2005-05-04 01:32:20 EDT
I am running kernel 2.6.11-1.14_FC3.  I want to throw in another "me too" on
this one.  I have almost exactly the same kernel panic; for me it occurs once or
twice a day. The system receives hourly backups from several sources, so
network load is high at times.

This bug has been in the database for a few months now; any progress?
Comment 7 Dan Singletary 2005-05-04 01:34:33 EDT
It's me again. I also wanted to mention that I am ALSO using a userspace
queuing program that receives packets from iptables via the QUEUE target.
Comment 8 Frode Tennebø 2005-05-04 10:13:46 EDT
As the OP, let me say "me too" to the me-toos. I also use a userspace queuing 
program, dsl_qos_queue (which worked fine with the FC1 kernels; I assume that's 
the one you use as well, Dan? :-). After disabling it I no longer get this 
crash, but bandwidth is much worse. :/

I also switched to NTP with FC3. Could there be a deadlock due to NTP setting the 
clock at some critical point in the execution path, either inside the kernel or 
in the userspace program? I have been unable to test this part of my theory, 
since I need this particular system to be stable.
Comment 9 Dan Singletary 2005-05-04 13:24:14 EDT
In fact... not only am I using dsl_qos_queue... I wrote it =)

Is everyone here using a user-space QUEUE?  Perhaps there is a problem with the
2.6.10+ kernels related to user space queuing.

I didn't notice this problem until recently because a power outage forced me to
shut down and reboot, which loaded a newer kernel that yum had installed
earlier.  For the four months before that reboot I was running a 2.6.9
kernel.  I have since reverted to the 2.6.9 kernel, and it has been stable so
far.

Also, FWIW, I have been running NTP for a long time now, so there is no problem
with NTP and dsl_qos_queue, at least with 2.6.9 and earlier kernels.

How should we go about debugging this problem?

I think we should look at what changed between the .9 and the .10 kernels.
That might help narrow down the search, especially if we find changes to the
code where the panic is being reported.

-Dan
Comment 10 James Morris 2005-06-21 20:20:19 EDT
(In reply to comment #9)
> Also, FWIW, I have been running NTP for a long time now, so there is no problem
> with NTP and dsl_qos_queue, at least with 2.6.9 and earlier kernels.

Exactly which 2.6.9 kernel?  Is it an upstream one or a Red Hat one (if so, which?)
Comment 11 Dan Singletary 2005-06-22 20:52:31 EDT
This bug is still present on 2.6.11-1.27_FC3, which is the latest Red Hat Fedora
Core 3 kernel.

-Dan

(In reply to comment #10)
> (In reply to comment #9)
> > Also, FWIW, I have been running NTP for a long time now, so there is no problem
> > with NTP and dsl_qos_queue, at least with 2.6.9 and earlier kernels.
> 
> Exactly which 2.6.9 kernel?  Is it an upstream one or a Red Hat one (if so, which?)

Comment 12 Dave Jones 2005-07-15 15:24:10 EDT
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.
Comment 13 Dan Singletary 2005-07-15 17:40:10 EDT
I just upgraded to the 2.6.12-1.1372_FC3 kernel, and I turned the QUEUE-using
program back on.  Generally, the system will not remain stable for more than 24
hours if the bug is still present.  I will report back as soon as I have a
problem, or in a few days if trouble-free.

-Dan
Comment 14 Dan Singletary 2005-07-21 14:22:46 EDT
Still running 2.6.12-1.1372_FC3 here with the QUEUE program back in use, and the
system is stable once again.  It looks like whatever was done for this kernel
release fixed the problem I was having with the past few releases.  Previously,
running the QUEUE program would make the system panic within 24 hours; it's
been almost a week now and the system is still solid.

-Dan
Comment 15 Frode Tennebø 2005-07-26 12:29:22 EDT
I just want to say that I have also been running a QUEUE program for a few days 
and it seems to be stable. I'm running kernel-2.6.12-1.1398_FC4 (recompiled for 
my needs, though).
