Bug 245012 - system hangs without any obvious reasons
Summary: system hangs without any obvious reasons
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-06-20 15:02 UTC by Adrian Reber
Modified: 2015-01-04 22:29 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-09-16 09:18:31 UTC
Type: ---
Embargoed:


Attachments
lspci output (1.39 KB, text/plain)
2007-07-03 12:08 UTC, Paul Black

Description Adrian Reber 2007-06-20 15:02:19 UTC
Since we upgraded our system to Fedora 7, both available kernels have seemed very
unstable. We have tried 2.6.21-1.3194.fc7 as well as 2.6.21-1.3228.fc7.

Unfortunately we cannot describe the error in more detail than that it just hangs.
The system stops responding both over the serial connection and on the VGA
console.

The hang usually occurs after about 36 hours, and the system is then no longer
reachable.

The system has four dual-core AMD processors:
processor       : 7
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8214
stepping        : 2
cpu MHz         : 2200.283
cache size      : 1024 KB
physical id     : 3
siblings        : 2
core id         : 1
cpu cores       : 2
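
For anyone checking the topology from a full /proc/cpuinfo dump, here is a minimal Python sketch. It assumes the usual "key : value" layout shown in the excerpt above. The function name count_topology and the SAMPLE text are illustrative only and are not part of the original report.

```python
def count_topology(cpuinfo_text):
    """Return (logical_cpus, sockets) from /proc/cpuinfo-style text."""
    logical = 0
    sockets = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor entries
        key = key.strip()
        value = value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            sockets.add(value)
    return logical, len(sockets)


# Hypothetical sample: 4 sockets x 2 cores, laid out like the excerpt above.
SAMPLE = "\n\n".join(
    "processor\t: {}\nphysical id\t: {}\ncore id\t\t: {}\ncpu cores\t: 2".format(
        cpu, cpu // 2, cpu % 2
    )
    for cpu in range(8)
)

if __name__ == "__main__":
    print(count_topology(SAMPLE))
```

On a live machine the same function could be fed the contents of /proc/cpuinfo instead of SAMPLE.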

Dell PowerEdge 6950, 8 GB RAM

The load is rather high most of the time. This is a mirror server and we are
pushing about 300 Mbit/s on average, with much higher peaks possible.
A 4 TB Fibre Channel RAID and a 1 TB iSCSI volume are attached.

We used bonding under FC6, but disabled it because of
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=241719

We have a dual-port e1000 for our main traffic, and the internal dual-port bnx2
is used for the iSCSI connection.

Our system is stable with the latest 2.6.18 from FC6 but newer FC6 kernels were
never tried because of
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=225399

We set the following values under /proc/sys with sysctl for performance reasons:
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1800
net.core.wmem_max = 8388608
net.core.rmem_max = 8388608
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 87380 8388608
net.ipv4.tcp_max_syn_backlog = 4096
vm.min_free_kbytes = 65536

We have not tried changing these values because they have proved to be good
with the 2.6.18 kernel. I do not know whether these values are the reason for
our hangs.
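
To rule the tunables in or out, it can help to confirm which of them are actually in effect on the running kernel. The following is a minimal Python sketch, not something used in the original report. The function names are hypothetical, and the dot-to-slash mapping is a simplification that ignores keys containing interface names with dots.

```python
def parse_sysctl_lines(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Normalise whitespace so "4096 87380 8388608" compares cleanly
            # with the tab-separated form the kernel reports.
            settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys (simplified)."""
    return root + "/" + key.replace(".", "/")


def diff_against_running(settings, root="/proc/sys"):
    """Return {key: (wanted, actual)} for keys whose live value differs."""
    mismatches = {}
    for key, wanted in settings.items():
        try:
            with open(proc_path(key, root)) as fh:
                actual = " ".join(fh.read().split())
        except OSError:
            actual = None  # key absent on this kernel, or not readable
        if actual != wanted:
            mismatches[key] = (wanted, actual)
    return mismatches
```

Passing the list above through parse_sysctl_lines and then diff_against_running would report any key that is missing or holds a different value. On much newer kernels some keys will show up as missing; net.ipv4.tcp_tw_recycle, for example, was removed in Linux 4.12.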

I am aware that this is a bad bug report and you can close it if you like. I just
wanted to report it. Currently we are happy with 2.6.18.

Comment 1 Paul Black 2007-07-03 12:08:50 UTC
Created attachment 158428
lspci output

I'm having a similar problem on a couple of FC7 machines, most recently on a
Dell Optiplex GX270 with kernel-2.6.21-1.3228.fc7: complete lockup, no output
on the serial console. It has also happened at run level 3 (so no X). I've
attached the output of lspci for the GX270 as it might help correlate issues
with specific hardware.

Comment 2 Jeffrey Grace 2007-09-06 08:47:56 UTC
We've been having trouble with random system hangs on Acer Veriton 5800
workstations.

These have a Pentium D (3.4 GHz) processor. Setting maxcpus=1 at boot seems to
stop this from happening. We've noticed the same problem with FC6 machines
running a kernel later than 2.6.20.
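
When comparing machines with and without the workaround, it is worth confirming that maxcpus= really reached the kernel. A minimal Python sketch follows. The function names and the sample command line in the usage note are hypothetical, and the sketch simply keeps the last maxcpus= token it sees.

```python
def get_maxcpus(cmdline):
    """Return the integer value of maxcpus= from a kernel command line, or None."""
    result = None
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name == "maxcpus" and sep and value.isdigit():
            result = int(value)  # this sketch keeps the last occurrence
    return result


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as fh:
        return fh.read()
```

For example, get_maxcpus("ro root=/dev/sda1 maxcpus=1 quiet") returns 1, and get_maxcpus(read_cmdline()) checks the running system.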

Comment 3 Adrian Reber 2007-09-16 09:18:31 UTC
With 2.6.22.4-65.fc7 we now have an uptime of two weeks. Seems to be fixed.
Closing it.

