Bug 245012

Summary: system hangs without any obvious reasons
Product: [Fedora] Fedora
Reporter: Adrian Reber <adrian>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: low
Version: 7
CC: paul.0000.black, pfrields
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-09-16 09:18:31 UTC
Attachments:
lspci output (flags: none)

Description Adrian Reber 2007-06-20 15:02:19 UTC
Since we upgraded our system to Fedora 7, both available kernels have been very
unstable. We have tried 2.6.21-1.3194.fc7 as well as 2.6.21-1.3228.fc7.

Unfortunately we cannot describe the error in more detail than that the system
just hangs. It no longer responds over the serial connection or on the VGA
console.

It usually happens after about 36 hours of uptime, after which the system is no
longer reachable.

The system has four dual-core AMD processors (eight cores in total); the entry for the last core is:
processor       : 7
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8214
stepping        : 2
cpu MHz         : 2200.283
cache size      : 1024 KB
physical id     : 3
siblings        : 2
core id         : 1
cpu cores       : 2

Dell PowerEdge 6950, 8 GB RAM
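
For anyone comparing hardware across the affected machines, the socket and core layout can be summarised from /proc/cpuinfo. The following is only a minimal sketch: the function name count_topology is hypothetical, and it assumes the field names shown above ("processor", "physical id", "core id") and blank-line-separated entries.

```python
def count_topology(cpuinfo_text):
    """Count logical CPUs, sockets and physical cores in /proc/cpuinfo-style text.

    Assumes one blank-line-separated entry per logical CPU, with the
    "processor", "physical id" and "core id" fields as in the excerpt above.
    """
    sockets = set()
    cores = set()
    logical = 0
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue  # not a CPU entry
        logical += 1
        if "physical id" in fields:
            sockets.add(fields["physical id"])
            # A physical core is identified by (socket, core id).
            cores.add((fields["physical id"], fields.get("core id")))
    return {"logical": logical, "sockets": len(sockets), "cores": len(cores)}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print(count_topology(handle.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

On the machine described here this would be expected to report four sockets and eight cores, but that has not been verified with this script.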

The load is rather high most of the time. This is a mirror server and we are
pushing about 300 Mbit/s on average, with much higher peaks possible.
There is a Fibre Channel RAID with 4 TB attached and a 1 TB iSCSI volume.

We used bonding during FC6 but due to
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=241719
we disabled it.

We have a dual-port e1000 for our main traffic, and the internal dual-port bnx2
is used for the iSCSI connection.

Our system is stable with the latest 2.6.18 from FC6 but newer FC6 kernels were
never tried because of
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=225399

We are writing some values to /proc with sysctl for performance reasons:
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1800
net.core.wmem_max = 8388608
net.core.rmem_max = 8388608
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 87380 8388608
net.ipv4.tcp_max_syn_backlog = 4096
vm.min_free_kbytes = 65536

We have not tried changing these values because they have proven to be good
with the 2.6.18 kernel. I have no idea if these values are the reason for our hangs.
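
To rule the settings in or out, it may help to confirm which of them are actually in effect on the running kernel before and after changing them. Below is a minimal sketch, not a tested diagnostic: the helper names (parse_sysctl, proc_path, diff_against_running) are hypothetical, and it assumes the usual mapping of a sysctl key such as vm.min_free_kbytes to the file /proc/sys/vm/min_free_kbytes.

```python
from pathlib import Path

# The values listed in this report.
SETTINGS = """\
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1800
net.core.wmem_max = 8388608
net.core.rmem_max = 8388608
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 87380 8388608
net.ipv4.tcp_max_syn_backlog = 4096
vm.min_free_kbytes = 65536
"""


def parse_sysctl(text):
    """Parse 'key = value' lines into a dict, normalising whitespace in values."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        result[key.strip()] = " ".join(value.split())
    return result


def proc_path(key, root="/proc/sys"):
    """Map a sysctl key to its file under /proc/sys (simple keys only)."""
    return Path(root) / key.replace(".", "/")


def diff_against_running(settings, root="/proc/sys"):
    """Return {key: (wanted, actual)} for every setting that differs.

    actual is None when the file is missing or unreadable, for example
    because the running kernel does not provide that tunable.
    """
    mismatches = {}
    for key, wanted in settings.items():
        try:
            actual = " ".join(proc_path(key, root).read_text().split())
        except OSError:
            actual = None
        if actual != wanted:
            mismatches[key] = (wanted, actual)
    return mismatches


if __name__ == "__main__":
    for key, (wanted, actual) in diff_against_running(parse_sysctl(SETTINGS)).items():
        print(f"{key}: wanted {wanted!r}, running {actual!r}")
```

Reverting the settings one at a time and re-running the comparison after each reboot would be one way to narrow down whether any of them is involved in the hangs.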

I am aware that this is a bad bug report and you can close it if you like. I just
wanted to report it. Currently we are happy with 2.6.18.

Comment 1 Paul Black 2007-07-03 12:08:50 UTC
Created attachment 158428
lspci output

I'm having a similar problem on a couple of FC7 machines, most recently on a
Dell Optiplex GX270 with kernel-2.6.21-1.3228.fc7: a complete lock-up with no
output on the serial console. It has also happened at run level 3 (so no X).
I've attached the output of lspci for the GX270 as it might help correlate
issues with specific hardware.

Comment 2 Jeffrey Grace 2007-09-06 08:47:56 UTC
We've been having trouble with random system hangs on Acer Veriton 5800
workstations.

These have a Pentium D (3.4 GHz) processor. Setting maxcpus=1 at boot seems to
stop this from happening. We've noticed the same problem with FC6 machines
running a kernel later than 2.6.20.

Comment 3 Adrian Reber 2007-09-16 09:18:31 UTC
With 2.6.22.4-65.fc7 we now have an uptime of two weeks. Seems to be fixed.
Closing it.