Bug 58671

Summary: Tux 2.0, Samba 2.2.x, Apache 1.3.19
Product: [Retired] Red Hat Linux
Reporter: william rose <wrose>
Component: kernel
Assignee: Ingo Molnar <mingo>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 7.1
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2003-06-08 00:30:45 UTC

Description william rose 2002-01-22 17:13:50 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0b; Windows NT 5.0; .NET CLR 1.0.2914)

Description of problem:
I have a Tux web server with Apache and Samba running. Occasionally the kernel
locks up (oops), occasionally httpd hangs, and occasionally a Tux thread on our
dual-processor machines runs at 100% utilization on only one of the processors,
#1. I have read in various places on the Internet that gcc 2.96 produces
kernels and user-space programs that may cause file corruption (see
http://www.mysql.com/downloads/mysql-3.23.html, bottom of the page). Samba 2.2.x
now uses a database as a scoreboard for its child processes, and it appears
that this database is somehow being corrupted. When this happens, the file
server stops responding on the network. When the kernel oopses occur (I hope to
capture the full oops output next time), the machine appears to have been
performing a paging operation. Furthermore, I believe the Apache problem is due
to the scoreboard file becoming corrupt. We also have a Red Hat 6.2 machine
with Samba 2.0.7 that has had 100% uptime for over 300 days, whereas the Red
Hat 7.1 machines seem to fail every 18-35 days. The question is: is gcc 2.96
the root of these problems? I upgraded the kernel to 2.4.3-12 when Red Hat
released that erratum, but this did not fix the problem. I am reluctant to
update further if the cause is gcc 2.96, as these machines are mission critical
to the operation of our website.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Normal operation, under loads ranging from none to heavy.

Actual Results:  During normal operation, Samba dies because it can't find its
own processes in the scoreboard database. The same occurs with Apache, though
not at the same time (thus far). Tux seems to have a thread running at 100% on
processor #1, as reported by top. At times the whole system stops responding
and the machine's reset button must be used. These problems do not occur at
the same time.

Expected Results:  Normal operation.

Additional info:

A previous problem I had with these servers was data corruption on a
MegaRaid-controlled array. The systems are HP LPr servers, one with dual PIII
550 MHz CPUs and another with 850 MHz CPUs. The Red Hat 6.2 machine running
Samba 2.0.7 is the same model (550 MHz) and has had 0% downtime. I scrapped the
MegaRaid cards and had to fall back to the built-in Symbios controllers in the
LPrs to get anything working. After reading reports about gcc 2.96, I suspect
it is the cause of the MegaRaid problem, the corrupt Apache scoreboard file,
the corrupt Samba scoreboard database, Tux locking up on processor #1, and the
occasional complete system lockups. I work in an environment that requires
99.995% uptime and absolutely do not have time to sort through the oops output,
or anything else for that matter, during outages. I will, however, grab what I
can during the next episode and update this report.

Comment 1 Arjan van de Ven 2002-01-22 17:18:41 UTC
The cause is not gcc 2.96. 2.96 has proven to be a very stable compiler and is
even recommended (alongside *one* other gcc version) by Linus Torvalds for kernel
use (in fact, Linus himself uses 2.96).

We released a 2.4.9 kernel for 7.1 with a much-upgraded TUX; it might be worth
upgrading to that.

Comment 2 Ingo Molnar 2002-06-10 16:26:49 UTC
Do current kernels still produce this problem?

Comment 3 william rose 2002-06-10 16:33:12 UTC
I have upgraded both machines to 7.2 and then applied the 2.4.9-31 kernel along
with the required modutils and newer TUX userspace RPMs. So far, everything is
fine. I would say the problem was in the older TUX; this can (and should) be
closed.