58671 – Tux 2.0, Samba 2.2.x, Apache 1.3.19

Summary: Tux 2.0, Samba 2.2.x, Apache 1.3.19

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.1
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Ingo Molnar
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2002-01-22 17:13 UTC by william rose
Modified:	2007-04-18 16:39 UTC
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2003-06-08 00:30:45 UTC
Embargoed:

Description william rose 2002-01-22 17:13:50 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0b; Windows NT 5.0; .NET CLR 
1.0.2914)

Description of problem:
I have a Tux web server with Apache and Samba running on our dual-processor 
machines. Three problems occur intermittently: the kernel locks up (oops), 
httpd hangs, and a Tux thread runs at 100% utilization on only one of the 
processors, #1. I have read in various places on the Internet that gcc 2.96 
produces kernels and user-space programs that may cause file corruption (see 
http://www.mysql.com/downloads/mysql-3.23.html, bottom of the page). Samba 2.2.x 
now uses a database as a scoreboard for its sub-processes, and it appears that 
this database is being corrupted somehow. When this happens, the file server 
stops responding on the network. When a kernel oops happens (I hope to capture 
the full oops output next time), the machine appears to have been performing a 
paging operation. Furthermore, I believe the Apache problem is due to its 
scoreboard file becoming corrupted. We also have a Red Hat 6.2 machine with 
Samba 2.0.7 that has had 100% uptime for over 300 days. The Red Hat 7.1 
machines seem to fail every 18-35 days. The question becomes: is gcc 2.96 the 
root of these problems? I upgraded the kernel to 2.4.3-12 when Red Hat released 
that erratum. This did not fix the problem. I am reluctant to update further if 
the cause is gcc 2.96, as these machines are mission critical to the operation 
of our website.

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Normal operation under any load, from idle to heavy.

Actual Results:  During normal operation, Samba dies because it cannot find its 
own processes in the scoreboard database. The same occurs with Apache, but not 
at the same time (thus far). Tux seems to have a thread running at 100% on 
processor #1, as reported by top. At times the whole system stops responding, 
and the machine's reset button must be used. These problems do not occur at the 
same time.

Expected Results:  Normal operation.

Additional info:

A previous problem I had with these servers was data corruption on a 
MegaRAID-controlled array. The systems are HP LPr machines, one with dual PIII 
550MHz processors and another with 850MHz. The Red Hat 6.2 machine running 
Samba 2.0.7 is the same model (550MHz) and has had 0% downtime. I scrapped the 
MegaRAID cards and had to go with the built-in Symbios controllers in the LPr 
machines to get anything working. After reading reports about gcc 2.96, I 
suspect it is the cause of the MegaRAID problem, the corrupted Apache 
scoreboard file, the corrupted Samba scoreboard database, Tux locking up on 
processor #1, and the system completely locking up at times. I work in an 
environment that requires 99.995% uptime and absolutely do not have time to 
sort through the oops printout or anything else during down periods. I will, 
however, grab what I can during the next episode and update this report.

Comment 1 Arjan van de Ven 2002-01-22 17:18:41 UTC

The cause is not gcc 2.96. 2.96 has proven to be a very stable compiler and is
even recommended (next to *one* other gcc version) by Linus Torvalds for kernel
use (in fact, Linus himself uses 2.96). 

We released a 2.4.9 kernel for 7.1 with a much upgraded TUX, it might be worth
upgrading to that...

Comment 2 Ingo Molnar 2002-06-10 16:26:49 UTC

Do current kernels still produce this problem?

Comment 3 william rose 2002-06-10 16:33:12 UTC

I have upgraded both machines to 7.2 and then applied the 2.4.9-31 kernel along 
with the required modutils and newer Tux userspace RPMs. So far, everything is 
fine. I would say the problem was in the older Tux; this can/should be closed.
