177075 – httpd related kernel crash

Summary: httpd related kernel crash

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	3
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-01-06 00:19 UTC by William Lovaton
Modified:	2015-01-04 22:24 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-01-06 02:21:45 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
/var/log/messages from the crash with several httpd related backtraces (273.53 KB, text/plain) 2006-01-06 00:22 UTC, William Lovaton	no flags
Output of lspci -v (6.19 KB, text/plain) 2006-01-06 00:24 UTC, William Lovaton	no flags
Output of dmesg (25.35 KB, text/plain) 2006-01-06 00:26 UTC, William Lovaton	no flags

Description William Lovaton 2006-01-06 00:19:19 UTC

Today I got an FC3 kernel crash (2.6.12-1.1378_FC3smp) on my production server.
It runs a huge web app with Apache and PHP and handles a large number of
concurrent users.

These logs and traces are new to me. I have never experienced a problem like
this with any of the operating systems we have used (RH9 -> FC2 -> FC3) during
the 3 years our web system has been running.  The only way to get the system
back was a hard reboot.

The server is an IBM xSeries 445, 4GB of RAM, 4 HT processors with a second
(disabled) bank of processors that would boost it to 18GB of RAM and 8 HT
processors. So we are kind of half powered here.

FC3 in general is very stable and these crashes are hard to reproduce.

I know this is not the latest kernel for FC3 but I'll try to update to the
latest software available in the repositories.

Any idea about the cause of this problem?  Also, is there any sign that there
will be a last FC3 kernel update just before FC3 EOL?

I'll attach the output of dmesg, lspci -v and /var/log/messages which shows
several kernel backtraces and some other kind of kernel info.

Comment 1 William Lovaton 2006-01-06 00:22:05 UTC

Created attachment 122856
/var/log/messages from the crash with several httpd related backtraces

Comment 2 William Lovaton 2006-01-06 00:24:22 UTC

Created attachment 122857
Output of lspci -v

Comment 3 William Lovaton 2006-01-06 00:26:47 UTC

Created attachment 122858
Output of dmesg

If you can see any anomaly with the hardware and/or the kernel, could you
please point it out?

Comment 4 Dave Jones 2006-01-06 02:21:45 UTC

basically you ran out of memory.

here's how your memory is laid out..

  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 225280 pages, LIFO batch:31
  HighMem zone: 950272 pages, LIFO batch:31

When you put a lot of RAM into 32bit machines, the lower 896MB of memory (your
'Normal zone') has to contain pagetable pointers for every 4KB of memory in the
system.

Certain allocations can also only work if they come from the lower 16MB of
memory (such as DMA for certain device drivers).
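
As a rough check on those figures, the page counts in the zone listing above can be converted to megabytes. This is only a sketch, assuming 4KB pages as on i386; the names zones and zone_mb are made up for the illustration.

```python
PAGE_KB = 4  # assumption: 4KB pages, as on i386

# Page counts copied from the zone listing above
zones = {"DMA": 4096, "Normal": 225280, "HighMem": 950272}


def zone_mb(pages, page_kb=PAGE_KB):
    """Convert a page count to whole megabytes."""
    return pages * page_kb // 1024


sizes = {name: zone_mb(pages) for name, pages in zones.items()}
low_mb = sizes["DMA"] + sizes["Normal"]

for name, mb in sizes.items():
    print(f"{name:8s} {mb:5d} MB")
print(f"low memory (DMA + Normal): {low_mb} MB")
```

Under that assumption the DMA zone comes out at 16MB, the Normal zone at 880MB, and the two together at the 896MB of low memory mentioned above.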

If an allocation for a 'normal zone' page fails, it falls back to the dma zone.
However with dma zone being so small, bad things happen when this gets depleted.
Because your normal zone is filled with pagetables, it's falling back to the dma
zone for more and more pages, and then when a real 'ZONE_DMA' request comes in,
there's nothing left.
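
The fallback behaviour described above can be sketched with a toy model. This is not kernel code: the Zone class, the allocate helper, and the request sizes are all hypothetical, and the real allocator also applies per-zone watermarks and reclaim, which this ignores.

```python
class Zone:
    """Toy memory zone that only tracks a count of free pages."""

    def __init__(self, name, free_pages):
        self.name = name
        self.free = free_pages

    def take(self, pages):
        """Take pages from this zone if enough are free."""
        if self.free >= pages:
            self.free -= pages
            return True
        return False


def allocate(pages, zonelist):
    """Try each zone in order; return the name of the zone used, or None."""
    for zone in zonelist:
        if zone.take(pages):
            return zone.name
    return None


dma = Zone("DMA", 4096)
normal = Zone("Normal", 225280)

# Hypothetical load: pagetables have left only 1000 free pages in Normal
normal.free = 1000

# Ordinary requests prefer Normal and fall back to DMA when it is full
served = [allocate(500, [normal, dma]) for _ in range(10)]

# A request that can only be satisfied from the DMA zone
dma_only = allocate(500, [dma])

print(served)
print("DMA pages left:", dma.free)
print("DMA-only request served from:", dma_only)
```

In this model the first two requests come from the Normal zone, the remaining eight spill into the DMA zone, and the final DMA-only request fails because too few DMA pages are left.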

Later kernels have had some zone balancing changes which may fix this (or at
least keep things running, albeit at a crawl, until the memory usage backs off).
The changes however are massive, and not really an option for backporting to the
FC3 kernel, which is only going to get an update now if some really bad security
problem comes up.

Comment 5 William Lovaton 2006-01-06 12:31:55 UTC

Thanks for your insights... Any idea why the memory got so full? I have never
seen anything like this with these systems.  Is there a possibility of a DoS
attack?  Where should I look?

Comment 6 Dave Jones 2006-01-12 04:20:50 UTC

httpd logs maybe?

Comment 7 William Lovaton 2006-01-12 13:38:45 UTC

How could that be?

Yes, our logs are huge. The server gets about 8 million hits per day and the
logs grow to more than 4 GB every week before logrotate rotates them... this is
a very standard FC3 server.

Is this a problem?
