Today I got an FC3 kernel crash (2.6.12-1.1378_FC3smp) on my production server. It runs a large web application with Apache and PHP and handles a large number of concurrent users. These logs and traces are new to me. I have never seen a problem like this with any of the operating systems we have used (RH9 -> FC2 -> FC3) during the three years our web system has been running. The only way to bring the system back was a hard reboot. The server is an IBM xSeries 445 with 4GB of RAM and 4 HT processors. It also has a second (disabled) bank of processors that would boost it to 18GB of RAM and 8 HT processors, so we are running at about half capacity. FC3 in general is very stable, and these crashes are hard to reproduce. I know this is not the latest kernel for FC3, but I'll try to update to the latest software available in the repositories. Any idea about the cause of this problem? Also, is there any sign that there will be a final FC3 kernel update just before FC3 reaches EOL? I'll attach the output of dmesg, lspci -v and /var/log/messages, which shows several kernel backtraces and some other kernel information.
Created attachment 122856: /var/log/messages from the crash, with several httpd-related backtraces
Created attachment 122857: Output of lspci -v
Created attachment 122858: Output of dmesg
If you can see any anomaly in the hardware and/or the kernel, could you please point it out?
Basically, you ran out of memory. Here's how your memory is laid out:
DMA zone: 4096 pages, LIFO batch:1
Normal zone: 225280 pages, LIFO batch:31
HighMem zone: 950272 pages, LIFO batch:31
When you put a lot of RAM into a 32-bit machine, the lower 896MB of memory (your 'Normal zone') has to contain page-table pointers for every 4KB of memory in the system. Certain allocations, such as DMA for some device drivers, only work if they come from the lower 16MB of memory (the 'DMA zone'). If an allocation for a Normal-zone page fails, it falls back to the DMA zone. However, because the DMA zone is so small, bad things happen when it gets depleted. Because your Normal zone is filled with page tables, the kernel falls back to the DMA zone for more and more pages, and when a real ZONE_DMA request comes in, there's nothing left. Later kernels have had some zone-balancing changes that may fix this (or at least keep things running, albeit at a crawl, until the memory usage backs off). The changes, however, are massive and not really an option for backporting to the FC3 kernel, which will now only get an update if a really bad security problem comes up.
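As a rough check of the numbers above, the sketch below converts the zone page counts into MiB. It assumes the 4KB i386 page size, under which DMA + Normal comes to the 896MB low-memory limit. It then runs a deliberately simplified model of the fallback described in the reply. The names `zone_sizes_mib` and `allocate` are hypothetical. The toy model ignores the watermarks and any reserve logic a real kernel allocator applies. It only illustrates how Normal-zone requests can drain the DMA zone so that a later DMA-only request fails.

```python
import re

PAGE_SIZE_KB = 4  # assumption: 4 KB pages, as on i386

# Matches dmesg-style "<Zone> zone: <N> pages" fragments.
ZONE_RE = re.compile(r"(\w+) zone: (\d+) pages")

def zone_sizes_mib(dmesg_text):
    """Return {zone_name: size_in_MiB} for each zone line found."""
    return {name: int(pages) * PAGE_SIZE_KB / 1024
            for name, pages in ZONE_RE.findall(dmesg_text)}

def allocate(free, request):
    """Toy fallback model (hypothetical, not kernel code).

    'normal' requests try the Normal zone first and fall back to DMA;
    'dma' requests may only use the DMA zone. Decrements the free page
    count of the zone used and returns its name, or None on failure.
    """
    order = ["Normal", "DMA"] if request == "normal" else ["DMA"]
    for zone in order:
        if free[zone] > 0:
            free[zone] -= 1
            return zone
    return None

zones = ("DMA zone: 4096 pages, LIFO batch:1\n"
         "Normal zone: 225280 pages, LIFO batch:31\n"
         "HighMem zone: 950272 pages, LIFO batch:31")
sizes = zone_sizes_mib(zones)
print(sizes)                           # {'DMA': 16.0, 'Normal': 880.0, 'HighMem': 3712.0}
print(sizes["DMA"] + sizes["Normal"])  # 896.0

# Tiny zones: two Normal pages, one DMA page.
free = {"Normal": 2, "DMA": 1}
results = [allocate(free, r) for r in ["normal", "normal", "normal", "dma"]]
print(results)  # ['Normal', 'Normal', 'DMA', None]: the DMA request starves
```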
Thanks for your insights. Any idea why the memory got so full? I have never seen anything like this on these systems. Could this be a DoS attack? Where should I look?
httpd logs, maybe?
How could that be? Yes, our logs are huge: the site gets about 8 million hits per day, and the logs grow to more than 4GB every week before we rotate them (logrotate). This is a very standard FC3 server. Is this a problem?
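Here is a minimal sketch of one way to act on the suggestion to check the httpd logs. It counts requests per client IP and per hour, which can reveal a single dominant client or a sudden burst that might point to a DoS. It assumes Apache's common/combined log format. The function name `summarize` and the sample lines (documentation-range IPs, made-up timestamps) are hypothetical.

```python
import re
from collections import Counter

# Assumes Apache common/combined log format: client IP, identd, user,
# then "[dd/Mon/yyyy:HH:..." -- only the fields up to the hour are used.
LOG_RE = re.compile(r"^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):(\d{2})")

def summarize(lines, top=5):
    """Return the `top` busiest client IPs and busiest hours."""
    by_ip = Counter()
    by_hour = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that don't match the assumed format
        ip, day, hour = m.groups()
        by_ip[ip] += 1
        by_hour[f"{day} {hour}:00"] += 1
    return by_ip.most_common(top), by_hour.most_common(top)

# Hypothetical sample lines for illustration only.
sample = [
    '192.0.2.10 - - [15/Oct/2005:14:02:11 -0500] "GET /index.php HTTP/1.1" 200 512',
    '192.0.2.10 - - [15/Oct/2005:14:05:40 -0500] "GET /index.php HTTP/1.1" 200 512',
    '198.51.100.7 - - [15/Oct/2005:15:00:01 -0500] "GET / HTTP/1.1" 200 1024',
    'not a log line',
]
top_ips, top_hours = summarize(sample)
print(top_ips)    # [('192.0.2.10', 2), ('198.51.100.7', 1)]
print(top_hours)  # [('15/Oct/2005 14:00', 2), ('15/Oct/2005 15:00', 1)]
```

Because `summarize` iterates line by line, it can be passed an open file object directly, for example `summarize(open("/var/log/httpd/access_log"))` (the path is an assumed default). This avoids loading a multi-gigabyte log into memory. As a baseline, 8 million hits per day spread evenly would be about 333,000 requests per hour (roughly 93 per second). Hours far above that, or one client dominating the IP list, would be worth a closer look.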