From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc3 Firefox/1.0.6

Description of problem:
Our production file server crashed last night with the following message:

Unable to handle kernel paging request at virtual address 4b038516

I'm no expert in kernel panics, so I cannot tell what caused this. It is quite annoying, and we had similar problems when the machine was running AS 3. The machine was upgraded to AS 4 just a week ago. I'll attach the full kernel panic message, as captured by netdump. I also have a vmcore kernel image, if you need it.

Some info on the machine in question:

[root@atlantis ~]# uname -a
Linux atlantis 2.6.9-11.ELsmp #1 SMP Fri May 20 18:26:27 EDT 2005 i686 i686 i386 GNU/Linux
[root@atlantis ~]# lsmod
Module                  Size  Used by
nfs                   200869  9
nfsd                  205281  9
exportfs               10049  1 nfsd
lockd                  65257  3 nfs,nfsd
md5                     8001  1
ipv6                  238817  16
parport_pc             27905  0
lp                     15405  0
parport                37641  2 parport_pc,lp
netconsole             10717  0
netdump                16097  0
autofs4                22085  6
i2c_dev                14273  0
i2c_core               25921  1 i2c_dev
sunrpc                138789  31 nfs,nfsd,lockd
ipt_REJECT             10561  2
iptable_filter          6721  1
ip_tables              21441  2 ipt_REJECT,iptable_filter
dm_mod                 58949  0
button                 10449  0
battery                12869  0
ac                      8773  0
hw_random               9557  0
e1000                  84389  0
floppy                 58065  0
ext3                  118729  9
jbd                    59481  1 ext3
i2o_block              16717  5
i2o_core               41949  1 i2o_block
aic7xxx               146553  6
sd_mod                 20545  12
scsi_mod              116429  2 aic7xxx,sd_mod
[root@atlantis ~]# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 1)
[root@atlantis ~]# cat /proc/meminfo
MemTotal: 1034676 kB
MemFree: 48908 kB
Buffers: 119960 kB
Cached: 514440 kB
SwapCached: 0 kB
Active: 518376 kB
Inactive: 346932 kB
HighTotal: 131008 kB
HighFree: 252 kB
LowTotal: 903668 kB
LowFree: 48656 kB
SwapTotal: 2048276 kB
SwapFree: 2048116 kB
Dirty: 656 kB
Writeback: 0 kB
Mapped: 244316 kB
Slab: 104128 kB
Committed_AS: 432888 kB
PageTables: 3572 kB
VmallocTotal: 106488 kB
VmallocUsed: 4612 kB
VmallocChunk: 101560 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
[root@atlantis ~]# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) XEON(TM) CPU 2.40GHz
stepping : 4
cpu MHz : 2392.876
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4718.59

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) XEON(TM) CPU 2.40GHz
stepping : 4
cpu MHz : 2392.876
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4767.74

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) XEON(TM) CPU 2.40GHz
stepping : 4
cpu MHz : 2392.876
cache size : 512 KB
physical id : 3
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4767.74

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) XEON(TM) CPU 2.40GHz
stepping : 4
cpu MHz : 2392.876
cache size : 512 KB
physical id : 3
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4767.74

Version-Release number of selected component (if applicable):
2.6.9-11.ELsmp

How reproducible:
Didn't try

Actual Results:
Crashes at random intervals

Expected Results:
No crashes.

Additional info:
Created attachment 118378: Full kernel oops messages
This *might* be fixed in the U2 beta kernel. Can you please give 2.6.9-18.EL a try: http://people.redhat.com/~jbaron/rhel4/. Thanks.
Sorry, I cannot let a beta kernel loose on our production file server. When will the U2 patches be released for production?
Fair enough; the ETA is currently about 2 weeks.
I'm sorry to say that this new kernel didn't solve our problems. We have reason to believe that the problem occurs under heavy I/O, because it usually happens while our backup system (TSM) is running, even in memory-efficient mode (we have verified that the process doesn't use more than 100 MB of resident memory). We have a brand-new kernel dump, if you're interested. I'm also attaching the new netdump log.
Created attachment 120447: Netdump log of oops from crash at 2005-10-27
We are still experiencing random crashes on our production machines, so this issue is of high importance to us. From one of the latest crashes, yesterday, I have a kernel dump captured through netdump and a log file, if you are interested. We strongly suspect autofs4 of triggering this behaviour, since the problem went away on another server after we disabled autofs. However, we cannot disable autofs on all servers: the one that crashed yesterday is a web server that needs to mount the public_html directories of user accounts from other NFS servers.
Created attachment 127612: Oops message from the syslog

We've just seen this issue on a production file server that we recently moved to RHEL4. It's running kernel 2.6.9-22.0.2.ELsmp on dual Athlon MP 1900+ CPUs with 2 GB of RAM. This machine also uses autofs fairly heavily.
Note that this is very likely a duplicate of bug 173843, which we are actively working on. Thanks.
(In reply to comment #11)
> note that this is very likely a duplicate of 173843, which we are actively
> working....thanks.

I agree. I just noticed the following in my syslogs a few minutes before the crash, which seems consistent with Bug 173843:

Apr 10 22:07:14 fileserver kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
Apr 10 22:07:14 fileserver kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...

Is there a ballpark estimate yet for a resolution of Bug 173843?
This does appear to be a duplicate of 173843, so I will close it as such. If it turns out not to be a duplicate, we can reopen this report and investigate further.

*** This bug has been marked as a duplicate of 173843 ***