From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; it; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1

Description of problem:
I'm running FC4 on a dual-processor server (Intel(R) Xeon(TM) CPU 3.06GHz) with two SCSI subsystems attached. About once a week I get kernel errors such as this:

----------- begin paste from /var/log/messages -----------
Feb 9 05:07:32 adone kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Feb 9 05:07:32 adone kernel: printing eip:
Feb 9 05:07:32 adone kernel: c0157614
Feb 9 05:07:32 adone kernel: *pde = 01e39001
Feb 9 05:07:32 adone kernel: Oops: 0000 [#1]
Feb 9 05:07:32 adone kernel: SMP
Feb 9 05:07:32 adone kernel: last sysfs file: /class/vc/vcsa2/dev
Feb 9 05:07:32 adone kernel: Modules linked in: nfs nfsd exportfs lockd nfs_acl ipv6 parport_pc lp parport autofs4 sunrpc dm_mod video button battery ac uhci_hcd i2c_i801 i2c_core e1000 floppy ext3 jbd aic7xxx scsi_transport_spi sd_mod scsi_mod
Feb 9 05:07:32 adone kernel: CPU: 0
Feb 9 05:07:32 adone kernel: EIP: 0060:[<c0157614>] Not tainted VLI
Feb 9 05:07:32 adone kernel: EFLAGS: 00010246 (2.6.15-1.1830_FC4smp)
Feb 9 05:07:32 adone kernel: EIP is at page_address+0x6/0x91
Feb 9 05:07:32 adone kernel: eax: 00000000 ebx: 00000000 ecx: 00000008 edx: 00000008
Feb 9 05:07:32 adone kernel: esi: f594b038 edi: 00000007 ebp: f7d67600 esp: f7f71f20
Feb 9 05:07:32 adone kernel: ds: 007b es: 007b ss: 0068
Feb 9 05:07:32 adone kernel: Process nfsd (pid: 2290, threadinfo=f7f71000 task=f7fd5000)
Feb 9 05:07:32 adone kernel: Stack: 00001000 f594b038 00000007 f7d67600 f8c70e4f f594b000 f5958070 f7d67600
Feb 9 05:07:32 adone kernel: f8c91198 f8c70d46 d94c2014 f8c63666 f7f71f00 0bfb14ac f8bb2f27 f7d67600
Feb 9 05:07:32 adone kernel: f8c91198 f7d67600 f8c913d8 f7d67664 f8bb06c4 f7f71fd0 00000000 0000003d
Feb 9 05:07:32 adone kernel: Call Trace:
Feb 9 05:07:32 adone kernel: [<f8c70e4f>] nfs3svc_decode_readargs+0x109/0x16d [nfsd] [<f8c70d46>] nfs3svc_decode_readargs+0x0/0x16d [nfsd]
Feb 9 05:07:32 adone kernel: [<f8c63666>] nfsd_dispatch+0x4d/0x1c7 [nfsd] [<f8bb2f27>] svc_authenticate+0x97/0xae [sunrpc]
Feb 9 05:07:32 adone kernel: [<f8bb06c4>] svc_process+0x3a1/0x65d [sunrpc] [<f8c63458>] nfsd+0x184/0x345 [nfsd]
Feb 9 05:07:32 adone kernel: [<c01040a2>] work_resched+0x5/0x16 [<f8c632d4>] nfsd+0x0/0x345 [nfsd]
Feb 9 05:07:32 adone kernel: [<c010243d>] kernel_thread_helper+0x5/0xb
Feb 9 05:07:32 adone kernel: Code: 0d 08 f1 4a c0 85 c9 75 ec 0f 0b e2 01 12 70 34 c0 eb e2 69 c0 01 00 37 9e c1 e8 19 c1 e0 07 05 80 f9 4a c0 c3 55 57 56 53 89 c3 <8b> 00 c1 e8 1e 8b 14 85 1c f9 3f c0 8b 82 0c 12 00 00 05 80 37
Feb 9 05:07:32 adone kernel: Continuing in 120 seconds.
------------- end paste from /var/log/messages -------------

The last line is new as of kernel 2.6.15-1.1830. With previous versions (kernel-2.6.12-1.1447_FC4, kernel-2.6.13-1.1532_FC4, kernel-2.6.14-1.1637_FC4), after four errors like the one above I had to reboot the server manually to get nfsd running again. Now it seems to automatically recover in some way, since I see the errors but nfs is actually working.

I haven't managed to relate these errors to any particular task or process on the server. I apologize in advance for my lack of knowledge; I'm sorry I can't be more specific.

Version-Release number of selected component (if applicable):
kernel-2.6.15-1.1830_FC4smp

How reproducible:
Sometimes

Steps to Reproduce:
1. Install FC4 on a dual-processor machine (Intel(R) Xeon(TM) CPU 3.06GHz) with two SCSI subsystems
2. Export something over NFS
3. Wait a couple of days

Additional info:
I don't know if it's useful...

[root@adone ~]# cat /etc/exports
/terabox svradar.metarpa(ro,async,no_subtree_check)
/ambox sibilla.metarpa(ro,async,no_subtree_check)

[root@adone ~]# df
Filesystem      1K-blocks        Used  Available Use% Mounted on
/dev/sda2        18930940     8979636    8974152  51% /
/dev/sda1          101086       35410      60457  37% /boot
/dev/shm           515604           0     515604   0% /dev/shm
/dev/sda4      1707407264  1106936560  513739384  69% /systera
/dev/sdb1      1730598456   831203484  811485688  51% /terabox

[root@adone ~]# lsmod
Module                  Size  Used by
nfs                   217001  0
nfsd                  231377  15
exportfs               10305  1 nfsd
lockd                  64585  3 nfs,nfsd
nfs_acl                 7873  2 nfs,nfsd
ipv6                  273825  86
parport_pc             31877  1
lp                     16905  0
parport                39561  2 parport_pc,lp
autofs4                23621  2
sunrpc                150397  18 nfs,nfsd,lockd,nfs_acl
dm_mod                 61273  0
video                  20165  0
button                 10705  0
battery                13509  0
ac                      8901  0
uhci_hcd               37073  0
i2c_i801               13005  0
i2c_core               25793  1 i2c_i801
e1000                 111917  0
floppy                 66181  0
ext3                  135241  4
jbd                    62037  1 ext3
aic7xxx               154229  5
scsi_transport_spi     25153  1 aic7xxx
sd_mod                 23105  7
scsi_mod              139497  3 aic7xxx,scsi_transport_spi,sd_mod
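
As a reading aid for the call trace in the paste above, here is a minimal Python sketch that pulls the `[<address>] symbol+offset/size [module]` entries out of an oops excerpt. The regular expression, the `parse_call_trace` name and the dictionary layout are illustrative assumptions, not part of any kernel tooling; the sample text is copied from the log in this report.

```python
import re

# One call-trace entry looks like: [<f8c63666>] nfsd_dispatch+0x4d/0x1c7 [nfsd]
# The trailing [module] is absent for symbols built into the kernel image.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+"            # return address
    r"([A-Za-z_]\w*)"                  # symbol name
    r"\+0x([0-9a-f]+)/0x([0-9a-f]+)"   # offset into symbol / symbol size
    r"(?:\s+\[(\w+)\])?"               # optional module name
)

# Two lines taken verbatim from the /var/log/messages paste.
SAMPLE = (
    "Feb 9 05:07:32 adone kernel: [<f8c63666>] nfsd_dispatch+0x4d/0x1c7 [nfsd] "
    "[<f8bb2f27>] svc_authenticate+0x97/0xae [sunrpc]\n"
    "Feb 9 05:07:32 adone kernel: [<c01040a2>] work_resched+0x5/0x16 "
    "[<f8c632d4>] nfsd+0x0/0x345 [nfsd]\n"
)


def parse_call_trace(text):
    """Return the call-trace entries found in text, in order of appearance."""
    frames = []
    for match in FRAME_RE.finditer(text):
        address, symbol, offset, size, module = match.groups()
        frames.append({
            "address": int(address, 16),
            "symbol": symbol,
            "offset": int(offset, 16),
            "size": int(size, 16),
            "module": module,  # None when the symbol is not in a module
        })
    return frames


if __name__ == "__main__":
    for frame in parse_call_trace(SAMPLE):
        where = frame["module"] or "kernel"
        print(f"{frame['symbol']} (+{frame['offset']:#x}) in {where}")
```

Run over the full paste, this kind of summary makes it easier to see that every module frame in the trace belongs to nfsd or sunrpc.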
>Now it seems to automatically recover in some way,
>since I see the errors but nfs is actually working.

This was untrue. After two more errors like the one above, I have this situation:

[root@adone ~]# service nfs status
Shutting down NFS mountd: rpc.mountd (pid 31571) is running...
nfsd is stopped
rpc.rquotad (pid 31568) is running...

[root@adone ~]# service nfs restart
Shutting down NFS mountd:      [ OK ]
Shutting down NFS daemon:      [FAILED]
Shutting down NFS quotas:      [ OK ]
Shutting down NFS services:    [ OK ]
Starting NFS services:         [ OK ]
Starting NFS quotas:           [ OK ]
Starting NFS daemon:           [ OK ]
Starting NFS mountd:           [ OK ]

[root@adone ~]# tail /var/log/messages
Feb 13 11:21:19 adone rpc.mountd: Caught signal 15, un-registering and exiting.
Feb 13 11:21:20 adone nfsd[31569]: nfssvc: Setting version failed: errno 16 (Device or resource busy)
Feb 13 11:21:20 adone rpc.idmapd: nfsdreopen: Opening '' failed: errno 2 (No such file or directory)

[root@adone ~]# service nfs status
Shutting down NFS mountd: rpc.mountd (pid 31571) is running...
nfsd is stopped
rpc.rquotad (pid 31568) is running...
- Still having problems under 2.6.16-1.2069
- The errors happen when trying to copy a relatively large amount of data (about 500 MB)
- The partition I'm actually exporting is 1.6 TB, ext3. Everybody tells me it is a dumb idea to use ext3 for such a huge partition, but I'm unable to change the filesystem until I get another subsystem for backup. Could my problems be related to this?
ext3 on FC4 should be safe up to 8TB, and has been tested on such systems, so I have no reason to think that any NFS errors are related to the large ext3 fs.
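
For reference, a rough check of the partition sizes against that 8TB figure. This is a sketch under two assumptions that are not stated in this bug: ext3 is using 4 KiB blocks, and block numbers are capped at 2^31, which is the usual explanation for an 8 TiB ceiling. The sizes are the 1K-block totals from the df output in the original report.

```python
# Assumptions (not from this bug report): 4 KiB blocks, block numbers below 2**31.
BLOCK_SIZE_BYTES = 4096
MAX_BLOCKS = 2 ** 31

# Total sizes in 1 KiB blocks, copied from the df output in the report.
DF_TOTAL_KIB = {
    "/systera": 1707407264,
    "/terabox": 1730598456,
}

TIB = 2 ** 40


def ext3_limit_bytes(block_size=BLOCK_SIZE_BYTES, max_blocks=MAX_BLOCKS):
    """Largest filesystem size addressable under the stated assumptions."""
    return block_size * max_blocks


def kib_to_tib(kib):
    """Convert a count of 1 KiB blocks to TiB."""
    return kib * 1024 / TIB


if __name__ == "__main__":
    limit_tib = ext3_limit_bytes() / TIB
    print(f"assumed ext3 limit: {limit_tib:.0f} TiB")
    for mount, kib in DF_TOTAL_KIB.items():
        size_tib = kib_to_tib(kib)
        print(f"{mount}: {size_tib:.2f} TiB, {size_tib / limit_tib:.0%} of the limit")
```

Under those assumptions both filesystems sit at roughly a fifth of the ceiling, which fits the assessment that the size of the ext3 partition is unlikely to be the cause.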
[This comment added as part of a mass-update to all open FC4 kernel bugs] FC4 has now transitioned to the Fedora legacy project, which will continue to release security related updates for the kernel. As this bug is not security related, it is unlikely to be fixed in an update for FC4, and has been migrated to FC5. Please retest with Fedora Core 5. Thank you.
Eventually I found out that the problem was related to an old Alpha Tru64 machine that was mounting the NFS partition. I badly needed to keep the server going, so I just removed that client from /etc/exports. At this point I can't tell whether the problem was due to the architecture of that particular NFS client, to its network setup (a little bit messy), or to an actual kernel bug. I'm changing this bug to "WORKSFORME", since I'm unable to investigate further... Thank you very much, Stephen, for the information about large ext3 filesystems. Some dumb colleague told me that it was the main cause of my problems.