Description of problem:
One can crash an NFS server simply by writing a sufficiently large file to it.
Version-Release number of selected component (if applicable):
Fedora 11, patched up to Oct 28th 2009.
Kernels 2.6.30.5-43.fc11 and later: crash
Kernels 2.6.29.6-213.fc11 and earlier: no problems
How reproducible:
Always
Steps to Reproduce:
1. Mount a filesystem over NFS (mount cv01:/ /mnt)
2. dd if=/dev/zero of=/mnt/dump.dmp (or anything else that generates a sufficiently large file)
3. Wait; the server usually hangs after 200MB to 20GB has been written (long before the FS fills up)
Actual results:
The NFS server hangs: the machine still pings, but is completely frozen. There is absolutely nothing in the logfiles. The subsequent boot is clean, the FS recovers cleanly, and the file is there with more or less the expected size.
Expected results:
No crash.
Additional info:
The FS exported on the NFS server is ext4 on LVM; the network is IPv4 over GBit.
Things tested:
- local mount over NFS on the server: crashes the server too
- mounting ext4 on the server with or without barriers: makes no difference
- mounting on the client over NFS3 or NFS4: makes no difference (NFS3 seems to crash faster)
- exporting on the server with 'sync': seems to delay (but not avoid) the crash
- kernel-2.6.30.9-94.fc11 from Koji has the problem too
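A sketch of step 2 that writes in fixed-size chunks and prints a running total, so the last line printed before the hang shows roughly how much data went through. This is not the exact procedure from the report: the function name write_until_hang and its parameters are made up for illustration, and appending chunk by chunk reopens the file each time, which may change NFS behavior compared with a single long-running dd.

```shell
# Hypothetical helper, not part of the original report.
# Appends zero-filled chunks to TARGET_FILE and prints the running total
# after each chunk, so the last line shown indicates how far the write got.
# Usage: write_until_hang TARGET_FILE [CHUNK_MB] [MAX_CHUNKS]
write_until_hang() {
    target=$1
    chunk_mb=${2:-100}      # size of each chunk in MB (assumed default)
    max_chunks=${3:-200}    # stop after this many chunks (assumed default)
    i=0

    # Start from an empty file; give up if the target is not writable.
    : > "$target" || return 1

    while [ "$i" -lt "$max_chunks" ]; do
        # bs is given in bytes (1 MB) to avoid dd suffix differences.
        dd if=/dev/zero bs=1048576 count="$chunk_mb" 2>/dev/null >> "$target" || return 1
        i=$((i + 1))
        echo "written $((i * chunk_mb)) MB"
    done
}
```

For example, write_until_hang /mnt/dump.dmp 100 200 would attempt to write up to 20GB in 100MB steps.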
Tested up to 2.6.31-0.94.rc4.fc12 from Koji: it has the problem too. Mounting with quota switched off makes no difference.
Tested on Debian kernel 2.6.30-2-686, version 2.6.30-8: not affected.
Tested on Fedora 11, but with vanilla 2.6.30.9: crashed too.
This happens only if the underlying FS is ext4. If the ext4 filesystem is mounted with '-o nodelalloc', the problem disappears. That is a sufficient (temporary) workaround for me.
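To confirm that the workaround is actually in effect on the server, one can look at the mount options the kernel reports. The sketch below assumes that ext4 lists nodelalloc in /proc/mounts when delayed allocation is disabled; the function name ext4_delalloc_state and the optional second argument are made up for illustration.

```shell
# Hypothetical helper, not part of the original report.
# Prints "nodelalloc" or "delalloc" for the ext4 filesystem mounted at
# MOUNT_POINT, and fails if no ext4 filesystem is mounted there.
# Assumption: the kernel shows "nodelalloc" among the mount options
# whenever delayed allocation is switched off.
# Usage: ext4_delalloc_state MOUNT_POINT [MOUNTS_FILE]
ext4_delalloc_state() {
    mnt=$1
    mounts=${2:-/proc/mounts}
    awk -v m="$mnt" '
        # Fields: device, mount point, fs type, options, dump, pass.
        $2 == m && $3 == "ext4" {
            found = 1
            state = "delalloc"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "nodelalloc") state = "nodelalloc"
        }
        END {
            # The last matching line wins, since it is the most recent mount.
            if (!found) exit 1
            print state
        }
    ' "$mounts"
}
```

For example, ext4_delalloc_state /export (with /export standing in for the exported mount point) should print nodelalloc once the filesystem has been mounted with -o nodelalloc.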
I'll try to reproduce, but when things hang, doing:
# echo w > /proc/sysrq-trigger
# dmesg > dmesg.out
will give us traces of all the stuck tasks.
Remotely or on the console, nothing responds, so I cannot easily save the result to a file... (If really needed, I can attach a serial console.)
So with Alt-SysRq-w, I get something like this (typed over from the screen):
<Alt-SysRq-w>
hald-addon-storage
automount
nscd
nfsd4
nfsd
master
After some time (minutes) of apparent idleness, this trace appears (again typed over, so ignore the typos):
spin_unlock_bh
rpc_execute
rpc_execute
rpc_run_task
nfs_write_rpcsetup
lookup_tag
ext4_get_blocks_wrap
mpage_da_map_blocks
mpage_da_write_page
write_cache_pages
mpage_da_writepage
ext4_da_writepages
da_write_pages
writeback_single_inode
dm_any_congested
generic_sb_inodes
writeback_inodes
background_writeout
pdflush
background_writeout
pdflush
kthread
kthread
kthread_helper
I tried it several times: the top 10 function names change each time. Anything else I can do?
2.6.30.9-96.fc11 hangs too, after just a couple of GB written.
Ok, trying to reproduce this now; sorry for the delay, juggling a few bugs lately. -Eric
Cannot reproduce with 2.6.30.9-99.fc11. So maybe this was related to this changelog entry:
* Mon Nov 16 2009 Eric Sandeen <sandeen> 2.6.30.9-97
- Fix ext4 preallocation-related corruption (#513221)
2.6.31.6-145.fc12 is still affected, though.
The patch for f11 came from 2.6.31, so I doubt that's the fix, if the fc12 kernel still has the problem. Also, nfs shouldn't be doing any preallocation AFAIK. I haven't yet been able to reproduce this but will keep trying ... -Eric
This also seems to manifest on RHEL5.4 GFS
2.6.31.6-162.fc12.i686.PAE (Fedora 12) is affected.
2.6.31.6-166.fc12.i686 (Fedora 12) is affected.
Alan, GFS is likely a completely different bug; please escalate the RHEL5.4 issue through your support contacts.
Bert, thanks for the updates; I don't expect that incremental fc12 kernels -will- fix it, because the root cause has not yet been identified and fixed. If only I could reproduce it ... If there is any possible way to get sysrq-t output off the box when it's stuck, in unedited format, that would be helpful.
Thanks, -eric
This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.