Bug 246268
Summary: | NFS failover causes VM deadlock | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Kris Corwin <kris.corwin> | ||||
Component: | kernel | Assignee: | Jeff Layton <jlayton> | ||||
Status: | CLOSED WONTFIX | QA Contact: | Martin Jenner <mjenner> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | low | ||||||
Version: | 5.0 | CC: | lwoodman, staubach, steved | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | i386 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2007-07-06 20:01:18 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
Description
Kris Corwin
2007-06-29 18:09:10 UTC
Created attachment 158232
typescript of loop doing cat meminfo every 10 seconds
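The attachment's loop is not reproduced here. Below is a minimal Python sketch of an equivalent monitor. It assumes a Linux host with `/proc/meminfo` and picks out the fields most relevant to a writeback hang: `Dirty`, `Writeback` and, if the kernel reports it, `NFS_Unstable`. The helper names `parse_meminfo` and `watch_meminfo` are hypothetical and are not part of the original report.

```python
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kB. A few, such as HugePages_Total, are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields

def watch_meminfo(path="/proc/meminfo", interval=10.0, iterations=None,
                  keys=("MemFree", "Dirty", "Writeback", "NFS_Unstable")):
    """Print selected fields every `interval` seconds (forever if iterations is None)."""
    count = 0
    while iterations is None or count < iterations:
        with open(path) as f:
            info = parse_meminfo(f.read())
        stamp = time.strftime("%H:%M:%S")
        # Fields the running kernel does not report are printed as "?".
        print(stamp, " ".join(f"{k}={info.get(k, '?')}" for k in keys), flush=True)
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
```

If `Dirty` and `Writeback` keep climbing and never drain after the NFS server is stopped, that fits the writer being throttled on pages that cannot be cleaned.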
I forced a stack trace dump and ran dmesg at the end of the meminfo attachment.

The reproducer here involves a known problematic configuration: relocating an NFS service onto a client of that service. This configuration has quite a few problems, both in the kernel and in userspace, so it is not a situation that we support. Peter Zijlstra has some changes upstream that may help with this. However, they are pervasive and not a good candidate for backporting to any of our existing releases.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

What specific part of the configuration is unsupported? I can reproduce the hang using only one box: the filesystem is mounted locally and NFS-exported, and the same machine NFS-mounts it.

    /dev/sda on /mnt/local type ext3 (rw)
    nfsd on /proc/fs/nfsd type nfsd (rw)
    nfsserver:/mnt/local on /mnt/NFS type nfs (rw,intr,addr=192.168.2.100)

    [root@f6 2.6.18-8.el5-i686]# cat /etc/exports
    /mnt/local *(rw,no_root_squash,async,insecure,fsid=3)

Start writing a 4G file to the NFS mount. Stop NFS and try to unmount the local filesystem. It will block waiting for pages.
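The first step of the reproducer above, a large buffered write to the NFS mount, could be scripted roughly as follows. This is only a sketch. The function name `write_big_file` and the target path `/mnt/NFS/bigfile` are hypothetical. The function writes zeros in 1 MiB chunks with ordinary buffered writes and never calls `fsync`, so the data accumulates as dirty page-cache pages.

```python
import os

CHUNK = 1 << 20  # 1 MiB per write() call

def write_big_file(path, total_bytes=4 << 30, chunk_size=CHUNK):
    """Write total_bytes zero bytes to path with plain buffered writes.

    The default is 4 GiB, matching the reproducer. No O_DIRECT and no fsync
    are used, so the kernel is left to write the dirty pages back on its own.
    Returns the number of bytes written.
    """
    buf = b"\0" * chunk_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(buf[:n])
            written += n
    return written
```

Example: run `write_big_file("/mnt/NFS/bigfile")` in one shell. While it is running, stop the NFS service and try to unmount `/mnt/local` from another shell.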
    =======================
    umount D 50D9BE9E 1928 10010 9972 (NOTLB)
           f3635cec 00000082 00000001 50d9be9e 000227e5 000227d3 00000007 c21fe000
           c2138aa0 50d9d052 000227e5 000011b4 00000001 c21fe10c c20144e0 c042cdf2
           c213e000 f3635cf4 00000286 c042cf03 00000000 00000286 243cdf63 243cdf63
    Call Trace:
     [<c042cdf2>] lock_timer_base+0x15/0x2f
     [<c042cf03>] __mod_timer+0x99/0xa3
     [<c05fa67c>] schedule_timeout+0x71/0x8c
     [<c042c517>] process_timeout+0x0/0x5
     [<c05fa141>] io_schedule_timeout+0x3b/0x61
     [<c04d1beb>] blk_congestion_wait+0x53/0x67
     [<c04352a1>] autoremove_wake_function+0x0/0x2d
     [<c0452876>] balance_dirty_pages_ratelimited_nr+0x147/0x1ad
     [<c044eed9>] generic_file_buffered_write+0x4be/0x5f1
     [<c046b323>] __getblk+0x30/0x270
     [<c0427f65>] current_fs_time+0x4a/0x55
     [<c044f4b2>] __generic_file_aio_write_nolock+0x4a6/0x52a
     [<c0451c99>] get_page_from_freelist+0x2a6/0x310
     [<c044f58d>] generic_file_aio_write+0x57/0xaa
     [<f888ee89>] ext3_file_write+0x19/0x83 [ext3]
     [<c04691e6>] do_sync_write+0xb6/0xf1
     [<c04352a1>] autoremove_wake_function+0x0/0x2d
     [<c05fc7cc>] do_page_fault+0x2c7/0x5d5
     [<c05fc841>] do_page_fault+0x33c/0x5d5
     [<c0469daf>] generic_file_llseek+0x8f/0x9a
     [<c0469130>] do_sync_write+0x0/0xf1
     [<c0469a9f>] vfs_write+0xa1/0x143
     [<c046a091>] sys_write+0x3c/0x63
     [<c0403eff>] syscall_call+0x7/0xb
    =======================

The unsupported part, which does not work reliably, is running the client and the server on the same system, in the same operating system instance. This is a well-known problem. It is very difficult to fix and will not be fixed in an existing RHEL release.

Thank you. Could you please confirm that running the NFS client and server on the same box IS supported if they are not the same filesystem?

Yes, there's no problem running a host that is both an NFS client and an NFS server, as long as it's not serving to itself.
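Following the last answer, one way to catch the unsupported case is to look for NFS mounts whose server address belongs to the host itself. The sketch below is hypothetical. It assumes mount entries in the `/proc/mounts` format (device, mount point, type, options, ...) with an `addr=` option, as in the mount output above. It also assumes the caller supplies the host's own addresses. It will miss a self-mount made through a local address that is left out of that set.

```python
def parse_mount_line(line):
    """Split a /proc/mounts-style line into (device, mountpoint, fstype, [options])."""
    fields = line.split()
    if len(fields) < 4:
        return None
    device, mountpoint, fstype, options = fields[:4]
    return device, mountpoint, fstype, options.split(",")

def nfs_server_addr(options):
    """Return the value of the addr= mount option, or None if it is absent."""
    for opt in options:
        if opt.startswith("addr="):
            return opt[len("addr="):]
    return None

def self_served_nfs_mounts(mount_lines, local_addrs):
    """List (device, mountpoint) for NFS mounts served from one of local_addrs.

    Loopback addresses are always treated as local.
    """
    local = set(local_addrs) | {"127.0.0.1", "::1"}
    hits = []
    for line in mount_lines:
        parsed = parse_mount_line(line)
        if parsed is None:
            continue
        device, mountpoint, fstype, options = parsed
        if fstype not in ("nfs", "nfs4"):
            continue
        if nfs_server_addr(options) in local:
            hits.append((device, mountpoint))
    return hits
```

On the reporter's box, `self_served_nfs_mounts(open("/proc/mounts"), {"192.168.2.100"})` would be expected to flag `/mnt/NFS`, provided the kernel lists `addr=` in `/proc/mounts`.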