Description of problem:

I am having trouble sharing out a GFS filesystem via NFS. I have a two node
cluster (active/passive) that is intended to provide NFS shares to a number of
clients. It appears that one node crashes, or both nodes hang, under heavy load
(sustained reads and writes by one or more NFS clients for any length of time).
The cluster works fine for I/O performed directly on the nodes - for example, I
can run bonnie++ for several days on the nodes themselves without problems - but
running bonnie++ against the GFS filesystem over NFS causes a crash or a hang
within an hour or so.

The crashes result in a kernel "Oops" and require the crashed node to be reset.
The hangs are a little more complicated - both nodes appear to "freeze" the GFS
filesystem, and any GFS-related activity (gfs_tool df, umount, etc.) just hangs.
I have been unable to find a clean way to recover from this situation - attempts
to umount the filesystem just cause the umount to hang. The only way I have
found to deal with it is to take down one node's ethernet interface; the other
node then notices that it is not receiving heartbeats, fences it, and continues
on without any indication of a problem.

I am using the "RHEL4 cluster" branch from CVS and the 2.6.9-5.0.5.ELsmp kernel.
I am using lock_dlm locking, and the filesystem was created via:

  gfs_mkfs -r 1536 -j 3 -p lock_dlm -t ftp:dds_space /dev/mapper/ftp_space-erc1

My cluster configuration is pretty simple - sanbox2 fencing with two nodes and
the two-node option set (<cman two_node="1" expected_votes="1">).

Version-Release number of selected component (if applicable):
Cluster from CVS, current as of May 20, 2005, RHEL4 branch

How reproducible:
Every time.

Steps to Reproduce:
1. Make the GFS filesystem:
   gfs_mkfs -r 1536 -j 3 -p lock_dlm -t ftp:dds_space /dev/mapper/ftp_space-erc1
2. Mount the filesystem on both nodes:
   mount -t gfs /dev/mapper/ftp_space-erc1 /mnt/foo
3. Mount the filesystem via NFS on a remote machine:
   mount -t nfs host:/mnt/foo path-to-nfs-mounted-space
4. Run bonnie++ on the filesystem from the remote machine:
   bonnie++ -d path-to-nfs-mounted-space -f -x 1000
5. Wait an hour or two for the master node to crash.
(A consolidated sketch of these steps is included at the end of this report.)

Actual results:

One of two things happens - either a hang or a kernel "Oops". In the case of a
hang, both nodes appear to "freeze" the GFS filesystem, and any GFS-related
activity (gfs_tool df, umount, etc.) just hangs. I have been unable to find a
clean way to recover from this situation - attempts to umount the filesystem
just cause the umount to hang. The only way I have found to deal with it is to
take down one node's ethernet interface; the other node then notices that it is
not receiving heartbeats, fences it, and continues on without any indication of
a problem.
In the case of the kernel "Oops", the master node crashes. Here is the output
from one of the crashes:

Jun 9 19:23:46 jin kernel: send_arp uses obsolete (PF_INET,SOCK_PACKET)
Jun 9 19:28:06 jin kernel: Bad page state at prep_new_page (in process 'nfsd', page c159f4e0)
Jun 9 19:28:06 jin kernel: flags:0x20001020 mapping:f6a300e0 mapcount:0 count:2
Jun 9 19:28:06 jin kernel: Backtrace:
Jun 9 19:28:06 jin kernel: [<c013e669>] bad_page+0x58/0x89
Jun 9 19:28:06 jin kernel: [<c013e9ec>] prep_new_page+0x24/0x3a
Jun 9 19:28:06 jin kernel: [<c013eef8>] buffered_rmqueue+0x17d/0x1a5
Jun 9 19:28:06 jin kernel: [<c013efd4>] __alloc_pages+0xb4/0x298
Jun 9 19:28:06 jin kernel: [<c013baa2>] find_lock_page+0x96/0x9d
Jun 9 19:28:06 jin kernel: [<c013d16d>] generic_file_buffered_write+0x10d/0x47c
Jun 9 19:28:06 jin kernel: [<c013bac1>] find_or_create_page+0x18/0x72
Jun 9 19:28:06 jin kernel: [<c013b775>] wake_up_page+0x9/0x29
Jun 9 19:28:06 jin kernel: [<c013d85e>] generic_file_aio_write_nolock+0x382/0x3b0
Jun 9 19:28:06 jin kernel: [<c013d910>] generic_file_write_nolock+0x84/0x99
Jun 9 19:28:06 jin kernel: [<f8f96e5f>] gfs_glock_nq+0xe3/0x116 [gfs]
Jun 9 19:28:06 jin kernel: [<c011e8d2>] autoremove_wake_function+0x0/0x2d
Jun 9 19:28:06 jin kernel: [<f8fb7658>] gfs_trans_begin_i+0xfd/0x15a [gfs]
Jun 9 19:28:06 jin kernel: [<f8faadd2>] do_do_write_buf+0x268/0x3b4 [gfs]
Jun 9 19:28:06 jin kernel: [<f8fab02e>] do_write_buf+0x110/0x152 [gfs]
Jun 9 19:28:06 jin kernel: [<f8faa238>] walk_vm+0xd3/0xf7 [gfs]
Jun 9 19:28:06 jin kernel: [<f8f9709a>] gfs_glock_dq+0x111/0x11f [gfs]
Jun 9 19:28:06 jin kernel: [<f8fab10d>] gfs_write+0x9d/0xb6 [gfs]
Jun 9 19:28:06 jin kernel: [<f8faaf1e>] do_write_buf+0x0/0x152 [gfs]
Jun 9 19:28:06 jin kernel: [<f8faa238>] walk_vm+0xd3/0xf7 [gfs]
Jun 9 19:28:06 jin kernel: [<f8f9709a>] gfs_glock_dq+0x111/0x11f [gfs]
Jun 9 19:28:06 jin kernel: [<f8fab10d>] gfs_write+0x9d/0xb6 [gfs]
Jun 9 19:28:06 jin kernel: [<f8faaf1e>] do_write_buf+0x0/0x152 [gfs]
Jun 9 19:28:06 jin kernel: [<f8fab070>] gfs_write+0x0/0xb6 [gfs]
Jun 9 19:28:06 jin kernel: [<c0155ba8>] do_readv_writev+0x1c5/0x21d
Jun 9 19:28:06 jin kernel: [<c0154c92>] dentry_open+0xf0/0x1a5
Jun 9 19:28:06 jin kernel: [<c0155c7e>] vfs_writev+0x3e/0x43
Jun 9 19:28:06 jin kernel: [<f8c11b6b>] nfsd_write+0xeb/0x289 [nfsd]
Jun 9 19:28:06 jin kernel: [<f8b2d5db>] svcauth_unix_accept+0x2d3/0x34a [sunrpc]
Jun 9 19:28:06 jin kernel: [<f8c18356>] nfsd3_proc_write+0xbf/0xd5 [nfsd]
Jun 9 19:28:06 jin kernel: [<f8c1a3a8>] nfs3svc_decode_writeargs+0x0/0x243 [nfsd]
Jun 9 19:28:06 jin kernel: [<f8c0e5d7>] nfsd_dispatch+0xba/0x16f [nfsd]
Jun 9 19:28:06 jin kernel: [<f8b2a446>] svc_process+0x420/0x6d6 [sunrpc]
Jun 9 19:28:06 jin kernel: [<f8c0e3b7>] nfsd+0x1cc/0x332 [nfsd]
Jun 9 19:28:06 jin kernel: [<f8c0e1eb>] nfsd+0x0/0x332 [nfsd]
Jun 9 19:28:06 jin kernel: [<c01041f1>] kernel_thread_helper+0x5/0xb
Jun 9 19:28:06 jin kernel: Trying to fix it up, but a reboot is needed
Jun 9 19:30:34 jin kernel: ------------[ cut here ]------------
Jun 9 19:30:34 jin kernel: kernel BUG at mm/vmscan.c:377!
Jun 9 19:30:34 jin kernel: invalid operand: 0000 [#1]
Jun 9 19:30:34 jin kernel: SMP
Jun 9 19:30:34 jin kernel: Modules linked in: lock_dlm(U) dlm(U) cman(U) gfs(U) lock_harness(U) dm_mod qla2300 qla2xxx scsi_transport_fc nfsd exportfs lockd autofs4 i2c_dev i2c_core md5 ipv6 sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables button battery ac uhci_hcd ehci_hcd e1000 floppy ext3 jbd raid1 ata_piix libata sd_mod scsi_mod
Jun 9 19:30:34 jin kernel: CPU: 1
Jun 9 19:30:34 jin kernel: EIP: 0060:[<c01447bd>] Tainted: GF B VLI
Jun 9 19:30:34 jin kernel: EFLAGS: 00010202 (2.6.9-5.0.5.ELsmp)
Jun 9 19:30:34 jin kernel: EIP is at shrink_list+0xa9/0x3ee
Jun 9 19:30:34 jin kernel: eax: 20001049 ebx: f7cedecc ecx: c159f4f8 edx: c10f24d8
Jun 9 19:30:34 jin kernel: esi: c159f4e0 edi: 00000021 ebp: f7cedf58 esp: f7cede54
Jun 9 19:30:34 jin kernel: ds: 007b es: 007b ss: 0068
Jun 9 19:30:34 jin kernel: Process kswapd0 (pid: 44, threadinfo=f7ced000 task=f7d1b7b0)
Jun 9 19:30:34 jin kernel: Stack: 00000001 00000000 00000000 00000000 f7cedecc f7cede68 f7cede68 00000000
Jun 9 19:30:34 jin kernel: 00000001 c12f4be0 c1204a00 00000246 f7ceded4 c0319e00 00000000 f7ceded4
Jun 9 19:30:34 jin kernel: c0143bc0 c10639f8 00000296 c1f479c0 c10639e0 00000000 00000020 f7ced000
Jun 9 19:30:34 jin kernel: Call Trace:
Jun 9 19:30:34 jin kernel: [<c0143bc0>] __pagevec_release+0x15/0x1d
Jun 9 19:30:34 jin kernel: [<c0144cdf>] shrink_cache+0x1dd/0x34d
Jun 9 19:30:34 jin kernel: [<c014539d>] shrink_zone+0xa7/0xb6
Jun 9 19:30:34 jin kernel: [<c0145740>] balance_pgdat+0x1b6/0x2f8
Jun 9 19:30:34 jin kernel: [<c014594c>] kswapd+0xca/0xcc
Jun 9 19:30:34 jin kernel: [<c011e8d2>] autoremove_wake_function+0x0/0x2d
Jun 9 19:30:34 jin kernel: [<c02c6206>] ret_from_fork+0x6/0x14
Jun 9 19:30:34 jin kernel: [<c011e8d2>] autoremove_wake_function+0x0/0x2d
Jun 9 19:30:34 jin kernel: [<c0145882>] kswapd+0x0/0xcc
Jun 9 19:30:34 jin kernel: [<c01041f1>] kernel_thread_helper+0x5/0xb
Jun 9 19:30:34 jin kernel: Code: 71 e8 89 50 04 89 02 c7 41 04 00 02 20 00 c7 01 00 01 10 00 f0 0f ba 69 e8 00 19 c0 85 c0 0f 85 b8 02 00 00 8b 41 e8 a8 40 74 08 <0f> 0b 79 01 41 9a 2d c0 8b 41 e8 f6 c4 20 0f 85 96 02 00

Expected results:
Not crashing or hanging.

Additional info:
Here is my cluster.conf, in case it is needed:

<?xml version="1.0"?>
<cluster name="ftp" config_version="1">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="jin-p">
      <fence>
        <method name="single">
          <device name="sanbox2" port="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="mugen-p">
      <fence>
        <method name="single">
          <device name="sanbox1" port="1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="sanbox1" agent="fence_sanbox2" ipaddr="10.0.19.30" login="admin" passwd="p00-sm3llz"/>
    <fencedevice name="sanbox2" agent="fence_sanbox2" ipaddr="10.0.19.31" login="admin" passwd="p00-sm3llz"/>
  </fencedevices>
  <fence_daemon post_join_delay="20">
  </fence_daemon>
</cluster>
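For convenience, here is the whole reproduction sequence (and the fencing
workaround) as a shell sketch. The /etc/exports entry, the mount points
/mnt/foo and /mnt/nfs_foo, the client-side host name "host", and the ethernet
interface name are placeholders/assumptions; everything else is taken from the
steps above.

  # On both cluster nodes (run gfs_mkfs only once, from a single node):
  gfs_mkfs -r 1536 -j 3 -p lock_dlm -t ftp:dds_space /dev/mapper/ftp_space-erc1
  mount -t gfs /dev/mapper/ftp_space-erc1 /mnt/foo

  # Export the GFS mount over NFS from the active node
  # (export options are an assumption; adjust as needed):
  echo '/mnt/foo *(rw,sync)' >> /etc/exports
  exportfs -ra

  # On the NFS client ("host" = the active cluster node):
  mount -t nfs host:/mnt/foo /mnt/nfs_foo
  bonnie++ -d /mnt/nfs_foo -f -x 1000   # sustained reads/writes; a node hangs or oopses within an hour or two

  # Workaround used to recover from the hang: drop the hung node's cluster
  # interface so the other node stops seeing heartbeats and fences it
  # (interface name is an assumption):
  ifdown eth0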
*** Bug 160225 has been marked as a duplicate of this bug. ***
*** Bug 160226 has been marked as a duplicate of this bug. ***
Are you still seeing this problem with the latest RHEL4 or -STABLE version of the software? We were not able to reproduce it in our internal test environments for RHEL4 U1.
I am no longer seeing this problem since upgrading to 2.6.9-11.EL and the RHEL4 branch of CVS from Jun 27.