Bug 633567
Summary: [LSI 6.1 bug] [CR184101] - RHCS: Node reboot hangs on shut down and causes hung processes throughout the cluster

Product: Red Hat Enterprise Linux 6
Component: cluster
Version: 6.1
Hardware: x86_64
OS: Linux
Severity: high
Priority: low
Status: CLOSED NOTABUG
Reporter: Sean Stewart <Sean.Stewart>
Assignee: Ryan O'Hara <rohara>
QA Contact: Cluster QE <mspqa-list>
CC: abdel.sadek, andriusb, ccaulfie, chris.chavez, cluster-maint, dl-iop-bugzilla, jwest, lhh, rpacheco, rpeterso, Sean.Stewart, swhiteho, teigland
Target Milestone: rc
Target Release: 6.1
Doc Type: Bug Fix
Bug Blocks: 580566
Last Closed: 2011-02-02 15:48:32 UTC

Description
Sean Stewart
2010-09-14 00:10:13 UTC
Created attachment 447092 [details]
cluster.conf
Created attachment 447093 [details]
Shows the soft panic and some of the messages around it, as a node leaves the cluster
You're having fence_scsi problems:

fenced[2931]: fence kswc-snoopy dev 0.0 agent fence_scsi result: error from agent

You don't have fence_scsi configured correctly (it has changed for RHEL6). We don't seem to have a good comprehensive description of how to configure fence_scsi (I'll look into this) -- for now the most relevant advice for you seems to be in the fence_node(8) man page. Your cluster.conf should look like this:

<clusternode name="node1" nodeid="1">
  <fence>
    <method name="1">
      <device name="scsi"/>
    </method>
  </fence>
  <unfence>
    <device name="scsi" action="on"/>
  </unfence>
</clusternode>
<clusternode name="node2" nodeid="2">
  <fence>
    <method name="1">
      <device name="scsi"/>
    </method>
  </fence>
  <unfence>
    <device name="scsi" action="on"/>
  </unfence>
</clusternode>
<clusternode name="node3" nodeid="3">
  <fence>
    <method name="1">
      <device name="scsi"/>
    </method>
  </fence>
  <unfence>
    <device name="scsi" action="on"/>
  </unfence>
</clusternode>
<clusternode name="node4" nodeid="4">
  <fence>
    <method name="1">
      <device name="scsi"/>
    </method>
  </fence>
  <unfence>
    <device name="scsi" action="on"/>
  </unfence>
</clusternode>
<fencedevices>
  <fencedevice agent="fence_scsi" name="scsi"/>
</fencedevices>

I initially tried configuring it exactly like we did in RHEL 5, which did not work, and then I tried the configuration I attached. Thanks for the direction, I'll give it a try sometime today, though I do have one more question: when setting up fencing through the web interface (luci), it requires selecting a fence type, a name, and a nodename. The first two should be "fence_scsi" and "scsi", but I am unsure what to put for "nodename". Does it actually make any difference? Your example leaves it off entirely within the <fencedevice/> tag.

(In reply to comment #6)
> I initially tried configuring it exactly like we did in RHEL 5, which did not
> work, and then I tried the configuration I attached. Thanks for the direction,
> I'll give it a try sometime today, though I do have one more question:

As Dave pointed out in comment #4, the configuration of fence_scsi has changed in RHEL6. Specifically, you'll need to add unfence for each clusternode.

> When setting up fencing through the web interface (luci), it requires selecting
> a fence type, a name, and a nodename. The first two should be "fence_scsi" and
> "scsi", but I am unsure what to put for "nodename". Does it actually make any
> difference? Your example leaves it off entirely within the <fencedevice/> tag.

The cluster.conf attached to this BZ was created via luci? If so, this seems like a luci problem, since it is not able to configure fence_scsi correctly. The "nodename" is no longer required, if I recall. It may be required in luci, but it should not be needed by fence_scsi in RHEL6. This was a way to pass the name of the node performing the fence op to the agent, which was necessary in RHEL5.

nodename was used in RHEL5 to pass the victim's name to the fence agent. In RHEL6, fenced automatically adds the nodename of the victim to the arg list if it's not already present. So putting nodename in cluster.conf will cause fenced to not add it automatically.

I've made the changes described and I'm still having some problems. I rebooted all four nodes just to be safe, and now it appears they are all having trouble unfencing themselves as they try to join the cluster. Cman appears to start, but clvmd refuses to.
I tried to manually unfence a node using fence_node and got the following output:

[root@kswc-nightrod ~]# fence_node kswc-nightrod -Uvv
unfence kswc-nightrod dev 0.0 agent fence_scsi result: error from agent
agent args: action=on nodename=kswc-nightrod agent=fence_scsi
unfence kswc-nightrod failed

From the messages file:

Sep 14 14:31:56 kswc-nightrod dlm_controld[12823]: dlm_join_lockspace no fence domain
Sep 14 14:31:56 kswc-nightrod dlm_controld[12823]: process_uevent online@ error -1 errno 11
Sep 14 14:31:56 kswc-nightrod kernel: dlm: clvmd: group join failed -1 -1
Sep 14 14:31:56 kswc-nightrod clvmd: Unable to create lockspace for CLVM: Operation not permitted
Sep 14 14:32:02 kswc-nightrod kernel: dlm: Using TCP for communications
Sep 14 14:32:02 kswc-nightrod dlm_controld[12823]: dlm_join_lockspace no fence domain
Sep 14 14:32:02 kswc-nightrod dlm_controld[12823]: process_uevent online@ error -1 errno 11
Sep 14 14:32:02 kswc-nightrod kernel: dlm: rgmanager: group join failed -1 -1

I bet I'm still missing something, here..

Created attachment 447313 [details]
Current cluster.conf file
It'll be easier to debug if you chkconfig off: cman, clvmd, gfs2, rgmanager to minimize the "noise". For now all we want to test is "service cman start" on all nodes.

To get more debug information from the fence_scsi agent, please add the logfile option:

<fencedevice agent="fence_scsi" name="scsi" logfile="/var/log/cluster/fence_scsi.log"/>

And then try 'service cman start' on all the nodes.

(In reply to comment #7) Thanks for pointing out the problem in Luci, bug #633983 was filed. Support for generating "unfencing" sections by Luci is the subject of bug #622562.

(In reply to comment #11)
> I've made the changes described and I'm still having some problems. I rebooted
> all four nodes just to be safe and now it appears they are all having trouble
> unfencing themselves as they try to join the cluster. Cman appears to start, but
> clvmd refuses to.

If clvmd refuses to start and you don't specify your devices manually, then there is nothing to do -- fence_scsi will not have any devices to register with.

> I tried to manually unfence a node using fence_node and got the following
> output:
> [root@kswc-nightrod ~]# fence_node kswc-nightrod -Uvv
> unfence kswc-nightrod dev 0.0 agent fence_scsi result: error from agent
> agent args: action=on nodename=kswc-nightrod agent=fence_scsi
> unfence kswc-nightrod failed

As Dave suggested in comment #13, using the logfile parameter will be helpful.

> If clvmd refuses to start and you don't specify your devices manually, then
> there is nothing to do -- fence_scsi will not have any devices to register
> with.
init.d/cman (which includes unfencing) starts before init.d/clvmd and cannot depend on clvmd. This is one reason why fence_scsi overrides clvmd locking when it looks for clustered volumes.
The problem could be related to finding clustered lvm devices (the log file should show us), but clvmd should not be the cause.
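To see what fence_scsi has to work with before clvmd is up, the clustered-volume check it performs can be approximated from the shell. This is only a sketch of the idea (the agent's internal logic may differ), using the same locking override suggested later in this bug:

# List volume groups and their PVs with LVM cluster locking disabled,
# then keep only VGs whose attribute string ends in "c" (clustered).
vgs --noheadings --options vg_name,vg_attr,pv_name \
    --config 'global { locking_type = 0 }' | awk '$2 ~ /c$/'

If this prints nothing, fence_scsi has no devices to register with and unfencing will fail, which matches the behavior described below.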
(In reply to comment #16)
> > If clvmd refuses to start and you don't specify your devices manually, then
> > there is nothing to do -- fence_scsi will not have any devices to register
> > with.
>
> init.d/cman (which includes unfencing) starts before init.d/clvmd and cannot
> depend on clvmd. This is one reason why fence_scsi overrides clvmd locking
> when it looks for clustered volumes.
>
> The problem could be related to finding clustered lvm devices (the log file
> should show us), but clvmd should not be the cause.

The fence_scsi agent overrides clvmd locking, so unfencing can take place before clvmd starts. My statement in comment #15 is incorrect, as Dave pointed out. I should have asked this earlier, but are we sure that the devices being used support SCSI persistent reservations? Can you manually register with the devices using sg_persist?

I turned on logging as described. When I start cman, the log shows this:

fence_scsi: [error] key cannot be zero

If you are asking if the devices support persistent reservations, in general, then the answer is yes. I have seen persistent reservations placed on these devices under other operating systems. I also tried: sg_persist --register --out -v /dev/sdi and it outputs

inquiry cdb: 12 00 00 00 24 00
  LSI  VirtualDisk  9775
  Peripheral device type: disk
Persistent Reservation Out cmd: 5f 00 00 00 00 00 00 00 18 00
PR out: command (Register) successful

but I do not see any reservation, and running sg_persist -k /dev/sdi shows there is still no registered key.

(In reply to comment #19)
> I turned on logging as described. When I start cman, the log shows this:
>
> fence_scsi: [error] key cannot be zero

Can you run cman_tool? Since you do not have keys manually defined (which is fine), fence_scsi should be generating keys from the cluster_id and nodeid.

Get cluster_id:
% cman_tool status

Get nodeid:
% cman_tool nodes -n nodename -F id

> If you are asking if the devices support persistent reservations, in general,
> then the answer is yes. I have seen persistent reservations placed on these
> devices under other operating systems.
>
> I also tried: sg_persist --register --out -v /dev/sdi and it outputs
> inquiry cdb: 12 00 00 00 24 00
> LSI VirtualDisk 9775
> Peripheral device type: disk
> Persistent Reservation Out cmd: 5f 00 00 00 00 00 00 00 18 00
> PR out: command (Register) successful
>
> but I do not see any reservation and running sg_persist -k /dev/sdi shows there
> is still no registered key.

The command listed above does not specify a key value.

(In reply to comment #20)
> (In reply to comment #19)
> > I turned on logging as described. When I start cman, the log shows this:
> >
> > fence_scsi: [error] key cannot be zero
>
> Can you run cman_tool? Since you do not have keys manually defined (which is
> fine), fence_scsi should be generating keys from the cluster_id and nodeid.

Note that I assume the cluster.conf file attached in comment #12 is the config file you are currently using.
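For reference, a rough sketch of how the generated key appears to relate to those two values. This assumes the key is simply the 16-bit cluster id followed by the 16-bit node id, each as four hex digits -- which matches the key 1cd90001 seen later in this bug -- but the exact scheme fence_scsi uses is not spelled out here:

# both values come from cman_tool, as described above
cluster_id=$(cman_tool status | awk -F': ' '/^Cluster Id/ {print $2}')
node_id=$(cman_tool nodes -n $(hostname -s) -F id)   # assumes the cluster node name matches hostname -s

# assumption: key = <cluster id as 4 hex digits><node id as 4 hex digits>
printf 'expected key: %.4x%.4x\n' "$cluster_id" "$node_id"
# an empty or zero cluster id / node id here would be one way to end up
# with the "key cannot be zero" error reported above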
Correct, that is the configuration file I am currently using.

Here is the output of the two commands:

[root@kswc-nightrod home]# cman_tool status
Version: 6.2.0
Config Version: 33
Cluster Name: clus-1284566409
Cluster Id: 62085
Cluster Member: Yes
Cluster Generation: 660
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Node votes: 1
Quorum: 3
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: kswc-nightrod
Node ID: 1
Multicast addresses: 239.192.242.120
Node addresses: 135.15.74.122

[root@kswc-nightrod home]# cman_tool nodes -n kswc-nightrod -F id
1

Thanks for the information. Could you also check to see if selinux is enabled and enforcing? You might also want to check for any AVC denials in the audit log. It is possible that this is the reason the key is not being written to file (/var/lib/cluster/fence_scsi.key) when unfencing occurs. Check to see that the file exists. It should contain the local node's key value.

I tested fence_scsi this evening with the latest RHEL6 build and it was working as expected, but only when selinux was disabled/permissive. The BZ for the selinux issue can be found here: https://bugzilla.redhat.com/show_bug.cgi?id=634357

I also recommend listing the cluster volume groups. Since no devices are manually defined in the cluster.conf file, fence_scsi will use all devices that exist in cluster volumes ("c" attribute).

I thought I had recreated the problem today on my own cluster, but that may not be the case. What I do know is that unfencing will fail if no devices are found. That means that if you don't have "devices=" configured for fence_scsi and you don't have any cluster volumes, fence_scsi unfencing will fail. This is correct behavior. If unfencing fails, then the node will not join the fence domain and it will not start dlm_controld, and thus you will not be able to start clvmd. This sounds exactly like the scenario I recreated today, which was due to the fact that fence_scsi found no devices via the vgs command.

My advice is to run this:

% vgs --config 'global { locking_type = 0 }'

Look for VGs with the 'c' attribute. I am confused about the "key cannot be zero" you reported in comment #19. I've not been able to recreate that.

Here is the output of the above command.

[root@kswc-nightrod ~]# vgs --config 'global { locking_type = 0 }'
  WARNING: Locking disabled. Be careful! This could corrupt your metadata.
  VG                #PV #LV #SN Attr   VSize   VFree
  lvm_vg             24   8   0 wz--nc 119.91g 15.81g
  vg_dhcp1351574122   1   3   0 wz--n-  67.88g      0

lvm_vg is the cluster vg, and it looks like it has the c attribute. I have locking_type = 3 in /etc/lvm/lvm.conf

Also I checked yesterday and I do not believe /var/lib/cluster/fence_scsi.key existed when I tried. Selinux is set to disabled in our kickstart file, so I don't think that can be it either.

Does unfencing work if you explicitly define keys for each node in your cluster.conf? That information will help narrow down the possibilities. Example for comment 26:

<clusternode name="node1" nodeid="1">
  <fence>
    <method name="1">
      <device name="scsi" key="1"/>
    </method>
  </fence>
  <unfence>
    <device name="scsi" key="1" action="on"/>
  </unfence>
</clusternode>
<clusternode name="node2" nodeid="2">
  <fence>
    <method name="1">
      <device name="scsi" key="2"/>
    </method>
  </fence>
  <unfence>
    <device name="scsi" key="2" action="on"/>
  </unfence>
</clusternode>
...
<fencedevice agent="fence_scsi" name="scsi" logfile="/var/log/cluster/fence_scsi.log"/>
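Whichever way the keys are defined, whether unfencing actually placed registrations can be checked directly on one of the shared LUNs with sg_persist. A quick sketch -- the device name is illustrative; use one of the PVs in the clustered VG:

# each node that has unfenced should appear as a registered key,
# and there should be a single reservation on the device
sg_persist --in --read-keys /dev/sdi
sg_persist --in --read-reservation /dev/sdi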
I tried the above changes, and the nodes are now placing persistent reservations, and fencing / unfencing seems to work properly.

I rebooted a node and the resources transferred, and there was no I/O timeout this time. I guess it must be because the host is not automatically generating a key like it should. The nodes will still indefinitely hang on shutdown.. It looks like the node will fail to leave the cluster domain because some of the gfs2 filesystems are still active. I'll have to see if I can get console redirection of that.

Update: I was actually able to reboot one of the nodes and it shut down, and everything happened as expected. It's almost looking hardware specific: all three of the servers that hang are one brand, and the server that worked is a different brand. On one of the hung nodes, I waited about 30 minutes for it to shut down, and it didn't. At that time I tried hitting ctrl+alt+del and it started showing messages like the following, again:

GFS2: fsid=clus-1284028638: gfs_vol5.3: gfs2_quotad: statfs error -5

Looks like we're back to the original issue, only now it no longer causes the entire cluster to hang, or cause I/O to time out. What does statfs error -5 mean?

I installed RH6 RC1 and I am having some issues. If I have the "<unfence>" XML block in cluster.conf, as specified above, cman will fail to start when the node tries to unfence itself. The log file gives the following: "fence_scsi: [error] no devices found". I am not sure what it is expecting, as there are 24 LUNs mapped to the host group, and 8 logical volumes.

Also, with the fence method defined like this:

<method name="1">
  <device name="Persistent_Reserve" key="1"/>
</method>

the host does not appear to place persistent reservations on the LUNs, again.

(In reply to comment #31)
> I installed RH6 RC1 and I am having some issues. If I have the "<unfence>" XML
> block in cluster.conf, as specified above, cman will fail to start when the
> node tries to unfence itself. The log file gives the following: "fence_scsi:
> [error] no devices found". I am not sure what it is expecting, as there are 24
> LUNs mapped to the host group, and 8 logical volumes.
>
> Also, with the fence method defined like this:
> <method name="1">
> <device name="Persistent_Reserve" key="1"/>
> </method>
> the host does not appear to place persistent reservations on the LUNs, again.

It isn't going to put any registrations or reservations on the LUNs because it is not finding the devices. Please check that lvm filters are not interfering. I am going to assume that you did not manually configure devices, so that means fence_scsi will discover cluster volumes and the devices that comprise those volumes. Run this command:

# vgs --options vg_name,vg_attr,pv_name --config 'global { locking_type = 0 }'

You should see a list of all volume groups and all devices (pvs) that exist in those volume groups. Look for volumes with the 'c' attribute.

Okay, I tried something different.. For our testing, our script generates cluster.conf, starts the services, and then creates the lv's and gfs2 filesystems, then mounts them. The volumes did not seem to have the clustered attribute, so I tried re-creating the cluster without the <unfence> tags, so that cman would start. Sure enough, the volumes now show the clustered attribute:

[root@kswc-snoopy home]# vgs --options vg_name,vg_attr,pv_name --config 'global { locking_type = 0 }'
  WARNING: Locking disabled. Be careful! This could corrupt your metadata.
  VG               Attr   PV
  lvm_vg           wz--nc /dev/sdn
  lvm_vg           wz--nc /dev/sdo
  lvm_vg           wz--nc /dev/sdp
  lvm_vg           wz--nc /dev/sdq
  lvm_vg           wz--nc /dev/sdr
  lvm_vg           wz--nc /dev/sds
  lvm_vg           wz--nc /dev/sdt
  lvm_vg           wz--nc /dev/sdu
  lvm_vg           wz--nc /dev/sdv
  lvm_vg           wz--nc /dev/sdw
  lvm_vg           wz--nc /dev/sdx
  lvm_vg           wz--nc /dev/sdy
  lvm_vg           wz--nc /dev/sdb
  lvm_vg           wz--nc /dev/sdc
  lvm_vg           wz--nc /dev/sdd
  lvm_vg           wz--nc /dev/sde
  lvm_vg           wz--nc /dev/sdf
  lvm_vg           wz--nc /dev/sdg
  lvm_vg           wz--nc /dev/sdh
  lvm_vg           wz--nc /dev/sdi
  lvm_vg           wz--nc /dev/sdj
  lvm_vg           wz--nc /dev/sdk
  lvm_vg           wz--nc /dev/sdl
  lvm_vg           wz--nc /dev/sdm
  vg_dhcp135157468 wz--n- /dev/sda2

I added <unfence> back into cluster.conf and rebooted a node. Now that node cannot join the cluster. I added the logfile="" tag again to see if that would give some more information, but now when I start cman, it just says unfencing of the node failed, and the log file is not created.

My other question is regarding having to manually specify keys: is this a bug for which I need to submit another bugzilla? In RHEL5, we've never had to do anything like that, and unfencing did not need to be specified.

Actually, it is working now. However, my questions still stand:

1. Our setup script usually generates the entire cluster.conf file before starting the services. Unfencing seems to fail at this point because the devices will not already be set up. Is it necessary to add in the <unfence> attribute later? Or is there some way to skip the unfencing step on the first startup of cman?

2. Same question as in comment 33: Should the OS be generating the keys, itself? What could cause that to fail? I can file another bug for this, if necessary. Thanks.

(In reply to comment #34)
> Actually, it is working now. However, my questions still stand:

Things have changed in RHEL6. Specifically, the addition of unfencing (and removal of scsi_reserve) and the option to manually define keys/devices. The ability to manually specify keys and/or devices was added to provide greater control, if desired. If these config options are omitted, the keys and devices are determined automatically.

> 1. Our setup script usually generates the entire cluster.conf file before
> starting the services. Unfencing seems to fail at this point because the
> devices will not already be set up. Is it necessary to add in the <unfence>
> attribute later? Or is there some way to skip the unfencing step on the first
> startup of cman?

By "set up" do you mean exist? The devices must exist before fence_scsi unfencing can succeed. I don't know if you can omit the unfence section and add it later. Are you creating your volumes in this setup script? Perhaps you need to set the cluster flag when you created the volume group.

> 2. Same question as in comment 33: Should the OS be generating the keys,
> itself? What could cause that to fail? I can file another bug for this, if
> necessary.

This should be explained in the kbase article in comment #18. Defining keys is optional. If you don't specify a key then one will be generated, just as in RHEL5. If key generation is still failing, then that is a bug that I have not been able to reproduce. Now that you have things semi-working, can you try removing the manually defined keys and test again? Please don't file a new bug yet, since I'm not sure what the state of this bug is.
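Pulling the advice in these comments together, the ordering that matters is that the clustered volume group exists before cman (and therefore unfencing) starts. A sketch with hypothetical device names, matching what is later reported to work (vgcreate with -c y before starting the services):

# initialize the shared LUNs and create the clustered VG up front
# (device names are hypothetical; depending on locking_type in lvm.conf,
# this first vgcreate may need a locking override or clvmd on one node)
pvcreate /dev/sdb /dev/sdc
vgcreate -cy lvm_vg /dev/sdb /dev/sdc   # -cy sets the clustered attribute explicitly

# now cman can start and unfencing will find devices in lvm_vg
service cman start
service clvmd start

# logical volumes and gfs2 filesystems can be created afterwards
lvcreate -n gfs_vol1 -L 10G lvm_vg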
> By "set up" do you mean exist? The devices must exist before fence_scsi
> unfencing can succeed. I don't know if you can omit the unfence section and
> add it later. Are you creating your volumes in this setup script? Perhaps you
> need to set the cluster flag when you created the volume group.

All of the hosts in the cluster see the volumes on the array prior to starting the setup script (they see 24 uninitialized physical volumes on the storage array). The script generates the conf file, starts all of the services, and then issues the pvcreate command to initialize the 24 LUNs, vgcreate to create the volume group, and then lvcreate to create 8 logical volumes out of that volume group.

I suppose pvcreate could be run before starting the services, but when I did that without clvmd running (when unfence did not work), the clustered attribute was not set. Is there some way to explicitly set the flag?

> This should be explained in the kbase article in comment #18. Defining keys is
> optional. If you don't specify a key then one will be generated, just as in
> RHEL5. If key generation is still failing, then that is a bug that I have not
> been able to reproduce. Now that you have things semi-working, can you try
> removing the manually defined keys and test again?
> Please don't file a new bug yet, since I'm not sure what the state of this bug
> is.

Comment #18 does not appear to be there. I can try again in a bit. Though, I believe the key problem is a separate problem from the one I am experiencing, here.

This bug is regarding how a host can sometimes hang on shutdown, or when both host to array I/O cables are pulled. The host seems to have trouble unmounting the GFS2 filesystems, and when the shutdown hang occurs I'll see a message like:

GFS2: fsid=clus-1284028638: gfs_vol5.3: gfs2_quotad: statfs error -5

and this message will print out indefinitely, until I power cycle the host.

(In reply to comment #36)
> > By "set up" do you mean exist? The devices must exist before fence_scsi
> > unfencing can succeed. I don't know if you can omit the unfence section and
> > add it later. Are you creating your volumes in this setup script? Perhaps you
> > need to set the cluster flag when you created the volume group.
>
> All of the hosts in the cluster see the volumes on the array prior to starting
> the setup script...

Right. But they have to see the volume *and* the cluster bit must be set in order for fence_scsi to use them.

> The script generates the conf file, starts all of the services, and then issues
> the pvcreate command to initialize the 24 LUNs, vgcreate to create the volume
> group, and then lvcreate to create 8 logical volumes out of that volume group.
>
> I suppose pvcreate could be run before starting the services, but when I did
> that without clvmd running (when unfence did not work), the clustered attribute
> was not set. Is there some way to explicitly set the flag?

vgcreate -cy <vg_name> pv [pv, ... ]

> > This should be explained in the kbase article in comment #18. Defining keys is
> > optional. If you don't specify a key then one will be generated, just as in
> > RHEL5. If key generation is still failing, then that is a bug that I have not
> > been able to reproduce. Now that you have things semi-working, can you try
> > removing the manually defined keys and test again?
> > Please don't file a new bug yet, since I'm not sure what the state of this bug
> > is.
>
> Comment #18 does not appear to be there.

Ah. Sorry about that. Try this:

https://access.redhat.com/kb/docs/DOC-40127/version

> I can try again in a bit. Though, I believe the key problem is a separate
> problem from the one I am experiencing, here.
> This bug is regarding how a host can sometimes hang on shutdown, or when both
> host to array I/O cables are pulled. The host seems to have trouble unmounting
> the GFS2 filesystems, and when the shutdown hang occurs I'll see a message
> like:
> GFS2: fsid=clus-1284028638: gfs_vol5.3: gfs2_quotad: statfs error -5
> and this message will print out indefinitely, until I power cycle the host.

Fair enough, but let's see if we can get scsi reservations working correctly and then address the gfs2 problem.

(In reply to comment #37)
> > Comment #18 does not appear to be there.
>
> Ah. Sorry about that. Try this:
>
> https://access.redhat.com/kb/docs/DOC-40127/version

Oops. Try this instead:

https://access.redhat.com/kb/docs/DOC-40127

Okay, I have figured out how to set it up so that automatically generated keys will work. The script now creates the volume group, starts the services, and then creates LUNs from there. It now works without specifying keys. This brings us back to the original problem: I tried rebooting a node, and although it's getting fenced out, as expected, it's hanging on shutdown, saying:

GFS2: fsid=clus-1285337966:gfs_vol2.0: gfs2_quotad: statfs error -5

(In reply to comment #39)
> Okay, I have figured out how to set it up so that automatically generated keys
> will work. The script now creates the volume group, starts the services, and
> then creates LUNs from there. It now works without specifying keys.

What did you have to do? It should be just a matter of *not* putting key="X" in the config.

I had to run vgcreate (with -c y) before starting the services on the nodes. Originally, vgcreate would be run after starting the services. Keys would not be generated automatically in that case.

In answer to question #30, in the message:

GFS2: fsid=clus-1284028638: gfs_vol5.3: gfs2_quotad: statfs error -5

Error -5 is -EIO being returned while gfs2 was trying to perform a statfs_sync operation. That means gfs2 was either unable to get an exclusive lock on the master statfs system file, or unable to start a transaction (which can only happen if gfs2 is unable to acquire a shared lock on the transaction glock or unable to reserve space in the journal for the transaction). The call traces given above lead me to believe that gfs2 is simply waiting for the dlm layer below it to respond to its lock requests, which is not unexpected. In theory, dlm should perform lock recovery and eventually respond to gfs2 once it knows the cluster is sane. I'd like to figure out where the -EIO is coming from to see if dlm is doing lock recovery but gfs2's glock layer isn't handling that properly.

Isn't this a simple case of reboot not stopping all the cluster-users properly before stopping the cluster? Oftentimes, extra care needs to be taken when shutting down a node. Try manually doing:

service rgmanager stop
service gfs2 stop
umount -a -t gfs2
service clvmd stop
service cman stop
reboot
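The same stop order, gathered into a small script for convenience -- this just packages the commands suggested above:

#!/bin/sh
# stop cluster users before the cluster infrastructure, then reboot
service rgmanager stop
service gfs2 stop       # unmounts fstab-listed gfs2 filesystems
umount -a -t gfs2       # catch any gfs2 mounts the init script did not handle
service clvmd stop
service cman stop
reboot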
This problem also occurs when both I/O cables are pulled from the host, and the host loses access to the LUNs.

I also ran into a case where I rebooted a node, and it hung during startup, trying to mount the gfs2 filesystems. I let it sit all night, and it still did not come up. I had to reboot all of the nodes to get the cluster to come back up.

> This problem also occurs when both I/O cables are pulled from the host, and the
> host loses access to the LUNs.

GFS is expected to either hang or panic the machine if the storage goes away (I believe there are mount options to control which; the best option is to panic, which allows remaining nodes to recover for it and continue operating.)

> I also ran into a case where I rebooted a node, and it hung during startup,
> trying to mount the gfs2 filesystems. I let it sit all night, and it still did
> not come up. I had to reboot all of the nodes to get the cluster to come back
> up.

To figure that out we'd need to see some diagnostic information, e.g.

cman_tool nodes
group_tool -n
/var/log/messages
ps ax -o pid,stat,cmd,wchan

I noticed that this bug is currently filed against version 6.1. Can someone from RedHat confirm that this feature is not targeted to be fixed in the 6.0 release? Thanks.

(In reply to comment #46)
> I noticed that this bug is currently filed against version 6.1. Can someone
> from RedHat confirm that this feature is not targeted to be fixed in the 6.0
> release? Thanks.

Correct - targeted for RHEL 6.1.

I am now trying to run the same cluster configuration, but this time with device mapper multipath for the failover. I am running into a problem where cman tries to start, but fails because the node fails to unfence itself, and does not output anything to the specified logfile. The hosts see the clustered volume group and the associated devices as follows:

[root@kswc-vfr1200 logs]# vgs --options vg_name,vg_attr,pv_name --config 'global {locking_type = 0}'
  WARNING: Locking disabled. Be careful! This could corrupt your metadata.
  VG                Attr   PV
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475d800000ab6f4c87bbd1
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475d800000ab784c87bc06
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475d800000ab764c87bbf6
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475e1800000ed24c87bd6d
  lvm_vg            wz--nc /dev/mapper/360080e50001b0da000003e034c88786e
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475e1800000ed04c87bd5e
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475e1800000ed44c87bd79
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475d800000ab7a4c87bc14
  lvm_vg            wz--nc /dev/mapper/360080e50001b0e7a00002a6c4c8877ce
  lvm_vg            wz--nc /dev/mapper/360080e50001b0da000003e074c88787d
  lvm_vg            wz--nc /dev/mapper/360080e50001b0da000003e0b4c88788b
  lvm_vg            wz--nc /dev/mapper/360080e50001b0da000003e094c887884
  lvm_vg            wz--nc /dev/mapper/360080e50001b0e7a00002a684c8877be
  lvm_vg            wz--nc /dev/mapper/360080e50001b0e7a00002a6a4c8877c7
  lvm_vg            wz--nc /dev/mapper/360080e50001b0e7a00002a704c8877dd
  lvm_vg            wz--nc /dev/mapper/360080e50001b0e7a00002a724c8877e4
  lvm_vg            wz--nc /dev/mapper/360080e50001b0e7a00002a6e4c8877d5
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475d800000ab744c87bbea
  lvm_vg            wz--nc /dev/mapper/360080e50001b0da000003e004c887861
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475e1800000ec94c87bd39
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475e1800000ece4c87bd4f
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475d800000ab724c87bbde
  lvm_vg            wz--nc /dev/mapper/360080e50001b0da000003e054c887876
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475e1800000ecc4c87bd43
  vg_dhcp1351575155 wz--n- /dev/sda2

Is it possible there needs to be some different configuration to run with the /dev/mapper/3600* devices?

I completely re-created the cluster with DMMP as the failover, and it looks like the hosts now generate keys on their own again, but some sort of failure occurs during registration.
The logs look something like this:

Oct 01 16:32:24 fenced fenced 3.0.12 started
Oct  1 16:32:26 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdc)
Oct  1 16:32:26 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdp)
Oct  1 16:32:26 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdy)
Oct  1 16:32:27 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdan)
Oct  1 16:32:27 fence_scsi: [debug] main::do_reserve (host_key=1cd90001, dev=/dev/dm-17)
Oct  1 16:32:27 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdx)
Oct  1 16:32:28 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdam)
PR out: unit attention
[error]: main::do_register_ignore

I notice that a couple of the devices will have registrations from all the hosts, some will have one or two registrations, but most devices will have no registrations at all.

Also, I would like to clarify what kind of debug you would like for the GFS2 related issue. I am not sure I can reproduce the hang during start up, as I have only seen it happen once.

(In reply to comment #49)
> I completely re-created the cluster with DMMP as the failover, and it looks
> like the hosts now generate keys on their own again, but some sort of failure
> occurs during registration. The logs look something like this:
>
> Oct 01 16:32:24 fenced fenced 3.0.12 started
> Oct  1 16:32:26 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdc)
> Oct  1 16:32:26 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdp)
> Oct  1 16:32:26 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdy)
> Oct  1 16:32:27 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdan)
> Oct  1 16:32:27 fence_scsi: [debug] main::do_reserve (host_key=1cd90001, dev=/dev/dm-17)
> Oct  1 16:32:27 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdx)
> Oct  1 16:32:28 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdam)
> PR out: unit attention
> [error]: main::do_register_ignore
>
> I notice that a couple of the devices will have registrations from all the
> hosts, some will have one or two registrations, but most devices will have no
> registrations at all.
>
> Also, I would like to clarify what kind of debug you would like for the GFS2
> related issue. I am not sure I can reproduce the hang during start up, as I
> have only seen it happen once.

The "PR out: unit attention" message is normal and is being tracked in BZ 640343. We need to keep the fence_scsi issues and the gfs2 issues separate, if possible.

As stated above, we need to isolate the issues being reported and keep them in separate BZs. What is the current issue? This BZ initially reported a problem with gfs2, but has since been mostly about fence_scsi. If there is a gfs2 problem, perhaps that should be reported in a different bug and this BZ should be closed. The current fence_scsi issue is being tracked by BZ 640343.

Sean,

In order for us to ensure this issue gets the attention it deserves, please contact Red Hat support. Information on how to do that can be found here:

https://access.redhat.com/support/contact/

We apologize for any inconvenience here, however bugzilla is not a support tool and all support requests should be routed through the Red Hat global support services (GSS) team.
If you have any questions, please feel free to contact me.

Thanks
Jeremy West
GSS Supervisor

Sean, can I ask a more fundamental question about your configuration? In your cluster.conf you don't have any services listed aside from the ip addresses that move around. You mention in the description that you're mounting NFS shares from cluster nodes and also that you're using GFS2. Can you describe what the purpose of the cluster is? Is it basically taking 4 RHCS nodes, with iSCSI backend storage, and then exposing this backend storage via NFS on top of GFS2 running on the cluster nodes? If so, are you aware that it's presently not supported to run multiple NFS servers accessing the same backend GFS2 filesystem, due to various issues with locking? NFS servers don't have any notion of clusterized locks. If you could give a little more description of the use case here, that would be useful. Thanks.

Also, is there any aspect of hardware enablement here? I'm unclear on how this relates to LSI's hardware portfolio.

I am testing resource availability in various cases.

If running this configuration is not supported, I am not aware of this, because this is how we have been running RHCS on RHEL4 and RHEL5, following a section of the document at http://sources.redhat.com/cluster/doc/nfscookbook.pdf:

"Managed Virtual IP Service

This method of managing NFS failover is more flexible than using Managed NFS Service because it allows you to handcraft your NFS exports as you see fit. However, it is only designed to work on GFS file systems. Using it for nonGFS file systems is not recommended. In this configuration, you must ensure that /etc/exports file is in sync on all nodes, and that nfsd is always running on all nodes (it's not monitored by the cluster; the service doesn't use <nfsexport> in this case), and that the GFS file system is mounted before NFS is started. Therefore, it requires more planning, maintenance and monitoring."

It appears to be centered around running multiple hosts with the same exports and nfs configuration, with gfs, running only the virtual ip addresses as resources.

Do you mean that this configuration is not supported for GFS2 currently, or that this is not supported for both GFS and GFS2 on previous versions of RHEL as well?

Also, the "current" issue is the GFS2 hang issue (where statfs error -5 would be repeated indefinitely) that I described initially. I have worked around it by configuring APC power fencing as the second fence device. Sorry for any confusion, and thank you all for the help so far with configuration.

(In reply to comment #56)
> I am testing resource availability in various cases
>
> If running this configuration is not supported, I am not aware of this, because
> this is how we have been running RHCS on RHEL4 and RHEL5, following a section
> of the document on http://sources.redhat.com/cluster/doc/nfscookbook.pdf

That web site is upstream focused and not official RHEL documentation. The notes on this for RHEL official docs are available here:

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Configuration_Example_-_NFS_Over_GFS/additional_configuration_considerations.html

...

> It appears to be centered around running multiple hosts with the same exports
> and nfs configuration, with gfs, running only the virtual ip addresses as
> resources.
>
> Do you mean that this configuration is not supported for GFS2 currently, or
> that this is not supported for both GFS and GFS2 on previous versions of RHEL
> as well?

It's not supported for GFS1 or GFS2 right now.
There are too many caveats to make it work in a reliable manner. For example, you would need to disallow usage of NFS locks entirely, since a lock taken on one host wouldn't migrate to another host (NFS locks are not clusterized), and you would also need to turn off NFS passing plocks down to GFS/GFS2. Steve Whitehouse (GFS lead developer) has a lot more information on the caveats and pitfalls here, so he might be able to better explain.

Can you explain the exact use case here? I'm a bit confused reading down the bugzilla since it looks like some details might have changed while the initial issues were resolved. I'm looking for a quick description along the lines of:

1) How many nodes?
2) How many GFS2 mounts, and how are they exported via NFS?
3) How is the shared storage attached and what is it? (e.g. an array connected by fibrechannel)
4) What are the GFS2 mount options?
5) What are the NFS export options?
6) What is the problem that you are trying to solve?

Using NFS with GFS2 can be problematic due to the handling of fcntl/POSIX locks. We do support NFS over GFS2, but only in very restricted circumstances (active/passive failover only) in order to avoid these problems. We are actively working on resolving the issues, however this is a medium term project due to the complexity of the problem.

1) 4 nodes in the cluster
2) 8 GFS2 mount points
3) Each node is connected to two LSI storage arrays via iSCSI. These arrays have a total of 24 volumes, which are all pooled to create the logical volumes
4) (rw,relatime,hostdata=jid=1) -- the journal ID, of course, is different for each node
5) *(rw,sync,no_root_squash,fsid=2345600) -- exported to the world because this is a test configuration
6) When I issue "reboot" on one of the nodes, it does not properly shut down the cluster services and reboot. Instead the shutdown hangs indefinitely, and gives the gfs2 statfs error I gave earlier. It looks like the node fails to properly unmount the gfs2 mount points. SCSI reservation fencing removes the node's access to the devices, but the node continues to hang. I have been able to work around this issue by setting up APC power fencing. With that configuration, when the node hangs, another node simply shuts off power and powers the node back on. However, simply issuing a reboot without apc power fencing should also result in the node successfully shutting down and coming back up.

Also, although every mount can be accessed through every IP address, I am only accessing each mount point through one IP address. This configuration does run active/passive failover, too.

Ok, that usage of NFS does sound like it would avoid the locking issues, since no two nodes are exposing the same NFS exports or even the same GFS2 filesystems. SteveW, correct me if that is wrong. Just to eliminate variables here, can you reproduce this w/o NFS servers running at all? i.e. just a cluster w/ GFS2 filesystems mounted, and do a reboot to see if the filesystems cleanly unmount as part of the reboot process. Thanks!

Wrt NFS, you must specify the localflocks mount option on the GFS2 mounts which are to be exported via NFS. Also, only one node should be NFS exporting a single GFS2 filesystem at once (i.e. active/passive failover).

Wrt the shutdown issue, how are the filesystems being mounted? There is a known issue when GFS2 filesystems are mounted via a method other than via fstab and the gfs2 init script. In that case, manually mounted filesystems which are not in fstab will not get umounted correctly at reboot time; an example fstab entry is sketched below.
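For illustration, an fstab entry along these lines lets the gfs2 init script handle both the mount and the unmount at reboot. The device and mount point names are made up; localflocks is only needed on filesystems that are exported via NFS, as described above:

# /etc/fstab
/dev/lvm_vg/gfs_vol1  /mnt/gfs_vol1  gfs2  defaults,localflocks  0 0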
Another possible issue is that there is something still using the filesystem at reboot time which prevents the filesystem from being unmounted. That something might be NFS, for example, or a local process.

I've been busy recently working on another issue with the setup, but I have a couple of things to add here. Previously, I was mounting the GFS2 filesystems via fstab, and they were exported by all of the hosts at all times. As per the suggestions in this thread, I have reconfigured the cluster so that rgmanager manages which gfs2 filesystems are mounted and exported on each host. The results, so far, look promising, as I was able to reboot a server and have it come back up without manual intervention, and without it having to be power fenced. So I am led to believe that the locking issues could be related to having all of the nodes mounting the filesystems at once. That particular issue was not observed in RHEL5.5. Running in this configuration generally seems smoother.

I will attach my current cluster.conf. Does this configuration look more "correct"? Also, is it correct that when you say running multiple NFS servers from the same backend gfs2 filesystems is not currently supported, you intend for it to be supported down the road?

Created attachment 455196 [details]
Current cluster.conf file
Updated 10/22/10
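Sean's actual configuration is in attachment 455196. For readers who cannot see the attachment, a rough sketch of the kind of rgmanager service being described -- one floating IP, one GFS2 mount, and one NFS export per service, so only one node exports a given filesystem at a time. The resource names, device paths, and IP address here are hypothetical, and the attachment remains the authoritative version:

<rm>
  <service autostart="1" name="nfs_vol1" recovery="relocate">
    <ip address="135.15.74.200" monitor_link="1"/>
    <clusterfs device="/dev/lvm_vg/gfs_vol1" fstype="gfs2" mountpoint="/mnt/gfs_vol1"
               name="gfs_vol1" options="localflocks">
      <nfsexport name="gfs_vol1_export">
        <nfsclient name="world" target="*" options="rw,sync,no_root_squash"/>
      </nfsexport>
    </clusterfs>
  </service>
</rm>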
(In reply to comment #62)
> I will attach my current cluster.conf. Does this configuration look more
> "correct"?

Lon, can you review this?

> Also, is it correct that when you say running multiple NFS servers
> from the same backend gfs2 filesystems is not currently supported, you
> intend for it to be supported down the road?

We'd like it to be, but don't have a timeline for when we'll be able to formally support it yet.

Sean,

Is this still an issue with RHEL 6.0 GA?

It seems to me like this is not an issue in general. It only happens when trying to run an active/active configuration with GFS2. If that is not supported yet, at all, then maybe the bug should be closed?

To make this 100% clear: active/active NFS is unsupported on GFS2 at the current time, and will remain so until lockd can be made to work correctly during the recovery phase. Until then, NFS may only be exported from a maximum of one node of the cluster at once, and the localflocks mount argument must be used on each GFS2 mount being exported via NFS. Local workloads must not be mixed with NFS exports of the same filesystem.

This does not mean that other active/active workloads are unsupported. The above applies to NFS only. In general we support active/active workloads, with the proviso that the performance characteristics of the workload are understood and that acceptable performance can be achieved (which is usually the case).

Right, well, I wrote this bug specifically against a configuration with active/active NFS on GFS2. In this instance I can't really say whether or not it is a problem on other configurations.

In which case can we close this as NOTABUG?

Sounds to me like that might be the thing to do.

Ok, closing it now... if you find any other issues, please let us know.