Bug 633567
Summary: [LSI 6.1 bug] [CR184101] - RHCS: Node reboot hangs on shut down and causes hung processes throughout the cluster

Product: Red Hat Enterprise Linux 6
Component: cluster
Version: 6.1
Hardware: x86_64
OS: Linux
Severity: high
Priority: low
Status: CLOSED NOTABUG
Reporter: Sean Stewart <Sean.Stewart>
Assignee: Ryan O'Hara <rohara>
QA Contact: Cluster QE <mspqa-list>
CC: abdel.sadek, andriusb, ccaulfie, chris.chavez, cluster-maint, dl-iop-bugzilla, jwest, lhh, rpacheco, rpeterso, Sean.Stewart, swhiteho, teigland
Target Milestone: rc
Target Release: 6.1
Doc Type: Bug Fix
Bug Blocks: 580566
Last Closed: 2011-02-02 15:48:32 UTC

Description
Sean Stewart
2010-09-14 00:10:13 UTC
Created attachment 447092 [details]
cluster.conf
Created attachment 447093 [details]
Shows the soft panic and some of the messages around it, as a node leaves the cluster
You're having fence_scsi problems:

fenced[2931]: fence kswc-snoopy dev 0.0 agent fence_scsi result: error from agent

You don't have fence_scsi configured correctly (it has changed for RHEL6). We don't seem to have a good comprehensive description of how to configure fence_scsi (I'll look into this) -- for now the most relevant advice for you seems to be in the fence_node(8) man page. Your cluster.conf should look like this:

<clusternode name="node1" nodeid="1">
  <fence>
    <method name="1">
      <device name="scsi"/>
    </method>
  </fence>
  <unfence>
    <device name="scsi" action="on"/>
  </unfence>
</clusternode>
<clusternode name="node2" nodeid="2">
  <fence>
    <method name="1">
      <device name="scsi"/>
    </method>
  </fence>
  <unfence>
    <device name="scsi" action="on"/>
  </unfence>
</clusternode>
<clusternode name="node3" nodeid="3">
  <fence>
    <method name="1">
      <device name="scsi"/>
    </method>
  </fence>
  <unfence>
    <device name="scsi" action="on"/>
  </unfence>
</clusternode>
<clusternode name="node4" nodeid="4">
  <fence>
    <method name="1">
      <device name="scsi"/>
    </method>
  </fence>
  <unfence>
    <device name="scsi" action="on"/>
  </unfence>
</clusternode>
<fencedevices>
  <fencedevice agent="fence_scsi" name="scsi"/>
</fencedevices>

I initially tried configuring it exactly like we did in RHEL 5, which did not work, and then I tried the configuration I attached. Thanks for the direction, I'll give it a try sometime today, though I do have one more question: when setting up fencing through the web interface (luci), it requires selecting a fence type, a name, and a nodename. The first two should be "fence_scsi" and "scsi", but I am unsure what to put for "nodename". Does it actually make any difference? Your example leaves it off entirely within the <fencedevice/> tag.

(In reply to comment #6)
> I initially tried configuring it exactly like we did in RHEL 5, which did not
> work, and then I tried the configuration I attached. Thanks for the direction,
> I'll give it a try sometime today, though I do have one more question:

As Dave pointed out in comment #4, the configuration of fence_scsi has changed in RHEL6. Specifically, you'll need to add unfence for each clusternode.

> When setting up fencing through the web interface (luci), it requires selecting
> a fence type, a name, and a nodename. The first two should be "fence_scsi" and
> "scsi", but I am unsure what to put for "nodename". Does it actually make any
> difference? Your example leaves it off entirely within the <fencedevice/> tag.

The cluster.conf attached to this BZ was created via luci? If so, this seems like a luci problem, since it is not able to configure fence_scsi correctly. The "nodename" is no longer required, if I recall. It may be required in luci, but it should not be needed by fence_scsi in RHEL6. This was a way to pass the name of the node performing the fence op to the agent, which was necessary in RHEL5.

nodename was used in RHEL5 to pass the victim's name to the fence agent. In RHEL6, fenced automatically adds the nodename of the victim to the arg list if it's not already present. So putting nodename in cluster.conf will cause fenced to not add it automatically.

I've made the changes described and I'm still having some problems. I rebooted all four nodes just to be safe, and now it appears they are all having trouble unfencing themselves as they try to join the cluster. Cman appears to start, but clvmd refuses to.
I tried to manually unfence a node using fence_node and got the following output:

[root@kswc-nightrod ~]# fence_node kswc-nightrod -Uvv
unfence kswc-nightrod dev 0.0 agent fence_scsi result: error from agent
agent args: action=on nodename=kswc-nightrod agent=fence_scsi
unfence kswc-nightrod failed

From the messages file:

Sep 14 14:31:56 kswc-nightrod dlm_controld[12823]: dlm_join_lockspace no fence domain
Sep 14 14:31:56 kswc-nightrod dlm_controld[12823]: process_uevent online@ error -1 errno 11
Sep 14 14:31:56 kswc-nightrod kernel: dlm: clvmd: group join failed -1 -1
Sep 14 14:31:56 kswc-nightrod clvmd: Unable to create lockspace for CLVM: Operation not permitted
Sep 14 14:32:02 kswc-nightrod kernel: dlm: Using TCP for communications
Sep 14 14:32:02 kswc-nightrod dlm_controld[12823]: dlm_join_lockspace no fence domain
Sep 14 14:32:02 kswc-nightrod dlm_controld[12823]: process_uevent online@ error -1 errno 11
Sep 14 14:32:02 kswc-nightrod kernel: dlm: rgmanager: group join failed -1 -1

I bet I'm still missing something, here..

Created attachment 447313 [details]
Current cluster.conf file
It'll be easier to debug if you chkconfig off: cman, clvmd, gfs2, rgmanager to minimize the "noise". For now all we want to test is "service cman start" on all nodes.

To get more debug information from the fence_scsi agent, please add the logfile option:

<fencedevice agent="fence_scsi" name="scsi" logfile="/var/log/cluster/fence_scsi.log"/>

And then try 'service cman start' on all the nodes.

(In reply to comment #7) Thanks for pointing out the problem in Luci, bug #633983 was filed. Support for generating "unfencing" sections by Luci is the subject of bug #622562.

(In reply to comment #11)
> I've made the changes described and I'm still having some problems. I rebooted
> all four nodes just to be safe and now it appears they are all having trouble
> unfencing themselves as they try to join the cluster. Cman appears to start, but
> clvmd refuses to.

If clvmd refuses to start and you don't specify your devices manually, then there is nothing to do -- fence_scsi will not have any devices to register with.

> I tried to manually unfence a node using fence_node and got the following
> output:
> [root@kswc-nightrod ~]# fence_node kswc-nightrod -Uvv
> unfence kswc-nightrod dev 0.0 agent fence_scsi result: error from agent
> agent args: action=on nodename=kswc-nightrod agent=fence_scsi
> unfence kswc-nightrod failed

As Dave suggested in comment #13, using the logfile parameter will be helpful.

> If clvmd refuses to start and you don't specify your devices manually, then
> there is nothing to do -- fence_scsi will not have any devices to register
> with.
init.d/cman (which includes unfencing) starts before init.d/clvmd and cannot depend on clvmd. This is one reason why fence_scsi overrides clvmd locking when it looks for clustered volumes.
The problem could be related to finding clustered lvm devices (the log file should show us), but clvmd should not be the cause.
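To see what fence_scsi has to work with before clvmd is up, the clustered-volume check it performs can be approximated from the shell. This is only a sketch of the idea (the agent's internal logic may differ), using the same locking override suggested later in this bug:

# List volume groups and their PVs with LVM cluster locking disabled,
# then keep only VGs whose attribute string ends in "c" (clustered).
vgs --noheadings --options vg_name,vg_attr,pv_name \
    --config 'global { locking_type = 0 }' | awk '$2 ~ /c$/'

If this prints nothing, fence_scsi has no devices to register with and unfencing will fail, which matches the behavior described below.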
(In reply to comment #16)
> > If clvmd refuses to start and you don't specify your devices manually, then
> > there is nothing to do -- fence_scsi will not have any devices to register
> > with.
>
> init.d/cman (which includes unfencing) starts before init.d/clvmd and cannot
> depend on clvmd. This is one reason why fence_scsi overrides clvmd locking
> when it looks for clustered volumes.
>
> The problem could be related to finding clustered lvm devices (the log file
> should show us), but clvmd should not be the cause.

The fence_scsi agent overrides clvmd locking, so unfencing can take place before clvmd starts. My statement in comment #15 is incorrect, as Dave pointed out. I should have asked this earlier, but are we sure that the devices being used support SCSI persistent reservations? Can you manually register with the devices using sg_persist?

I turned on logging as described. When I start cman, the log shows this:

fence_scsi: [error] key cannot be zero

If you are asking if the devices support persistent reservations, in general, then the answer is yes. I have seen persistent reservations placed on these devices under other operating systems. I also tried: sg_persist --register --out -v /dev/sdi and it outputs

inquiry cdb: 12 00 00 00 24 00
  LSI  VirtualDisk  9775
  Peripheral device type: disk
Persistent Reservation Out cmd: 5f 00 00 00 00 00 00 00 18 00
PR out: command (Register) successful

but I do not see any reservation, and running sg_persist -k /dev/sdi shows there is still no registered key.

(In reply to comment #19)
> I turned on logging as described. When I start cman, the log shows this:
>
> fence_scsi: [error] key cannot be zero

Can you run cman_tool? Since you do not have keys manually defined (which is fine), fence_scsi should be generating keys from the cluster_id and nodeid.

Get cluster_id:
% cman_tool status

Get nodeid:
% cman_tool nodes -n nodename -F id

> If you are asking if the devices support persistent reservations, in general,
> then the answer is yes. I have seen persistent reservations placed on these
> devices under other operating systems.
>
> I also tried: sg_persist --register --out -v /dev/sdi and it outputs
> inquiry cdb: 12 00 00 00 24 00
> LSI VirtualDisk 9775
> Peripheral device type: disk
> Persistent Reservation Out cmd: 5f 00 00 00 00 00 00 00 18 00
> PR out: command (Register) successful
>
> but I do not see any reservation and running sg_persist -k /dev/sdi shows there
> is still no registered key.

The command listed above does not specify a key value.

(In reply to comment #20)
> (In reply to comment #19)
> > I turned on logging as described. When I start cman, the log shows this:
> >
> > fence_scsi: [error] key cannot be zero
>
> Can you run cman_tool? Since you do not have keys manually defined (which is
> fine), fence_scsi should be generating keys from the cluster_id and nodeid.

Note that I assume the cluster.conf file attached in comment #12 is the config file you are currently using.
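For reference, a rough sketch of how the generated key appears to relate to those two values. This assumes the key is simply the 16-bit cluster id followed by the 16-bit node id, each as four hex digits -- which matches the key 1cd90001 seen later in this bug -- but the exact scheme fence_scsi uses is not spelled out here:

# both values come from cman_tool, as described above
cluster_id=$(cman_tool status | awk -F': ' '/^Cluster Id/ {print $2}')
node_id=$(cman_tool nodes -n $(hostname -s) -F id)   # assumes the cluster node name matches hostname -s

# assumption: key = <cluster id as 4 hex digits><node id as 4 hex digits>
printf 'expected key: %.4x%.4x\n' "$cluster_id" "$node_id"
# an empty or zero cluster id / node id here would be one way to end up
# with the "key cannot be zero" error reported above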
Correct, that is the configuration file I am currently using.

Here is the output of the two commands:

[root@kswc-nightrod home]# cman_tool status
Version: 6.2.0
Config Version: 33
Cluster Name: clus-1284566409
Cluster Id: 62085
Cluster Member: Yes
Cluster Generation: 660
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Node votes: 1
Quorum: 3
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: kswc-nightrod
Node ID: 1
Multicast addresses: 239.192.242.120
Node addresses: 135.15.74.122

[root@kswc-nightrod home]# cman_tool nodes -n kswc-nightrod -F id
1

Thanks for the information. Could you also check to see if selinux is enabled and enforcing? You might also want to check for any AVC denials in the audit log. It is possible that this is the reason the key is not being written to file (/var/lib/cluster/fence_scsi.key) when unfencing occurs. Check to see that the file exists. It should contain the local node's key value.

I tested fence_scsi this evening with the latest RHEL6 build and it was working as expected, but only when selinux was disabled/permissive. The BZ for the selinux issue can be found here: https://bugzilla.redhat.com/show_bug.cgi?id=634357

I also recommend listing the cluster volume groups. Since no devices are manually defined in the cluster.conf file, fence_scsi will use all devices that exist in cluster volumes ("c" attribute).

I thought I had recreated the problem today on my own cluster, but that may not be the case. What I do know is that unfencing will fail if no devices are found. That means that if you don't have "devices=" configured for fence_scsi and you don't have any cluster volumes, fence_scsi unfencing will fail. This is correct behavior. If unfencing fails, then the node will not join the fence domain and it will not start dlm_controld, and thus you will not be able to start clvmd. This sounds exactly like the scenario I recreated today, which was due to the fact that fence_scsi found no devices via the vgs command.

My advice is to run this:

% vgs --config 'global { locking_type = 0 }'

Look for VGs with the 'c' attribute. I am confused about the "key cannot be zero" you reported in comment #19. I've not been able to recreate that.

Here is the output of the above command.

[root@kswc-nightrod ~]# vgs --config 'global { locking_type = 0 }'
  WARNING: Locking disabled. Be careful! This could corrupt your metadata.
  VG                #PV #LV #SN Attr   VSize   VFree
  lvm_vg             24   8   0 wz--nc 119.91g 15.81g
  vg_dhcp1351574122   1   3   0 wz--n-  67.88g      0

lvm_vg is the cluster vg, and it looks like it has the c attribute. I have locking_type = 3 in /etc/lvm/lvm.conf

Also I checked yesterday and I do not believe /var/lib/cluster/fence_scsi.key existed when I tried. Selinux is set to disabled in our kickstart file, so I don't think that can be it either.

Does unfencing work if you explicitly define keys for each node in your cluster.conf? That information will help narrow down the possibilities. Example for comment 26:

<clusternode name="node1" nodeid="1">
  <fence>
    <method name="1">
      <device name="scsi" key="1"/>
    </method>
  </fence>
  <unfence>
    <device name="scsi" key="1" action="on"/>
  </unfence>
</clusternode>
<clusternode name="node2" nodeid="2">
  <fence>
    <method name="1">
      <device name="scsi" key="2"/>
    </method>
  </fence>
  <unfence>
    <device name="scsi" key="2" action="on"/>
  </unfence>
</clusternode>
...
<fencedevice agent="fence_scsi" name="scsi" logfile="/var/log/cluster/fence_scsi.log"/>
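Whichever way the keys are defined, whether unfencing actually placed registrations can be checked directly on one of the shared LUNs with sg_persist. A quick sketch -- the device name is illustrative; use one of the PVs in the clustered VG:

# each node that has unfenced should appear as a registered key,
# and there should be a single reservation on the device
sg_persist --in --read-keys /dev/sdi
sg_persist --in --read-reservation /dev/sdi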
I tried the above changes, and the nodes are now placing persistent reservations, and fencing / unfencing seems to work properly.

I rebooted a node and the resources transferred, and there was no I/O timeout this time. I guess it must be because the host is not automatically generating a key like it should. The nodes will still indefinitely hang on shutdown.. It looks like the node will fail to leave the cluster domain because some of the gfs2 filesystems are still active. I'll have to see if I can get console redirection of that.

Update: I was actually able to reboot one of the nodes and it shut down, and everything happened as expected. It's almost looking hardware specific: all three of the servers that hang are one brand, and the server that worked is a different brand. On one of the hung nodes, I waited about 30 minutes for it to shut down, and it didn't. At that time I tried hitting ctrl+alt+del and it started showing messages like the following, again:

GFS2: fsid=clus-1284028638: gfs_vol5.3: gfs2_quotad: statfs error -5

Looks like we're back to the original issue, only now it no longer causes the entire cluster to hang, or cause I/O to time out. What does statfs error -5 mean?

I installed RH6 RC1 and I am having some issues. If I have the "<unfence>" XML block in cluster.conf, as specified above, cman will fail to start when the node tries to unfence itself. The log file gives the following: "fence_scsi: [error] no devices found". I am not sure what it is expecting, as there are 24 LUNs mapped to the host group, and 8 logical volumes.

Also, with the fence method defined like this:

<method name="1">
  <device name="Persistent_Reserve" key="1"/>
</method>

the host does not appear to place persistent reservations on the LUNs, again.

(In reply to comment #31)
> I installed RH6 RC1 and I am having some issues. If I have the "<unfence>" XML
> block in cluster.conf, as specified above, cman will fail to start when the
> node tries to unfence itself. The log file gives the following: "fence_scsi:
> [error] no devices found". I am not sure what it is expecting, as there are 24
> LUNs mapped to the host group, and 8 logical volumes.
>
> Also, with the fence method defined like this:
> <method name="1">
> <device name="Persistent_Reserve" key="1"/>
> </method>
> the host does not appear to place persistent reservations on the LUNs, again.

It isn't going to put any registrations or reservations on the LUNs because it is not finding the devices. Please check that lvm filters are not interfering. I am going to assume that you did not manually configure devices, so that means fence_scsi will discover cluster volumes and the devices that comprise those volumes. Run this command:

# vgs --options vg_name,vg_attr,pv_name --config 'global { locking_type = 0 }'

You should see a list of all volume groups and all devices (pvs) that exist in those volume groups. Look for volumes with the 'c' attribute.

Okay, I tried something different.. For our testing, our script generates cluster.conf, starts the services, and then creates the lv's and gfs2 filesystems, then mounts them. The volumes did not seem to have the clustered attribute, so I tried re-creating the cluster without the <unfence> tags, so that cman would start. Sure enough, the volumes now show the clustered attribute:

[root@kswc-snoopy home]# vgs --options vg_name,vg_attr,pv_name --config 'global { locking_type = 0 }'
  WARNING: Locking disabled. Be careful! This could corrupt your metadata.
  VG               Attr   PV
  lvm_vg           wz--nc /dev/sdn
  lvm_vg           wz--nc /dev/sdo
  lvm_vg           wz--nc /dev/sdp
  lvm_vg           wz--nc /dev/sdq
  lvm_vg           wz--nc /dev/sdr
  lvm_vg           wz--nc /dev/sds
  lvm_vg           wz--nc /dev/sdt
  lvm_vg           wz--nc /dev/sdu
  lvm_vg           wz--nc /dev/sdv
  lvm_vg           wz--nc /dev/sdw
  lvm_vg           wz--nc /dev/sdx
  lvm_vg           wz--nc /dev/sdy
  lvm_vg           wz--nc /dev/sdb
  lvm_vg           wz--nc /dev/sdc
  lvm_vg           wz--nc /dev/sdd
  lvm_vg           wz--nc /dev/sde
  lvm_vg           wz--nc /dev/sdf
  lvm_vg           wz--nc /dev/sdg
  lvm_vg           wz--nc /dev/sdh
  lvm_vg           wz--nc /dev/sdi
  lvm_vg           wz--nc /dev/sdj
  lvm_vg           wz--nc /dev/sdk
  lvm_vg           wz--nc /dev/sdl
  lvm_vg           wz--nc /dev/sdm
  vg_dhcp135157468 wz--n- /dev/sda2

I added <unfence> back into cluster.conf and rebooted a node. Now that node cannot join the cluster. I added the logfile="" tag again to see if that would give some more information, but now when I start cman, it just says unfencing of the node failed, and the log file is not created.

My other question is regarding having to manually specify keys: is this a bug for which I need to submit another bugzilla? In RHEL5, we've never had to do anything like that, and unfencing did not need to be specified.

Actually, it is working now. However, my questions still stand:

1. Our setup script usually generates the entire cluster.conf file before starting the services. Unfencing seems to fail at this point because the devices will not already be set up. Is it necessary to add in the <unfence> attribute later? Or is there some way to skip the unfencing step on the first startup of cman?

2. Same question as in comment 33: Should the OS be generating the keys, itself? What could cause that to fail? I can file another bug for this, if necessary. Thanks.

(In reply to comment #34)
> Actually, it is working now. However, my questions still stand:

Things have changed in RHEL6. Specifically, the addition of unfencing (and removal of scsi_reserve) and the option to manually define keys/devices. The ability to manually specify keys and/or devices was added to provide greater control, if desired. If these config options are omitted, the keys and devices are determined automatically.

> 1. Our setup script usually generates the entire cluster.conf file before
> starting the services. Unfencing seems to fail at this point because the
> devices will not already be set up. Is it necessary to add in the <unfence>
> attribute later? Or is there some way to skip the unfencing step on the first
> startup of cman?

By "set up" do you mean exist? The devices must exist before fence_scsi unfencing can succeed. I don't know if you can omit the unfence section and add it later. Are you creating your volumes in this setup script? Perhaps you need to set the cluster flag when you created the volume group.

> 2. Same question as in comment 33: Should the OS be generating the keys,
> itself? What could cause that to fail? I can file another bug for this, if
> necessary.

This should be explained in the kbase article in comment #18. Defining keys is optional. If you don't specify a key then one will be generated, just as in RHEL5. If key generation is still failing, then that is a bug that I have not been able to reproduce. Now that you have things semi-working, can you try removing the manually defined keys and test again? Please don't file a new bug yet, since I'm not sure what the state of this bug is.
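Pulling the advice in these comments together, the ordering that matters is that the clustered volume group exists before cman (and therefore unfencing) starts. A sketch with hypothetical device names, matching what is later reported to work (vgcreate with -c y before starting the services):

# initialize the shared LUNs and create the clustered VG up front
# (device names are hypothetical; depending on locking_type in lvm.conf,
# this first vgcreate may need a locking override or clvmd on one node)
pvcreate /dev/sdb /dev/sdc
vgcreate -cy lvm_vg /dev/sdb /dev/sdc   # -cy sets the clustered attribute explicitly

# now cman can start and unfencing will find devices in lvm_vg
service cman start
service clvmd start

# logical volumes and gfs2 filesystems can be created afterwards
lvcreate -n gfs_vol1 -L 10G lvm_vg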
> By "set up" do you mean exist? The devices must exist before fence_scsi
> unfencing can succeed. I don't know if you can omit the unfence section and
> add it later. Are you creating your volumes in this setup script? Perhaps you
> need to set the cluster flag when you created the volume group.

All of the hosts in the cluster see the volumes on the array prior to starting the setup script (they see 24 uninitialized physical volumes on the storage array). The script generates the conf file, starts all of the services, and then issues the pvcreate command to initialize the 24 LUNs, vgcreate to create the volume group, and then lvcreate to create 8 logical volumes out of that volume group.

I suppose pvcreate could be run before starting the services, but when I did that without clvmd running (when unfence did not work), the clustered attribute was not set. Is there some way to explicitly set the flag?

> This should be explained in the kbase article in comment #18. Defining keys is
> optional. If you don't specify a key then one will be generated, just as in
> RHEL5. If key generation is still failing, then that is a bug that I have not
> been able to reproduce. Now that you have things semi-working, can you try
> removing the manually defined keys and test again?
> Please don't file a new bug yet, since I'm not sure what the state of this bug
> is.

Comment #18 does not appear to be there. I can try again in a bit. Though, I believe the key problem is a separate problem from the one I am experiencing, here.

This bug is regarding how a host can sometimes hang on shutdown, or when both host to array I/O cables are pulled. The host seems to have trouble unmounting the GFS2 filesystems, and when the shutdown hang occurs I'll see a message like:

GFS2: fsid=clus-1284028638: gfs_vol5.3: gfs2_quotad: statfs error -5

and this message will print out indefinitely, until I power cycle the host.

(In reply to comment #36)
> > By "set up" do you mean exist? The devices must exist before fence_scsi
> > unfencing can succeed. I don't know if you can omit the unfence section and
> > add it later. Are you creating your volumes in this setup script? Perhaps you
> > need to set the cluster flag when you created the volume group.
>
> All of the hosts in the cluster see the volumes on the array prior to starting
> the setup script...

Right. But they have to see the volume *and* the cluster bit must be set in order for fence_scsi to use them.

> The script generates the conf file, starts all of the services, and then issues
> the pvcreate command to initialize the 24 LUNs, vgcreate to create the volume
> group, and then lvcreate to create 8 logical volumes out of that volume group.
>
> I suppose pvcreate could be run before starting the services, but when I did
> that without clvmd running (when unfence did not work), the clustered attribute
> was not set. Is there some way to explicitly set the flag?

vgcreate -cy <vg_name> pv [pv, ... ]

> > This should be explained in the kbase article in comment #18. Defining keys is
> > optional. If you don't specify a key then one will be generated, just as in
> > RHEL5. If key generation is still failing, then that is a bug that I have not
> > been able to reproduce. Now that you have things semi-working, can you try
> > removing the manually defined keys and test again?
> > Please don't file a new bug yet, since I'm not sure what the state of this bug
> > is.
>
> Comment #18 does not appear to be there.

Ah. Sorry about that. Try this:

https://access.redhat.com/kb/docs/DOC-40127/version

> I can try again in a bit. Though, I believe the key problem is a separate
> problem from the one I am experiencing, here.
> This bug is regarding how a host can sometimes hang on shutdown, or when both
> host to array I/O cables are pulled. The host seems to have trouble unmounting
> the GFS2 filesystems, and when the shutdown hang occurs I'll see a message
> like:
> GFS2: fsid=clus-1284028638: gfs_vol5.3: gfs2_quotad: statfs error -5
> and this message will print out indefinitely, until I power cycle the host.

Fair enough, but let's see if we can get scsi reservations working correctly and then address the gfs2 problem.

(In reply to comment #37)
> > Comment #18 does not appear to be there.
>
> Ah. Sorry about that. Try this:
>
> https://access.redhat.com/kb/docs/DOC-40127/version

Oops. Try this instead:

https://access.redhat.com/kb/docs/DOC-40127

Okay, I have figured out how to set it up so that automatically generated keys will work. The script now creates the volume group, starts the services, and then creates LUNs from there. It now works without specifying keys. This brings us back to the original problem: I tried rebooting a node, and although it's getting fenced out, as expected, it's hanging on shutdown, saying:

GFS2: fsid=clus-1285337966:gfs_vol2.0: gfs2_quotad: statfs error -5

(In reply to comment #39)
> Okay, I have figured out how to set it up so that automatically generated keys
> will work. The script now creates the volume group, starts the services, and
> then creates LUNs from there. It now works without specifying keys.

What did you have to do? It should be just a matter of *not* putting key="X" in the config.

I had to run vgcreate (with -c y) before starting the services on the nodes. Originally, vgcreate would be run after starting the services. Keys would not be generated automatically in that case.

In answer to question #30, in the message:

GFS2: fsid=clus-1284028638: gfs_vol5.3: gfs2_quotad: statfs error -5

Error -5 is -EIO being returned while gfs2 was trying to perform a statfs_sync operation. That means gfs2 was either unable to get an exclusive lock on the master statfs system file, or unable to start a transaction (which can only happen if gfs2 is unable to acquire a shared lock on the transaction glock or unable to reserve space in the journal for the transaction). The call traces given above lead me to believe that gfs2 is simply waiting for the dlm layer below it to respond to its lock requests, which is not unexpected. In theory, dlm should perform lock recovery and eventually respond to gfs2 once it knows the cluster is sane. I'd like to figure out where the -EIO is coming from to see if dlm is doing lock recovery but gfs2's glock layer isn't handling that properly.

Isn't this a simple case of reboot not stopping all the cluster-users properly before stopping the cluster? Oftentimes, extra care needs to be taken when shutting down a node. Try manually doing:

service rgmanager stop
service gfs2 stop
umount -a -t gfs2
service clvmd stop
service cman stop
reboot
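The same stop order, gathered into a small script for convenience -- this just packages the commands suggested above:

#!/bin/sh
# stop cluster users before the cluster infrastructure, then reboot
service rgmanager stop
service gfs2 stop       # unmounts fstab-listed gfs2 filesystems
umount -a -t gfs2       # catch any gfs2 mounts the init script did not handle
service clvmd stop
service cman stop
reboot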
This problem also occurs when both I/O cables are pulled from the host, and the host loses access to the LUNs.

I also ran into a case where I rebooted a node, and it hung during startup, trying to mount the gfs2 filesystems. I let it sit all night, and it still did not come up. I had to reboot all of the nodes to get the cluster to come back up.

> This problem also occurs when both I/O cables are pulled from the host, and the
> host loses access to the LUNs.

GFS is expected to either hang or panic the machine if the storage goes away (I believe there are mount options to control which; the best option is to panic, which allows remaining nodes to recover for it and continue operating.)

> I also ran into a case where I rebooted a node, and it hung during startup,
> trying to mount the gfs2 filesystems. I let it sit all night, and it still did
> not come up. I had to reboot all of the nodes to get the cluster to come back
> up.

To figure that out we'd need to see some diagnostic information, e.g.

cman_tool nodes
group_tool -n
/var/log/messages
ps ax -o pid,stat,cmd,wchan

I noticed that this bug is currently filed against version 6.1. Can someone from RedHat confirm that this feature is not targeted to be fixed in the 6.0 release? Thanks.

(In reply to comment #46)
> I noticed that this bug is currently filed against version 6.1. Can someone
> from RedHat confirm that this feature is not targeted to be fixed in the 6.0
> release? Thanks.

Correct - targeted for RHEL 6.1.

I am now trying to run the same cluster configuration, but this time with device mapper multipath for the failover. I am running into a problem where cman tries to start, but fails because the node fails to unfence itself, and does not output anything to the specified logfile. The hosts see the clustered volume group and the associated devices as follows:

[root@kswc-vfr1200 logs]# vgs --options vg_name,vg_attr,pv_name --config 'global {locking_type = 0}'
  WARNING: Locking disabled. Be careful! This could corrupt your metadata.
  VG                Attr   PV
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475d800000ab6f4c87bbd1
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475d800000ab784c87bc06
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475d800000ab764c87bbf6
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475e1800000ed24c87bd6d
  lvm_vg            wz--nc /dev/mapper/360080e50001b0da000003e034c88786e
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475e1800000ed04c87bd5e
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475e1800000ed44c87bd79
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475d800000ab7a4c87bc14
  lvm_vg            wz--nc /dev/mapper/360080e50001b0e7a00002a6c4c8877ce
  lvm_vg            wz--nc /dev/mapper/360080e50001b0da000003e074c88787d
  lvm_vg            wz--nc /dev/mapper/360080e50001b0da000003e0b4c88788b
  lvm_vg            wz--nc /dev/mapper/360080e50001b0da000003e094c887884
  lvm_vg            wz--nc /dev/mapper/360080e50001b0e7a00002a684c8877be
  lvm_vg            wz--nc /dev/mapper/360080e50001b0e7a00002a6a4c8877c7
  lvm_vg            wz--nc /dev/mapper/360080e50001b0e7a00002a704c8877dd
  lvm_vg            wz--nc /dev/mapper/360080e50001b0e7a00002a724c8877e4
  lvm_vg            wz--nc /dev/mapper/360080e50001b0e7a00002a6e4c8877d5
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475d800000ab744c87bbea
  lvm_vg            wz--nc /dev/mapper/360080e50001b0da000003e004c887861
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475e1800000ec94c87bd39
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475e1800000ece4c87bd4f
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475d800000ab724c87bbde
  lvm_vg            wz--nc /dev/mapper/360080e50001b0da000003e054c887876
  lvm_vg            wz--nc /dev/mapper/3600a0b8000475e1800000ecc4c87bd43
  vg_dhcp1351575155 wz--n- /dev/sda2

Is it possible there needs to be some different configuration to run with the /dev/mapper/3600* devices?

I completely re-created the cluster with DMMP as the failover, and it looks like the hosts now generate keys on their own again, but some sort of failure occurs during registration.
The logs look something like this:

Oct 01 16:32:24 fenced fenced 3.0.12 started
Oct  1 16:32:26 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdc)
Oct  1 16:32:26 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdp)
Oct  1 16:32:26 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdy)
Oct  1 16:32:27 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdan)
Oct  1 16:32:27 fence_scsi: [debug] main::do_reserve (host_key=1cd90001, dev=/dev/dm-17)
Oct  1 16:32:27 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdx)
Oct  1 16:32:28 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdam)
PR out: unit attention
[error]: main::do_register_ignore

I notice that a couple of the devices will have registrations from all the hosts, some will have one or two registrations, but most devices will have no registrations at all.

Also, I would like to clarify what kind of debug you would like for the GFS2 related issue. I am not sure I can reproduce the hang during start up, as I have only seen it happen once.

(In reply to comment #49)
> I completely re-created the cluster with DMMP as the failover, and it looks
> like the hosts now generate keys on their own again, but some sort of failure
> occurs during registration. The logs look something like this:
>
> Oct 01 16:32:24 fenced fenced 3.0.12 started
> Oct  1 16:32:26 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdc)
> Oct  1 16:32:26 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdp)
> Oct  1 16:32:26 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdy)
> Oct  1 16:32:27 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdan)
> Oct  1 16:32:27 fence_scsi: [debug] main::do_reserve (host_key=1cd90001, dev=/dev/dm-17)
> Oct  1 16:32:27 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdx)
> Oct  1 16:32:28 fence_scsi: [debug] main::do_register_ignore (node_key=1cd90001, dev=/dev/sdam)
> PR out: unit attention
> [error]: main::do_register_ignore
>
> I notice that a couple of the devices will have registrations from all the
> hosts, some will have one or two registrations, but most devices will have no
> registrations at all.
>
> Also, I would like to clarify what kind of debug you would like for the GFS2
> related issue. I am not sure I can reproduce the hang during start up, as I
> have only seen it happen once.

The "PR out: unit attention" message is normal and is being tracked in BZ 640343. We need to keep the fence_scsi issues and the gfs2 issues separate, if possible.

As stated above, we need to isolate the issues being reported and keep them in separate BZs. What is the current issue? This BZ initially reported a problem with gfs2, but has since been mostly about fence_scsi. If there is a gfs2 problem, perhaps that should be reported in a different bug and this BZ should be closed. The current fence_scsi issue is being tracked by BZ 640343.

Sean,

In order for us to ensure this issue gets the attention it deserves, please contact Red Hat support. Information on how to do that can be found here:

https://access.redhat.com/support/contact/

We apologize for any inconvenience here, however bugzilla is not a support tool and all support requests should be routed through the Red Hat global support services (GSS) team.
If you have any questions, please feel free to contact me.

Thanks
Jeremy West
GSS Supervisor

Sean, can I ask a more fundamental question about your configuration? In your cluster.conf you don't have any services listed aside from the ip addresses that move around. You mention in the description that you're mounting NFS shares from cluster nodes and also that you're using GFS2. Can you describe what the purpose of the cluster is? Is it basically taking 4 RHCS nodes, with iSCSI backend storage, and then exposing this backend storage via NFS on top of GFS2 running on the cluster nodes? If so, are you aware that it's presently not supported to run multiple NFS servers accessing the same backend GFS2 filesystem, due to various issues with locking? NFS servers don't have any notion of clusterized locks. If you could give a little more description of the use case here, that would be useful. Thanks.

Also, is there any aspect of hardware enablement here? I'm unclear on how this relates to LSI's hardware portfolio.

I am testing resource availability in various cases.

If running this configuration is not supported, I am not aware of this, because this is how we have been running RHCS on RHEL4 and RHEL5, following a section of the document at http://sources.redhat.com/cluster/doc/nfscookbook.pdf:

"Managed Virtual IP Service

This method of managing NFS failover is more flexible than using Managed NFS Service because it allows you to handcraft your NFS exports as you see fit. However, it is only designed to work on GFS file systems. Using it for nonGFS file systems is not recommended. In this configuration, you must ensure that /etc/exports file is in sync on all nodes, and that nfsd is always running on all nodes (it's not monitored by the cluster; the service doesn't use <nfsexport> in this case), and that the GFS file system is mounted before NFS is started. Therefore, it requires more planning, maintenance and monitoring."

It appears to be centered around running multiple hosts with the same exports and nfs configuration, with gfs, running only the virtual ip addresses as resources.

Do you mean that this configuration is not supported for GFS2 currently, or that this is not supported for both GFS and GFS2 on previous versions of RHEL as well?

Also, the "current" issue is the GFS2 hang issue (where statfs error -5 would be repeated indefinitely) that I described initially. I have worked around it by configuring APC power fencing as the second fence device. Sorry for any confusion, and thank you all for the help so far with configuration.

(In reply to comment #56)
> I am testing resource availability in various cases
>
> If running this configuration is not supported, I am not aware of this, because
> this is how we have been running RHCS on RHEL4 and RHEL5, following a section
> of the document on http://sources.redhat.com/cluster/doc/nfscookbook.pdf

That web site is upstream focused and not official RHEL documentation. The notes on this for RHEL official docs are available here:

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Configuration_Example_-_NFS_Over_GFS/additional_configuration_considerations.html

...

> It appears to be centered around running multiple hosts with the same exports
> and nfs configuration, with gfs, running only the virtual ip addresses as
> resources.
>
> Do you mean that this configuration is not supported for GFS2 currently, or
> that this is not supported for both GFS and GFS2 on previous versions of RHEL
> as well?

It's not supported for GFS1 or GFS2 right now.
There are too many caveats to make it work in a reliable manner. For example, you would need to disallow usage of NFS locks entirely, since a lock taken on one host wouldn't migrate to another host (NFS locks are not clusterized), and you would also need to turn off NFS passing plocks down to GFS/GFS2. Steve Whitehouse (GFS lead developer) has a lot more information on the caveats and pitfalls here, so he might be able to better explain.

Can you explain the exact use case here? I'm a bit confused reading down the bugzilla since it looks like some details might have changed while the initial issues were resolved. I'm looking for a quick description along the lines of:

1) How many nodes?
2) How many GFS2 mounts, and how are they exported via NFS?
3) How is the shared storage attached and what is it? (e.g. an array connected by fibrechannel)
4) What are the GFS2 mount options?
5) What are the NFS export options?
6) What is the problem that you are trying to solve?

Using NFS with GFS2 can be problematic due to the handling of fcntl/POSIX locks. We do support NFS over GFS2, but only in very restricted circumstances (active/passive failover only) in order to avoid these problems. We are actively working on resolving the issues, however this is a medium term project due to the complexity of the problem.

1) 4 nodes in the cluster
2) 8 GFS2 mount points
3) Each node is connected to two LSI storage arrays via iSCSI. These arrays have a total of 24 volumes, which are all pooled to create the logical volumes
4) (rw,relatime,hostdata=jid=1) -- the journal ID, of course, is different for each node
5) *(rw,sync,no_root_squash,fsid=2345600) -- exported to the world because this is a test configuration
6) When I issue "reboot" on one of the nodes, it does not properly shut down the cluster services and reboot. Instead the shutdown hangs indefinitely, and gives the gfs2 statfs error I gave earlier. It looks like the node fails to properly unmount the gfs2 mount points. SCSI reservation fencing removes the node's access to the devices, but the node continues to hang. I have been able to work around this issue by setting up APC power fencing. With that configuration, when the node hangs, another node simply shuts off power and powers the node back on. However, simply issuing a reboot without apc power fencing should also result in the node successfully shutting down and coming back up.

Also, although every mount can be accessed through every IP address, I am only accessing each mount point through one IP address. This configuration does run active/passive failover, too.

Ok, that usage of NFS does sound like it would avoid the locking issues, since no two nodes are exposing the same NFS exports or even the same GFS2 filesystems. SteveW, correct me if that is wrong. Just to eliminate variables here, can you reproduce this w/o NFS servers running at all? i.e. just a cluster w/ GFS2 filesystems mounted, and do a reboot to see if the filesystems cleanly unmount as part of the reboot process. Thanks!

Wrt NFS, you must specify the localflocks mount option on the GFS2 mounts which are to be exported via NFS. Also, only one node should be NFS exporting a single GFS2 filesystem at once (i.e. active/passive failover).

Wrt the shutdown issue, how are the filesystems being mounted? There is a known issue when GFS2 filesystems are mounted via a method other than via fstab and the gfs2 init script. In that case, manually mounted filesystems which are not in fstab will not get umounted correctly at reboot time; an example fstab entry is sketched below.
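For illustration, an fstab entry along these lines lets the gfs2 init script handle both the mount and the unmount at reboot. The device and mount point names are made up; localflocks is only needed on filesystems that are exported via NFS, as described above:

# /etc/fstab
/dev/lvm_vg/gfs_vol1  /mnt/gfs_vol1  gfs2  defaults,localflocks  0 0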
Another possible issue is that there is something still using the filesystem at reboot time which prevents the filesystem from being unmounted. That something might be NFS, for example, or a local process.

I've been busy recently working on another issue with the setup, but I have a couple of things to add here. Previously, I was mounting the GFS2 filesystems via fstab, and they were exported by all of the hosts at all times. As per the suggestions in this thread, I have reconfigured the cluster so that rgmanager manages which gfs2 filesystems are mounted and exported on each host. The results, so far, look promising, as I was able to reboot a server and have it come back up without manual intervention, and without it having to be power fenced. So I am led to believe that the locking issues could be related to having all of the nodes mounting the filesystems at once. That particular issue was not observed in RHEL5.5. Running in this configuration generally seems smoother.

I will attach my current cluster.conf. Does this configuration look more "correct"? Also, is it correct that when you say running multiple NFS servers from the same backend gfs2 filesystems is not currently supported, you intend for it to be supported down the road?

Created attachment 455196 [details]
Current cluster.conf file
Updated 10/22/10
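Sean's actual configuration is in attachment 455196. For readers who cannot see the attachment, a rough sketch of the kind of rgmanager service being described -- one floating IP, one GFS2 mount, and one NFS export per service, so only one node exports a given filesystem at a time. The resource names, device paths, and IP address here are hypothetical, and the attachment remains the authoritative version:

<rm>
  <service autostart="1" name="nfs_vol1" recovery="relocate">
    <ip address="135.15.74.200" monitor_link="1"/>
    <clusterfs device="/dev/lvm_vg/gfs_vol1" fstype="gfs2" mountpoint="/mnt/gfs_vol1"
               name="gfs_vol1" options="localflocks">
      <nfsexport name="gfs_vol1_export">
        <nfsclient name="world" target="*" options="rw,sync,no_root_squash"/>
      </nfsexport>
    </clusterfs>
  </service>
</rm>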
(In reply to comment #62)
> I will attach my current cluster.conf. Does this configuration look more
> "correct"?

Lon, can you review this?

> Also, is it correct that when you say running multiple NFS servers
> from the same backend gfs2 filesystems is not currently supported, you
> intend for it to be supported down the road?

We'd like it to be, but don't have a timeline for when we'll be able to formally support it yet.

Sean,

Is this still an issue with RHEL 6.0 GA?

It seems to me like this is not an issue in general. It only happens when trying to run an active/active configuration with GFS2. If that is not supported yet, at all, then maybe the bug should be closed?

To make this 100% clear: active/active NFS is unsupported on GFS2 at the current time, and will remain so until lockd can be made to work correctly during the recovery phase. Until then, NFS may only be exported from a maximum of one node of the cluster at once, and the localflocks mount argument must be used on each GFS2 mount being exported via NFS. Local workloads must not be mixed with NFS exports of the same filesystem.

This does not mean that other active/active workloads are unsupported. The above applies to NFS only. In general we support active/active workloads, with the proviso that the performance characteristics of the workload are understood and that acceptable performance can be achieved (which is usually the case).

Right, well, I wrote this bug specifically against a configuration with active/active NFS on GFS2. In this instance I can't really say whether or not it is a problem on other configurations.

In which case can we close this as NOTABUG?

Sounds to me like that might be the thing to do.

Ok, closing it now... if you find any other issues, please let us know.