Bug 126971
| Summary: | mount of GFS with type lock_gulmd segfaults | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
| Component: | gfs | Assignee: | michael conrad tadpol tilstra <mtilstra> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3 | | |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i686 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2004-09-15 15:37:58 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Corey Marthaler
2004-06-29 21:41:34 UTC
when it worked before, did you load both gulm and dlm lock modules?

When I attempted to use gulm originally (and when it worked for me), it is possible that I didn't have the dlm module loaded and was just using a raw device. But maybe not; maybe I was using lvm2 then as well. I can't remember anymore. :( When the segfault was happening I did have dlm loaded, as I was using lvm2. I haven't been able to reproduce the segfault anymore; however, every attempt to mount a filesystem using lock_gulm results in a hang, whether it's a raw device or an lvm2 volume.

which code base is this from?

latest code base in the cluster tree as of last week, 2.6.8.1

ah, well all of that just changed this afternoon. So give it another go.

This seems to be more of a lock_gulm state issue than a mount issue. I start lock_gulmd on all my nodes with the following command:

```
lock_gulmd -s morph-01,morph-03,morph-05 -n morph-cluster
```

But the servers get stuck either in "Arbitrating" or in "Pending", causing the mount to hang or time out due to a connection refused.

```
[root@morph-01 root]# gulm_tool getstats $(hostname)
I_am = Arbitrating
quorum_has = 1
quorum_needs = 2
rank = 0
GenerationID = 1094764224845226
run time = 362
pid = 4187
verbosity = Default
failover = enabled

[root@morph-02 root]# gulm_tool getstats $(hostname)
I_am = Client
quorum_has = 1
quorum_needs = 2
rank = -1
GenerationID = 0
run time = 301
pid = 3995
verbosity = Default
failover = enabled

[root@morph-03 root]# gulm_tool getstats $(hostname)
Command timed out.
```
```
[root@morph-04 root]# gulm_tool getstats $(hostname)
I_am = Client
quorum_has = 1
quorum_needs = 2
rank = -1
GenerationID = 0
run time = 430
pid = 3883
verbosity = Default
failover = enabled

[root@morph-05 root]# gulm_tool getstats $(hostname)
I_am = Pending
quorum_has = 1
quorum_needs = 2
rank = 2
GenerationID = 0
run time = 276
pid = 3922
verbosity = Default
failover = enabled

[root@morph-06 root]# gulm_tool getstats $(hostname)
I_am = Client
quorum_has = 1
quorum_needs = 2
rank = -1
GenerationID = 0
run time = 431
pid = 3947
verbosity = Default
failover = enabled
```

This is with the latest code. I also see these errors in all the syslogs:

```
Sep 9 16:17:53 morph-01 lock_gulmd_core[4187]: ERROR [src/core_io.c:1317] Node (morph-03.lab.msp.redhat.com ::ffff:192.168.44.63) has been denied from connecting here.
Sep 9 16:17:57 morph-01 lock_gulmd_core[4187]: ERROR [src/core_io.c:1317] Node (morph-05.lab.msp.redhat.com ::ffff:192.168.44.65) has been denied from connecting here.
Sep 9 16:18:02 morph-01 lock_gulmd_core[4187]: ERROR [src/core_io.c:1317] Node (morph-04.lab.msp.redhat.com ::ffff:192.168.44.64) has been denied from connecting here.
Sep 9 16:18:07 morph-01 lock_gulmd_core[4187]: ERROR [src/core_io.c:1317] Node (morph-02.lab.msp.redhat.com ::ffff:192.168.44.62) has been denied from connecting here.
Sep 9 16:18:07 morph-01 lock_gulmd_core[4187]: ERROR [src/core_io.c:1317] Node (morph-06.lab.msp.redhat.com ::ffff:192.168.44.66) has been denied from connecting here.
```

Is ccs running? Or are you doing all gulm config via the command line?

ccsd is running, and then I run the lock_gulmd command line I posted.

ok. What does the nodes section of the ccs conf look like?
[root@morph-01 root]# cat /etc/cluster/cluster.conf

```xml
<?xml version="1.0"?>
<cluster name="morph-cluster" config_version="1">
  <cman></cman>
  <dlm></dlm>
  <nodes>
    <node name="morph-01" votes="1">
      <fcdriver>qla2300</fcdriver>
      <fence>
        <method name="single">
          <device name="apc" switch="1" port="1"/>
        </method>
      </fence>
    </node>
    <node name="morph-02" votes="1">
      <fcdriver>qla2300</fcdriver>
      <fence>
        <method name="single">
          <device name="apc" switch="1" port="2"/>
        </method>
      </fence>
    </node>
    <node name="morph-03" votes="1">
      <fcdriver>qla2300</fcdriver>
      <fence>
        <method name="single">
          <device name="apc" switch="1" port="3"/>
        </method>
      </fence>
    </node>
    <node name="morph-04" votes="1">
      <fcdriver>lpfc</fcdriver>
      <fence>
        <method name="single">
          <device name="apc" switch="1" port="4"/>
        </method>
      </fence>
    </node>
    <node name="morph-05" votes="1">
      <fcdriver>lpfc</fcdriver>
      <fence>
        <method name="single">
          <device name="apc" switch="1" port="5"/>
        </method>
      </fence>
    </node>
    <node name="morph-06" votes="1">
      <fcdriver>qla2300</fcdriver>
      <fence>
        <method name="single">
          <device name="apc" switch="1" port="6"/>
        </method>
      </fence>
    </node>
  </nodes>
  <fence_devices>
    <device name="apc" agent="fence_apc" ipaddr="morph-apc" login="apc" passwd="apc"/>
  </fence_devices>
  <rm></rm>
</cluster>
```

Well, fun. gulm is looking in ccs for a node called "morph-03.lab.msp.redhat.com", but it doesn't see any node with that name. ("morph-03" != "morph-03.lab.msp.redhat.com") This is related to bz132222.

No longer segfaults or gets stuck due to FQDN after the fix for 132222.

err, a little confused how a fix to cman fixed gulm, but hey, if it works it works, I guess.
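The diagnosis above comes down to a plain string comparison: gulm resolves each connecting node to its FQDN, while cluster.conf stores the short host name. The snippet below is an illustrative demonstration of that mismatch (the variable names are hypothetical, and this is not part of the gulm or ccs tooling):

```shell
#!/bin/sh
# Illustrative only: the name in cluster.conf is the short host name,
# while the name gulm derives for a connecting node is the FQDN, so an
# exact comparison fails even though the hosts are the same machine.
CONF_NAME="morph-03"
RESOLVED_NAME="morph-03.lab.msp.redhat.com"
SHORT="${RESOLVED_NAME%%.*}"   # strip everything after the first dot
if [ "$CONF_NAME" = "$RESOLVED_NAME" ]; then
  echo "exact match"
elif [ "$CONF_NAME" = "$SHORT" ]; then
  echo "short-name match only: \"$CONF_NAME\" != \"$RESOLVED_NAME\""
else
  echo "no match"
fi
```

Listing the nodes in cluster.conf under their fully qualified names (or ensuring gulm is handed the same short names) would make the lookup succeed, which is consistent with the FQDN fix referenced in bz132222.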