Bug 253836
| Summary: | qdiskd causes node to reboot | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Herbert L. Plankl <h.plankl> |
| Component: | cman | Assignee: | Lon Hohberger <lhh> |
| Status: | CLOSED DUPLICATE | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5.0 | CC: | ccaulfie, cluster-maint |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | RHBA-2007-0599 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2007-11-16 13:48:37 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 314641 | | |
| Bug Blocks: | | | |
Description
Herbert L. Plankl
2007-08-22 10:05:17 UTC
> Visible Set: { 1 2 }
> Master Node ID: 1
> Quorate Set: { 2 3 }

^^^^ This is strange; are your node IDs 1 and 2 in cluster.conf? (It shouldn't matter that they are virtual machines, nor that you are using virtual disks.)

Yes, they are. I set up the cluster using system-config-cluster, which by the way has a bug too: in the Management tab it does not show the cluster members, but it shows the services correctly. Maybe this problem is related to the node IDs. What's strange about the node IDs? I guess they should be "0" and "1"? When the quorum disk joins the cluster (it is offline), clustat -x says the node ID of the quorum disk is "0". I'll try it with node IDs "0" and "1" and report the results here.

I stopped the cluster, vim-ed the config, and changed the node ID of node 1 to "0" and node 2 to "1". Then I scp-ed the cluster.conf to the other node and tried to start cman. That's the output:

[root@rhel5n2 ~]# /etc/init.d/cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... failed
cman not started: No node ID for rhel5n1.icoserve.test, run 'ccs_tool addnodeids' to fix
/usr/sbin/cman_tool: aisexec daemon didn't start
                                                           [FAILED]

The cluster.conf was changed after this failed start: the version is incremented by one, and the node IDs are:
* Node 1 has "2"
* Node 2 has "1"

I think ccsd made these changes, but cman still does not start. I have to set node 1's ID to "1" and node 2's ID to "2" to get cman running.

For what it's worth, you can't use node ID = 0 for nodes. The visible set should be nodes {1 2} after they get up and running; qdiskd shouldn't display node 0 in the quorate set. It could be that the quorate-set output is backwards, but I doubt that. I'll look into it more. Are you using multipath / LVM / etc. for qdisk?

I'm using VMware Workstation, so the disks are virtual disks connected to both nodes. They are raw: no LVM, no multipath Fibre Channel.
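For reference, the working node-ID layout described above (node 1 = "1", node 2 = "2", with the quorum disk itself showing up as node 0 in clustat output) could look roughly like the following cluster.conf sketch. This is an illustration, not the reporter's actual file: the cluster name, the second node's full hostname, the vote counts, and the quorum-disk label and timing values are all assumed.

```xml
<?xml version="1.0"?>
<cluster name="testcluster" config_version="1">
  <clusternodes>
    <!-- Node IDs must be 1 or higher; ID 0 is effectively reserved,
         since qdiskd reports the quorum disk as node 0. -->
    <clusternode name="rhel5n1.icoserve.test" nodeid="1" votes="1"/>
    <clusternode name="rhel5n2.icoserve.test" nodeid="2" votes="1"/>
  </clusternodes>
  <!-- Two nodes plus one quorum-disk vote. -->
  <cman expected_votes="3"/>
  <!-- Raw shared disk used as the quorum device; label and timings
       are hypothetical values for illustration. -->
  <quorumd interval="1" tko="10" votes="1" label="testqdisk"/>
</cluster>
```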
GFS works fine in this setup, and Red Hat Cluster Suite 3 with a quorum disk works too, so I don't think the problem is related to the disks being virtual. Here is the disk setting for both virtual machines:

disk.locking = "FALSE"
scsi1.sharedBus = "virtual"
scsi1.virtualDev = "lsilogic"
scsi1.present = "TRUE"
scsi1:0.deviceType = "disk"
scsi1:0.writeThrough = "TRUE"
scsi1:0.present = "TRUE"
scsi1:0.fileName = "/opt/vmware/machines/vmrhel3_san/qu01.vmdk"
scsi1:0.mode = "independent-persistent"
scsi1:1.deviceType = "disk"
scsi1:1.writeThrough = "TRUE"
scsi1:1.present = "TRUE"
scsi1:1.fileName = "/opt/vmware/machines/vmrhel3_san/qu02.vmdk"
scsi1:1.mode = "independent-persistent"

Ah, so cman / aisexec probably crashed on activation?

Yes, it seems so.

Herbert, I'm pretty sure this is fixed in 5.1; there are two bugs related to cman/qdisk interaction which were fixed:

(1) Device names could only be 15 characters (you were using /dev/sdd, so this probably isn't the problem).
(2) A timer logic bug in openais caused cman/openais to die when qdiskd advertised fitness.

Could you retest with the 5.1 cman + ccs packages and let me know if it's not fixed? It should be. If it is fixed, could you close this bug?

Sounds good. Where can I find these packages? In RHN the latest versions are:

cman-2.0.73-1.el5_1.1.i386 (Red Hat Enterprise Linux, v. 5 for 32-bit x86)
rgmanager-2.0.31-1.el5.i386 (RHEL Clustering, v. 5 for 32-bit x86)

With these versions the problem is not solved: cman died a few minutes after starting qdiskd.

You also need openais-0.80.3-7.el5 if you don't already have it. If you do, then this is a new bug, and it seems specific to your configuration (VMware).

Yum-ed up to RHEL 5.1 and tested again: now it's working, qdisk is online and seems to be working. Thank you! (I cannot close the bug; Bugzilla says only the owner can do that.)

Sweet. An advisory has been issued which should help the problem described in this bug report.
This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0599.html

*** This bug has been marked as a duplicate of 314641 ***