Description of problem:

We have the following issue on a two-node cluster system (node1 + node2) with qdisk.

The RGManager command clustat no longer shows the cluster services on node1. clustat on node1 also does not list the RGManager status for either cluster member.

On cluster member node2, clustat shows the cluster services, and shows RGManager as running on both nodes.

RGManager status logging stopped on the cluster members 2 days ago, without any errors.

Below is the clustat output from both cluster nodes:

node1 ~ # clustat
Service states unavailable: Temporary failure; try again
Cluster Status for CL_XY @ Fri Mar 18 08:51:21 2011
Member Status: Quorate

 Member Name                   ID   Status
 ------ ----                   ---- ------
 node1                            1 Online, Local
 node2                            2 Online
 /dev/dm-11                       0 Online, Quorum Disk

node2 ~ # clustat
Cluster Status for CL_XY @ Fri Mar 18 08:53:33 2011
Member Status: Quorate

 Member Name                   ID   Status
 ------ ----                   ---- ------
 node1                            1 Online, RG-Worker
 node2                            2 Online, Local, RG-Master
 /dev/dm-11                       0 Online, Quorum Disk

 Service Name         Owner (Last)         State
 ------- ----         ----- ------         -----
 service:1            node1                started
 service:2            node1                started
 service:3            node2                started
 service:4            node2                started

Furthermore, LVM commands do not complete. The commands just hang, but the processes could be killed by pressing Ctrl+C.

Exactly the same issue happened a month ago on this system under RHEL5.5. In the meantime, the cluster was updated to RHEL5.6. Debugging for RGManager and cman is now enabled in the /etc/cluster/cluster.conf file.

Version-Release number of selected component (if applicable):
kernel-2.6.18-238.5.1.el5.x86_64
cman-2.0.115-68.el5_6.1.x86_64
rgmanager-2.0.52-9.el5.x86_64

How reproducible:
Not reproducible

Steps to Reproduce:
1.
2.
3.

Actual results:
RGManager stops writing logs.
The clustat command does not work on one node.
LVM commands stop working.

Expected results:

Additional info:
The cluster services include mounted disks. For this setup, an LVM HA configuration is used (RHEL5.6).
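The node1 symptom can be recognized from clustat output alone: the "Service states unavailable" line is printed and no member carries an RG-Master or RG-Worker flag, while node2 shows both flags and the service table. Below is a minimal Python sketch of such a check, assuming the plain-text layout of the two clustat outputs in this report. `parse_clustat` is a hypothetical helper, not part of rgmanager or cman.

```python
from typing import Dict, List


def parse_clustat(text: str) -> Dict[str, object]:
    """Parse plain-text clustat output (hypothetical helper).

    Assumes the layout seen in this report: a member table headed by
    "Member Name", optionally followed by a table headed by "Service Name".
    """
    services_available = "Service states unavailable" not in text
    members: Dict[str, List[str]] = {}
    services: Dict[str, str] = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        # Table headers switch which table we are reading.
        if line.startswith("Member Name"):
            section = "members"
            continue
        if line.startswith("Service Name"):
            section = "services"
            continue
        # Skip blank lines and the dashed separator rows.
        if not line or line.startswith("-"):
            continue
        if section == "members":
            # e.g. "node2   2 Online, Local, RG-Master"
            parts = line.split(None, 2)
            if len(parts) == 3 and parts[1].isdigit():
                members[parts[0]] = [f.strip() for f in parts[2].split(",")]
        elif section == "services":
            # e.g. "service:1   node1   started"; the state is the last column.
            parts = line.split()
            if len(parts) >= 3:
                services[parts[0]] = parts[-1]
    # rgmanager is visible if any member is flagged as RG-Master or RG-Worker.
    rg_visible = any(
        "RG-Master" in flags or "RG-Worker" in flags for flags in members.values()
    )
    return {
        "services_available": services_available,
        "members": members,
        "services": services,
        "rgmanager_visible": rg_visible,
    }
```

Fed the node1 output above, this sketch reports `services_available` and `rgmanager_visible` as `False`. Fed the node2 output, it reports both as `True` and lists the four services as `started`.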
I am facing the same issue. Any idea when a fix will be available? Are there any workarounds?
Looks like this slipped through the cracks. clustat is only a victim here: if the cluster is locked up (for example, because fencing or clvmd is stuck), the errors bubble up to the top-level tools. The fact that LVM commands were hanging indicates that this is not an rgmanager issue.
If anyone sees this again with the current versions of the packages, please reopen this bug and attach the relevant LVM diagnostics.
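Because a hang lower in the stack shows up as LVM commands that never return, one way to capture that symptom when it recurs is to run each command with a deadline and record which ones do not finish. Below is a minimal Python sketch, assuming Python 3 is available on the node. `probe` is a hypothetical helper, and the probed command list and the 30-second deadline are illustrative assumptions, not values from this bug.

```python
import subprocess
import sys
from typing import List


def probe(cmd: List[str], timeout_s: float) -> str:
    """Run cmd and classify it as 'ok', 'failed', 'hung' or 'missing'.

    Hypothetical helper. On timeout, subprocess.run kills the child and
    waits for it. That only works if the process is killable; in this
    report the hanging LVM commands could be interrupted with Ctrl+C.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except FileNotFoundError:
        return "missing"  # command is not installed on this host
    except subprocess.TimeoutExpired:
        return "hung"  # did not finish before the deadline
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # Assumed probe list: basic LVM and cluster status commands.
    statuses = {}
    for command in (["vgs"], ["lvs"], ["cman_tool", "status"], ["clustat"]):
        name = " ".join(command)
        statuses[name] = probe(command, timeout_s=30)
        print(name, "->", statuses[name])
    # Exit non-zero if anything hung, so a wrapper script can raise an alert.
    sys.exit(1 if "hung" in statuses.values() else 0)
```

A command reported as `hung` is a hint to collect diagnostics from the layers below rgmanager (cman, fencing, clvmd) rather than from clustat itself.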