Bug 228823

Summary:

service permanently at stopping state

Product:

[Retired] Red Hat Cluster Suite

Reporter:

Tomasz Jaszowski <tjaszowski>

Component:

clumanager

Assignee:

Lon Hohberger <lhh>

Status:

CLOSED ERRATA

QA Contact:

Cluster QE <mspqa-list>

Severity:

high

Priority:

medium

CC:

cluster-maint, michael.hagmann

Hardware:

All

OS:

Linux

Fixed In Version:

RHBA-2007-0149

Doc Type:

Bug Fix

Last Closed:

2007-05-10 21:20:30 UTC

Attachments:

Description	Flags
Allows disable requests from 'stopping' state	none

Description Tomasz Jaszowski 2007-02-15 12:01:29 UTC

Description of problem:
After testing (stopping services, restarting nodes, etc.), some of the cluster
services remain in the 'stopping' state.

Actual results:
The service is in the 'stopping' state, and we can't stop, disable, or start it.
We tried disabling, stopping, and starting the service, and restarting
rgmanager, with no effect.

Expected results:
We would like to be able to stop or disable this service, or to tell the cluster
that these services really are stopped, so that we can act on them again (for
example, start the service).

Additional info:
From the system's point of view the service is stopped: no processes, no mounted
resources, etc. These services can only be started on one node; this is
enforced by a failover domain containing only one node, with the recovery
policy set to 'restart'.

Comment 1 Tomasz Jaszowski 2007-02-15 12:09:46 UTC

clustat -x | grep s1
    <group name="s1" state="113" state_str="stopping"  owner="t1"
last_owner="t1" restarts="0" last_transition="1171311768"
last_transition_str="Mon Feb 12 21:22:48 2007"/>

and nothing has changed since then.
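The 'clustat -x' output above can also be checked by a script, which helps spot services stuck in 'stopping' without eyeballing timestamps. The following is a minimal sketch, assuming the XML contains <group> elements with the 'state_str', 'owner' and 'last_transition' attributes shown above; the wrapping <clustat>/<groups> elements in the sample and the 'min_age' threshold are hypothetical, chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <group> element is copied from the clustat -x output
# above; the surrounding <clustat>/<groups> elements are assumed for illustration.
SAMPLE = """<clustat>
  <groups>
    <group name="s1" state="113" state_str="stopping" owner="t1"
           last_owner="t1" restarts="0" last_transition="1171311768"
           last_transition_str="Mon Feb 12 21:22:48 2007"/>
  </groups>
</clustat>"""


def stuck_services(xml_text, now, min_age=300):
    """Return (name, owner, seconds_in_state) for groups stuck in 'stopping'.

    `now` is a Unix timestamp; `min_age` (seconds) is an arbitrary threshold
    used in this sketch, not an rgmanager setting.
    """
    root = ET.fromstring(xml_text)
    result = []
    for group in root.iter("group"):
        if group.get("state_str") != "stopping":
            continue
        # last_transition is the Unix time of the group's last state change.
        age = now - int(group.get("last_transition", "0"))
        if age >= min_age:
            result.append((group.get("name"), group.get("owner"), age))
    return result


if __name__ == "__main__":
    # 1171540889 is 2007-02-15 12:01:29 UTC, when this report was filed.
    for name, owner, age in stuck_services(SAMPLE, now=1171540889):
        print(f"{name} on {owner}: stopping for {age // 3600} hours")
```

Comparing against 'last_transition' rather than just 'state_str' avoids flagging a service that is legitimately in the middle of a normal stop.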

Comment 2 Lon Hohberger 2007-02-15 17:53:27 UTC

The effect here (unable to stop) should be easy to fix (which I will).  However,
the cause is more interesting to fix.  Both are bugs anyway.

Do you have any specific reproducible steps to get a service into the 'stopping'
state?

Comment 3 Tomasz Jaszowski 2007-02-16 10:40:25 UTC

For now, we have no idea how to reproduce this problem (but we will try).

Do you have a quick way to tell the cluster that these services really are
stopped? (Rebooting all nodes or stopping all cluster software is not an option.)

Comment 4 Lon Hohberger 2007-02-16 13:46:33 UTC

Created attachment 148193 [details]
Allows disable requests from 'stopping' state

This does not fix the cause, but it should allow a user to disable a service
which was stuck in the 'stopping' state.
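To illustrate what the attached patch changes, here is a toy model of the check applied to a disable request. It is hypothetical throughout: the state names besides 'stopping' and the exact set of states accepted before the patch are assumptions, and the function is not rgmanager's actual code. It only captures the behavior described in the attachment title: a disable request against a 'stopping' service used to be refused and is now accepted.

```python
from enum import Enum


class State(Enum):
    # Hypothetical subset of service states; only 'stopping' comes from this report.
    STOPPED = "stopped"
    STARTED = "started"
    STOPPING = "stopping"
    FAILED = "failed"
    DISABLED = "disabled"


class RequestRefused(Exception):
    """Raised when a request is not valid in the service's current state."""


# Assumed set of states in which a disable request was accepted before the patch.
DISABLE_OK_BEFORE = frozenset({State.STOPPED, State.STARTED, State.FAILED, State.DISABLED})
# The patch's stated effect: 'stopping' is accepted as well.
DISABLE_OK_AFTER = DISABLE_OK_BEFORE | {State.STOPPING}


def handle_disable(state, allowed):
    """Return the service's new state after a disable request, or raise RequestRefused."""
    if state not in allowed:
        raise RequestRefused(f"cannot disable a service in state {state.value!r}")
    # A real service manager would stop any remaining resources here first;
    # this model only records the resulting state.
    return State.DISABLED


if __name__ == "__main__":
    try:
        handle_disable(State.STOPPING, DISABLE_OK_BEFORE)
    except RequestRefused as err:
        print("before patch:", err)
    print("after patch:", handle_disable(State.STOPPING, DISABLE_OK_AFTER).value)
```

As the comment notes, this only addresses the symptom: nothing in the model explains how the service got stuck in 'stopping' in the first place.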

Comment 5 Tomasz Jaszowski 2007-02-19 11:22:33 UTC

OK. If I would rather not recompile modules right now, is there any way to
tell the cluster that these services are stopped? Maybe some signal to send to
rgmanager, or something to write to /proc/cluster/?

Comment 6 Lon Hohberger 2007-02-19 16:38:54 UTC

It's not a kernel (or kernel module) patch; it's a patch against the rgmanager
source RPM.  You can build rgmanager with the patch and do a rolling upgrade
(i.e. upgrade one node at a time, and restart rgmanager).

The first node you should upgrade is 't1', since it's the one that needs to
clear the 'stopping' state.

Alternatively, you can stop all instances of rgmanager (cluster-wide), then
start them all - and that should clear the 'stopping' state.  This is a
sub-optimal solution, of course.
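The rolling-upgrade order described in this comment (upgrade the node that owns the stuck service first, then the others, one node at a time, restarting rgmanager on each) can be written down as a simple plan. This is a sketch only: node names other than 't1' are hypothetical, and the steps are plain descriptions rather than commands, since the actual package-install and restart procedure depends on the site.

```python
def rolling_upgrade_plan(nodes, stuck_owner):
    """Order nodes so the owner of the stuck service is upgraded first.

    Each node is upgraded and has rgmanager restarted before moving on,
    so only one node's rgmanager is down at any time.
    """
    if stuck_owner not in nodes:
        raise ValueError(f"{stuck_owner!r} is not a cluster member")
    ordered = [stuck_owner] + [n for n in nodes if n != stuck_owner]
    plan = []
    for node in ordered:
        plan.append(f"{node}: install patched rgmanager package")
        plan.append(f"{node}: restart rgmanager")
    return plan


if __name__ == "__main__":
    # 't1' is the owner from the clustat output; 't2' and 't3' are hypothetical.
    for step in rolling_upgrade_plan(["t2", "t1", "t3"], stuck_owner="t1"):
        print(step)
```

Putting the owner first matters because, per the comment, 't1' is the node that has to clear the 'stopping' state.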

Comment 7 Lon Hohberger 2007-02-19 16:41:43 UTC

If you want, I can rebuild 1.9.54 with the patch for you, but it's not a
complete solution (i.e. it doesn't fix the _cause_; it lets you fix the
symptom), so I was trying to avoid the intermediate step.

Comment 8 Tomasz Jaszowski 2007-02-19 16:53:51 UTC

I had a small time slot to myself, so I decided to restart all instances of
rgmanager. As we expected, it helped.

For now it's working, but I'll try to find more useful logs to trace the problem.

If you can rebuild rgmanager, please do. I'll use it if the problem occurs again.

Comment 9 Lon Hohberger 2007-02-20 18:02:26 UTC

Ok.  So, the fix for the symptom will be available in the next update; you will
be able to just do 'clusvcadm -d <service>' and it will clear the state.

I'll have packages shortly.
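With the fixed package installed, clearing several stuck services amounts to running 'clusvcadm -d <service>' once per service. A minimal sketch, assuming 'clusvcadm' is on the PATH and that the stuck service names have already been collected (for example from 'clustat -x'); the 'dry_run' parameter belongs to this sketch and is not a clusvcadm option.

```python
import subprocess


def disable_commands(services):
    """Build one `clusvcadm -d <service>` invocation per stuck service."""
    return [["clusvcadm", "-d", name] for name in services]


def disable_all(services, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in disable_commands(services):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if clusvcadm exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    disable_all(["s1"])  # dry run: only prints the command
```

Defaulting to a dry run lets the operator review the list before touching the cluster.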

Comment 10 Lon Hohberger 2007-02-20 19:46:52 UTC

http://people.redhat.com/lhh/rgmanager-1.9.54-3.228823test.src.rpm
http://people.redhat.com/lhh/rgmanager-1.9.54-3.228823test.i386.rpm
http://people.redhat.com/lhh/rgmanager-1.9.54-3.228823test.x86_64.rpm

Comment 13 Red Hat Bugzilla 2007-05-10 21:20:30 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0149.html