Bug 228823
Summary: service permanently at stopping state

Product: [Retired] Red Hat Cluster Suite
Component: clumanager
Version: 4
Status: CLOSED ERRATA
Severity: high
Priority: medium
Reporter: Tomasz Jaszowski <tjaszowski>
Assignee: Lon Hohberger <lhh>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint, michael.hagmann
Hardware: All
OS: Linux
Fixed In Version: RHBA-2007-0149
Doc Type: Bug Fix
Last Closed: 2007-05-10 21:20:30 UTC
Description
Tomasz Jaszowski, 2007-02-15 12:01:29 UTC
clustat -x | grep s1

    <group name="s1" state="113" state_str="stopping" owner="t1" last_owner="t1" restarts="0" last_transition="1171311768" last_transition_str="Mon Feb 12 21:22:48 2007"/>

and nothing has changed since then.

The effect here (being unable to stop) should be easy to fix (which I will). However, the cause is the more interesting thing to fix. Both are bugs anyway. Do you have any specific reproducible steps to get a service into the 'stopping' state?

For now, we have no idea how to reproduce this problem (but we will try). Do you have any quick way to tell the cluster that those services are really stopped? Rebooting all nodes or stopping all the cluster software is not an option.

Created attachment 148193 [details]
Allows disable requests from 'stopping' state
This does not fix the cause, but it should allow a user to disable a service
which was stuck in the 'stopping' state.
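The stuck 'stopping' state reported at the top of this bug can be detected from a script by pulling the `state_str` attribute out of clustat's XML output. A minimal sketch, using the group line from this report as a stand-in for live `clustat -x` output (on a real cluster you would pipe `clustat -x` itself):

```shell
# Sample line copied from this bug report; on a live node, replace the
# variable with the real output of:  clustat -x
xml='<group name="s1" state="113" state_str="stopping" owner="t1" last_owner="t1" restarts="0"/>'

# Extract the state_str attribute with sed (no XML parser needed for a quick check)
state=$(printf '%s\n' "$xml" | sed -n 's/.*state_str="\([^"]*\)".*/\1/p')
echo "service s1 is: $state"
```

A monitoring script could alert whenever `$state` stays at `stopping` across several polls, which is exactly the symptom described here.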
OK, and if I would rather not recompile anything right now, is there any way to tell the cluster that those services are stopped? Maybe some signal to send to rgmanager, or something to write to /proc/cluster/?

It's not a kernel (or kernel module) patch; it's a patch against the rgmanager source RPM. You can build rgmanager with the patch and do a rolling upgrade (i.e. upgrade one node at a time, restarting rgmanager on each). The first node you should upgrade is 't1', since it is the one that needs to clear the 'stopping' state. Alternatively, you can stop all instances of rgmanager (cluster-wide), then start them all; that should also clear the 'stopping' state. This is a sub-optimal solution, of course. If you want, I can rebuild 1.9.54 with the patch for you, but it is not a complete solution (it does not fix the _cause_; it only lets you fix the symptom), so I was trying to avoid the intermediate step.

I had a small time slot just for me, so I decided to restart all instances of rgmanager. As we expected, it helped. It is working for now, but I'll try to find more useful logs to trace the problem. If you can rebuild rgmanager, please do; I'll use it if the problem occurs again.

OK. The fix for the symptom will be available in the next update; you will be able to simply run 'clusvcadm -d <service>' and it will clear the state. I'll have packages shortly.

http://people.redhat.com/lhh/rgmanager-1.9.54-3.228823test.src.rpm
http://people.redhat.com/lhh/rgmanager-1.9.54-3.228823test.i386.rpm
http://people.redhat.com/lhh/rgmanager-1.9.54-3.228823test.x86_64.rpm

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0149.html
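The rolling-upgrade workaround discussed in the comments above can be summarized as a per-node command sequence. The sketch below is a dry run that only prints each step, since the real commands (rpm, service, clusvcadm) require a live clumanager node; the RPM filename is the i386 test build linked in this report, and the service name s1 and node order ('t1' first) come from the thread:

```shell
# Steps to run on each cluster node in turn ('t1' first, per the comments).
# Printed rather than executed: these commands need a real cluster node.
steps='service rgmanager stop
rpm -Uvh rgmanager-1.9.54-3.228823test.i386.rpm
service rgmanager start'
printf '%s\n' "$steps"

# Once every node runs the patched build, clear the stuck service
# (this is the symptom fix shipped in the update):
final="clusvcadm -d s1"
echo "$final"
```

The alternative mentioned in the thread, stopping rgmanager on every node and then starting them all again, clears the state without new packages, but takes all clustered services down at once.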