Bug 592103
Summary: it can take up to 30 seconds after `service cman stop` for corosync to exit
Product: Red Hat Enterprise Linux 6
Reporter: Nate Straz <nstraz>
Component: cluster
Assignee: Fabio Massimo Di Nitto <fdinitto>
Status: CLOSED CURRENTRELEASE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: low
Version: 6.0
CC: ccaulfie, cluster-maint, lhh, rpeterso, teigland
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: cluster-3.0.12-2.el6
Doc Type: Bug Fix
Last Closed: 2010-07-02 18:55:09 UTC
Description
Nate Straz
2010-05-13 20:49:43 UTC
Sorry, I don't understand what this test case is supposed to check. Services can take time to stop. If you check corosync after 30 seconds, is it still running? Is it hanging? Also, please note the error you have in the "good log". It's possible that fenced didn't have time to join the group and that you issued a shutdown command (which went in faster), while in the second one fenced took more time to let corosync/cman shut down.

Is there a specific restriction that they need to stop within a certain amount of time?

Generally speaking, an init script will wait for the service it governs to shut down completely before exiting. This test case verifies that. It also verifies that the cycle can be repeated without any component failing.

Please also provide the other information I requested about corosync exiting later on, and so on. It is relevant because I have never seen this behavior in my start/stop loop tests.

corosync is not hanging; it just takes extra time to shut down. After 30 seconds, `service cman status` reports "corosync is stopped". Whiplash runs the following pseudocode in parallel while iterating:

    foreach node: service cman start
    foreach node: service cman status
    foreach node: service cman stop
    foreach node: `service cman status` == "corosync is stopped"

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

Created attachment 413941: proposed patch
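The patch itself is not reproduced in this report. As a hypothetical illustration of the behavior discussed above, where an init script does not return from `stop` until the daemon it governs has actually exited, a stop routine could poll for the process with a bounded timeout. The helper name `wait_for_pid_exit` and its 30-second default are assumptions, not taken from the cman init script.

```shell
#!/bin/sh
# Hypothetical helper, not the actual fix from this report: after a daemon
# has been told to stop, poll until the process with the given PID is gone,
# giving up after TIMEOUT seconds (default 30, an arbitrary choice).
wait_for_pid_exit() {
    local pid timeout waited
    pid=$1
    timeout=${2:-30}
    waited=0
    # "kill -0" sends no signal; it succeeds while the PID exists and the
    # caller is allowed to signal it (always true for root in an init script).
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # still running after TIMEOUT seconds
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0            # the process has exited
}
```

In a stop routine it would be called after the daemon has been asked to exit, for example with a PID obtained from `pidof corosync`, and a return value of 1 would be reported as a failed stop rather than silently ignored.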
Dean, can you please give this patch to /etc/init.d/cman a spin? Thanks, Fabio

That patch allows the test to complete, but I'm still concerned. Why does it sometimes take corosync 30 seconds to unload on one node?

Thanks for testing. I suggest you file a separate bug against corosync and request information from their team.

git commit 67aa7e900364b651aaf0a8b89acbd8157aa551cf on the RHEL6 branch.

Made it through Whiplash with cman-3.0.12-6.el6.

Red Hat Enterprise Linux Beta 2 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
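The Whiplash loop quoted in this report is given only as pseudocode. A minimal POSIX shell sketch of one iteration, assuming passwordless `ssh` to each node, might look like the following. `NODES`, `REMOTE`, `on_all_nodes`, and `whiplash_iteration` are hypothetical names, and treating a failed status check right after start as a test failure is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of one Whiplash iteration as described in this report.
# NODES and REMOTE are placeholders: REMOTE is any command that takes a host
# name followed by the command to run there, such as ssh.
NODES="node1 node2 node3"
REMOTE=ssh

# Run the same command on every node in parallel; fail if any node fails.
on_all_nodes() {
    local node pid pids rc
    pids=
    rc=0
    for node in $NODES; do
        "$REMOTE" "$node" "$@" &
        pids="$pids $!"
    done
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return "$rc"
}

whiplash_iteration() {
    local node out
    on_all_nodes service cman start || return 1
    # Assumption: a failing status right after start counts as a failure.
    on_all_nodes service cman status || return 1
    on_all_nodes service cman stop || return 1
    # Once stop has returned everywhere, every node must already report
    # "corosync is stopped"; the status exit code itself is ignored.
    for node in $NODES; do
        out=$("$REMOTE" "$node" service cman status 2>&1)
        case "$out" in
            *"corosync is stopped"*) ;;
            *) echo "FAIL on $node: $out" >&2; return 1 ;;
        esac
    done
    return 0
}
```

Repeating the check, as Whiplash does, is then a loop that calls `whiplash_iteration` until it returns non-zero. With an init script that returns from `stop` before corosync has exited, the final status check is the step that fails.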