Bug 245381
Summary: | [RFE] Restart counters before a switch to relocate. | | |
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Charlie Wyse <cwyse> |
Component: | rgmanager | Assignee: | Lon Hohberger <lhh> |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
Severity: | low | Priority: | medium |
Version: | 4 | CC: | cluster-maint |
Hardware: | All | OS: | Linux |
Fixed In Version: | RHBA-2008-0791 | Doc Type: | Bug Fix |
Last Closed: | 2008-07-25 19:15:09 UTC | | |
Description
Charlie Wyse
2007-06-22 18:54:21 UTC
The restarts themselves are not tracked currently in rgmanager; that is, a restart itself is not recorded long-term; it is handled and never worried about again.

In order to implement a 'time-based' limit on X restarts, we would either need to store more information in VF (such as an ancillary data block to record restart histories), store the information locally (other nodes shouldn't care about this information - since they're not running the service), or alter the semantics of how parts of the rg_state_t structure are used:

typedef struct {
    char rs_name[64];        /**< Service name */
    uint64_t rs_owner;       /**< Member ID running service. */
    uint64_t rs_last_owner;  /**< Last member to run the service. */
    uint32_t rs_state;       /**< State of service. */
    uint32_t rs_restarts;    /**< Number of cluster-induced restarts */
    uint64_t rs_transition;  /**< Last service transition time */
    uint32_t rs_id;          /**< Service ID */
    uint32_t rs_pad;         /**< pad to 64-bit boundary */
} rg_state_t;

(and utilize the rs_pad field for something...). Basically, changing the size of the above structure can not be done - it will break rolling upgrade to do so.

With a node-local recording of cluster-induced restarts, it is very easy to throttle restarts based on X in Y time.

Created attachment 162009
Patch. pass 1.
Created attachment 162010
Pass 2; adds support to the resource-agent so that it's picked up via the config
Note: does not do time-based throttling; only a hard limit.
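
The patch noted above applies only a hard limit. For the time-based variant discussed in the description ("X in Y time" with node-local recording), a minimal sketch might look like the following. This is not the attached patch: restart_history_t, restart_allowed, restart_history_expire, and RESTART_HISTORY_MAX are hypothetical names chosen for illustration, and the history lives only on the node running the service.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define RESTART_HISTORY_MAX 16  /* hypothetical cap on tracked restarts */

/* Hypothetical node-local record; never shared with other members. */
typedef struct {
    time_t stamps[RESTART_HISTORY_MAX]; /* ring buffer of restart times */
    size_t count;                       /* number of valid entries */
    size_t head;                        /* index of the oldest entry */
} restart_history_t;

/* Drop entries that are `window` seconds old or older relative to `now`. */
static void
restart_history_expire(restart_history_t *h, time_t now, time_t window)
{
    while (h->count > 0 && now - h->stamps[h->head] >= window) {
        h->head = (h->head + 1) % RESTART_HISTORY_MAX;
        h->count--;
    }
}

/*
 * Try to record a restart at `now`.
 * Returns true if the restart is allowed, or false if `max_restarts`
 * restarts already happened within `window` seconds, in which case the
 * caller would relocate the service instead of restarting it.
 */
static bool
restart_allowed(restart_history_t *h, time_t now,
                size_t max_restarts, time_t window)
{
    assert(max_restarts > 0 && max_restarts <= RESTART_HISTORY_MAX);

    restart_history_expire(h, now, window);
    if (h->count >= max_restarts)
        return false;

    h->stamps[(h->head + h->count) % RESTART_HISTORY_MAX] = now;
    h->count++;
    return true;
}
```

A caller would keep one zero-initialized restart_history_t per locally running service and consult restart_allowed() before each cluster-induced restart. Because nothing here touches rg_state_t, the structure size stays unchanged.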
The rs_id and rs_pad fields are not used by rgmanager. We could use these fields as a "first-start" time. (It's not even endian-swapped in reslist.h)

Pushed to RHEL4 git branch

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0791.html
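
The "first-start" idea from the comment above (reusing rs_id and rs_pad) could be sketched roughly as follows. The structure layout is the one quoted in the description. Everything else is an assumption made for illustration: splitting a 64-bit time across the two 32-bit fields, treating 0 as "no window open", and the helper names rs_first_start, rs_set_first_start, and rs_restart_allowed. It is not necessarily what was pushed to the RHEL4 branch. Since the comment notes these fields are not endian-swapped in reslist.h, a real change would presumably also need to swap them wherever the structure is converted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout as quoted in the report; rs_id/rs_pad are reinterpreted below. */
typedef struct {
    char     rs_name[64];
    uint64_t rs_owner;
    uint64_t rs_last_owner;
    uint32_t rs_state;
    uint32_t rs_restarts;
    uint64_t rs_transition;
    uint32_t rs_id;   /* assumption: high 32 bits of first-start time */
    uint32_t rs_pad;  /* assumption: low 32 bits of first-start time */
} rg_state_t;

/* The size must not change, or rolling upgrades would break. */
static_assert(sizeof(rg_state_t) == 104, "rg_state_t size changed");

/* Reassemble the first-start time from the two 32-bit fields. */
static uint64_t
rs_first_start(const rg_state_t *rs)
{
    return ((uint64_t)rs->rs_id << 32) | rs->rs_pad;
}

/* Split a 64-bit time across rs_id (high) and rs_pad (low). */
static void
rs_set_first_start(rg_state_t *rs, uint64_t t)
{
    rs->rs_id  = (uint32_t)(t >> 32);
    rs->rs_pad = (uint32_t)(t & 0xffffffffu);
}

/*
 * Fixed-window throttle: allow at most `max_restarts` restarts in the
 * `window` seconds following the first restart of the current window.
 * Returns false when the limit is hit (caller would relocate).
 */
static bool
rs_restart_allowed(rg_state_t *rs, uint64_t now,
                   uint32_t max_restarts, uint64_t window)
{
    uint64_t first = rs_first_start(rs);

    /* Open a new window if none is open or the old one has lapsed. */
    if (first == 0 || now - first >= window) {
        rs_set_first_start(rs, now);
        rs->rs_restarts = 0;
    }
    if (rs->rs_restarts >= max_restarts)
        return false;

    rs->rs_restarts++;
    return true;
}
```

Compared with a per-node history, this keeps only one timestamp per service, so it counts restarts in fixed windows rather than a sliding one, but it needs no storage beyond the existing fields.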