Bug 236580
| Summary: | [HA LVM]: Bringing site back on-line after failure causes problems | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Jonathan Earl Brassow <jbrassow> | ||||
| Component: | rgmanager | Assignee: | Jonathan Earl Brassow <jbrassow> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 4 | CC: | cfeist, cluster-maint, lhh | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2009-02-05 00:19:57 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
|
Description
Jonathan Earl Brassow
2007-04-16 15:31:48 UTC
I'm seeing something very different from you, but it may be worth trying with the changes I've made. Here's what I see: Site fail-over works fine. If I reactivate the failed site (including the storage device), when the service tries to move back, it fails to activate due to a conflict it sees in the available devices. [The failed device has now come back - leaving a LVM metadata conflict.] This leaves the service in the 'failed' state. Here's what I've done. I've added some code to determine what the valid devices are, and use those and only those devices when activating. This solved the problem for me. You will need to ensure that this works fine with your multipath setup. I don't think there should be issues in that regard, but I don't want to guess. This may not be the issue you are seeing, but the bug I found could certainly cause similar problems. Be sure that you have the latest updates. I've attached the lvm.sh file to be placed in /usr/share/cluster on all the machines. When we've gone through a few successful iterations of testing we will be sure to commit the changes. Created attachment 152701 [details]
lvm.sh script with bad device exclusion
bad device exclusion script with minor changes checked-in assigned -> post Another concern I have in the user's implementation is the initrd. The initrd should (must) contain the correctly modified lvm.conf This has been built and is in the current RHEL4 release of rgmanager. |