Bug 1251525
| Summary: | galera ocf agent fail fast if sync fails during promote | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | David Vossel <dvossel> |
| Component: | resource-agents | Assignee: | Damien Ciabrini <dciabrin> |
| Status: | CLOSED WONTFIX | QA Contact: | Ofer Blaut <oblaut> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.2 | CC: | agk, cfeist, cluster-maint, dciabrin, fahmed, fdinitto, jmelvin, lmiksik, mbayer, oalbrigt, pzimek, royoung |
| Target Milestone: | rc | Keywords: | Triaged, ZStream |
| Target Release: | 7.7 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1299878 | Environment: | |
| Last Closed: | 2019-02-22 08:50:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1299878 | | |
Description
David Vossel
2015-08-07 15:04:13 UTC
Please add steps to reproduce.
After further testing, the fix referenced in comment 7 does not solve the issue. The new way of tracking sync can time out during the monitor operation. Patch removed from newer 7.3 builds. Set to POST when a new patch is ready for a build.
*** Bug 1372616 has been marked as a duplicate of this bug. ***
*** Bug 1376084 has been marked as a duplicate of this bug. ***
An update here for tracking purposes. Over the many iterations of this bug fix, we ended up with a working fix, but the drawback is that the galera resource agent becomes more complex because it has to track new states via a couple of additional crm attributes. Meanwhile, the urgency of this bugzilla has lowered because nowadays on OpenStack, Keystone uses Fernet tokens, which means that the amount of data stored in the database has become really stable, so we no longer run into situations where a missing DB cleanup would make the DB grow unbounded. To summarize, this bz can be fixed, but given the time constraints and the priority it won't be fixed in the short term, so I'm keeping it open a little longer for tracking purposes.
So, a long overdue update on this one. For context, in earlier versions of OpenStack we used to have very large or ever-growing mysql databases. During the promote operation, the entire DB could be synced over rsync, which would sometimes exceed the configured timeout of the promote operation. The approach for fixing it consisted of running different Slave monitor operations to track the rsync in a recurring fashion. But that yielded a considerable increase in complexity (additional attributes, many monitor operations, more tests...). Meanwhile, in OpenStack we switched to Fernet tokens, which essentially means we now have a fairly bounded, small database to deal with. So this development is not needed anymore.
Weighing the complexity of the code against the real need for it (we no longer need it for OpenStack), I think I'm going to drop this bz instead of letting it slip indefinitely.