Bug 619680
| Summary: | cluster should not sync old cluster.conf during conf reload | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Fabio Massimo Di Nitto <fdinitto> | ||||||
| Component: | cluster | Assignee: | Fabio Massimo Di Nitto <fdinitto> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 6.0 | CC: | ccaulfie, cluster-maint, djansa, lhh, rpeterso, ssaha, syeghiay, teigland | ||||||
| Target Milestone: | rc | Keywords: | RHELNAK | ||||||
| Target Release: | --- | ||||||||
| Hardware: | All | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | cluster-3.0.12-21.el6 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | 617161 | ||||||||
| : | 625560 (view as bug list) | Environment: | |||||||
| Last Closed: | 2010-11-10 20:00:29 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | 617161 | ||||||||
| Bug Blocks: | 625560 | ||||||||
| Attachments: |
|
||||||||
|
Comment 3
RHEL Program Management
2010-07-30 07:27:34 UTC
This isn't a bug, ccs_sync works exactly as defined. It distributes the cluster.conf file that you specify to all the nodes (no matter what's in that cluster.conf file). If there is a check, it should be done in cman_tool. Also, cman_tool probably shouldn't be propagating the files if it can't load them into corosync. It may make more sense to only propagate the files after they've been successfully loaded into corosync, (or include an option to force propagation). Created attachment 436640 [details]
Makes cman_tool fail on configuration regressions.
See patch for more explanation.
I believe the synopsis of this bug is incorrect. I do not believe this is a bug in ccs_sync at all. ccs_sync is "stupid" and it's job is not to safeguard the user; plenty of other tools do it already: cman_tool, luci, even cman itself. cman_tool calls out to ccs_config_validate, which the patch noted in comment #11 can be overloaded to also check the configuration version #. Since cman_tool already knows the current version # when we call ccs_config_validate, it's simply a matter of passing it in. Comment on attachment 436640 [details] Makes cman_tool fail on configuration regressions. The patch looks functionally OK, comments inline: >+ result = validate_config(comline, ver.cv_config); >+ if (result == 2) >+ /* Config regression = fail. */ >+ die("Not reloading, configuration version regression\n"); I would prefer a more explicit error message here rather than "regression". "New config version is older or the same as currently running.." or something. >diff --git a/config/tools/xml/ccs_config_validate.in b/config/tools/xml/ccs_config_validate.in >index 55ab4aa..08bae37 100644 >--- a/config/tools/xml/ccs_config_validate.in >+++ b/config/tools/xml/ccs_config_validate.in >@@ -17,6 +17,15 @@ if [ -z "$COROSYNC_DEFAULT_CONFIG_IFACE" ]; then > export COROSYNC_DEFAULT_CONFIG_IFACE=$CONFIG_LOADER:cmanpreconfig > fi > >+get_config_version() { >+ local file=$1 >+ >+ echo "xpath /cluster/@config_version" | \ >+ xmllint --shell "$file" | \ >+ grep content | \ >+ cut -f2 -d= >+} I personally don't like this style, but I'll look if i can find a better way to avoid forking that many commands for one op. >@@ -181,6 +196,16 @@ if [ -z "$quiet" ] || [ "$res" != "0" ]; then > -e 's#'$tempfile'#tempfile#g' > fi > >+# Configuration regression = 2 >+if [ -n "$old_version" ] && [ $old_version -ne 0 ] && [ $res -eq 0 ]; then >+ new_version=$(get_config_version $tempfile) >+ lecho "Old version: $old_version New version: $new_version" >+ if [ $new_version -le $old_version ]; then >+ echo "Warning: Configuration version regression!" >+ res=2 >+ fi >+fi AFAIR we try to avoid using -ne/-eq for some portability issues. No biggie again. just minor detail. Created attachment 436792 [details]
proposed patch
this is a new proposed fix. diff from the previous patch:
ccs_config_validates.in:
- change return code from 2 to 254 because we don´t want to overlap with
xmllint return codes.
- check if we can get new_version and if not return 253
- return 254 if new_version is <= old_config
- send errors from version check to stderr
- check the version before printing the xml validation result otherwise
the output can be confusing:
* Configuration validates
* Error: Configuration version is older than running config!
doesn´t really make sense.
cman_tool:
- propagate the quiet option correctly, so that in case a config reload error
can be properly debugged
- return WXITSTATUS from validate_config so we get the real error to parse
- parse the current 3 possible errors from ccs_config_validate to print a proper
error message.
Comment on attachment 436792 [details]
proposed patch
Signed-off-by: Lon Hohberger <lhh>
http://git.fedorahosted.org/git/?p=cluster.git;a=commitdiff;h=9cd607e41482b408aedb1afb53d56f5b05d05d7a upstream commit http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=3ac0bdfccc5c8b6883e3c331dfa3fca7eeb1b4c5 Corresponding man page update. Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you. |