Bug 619680

Summary: cluster should not sync old cluster.conf during conf reload
Product: Red Hat Enterprise Linux 6 Reporter: Fabio Massimo Di Nitto <fdinitto>
Component: clusterAssignee: Fabio Massimo Di Nitto <fdinitto>
Status: CLOSED CURRENTRELEASE QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: high    
Version: 6.0CC: ccaulfie, cluster-maint, djansa, lhh, rpeterso, ssaha, syeghiay, teigland
Target Milestone: rcKeywords: RHELNAK
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: cluster-3.0.12-21.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 617161
: 625560 (view as bug list) Environment:
Last Closed: 2010-11-10 20:00:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 617161    
Bug Blocks: 625560    
Attachments:
Description Flags
Makes cman_tool fail on configuration regressions.
fdinitto: review+
proposed patch lhh: review+

Comment 3 RHEL Program Management 2010-07-30 07:27:34 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 7 Chris Feist 2010-08-03 21:46:44 UTC
This isn't a bug, ccs_sync works exactly as defined.  It distributes the cluster.conf file that you specify to all the nodes (no matter what's in that cluster.conf file).

If there is a check, it should be done in cman_tool.  Also, cman_tool probably shouldn't be propagating the files if it can't load them into corosync. It may make more sense to only propagate the files after they've been successfully loaded into corosync, (or include an option to force propagation).

Comment 11 Lon Hohberger 2010-08-04 19:18:37 UTC
Created attachment 436640 [details]
Makes cman_tool fail on configuration regressions.

See patch for more explanation.

Comment 12 Lon Hohberger 2010-08-04 19:22:08 UTC
I believe the synopsis of this bug is incorrect.  I do not believe this is a bug in ccs_sync at all.

ccs_sync is "stupid" and it's job is not to safeguard the user; plenty of other tools do it already: cman_tool, luci, even cman itself.

cman_tool calls out to ccs_config_validate, which the patch noted in comment #11 can be overloaded to also check the configuration version #.

Since cman_tool already knows the current version # when we call ccs_config_validate, it's simply a matter of passing it in.

Comment 13 Fabio Massimo Di Nitto 2010-08-05 03:20:38 UTC
Comment on attachment 436640 [details]
Makes cman_tool fail on configuration regressions.

The patch looks functionally OK, comments inline:


>+		result = validate_config(comline, ver.cv_config);
>+		if (result == 2)
>+			/* Config regression = fail. */
>+			die("Not reloading, configuration version regression\n");

I would prefer a more explicit error message here rather than "regression".
"New config version is older or the same as currently running.." or something.

>diff --git a/config/tools/xml/ccs_config_validate.in b/config/tools/xml/ccs_config_validate.in
>index 55ab4aa..08bae37 100644
>--- a/config/tools/xml/ccs_config_validate.in
>+++ b/config/tools/xml/ccs_config_validate.in
>@@ -17,6 +17,15 @@ if [ -z "$COROSYNC_DEFAULT_CONFIG_IFACE" ]; then
> 	export COROSYNC_DEFAULT_CONFIG_IFACE=$CONFIG_LOADER:cmanpreconfig
> fi
> 
>+get_config_version() {
>+	local file=$1
>+
>+	echo "xpath /cluster/@config_version" | \
>+		xmllint --shell "$file" | \
>+		grep content | \
>+	cut -f2 -d=
>+}

I personally don't like this style, but I'll look if i can find
a better way to avoid forking that many commands for one op.

>@@ -181,6 +196,16 @@ if [ -z "$quiet" ] || [ "$res" != "0" ]; then
> 		-e 's#'$tempfile'#tempfile#g'
> fi
> 
>+# Configuration regression = 2
>+if [ -n "$old_version" ] && [ $old_version -ne 0 ] && [ $res -eq 0 ]; then
>+	new_version=$(get_config_version $tempfile)
>+	lecho "Old version: $old_version   New version: $new_version"
>+	if [ $new_version -le $old_version ]; then
>+		echo "Warning: Configuration version regression!"
>+		res=2
>+	fi
>+fi

AFAIR we try to avoid using -ne/-eq for some portability issues.
No biggie again. just minor detail.

Comment 14 Fabio Massimo Di Nitto 2010-08-05 10:15:30 UTC
Created attachment 436792 [details]
proposed patch

this is a new proposed fix. diff from the previous patch:

ccs_config_validates.in:

- change return code from 2 to 254 because we don´t want to overlap with
  xmllint return codes.
- check if we can get new_version and if not return 253
- return 254 if new_version is <= old_config
- send errors from version check to stderr
- check the version before printing the xml validation result otherwise
  the output can be confusing:
* Configuration validates
* Error: Configuration version is older than running config!
doesn´t really make sense.

cman_tool:

- propagate the quiet option correctly, so that in case a config reload error
  can be properly debugged
- return WXITSTATUS from validate_config so we get the real error to parse
- parse the current 3 possible errors from ccs_config_validate to print a proper
  error message.

Comment 15 Lon Hohberger 2010-08-05 13:38:19 UTC
Comment on attachment 436792 [details]
proposed patch

Signed-off-by: Lon Hohberger <lhh>

Comment 17 Lon Hohberger 2010-08-05 15:53:00 UTC
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=3ac0bdfccc5c8b6883e3c331dfa3fca7eeb1b4c5

Corresponding man page update.

Comment 21 releng-rhel@redhat.com 2010-11-10 20:00:29 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.