Bug 1591045
Summary: | crm_failcount treats INFINITY as zero | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Reid Wahl <nwahl> |
Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
Status: | CLOSED ERRATA | QA Contact: | Patrik Hagara <phagara> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 7.5 | CC: | abeekhof, cluster-maint, mnovacek, phagara |
Target Milestone: | rc | Keywords: | Regression |
Target Release: | 7.6 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | pacemaker-1.1.19-1.el7 | Doc Type: | No Doc Update |
Doc Text: |
Most users will use pcs resource failcount rather than crm_failcount directly, and this issue does not affect them.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2018-10-30 07:58:22 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Reid Wahl
2018-06-13 23:45:35 UTC
The issue begins earlier than the addition loop. The sed expression in line 114 only parses integer values ([0-9][0-9]*) and ignores strings like "INFINITY". Then the addition loop is not equipped to handle "INFINITY" either. 112 # Extract the attribute values (one per line) from the output 113 QAS_VALUE=$(echo "$QAS_ALL" | sed -n -e \ 114 's/.*<nvpair.*value="\([0-9][0-9]*\)".*>.*/\1/p') 115 116 # Sum the values 117 QAS_SUM=0 118 for i in 0 $QAS_VALUE; do 119 QAS_SUM=$(($QAS_SUM + $i)) 120 done 121 echo $QAS_SUM We could either add "INFINITY" to the match group in the sed expression, or simply stop restricting it to digits. Is there any reason we should restrict it to digits? -------------------- Here is a patch that adds "INFINITY", if we take that approach. # diff -u /usr/sbin/crm_failcount{,.bak} --- /usr/sbin/crm_failcount 2018-06-13 19:55:09.549196483 -0400 +++ /usr/sbin/crm_failcount.bak 2018-04-23 10:04:27.000000000 -0400 @@ -110,22 +110,13 @@ #fi # Extract the attribute values (one per line) from the output - #QAS_VALUE=$(echo "$QAS_ALL" | sed -n -e \ - #'s/.*<nvpair.*value="\([0-9][0-9]*\)".*>.*/\1/p') QAS_VALUE=$(echo "$QAS_ALL" | sed -n -e \ - 's/.*<nvpair.*value="\([0-9][0-9]*\|INFINITY\)".*>.*/\1/p') - - # echo "$QAS_VALUE" + 's/.*<nvpair.*value="\([0-9][0-9]*\)".*>.*/\1/p') # Sum the values QAS_SUM=0 for i in 0 $QAS_VALUE; do - if [ "$i" = "INFINITY" ]; then - QAS_SUM="INFINITY" - break - else - QAS_SUM=$(($QAS_SUM + $i)) - fi + QAS_SUM=$(($QAS_SUM + $i)) done echo $QAS_SUM } We could even regex-search for "INFINITY" in "$QAS_VALUE" instead of iterating through the values until we hit "INFINITY", but the above approach seems more straightforward. -------------------- Using this new version, we get the following results: # pcs cluster cib | grep fail-count-dummy <nvpair id="status-1-fail-count-dummy.monitor_10000" name="fail-count-dummy#monitor_10000" value="8"/> <nvpair id="status-1-fail-count-dummy.start_0" name="fail-count-dummy#start_0" value="INFINITY"/> # crm_failcount -G -r dummy -n monitor -I 10000ms scope=status name=fail-count-dummy#monitor_10000 value=8 # crm_failcount -G -r dummy -n start scope=status name=fail-count-dummy#start_0 value=INFINITY # crm_failcount -G -r dummy scope=status name=fail-count-dummy value=INFINITY Thanks for the discovery, and the patch! This is a regression in RHEL 7.5 (via upstream release 1.1.17 commit 83236161). QA: Reproducer is in Description. Instead of making the operations fail in actuality, for these purposes it would be sufficient to simulate failures by adding fail-count attributes directly, e.g. attrd_updater -U 8 -N ${NODE_NAME} -n fail-count-${RSC_NAME}#monitor_${INTERVAL_MS} attrd_updater -U INFINITY -N ${NODE_NAME} -n fail-count-${RSC_NAME}#start_0 (Such simulated failures won't show up in status but that is not important for this bz.) (In reply to Reid Wahl from comment #2) > Here is a patch that adds "INFINITY", if we take that approach. > > # diff -u /usr/sbin/crm_failcount{,.bak} > --- /usr/sbin/crm_failcount 2018-06-13 19:55:09.549196483 -0400 > +++ /usr/sbin/crm_failcount.bak 2018-04-23 10:04:27.000000000 -0400 > @@ -110,22 +110,13 @@ > #fi > > # Extract the attribute values (one per line) from the output > - #QAS_VALUE=$(echo "$QAS_ALL" | sed -n -e \ > - #'s/.*<nvpair.*value="\([0-9][0-9]*\)".*>.*/\1/p') > QAS_VALUE=$(echo "$QAS_ALL" | sed -n -e \ > - 's/.*<nvpair.*value="\([0-9][0-9]*\|INFINITY\)".*>.*/\1/p') > - > - # echo "$QAS_VALUE" > + 's/.*<nvpair.*value="\([0-9][0-9]*\)".*>.*/\1/p') > > # Sum the values > QAS_SUM=0 > for i in 0 $QAS_VALUE; do > - if [ "$i" = "INFINITY" ]; then > - QAS_SUM="INFINITY" > - break > - else > - QAS_SUM=$(($QAS_SUM + $i)) > - fi > + QAS_SUM=$(($QAS_SUM + $i)) > done > echo $QAS_SUM > } > Forgot to remove commented lines before running diff ;) # diff -u /usr/sbin/crm_failcount{,.bak} --- /usr/sbin/crm_failcount 2018-06-14 12:34:06.189587246 -0400 +++ /usr/sbin/crm_failcount.bak 2018-04-23 10:04:27.000000000 -0400 @@ -111,17 +111,12 @@ # Extract the attribute values (one per line) from the output QAS_VALUE=$(echo "$QAS_ALL" | sed -n -e \ - 's/.*<nvpair.*value="\([0-9][0-9]*\|INFINITY\)".*>.*/\1/p') + 's/.*<nvpair.*value="\([0-9][0-9]*\)".*>.*/\1/p') # Sum the values QAS_SUM=0 for i in 0 $QAS_VALUE; do - if [ "$i" = "INFINITY" ]; then - QAS_SUM="INFINITY" - break - else - QAS_SUM=$(($QAS_SUM + $i)) - fi + QAS_SUM=$(($QAS_SUM + $i)) done echo $QAS_SUM } Third time's the charm. Time for coffee. --- /usr/sbin/crm_failcount.bak 2018-04-23 10:04:27.000000000 -0400 +++ /usr/sbin/crm_failcount 2018-06-14 12:34:06.189587246 -0400 @@ -111,12 +111,17 @@ # Extract the attribute values (one per line) from the output QAS_VALUE=$(echo "$QAS_ALL" | sed -n -e \ - 's/.*<nvpair.*value="\([0-9][0-9]*\)".*>.*/\1/p') + 's/.*<nvpair.*value="\([0-9][0-9]*\|INFINITY\)".*>.*/\1/p') # Sum the values QAS_SUM=0 for i in 0 $QAS_VALUE; do - QAS_SUM=$(($QAS_SUM + $i)) + if [ "$i" = "INFINITY" ]; then + QAS_SUM="INFINITY" + break + else + QAS_SUM=$(($QAS_SUM + $i)) + fi done echo $QAS_SUM } Fixed upstream as of commit 8567726d before: ======= > [root@virt-122 ~]# rpm -q pacemaker > pacemaker-1.1.18-11.el7.x86_64 > [root@virt-122 ~]# pcs resource create dummy ocf:pacemaker:Dummy op monitor interval=30s > [root@virt-122 ~]# pcs resource disable dummy --wait > resource 'dummy' is not running on any node > [root@virt-122 ~]# pcs status > Cluster name: bzzt > Stack: corosync > Current DC: virt-122.cluster-qe.lab.eng.brq.redhat.com (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum > Last updated: Mon Jul 30 14:57:27 2018 > Last change: Mon Jul 30 14:57:23 2018 by root via cibadmin on virt-122.cluster-qe.lab.eng.brq.redhat.com > > 3 nodes configured > 1 resource configured (1 DISABLED) > > Online: [ virt-122.cluster-qe.lab.eng.brq.redhat.com virt-123.cluster-qe.lab.eng.brq.redhat.com virt-127.cluster-qe.lab.eng.brq.redhat.com ] > > Full list of resources: > > dummy (ocf::pacemaker:Dummy): Stopped (disabled) > > Daemon Status: > corosync: active/enabled > pacemaker: active/enabled > pcsd: active/enabled > [root@virt-122 ~]# attrd_updater -U 8 -N virt-127.cluster-qe.lab.eng.brq.redhat.com -n 'fail-count-dummy#monitor_30000' > [root@virt-122 ~]# attrd_updater -U INFINITY -N virt-127.cluster-qe.lab.eng.brq.redhat.com -n 'fail-count-dummy#start_0' > [root@virt-122 ~]# crm_failcount -G -r dummy -N virt-127.cluster-qe.lab.eng.brq.redhat.com -n monitor -I 30000ms > scope=status name=fail-count-dummy#monitor_30000 value=8 > [root@virt-122 ~]# crm_failcount -G -r dummy -N virt-127.cluster-qe.lab.eng.brq.redhat.com -n start > scope=status name=fail-count-dummy#start_0 value=INFINITY > [root@virt-122 ~]# crm_failcount -G -r dummy -N virt-127.cluster-qe.lab.eng.brq.redhat.com > scope=status name=fail-count-dummy value=8 Total failcount (last command) is equal to 8 instead of INFINITY, which is incorrect. after: ====== > [root@virt-148 ~]# rpm -q pacemaker > pacemaker-1.1.19-4.el7.x86_64 > [root@virt-148 ~]# pcs resource create dummy ocf:pacemaker:Dummy op monitor interval=30s > [root@virt-148 ~]# pcs resource disable dummy --wait > resource 'dummy' is not running on any node > [root@virt-148 ~]# pcs status > Cluster name: bzzt > Stack: corosync > Current DC: virt-150.cluster-qe.lab.eng.brq.redhat.com (version 1.1.19-4.el7-c3c624ea3d) - partition with quorum > Last updated: Mon Jul 30 15:56:58 2018 > Last change: Mon Jul 30 15:56:50 2018 by root via cibadmin on virt-148.cluster-qe.lab.eng.brq.redhat.com > > 3 nodes configured > 1 resource configured (1 DISABLED) > > Online: [ virt-148.cluster-qe.lab.eng.brq.redhat.com virt-149.cluster-qe.lab.eng.brq.redhat.com virt-150.cluster-qe.lab.eng.brq.redhat.com ] > > Full list of resources: > > dummy (ocf::pacemaker:Dummy): Stopped (disabled) > > Daemon Status: > corosync: active/enabled > pacemaker: active/enabled > pcsd: active/enabled > [root@virt-148 ~]# attrd_updater -U 8 -N virt-149.cluster-qe.lab.eng.brq.redhat.com -n 'fail-count-dummy#monitor_30000' > [root@virt-148 ~]# attrd_updater -U INFINITY -N virt-149.cluster-qe.lab.eng.brq.redhat.com -n 'fail-count-dummy#start_0' > [root@virt-148 ~]# crm_failcount -G -r dummy -N virt-149.cluster-qe.lab.eng.brq.redhat.com -n monitor -I 30000ms > scope=status name=fail-count-dummy#monitor_30000 value=8 > [root@virt-148 ~]# crm_failcount -G -r dummy -N virt-149.cluster-qe.lab.eng.brq.redhat.com -n start > scope=status name=fail-count-dummy#start_0 value=INFINITY > [root@virt-148 ~]# crm_failcount -G -r dummy -N virt-149.cluster-qe.lab.eng.brq.redhat.com > scope=status name=fail-count-dummy value=INFINITY Total failcount is now correctly evaluated as 8 + INFINITY = INFINITY. Marking verified. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3055 |