Bug 1108708

Summary: Config parser: Properly check integer values
Product: Red Hat Enterprise Linux 7 Reporter: Jan Friesse <jfriesse>
Component: corosync    Assignee: Jan Friesse <jfriesse>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.1    CC: ccaulfie, cluster-maint, jkortus
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: corosync-2.3.3-3.el7 Doc Type: Bug Fix
Doc Text:
Cause: The user sets an integer option in corosync.conf to a value outside the allowed boundaries (< 0 or, usually, > 2^32 - 1). Consequence: Corosync doesn't show an error message; the integer is parsed and overflows. Fix: Properly check integer boundaries. Result: Corosync shows an error during configuration file parsing (the error contains the allowed integer range).
Story Points: ---
Clone Of:
: 1167390 (view as bug list) Environment:
Last Closed: 2015-03-05 08:27:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1167390    
Attachments:
Description Flags
coroparse: More strict numbers parsing none

Description Jan Friesse 2014-06-12 12:58:54 UTC
Description of problem:
Corosync doesn't properly check integers in corosync.conf for their range. It also doesn't allow entering a 32-bit integer with the highest bit set on 32-bit systems.

Version-Release number of selected component (if applicable):
2.3.3

How reproducible:
100%

Steps to Reproduce and results:
* corosync.conf: token: -1 -> 4294967295 (no error shown)
* token: 4294967296 -> overflow -> 0 (no error shown)
* It's impossible to enter a (perfectly valid) nodeid > 2^31 - 1 on 32-bit systems

Expected results:
Corosync shows an error on an invalid integer value (boundaries are checked), ideally including the allowed range. It is possible to use integers > 2^31 - 1 on 32-bit systems (not subject to QA; we no longer support 32-bit systems).

Comment 1 Jan Friesse 2014-06-12 13:00:48 UTC
Created attachment 908110 [details]
coroparse: More strict numbers parsing

coroparse: More strict numbers parsing

The previous safe_atoi didn't check the range of input values, so if for
example the user used -1 as the token timeout, it was converted to
UINT32_MAX without letting the user know.

Another safe_atoi problem was its use of strtol. This works pretty well
on 64-bit systems, where a long integer is usually 64 bits long; sadly,
on 32-bit systems it is usually 32 bits. And because strtol returns a
signed integer, it was not possible to enter a 32-bit value with the
highest bit set.

The solution is to use strtoll, which is guaranteed to be at least
64 bits long, and to check the value range.

The error message now also contains information about the expected
value range.

Signed-off-by: Jan Friesse <jfriesse>
Reviewed-by: Christine Caulfield <ccaulfie>

Comment 2 Jan Friesse 2014-06-12 13:04:35 UTC
For QE: "Unit" test is csts config-strict-parse-numbers.sh

Comment 5 Jaroslav Kortus 2014-10-30 13:37:09 UTC
With the introduction of the dynamic token timeout, the introduced checks might not be enough. For instance, setting token: 4294966646 will prevent corosync from starting in a 3-node cluster (4294966646 + 650 results in UINT32_MAX + 1).

As the purpose of this bug was to eliminate these values from the config and let the user know they are off limits, I'm flipping this back to ASSIGNED for reconsideration. Users would have to be really evil to reach these corner cases, but since we already have a bug for this... :).

Comment 6 Jan Friesse 2014-10-30 16:30:54 UTC
Good catch. Sadly, I'm really unsure if it's possible to let the user know they are off limits in this weird case. During startup it's possible, but sadly during runtime it's not (we really do not want to crash corosync just because the user added another node).

So my proposed solution is to limit the maximum token timeout to UINT32_MAX to prevent overflow, and, for consistent behavior, not to show any extra error.

Opinions? Ideas?

Comment 7 Jaroslav Kortus 2014-10-30 17:03:00 UTC
Limiting the token timeout alone will not do it. We should also limit all the other variables that come into play (like the coefficient), or better said, the resulting product of them.

We could apply this logic on reload too, I guess; at least that worked for the join timeout vs. consensus ratio: if it was not valid on reload, the reload was refused.

Is it possible to do that or are there any use cases that I don't see right now?

Comment 8 Jan Friesse 2014-10-31 16:47:51 UTC
Oh, this is a misunderstanding. By token I mean the final token value. So yes, the coefficient comes into play.

About the join vs. consensus error: sadly that's not true (and it may be worth a BZ). The error is shown, but the reload is still processed and the values are used.

That's why I'm considering using the highest possible values for the token and related variables, maybe with a warning message for 7.1.

Opinions?

Comment 9 Jan Friesse 2014-11-24 16:04:14 UTC
Cloning this bug to 7.2 to solve the token timeout overflow and to add functionality to refuse the reload.

Comment 13 errata-xmlrpc 2015-03-05 08:27:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0365.html