Bug 855120

Summary: css_sync: harmful handling of XML entities
Product: Red Hat Enterprise Linux 6 Reporter: Jan Pokorný [poki] <jpokorny>
Component: ricciAssignee: Chris Feist <cfeist>
Status: CLOSED DUPLICATE QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.3CC: cluster-maint, fdinitto
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 855112 Environment:
Last Closed: 2014-03-27 16:45:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 855112, 855121    
Bug Blocks:    

Description Jan Pokorný [poki] 2012-09-06 18:14:44 UTC
+++ This bug was initially created as a clone of Bug #855112 +++

In connection to case discussed in the public cluster ML [1], we need
to make sure the user input that makes its way into XML (either as
cluster.conf or XML-formatted requests towards ricci) is XML-ready.
That is, the "unsafe" characters are encoded as XML entities.

Otherwise, such requests have no effect and unfortunately, neither
ricci provides a sensible diagnostics about this.  Example of such
input is the password for the initial authentization against ricci.

[1] https://www.redhat.com/archives/linux-cluster/2012-August/msg00135.html

Comment 1 Jan Pokorný [poki] 2012-09-14 17:01:44 UTC
I proposed a (roughly tested) patch in (temporary) branch upstream:
http://git.fedorahosted.org/cgit/conga.git/commit/?h=bz855120

Comment 2 Chris Feist 2012-09-17 21:58:49 UTC
I took a quick look at the solution and I'm worried about it's complexity and potential for bugs.  Since the solution to this problem is just a simple find and replace can you just create a str_replace function that will return a string with the appropriate values replaced?

Something like this:

int str_replace(char **orig_str, char *find_str, char *replace_str)

which returns the number of characters replaced and updates orig_str

Then we don't make any changes to custom_getpass, but we add code to ricci_conn_handler_want_send_auth to do the search and replace.

so we would have something like this

passwd_len = custom_getpass(&passwd, &n, stdin);
count = str_replace(&passwd, "&", "&amp;");
count = str_replace(&passwd, ">", "&gt;");
count = str_replace(&passwd, "<", "&lt;");
count = str_replace(&passwd, "\"", "&quot;");
count = str_replace(&passwd, "'", "&apos;");

Comment 3 Chris Feist 2012-09-17 22:00:20 UTC
Also, one other note, you could take those 5 commands and stick them into a str_xml_escape function that we could potentially use in other parts of the code.

Comment 4 Jan Pokorný [poki] 2012-09-25 19:42:01 UTC
Chris, I don't understand you.  You are worried about error pronenesss
and at the same time you ask for str_replace, which is not only a greater
pain to implement properly in C (iterating through occurrences, resizing
buffer, shifting the tail a few characters to the right as opposed to
straight on-the-fly conversion) but more likely to contain a bug.
Beside, it is at least 5 times less efficient.


Anyway, since I find the issue having possible compatibility impact, I've
decided to bump the ricci protocol 1.0 -> 1.1, meaning that components
(in fact, only ricci, see further) using protocol 1.1 treat XML messages
in the intercommunication properly w.r.t. XML entities.

In practice, only ricci will be announcing protocol 1.1 (version ignored
by the clients until now anyway, so it ricci with 1.1 will remain backward
compatible for clients modulo solved corner cases).  Clients keep using
1.0 in the requests (mainly for simplicity, as this would require really
heavy refactoring, design apparently did not take protocol updates into
account), but they can workaround buggy behavior of older ricci with
1.0 protocol when it is detected.  This is exactly what I did with
ccs_sync (see referred branch I've just refreshed with revisited patch)
-- for ricci with 1.0, escape the password twice, not just once, as
   a countermeasure for its ill double unescape of attribute values
   (natively via libxml2 and explicitly in the code).


I hope the core of the solution is acceptable, I can comb some places
if any objections are raised.

Comment 5 Chris Feist 2012-09-25 20:39:56 UTC
Jan,

Unfortunately your solution is far more difficult to debug and figure out what's going on then simply adding a str_replace.  (In fact your code is actually doing a string replace in custom_getpass).

Also, why do we need to bump the ricci protocol?  If we just escape out the proper characters the ricci protocol will authenticate just fine.

Comment 6 Jan Pokorný [poki] 2012-09-25 21:33:48 UTC
Difficulties with debugging is a valid argument, debugging macros can
be hell, but:
- you can use, e.g., socat to look into what is being sent
- you can use test_getpass to trace the code easily
- if macro is a suspected culprit, you can always use gcc -E
  to generate macro-less code and use it instead

I am quite confident about the code, I was able to find some issues
prior to publishing the revisited patch (does not mean there is none
left).

---

The bump is mainly to mark non-misintepreting ricci server.
Misinterpreting in two ways:

- input: viz. "foo&amp;amp;bar" plaintext authentication password case
    user:   		   foo&amp;amp;bar
                                  v
    XML towards ricci:	   foo&amp;amp;amp;bar (ex: fixed ccs [bug 855117])
                                  v
    XML value via xmllib2: foo&amp;amp;bar
                                  v
    ricci extra handling:  foo&amp;bar (misinterpretation, cf. "user")

- output: unfortunately, if client gets "foo&bar" plaintext attribute
          value, it cannot be quite sure if the original value was
          indeed "foo&bar" or "foo&amp;bar"

    ricci's internal value:  foo&bar     foo&amp;bar
                                   \       /
    ricci extra handling:         foo&amp;bar
                                       v
    client's internal value:        foo&bar

In both cases, the error (input case) or uncertainty (output case) can
be nested through more levels (ricci - ricci-worker - modcluster).
Because the password/input case is assuredly not nested and because
the probability of using problematic characters in other XML stuff
is even lower, my intention is only to get password/input case right
(new client can authenticate against old ricci with *any* random password
in an acceptable length; there is no guarantee that password generated
by script will not be similar to that "foo&amp;amp;bar").

So, if client sees protocol 1.0 in hello message from ricci, it can make
assumption that the password should be escaped twice (instead of once,
note that this family of bugs started as performing escaping not even
once) to make this ricci understand it correctly.  Indeed, updating ricci
is warmly recommended but it may be more difficult then doing the same
with clients.

Comment 7 Chris Feist 2012-09-26 18:55:28 UTC
Jan,

Is there an example where you need to escape the password text more than once?  I did a quick test and as long I escape the password (once) it authenticates and everything works.  I'm not sure I see what the advantage of bumping the protocol version is to fix the password not being escaped issue.  Are you trying to fix other issues by bumping the version?

Comment 8 Chris Feist 2012-09-26 19:41:28 UTC
Ok, I see where it's being escaped twice.  Can we just fix ricci to not do the escaping twice?  Is there anywhere else in the code where we depend on things being escaped twice?

I'm looking at the invert_chars function in common/XML.cpp, it seems like it's only called after we get the value using xmlGetProp (which already does the escaping).  Can we just not call invert_chars?  Or if we need to just set up a special case for the password?

Comment 9 Jan Pokorný [poki] 2012-09-26 21:40:20 UTC
> Can we just fix ricci to not do the escaping twice?

> Can we just not call invert_chars

Yes, this is the core fix and done within [bug 855121].

Comment 10 Chris Feist 2012-09-26 21:49:47 UTC
Ok, I don't see any reason to be adding another protocol for ricci.  We should definitely *not* be doing this to fix a simple issue where the password isn't getting escaped properly.  Adding another protocol adds all sorts of potential issues and would basically double the test matrix.

All we should be doing is properly escaping the password being encapsulated with ccs_sync & potentially not double un-escaping it in ricci.

This should be a fairly trivial fix in ccs_sync by implementing a string replace function and we can either remove the 2nd unescape in ricci or just do an if statement and remove the 2nd unescape only for the password prompt (to prevent other bugs from popping up).  Doing any more than this is a huge risk to break other things inside of ricci.

Comment 11 Jan Pokorný [poki] 2012-09-27 12:48:45 UTC
No, there is an misconception, it is not another protocol, it remains
the same in terms of messages and everthing... except for declared
version.

In an ideal world, ricci and its clients would exchange identification
such as the component versions and based on this, they could adapt
to each other to workaround bugs like these.  In practice, this is to
be mimicked by artificial protocol version bump.  Clients then can
identify buggy ricci version [*] and do the following:

- warn about possible configuration inconsistencies, suggesting the
  update of ricci+modcluster

- at the very least, use a double-escape workaround for password
  value, which makes cases like "foo&amp;amp;bar" safe

[*] mind the inherent difference between software deployment use inside
the cluster and the deployment outside, which includes clients of ricci;
you cannot guarantee the same generation of fixed ricci and fixed clients
[ccs_sync and ccs], whereas updating clients is supposed to be less
invasive to the established environment.


For completeness, test scenario to cover whole set of bugs:

0. in all cases, use "foo&amp;amp;bar" as ricci's password,
   upon each successful authentication against ricci, do:
   rm -f /var/lib/ricci/certs/clients/* && service ricci restart

1. "1.0" ccs{,_sync} + luci vs. "1.0" ricci
   - reproducer of all [bug 855121], [bug 855117], [bug 855120]
     and [bug 855112] (modcluster has to be treated separately)
   - do: trigger authentication (ccs -h localhost --getconf, ccs_sync)
   - expected: <Timeout_reached_without_valid_XML_request/> after about
               two minutes when using, e.g., "foo&bar" (well-formedness
               breaking character) as a password, or authentication
               failure

2. "1.0" ccs{,_sync} + luci vs. "1.1" ricci
   - reproducer of [bug 855117], [bug 855120] only when using, e.g.,
     "foo&bar" as a password, now the response (displayed by ccs
     in debugging mode) should be enriched with a note:
     <Timeout_reached_without_valid_XML_request
       note="Warning: current client may rarely fail to authenticate,
             update recommended (rhbz855117,rhbz855120)."/>

3. "1.1" ccs{,_sync} + luci vs. "1.0" ricci
   - client-side workarounding of buggy ricci, authentication using
     "foo&amp;amp;bar" should proceed correctly, but a warning like
     this is issued to note the fact ricci may not treat XML correctly
     in rare cases:
     Warning: current ricci+modcluster may rarely lead to configuration
              inconsistencies, update recommended (rhbz855121).

4. "1.1" ccs{,_sync} + luci vs. "1.1" ricci
   - expected: works smoothly, no such warning ever needed

Plus basic sanity checking (getting content of cluster.conf and its
modification), ideally in all 4 cases (4*3) above.

For modcluster ([bug 858386]), the bug-triggering string has to be
crafted directly into cluster.conf and I am yet to state the example
at that bug specifically.

Comment 17 Jan Pokorný [poki] 2012-11-14 14:41:19 UTC
from [comment 5]

> Also, why do we need to bump the ricci protocol?  If we just escape out
> the proper characters the ricci protocol will authenticate just fine.

from [comment 7]

> I'm not sure I see what the advantage of bumping the protocol version
> is to fix the password not being escaped issue.  Are you trying to fix
> other issues by bumping the version?

Recent discussion on #linux-cluster (2012-11-14) demonstrates clearly why
it is harmful not to follow prepared API-update path -- a wise design
decision, but unfortunately ignored later on, at least in the case at
hand.

13:54 < frechdachs69>
 when doing 'ccs --getconf -h node1' I get 'Error: Unable to retrieve
 information from ricci (is modcluster installed?)'; modcluster is installed
 and modclusterd is  running; what could be the reason for the error?
13:56 < frechdachs69>
 ricci is running, too
13:59 < jpokorny>
 frechdachs69: firewall on node1? (tcp, dport 11111)
14:00 < frechdachs69>
 jpokorny: no, I even tried 'ccs --getconf -h localhost' without success
14:00 < frechdachs69>
 jpokorny: and 'telnet localhost 11111' gives me a prompt
14:01 < jpokorny>
 frechdachs69: can you try ccs with "-d"?
14:02 < frechdachs69>
 jpokorny: good hint; I get 'function get_cluster_schema not in API 1.0'
 as the error message
14:02 < jpokorny>
 frechdachs69: also not sure now, but getconf action is not available at
 older versions
14:02 < jpokorny>
 frechdachs69: ^ yep
14:03 < jpokorny>
 frechdachs69: I mean, not supported by older ricci
14:04 < frechdachs69>
 jpokorny: okay, since I have at RHEL 6.1 ricci installed I'll check with
 the one from the 6.3 DVD
14:07 < frechdachs69>
 jpokorny: hmm ... the reason might be that I had updated ccs to a the 6.3
 version before; maybe both (ccs/ricci) have to match
14:11 < jpokorny>
 frechdachs69: yes, unfortunately API versioning was not followed properly
 in this case :-/

This shows that when a new call (get_cluster_schema) was added into cluster
API in connection to [bug 724978], the version should have been bumped as
well to prevent such mismatch.  This would require a simple API version
checks (should have been here from beginning) in all protocol-facing
components (luci, ricci/modcluster, ccs), but could provide explicit error
messages.

And it is this proper engineering continuity approach that asks for
a version bump (of "protocol" in this case, which is orthogonal to versions
of APIs) -- for this very reason the versioning was included in
the protocol design.

Comment 21 Fabio Massimo Di Nitto 2014-03-27 16:45:33 UTC

*** This bug has been marked as a duplicate of bug 855112 ***