Bug 855120
Summary: | css_sync: harmful handling of XML entities | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Jan Pokorný [poki] <jpokorny> |
Component: | ricci | Assignee: | Chris Feist <cfeist> |
Status: | CLOSED DUPLICATE | QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 6.3 | CC: | cluster-maint, fdinitto |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | 855112 | Environment: | |
Last Closed: | 2014-03-27 16:45:33 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 855112, 855121 | ||
Bug Blocks: |
Description
Jan Pokorný [poki]
2012-09-06 18:14:44 UTC
I proposed a (roughly tested) patch in (temporary) branch upstream: http://git.fedorahosted.org/cgit/conga.git/commit/?h=bz855120 I took a quick look at the solution and I'm worried about it's complexity and potential for bugs. Since the solution to this problem is just a simple find and replace can you just create a str_replace function that will return a string with the appropriate values replaced? Something like this: int str_replace(char **orig_str, char *find_str, char *replace_str) which returns the number of characters replaced and updates orig_str Then we don't make any changes to custom_getpass, but we add code to ricci_conn_handler_want_send_auth to do the search and replace. so we would have something like this passwd_len = custom_getpass(&passwd, &n, stdin); count = str_replace(&passwd, "&", "&"); count = str_replace(&passwd, ">", ">"); count = str_replace(&passwd, "<", "<"); count = str_replace(&passwd, "\"", """); count = str_replace(&passwd, "'", "'"); Also, one other note, you could take those 5 commands and stick them into a str_xml_escape function that we could potentially use in other parts of the code. Chris, I don't understand you. You are worried about error pronenesss and at the same time you ask for str_replace, which is not only a greater pain to implement properly in C (iterating through occurrences, resizing buffer, shifting the tail a few characters to the right as opposed to straight on-the-fly conversion) but more likely to contain a bug. Beside, it is at least 5 times less efficient. Anyway, since I find the issue having possible compatibility impact, I've decided to bump the ricci protocol 1.0 -> 1.1, meaning that components (in fact, only ricci, see further) using protocol 1.1 treat XML messages in the intercommunication properly w.r.t. XML entities. In practice, only ricci will be announcing protocol 1.1 (version ignored by the clients until now anyway, so it ricci with 1.1 will remain backward compatible for clients modulo solved corner cases). Clients keep using 1.0 in the requests (mainly for simplicity, as this would require really heavy refactoring, design apparently did not take protocol updates into account), but they can workaround buggy behavior of older ricci with 1.0 protocol when it is detected. This is exactly what I did with ccs_sync (see referred branch I've just refreshed with revisited patch) -- for ricci with 1.0, escape the password twice, not just once, as a countermeasure for its ill double unescape of attribute values (natively via libxml2 and explicitly in the code). I hope the core of the solution is acceptable, I can comb some places if any objections are raised. Jan, Unfortunately your solution is far more difficult to debug and figure out what's going on then simply adding a str_replace. (In fact your code is actually doing a string replace in custom_getpass). Also, why do we need to bump the ricci protocol? If we just escape out the proper characters the ricci protocol will authenticate just fine. Difficulties with debugging is a valid argument, debugging macros can be hell, but: - you can use, e.g., socat to look into what is being sent - you can use test_getpass to trace the code easily - if macro is a suspected culprit, you can always use gcc -E to generate macro-less code and use it instead I am quite confident about the code, I was able to find some issues prior to publishing the revisited patch (does not mean there is none left). --- The bump is mainly to mark non-misintepreting ricci server. Misinterpreting in two ways: - input: viz. "foo&amp;bar" plaintext authentication password case user: foo&amp;bar v XML towards ricci: foo&amp;amp;bar (ex: fixed ccs [bug 855117]) v XML value via xmllib2: foo&amp;bar v ricci extra handling: foo&bar (misinterpretation, cf. "user") - output: unfortunately, if client gets "foo&bar" plaintext attribute value, it cannot be quite sure if the original value was indeed "foo&bar" or "foo&bar" ricci's internal value: foo&bar foo&bar \ / ricci extra handling: foo&bar v client's internal value: foo&bar In both cases, the error (input case) or uncertainty (output case) can be nested through more levels (ricci - ricci-worker - modcluster). Because the password/input case is assuredly not nested and because the probability of using problematic characters in other XML stuff is even lower, my intention is only to get password/input case right (new client can authenticate against old ricci with *any* random password in an acceptable length; there is no guarantee that password generated by script will not be similar to that "foo&amp;bar"). So, if client sees protocol 1.0 in hello message from ricci, it can make assumption that the password should be escaped twice (instead of once, note that this family of bugs started as performing escaping not even once) to make this ricci understand it correctly. Indeed, updating ricci is warmly recommended but it may be more difficult then doing the same with clients. Jan, Is there an example where you need to escape the password text more than once? I did a quick test and as long I escape the password (once) it authenticates and everything works. I'm not sure I see what the advantage of bumping the protocol version is to fix the password not being escaped issue. Are you trying to fix other issues by bumping the version? Ok, I see where it's being escaped twice. Can we just fix ricci to not do the escaping twice? Is there anywhere else in the code where we depend on things being escaped twice? I'm looking at the invert_chars function in common/XML.cpp, it seems like it's only called after we get the value using xmlGetProp (which already does the escaping). Can we just not call invert_chars? Or if we need to just set up a special case for the password? > Can we just fix ricci to not do the escaping twice? > Can we just not call invert_chars Yes, this is the core fix and done within [bug 855121]. Ok, I don't see any reason to be adding another protocol for ricci. We should definitely *not* be doing this to fix a simple issue where the password isn't getting escaped properly. Adding another protocol adds all sorts of potential issues and would basically double the test matrix. All we should be doing is properly escaping the password being encapsulated with ccs_sync & potentially not double un-escaping it in ricci. This should be a fairly trivial fix in ccs_sync by implementing a string replace function and we can either remove the 2nd unescape in ricci or just do an if statement and remove the 2nd unescape only for the password prompt (to prevent other bugs from popping up). Doing any more than this is a huge risk to break other things inside of ricci. No, there is an misconception, it is not another protocol, it remains the same in terms of messages and everthing... except for declared version. In an ideal world, ricci and its clients would exchange identification such as the component versions and based on this, they could adapt to each other to workaround bugs like these. In practice, this is to be mimicked by artificial protocol version bump. Clients then can identify buggy ricci version [*] and do the following: - warn about possible configuration inconsistencies, suggesting the update of ricci+modcluster - at the very least, use a double-escape workaround for password value, which makes cases like "foo&amp;bar" safe [*] mind the inherent difference between software deployment use inside the cluster and the deployment outside, which includes clients of ricci; you cannot guarantee the same generation of fixed ricci and fixed clients [ccs_sync and ccs], whereas updating clients is supposed to be less invasive to the established environment. For completeness, test scenario to cover whole set of bugs: 0. in all cases, use "foo&amp;bar" as ricci's password, upon each successful authentication against ricci, do: rm -f /var/lib/ricci/certs/clients/* && service ricci restart 1. "1.0" ccs{,_sync} + luci vs. "1.0" ricci - reproducer of all [bug 855121], [bug 855117], [bug 855120] and [bug 855112] (modcluster has to be treated separately) - do: trigger authentication (ccs -h localhost --getconf, ccs_sync) - expected: <Timeout_reached_without_valid_XML_request/> after about two minutes when using, e.g., "foo&bar" (well-formedness breaking character) as a password, or authentication failure 2. "1.0" ccs{,_sync} + luci vs. "1.1" ricci - reproducer of [bug 855117], [bug 855120] only when using, e.g., "foo&bar" as a password, now the response (displayed by ccs in debugging mode) should be enriched with a note: <Timeout_reached_without_valid_XML_request note="Warning: current client may rarely fail to authenticate, update recommended (rhbz855117,rhbz855120)."/> 3. "1.1" ccs{,_sync} + luci vs. "1.0" ricci - client-side workarounding of buggy ricci, authentication using "foo&amp;bar" should proceed correctly, but a warning like this is issued to note the fact ricci may not treat XML correctly in rare cases: Warning: current ricci+modcluster may rarely lead to configuration inconsistencies, update recommended (rhbz855121). 4. "1.1" ccs{,_sync} + luci vs. "1.1" ricci - expected: works smoothly, no such warning ever needed Plus basic sanity checking (getting content of cluster.conf and its modification), ideally in all 4 cases (4*3) above. For modcluster ([bug 858386]), the bug-triggering string has to be crafted directly into cluster.conf and I am yet to state the example at that bug specifically. from [comment 5] > Also, why do we need to bump the ricci protocol? If we just escape out > the proper characters the ricci protocol will authenticate just fine. from [comment 7] > I'm not sure I see what the advantage of bumping the protocol version > is to fix the password not being escaped issue. Are you trying to fix > other issues by bumping the version? Recent discussion on #linux-cluster (2012-11-14) demonstrates clearly why it is harmful not to follow prepared API-update path -- a wise design decision, but unfortunately ignored later on, at least in the case at hand. 13:54 < frechdachs69> when doing 'ccs --getconf -h node1' I get 'Error: Unable to retrieve information from ricci (is modcluster installed?)'; modcluster is installed and modclusterd is running; what could be the reason for the error? 13:56 < frechdachs69> ricci is running, too 13:59 < jpokorny> frechdachs69: firewall on node1? (tcp, dport 11111) 14:00 < frechdachs69> jpokorny: no, I even tried 'ccs --getconf -h localhost' without success 14:00 < frechdachs69> jpokorny: and 'telnet localhost 11111' gives me a prompt 14:01 < jpokorny> frechdachs69: can you try ccs with "-d"? 14:02 < frechdachs69> jpokorny: good hint; I get 'function get_cluster_schema not in API 1.0' as the error message 14:02 < jpokorny> frechdachs69: also not sure now, but getconf action is not available at older versions 14:02 < jpokorny> frechdachs69: ^ yep 14:03 < jpokorny> frechdachs69: I mean, not supported by older ricci 14:04 < frechdachs69> jpokorny: okay, since I have at RHEL 6.1 ricci installed I'll check with the one from the 6.3 DVD 14:07 < frechdachs69> jpokorny: hmm ... the reason might be that I had updated ccs to a the 6.3 version before; maybe both (ccs/ricci) have to match 14:11 < jpokorny> frechdachs69: yes, unfortunately API versioning was not followed properly in this case :-/ This shows that when a new call (get_cluster_schema) was added into cluster API in connection to [bug 724978], the version should have been bumped as well to prevent such mismatch. This would require a simple API version checks (should have been here from beginning) in all protocol-facing components (luci, ricci/modcluster, ccs), but could provide explicit error messages. And it is this proper engineering continuity approach that asks for a version bump (of "protocol" in this case, which is orthogonal to versions of APIs) -- for this very reason the versioning was included in the protocol design. *** This bug has been marked as a duplicate of bug 855112 *** |