Bug 853890
Summary: | ccs_sync aborts in free (ccs_sync: double free or corruption) | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Frantisek Reznicek <freznice> |
Component: | ricci | Assignee: | Chris Feist <cfeist> |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
Severity: | high | Docs Contact: | |
Priority: | medium | | |
Version: | 6.3 | CC: | cluster-maint, esammons, jpokorny, rsteiger, slevine |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | ccs-0.16.2-64.el6 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2013-11-21 21:52:25 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 883504, 960054 | | |
Description
Frantisek Reznicek
2012-09-03 09:07:42 UTC
Workaround using cman_tool works well (cman_tool version -r)

Confirming, thanks for trying this corner case. I proposed a (roughly tested) patch in (temporary) branch upstream: http://git.fedorahosted.org/cgit/conga.git/log/?h=bz853890

I haven't had a lot of time to look at the code, but I did a quick test and wouldn't it be simpler just to pull the cmdline_nodes out of the if/else like this?

    --- a/ricci/ccs_sync/ricci_conf.c
    +++ b/ricci/ccs_sync/ricci_conf.c
    @@ -68,6 +68,7 @@ int main(int argc, const char const * const *argv) {
     hash_t *dest_nodes;
     int num_nodes = 0;
     int increment_version = 0;
    + hash_t cmdline_nodes;
     ret = hash_init(&node_hash, 5, string_compare, hash_string, hash_dtor);
     if (ret == -1) {
    @@ -181,7 +182,6 @@ int main(int argc, const char const * const *argv) {
     /* Send to all current member nodes */
     dest_nodes = &node_hash;
     } else {
    - hash_t cmdline_nodes;
     ret = hash_init(&cmdline_nodes, 5, string_compare, hash_string, hash_dtor);
     if (ret == -1) {

That seems to work and only moves one line of code. jpokorny, do you see an issue with this code?

[re comment 6] It works, but duct taping (naïve solution) is probably what we want to avoid as it may bite in the long term, isn't it? These are things not covered by your patch:
- dead code ("if (num_nodes <= 0)" condition)
- memory leak (either if nodes used selectively or all-in-cluster.conf)
- missing hash_destroy
- (this is a bit more subjective) bad condition for "Unable to find cluster nodes in ..." message

As you can see, my version is even shorter, even though I added comments to support comprehension. Purely my opinion, I would definitely be happier with the patch I proposed; however, your version will (probably) do the job from a purely blackbox perspective, too.

Then let's go with my solution. The patch moves one line of code so it's unlikely to affect any other currently expected behavior with ccs.
If we were planning on using ccs_sync in future releases it would probably be worth fixing it, but for now, just making a one line change that fixes the problem is the way to go.

Fixed Here: https://git.fedorahosted.org/cgit/conga.git/commit/?h=RHEL6&id=13fc0f7cde80ccf34486814592203cd00ea59fb0

Before fix:

    [root@ask-02 ~]# rpm -q ricci
    ricci-0.16.2-63.el6.x86_64
    [root@ask-02 ~]# ccs -f bztest --createcluster bztest -i
    [root@ask-02 ~]# ccs -f bztest --addnode n1
    Node n1 added.
    [root@ask-02 ~]# ccs -f bztest --addnode n2
    Node n2 added.
    [root@ask-02 ~]# ccs_sync -f bztest n1 n2
    *** glibc detected *** ccs_sync: double free or corruption (top): 0x0000000001ef97a0 ***
    ======= Backtrace: =========
    /lib64/libc.so.6[0x3914475366]
    /lib64/libc.so.6[0x3914477e93]
    ccs_sync[0x4024b8]
    ....

After fix: (make sure n1/n2 are nodes that exist in dns but aren't running ricci)

    [root@ask-02 t2]# rpm -q ricci
    ricci-0.16.2-64.el6.x86_64
    [root@ask-02 t2]# ccs -f bztest --createcluster bztest -i
    [root@ask-02 t2]# ccs -f bztest --addnode n1
    Node n1 added.
    [root@ask-02 t2]# ccs -f bztest --addnode n2
    Node n2 added.
    [root@ask-02 t2]# ccs_sync -f bztest n1 n2
    Failed to connect to n2: Connection refused.
    Failed to connect to n1: Connection refused.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1673.html
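For readers unfamiliar with why moving one declaration matters: in the patch discussed in this report, dest_nodes is a pointer that can end up referring to cmdline_nodes, and cmdline_nodes was declared inside the else block. In C, an automatic variable's lifetime ends when its enclosing block ends, so any later use through that pointer is undefined behavior, which is presumably what surfaced here as the glibc abort. The sketch below shows the corrected shape only. It is not the ccs_sync source: struct node_set, node_set_init, node_set_add and count_destinations are hypothetical stand-ins for hash_t and the code in ricci_conf.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the hash_t used in ricci_conf.c (assumption). */
struct node_set {
    const char *names[8];
    size_t count;
};

void node_set_init(struct node_set *s)
{
    s->count = 0;
}

/* Returns 0 on success, -1 if the set is full. */
int node_set_add(struct node_set *s, const char *name)
{
    if (s->count >= sizeof s->names / sizeof s->names[0])
        return -1;
    s->names[s->count++] = name;
    return 0;
}

/*
 * Picks the destination set the way the patched main() does:
 * all known nodes when none are named, otherwise only the named ones.
 * Returns how many destinations were selected.
 */
size_t count_destinations(const struct node_set *all_nodes,
                          int argc, const char *const *argv)
{
    const struct node_set *dest;
    /* Declared at function scope, so it outlives the if/else below. */
    struct node_set cmdline_nodes;

    assert(all_nodes != NULL);

    if (argc == 0) {
        dest = all_nodes;
    } else {
        /* Declaring cmdline_nodes here instead would end its lifetime at
         * the closing brace and leave dest dangling afterwards. */
        node_set_init(&cmdline_nodes);
        for (int i = 0; i < argc; i++) {
            if (node_set_add(&cmdline_nodes, argv[i]) != 0)
                break;
        }
        dest = &cmdline_nodes;
    }

    /* dest is dereferenced after the if/else, as dest_nodes is in main(). */
    return dest->count;
}
```

Calling count_destinations with argc equal to 0 selects every node in all_nodes, while passing names selects only those. The other points raised in the review (the leak and the missing hash_destroy) are not modeled in this sketch.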