Bug 877381 - ricci crashes upon the request when local cluster.conf file is too large
Summary: ricci crashes upon the request when local cluster.conf file is too large
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: ricci
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
: ---
Assignee: Chris Feist
QA Contact: Cluster QE
Depends On:
Blocks: 881827 885830
TreeView+ depends on / blocked
Reported: 2012-11-16 11:44 UTC by Jaroslav Kortus
Modified: 2013-08-23 13:18 UTC (History)
3 users (show)

Fixed In Version: ricci-0.16.2-59.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 885830 (view as bug list)
Last Closed: 2013-02-21 10:47:56 UTC
Target Upstream Version:

Attachments (Terms of Use)
ricci core (134.41 KB, application/x-xz)
2012-11-16 14:00 UTC, Jaroslav Kortus
no flags Details
This should be a proper patch for ricci side instead of pthread_attr_setstacksize change as in [attachment 579845]. (709 bytes, patch)
2012-11-19 20:43 UTC, Jan Pokorný [poki]
no flags Details | Diff

System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 847234 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 965785 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Product Errata RHBA-2013:0453 0 normal SHIPPED_LIVE ricci bug fix and enhancement update 2013-02-20 21:07:44 UTC

Internal Links: 847234 965785

Description Jaroslav Kortus 2012-11-16 11:44:00 UTC
Description of problem:
ricci crashes if cluster.conf contains, for example:
<fencedevice agent="nonexistent_agent1" devices="<over 64k long string here>" name="agent_name1"/>

Maybe it's related to bug 815752 but specifying long enough string was enough for me to crash it.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. create cluster.conf with very long parameter (as described above)
2. ccs_sync -i
3. ricci crash
Actual results:
kernel: ricci[22072]: segfault at 7f6d6f628338 ip 0000000000416b20 sp 00007f6d6f628340 error 6 in ricci[400000+3d000]

Expected results:
no crash

Additional info:

Comment 1 Jaroslav Kortus 2012-11-16 11:49:56 UTC
short perl script to generate these long device-names:

$devices="x" x 65536;

for (my $count=1;$count<=20;$count++) {
$fd=sprintf '<fencedevice agent="%s%s" devices="%s" name="%s%s"/>', $agent, $count, $devices, $name, $count;
print $fd."\n";

Comment 3 Jaroslav Kortus 2012-11-16 13:59:42 UTC
had to do:
echo 1 > /proc/sys/fs/suid_dumpable 
(see http://www.kernel.org/doc/man-pages/online/pages/man5/proc.5.html)
to be able to produce the core in standard environment.

Comment 4 Jaroslav Kortus 2012-11-16 14:00:08 UTC
Created attachment 646331 [details]
ricci core

Comment 5 Jan Pokorný [poki] 2012-11-16 14:51:43 UTC
Thanks, here we go:

#0  0x0000000000416b20 in _M_refdata
    at /usr/include/c++/4.4.6/bits/basic_string.h:215
#1  _S_construct
    at /usr/include/c++/4.4.6/bits/basic_string.tcc:158
#2  basic_string    
    at /usr/include/c++/4.4.6/bits/basic_string.tcc:222
#3  File::shred
    at File.cpp:201
Cannot access memory at address 0x7ff088c0ab58

Comment 7 Jan Pokorný [poki] 2012-11-19 16:03:17 UTC
re [comment 5]:

for "len" variable at File::shred: len = 2295641392

Jarda, do you have any core dump limits imposed on your system?
I am not sure, but the provided core seems incomplete.
Could you try to recreate the core with unlimited "maximum size of core
files created"?

Comment 8 Jaroslav Kortus 2012-11-19 16:49:54 UTC
it's been captured with core unlimited...

Comment 9 Jan Pokorný [poki] 2012-11-19 18:07:50 UTC
OK, I used mismatched version of ricci originally (there should be a big
fat warning for that in debugger or I am blind).

With a proper one:

#0  File::read
    (this=<value optimized out>)
    at File.cpp:144

#1  File::operator String const
    (this=<value optimized out>)
    at File.cpp:215

#2  readXML
    (filename=<value optimized out>)
    at XML.cpp:303

#3  clusterinfo()
    at Ricci.cpp:464

#4  Ricci::ricci_header
    (this=<value optimized out>, authed=true, full=true)
    at Ricci.cpp:82

#5  Ricci::hello
    (this=<value optimized out>, authed=<value optimized out>)
    at Ricci.cpp:103

#6  ClientInstance::run
    at ClientInstance.cpp:113

#7  start_thread
    (thread_obj=<value optimized out>)
    at Thread.cpp:31

#8  start_thread
    at pthread_create.c:301

#9  ??()

#10 ??()

This makes this equivalent with [bug 815752 comment 8] backtrace.

Comment 10 Jan Pokorný [poki] 2012-11-19 19:10:13 UTC
I can see that while Jarda puts emphasis on parameter length (Jarda,
could you recheck with longish parameter, but with overall XML size
under 200 kB?), the real issue lays in what [bug bug 815752] describes
+ tries (naively) to fix.

In File::read's frame:

> (gdb) p len
> $1 = 1313179

In other words, Jarda's cluster.conf has more than 1.25 MB.
This corresponds directly to how this file was probably generated
described in [comment 1] (65600 * 20 / 1024.0 / 1024 = 1.251,
which is a bit less than a total size of 1313179 / 1024.0 / 1024 = 1.252
that includes surrounding mandatory tags).

Fix for [bug 815752] just naively increases the stack size per thread,
from 256kB to 1MB, which is still too few in Jarda's case.
BTW. "ulimit -s" in my RHEL 6.3 VM shows me that process can have
up to 10 MB stack size by default, which is equal to what pthread library
uses as the default minimum stack size per thread [*].

Anyway, it is a good practise not abuse stack for clearly heap-ish things
(big objects) as is the case with XML content.  See also [1].

# cat <<EOF | gcc -xc -lpthread -o test-stack -
> #include <pthread.h>
> #include <stdio.h>
> int main(int argc, char *argv[])
> {
>    size_t stacksize;
>    pthread_attr_t attr;
>    pthread_attr_init(&attr);
>    pthread_attr_getstacksize (&attr, &stacksize);
>    printf("Default stack size = %d\n", stacksize);
> }
# ./test-stack
Default stack size = 10485760

[1] http://stackoverflow.com/a/8650249

Comment 11 Jan Pokorný [poki] 2012-11-19 20:43:11 UTC
Created attachment 648116 [details]
This should be a proper patch for ricci side instead of pthread_attr_setstacksize change as in [attachment 579845 [details]].

See also [bug 815752].

Comment 13 Jaroslav Kortus 2012-11-20 11:38:43 UTC
as as confirmed now, just single long 64k parameter does not crash ricci, so it must have been the overall size limit mentioned earlier.

Comment 20 errata-xmlrpc 2013-02-21 10:47:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.