Bug 877381

Summary: ricci crashes upon the request when local cluster.conf file is too large
Product: Red Hat Enterprise Linux 6 Reporter: Jaroslav Kortus <jkortus>
Component: ricciAssignee: Chris Feist <cfeist>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: medium    
Version: 6.4CC: cluster-maint, jpokorny, rsteiger
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ricci-0.16.2-59.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 885830 (view as bug list) Environment:
Last Closed: 2013-02-21 10:47:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 881827, 885830    
Attachments:
Description Flags
ricci core
none
This should be a proper patch for ricci side instead of pthread_attr_setstacksize change as in [attachment 579845]. none

Description Jaroslav Kortus 2012-11-16 11:44:00 UTC
Description of problem:
ricci crashes if cluster.conf contains, for example:
<fencedevice agent="nonexistent_agent1" devices="<over 64k long string here>" name="agent_name1"/>

Maybe it's related to bug 815752 but specifying long enough string was enough for me to crash it.

Version-Release number of selected component (if applicable):
ricci-0.16.2-56.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. create cluster.conf with very long parameter (as described above)
2. ccs_sync -i
3. ricci crash
  
Actual results:
kernel: ricci[22072]: segfault at 7f6d6f628338 ip 0000000000416b20 sp 00007f6d6f628340 error 6 in ricci[400000+3d000]


Expected results:
no crash

Additional info:

Comment 1 Jaroslav Kortus 2012-11-16 11:49:56 UTC
short perl script to generate these long device-names:

$agent="nonexistent_agent";
$devices="x" x 65536;
$name="agent_name";

for (my $count=1;$count<=20;$count++) {
$fd=sprintf '<fencedevice agent="%s%s" devices="%s" name="%s%s"/>', $agent, $count, $devices, $name, $count;
print $fd."\n";
}

Comment 3 Jaroslav Kortus 2012-11-16 13:59:42 UTC
had to do:
echo 1 > /proc/sys/fs/suid_dumpable 
(see http://www.kernel.org/doc/man-pages/online/pages/man5/proc.5.html)
to be able to produce the core in standard environment.

Comment 4 Jaroslav Kortus 2012-11-16 14:00:08 UTC
Created attachment 646331 [details]
ricci core

Comment 5 Jan Pokorný [poki] 2012-11-16 14:51:43 UTC
Thanks, here we go:

#0  0x0000000000416b20 in _M_refdata
    (this=0x7ff088d4b550)
    at /usr/include/c++/4.4.6/bits/basic_string.h:215
#1  _S_construct
    (this=0x7ff088d4b550)
    at /usr/include/c++/4.4.6/bits/basic_string.tcc:158
#2  basic_string    
    (this=0x7ff088d4b550)
    at /usr/include/c++/4.4.6/bits/basic_string.tcc:222
#3  File::shred
    (this=0x7ff088d4b550)
    at File.cpp:201
Cannot access memory at address 0x7ff088c0ab58

Comment 7 Jan Pokorný [poki] 2012-11-19 16:03:17 UTC
re [comment 5]:

for "len" variable at File::shred: len = 2295641392

Jarda, do you have any core dump limits imposed on your system?
I am not sure, but the provided core seems incomplete.
Could you try to recreate the core with unlimited "maximum size of core
files created"?

Comment 8 Jaroslav Kortus 2012-11-19 16:49:54 UTC
it's been captured with core unlimited...

Comment 9 Jan Pokorný [poki] 2012-11-19 18:07:50 UTC
OK, I used mismatched version of ricci originally (there should be a big
fat warning for that in debugger or I am blind).

With a proper one:

#0  File::read
    (this=<value optimized out>)
    at File.cpp:144

#1  File::operator String const
    (this=<value optimized out>)
    at File.cpp:215

#2  readXML
    (filename=<value optimized out>)
    at XML.cpp:303

#3  clusterinfo()
    at Ricci.cpp:464

#4  Ricci::ricci_header
    (this=<value optimized out>, authed=true, full=true)
    at Ricci.cpp:82

#5  Ricci::hello
    (this=<value optimized out>, authed=<value optimized out>)
    at Ricci.cpp:103

#6  ClientInstance::run
    (this=0x136bec0)
    at ClientInstance.cpp:113

#7  start_thread
    (thread_obj=<value optimized out>)
    at Thread.cpp:31

#8  start_thread
    (arg=0x7ff088d4c700)
    at pthread_create.c:301

#9  ??()

#10 ??()

This makes this equivalent with [bug 815752 comment 8] backtrace.

Comment 10 Jan Pokorný [poki] 2012-11-19 19:10:13 UTC
I can see that while Jarda puts emphasis on parameter length (Jarda,
could you recheck with longish parameter, but with overall XML size
under 200 kB?), the real issue lays in what [bug bug 815752] describes
+ tries (naively) to fix.

In File::read's frame:

> (gdb) p len
> $1 = 1313179

In other words, Jarda's cluster.conf has more than 1.25 MB.
This corresponds directly to how this file was probably generated
described in [comment 1] (65600 * 20 / 1024.0 / 1024 = 1.251,
which is a bit less than a total size of 1313179 / 1024.0 / 1024 = 1.252
that includes surrounding mandatory tags).

Fix for [bug 815752] just naively increases the stack size per thread,
from 256kB to 1MB, which is still too few in Jarda's case.
BTW. "ulimit -s" in my RHEL 6.3 VM shows me that process can have
up to 10 MB stack size by default, which is equal to what pthread library
uses as the default minimum stack size per thread [*].

Anyway, it is a good practise not abuse stack for clearly heap-ish things
(big objects) as is the case with XML content.  See also [1].


[*]
# cat <<EOF | gcc -xc -lpthread -o test-stack -
> #include <pthread.h>
> #include <stdio.h>
> 
> int main(int argc, char *argv[])
> {
>    size_t stacksize;
>    pthread_attr_t attr;
>    pthread_attr_init(&attr);
>    pthread_attr_getstacksize (&attr, &stacksize);
>    printf("Default stack size = %d\n", stacksize);
>    
> }
> EOF
# ./test-stack
Default stack size = 10485760


[1] http://stackoverflow.com/a/8650249

Comment 11 Jan Pokorný [poki] 2012-11-19 20:43:11 UTC
Created attachment 648116 [details]
This should be a proper patch for ricci side instead of pthread_attr_setstacksize change as in [attachment 579845 [details]].

See also [bug 815752].

Comment 13 Jaroslav Kortus 2012-11-20 11:38:43 UTC
as as confirmed now, just single long 64k parameter does not crash ricci, so it must have been the overall size limit mentioned earlier.

Comment 20 errata-xmlrpc 2013-02-21 10:47:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0453.html