877381 – ricci crashes upon the request when local cluster.conf file is too large

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 877381 - ricci crashes upon the request when local cluster.conf file is too large

Summary: ricci crashes upon the request when local cluster.conf file is too large

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	ricci
Sub Component:
Version:	6.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Chris Feist
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	881827 885830
TreeView+	depends on / blocked

Reported:	2012-11-16 11:44 UTC by Jaroslav Kortus
Modified:	2013-08-23 13:18 UTC (History)
CC List:	3 users (show)
Fixed In Version:	ricci-0.16.2-59.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	885830 (view as bug list)
Environment:
Last Closed:	2013-02-21 10:47:56 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
ricci core (134.41 KB, application/x-xz) 2012-11-16 14:00 UTC, Jaroslav Kortus	no flags	Details
This should be a proper patch for ricci side instead of pthread_attr_setstacksize change as in [attachment 579845]. (709 bytes, patch) 2012-11-19 20:43 UTC, Jan Pokorný [poki]	no flags	Details \| Diff
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	847234	1	None	None	None	2021-01-20 06:05:38 UTC
Red Hat Bugzilla	965785	1	None	None	None	2021-01-20 06:05:38 UTC
Red Hat Product Errata	RHBA-2013:0453	0	normal	SHIPPED_LIVE	ricci bug fix and enhancement update	2013-02-20 21:07:44 UTC

Internal Links: 847234 965785

Description Jaroslav Kortus 2012-11-16 11:44:00 UTC

Description of problem:
ricci crashes if cluster.conf contains, for example:
<fencedevice agent="nonexistent_agent1" devices="<over 64k long string here>" name="agent_name1"/>

Maybe it's related to bug 815752 but specifying long enough string was enough for me to crash it.

Version-Release number of selected component (if applicable):
ricci-0.16.2-56.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. create cluster.conf with very long parameter (as described above)
2. ccs_sync -i
3. ricci crash
  
Actual results:
kernel: ricci[22072]: segfault at 7f6d6f628338 ip 0000000000416b20 sp 00007f6d6f628340 error 6 in ricci[400000+3d000]


Expected results:
no crash

Additional info:

Comment 1 Jaroslav Kortus 2012-11-16 11:49:56 UTC

short perl script to generate these long device-names:

$agent="nonexistent_agent";
$devices="x" x 65536;
$name="agent_name";

for (my $count=1;$count<=20;$count++) {
$fd=sprintf '<fencedevice agent="%s%s" devices="%s" name="%s%s"/>', $agent, $count, $devices, $name, $count;
print $fd."\n";
}

Comment 3 Jaroslav Kortus 2012-11-16 13:59:42 UTC

had to do:
echo 1 > /proc/sys/fs/suid_dumpable 
(see http://www.kernel.org/doc/man-pages/online/pages/man5/proc.5.html)
to be able to produce the core in standard environment.

Comment 4 Jaroslav Kortus 2012-11-16 14:00:08 UTC

Created attachment 646331 [details]
ricci core

Comment 5 Jan Pokorný [poki] 2012-11-16 14:51:43 UTC

Thanks, here we go:

#0  0x0000000000416b20 in _M_refdata
    (this=0x7ff088d4b550)
    at /usr/include/c++/4.4.6/bits/basic_string.h:215
#1  _S_construct
    (this=0x7ff088d4b550)
    at /usr/include/c++/4.4.6/bits/basic_string.tcc:158
#2  basic_string    
    (this=0x7ff088d4b550)
    at /usr/include/c++/4.4.6/bits/basic_string.tcc:222
#3  File::shred
    (this=0x7ff088d4b550)
    at File.cpp:201
Cannot access memory at address 0x7ff088c0ab58

Comment 7 Jan Pokorný [poki] 2012-11-19 16:03:17 UTC

re [comment 5]:

for "len" variable at File::shred: len = 2295641392

Jarda, do you have any core dump limits imposed on your system?
I am not sure, but the provided core seems incomplete.
Could you try to recreate the core with unlimited "maximum size of core
files created"?

Comment 8 Jaroslav Kortus 2012-11-19 16:49:54 UTC

it's been captured with core unlimited...

Comment 9 Jan Pokorný [poki] 2012-11-19 18:07:50 UTC

OK, I used mismatched version of ricci originally (there should be a big
fat warning for that in debugger or I am blind).

With a proper one:

#0  File::read
    (this=<value optimized out>)
    at File.cpp:144

#1  File::operator String const
    (this=<value optimized out>)
    at File.cpp:215

#2  readXML
    (filename=<value optimized out>)
    at XML.cpp:303

#3  clusterinfo()
    at Ricci.cpp:464

#4  Ricci::ricci_header
    (this=<value optimized out>, authed=true, full=true)
    at Ricci.cpp:82

#5  Ricci::hello
    (this=<value optimized out>, authed=<value optimized out>)
    at Ricci.cpp:103

#6  ClientInstance::run
    (this=0x136bec0)
    at ClientInstance.cpp:113

#7  start_thread
    (thread_obj=<value optimized out>)
    at Thread.cpp:31

#8  start_thread
    (arg=0x7ff088d4c700)
    at pthread_create.c:301

#9  ??()

#10 ??()

This makes this equivalent with [bug 815752 comment 8] backtrace.

Comment 10 Jan Pokorný [poki] 2012-11-19 19:10:13 UTC

I can see that while Jarda puts emphasis on parameter length (Jarda,
could you recheck with longish parameter, but with overall XML size
under 200 kB?), the real issue lays in what [bug bug 815752] describes
+ tries (naively) to fix.

In File::read's frame:

> (gdb) p len
> $1 = 1313179

In other words, Jarda's cluster.conf has more than 1.25 MB.
This corresponds directly to how this file was probably generated
described in [comment 1] (65600 * 20 / 1024.0 / 1024 = 1.251,
which is a bit less than a total size of 1313179 / 1024.0 / 1024 = 1.252
that includes surrounding mandatory tags).

Fix for [bug 815752] just naively increases the stack size per thread,
from 256kB to 1MB, which is still too few in Jarda's case.
BTW. "ulimit -s" in my RHEL 6.3 VM shows me that process can have
up to 10 MB stack size by default, which is equal to what pthread library
uses as the default minimum stack size per thread [*].

Anyway, it is a good practise not abuse stack for clearly heap-ish things
(big objects) as is the case with XML content.  See also [1].


[*]
# cat <<EOF | gcc -xc -lpthread -o test-stack -
> #include <pthread.h>
> #include <stdio.h>
> 
> int main(int argc, char *argv[])
> {
>    size_t stacksize;
>    pthread_attr_t attr;
>    pthread_attr_init(&attr);
>    pthread_attr_getstacksize (&attr, &stacksize);
>    printf("Default stack size = %d\n", stacksize);
>    
> }
> EOF
# ./test-stack
Default stack size = 10485760


[1] http://stackoverflow.com/a/8650249

Comment 11 Jan Pokorný [poki] 2012-11-19 20:43:11 UTC

Created attachment 648116 [details]
This should be a proper patch for ricci side instead of pthread_attr_setstacksize change as in [attachment 579845 [details]].

See also [bug 815752].

Comment 13 Jaroslav Kortus 2012-11-20 11:38:43 UTC

as as confirmed now, just single long 64k parameter does not crash ricci, so it must have been the overall size limit mentioned earlier.

Comment 20 errata-xmlrpc 2013-02-21 10:47:56 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0453.html

Note You need to log in before you can comment on or make changes to this bug.