Bug 877381
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | ricci crashes upon the request when local cluster.conf file is too large | | |
| Product | Red Hat Enterprise Linux 6 | Reporter | Jaroslav Kortus <jkortus> |
| Component | ricci | Assignee | Chris Feist <cfeist> |
| Status | CLOSED ERRATA | QA Contact | Cluster QE <mspqa-list> |
| Severity | high | Priority | medium |
| Version | 6.4 | CC | cluster-maint, jpokorny, rsteiger |
| Target Milestone | rc | Hardware | Unspecified |
| OS | Unspecified | Type | Bug |
| Fixed In Version | ricci-0.16.2-59.el6 | Doc Type | Bug Fix |
| Cloned To | 885830 (view as bug list) | Bug Blocks | 881827, 885830 |
| Last Closed | 2013-02-21 10:47:56 UTC | | |
Description
Jaroslav Kortus
2012-11-16 11:44:00 UTC
A short Perl script to generate these long device names:

```perl
$agent = "nonexistent_agent";
$devices = "x" x 65536;
$name = "agent_name";
for (my $count = 1; $count <= 20; $count++) {
    $fd = sprintf '<fencedevice agent="%s%s" devices="%s" name="%s%s"/>',
        $agent, $count, $devices, $name, $count;
    print $fd . "\n";
}
```

I had to do:

```
echo 1 > /proc/sys/fs/suid_dumpable
```

(see http://www.kernel.org/doc/man-pages/online/pages/man5/proc.5.html) to be able to produce the core in a standard environment.

Created attachment 646331 [details]
ricci core
Thanks, here we go:

```
#0  0x0000000000416b20 in _M_refdata (this=0x7ff088d4b550) at /usr/include/c++/4.4.6/bits/basic_string.h:215
#1  _S_construct (this=0x7ff088d4b550) at /usr/include/c++/4.4.6/bits/basic_string.tcc:158
#2  basic_string (this=0x7ff088d4b550) at /usr/include/c++/4.4.6/bits/basic_string.tcc:222
#3  File::shred (this=0x7ff088d4b550) at File.cpp:201
Cannot access memory at address 0x7ff088c0ab58
```

Re [comment 5]: the "len" variable at File::shred is len = 2295641392.

Jarda, do you have any core dump limits imposed on your system? I am not sure, but the provided core seems incomplete. Could you try to recreate the core with an unlimited "maximum size of core files created"?

It's been captured with core unlimited...

OK, I originally used a mismatched version of ricci (there should be a big fat warning for that in the debugger, or I am blind). With the proper one:

```
#0  File::read (this=<value optimized out>) at File.cpp:144
#1  File::operator String const (this=<value optimized out>) at File.cpp:215
#2  readXML (filename=<value optimized out>) at XML.cpp:303
#3  clusterinfo() at Ricci.cpp:464
#4  Ricci::ricci_header (this=<value optimized out>, authed=true, full=true) at Ricci.cpp:82
#5  Ricci::hello (this=<value optimized out>, authed=<value optimized out>) at Ricci.cpp:103
#6  ClientInstance::run (this=0x136bec0) at ClientInstance.cpp:113
#7  start_thread (thread_obj=<value optimized out>) at Thread.cpp:31
#8  start_thread (arg=0x7ff088d4c700) at pthread_create.c:301
#9  ?? ()
#10 ?? ()
```

This makes this equivalent to the [bug 815752 comment 8] backtrace. I can see that while Jarda puts emphasis on parameter length (Jarda, could you recheck with a longish parameter, but with the overall XML size under 200 kB?), the real issue lies in what [bug 815752] describes and (naively) tries to fix. In File::read's frame:

> (gdb) p len
> $1 = 1313179

In other words, Jarda's cluster.conf has more than 1.25 MB.
This corresponds directly to how this file was probably generated, as described in [comment 1] (65600 * 20 / 1024.0 / 1024 = 1.251 MB, which is a bit less than the total size of 1313179 / 1024.0 / 1024 = 1.252 MB that includes the surrounding mandatory tags).

The fix for [bug 815752] just naively increases the stack size per thread, from 256 kB to 1 MB, which is still too little in Jarda's case. BTW, "ulimit -s" in my RHEL 6.3 VM shows that a process can have up to 10 MB of stack size by default, which is equal to what the pthread library uses as the default stack size per thread [*]. Anyway, it is good practice not to abuse the stack for clearly heap-ish things (big objects), as is the case with the XML content. See also [1].

[*]

```
# cat <<EOF | gcc -xc -lpthread -o test-stack -
> #include <pthread.h>
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
>     size_t stacksize;
>     pthread_attr_t attr;
>     pthread_attr_init(&attr);
>     pthread_attr_getstacksize(&attr, &stacksize);
>     printf("Default stack size = %d\n", stacksize);
> }
> EOF
# ./test-stack
Default stack size = 10485760
```

[1] http://stackoverflow.com/a/8650249

Created attachment 648116 [details]

This should be a proper patch for the ricci side, instead of the pthread_attr_setstacksize change as in [attachment 579845 [details]]. See also [bug 815752].

As confirmed now, a single long 64k parameter does not crash ricci, so it must have been the overall size limit mentioned earlier.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0453.html
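The "proper patch" approach mentioned above is, in essence, to read the XML file into heap-backed storage rather than onto the thread stack. A minimal sketch of that shape (assumed for illustration only; not the actual File.cpp change from attachment 648116, and the function name is hypothetical):

```cpp
// Sketch: read an arbitrarily large file into a heap-allocated
// std::string, so the maximum file size is bounded by available memory
// rather than by the per-thread stack size. Illustrative names only.
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

std::string read_file_to_heap(const std::string &path) {
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    std::ostringstream buf;
    buf << in.rdbuf();      // contents accumulate in heap memory
    return buf.str();       // a 1.25 MB cluster.conf fits without issue
}
```

With this shape, the XML parser receives the whole document regardless of its size, and the per-thread stack size no longer depends on the length of cluster.conf.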