Bug 906610
| Field | Value |
| --- | --- |
| Summary | Cluster lockups caused by xmlNodeDump() being too slow |
| Product | Red Hat Enterprise Linux 6 |
| Component | libxml2 |
| Version | 6.4 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED WONTFIX |
| Severity | high |
| Priority | unspecified |
| Target Milestone | rc |
| Target Release | --- |
| Reporter | Andrew Beekhof <abeekhof> |
| Assignee | Daniel Veillard <veillard> |
| QA Contact | qe-baseos-tools-bugs |
| CC | fdinitto, riel |
| Type | Bug |
| Last Closed | 2017-12-06 12:49:31 UTC |
Description
Andrew Beekhof
2013-02-01 02:24:40 UTC
CC'ing David who has also been working on this.

Hum, on Fedora 17 with git head of libxml2, running `while true ; do time ./tst ; done` with the data generated as suggested, I get:

real 0m0.550s  user 0m0.491s  sys 0m0.051s
real 0m0.552s  user 0m0.500s  sys 0m0.047s
real 0m0.555s  user 0m0.497s  sys 0m0.050s
real 0m0.556s  user 0m0.494s  sys 0m0.057s

i.e. fairly consistent output ...

thinkpad:~/XML -> ls -l journal.xml
-rw-rw-r--. 1 veillard veillard 9390736 Feb 6 11:48 journal.xml
thinkpad:~/XML ->

On an older machine running 6.3 and a local build of libxml2-2.7.6-11.el6.x86_64, I get:

[root@server tmp]# time ./tst
real 0m8.758s  user 0m5.693s  sys 0m3.045s
[root@server tmp]# time ./tst
real 0m8.780s  user 0m5.697s  sys 0m3.063s
[root@server tmp]# time ./tst
real 0m9.028s  user 0m5.592s  sys 0m3.413s
[root@server tmp]# time ./tst
real 0m9.007s  user 0m5.682s  sys 0m3.303s
[root@server tmp]# time ./tst
real 0m8.793s  user 0m5.580s  sys 0m3.194s
[root@server tmp]#

That doesn't seem to deviate too much, but it is indeed quite slow ... My bet would be that we are realloc()'ing all the time while growing the in-memory output buffer. However, I think the real problem lies in the lack of reproducibility, i.e. that serializing a 20k buffer, which should be instantaneous, sometimes takes 4s instead. I would suspect that at some point memory becomes short on one of the realloc() calls, leading to the need to free memory somewhere, and then all bets are off. Can you confirm whether your application is multithreaded? I guess that won't help in this case. I can try to chase the buffer memory allocation to see whether the problem is exacerbated by the fact that we are constantly reallocating.

If I strace tst on the RHEL box I see a flow of:

mmap(NULL, 9367552, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc3cf45d000
munmap(0x7fc3cfd4c000, 9367552) = 0
mmap(NULL, 9371648, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc3d0639000
munmap(0x7fc3cf45d000, 9367552) = 0
mmap(NULL, 9375744, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc3cfd48000
munmap(0x7fc3d0639000, 9371648) = 0
mmap(NULL, 9379840, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc3cf456000
munmap(0x7fc3cfd48000, 9375744) = 0
mmap(NULL, 9383936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc3d0636000
munmap(0x7fc3cf456000, 9379840) = 0
mmap(NULL, 9388032, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc3cfd42000
munmap(0x7fc3d0636000, 9383936) = 0
mmap(NULL, 9392128, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc3cf44d000
munmap(0x7fc3cfd42000, 9388032) = 0

while on the F17 box I see:

read(3, "", 5488) = 0
brk(0) = 0x75b1000
brk(0x75d2000) = 0x75d2000
close(3) = 0        -> end of the read of the input
mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f008ef58000
mremap(0x7f008ef58000, 135168, 266240, MREMAP_MAYMOVE) = 0x7f008ef09000
mremap(0x7f008ef09000, 266240, 528384, MREMAP_MAYMOVE) = 0x7f008ee88000
mremap(0x7f008ee88000, 528384, 1052672, MREMAP_MAYMOVE) = 0x7f008ed87000
mremap(0x7f008ed87000, 1052672, 2101248, MREMAP_MAYMOVE) = 0x7f008eb86000
mremap(0x7f008eb86000, 2101248, 4198400, MREMAP_MAYMOVE) = 0x7f008e785000
mremap(0x7f008e785000, 4198400, 8392704, MREMAP_MAYMOVE) = 0x7f008df84000
mremap(0x7f008df84000, 8392704, 16781312, MREMAP_MAYMOVE) = 0x7f008cf83000
exit_group(0) = ?
+++ exited with 0 +++
thinkpad:~/XML ->

So it seems we only remap 7 times thanks to a doubling allocation strategy.
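For context on the difference: the tail of the RHEL strace shows each mmap() 4096 bytes larger than the previous one, so a ~9.4 MB document implies on the order of a couple of thousand reallocations, each copying the whole buffer, versus the 7 mremap() calls of the doubling strategy on F17. The tst program itself is not reproduced in this comment; a minimal sketch of what such a reproducer presumably looks like (the input file name, the error handling and the plain xmlBufferCreate() call here are assumptions) is:

```c
/* Hypothetical reconstruction of a tst-style reproducer: parse a large
 * document and serialize it with xmlNodeDump() into a default buffer.
 * Built as in the transcript later in this bug:
 *   gcc -o tst -I/usr/include/libxml2 tst.c -lxml2 */
#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

int main(void)
{
    xmlDocPtr doc = xmlReadFile("journal.xml", NULL, 0);  /* ~9.4 MB input */
    if (doc == NULL) {
        fprintf(stderr, "failed to parse journal.xml\n");
        return 1;
    }

    xmlBufferPtr buf = xmlBufferCreate();
    /* With the default exact-growth scheme on RHEL 6, this call is where the
     * long mmap/munmap flow above comes from: one realloc per growth step. */
    int len = xmlNodeDump(buf, doc, xmlDocGetRootElement(doc), 0, 0);
    printf("dumped %d bytes\n", len);

    xmlBufferFree(buf);
    xmlFreeDoc(doc);
    xmlCleanupParser();
    return 0;
}
```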
If I just change the buffer allocation strategy by adding xmlBufferSetAllocationScheme(buf, XML_BUFFER_ALLOC_DOUBLEIT) after the buf = xmlBufferCreate(); call, then I'm back to something normal:

[root@server tmp]# time ./tst
real 0m0.459s  user 0m0.370s  sys 0m0.088s
[root@server tmp]# time ./tst
real 0m0.463s  user 0m0.375s  sys 0m0.086s
[root@server tmp]# time ./tst
real 0m0.463s  user 0m0.382s  sys 0m0.080s
[root@server tmp]# time ./tst
real 0m0.459s  user 0m0.378s  sys 0m0.080s
[root@server tmp]#

So while there is something a bit wrong in libxml2's default allocation, the test case allocates the buffer itself, so libxml2 has no hint about what it might be used for. The immediate solution on RHEL-6 might simply be to make sure you set the buffer allocation strategy before calling xmlNodeDump().

Still, to me this pinpoints that realloc() can suddenly and randomly take something like 4s when it should return instantly (a realloc to 20k!!!), and that's a serious flaw in our memory allocator (again, I'm guessing multithreading doesn't help). So we may fix most occurrences of the problem with the change of allocation strategy, but the bug still lies somewhere, and it could still happen on the one or two reallocs that may be needed to grow the buffer to 20k.

Another thing is that you can give an initial memory size to the buffer, if you have an idea of the final size, by using:

xmlBufferPtr xmlBufferCreateSize(size_t size);

Daniel

Agreed that the reproducibility is a PITA. I was hoping that figuring out why this guy's test took too long might fix ours "for free". It does at least appear to have given us a lead to investigate :)

The application isn't multithreaded. I can try using XML_BUFFER_ALLOC_DOUBLEIT to see if that helps. That's certainly one thing my function does differently (mine grows the buffer by 4k each time) and might help explain the difference.

I also have the program print out the XML being dumped when it takes too long. IIRC, when I last did that and ran it through something like the original test, it was instantaneous :( I guess this means that memory allocation is almost certainly the core issue here.

As another data point, I took a typically large XML file from a cluster here and put it through a few thousand read/dump cycles, hoping that one might take too long; none did.

We can't fix the random realloc() problem (at least not here), but we can try to avoid it. I really suggest you change the code to allocate upfront a buffer you know might be sufficient, and to change the allocation strategy to limit the realloc() count. With those two, hopefully you will avoid the issue!

Daniel
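Putting the two suggestions above together (a size hint via xmlBufferCreateSize() plus the DOUBLEIT scheme), the caller-side workaround would look roughly like the sketch below. This is only an illustration, not the actual Pacemaker code; the 256 KB hint is an assumption picked to be near the ~280 KB dumps logged in the next comment.

```c
/* Sketch of the suggested caller-side workaround: give the buffer a size
 * hint up front and switch it to the doubling growth strategy before
 * xmlNodeDump() runs. The 256 KB hint is an assumption, not a value taken
 * from the real application. */
#include <stdio.h>
#include <libxml/tree.h>

int dump_node(xmlDocPtr doc, xmlNodePtr node, FILE *out)
{
    xmlBufferPtr buf = xmlBufferCreateSize(256 * 1024);   /* size hint */
    if (buf == NULL)
        return -1;

    /* Grow by doubling instead of the exact-size reallocs that produced
     * the mmap/munmap churn seen in the strace output above. */
    xmlBufferSetAllocationScheme(buf, XML_BUFFER_ALLOC_DOUBLEIT);

    int len = xmlNodeDump(buf, doc, node, 0, 0);
    if (len > 0)
        fwrite(xmlBufferContent(buf), 1, (size_t) len, out);

    xmlBufferFree(buf);
    return len;
}
```

Compared with the default exact-growth scheme, this bounds the number of realloc() calls to a handful for an output of a few hundred kilobytes.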
Here are the results with the default allocator; I'll try doubling next:

Feb 07 04:00:34 BadNews: Feb 7 03:50:04 east-03 pengine[15598]: error: dump_xml: xmlNodeDump() -> 284343bytes took 3s
Feb 07 04:00:34 BadNews: Feb 7 03:50:33 east-04 cib[24697]: error: dump_xml: xmlNodeDump() -> 286379bytes took 6s
Feb 07 04:00:34 BadNews: Feb 7 03:51:28 east-04 pengine[24702]: error: dump_xml: xmlNodeDump() -> 285169bytes took 3s
Feb 07 04:00:35 BadNews: Feb 7 03:51:44 east-07 cib[25766]: error: dump_xml: xmlNodeDump() -> 287632bytes took 3s
Feb 07 04:00:35 BadNews: Feb 7 03:52:35 east-07 pengine[25771]: error: dump_xml: xmlNodeDump() -> 285975bytes took 2s
Feb 07 04:00:35 BadNews: Feb 7 03:53:17 east-07 pengine[25771]: error: dump_xml: xmlNodeDump() -> 288613bytes took 6s
Feb 07 04:00:35 BadNews: Feb 7 03:54:44 east-08 cib[12525]: error: dump_xml: xmlNodeDump() -> 291458bytes took 7s
Feb 07 04:00:35 BadNews: Feb 7 03:54:58 east-08 cib[12525]: error: dump_xml: xmlNodeDump() -> 288536bytes took 2s
Feb 07 04:00:35 BadNews: Feb 7 03:55:07 east-08 cib[12525]: error: dump_xml: xmlNodeDump() -> 288536bytes took 2s
Feb 07 04:00:35 BadNews: Feb 7 03:55:10 east-08 cib[12525]: error: dump_xml: xmlNodeDump() -> 288536bytes took 2s
Feb 07 04:00:35 BadNews: Feb 7 03:55:32 east-08 pengine[12530]: error: dump_xml: xmlNodeDump() -> 288428bytes took 2s
Feb 07 04:00:36 BadNews: Feb 7 03:56:16 east-09 pengine[14400]: error: dump_xml: xmlNodeDump() -> 293090bytes took 4s
Feb 07 04:00:36 BadNews: Feb 7 03:56:19 east-09 pengine[14400]: error: dump_xml: xmlNodeDump() -> 293028bytes took 2s
Feb 07 04:00:36 BadNews: Feb 7 03:56:59 east-10 pengine[15879]: error: dump_xml: xmlNodeDump() -> 283719bytes took 2s
Feb 07 04:00:36 BadNews: Feb 7 03:57:04 east-11 cib[15646]: error: dump_xml: xmlNodeDump() -> 284082bytes took 3s
Feb 07 04:00:36 BadNews: Feb 7 03:57:18 east-11 cib[15646]: error: dump_xml: xmlNodeDump() -> 283876bytes took 2s
Feb 07 04:00:36 BadNews: Feb 7 03:57:36 east-11 cib[15646]: error: dump_xml: xmlNodeDump() -> 283945bytes took 2s
Feb 07 04:00:36 BadNews: Feb 7 03:57:45 east-12 cib[15395]: error: dump_xml: xmlNodeDump() -> 283945bytes took 3s
Feb 07 04:00:36 BadNews: Feb 7 03:57:49 east-12 cib[15395]: error: dump_xml: xmlNodeDump() -> 283945bytes took 4s
Feb 07 04:00:36 BadNews: Feb 7 03:57:53 east-12 cib[15395]: error: dump_xml: xmlNodeDump() -> 283945bytes took 4s
Feb 07 04:00:36 BadNews: Feb 7 03:57:57 east-12 cib[15395]: error: dump_xml: xmlNodeDump() -> 283945bytes took 3s
Feb 07 04:00:36 BadNews: Feb 7 03:58:01 east-12 cib[15395]: error: dump_xml: xmlNodeDump() -> 283945bytes took 3s
Feb 07 04:00:36 BadNews: Feb 7 03:58:05 east-12 cib[15395]: error: dump_xml: xmlNodeDump() -> 283945bytes took 4s
Feb 07 04:00:36 BadNews: Feb 7 03:58:22 east-12 cib[15395]: error: dump_xml: xmlNodeDump() -> 284014bytes took 2s
Feb 07 04:00:37 BadNews: Feb 7 03:58:29 east-13 cib[15069]: error: dump_xml: xmlNodeDump() -> 284220bytes took 4s
Feb 07 04:00:37 BadNews: Feb 7 03:59:23 east-14 pengine[14826]: error: dump_xml: xmlNodeDump() -> 283995bytes took 2s
Feb 07 04:00:37 BadNews: Feb 7 03:59:25 east-15 cib[14471]: error: dump_xml: xmlNodeDump() -> 284388bytes took 2s
Feb 07 04:00:37 BadNews: Feb 7 03:59:57 east-16 cib[13944]: error: dump_xml: xmlNodeDump() -> 284427bytes took 12s
Feb 07 04:00:37 BadNews: Feb 7 04:00:15 east-16 pengine[13949]: error: dump_xml: xmlNodeDump() -> 284133bytes took 2s

This one came in while the next one was compiling... ouch!
Feb 07 04:10:21 BadNews: Feb 7 04:08:44 east-15 crmd[23459]: error: dump_xml: xmlNodeDump() -> 271882bytes took 18s

Without XML_BUFFER_ALLOC_DOUBLEIT, xmlNodeDump() took more than 1s on more than 50 occasions in 20 minutes. With it, the cluster has been running for over half an hour and hasn't hit it once. Perhaps XML_BUFFER_ALLOC_DOUBLEIT should be the default?

It actually is the default in the current tree. But what are you running on that machine for it to take 12s to allocate 200KB? Is it swapping? We are still handling OOM conditions extremely poorly :-\

Daniel

Hum ... no, it wasn't the default, but a doubling allocation scheme was used in the current tree compared to RHEL, so it became the default for this use case, because recent versions changed the buffer implementation and the new buffers have the new semantics. However, the proper solution I think is to switch the buffer to DOUBLEIT for the time it is being used by xmlNodeDump(), then restore the previous allocation strategy on the way out. I have committed a patch to do this upstream:

http://git.gnome.org/browse/libxml2/commit/?id=23922c536c97e0eaac4bd7fbe68cbf9fceb66b36

But again, the code base for RHEL is different, especially in that area, so if we were to fix the RHEL behaviour, a different patch should be applied. I'm not sure we need to pursue an errata in RHEL though, as there is a way through the API to change the behaviour.

Daniel

(In reply to comment #10)
> However, the proper solution I think is to switch the buffer to DOUBLEIT
> for the time it is being used by xmlNodeDump(), then restore the previous
> allocation strategy on the way out. I have committed a patch to do this
> upstream:
>
> http://git.gnome.org/browse/libxml2/commit/?id=23922c536c97e0eaac4bd7fbe68cbf9fceb66b36

Patch makes sense to me. Thanks :)

> But again, the code base for RHEL is different, especially in that area, so
> if we were to fix the RHEL behaviour, a different patch should be applied.
> I'm not sure we need to pursue an errata in RHEL though, as there is a way
> through the API to change the behaviour.

Agreed.

(In reply to comment #9)
> It actually is the default in the current tree.
> But what are you running on that machine for it to take 12s to allocate 200KB?
> Is it swapping?
> We are still handling OOM conditions extremely poorly :-\

I'm almost certain the machine isn't swapping; I will do some tests in the morning to be sure. The nodes are Dell PowerEdge SC1435 1U servers with 8GB of RAM and two dual-core AMD Opteron 2212 HE CPUs, so not exactly under-spec'd. As for what is running, pretty much just the cluster, although that can be more than you'd think when 16 nodes start hammering the config system with status updates.

So at this point my suggestion is simply to close this bug or reassign it to the cluster component, the associated fix being the patch that makes the buffer use the DOUBLEIT allocation mode. For libxml2 there isn't much we can reasonably do except say it should be fixed in the upstream version.

Right?

Daniel

Daniel, Andrew,

Are the mmap calls in the trace taking a really long time to complete? Is the full trace available somewhere? It would be very strange if realloc is not calling mremap on RHEL 6; that is something the C library should have been doing for years...
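For reference, the approach of the upstream commit mentioned above (temporarily switching the buffer to DOUBLEIT while xmlNodeDump() fills it, then restoring the caller's scheme) can also be approximated from the application side with a small wrapper. This is a paraphrase of the idea, not the literal upstream diff; the wrapper name is hypothetical.

```c
/* Caller-side approximation of the upstream fix described above: remember
 * the buffer's allocation scheme, use doubling while xmlNodeDump() grows it,
 * and restore the original scheme on the way out. Not the literal patch;
 * the wrapper name is made up for illustration. */
#include <libxml/tree.h>

int node_dump_with_doubling(xmlBufferPtr buf, xmlDocPtr doc, xmlNodePtr cur,
                            int level, int format)
{
    if (buf == NULL || cur == NULL)
        return -1;

    xmlBufferAllocationScheme saved = buf->alloc;   /* caller's scheme */
    if (saved == XML_BUFFER_ALLOC_EXACT)
        xmlBufferSetAllocationScheme(buf, XML_BUFFER_ALLOC_DOUBLEIT);

    int ret = xmlNodeDump(buf, doc, cur, level, format);

    xmlBufferSetAllocationScheme(buf, saved);       /* restore on the way out */
    return ret;
}
```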
Created attachment 696347 [details]
zipped trace of the test run
That's actually a CentOS box, but that should not fundamentally change the output:
[root@server ~]# uname -a
Linux server.veillard.com 2.6.32-279.22.1.el6.x86_64 #1 SMP Wed Feb 6 03:10:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@server tmp]# gcc -o tst -I/usr/include/libxml2 tst.c -lxml2
[root@server tmp]# strace -o tst.trace ./tst
[root@server tmp]# ls -l tst.trace
-rw-r--r--. 1 root root 497758 Feb 12 10:52 tst.trace
[root@server tmp]# gzip tst.trace
[root@server tmp]# ls -l tst.trace.gz
-rw-r--r--. 1 root root 45826 Feb 12 10:52 tst.trace.gz
[root@server tmp]#
(In reply to comment #15)
> So at this point my suggestion is simply to close this bug or reassign it
> to the cluster component, the associated fix being the patch that makes
> the buffer use the DOUBLEIT allocation mode. For libxml2 there isn't much
> we can reasonably do except say it should be fixed in the upstream version.
>
> Right?

Agreed, unless you want to see if mmap is causing problems too.

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation.

Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com/