Red Hat Bugzilla – Bug 1096744
Increased memory requirements of MRG-M 3.0, as compared to 2.3
Last modified: 2015-04-14 09:48:03 EDT
Description of problem:
Comparing memory requirements of MRG-M 3.0 against the latest MRG 2 broker (0.18-20), I see that 3.0 consumes up to 4 times more (!) memory in one scenario.

Basic use case: have X (e.g. 100) queues bound to a fanout exchange. Send Y (a few thousand) messages to the exchange and compare the memory consumed by qpidd. The more queues you use, the worse 3.0 behaves. The more messages you send (for considerably high X), the worse 3.0 behaves.

Version-Release number of selected component (if applicable):
0.22-38

How reproducible:
100%

Steps to Reproduce:
1. Have a script like:

service qpidd restart
queues=$1
durable=$2
messages=$3
size=$4
for i in $(seq 1 $queues); do
  qpid-receive -a "queue_${i}; {create:always, node:{type:queue, durable:${durable}, x-bindings:[{ exchange:'amq.fanout', queue:'queue_${i}' }], x-declare:{ arguments:{ 'qpid.file_size':640 }}}}" &
done
wait
qpid-send -a "amq.fanout" -m ${messages} --durable=yes --content-size=${size}
ps aux | grep qpid | grep daemon

2. Run the script against the 0.18-20 and 0.22-* brokers with parameters like:
100 false 1000 0
100 false 100000 0
500 false 1000 0
500 false 10000 0

Actual results:
The 0.22-* broker shows much higher memory requirements in the "ps" output (RSS column).

Expected results:
Similar memory consumption of the 3.0 broker.

Additional info:
A few observations from the various tests I ran:
- When there is just 1 queue, the memory requirements of 0.22 are usually a little bit _lower_ than 0.18. So in general, 0.22 manages memory better.
- The more queues you use, the worse 3.0 behaves. The more messages you send (for considerably high X), the worse 3.0 behaves.
- Message content length has no visible impact on the issue (len=0 and len=1024 tested).
- Message durability has no visible impact either.

Raw data from my tests (RSS as reported by ps):

#queues durable msgs   msg_len  0.18-20  0.22-38  0.22/0.18
1       false   1000   0        9604     9900     103.1%
1       false   10000  0        22084    19204    87.0%
1       false   100000 0        126200   111400   88.3%
1       false   100000 1024     235852   218260   92.5%
1       true    100000 0        138696   122652   88.4%
1       true    100000 1024     245504   233564   95.1%
100     false   100000 0        369188   1333984  361.3%
100     false   100000 1024     477516   1443052  302.2%
100     false   1000   0        15512    25732    165.9%
100     false   1000   1024     16732    24800    148.2%
100     true    1000   0        112052   180872   161.4%
100     true    1000   1024     182912   230932   126.3%
500     false   1000   0        28676    85124    296.8%
500     false   1000   1024     29760    85832    288.4%
500     false   1000   0        28676    85124    296.8%  (copy)
500     false   2000   0        44240    147404   333.2%
500     false   3000   0        57840    208868   361.1%
500     false   4000   0        78864    271024   343.7%
500     false   5000   0        93520    345960   369.9%
500     false   6000   0        101724   399152   392.4%
500     false   7000   0        123748   475364   384.1%
500     false   8000   0        134720   533952   396.3%
500     false   9000   0        146372   593464   405.4%
500     false   10000  0        155136   651360   419.9%
500     false   10000  1024     161764   661336   408.8%
100     false   1000   0        15512    25732    165.9%  (copy)
200     false   1000   0        19768    42584    215.4%
300     false   1000   0        23960    59692    249.1%
400     false   1000   0        27964    70836    253.3%
500     false   1000   0        28676    85124    296.8%  (copy)
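For convenience, a minimal sketch of how the parameter sweep above could be automated, recording qpidd's RSS after each run (the wrapper name reproduce.sh and the output file are assumptions, not part of the original report):

#!/bin/bash
# Sketch: run the reproducer for each parameter set and record the broker's
# RSS (as reported by ps) into a CSV file.
OUT=rss_results.csv
echo "queues,durable,messages,size,rss_kb" > "$OUT"

while read -r queues durable messages size; do
    # reproduce.sh is the script from "Steps to Reproduce" above (hypothetical name)
    ./reproduce.sh "$queues" "$durable" "$messages" "$size" > /dev/null
    rss=$(ps -C qpidd -o rss= | head -n 1 | tr -d ' ')
    echo "${queues},${durable},${messages},${size},${rss}" >> "$OUT"
done <<'EOF'
100 false 1000 0
100 false 100000 0
500 false 1000 0
500 false 10000 0
EOF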
This is somehow related to https://bugzilla.redhat.com/show_bug.cgi?id=867826. Comparing memory requirements in this use case before and after a restart, between 0.18-20 and 0.22-38, I get interesting results:

                                                       0.18-20   0.22-38
10 queues, 100000 msgs each, before restart            275672    339944
10 queues, 100000 msgs each, after restart             1324916   1160520
20 queues, 100000 msgs each, before restart            444836    582364
20 queues, 100000 msgs each, after restart             2642236   2308496
100 queues, 10000 msgs each, before restart            342388    442552
100 queues, 10000 msgs each, after restart             1424352   1260080
100 queues, 10000 msgs (1024 B) each, before restart   377520    469464
100 queues, 10000 msgs (1024 B) each, after restart    2500696   2331000

So, while the memory consumption of 3.0 before a broker restart is higher, it is lower after a restart. Message size does not affect that. All tests were run with the legacy store.

This observation somewhat lowers the pressure to fix this BZ. Still, I would like an explanation for the phenomenon (higher RAM usage before restart).
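For reference, a minimal sketch of how the before/after-restart figures could be collected (assumes the queues and durable messages were set up with the reproducer above; the sleep is a rough placeholder for waiting until store recovery finishes):

#!/bin/bash
# Sketch: print qpidd RSS before and after a broker restart.
rss() { ps -C qpidd -o rss= | head -n 1 | tr -d ' '; }

echo "RSS before restart: $(rss) KB"
service qpidd restart
sleep 30   # adjust: recovery of many durable messages can take longer
echo "RSS after restart:  $(rss) KB"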
The issue is in the amount of message state that is shared between the queues. Before a restart, the 0.18-based code shares everything. This is in fact a bug: certain changes are supposed to be specific to the message on a specific queue and must not apply to the same message on other queues.

This is fixed in the 0.22-based builds, which is why the memory use is higher. However, while we cannot eliminate the extra memory entirely (even one 64-bit int, when present in 500 copies of 10000 different messages, adds up), I think we can improve on the amount of extra memory used.

On recovering from disk, messages on queues share no state, even if they originally represented the same message.
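A back-of-envelope illustration of the point about per-queue state (the 8-byte figure is hypothetical; only the 500 queues x 10000 messages case is taken from this BZ):

# Even a single extra 8-byte field per message *instance* multiplies across
# queues and messages.
queues=500
messages=10000
bytes_per_instance=8
echo "$(( queues * messages * bytes_per_instance )) bytes"          # 40000000
echo "$(( queues * messages * bytes_per_instance / 1048576 )) MiB"  # ~38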
(In reply to Gordon Sim from comment #3)
> The issue is in the amount of message state that is shared between the
> queues. Before a restart, the 0.18-based code shares everything. This is in
> fact a bug: certain changes are supposed to be specific to the message on a
> specific queue and must not apply to the same message on other queues.
>
> This is fixed in the 0.22-based builds, which is why the memory use is
> higher. However, while we cannot eliminate the extra memory entirely (even
> one 64-bit int, when present in 500 copies of 10000 different messages,
> adds up), I think we can improve on the amount of extra memory used.
>
> On recovering from disk, messages on queues share no state, even if they
> originally represented the same message.

Thanks a lot for this sound explanation, really appreciated.

Due to comment #2 (https://bugzilla.redhat.com/show_bug.cgi?id=1096744#c2), this BZ turns rather into a request for an explanation. With one issue explained, it remains to answer: why does the ratio "0.22_memory_utilization / 0.18_memory_utilization" get worse when enqueueing more messages? For example:
- for 500 queues and 1000 messages sent, 0.18 has 28676 RSS and 0.22 has 85124, i.e. 3 times more;
- with the same setup but sending 10000 messages, 0.18 has 155136 RSS and 0.22 has 651360, i.e. 4 times more - why not "only" 3 times more, as for 1000 msgs?
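One way to look at the two data points quoted above is to normalise the RSS difference per message instance (a sketch only; the RSS figures in KB are taken from the table in comment #0):

awk 'BEGIN {
    q = 500
    # 1000 msgs:  0.18 -> 28676 KB,  0.22 -> 85124 KB
    printf "delta per instance @1000 msgs:  %.3f KB\n", (85124 - 28676) / (q * 1000)
    # 10000 msgs: 0.18 -> 155136 KB, 0.22 -> 651360 KB
    printf "delta per instance @10000 msgs: %.3f KB\n", (651360 - 155136) / (q * 10000)
}'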
Some improvements checked in upstream: https://svn.apache.org/r1597121
Pavel, would you rerun your tests to find out what difference Gordon's changes made?

(In reply to Gordon Sim from comment #5)
> Some improvements checked in upstream: https://svn.apache.org/r1597121
(In reply to Justin Ross from comment #6)
> Pavel, would you rerun your tests to find out what difference Gordon's
> changes made?
>
> (In reply to Gordon Sim from comment #5)
> > Some improvements checked in upstream: https://svn.apache.org/r1597121

I have already done such a comparison. Very generally, for use cases with a few hundred queues bound to a fanout exchange, Gordon's improvement cuts memory usage roughly in half. See the CSV data:

#queues durable msgs msg_len;0.18-20;0.22-38;0.22/0.18;Upstream (r1601656);Upstream/0.18;Upstream/0.22
1 false 1000 0;9604;9900;103.1%;;;
1 false 10000 0;22084;19204;87.0%;;;
1 false 100000 0;126200;111400;88.3%;107968;85.6%;96.9%
1 false 100000 1024;235852;218260;92.5%;;;
1 true 100000 0;138696;122652;88.4%;;;
1 true 100000 1024;245504;233564;95.1%;;;
100 false 100000 0;369188;1333984;361.3%;602052;163.1%;45.1%
100 false 100000 1024;477516;1443052;302.2%;681876;142.8%;47.3%
100 false 1000 0;15512;25732;165.9%;17752;114.4%;69.0%
100 false 1000 1024;16732;24800;148.2%;;;
100 true 1000 0;112052;180872;161.4%;;;
100 true 1000 1024;182912;230932;126.3%;;;
500 false 1000 0;28676;85124;296.8%;48124;167.8%;56.5%
500 false 1000 1024;29760;85832;288.4%;51376;172.6%;59.9%
500 false 1000 0;28676;85124;296.8%;48124;167.8%;56.5%
500 false 2000 0;44240;147404;333.2%;75800;171.3%;51.4%
500 false 3000 0;57840;208868;361.1%;100572;173.9%;48.2%
500 false 4000 0;78864;271024;343.7%;125528;159.2%;46.3%
500 false 5000 0;93520;345960;369.9%;155612;166.4%;45.0%
500 false 6000 0;101724;399152;392.4%;181108;178.0%;45.4%
500 false 7000 0;123748;475364;384.1%;207596;167.8%;43.7%
500 false 8000 0;134720;533952;396.3%;231176;171.6%;43.3%
500 false 9000 0;146372;593464;405.4%;256724;175.4%;43.3%
500 false 10000 0;155136;651360;419.9%;281784;181.6%;43.3%
500 false 10000 1024;161764;661336;408.8%;293632;181.5%;44.4%
100 false 1000 0;15512;25732;165.9%;17752;114.4%;69.0%
200 false 1000 0;19768;42584;215.4%;23592;119.3%;55.4%
300 false 1000 0;23960;59692;249.1%;37968;158.5%;63.6%
400 false 1000 0;27964;70836;253.3%;45012;161.0%;63.5%
500 false 1000 0;28676;85124;296.8%;48124;167.8%;56.5%

Let me know if you are also interested in memory utilization after a broker restart (though that is rather a topic for bz867826 ([RFE] QPid memory usage is not consistent across restart)).
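If the CSV above is saved to a file (e.g. results.csv, a hypothetical name), the "Upstream/0.22" column behind the "roughly half" statement can be pulled out with a one-liner:

# Print the test parameters and the Upstream/0.22 ratio where it was measured.
awk -F';' 'NR > 1 && $7 != "" { print $1, $7 }' results.csv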
Quoting the figures above, kudos to Gordon for decreasing memory utilization by 50% (in the relevant use cases)!
Thanks, Pavel and Gordon. Given comment 3, the memory increases that remain fall within the expected range. -> POST
I have the results summarized below:

:: x86_64,c++
---------------- + ---------------- + ---------------- + ----------------
Qcnt,Mcnt,Msize  | 18-36.el6.x86_64 | 30-5.el6.x86_64  | result
---------------- + ---------------- + ---------------- + ----------------
1,10000,1024     | 32636            | 30952            | 0.948400539282
1,100000,0       | 125668           | 110016           | 0.875449597352
10,100000,0      | 147856           | 154664           | 1.04604480035
100,10000,0      | 47592            | 72200            | 1.51706169104
1,10000,0        | 19428            | 19664            | 1.0121474161
10,1000,2048     | 13152            | 14108            | 1.07268856448
1,10000,2048     | 40604            | 45056            | 1.10964437001
100,100000,0     | 371368           | 600272           | 1.61638051744
10,1000,1024     | 13744            | 14772            | 1.07479627474
10,10000,1024    | 32828            | 35764            | 1.08943584745
500,1000,2048    | 42276            | 49236            | 1.16463241555
100,10000,2048   | 68968            | 93504            | 1.35575919267
100,10000,1024   | 60844            | 83468            | 1.37183617119
100,1000,1024    | 22232            | 22552            | 1.01439366679
500,1000,1024    | 42188            | 50780            | 1.20365980848
10,10000,0       | 22244            | 24460            | 1.09962237008
500,10000,0      | 150160           | 276788           | 1.84328716036
500,10000,2048   | 175308           | 304136           | 1.73486663472
500,10000,1024   | 161340           | 288024           | 1.78519895872
10,10000,2048    | 42940            | 45720            | 1.06474149977
1,1000,2048      | 11920            | 12912            | 1.08322147651
1,1000,1024      | 15192            | 11984            | 0.788836229595
100,1000,2048    | 25832            | 23664            | 0.916073087643
---------------- + ---------------- + ---------------- + ----------------

:: i686,c++
---------------- + ---------------- + ---------------- + ----------------
Qcnt,Mcnt,Msize  | 18-36.el6.i686   | 30-5.el6.i686    | result
---------------- + ---------------- + ---------------- + ----------------
1,10000,1024     | 25620            | 25600            | 0.999219359875
1,100000,0       | 82352            | 73152            | 0.888284437536
10,100000,0      | 97324            | 105840           | 1.08750154124
100,10000,0      | 34264            | 52728            | 1.53887462059
1,10000,0        | 14072            | 15000            | 1.06594656055
10,1000,2048     | 10908            | 11988            | 1.09900990099
1,10000,2048     | 35672            | 35624            | 0.998654406818
100,100000,0     | 244564           | 433184           | 1.77125006133
10,1000,1024     | 9588             | 11000            | 1.14726741761
10,10000,1024    | 26860            | 29140            | 1.08488458675
500,1000,2048    | 32396            | 39296            | 1.21298925793
100,10000,2048   | 53156            | 73672            | 1.38595831139
100,10000,1024   | 43796            | 63524            | 1.45045209608
100,1000,1024    | 14544            | 17312            | 1.1903190319
500,1000,1024    | 34828            | 36616            | 1.0513380039
10,10000,0       | 15960            | 18420            | 1.15413533835
500,10000,0      | 108108           | 202148           | 1.86987086987
500,10000,2048   | 124044           | 223112           | 1.79865209119
500,10000,1024   | 118324           | 214132           | 1.80970893479
10,10000,2048    | 36800            | 39144            | 1.06369565217
1,1000,2048      | 10408            | 11456            | 1.10069177556
1,1000,1024      | 8936             | 10436            | 1.1678603402
100,1000,2048    | 15652            | 18152            | 1.15972399693
---------------- + ---------------- + ---------------- + ----------------

So basically, between 0.18 and 0.30 the memory ratio lies in the 0.9-2.0 interval, which is acceptable in my opinion. Gordon, Pavel, could you please check?
Also, the cases where there is between 50% and 100% increase are those with 100 or more queues and at least 10000 messages, i.e. 1 million message 'instances'. Ultimately what is acceptable will depend on use cases I guess, but I think this is a 'reasonable' situation.
(In reply to Gordon Sim from comment #13)
> Also, the cases where there is between 50% and 100% increase are those with
> 100 or more queues and at least 10000 messages, i.e. 1 million message
> 'instances'. Ultimately what is acceptable will depend on use cases I guess,
> but I think this is a 'reasonable' situation.

I agree. There is definitely a significant improvement over the "Raw data from my tests" table in c#0 (300% or 400% memory utilization). Some increase is an acceptable drawback of the way the broker deals with messages over multiple queues.
Reran with the latest qpid-cpp-server-0.30-7 and the results are still very satisfactory. Marking this as VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2015-0805.html