Bug 478882

Summary: Feature: Async DR replication throughput
Product: Red Hat Enterprise MRG
Reporter: Carl Trieloff <cctrieloff>
Component: qpid-cpp
Assignee: Gordon Sim <gsim>
Status: CLOSED ERRATA
QA Contact: Frantisek Reznicek <freznice>
Severity: medium
Priority: urgent
Version: 1.1
CC: esammons, iboverma
Target Milestone: 1.1.1
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-04-21 16:16:28 UTC
Bug Depends On: 441698    
Bug Blocks:    
Attachments:
  Patch to latency test for verifying replication performance (flags: none)
  bz478882_test_results II (flags: none)
  bz478882_test_results III (mentioned in previous comment) (flags: none)

Description Carl Trieloff 2009-01-05 20:28:04 UTC
Async queue replication (DR) needs to be able to replicate 25k msg/second of 500-byte messages to the replica with a median latency below 2 seconds, from 6000 queues. The 25k msg/sec is the aggregate rate across all of the queues (roughly 12.5 MB/s of message payload).

Comment 1 Gordon Sim 2009-01-26 19:07:57 UTC
Created attachment 330023 [details]
Patch to latency test for verifying replication performance

I used the attached patch to modify the latency test so that it can measure the throughput and latency of replication.

I set up replication from one broker to another as described in http://cwiki.apache.org/confluence/display/qpid/queue+state+replication (also in the feature BZ on which this depends).

I then created a latency-test-1 queue on both brokers, specifying event generation as 1 (enqueue events only) for the primary broker's copy of the queue.

[Example:

../python/commands/qpid-config -a "localhost:5555" add queue latency-test-1 --generate-queue-events 1
../python/commands/qpid-config -a "localhost:6666" add queue latency-test-1
../python/commands/qpid-config -a "localhost:6666" add exchange replication replication
../python/commands/qpid-route queue add localhost:6666 localhost:5555 replication replication
]

I then ran a patched latencytest instance against the primary broker (--port 5555 in the example config above) with --rate 30000, and another instance against the backup broker (--port 6666 in the example config above) with the same options plus an additional --receive option so that it only reads in the messages (which have been replicated from the primary).
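
For concreteness, the two invocations look roughly like this (a sketch based on the description above, assuming the example ports from the config snippet and a locally built latencytest):

  # sender against the primary broker (port 5555), driving ~30k msg/sec
  ./latencytest --port 5555 --rate 30000

  # receiver against the backup broker (port 6666); --receive makes it only
  # read the messages that have been replicated from the primary
  ./latencytest --port 6666 --rate 30000 --receive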

Latency was well within the required limits at this throughput.

Comment 2 Gordon Sim 2009-01-26 19:10:16 UTC
I believe the solution is capable of meeting this requirement for throughput and latency on the replication link itself. The impact on the general latency of a clustered source broker still needs to be evaluated.

Comment 4 Frantisek Reznicek 2009-03-04 14:06:15 UTC
Please find below two ADR throughput measurements:

Configuration is common:
the main broker (A) was run with --replication-queue replication-queue --create-replication-queue true; the backup broker (B) was run without special options
Both qpidd instances were configured with all plugins (including management and store) and no clustering was performed (openais down)
modified latencytest params:
q_cnt=1
msg_size=500
msg_cnt=1000
msg_rate=25000

1st latencytest on the main broker, which does the replication

> ./latencytest -p $host1port --size ${msg_size} --queues ${q_cnt} --count ${msg_cnt} --rate ${msg_rate} --report-frequency 5000

2nd latencytest on backup broker

> ./latencytest -p $host2port --size ${msg_size} --queues ${q_cnt} --count ${msg_cnt} --rate ${msg_rate} --report-frequency 5000 --receive


[alpha] RHEL 5.2 mrg4 (A) vs mrg3 (B)


- host A -

[root@mrg4 _bzs]#
[root@mrg4 _bzs]# export PYTHONPATH=/root/_bzs/qpid/python
[root@mrg4 _bzs]#
[root@mrg4 _bzs]# host2name="mrg3.lab.bos.redhat.com"
[root@mrg4 _bzs]# host1name="mrg4.lab.bos.redhat.com"
[root@mrg4 _bzs]# host1port=5555
[root@mrg4 _bzs]# host2port=6666
[root@mrg4 _bzs]#
[root@mrg4 _bzs]# host1p="${host1name}:${host1port}"
[root@mrg4 _bzs]# host2p="${host2name}:${host2port}"
[root@mrg4 _bzs]#
[root@mrg4 _bzs]# q_cnt=1
[root@mrg4 _bzs]# msg_size=500
[root@mrg4 _bzs]# msg_cnt=1000
[root@mrg4 _bzs]# msg_rate=25000
[root@mrg4 _bzs]#
[root@mrg4 _bzs]# rm -rf data qpidd.log
[root@mrg4 _bzs]#   qpidd --auth no --replication-queue replication-queue --create-replication-queue true --data-dir data --log-enable  info+ -p $host1port >qpidd.log 2>qpidd.log &
[1] 6531

[root@mrg4 _bzs]# ./latencytest -p $host1port --size ${msg_size} --queues ${q_cnt} --count ${msg_cnt} --rate ${msg_rate} --report-frequency 5000
Latency(ms): min=0.507, max=81.34, avg=5.40698
Latency(ms): min=0.45, max=9.46, avg=2.20908
Latency(ms): min=0.466, max=42.366, avg=3.07864
Latency(ms): min=0.368, max=7.796, avg=2.14092
Latency(ms): min=0.513, max=6.843, avg=2.25006
Latency(ms): min=0.254, max=42.044, avg=2.58061
Latency(ms): min=0.49, max=44.48, avg=2.57329
Latency(ms): min=0.605, max=9.323, avg=2.31233
Latency(ms): min=0.436, max=43.783, avg=4.0789
Latency(ms): min=0.281, max=45.193, avg=3.12112
Latency(ms): min=0.165, max=44.735, avg=3.51132
Latency(ms): min=0.511, max=45.181, avg=2.79409
Latency(ms): min=0.484, max=43.716, avg=2.73812
Latency(ms): min=0.322, max=12.246, avg=2.37017
Latency(ms): min=0.503, max=43.886, avg=2.79589
Latency(ms): min=0.53, max=42.86, avg=2.9728
Latency(ms): min=0.419, max=42.624, avg=3.27404
Latency(ms): min=0.45, max=42.074, avg=3.35103
Latency(ms): min=0.516, max=43.718, avg=3.78374
Latency(ms): min=0.413, max=44.856, avg=3.27581
Latency(ms): min=0.508, max=42.989, avg=2.96591
Latency(ms): min=0.286, max=44.827, avg=3.46171
Latency(ms): min=0.152, max=44.687, avg=4.6114
Latency(ms): min=0.46, max=48.316, avg=4.83911
Latency(ms): min=0.474, max=45.327, avg=2.97223
Latency(ms): min=0.428, max=75.007, avg=7.15479
Latency(ms): min=0.469, max=43.71, avg=3.68349
Latency(ms): min=0.397, max=47.749, avg=5.64584
Latency(ms): min=0.462, max=44.277, avg=5.00946
Latency(ms): min=0.407, max=43.431, avg=3.15999
Latency(ms): min=0.437, max=77.548, avg=6.1353
Latency(ms): min=0.46, max=70.18, avg=5.46327
Latency(ms): min=0.481, max=24.26, avg=2.94559
Latency(ms): min=0.235, max=43.195, avg=3.57899
Latency(ms): min=0.563, max=129.894, avg=10.7743
Latency(ms): min=0.476, max=97.441, avg=7.90055
Latency(ms): min=0.242, max=10.14, avg=2.58521
Latency(ms): min=0.534, max=45.251, avg=3.01295
Latency(ms): min=0.347, max=43.786, avg=3.80542
Latency(ms): min=0.352, max=43.211, avg=3.51108
Latency(ms): min=0.481, max=44.031, avg=4.41895
Latency(ms): min=0.202, max=45.272, avg=4.81373
Latency(ms): min=0.376, max=42.963, avg=3.67745
Latency(ms): min=0.47, max=47.275, avg=3.79439


- host B -
[root@mrg3 _bzs]#
[root@mrg3 _bzs]# export PYTHONPATH=/root/_bzs/qpid/python
[root@mrg3 _bzs]#
[root@mrg3 _bzs]# host2name="mrg3.lab.bos.redhat.com"
[root@mrg3 _bzs]# host1name="mrg4.lab.bos.redhat.com"
[root@mrg3 _bzs]# host1port=5555
[root@mrg3 _bzs]# host2port=6666
[root@mrg3 _bzs]#
[root@mrg3 _bzs]# host1p="${host1name}:${host1port}"
[root@mrg3 _bzs]# host2p="${host2name}:${host2port}"
[root@mrg3 _bzs]#
[root@mrg3 _bzs]# q_cnt=1
[root@mrg3 _bzs]# msg_size=500
[root@mrg3 _bzs]# msg_cnt=1000
[root@mrg3 _bzs]# msg_rate=25000
[root@mrg3 _bzs]#
[root@mrg3 _bzs]# rm -rf data qpidd.log
[root@mrg3 _bzs]#   qpidd --auth no --data-dir data --log-enable info+ -p $host2port >qpidd.log 2>qpidd.log &
[1] 8420
[root@mrg3 _bzs]# python ${PYTHONPATH}/commands/qpid-config -a ${host2p} add exchange replication replication-exchange
[root@mrg3 _bzs]#
[root@mrg3 _bzs]# python ${PYTHONPATH}/commands/qpid-route queue add ${host2p} ${host1p} replication-exchange replication-queue
[root@mrg3 _bzs]#
[root@mrg3 _bzs]# for ((i=1;i<=${q_cnt};i++)); do
>   python ${PYTHONPATH}/commands/qpid-config -a ${host1p} add queue latency-test-${i} --generate-queue-events 1
>   python ${PYTHONPATH}/commands/qpid-config -a ${host2p} add queue latency-test-${i}
>   echo -n "."
> done
.[root@mrg3 _bzs]# echo

[root@mrg3 _bzs]#   ./latencytest -p $host2port --size ${msg_size} --queues ${q_cnt} --count ${msg_cnt} --rate ${msg_rate} --report-frequency 5000 --receive
Latency(ms): min=33739.1, max=34041, avg=33897.6
Latency(ms): min=34039.4, max=35698.4, avg=34848.2
Latency(ms): min=35696.9, max=37259.5, avg=36501.9
Latency(ms): min=37258.6, max=38857.1, avg=38083
Latency(ms): min=38858.3, max=40494.9, avg=39664.8
Latency(ms): min=40494.8, max=42152.9, avg=41311.7
Latency(ms): min=42153.1, max=43702.8, avg=42930.4
Latency(ms): min=43701.6, max=45204.7, avg=44426.7
Latency(ms): min=45204.7, max=46770, avg=46015.2
Latency(ms): min=46768.5, max=48289.2, avg=47546.3
Latency(ms): min=48297.9, max=49902.1, avg=49107.9
Latency(ms): min=49903, max=51466.5, avg=50715.2
Latency(ms): min=51465.9, max=52968.2, avg=52249
Latency(ms): min=52966.6, max=54608, avg=53788.6
Latency(ms): min=54608.1, max=55471.6, avg=55186.5
Latency(ms): min=55293.9, max=55447, avg=55370
Latency(ms): min=55245.8, max=55435.4, avg=55360.2
Latency(ms): min=55209.6, max=55501.6, avg=55322.3
Latency(ms): min=55253.7, max=55518.5, avg=55393.2
Latency(ms): min=55354.4, max=55592.9, avg=55466.1
Latency(ms): min=55396.6, max=55663, avg=55512.3
Latency(ms): min=55470.7, max=55686.3, avg=55580.7
Latency(ms): min=55535.3, max=55812.2, avg=55616.7
Latency(ms): min=55742.2, max=55988.3, avg=55885.9
Latency(ms): min=55846.9, max=56300.8, avg=56093.4
Latency(ms): min=56091.8, max=56378.8, avg=56240
Latency(ms): min=56216.7, max=56387.4, avg=56314.9
Latency(ms): min=56267.9, max=56382.4, avg=56320.1
Latency(ms): min=56324.2, max=56462.4, avg=56401.3
Latency(ms): min=56338.3, max=56522.3, avg=56443.8
Latency(ms): min=56416.1, max=56835.9, avg=56619.3
Latency(ms): min=56787.8, max=57365.8, avg=57085.9
Latency(ms): min=57345.5, max=57710.4, avg=57559.1
Latency(ms): min=57679.3, max=58006.5, avg=57798.9
Latency(ms): min=57942.8, max=58141.3, avg=58041.5
Latency(ms): min=57966.4, max=58428.6, avg=58207
Latency(ms): min=58292.1, max=58453.1, avg=58374
Latency(ms): min=58284.3, max=58751.5, avg=58551.4
Latency(ms): min=58728.4, max=58885.8, avg=58824.5
Latency(ms): min=58863.8, max=59074.8, avg=59000
Latency(ms): min=58952.2, max=59332.1, avg=59143.5
Latency(ms): min=59272.7, max=59716.7, avg=59505.4
Latency(ms): min=59622.1, max=59959.9, avg=59834.8
Latency(ms): min=59843.1, max=60186.2, avg=60030.7
Latency(ms): min=60181.6, max=60539.6, avg=60394.7


[beta] RHEL 5.3 ibm-firefly (A) vs hp-xw4800-01 (B) (a dummy run before configuration is also included)

- host A -
[root@ibm-firefly _bz478882]# export PYTHONPATH=/root/qpid/python
[root@ibm-firefly _bz478882]#
[root@ibm-firefly _bz478882]# host2name="hp-xw4800-01.rhts.bos.redhat.com"
[root@ibm-firefly _bz478882]# host1name="ibm-firefly.rhts.bos.redhat.com"
[root@ibm-firefly _bz478882]# host1port=5555
[root@ibm-firefly _bz478882]# host2port=6666
[root@ibm-firefly _bz478882]#
[root@ibm-firefly _bz478882]# host1p="${host1name}:${host1port}"
[root@ibm-firefly _bz478882]# host2p="${host2name}:${host2port}"
[root@ibm-firefly _bz478882]#
[root@ibm-firefly _bz478882]# q_cnt=1
[root@ibm-firefly _bz478882]# msg_size=500
[root@ibm-firefly _bz478882]# msg_cnt=1000
[root@ibm-firefly _bz478882]# msg_rate=25000
[root@ibm-firefly _bz478882]#
[root@ibm-firefly _bz478882]# rm -rf data qpidd.log
[root@ibm-firefly _bz478882]#   qpidd --auth no --replication-queue replication-queue --create-replication-queue true --data-dir data --log-enable  info+ -p $host1port >qpidd.log 2>qpidd.log &
[1] 604
[root@ibm-firefly _bz478882]# ./latencytest --rate 30000 -p $host1port
Latency(ms): min=0.905858, max=43.9862, avg=7.903
Latency(ms): min=1.18764, max=4.25857, avg=1.99007
... dummy run ...
Latency(ms): min=1.15519, max=4.51908, avg=1.97551
Latency(ms): min=0.941836, max=4.45373, avg=2.04207
Latency(ms): min=1.1928, max=4.21926, avg=1.96785
Latency(ms): min=1.18429, max=3.49097, avg=1.96424
Latency(ms): min=1.20277, max=3.66784, avg=1.99794
Latency(ms): min=1.16898, max=3.19366, avg=1.95455

[root@ibm-firefly _bz478882]# python ${PYTHONPATH}/commands/qpid-config -a ${host2p} add exchange replication replication-exchange
[root@ibm-firefly _bz478882]#
[root@ibm-firefly _bz478882]# python ${PYTHONPATH}/commands/qpid-route queue add ${host2p} ${host1p} replication-exchange replication-queue
[root@ibm-firefly _bz478882]#
[root@ibm-firefly _bz478882]# for ((i=1;i<=${q_cnt};i++)); do
>   python ${PYTHONPATH}/commands/qpid-config -a ${host1p} add queue latency-test-${i} --generate-queue-events 1
>   python ${PYTHONPATH}/commands/qpid-config -a ${host2p} add queue latency-test-${i}
>   echo -n "."
> done
.[root@ibm-firefly _bz478882]# echo

[root@ibm-firefly _bz478882]# ./latencytest -p $host1port --size ${msg_size} --queues ${q_cnt} --count ${msg_cnt} --rate ${msg_rate} --report-frequency 5000
Latency(ms): min=0.668043, max=84.621, avg=14.4539
Latency(ms): min=0.722328, max=53.8368, avg=11.6958
Latency(ms): min=0.661594, max=50.1167, avg=9.86539
Latency(ms): min=0.70123, max=32.561, avg=6.86544
Latency(ms): min=0.669104, max=47.8238, avg=6.9541
Latency(ms): min=0.634969, max=20.75, avg=3.03517
Latency(ms): min=0.499173, max=7.73909, avg=2.46665
Latency(ms): min=0.516198, max=10.7712, avg=2.75924
Latency(ms): min=0.552409, max=11.4303, avg=2.49702
Latency(ms): min=0.56256, max=7.50214, avg=2.67156
Latency(ms): min=0.450299, max=8.13226, avg=2.72505
Latency(ms): min=0.610913, max=9.56747, avg=2.55677



- host B -

[root@hp-xw4800-01 _bz478882]# export PYTHONPATH=/root/qpid/python
[root@hp-xw4800-01 _bz478882]#
[root@hp-xw4800-01 _bz478882]# host2name="hp-xw4800-01.rhts.bos.redhat.com"
[root@hp-xw4800-01 _bz478882]# host1name="ibm-firefly.rhts.bos.redhat.com"
[root@hp-xw4800-01 _bz478882]# host1port=5555
[root@hp-xw4800-01 _bz478882]# host2port=6666
[root@hp-xw4800-01 _bz478882]#
[root@hp-xw4800-01 _bz478882]# host1p="${host1name}:${host1port}"
[root@hp-xw4800-01 _bz478882]# host2p="${host2name}:${host2port}"
[root@hp-xw4800-01 _bz478882]#
[root@hp-xw4800-01 _bz478882]# q_cnt=1
[root@hp-xw4800-01 _bz478882]# msg_size=500
[root@hp-xw4800-01 _bz478882]# msg_cnt=1000
[root@hp-xw4800-01 _bz478882]# msg_rate=25000
[root@hp-xw4800-01 _bz478882]#
[root@hp-xw4800-01 _bz478882]# rm -rf data qpidd.log
[root@hp-xw4800-01 _bz478882]#   qpidd --auth no --data-dir data --log-enable info+ -p $host2port >qpidd.log 2>qpidd.log &
[1] 27627
[root@hp-xw4800-01 _bz478882]# ./latencytest --rate 30000 -p $host1port
Connection refused: localhost:5555 (qpid/sys/posix/Socket.cpp:159)
[root@hp-xw4800-01 _bz478882]# ./latencytest --rate 30000 -p $host2port
Latency(ms): min=3.742, max=478.974, avg=215.356
Latency(ms): min=243.783, max=422.588, avg=330.866
Latency(ms): min=2.534, max=33.805, avg=8.26378
... dummy run ...
Latency(ms): min=2.394, max=33.495, avg=8.99774
Latency(ms): min=2.238, max=34.331, avg=8.69972

[root@hp-xw4800-01 _bz478882]# ./latencytest -p $host2port --size ${msg_size} --queues ${q_cnt} --count ${msg_cnt} --rate ${msg_rate} --report-frequency 5000 --receive
Latency(ms): min=97285.2, max=98842.4, avg=98090.1
Latency(ms): min=98842.4, max=100524, avg=99670.2
Latency(ms): min=100524, max=102209, avg=101333
Latency(ms): min=102208, max=103827, avg=102993
Latency(ms): min=103829, max=105353, avg=104610
Latency(ms): min=105351, max=106363, avg=105932
Latency(ms): min=106362, max=107229, avg=106817
Latency(ms): min=107227, max=107861, avg=107394
Latency(ms): min=1.79769e+308, max=0, avg=nan
Latency(ms): min=1.79769e+308, max=0, avg=nan
Latency(ms): min=1.79769e+308, max=0, avg=nan
Latency(ms): min=1.79769e+308, max=0, avg=nan


Actually, I'm still not convinced that the results I posted above meet the target.

Could you possibly review this, Gordon, please?

Comment 5 Gordon Sim 2009-03-04 14:15:12 UTC
Were both instances of latencytest running on the same host (regardless of which broker they were connected to)? That would rule out any clock synchronization issues.

Comment 6 Frantisek Reznicek 2009-03-05 14:52:26 UTC
Created attachment 334134 [details]
bz478882_test_results II

Re-testing was done on more powerful machines; see the machine descriptions below.

There are attached 3 cases:
1] single queue replication
2] 5 queues replication
3] 10 queues replication

qpidd was started with --log-enable error+ to avoid latency drops due to logging. The latencytest instances were also run from a single machine (the one that also runs the main, i.e. replicating, broker). All broker modules were enabled [management=on, store=on].
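
For reference, the broker start-up corresponding to this configuration looks roughly like the earlier runs, only with error+ logging (a sketch; paths and ports follow the earlier examples):

  # main (replicating) broker A
  qpidd --auth no --replication-queue replication-queue --create-replication-queue true \
        --data-dir data --log-enable error+ -p 5555 >qpidd.log 2>&1 &

  # backup broker B
  qpidd --auth no --data-dir data --log-enable error+ -p 6666 >qpidd.log 2>&1 &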


[09:40:32]     os: Red Hat Enterprise Linux Server release 5.3 (Tikanga)
[09:40:32]  uname: Linux ibm-mongoose.rhts.bos.redhat.com 2.6.18-128.el5PAE #1 SMP Wed Dec 17 12:02:33 EST 2008 i686 i686 i386 GNU/Linux
[09:40:32] uptime:  09:40:32 up 2 days, 23:00,  1 user,  load average: 0.00, 0.10, 0.26
[09:40:32] whoami: root (USER:root)
[09:40:32]   date: 2009-03-05 09:40:32 1236264032
[09:40:32]    pwd: /root/qpid_test_common_mrg_install
[09:40:32]     df: /dev/mapper/VolGroup00-LogVol00   31G  2.3G   28G   8% /
[09:40:32]      w: root     pts/0    dhcp-lab-200.eng Mon07    0.00s  6.83s  0.05s -bash
[09:40:32] get_cpu_info():CPU information:
processor       : 0 1 2 3 4 5 6 7
vendor_id       : GenuineIntel
model name      : Intel(R) Xeon(R) CPU           E5450  @ 3.00GHz
cpu MHz         : 3000.428
cpu cores       : 4
bogomips        : 6002.49 5999.34 5999.39 5999.37 5999.38 5999.39 5999.40 5999.40


[08:33:19]     os: Red Hat Enterprise Linux Server release 5.3 (Tikanga)
[08:33:19]  uname: Linux dell-pem600-01.rhts.bos.redhat.com 2.6.18-128.el5PAE #1 SMP Wed Dec 17 12:02:33 EST 2008 i686 i686 i386 GNU/Linux
[08:33:19] uptime:  08:33:19 up 45 min,  1 user,  load average: 0.18, 0.05, 0.01
[08:33:19] whoami: root (USER:root)
[08:33:19]   date: 2009-03-05 08:33:19 1236259999
[08:33:19]    pwd: /root/qpid_test_common_mrg_install
[08:33:19]     df: /dev/mapper/VolGroup00-LogVol00   62G  1.7G   57G   3% /
[08:33:19]      w: root     pts/0    dhcp-lab-200.eng 08:21    0.00s  0.05s  0.00s make rb
[08:33:19] get_cpu_info():CPU information:
processor       : 0 1 2 3 4 5 6 7
vendor_id       : GenuineIntel
model name      : Intel(R) Xeon(R) CPU           E5410  @ 2.33GHz
cpu MHz         : 2000.000
cpu cores       : 4
bogomips        : 4657.56 4654.99 4655.01 4655.01 4655.02 4655.04 4655.02 4655.01

Comment 7 Frantisek Reznicek 2009-03-05 14:53:55 UTC
Gordon, could you possibly review once more, please? -> [NEEDINFO]

Comment 8 Frantisek Reznicek 2009-03-05 15:09:24 UTC
I just forgot to mention that attachment id=334134 covers the situation we already discussed: qpidd aborts with
terminate called after throwing an instance of 'terminate called recursively     std::bad_alloc'
if there is no more memory. 
We closed our discussion with NOTABUG status, but let's consider once more how memory-hungry qpidd is when replicating.
Lines 772 - 1176 of bz478882_test_results2.txt show clearly that approximately 90 seconds of 5-queue replication from one qpidd instance to another eats more than 3 GB of memory. [./latencytest --rate 25000 --size 500 used as the message source]

It might be good to document this behavior somewhere.

Comment 9 Frantisek Reznicek 2009-03-05 17:20:02 UTC
Re-tested once more with corrected parameters:

=== CASE 1 - one queue latencytest 25k messages / sec & ADR ===

  latencies on main broker   <2.5 ms
  latencies on backup broker ~4.0 ms  (latencies from replicated queue)

=== CASE 2 - 5 queues latencytest, 25k messages / sec
             (over all queues, i.e. 5k msgs / sec per queue) & ADR ===

  latencies on main broker   ~4.0 ms
  latencies on backup broker >8000.0 ms and growing (latencies from repl. queue)

=== CASE 3 - 10 queues latencytest, 25k messages / sec
             (over all queues, i.e. 2.5k msgs / sec per queue) & ADR ===

  latencies on main broker   <7.5 ms
  latencies on backup broker ~23000.0 ms  (latencies from replicated queue)


I'm sorry, I'm still not sure these values can be considered to meet the target above. May I ask for one more review, please?


Details attached in next comment...

Comment 10 Frantisek Reznicek 2009-03-05 17:21:10 UTC
Created attachment 334160 [details]
bz478882_test_results III (mentioned in previous comment)

Comment 11 Gordon Sim 2009-03-09 18:44:51 UTC
It appears that it is the overhead of the extra connections that causes the deterioration in the latencies (and indeed in the throughput of the federation link, which is the primary source of the issue).

If you use the --single-connection option on each instance of the latency test, I believe you will be able to sustain 2500 msgs/sec across 10 different queues (with 10 sessions sharing the same connection).
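
For example (a sketch only; ports, queue count, rate and size mirror the parameters used elsewhere in this BZ, and --single-connection is assumed to take a yes/no value):

  # sender side: 10 queues sharing one connection (10 sessions), against the
  # primary broker
  ./latencytest -p 5555 --queues 10 --rate 25000 --size 500 --single-connection yes

  # receiver side against the backup broker: same options plus --receive
  ./latencytest -p 6666 --queues 10 --rate 25000 --size 500 --single-connection yes --receive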

We will probably need to revisit the performance of replication. Beyond a certain amount of load, the federation bridge can't seem to cope and the replication queue starts backing up.

However, for now, if we can observe the required rates with the desired number of queues over a shared connection, I think we can close this BZ and raise new ones as required with more specific loading requirements.

Comment 12 Frantisek Reznicek 2009-03-12 13:31:55 UTC
Re-tested once more with corrected parameters and with --single-connection, using packages qpidd-0.5.752581-1.el5 on RHEL 5.3 i386 / x86_64.


=== CASE 1 - one queue latencytest 25k messages / sec & ADR ===
  one queue '--single-connection no' used

  latencies on main broker   <2.5 ms
  latencies on backup broker <4.0 ms  (latencies from replicated queue)

=== CASE 2 - 10 queues latencytest, 25k messages / sec
             (over all queues, i.e. 2.5k msgs / sec per queue) & ADR ===
  ten queues '--single-connection yes' used

  latencies on main broker   <9.2 ms
  latencies on backup broker <7.5 ms

    [freznice@dhcp-lab-200 ~]$ cat bb.txt | awk -f b.awk
     Latency(ms): min=2.385484, max=26.490872, avg=7.493615  [520 samples av.]
    [freznice@dhcp-lab-200 ~]$ cat mb.txt | awk -f b.awk
     Latency(ms): min=2.196191, max=20.739687, avg=9.176185  [520 samples av.]

As described above in gsim's comment #11, the increase in latencies was caused by the number of connections, so --single-connection yes helped to solve that.

->VERIFIED

Comment 14 errata-xmlrpc 2009-04-21 16:16:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html