Description of problem: There are detected too many JournalInactive messages by default. Let's look at cluster test log file analysis (run time approx one week): 121012_mrg-qe-16VMs_xxxi_2.log.10.x.y.194.qpidd.log 639406/628839 98% 121012_mrg-qe-16VMs_xxxi_2.log.10.x.y.124.qpidd.log 76271/69274 91% 121012_mrg-qe-16VMs_xxxi_2.log.10.x.y.125.qpidd.log 754672/745357 99% 121012_mrg-qe-16VMs_xxxi_2.log.10.x.y.127.qpidd.log 976685/967296 99% 121012_mrg-qe-17VMs_xxxi_1.log.10.x.y.192.qpidd.log 17122/47 0% 121012_mrg-qe-17VMs_xxxi_1.log.10.x.y.178.qpidd.log 19622/6189 32% 121012_mrg-qe-17VMs_xxxi_1.log.10.x.y.179.qpidd.log 20408/5414 27% 121012_mrg-qe-17VMs_xxxi_1.log.10.x.y.181.qpidd.log 51635/35859 69% All above data are from kvm VM machines, each 2.5GB mem, 3 cpu cores. Most of the the time (7 days of 8 days was cluster idle). Above data shows total number of lines / number of JournalInactive lines. I propose to move JournalInactive messages to 'debug' logging level to avoid such extensive logging on default info+ levels. Alternatively periodicity (5sec) should be increased to at least 20 secs. Those message may cause perfomance issues Version-Release number of selected component (if applicable): cman-2.0.115-107.el5 cman-devel-2.0.115-107.el5 openais-0.80.6-37.el5 python-qpid-0.18-1.el5 python-qpid-qmf-0.18-1.el5 python-saslwrapper-0.18-1.el5 qpid-cpp-client-0.18-1.el5 qpid-cpp-client-devel-0.18-1.el5 qpid-cpp-client-devel-docs-0.18-1.el5 qpid-cpp-client-rdma-0.18-1.el5 qpid-cpp-client-ssl-0.18-1.el5 qpid-cpp-server-0.18-1.el5 qpid-cpp-server-cluster-0.18-1.el5 qpid-cpp-server-devel-0.18-1.el5 qpid-cpp-server-rdma-0.18-1.el5 qpid-cpp-server-ssl-0.18-1.el5 qpid-cpp-server-store-0.18-1.el5 qpid-cpp-server-xml-0.18-1.el5 qpid-java-client-0.18-2.el5 qpid-java-common-0.18-2.el5 qpid-java-example-0.18-2.el5 qpid-jca-0.18-2.el5 qpid-jca-xarecovery-0.18-2.el5 qpid-jca-zip-0.18-2.el5 qpid-qmf-0.18-1.el5 qpid-tests-0.18-1.el5 qpid-tools-0.18-1.el5 rgmanager-2.0.52-28.el5_8.5 rh-qpid-cpp-tests-0.18-1.el5 ruby-qpid-qmf-0.18-1.el5 ruby-saslwrapper-0.18-1.el5 saslwrapper-0.18-1.el5 sesame-1.0-4 How reproducible: 80% Steps to Reproduce: 1. run cluster_test or 1. configure active - active cluster 2. create bunch of queues 3. let cluster idle 4. qpidd logs are getting full of JournalInactive messages Actual results: Too many JournalInactive messages at info+ Expected results: Either significantly less JournalInactive messages at info+ (increase periodicity interval) or move JournalInactive messages to debug logging level. Additional info: Example of JournalInactive messages: [root@dhcp-lab-218 cluster_test]# grep JournalInactive 121012_mrg-qe-16VMs_xxxi_2.log.10.x.y.194.qpidd.log | head -200 2012-10-12 10:19:53 [System] info JournalInactive:TplStore task late and overran 1 times: late 2ms, overran 2ms (taking 6943ns) on average. 2012-10-12 10:21:34 [System] info JournalInactive:TplStore task late and overran 1 times: late 3ms, overran 2ms (taking 7835ns) on average. 2012-10-12 10:27:46 [System] info JournalInactive:TplStore task late and overran 1 times: late 4ms, overran 3ms (taking 8766ns) on average. 2012-10-12 10:27:52 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-60 task overran 1 times by 383ms (taking 408684448ns) on average. 2012-10-12 10:28:03 [System] info JournalInactive:TplStore task late and overran 2 times: late 71ms, overran 70ms (taking 5761ns) on average. 2012-10-12 10:28:03 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-20 task late and overran 1 times: late 89ms, overran 10ms (taking 2842ns) on average. 2012-10-12 10:28:03 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-30 task late and overran 1 times: late 94ms, overran 89ms (taking 2694ns) on average. 2012-10-12 10:28:03 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-50 task late and overran 1 times: late 363ms, overran 139ms (taking 2850ns) on average. 2012-10-12 10:28:03 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-60 task late and overran 1 times: late 68ms, overran 41ms (taking 4551ns) on average. 2012-10-12 10:28:03 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-70 task late and overran 1 times: late 139ms, overran 94ms (taking 2760ns) on average. 2012-10-12 10:28:03 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-80 task late and overran 2 times: late 212ms, overran 192ms (taking 3549ns) on average. 2012-10-12 10:28:08 [System] info JournalInactive:TplStore task late and overran 3 times: late 518ms, overran 518ms (taking 5518ns) on average. 2012-10-12 10:28:08 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-20 task late and overran 2 times: late 751ms, overran 711ms (taking 2352ns) on average. 2012-10-12 10:28:08 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-30 task late and overran 2 times: late 754ms, overran 751ms (taking 2951ns) on average. 2012-10-12 10:28:08 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-50 task late and overran 2 times: late 888ms, overran 776ms (taking 2602ns) on average. 2012-10-12 10:28:08 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-60 task late and overran 3 times: late 107ms, overran 599ms (taking 501482558ns) on average. 2012-10-12 10:28:08 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-70 task late and overran 2 times: late 776ms, overran 754ms (taking 2617ns) on average. 2012-10-12 10:28:08 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-80 task late and overran 2 times: late 898ms, overran 888ms (taking 3188ns) on average. 2012-10-12 10:28:08 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-90 task late 2 times by 711ms on average. 2012-10-12 10:28:13 [System] info JournalInactive:TplStore task late and overran 3 times: late 2ms, overran 2ms (taking 1997ns) on average. 2012-10-12 10:28:13 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-20 task late and overran 3 times: late 2ms, overran 2ms (taking 2015ns) on average. 2012-10-12 10:28:13 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-30 task late and overran 3 times: late 2ms, overran 2ms (taking 1790ns) on average. 2012-10-12 10:28:13 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-50 task late and overran 3 times: late 2ms, overran 2ms (taking 2008ns) on average. 2012-10-12 10:28:13 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-60 task late and overran 3 times: late 2ms, overran 2ms (taking 9074ns) on average. 2012-10-12 10:28:13 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-70 task late and overran 3 times: late 2ms, overran 2ms (taking 2014ns) on average. 2012-10-12 10:28:13 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-80 task late and overran 3 times: late 2ms, overran 2ms (taking 3035ns) on average. 2012-10-12 10:28:18 [System] info JournalInactive:TplStore task late and overran 1 times: late 3ms, overran 3ms (taking 2528ns) on average. 2012-10-12 10:28:18 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-20 task late and overran 1 times: late 3ms, overran 3ms (taking 2694ns) on average. 2012-10-12 10:28:18 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-30 task late and overran 1 times: late 3ms, overran 3ms (taking 2465ns) on average. 2012-10-12 10:28:18 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-50 task late and overran 1 times: late 3ms, overran 3ms (taking 2572ns) on average. 2012-10-12 10:28:18 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-60 task late and overran 1 times: late 2ms, overran 2ms (taking 6166ns) on average. 2012-10-12 10:28:18 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-70 task late and overran 1 times: late 3ms, overran 3ms (taking 2340ns) on average. 2012-10-12 10:28:18 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-80 task late and overran 1 times: late 3ms, overran 3ms (taking 3929ns) on average. 2012-10-12 10:28:24 [System] info JournalInactive:TplStore task late and overran 6 times: late 104ms, overran 104ms (taking 2289ns) on average. 2012-10-12 10:28:24 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-20 task late and overran 6 times: late 104ms, overran 104ms (taking 2282ns) on average. 2012-10-12 10:28:24 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-30 task late and overran 6 times: late 104ms, overran 104ms (taking 2244ns) on average. 2012-10-12 10:28:24 [System] info JournalInactive:test_cluster_toggle_load_w_sesame-50 task late and overran 6 times: late 104ms, overran 104ms (taking 2219ns) on average.
These statements are not produced by store code. Rather they are qpid::sys::TimerWarnings. Gordon Sim reduced the severity from Warning to Info in September, 2011. http://mail-archives.apache.org/mod_mbox/qpid-commits/201109.mbox/%3C20110908172735.BB50B23889EB@eris.apache.org%3E It looks like the journal tasks are perpetually late which is not surprising based on the job that they do. To fix this for Journal would be an across the board change for all qpid tasks.
Patched upstream at r1045316. The patch demotes TimerWarning messages, which produce the JournalInactive messages above, to debug level and, if enabled, prints each message once per minute. This patch affects all TimerWarning messages and not just JournalInactive messages. To a customer these messages are alarming and of dubious value. These messages may be of interest to a diagnostic engineer some day and are enabled selectively by using broker switch: --log-enable debug+:Timer
Tested on RHEL5.9 and RHEL6.3 (both i386 and x86_64). The issue has been fixed, the JournalInactive log message has 'debug' level now. Packages used for testing: RHEL5.9 python-qpid-0.18-4.el5 python-qpid-qmf-0.18-12.el5 qpid-cpp-client-0.18-12.el5 qpid-cpp-client-devel-0.18-12.el5 qpid-cpp-client-devel-docs-0.18-12.el5 qpid-cpp-client-ssl-0.18-12.el5 qpid-cpp-server-0.18-12.el5 qpid-cpp-server-cluster-0.18-12.el5 qpid-cpp-server-devel-0.18-12.el5 qpid-cpp-server-ssl-0.18-12.el5 qpid-cpp-server-store-0.18-12.el5 qpid-cpp-server-xml-0.18-12.el5 qpid-java-client-0.18-5.el5 qpid-java-common-0.18-5.el5 qpid-java-example-0.18-5.el5 qpid-qmf-0.18-12.el5 qpid-qmf-devel-0.18-12.el5 qpid-tools-0.18-7.el5 RHEL6.3 python-qpid-0.18-4.el6 python-qpid-qmf-0.18-12.el6_3 qpid-cpp-client-0.18-12.el6_3 qpid-cpp-client-devel-0.18-12.el6_3 qpid-cpp-client-devel-docs-0.18-12.el6_3 qpid-cpp-server-0.18-12.el6_3 qpid-cpp-server-cluster-0.18-12.el6_3 qpid-cpp-server-devel-0.18-12.el6_3 qpid-cpp-server-store-0.18-12.el6_3 qpid-cpp-server-xml-0.18-12.el6_3 qpid-java-client-0.18-5.el6 qpid-java-common-0.18-5.el6 qpid-java-example-0.18-5.el6 qpid-qmf-0.18-12.el6_3 qpid-qmf-devel-0.18-12.el6_3 qpid-tools-0.18-7.el6_3 -> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0561.html