Bug 1028177
| Summary: | oo-stats causes new threads to be created for activemq that are never reaped | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Timothy Williams <tiwillia> |
| Component: | Installer | Assignee: | Luke Meyer <lmeyer> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 2.0.0 | CC: | gpei, libra-bugs |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-01-25 11:47:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Timothy Williams
2013-11-07 20:34:56 UTC
Just getting up to speed here... a couple of questions:

1. Have you observed whether this happens on a network of ActiveMQ brokers only, or if it also occurs with a single ActiveMQ instance?
2. Any idea how activemq.xml looked before the workaround, or what procedure was followed to create it? Our product docs, sample activemq-network.xml file, and openshift.sh now all create policies to expire queues, but I'm wondering if these queues get named such that they're not covered by the policy.

---

Hi Luke,

> 1. Have you observed whether this happens on a network of activemq brokers only, or if it also occurs with a single activemq?

The customer has a network of ActiveMQ brokers, but I have tested this on both an all-in-one system with one ActiveMQ instance and a system with a network of three ActiveMQ instances. The symptom appears in both cases.

> 2. Any idea how activemq.xml looked before the workaround, or what procedure was followed to create it?

Unfortunately, I do not have the customer's activemq.xml from before he implemented the workaround, as he did it very quickly. However, the installation script (for both 2.0 and 1.2) creates an activemq.xml that exhibits the issue. I have a couple of systems up now that are exhibiting the issue; feel free to ping me and I will pass on the credentials/IPs.

- Tim

---

What we were missing is the broker schedulePeriodForDestinationPurge attribute. Without it, evidently nothing checks the timeout to remove inactive queues. The ActiveMQ docs give the details: http://activemq.apache.org/delete-inactive-destinations.html The MCollective docs don't mention that particular detail: http://docs.puppetlabs.com/mcollective/deploy/middleware/activemq.html#reply-queue-pruning although it is included in the example file. I'll add it to our script and our example files, and check the docs to see whether any changes are needed (I don't think so).
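For context, a minimal sketch of what the fix looks like in activemq.xml, assuming a standard ActiveMQ 5.x broker; the broker name, intervals, and queue pattern here are illustrative, not the exact values shipped in the OpenShift sample files:

```xml
<!-- Hypothetical fragment of /etc/activemq/activemq.xml -->
<broker xmlns="http://activemq.apache.org/schema/core"
        brokerName="activemq.example.com"
        schedulePeriodForDestinationPurge="60000"> <!-- run the purge task every 60 s -->
  <destinationPolicy>
    <policyMap>
      <policyEntries>
        <!-- Garbage-collect MCollective reply queues once they have been idle
             for 5 minutes. Note: ActiveMQ really does spell this attribute
             "inactiveTimoutBeforeGC" (no second "e"). -->
        <policyEntry queue="*.reply.>" gcInactiveDestinations="true"
                     inactiveTimoutBeforeGC="300000"/>
      </policyEntries>
    </policyMap>
  </destinationPolicy>
</broker>
```

Without schedulePeriodForDestinationPurge on the <broker> element, the per-queue timeout is configured but the purge task never runs, which matches the symptom described in this bug.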
This is nothing specific to oo-stats, by the way; oo-accept-systems or even `oo-mco ping` creates a new reply queue, with a name based on the broker host and process. The reason we don't normally see this balloon out of control is that the processes making MCollective requests are typically the OpenShift broker ones, which don't change much, so the same queues get reused a lot.

Fix: https://github.com/openshift/openshift-extras/pull/257 — the specific commit is https://github.com/openshift/openshift-extras/commit/38d8055a9abe78fadb6f75777cdcb50827510f9c

It does seem to take a little longer than the configured 300 seconds (5 minutes) to prune the inactive queues, but they do go away; you could set the limits lower to test. A similar improvement was made for the 1.2 scripts/examples.

---

Verified this bug on 2.0.z/2013-12-23.1 with the new installation script. Set inactiveTimoutBeforeGC="30000" and schedulePeriodForDestinationPurge="10000" to test this issue more easily.

```
[root@broker ~]# date; ps -eLf | grep activemq | wc -l; oo-stats >> /dev/null; date; ps -eLf | grep activemq | wc -l
Tue Dec 24 04:06:41 EST 2013
56
Tue Dec 24 04:06:50 EST 2013
58
[root@broker ~]# date; ps -eLf | grep activemq | wc -l
Tue Dec 24 04:07:33 EST 2013
56
```

Monitoring the ActiveMQ log with `tailf /var/log/activemq/activemq.log`:

```
2013-12-24 04:07:29,510 | INFO | mcollective.reply.broker.ose20-1216-com.cn_16374 Inactive for longer than 30000 ms - removing ... | org.apache.activemq.broker.region.Queue | ActiveMQ Broker[activemq.ose20-1216-com.cn] Scheduler
```

Sanitizing this bug of customer information so it can be public.
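One small caveat about the verification transcript: `ps -eLf | grep activemq | wc -l` also counts the grep process itself. A common refinement (not part of the original report) is to bracket the first character of the pattern so grep's own command line no longer matches; sketched here with a `sleep` process standing in for ActiveMQ:

```shell
# Count the threads of a target process without matching the grep command
# itself: the pattern '[s]leep 300' matches "sleep 300" in ps output, but
# grep's own command line contains the literal "[s]leep 300", so it is
# excluded from the count.
sleep 300 &                            # stand-in for the activemq process
pid=$!
n=$(ps -eLf | grep -c '[s]leep 300')   # threads of the target, grep excluded
kill "$pid"
echo "matched $n thread(s)"
```

On a real broker the equivalent would be `ps -eLf | grep -c '[a]ctivemq'`, which removes the off-by-one from the counts shown above.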