Bug 1665893 - candlepin refuses to start or hangs periodically when having too many messages in ActiveMQ journal
Summary: candlepin refuses to start or hangs periodically when having too many messages in ActiveMQ journal
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Candlepin
Version: 6.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: 6.8.0
Assignee: satellite6-bugs
QA Contact: Lai
URL:
Whiteboard:
Depends On: 1667952 1842933 1842936
Blocks:
 
Reported: 2019-01-14 11:42 UTC by Pavel Moravec
Modified: 2024-03-25 15:12 UTC
CC: 28 users

Fixed In Version: candlepin-2.6.17-1, candlepin-2.9.28-1, candlepin-3.1.13-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned As: 1667952 1851132
Environment:
Last Closed: 2020-10-27 12:58:03 UTC
Target Upstream Version:
Embargoed:
smeyer: needinfo-




Links
System ID                                  Last Updated
Red Hat Knowledge Base (Solution) 3803901  2019-01-14 12:07:10 UTC
Red Hat Product Errata RHSA-2020:4366      2020-10-27 12:58:21 UTC

Internal Links: 1808949

Description Pavel Moravec 2019-01-14 11:42:10 UTC
Description of problem:
When candlepin is unable to send further events to qpidd (e.g. because katello_event_queue is full), it stores the events in /var/lib/candlepin/activemq-artemis/journal. Once this journal becomes "too full", any attempt to start tomcat/candlepin fails - even though candlepin *can* now send messages to the qpidd broker again.

This completely prevents candlepin from starting until the journal content is purged away.
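
A minimal recovery sketch, assuming the default journal location (moving the journal aside discards any undelivered events; an empty journal should be recreated on the next start):

    systemctl stop tomcat
    # keep the old journal for analysis rather than deleting it outright
    mv /var/lib/candlepin/activemq-artemis/journal \
       /var/lib/candlepin/activemq-artemis/journal.bak-$(date +%F)
    systemctl start tomcat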


Version-Release number of selected component (if applicable):
Sat 6.4
candlepin-2.4.8-1.el7.noarch


How reproducible:
100%


Steps to Reproduce:
1. Either install Satellite 6.4 and mimic LOCE (ListenOnCandlepinEvents) task problems so that katello_event_queue grows over time - and patiently generate more and more events in candlepin, even once it is already in suspended mode.

2. Or use the prepared journal content (tarball will follow) for a fast reproducer.
3. Ensure candlepin can properly send messages to qpidd (again).
4. Restart tomcat and wait approx. 3 minutes (see the sketch after these steps).
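
For step 4, something along these lines (the error.log path is the one from the additional info below):

    systemctl restart tomcat
    # the AMQ222199 thread dump appears in error.log after roughly 2-3 minutes
    tail -f /var/log/candlepin/error.log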


Actual results:
4. candlepin fails to start, logging a huge thread dump to error.log (details in the additional info below)


Expected results:
4. candlepin is able to start every time.


Additional info:
error.log containing the thread dump:

==> /var/log/candlepin/error.log <==
2019-01-14 12:28:14,415 [thread=Thread-0 (-scheduled-threads)] [=, org=, csid=] WARN  org.apache.activemq.artemis.core.server - AMQ222199: Thread dump: *******************************************************************************
Complete Thread dump 
"Thread-4 (ActiveMQ-scheduled-threads)" Id=63 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7c00f6ce
	at sun.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7c00f6ce
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


"Thread-3 (ActiveMQ-scheduled-threads)" Id=62 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7c00f6ce
	at sun.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7c00f6ce
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


"Thread-2 (ActiveMQ-scheduled-threads)" Id=61 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7c00f6ce
	at sun.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7c00f6ce
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


"Thread-1 (ActiveMQ-scheduled-threads)" Id=60 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7c00f6ce
	at sun.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7c00f6ce
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


"Thread-0 (ActiveMQ-scheduled-threads)" Id=57 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7c00f6ce
	at sun.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7c00f6ce
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


"activemq-buffer-timeout" Id=56 WAITING on java.util.concurrent.Semaphore$NonfairSync@3902672a
	at sun.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.Semaphore$NonfairSync@3902672a
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
	at org.apache.activemq.artemis.core.io.buffer.TimedBuffer$CheckTimer.run(TimedBuffer.java:427)
	at java.lang.Thread.run(Thread.java:748)


"Thread-0 (-scheduled-threads)" Id=55 RUNNABLE
	at sun.management.ThreadImpl.dumpThreads0(Native Method)
	at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:454)
	at org.apache.activemq.artemis.utils.ThreadDumpUtil.threadDump(ThreadDumpUtil.java:47)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.threadDump(ActiveMQServerImpl.java:929)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.lambda$initializeCriticalAnalyzer$2(ActiveMQServerImpl.java:642)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$$Lambda$13/1979890430.run(Unknown Source)
	at org.apache.activemq.artemis.utils.critical.CriticalAnalyzerImpl.fireAction(CriticalAnalyzerImpl.java:145)
	at org.apache.activemq.artemis.utils.critical.CriticalAnalyzerImpl.check(CriticalAnalyzerImpl.java:130)
	at org.apache.activemq.artemis.utils.critical.CriticalAnalyzerImpl$1.run(CriticalAnalyzerImpl.java:53)
	at org.apache.activemq.artemis.core.server.ActiveMQScheduledComponent$2.run(ActiveMQScheduledComponent.java:284)
	at org.apache.activemq.artemis.core.server.ActiveMQScheduledComponent$3.run(ActiveMQScheduledComponent.java:294)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

	Number of locked synchronizers = 1
	- java.util.concurrent.ThreadPoolExecutor$Worker@4a914d3d


"pool-2-thread-1" Id=52 RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:255)
	at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:424)
	at sun.security.provider.NativePRNG$RandomIO.ensureBufferValid(NativePRNG.java:525)
	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:544)
	-  locked java.lang.Object@ba508d9
	at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
	at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:220)
	at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
	at java.security.SecureRandom.next(SecureRandom.java:491)
	at java.util.Random.nextInt(Random.java:329)
	at sun.security.ssl.SSLContextImpl.engineInit(SSLContextImpl.java:101)
	at javax.net.ssl.SSLContext.init(SSLContext.java:282)
	at org.apache.qpid.ssl.SSLContextFactory.buildContext(SSLContextFactory.java:131)
	at org.apache.qpid.ssl.SSLContextFactory.buildClientContext(SSLContextFactory.java:62)
	at org.apache.qpid.transport.network.security.SecurityLayerFactory$SSLSecurityLayer.<init>(SecurityLayerFactory.java:79)
	at org.apache.qpid.transport.network.security.SecurityLayerFactory.newInstance(SecurityLayerFactory.java:52)
	at org.apache.qpid.transport.Connection.connect(Connection.java:233)
	-  locked java.lang.Object@8f1bb59
	at org.apache.qpid.client.AMQConnectionDelegate_0_10.makeBrokerConnection(AMQConnectionDelegate_0_10.java:222)
	at org.apache.qpid.client.AMQConnection.makeBrokerConnection(AMQConnection.java:664)
	at org.apache.qpid.client.AMQConnection.<init>(AMQConnection.java:444)
	at org.apache.qpid.client.AMQConnectionFactory.createConnection(AMQConnectionFactory.java:124)
	at org.candlepin.audit.QpidConnection.newConnection(QpidConnection.java:170)
	at org.candlepin.audit.QpidQmf.runQuery(QpidQmf.java:101)
	at org.candlepin.audit.QpidQmf.getExchangeBoundQueueNames(QpidQmf.java:220)
	at org.candlepin.audit.QpidQmf.getStatus(QpidQmf.java:178)
	at org.candlepin.controller.QpidStatusMonitor.monitor(QpidStatusMonitor.java:124)
	-  locked org.candlepin.controller.QpidStatusMonitor@73bc8a0f
	at org.candlepin.controller.QpidStatusMonitor.run(QpidStatusMonitor.java:86)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

	Number of locked synchronizers = 1
	- java.util.concurrent.ThreadPoolExecutor$Worker@3db5f5a6


"IoReceiver - localhost/127.0.0.1:5671" Id=50 RUNNABLE (in native)
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:156)
	at java.lang.Thread.run(Thread.java:748)


"IoSender - localhost/127.0.0.1:5671" Id=51 WAITING on java.lang.Object@592d9865
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@592d9865
	at java.lang.Object.wait(Object.java:502)
	at org.apache.qpid.transport.network.io.IoSender.run(IoSender.java:279)
	at java.lang.Thread.run(Thread.java:748)


"ack-flusher" Id=45 TIMED_WAITING on java.util.TaskQueue@7aedb059
	at java.lang.Object.wait(Native Method)
	-  waiting on java.util.TaskQueue@7aedb059
	at java.util.TimerThread.mainLoop(Timer.java:552)
	at java.util.TimerThread.run(Timer.java:505)


"QuartzScheduler_QuartzSchedulerThread" Id=42 TIMED_WAITING on java.lang.Object@7f43e72e
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@7f43e72e
	at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:253)


"QuartzScheduler_Worker-15" Id=41 TIMED_WAITING on java.lang.Object@68b8e63b
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@68b8e63b
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-14" Id=40 TIMED_WAITING on java.lang.Object@5d63a5f9
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@5d63a5f9
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-13" Id=39 TIMED_WAITING on java.lang.Object@4a1ba974
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@4a1ba974
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-12" Id=38 TIMED_WAITING on java.lang.Object@2a815d8e
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@2a815d8e
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-11" Id=37 TIMED_WAITING on java.lang.Object@527efdb
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@527efdb
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-10" Id=36 TIMED_WAITING on java.lang.Object@45abceeb
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@45abceeb
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-9" Id=35 TIMED_WAITING on java.lang.Object@52ec8f23
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@52ec8f23
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-8" Id=34 TIMED_WAITING on java.lang.Object@72a1d5dd
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@72a1d5dd
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-7" Id=33 TIMED_WAITING on java.lang.Object@48c2a565
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@48c2a565
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-6" Id=32 TIMED_WAITING on java.lang.Object@5385f600
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@5385f600
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-5" Id=31 TIMED_WAITING on java.lang.Object@65a6fec4
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@65a6fec4
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-4" Id=30 TIMED_WAITING on java.lang.Object@1955d090
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@1955d090
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-3" Id=29 TIMED_WAITING on java.lang.Object@4461b752
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@4461b752
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-2" Id=28 TIMED_WAITING on java.lang.Object@38286c28
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@38286c28
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"QuartzScheduler_Worker-1" Id=27 TIMED_WAITING on java.lang.Object@6aaaf537
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.Object@6aaaf537
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:568)


"Statistics Thread-__DEFAULT__-1" Id=25 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5dcc79a2
	at sun.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5dcc79a2
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


"__DEFAULT__" Id=24 WAITING on java.util.TaskQueue@14de9e6e
	at java.lang.Object.wait(Native Method)
	-  waiting on java.util.TaskQueue@14de9e6e
	at java.lang.Object.wait(Object.java:502)
	at java.util.TimerThread.mainLoop(Timer.java:526)
	at java.util.TimerThread.run(Timer.java:505)


"C3P0PooledConnectionPoolManager[identityToken->2tmgiaa0l499gq1p2apom|33acba40]-HelperThread-#2" Id=23 TIMED_WAITING on com.mchange.v2.async.ThreadPoolAsynchronousRunner@6f6cf9f1
	at java.lang.Object.wait(Native Method)
	-  waiting on com.mchange.v2.async.ThreadPoolAsynchronousRunner@6f6cf9f1
	at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:683)


"C3P0PooledConnectionPoolManager[identityToken->2tmgiaa0l499gq1p2apom|33acba40]-HelperThread-#1" Id=22 TIMED_WAITING on com.mchange.v2.async.ThreadPoolAsynchronousRunner@6f6cf9f1
	at java.lang.Object.wait(Native Method)
	-  waiting on com.mchange.v2.async.ThreadPoolAsynchronousRunner@6f6cf9f1
	at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:683)


"C3P0PooledConnectionPoolManager[identityToken->2tmgiaa0l499gq1p2apom|33acba40]-HelperThread-#0" Id=21 TIMED_WAITING on com.mchange.v2.async.ThreadPoolAsynchronousRunner@6f6cf9f1
	at java.lang.Object.wait(Native Method)
	-  waiting on com.mchange.v2.async.ThreadPoolAsynchronousRunner@6f6cf9f1
	at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:683)


"C3P0PooledConnectionPoolManager[identityToken->2tmgiaa0l499gq1p2apom|33acba40]-AdminTaskTimer" Id=20 TIMED_WAITING on java.util.TaskQueue@5252aa66
	at java.lang.Object.wait(Native Method)
	-  waiting on java.util.TaskQueue@5252aa66
	at java.util.TimerThread.mainLoop(Timer.java:552)
	at java.util.TimerThread.run(Timer.java:505)


"Statistics Thread-file:/var/lib/tomcat/webapps/candlepin/WEB-INF/classes/ehcache.xml::WebappClassLoader
  context: /candlepin
  delegate: false
  repositories:
    /WEB-INF/classes/
----------> Parent Classloader:
java.net.URLClassLoader@7a81197d
::org.ehcache.jcache.JCacheCachingProvider@2bdd47be-1" Id=17 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6cedd8e7
	at sun.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6cedd8e7
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


"file:/var/lib/tomcat/webapps/candlepin/WEB-INF/classes/ehcache.xml::WebappClassLoader
  context: /candlepin
  delegate: false
  repositories:
    /WEB-INF/classes/
----------> Parent Classloader:
java.net.URLClassLoader@7a81197d
::org.ehcache.jcache.JCacheCachingProvider@2bdd47be" Id=16 WAITING on java.util.TaskQueue@c2016fa
	at java.lang.Object.wait(Native Method)
	-  waiting on java.util.TaskQueue@c2016fa
	at java.lang.Object.wait(Object.java:502)
	at java.util.TimerThread.mainLoop(Timer.java:526)
	at java.util.TimerThread.run(Timer.java:505)


"com.google.inject.internal.util.$Finalizer" Id=15 WAITING on java.lang.ref.ReferenceQueue$Lock@40bd5fe
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.ref.ReferenceQueue$Lock@40bd5fe
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
	at com.google.inject.internal.util.$Finalizer.run(Finalizer.java:114)


"localhost-startStop-1" Id=14 RUNNABLE
	at org.apache.activemq.artemis.core.journal.impl.JournalImpl$11.checkDeleteSize(JournalImpl.java:1406)
	at org.apache.activemq.artemis.core.journal.impl.JournalImpl$11.deleteRecord(JournalImpl.java:1435)
	at org.apache.activemq.artemis.core.journal.impl.JournalImpl$13.onReadCommitRecord(JournalImpl.java:1932)
	at org.apache.activemq.artemis.core.journal.impl.JournalImpl.readJournalFile(JournalImpl.java:729)
	at org.apache.activemq.artemis.core.journal.impl.JournalImpl.load(JournalImpl.java:1753)
	-  locked org.apache.activemq.artemis.core.journal.impl.JournalImpl@434d4e83
	at org.apache.activemq.artemis.core.journal.impl.JournalImpl.load(JournalImpl.java:1390)
	-  locked org.apache.activemq.artemis.core.journal.impl.JournalImpl@434d4e83
	at org.apache.activemq.artemis.core.journal.impl.JournalImpl.load(JournalImpl.java:1374)
	at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.loadMessageJournal(AbstractJournalStorageManager.java:846)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:2573)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:2336)
	-  locked org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl@41658ced
	at org.apache.activemq.artemis.core.server.impl.LiveOnlyActivation.run(LiveOnlyActivation.java:71)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.internalStart(ActiveMQServerImpl.java:543)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.start(ActiveMQServerImpl.java:482)
	-  locked org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl@41658ced
	at org.apache.activemq.artemis.core.server.embedded.EmbeddedActiveMQ.start(EmbeddedActiveMQ.java:115)
	at org.candlepin.audit.ActiveMQContextListener.contextInitialized(ActiveMQContextListener.java:159)
	at org.candlepin.guice.CandlepinContextListener.withInjector(CandlepinContextListener.java:182)
	at org.jboss.resteasy.plugins.guice.GuiceResteasyBootstrapServletContextListener.contextInitialized(GuiceResteasyBootstrapServletContextListener.java:57)
	at org.candlepin.guice.CandlepinContextListener.contextInitialized(CandlepinContextListener.java:144)
	at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:5118)
	at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5634)
	-  locked org.apache.catalina.core.StandardContext@6939ae95
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:145)
	-  locked org.apache.catalina.core.StandardContext@6939ae95
	at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:899)
	at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:875)
	at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:652)
	at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1260)
	at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:2002)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

	Number of locked synchronizers = 1
	- java.util.concurrent.ThreadPoolExecutor$Worker@5541d560


"Catalina-startStop-1" Id=13 WAITING on java.util.concurrent.FutureTask@4d7ed5fd
	at sun.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.FutureTask@4d7ed5fd
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:1163)
	at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:497)
	at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1713)
	at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:337)
	at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
	at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
	at org.apache.catalina.util.LifecycleBase.setStateInternal(LifecycleBase.java:388)
	-  locked org.apache.catalina.core.StandardHost@26024709
	at org.apache.catalina.util.LifecycleBase.setState(LifecycleBase.java:333)
	-  locked org.apache.catalina.core.StandardHost@26024709
	at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1136)
	-  locked org.apache.catalina.core.StandardHost@26024709
	at org.apache.catalina.core.StandardHost.startInternal(StandardHost.java:819)
	-  locked org.apache.catalina.core.StandardHost@26024709
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:145)
	-  locked org.apache.catalina.core.StandardHost@26024709
	at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1571)
	at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1561)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

	Number of locked synchronizers = 1
	- java.util.concurrent.ThreadPoolExecutor$Worker@42dafa95


"GC Daemon" Id=12 TIMED_WAITING on sun.misc.GC$LatencyLock@17804d89
	at java.lang.Object.wait(Native Method)
	-  waiting on sun.misc.GC$LatencyLock@17804d89
	at sun.misc.GC$Daemon.run(GC.java:117)


"Signal Dispatcher" Id=4 RUNNABLE


"Finalizer" Id=3 WAITING on java.lang.ref.ReferenceQueue$Lock@3e9030ac
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.ref.ReferenceQueue$Lock@3e9030ac
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)


"Reference Handler" Id=2 WAITING on java.lang.ref.Reference$Lock@53ffc177
	at java.lang.Object.wait(Native Method)
	-  waiting on java.lang.ref.Reference$Lock@53ffc177
	at java.lang.Object.wait(Object.java:502)
	at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)


"main" Id=1 WAITING on java.util.concurrent.FutureTask@6f9f0c51
	at sun.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.FutureTask@6f9f0c51
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1119)
	-  locked org.apache.catalina.core.StandardEngine@681537c8
	at org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:300)
	-  locked org.apache.catalina.core.StandardEngine@681537c8
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:145)
	-  locked org.apache.catalina.core.StandardEngine@681537c8
	at org.apache.catalina.core.StandardService.startInternal(StandardService.java:444)
	-  locked org.apache.catalina.core.StandardEngine@681537c8
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:145)
	-  locked org.apache.catalina.core.StandardService@7d6cc346
	at org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:758)
	-  locked java.lang.Object@42468917
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:145)
	-  locked org.apache.catalina.core.StandardServer@3e6fa38a
	at org.apache.catalina.startup.Catalina.start(Catalina.java:693)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:294)
	at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:428)


===============================================================================
End Thread dump 
*******************************************************************************

Comment 4 Kevin Howell 2019-01-14 15:06:17 UTC
Have you seen this occur in real-world scenarios? If so, can you describe the conditions where this occurs?

Comment 5 Pavel Moravec 2019-01-14 15:09:52 UTC
(In reply to Kevin Howell from comment #4)
> Have you seen this occur in real-world scenarios? If so, can you describe
> the conditions where this occurs?

Yes. The scenario from step 1 (katello_event_queue gets full, candlepin goes into suspended mode as it can't send more messages to qpidd, but somehow keeps generating events/messages and storing them inside the internal broker) was observed at a customer, and it resulted in the huge journal directory I provided.

Backing up the journal dir (to the copy I shared) was the resolution for them.

Comment 6 Nikos Moumoulidis 2019-07-19 13:21:28 UTC
I was able to reproduce the issue on my local VM by placing the journal files from the tar under /var/lib/candlepin/activemq-artemis/journal.
The thread dump is produced by ActiveMQ's "critical analyzer", which simply monitors response times of key operations (like journal persistence) and logs this thread dump if it perceives an operation as taking too long. The default setting is to wait 2 minutes before deciding that an operation is slow.
I've increased this timeout to 3 hours and restarted tomcat. After about 70 minutes, I saw the following errors, the main one being "java.lang.OutOfMemoryError: Java heap space", which I think is the cause here. After taking a heap dump, I can see that a lot of memory was utilized by ActiveMQ data. Basically, Artemis is trying to read all the records from all those journals into memory and is running out.
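
A heap dump can be taken with standard JDK tooling, e.g. (a sketch; the pidof lookup assumes tomcat's java is the only java process on the box):

    # dump only live objects to keep the file smaller
    jmap -dump:live,format=b,file=/tmp/candlepin-heap.hprof $(pidof java)
    # then inspect the .hprof with e.g. Eclipse MAT or VisualVM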

After increasing the JVM's heap from -Xmx1536m to -Xmx5120m and restarting tomcat, it takes about 5 minutes for candlepin to be deployed successfully.
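
As a sketch of that change (assuming a RHEL 7 tomcat that reads JAVA_OPTS from /etc/tomcat/tomcat.conf; the location may differ on other setups):

    sed -i 's/-Xmx1536m/-Xmx5120m/' /etc/tomcat/tomcat.conf
    systemctl restart tomcat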

candlepin.log:

...
2019-07-15 15:32:32,008 [thread=localhost-startStop-1] [=, org=, csid=] WARN  org.apache.activemq.artemis.core.server - AMQ222165: No Dead Letter Address configured for queue jobs in AddressSettings
2019-07-15 15:32:32,009 [thread=localhost-startStop-1] [=, org=, csid=] WARN  org.apache.activemq.artemis.core.server - AMQ222166: No Expiry Address configured for queue jobs in AddressSettings
2019-07-15 15:48:58,998 [thread=C3P0PooledConnectionPoolManager[identityToken->z8kflta31wbuv62nh2qzm|173ecc4d]-AdminTaskTimer] [=, org=, csid=] WARN  com.mchange.v2.async.ThreadPoolAsynchronousRunner - com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@67152f1e -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!
2019-07-15 16:42:29,661 [thread=Thread-79 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5@32340be)] [=, org=, csid=] WARN  org.apache.activemq.artemis.utils.actors.OrderedExecutor - Java heap space
java.lang.OutOfMemoryError: Java heap space
2019-07-15 16:42:29,862 [thread=C3P0PooledConnectionPoolManager[identityToken->z8kflta31wbuv62nh2qzm|173ecc4d]-AdminTaskTimer] [=, org=, csid=] WARN  com.mchange.v2.async.ThreadPoolAsynchronousRunner - com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@67152f1e -- APPARENT DEADLOCK!!! Complete Status:
        Managed Threads: 3
        Active Threads: 3
        Active Tasks:
                com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@1cd576dc
                        on thread: C3P0PooledConnectionPoolManager[identityToken->z8kflta31wbuv62nh2qzm|173ecc4d]-HelperThread-#2
                com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@65cc4dbc
                        on thread: C3P0PooledConnectionPoolManager[identityToken->z8kflta31wbuv62nh2qzm|173ecc4d]-HelperThread-#1
                com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@16a9c7df
                        on thread: C3P0PooledConnectionPoolManager[identityToken->z8kflta31wbuv62nh2qzm|173ecc4d]-HelperThread-#0
        Pending Tasks:
                com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@36ff1303
Pool thread stack traces:
        Thread[C3P0PooledConnectionPoolManager[identityToken->z8kflta31wbuv62nh2qzm|173ecc4d]-HelperThread-#1,5,main]
                com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:722)
        Thread[C3P0PooledConnectionPoolManager[identityToken->z8kflta31wbuv62nh2qzm|173ecc4d]-HelperThread-#2,5,main]
                com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:722)
        Thread[C3P0PooledConnectionPoolManager[identityToken->z8kflta31wbuv62nh2qzm|173ecc4d]-HelperThread-#0,5,main]
                com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:722)

catalina.log:

...
Jul 15, 2019 3:32:21 PM com.google.inject.internal.ProxyFactory <init>
WARNING: Method [public void org.candlepin.model.EntitlementCertificateCurator.delete(org.candlepin.model.Persisted)] is synthetic and is being intercepted by [com.google.inject.persist.jpa.JpaLocalTxnInterceptor@7d311b21]. This could indicate a bug.  The method may be intercepted twice, or may not be intercepted at all.
Jul 15, 2019 4:42:29 PM org.apache.catalina.startup.HostConfig deployWARs
SEVERE: Error waiting for multi-thread deployment of WAR files to complete
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:832)
        at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:495)
        at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1713)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:337)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
        at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
        at org.apache.catalina.util.LifecycleBase.setStateInternal(LifecycleBase.java:388)
        at org.apache.catalina.util.LifecycleBase.setState(LifecycleBase.java:333)
        at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1136)
        at org.apache.catalina.core.StandardHost.startInternal(StandardHost.java:819)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:145)
        at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1571)
        at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1561)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.OutOfMemoryError: Java heap space

Jul 15, 2019 4:42:29 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /var/lib/tomcat/webapps/ROOT
Jul 15, 2019 4:42:30 PM org.apache.catalina.startup.TldConfig execute
INFO: At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
Jul 15, 2019 4:42:30 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deployment of web application directory /var/lib/tomcat/webapps/ROOT has finished in 860 ms
Jul 15, 2019 4:42:30 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8443"]
Jul 15, 2019 4:42:30 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8080"]
Jul 15, 2019 4:42:30 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["ajp-bio-8009"]
Jul 15, 2019 4:42:30 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 4221063 ms

Comment 7 Nikos Moumoulidis 2019-07-22 13:47:06 UTC
(In reply to Pavel Moravec from comment #0)
> When candlepin is unable to send further events to qpidd (e.g. due to
> katello_event_queue full), it stores further events in

Hi, can you provide more details on the circumstances that led to the artemis journal growing?
You mention katello_event_queue was full - is that what actually happened, or is it just an example?
Normally, if candlepin cannot send further events to qpidd, it enters SUSPEND_MODE, so no events would be generated by artemis (since the system is not usable); did candlepin enter SUSPEND_MODE in this case?
The candlepin.log for the time when candlepin could not send messages would also be very helpful.
It is very unusual for the artemis journal to grow to 557 files, so the best course of action is to fix whatever caused it to grow.
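
A quick way to check how far the journal has grown (assuming the default location):

    ls /var/lib/candlepin/activemq-artemis/journal | wc -l
    du -sh /var/lib/candlepin/activemq-artemis/journal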

Comment 8 Pavel Moravec 2019-07-28 13:28:11 UTC
(In reply to Nikos Moumoulidis from comment #7)
> (In reply to Pavel Moravec from comment #0)
> > When candlepin is unable to send further events to qpidd (e.g. due to
> > katello_event_queue full), it stores further events in
> 
> Hi, can you provide more details on the circumstances that lead to the
> artemis journal growing?
> You mention katello_event_queue was full, is that what actually happened, or
> just an example?

That actually happened, and it is also a good and simple reproducer. To fully simulate this in Sat6, it is sufficient to:

1) Comment out one line in /opt/theforeman/tfm/root/usr/share/gems/gems/katello-*/lib/katello/engine.rb:

    initializer "katello.initialize_cp_listener", :before => :finisher_hook do
      unless Rails.env.test?
        ForemanTasks.dynflow.config.post_executor_init do |world|
          #::Actions::Candlepin::ListenOnCandlepinEvents.ensure_running(world)
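          # (the ensure_running line above is the one commented out for this reproducer)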
          ::Actions::Katello::EventQueue::Monitor.ensure_running(world)
        end
      end
    end

2) Restart the dynflowd service - these two steps ensure the LOCE task won't consume messages from the queue.

3) Now wait until the queue depth reaches 80M in _size_ (approx. 110k messages); a qpid-stat check for watching the depth is sketched below.

4) From this point on, each and every message takes 2 minutes to enqueue (a qpid-java client behaviour I can explain) - there is no more throughput than that; it really is that slow. Here candlepin is still able to send messages, so it is not yet in suspended mode.

5) This "1 msg per 2 minutes" rate lasts until a queue depth of 100M in size is reached, and only then does candlepin enter suspended mode.

From 4) to 5), you have to enqueue around 30k messages (depending on the event info in each), so at 2 minutes per message you would have to wait 42 days(!) (30,000 msgs / 30 msgs per hour / 24 h) to get candlepin into suspended mode.

But even then, surprisingly, candlepin does _not_ enter the suspended state. It still behaves the same way - it keeps sending messages to the "full" queue whose depth in bytes already exceeds qpidd's default --default-queue-limit (104857600 bytes) (this is strange behaviour on qpidd's side, which should reject the messages rather than silently accept them).

Anyway, within the 42 days between 4) and 5), the Artemis journal can grow arbitrarily huge.
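
To watch the queue depth while reproducing, a qpid-stat check along these lines should work (the certificate path is the one typically present on a Satellite 6 server; adjust as needed):

    qpid-stat -q -b amqps://localhost:5671 \
        --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt \
        | grep katello_event_queue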


> The candlepin.log for the time that candlepin could not send messages would
> also be very helpful.

Sorry, I don't have it any more after more than half a year.


> It is very unusual for the artemis journal to grow to be 557 files, so the
> best course of action is to fix whatever caused that to grow.

This bug can happen as a consequence of some other issue occurring, that's right. But such events do happen from time to time (and this bug report is based on a real customer experience).

Comment 17 Pavel Moravec 2020-03-08 14:36:43 UTC
Maybe a different bug, but I have noticed a few times already that with an ActiveMQ journal holding many messages (like 10k journal files), candlepin *can* start, but it forgets to delete the messages after sending them to qpidd. In particular:

1) It sends all (tens or hundreds of) thousands of messages from the ActiveMQ broker to qpidd. This spams qpidd and dynflow_executor and puts candlepin into suspended mode, since candlepin sends messages faster than dynflow reads them. Recovering from this can take a few hours.
2) THE PROBLEM IS that the messages are _not_ deleted from the ActiveMQ broker after they are sent, so a tomcat restart will flush *all* the event messages to qpidd *again*.

So IMHO there is some threshold of message count / journal size beyond which messages are no longer removed from ActiveMQ after being sent to qpidd.

Is that something we can deal with within this BZ, or does it deserve a new BZ?

Comment 18 Nikos Moumoulidis 2020-03-09 14:32:37 UTC
(In reply to Pavel Moravec from comment #17)

> Is that something we can deal within this BZ or does it deserve a new BZ?

I'd suggest opening a new bz for that particular case (messages not being deleted after being sent), with as much information as possible, since it seems like a different issue with a potentially different solution.

The solution to this bz would be to simply not get into the situation of filling up the journal when we can't send messages (for whatever reason) - Candlepin should instead go into suspend mode. I see the upstream bz (1667952) was closed due to insufficient data / inability to reproduce, though I'm unsure why this one did not get updated.

Comment 19 Nikos Moumoulidis 2020-03-09 14:46:19 UTC
We have had trouble reproducing the root cause of this (not tomcat hanging or failing to start, but rather getting artemis to fill the journal with a large number of messages in the first place).
Is there any more information that can help us (in the form of reproduction steps, logs, or anything really)?

Comment 34 Brad Buckingham 2020-06-18 14:24:36 UTC
There is a corresponding change in the Satellite installer to clean up the event queue.  That is tracked by bug 1847515.

Comment 39 Mike McCune 2020-08-20 17:57:41 UTC
*** Satellite 6.6.3 Hotfix Available ***

1) Download https://people.redhat.com/~mmccune/hotfix/candlepin-RHBZ1665893.tar.gz to your Satellite and unpack the tar.gz

2) stop services:

satellite-maintain service stop

3) Install:

rpm -Uvh candlepin-*.rpm

4) restart:

satellite-maintain service start

5) resume operations

Comment 42 errata-xmlrpc 2020-10-27 12:58:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.8 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4366

