Bug 1401896

Summary: engine server threads keep running after tasks are gone
Product: [oVirt] ovirt-engine Reporter: sefi litmanovich <slitmano>
Component: BLL.Infra Assignee: Oved Ourfali <oourfali>
Status: CLOSED DUPLICATE QA Contact: Pavel Stehlik <pstehlik>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.0.6.1 CC: bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-12-06 11:47:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description: logs.tar - engine.log from the past 2 weeks, server logs, thread dump (out.log)
Flags: none

Description sefi litmanovich 2016-12-06 11:19:18 UTC
Created attachment 1228416
logs.tar - engine.log from the past 2 weeks, server logs, thread dump (out.log)

Description of problem:

I am running engine rhevm-4.0.6.1-0.1.el7ev.noarch.
The environment is currently in a state where I cannot start any new task at all. After one such attempt, the engine log shows:

2016-12-05 17:19:51,932 WARN  [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (default task-11) [] The thread pool failed to execute list of tasks: Task java.util.concurrent.FutureTask@6f8619c0 rejected from org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalThreadExecutor@31835b5f[Running, pool size = 500, active threads = 500, queued tasks = 100, completed tasks = 24428]
2016-12-05 17:19:51,933 ERROR [org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner] (default task-11) [] Failed to execute multiple actions of type 'StopVm': java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@6f8619c0 rejected from org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalThreadExecutor@31835b5f[Running, pool size = 500, active threads = 500, queued tasks = 100, completed tasks = 24428]
2016-12-05 17:19:51,933 ERROR [org.ovirt.engine.core.bll.PrevalidatingMultipleActionsRunner] (default task-11) [] Exception: java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@6f8619c0 rejected from org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalThreadExecutor@31835b5f[Running, pool size = 500, active threads = 500, queued tasks = 100, completed tasks = 24428]
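
For anyone triaging similar logs, here is a minimal Python sketch that pulls the executor counters out of a rejection message like the ones above and checks whether the pool looks saturated. The function names and the max_pool_size / queue_capacity parameters are hypothetical; the values 500 and 100 are only inferred from the numbers in this log, not read from the engine configuration.

```python
import re

# Matches the bracketed state suffix that appears at the end of the
# rejection messages, e.g.
# [Running, pool size = 500, active threads = 500, queued tasks = 100, completed tasks = 24428]
POOL_STATE_RE = re.compile(
    r"\[(?P<run_state>[^,\]]+), pool size = (?P<pool_size>\d+), "
    r"active threads = (?P<active>\d+), "
    r"queued tasks = (?P<queued>\d+), "
    r"completed tasks = (?P<completed>\d+)\]"
)


def parse_pool_state(line):
    """Return the executor counters found in a log line, or None."""
    match = POOL_STATE_RE.search(line)
    if match is None:
        return None
    state = {
        key: int(value)
        for key, value in match.groupdict().items()
        if key != "run_state"
    }
    state["run_state"] = match.group("run_state")
    return state


def is_saturated(state, max_pool_size, queue_capacity):
    """True when every thread is busy and the queue is full.

    max_pool_size and queue_capacity are assumptions supplied by the
    caller; the log line alone does not state the configured limits.
    """
    return (
        state["pool_size"] >= max_pool_size
        and state["active"] >= state["pool_size"]
        and state["queued"] >= queue_capacity
    )
```

With the counters from this report (pool size 500, active threads 500, queued tasks 100), is_saturated(state, 500, 100) returns True, which would be consistent with every new task being rejected.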

I then checked how many threads the ovirt-engine server process had open and got:

[root@~]# ps huH p 28300 | wc -l
587
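
The same count can be taken without ps. A minimal sketch, assuming a Linux host where each thread of a process appears as a numeric directory under /proc/<pid>/task; the function name and the proc_root parameter are hypothetical and exist only to make the sketch easy to try against a fake directory tree.

```python
import os


def count_threads(pid, proc_root="/proc"):
    """Count the threads of a process by listing /proc/<pid>/task.

    Linux only: each entry under the task directory is one thread ID.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sum(1 for name in os.listdir(task_dir) if name.isdigit())
```

For the engine process in this report the call would be count_threads(28300), run as a user allowed to read that process's /proc entry.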

Note that the environment has not been updated with yum for some time, and many eap7 packages have pending updates.

As for Java, I did not see a pending update; the currently running version is:
[root@~]# java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-b15)
OpenJDK 64-Bit Server VM (build 25.111-b15, mixed mode)

What I have been doing with this environment over the past few weeks:

Other than managing a few VMs and running some manual test cases, I ran a script to check for a possible vdsm memory leak. The script started 5-8 VMs and stopped them, with 1:30 minutes between each action, and it ran using python-ovirt-engine-sdk4-4.0.2-1.el7ev.x86_64 for around 10 days (on and off, but mostly on).
This might be a lead to what caused the problem.
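
The script itself is not attached. As a rough illustration of the load pattern described above, here is a sketch of such a loop. It is not the original script: start_vm and stop_vm are hypothetical callables standing in for the SDK calls, and the reading that all VMs are started, then after a 90-second pause all are stopped, is an assumption about what "between each action" meant.

```python
import time


def cycle_vms(vm_names, start_vm, stop_vm, cycles, pause_seconds=90,
              sleep=time.sleep):
    """Repeatedly start and stop a group of VMs with a pause in between.

    start_vm and stop_vm are caller-supplied (hypothetical) callables
    that take a VM name; sleep is injectable so the loop can be tried
    without actually waiting.
    """
    for _ in range(cycles):
        for name in vm_names:
            start_vm(name)
        sleep(pause_seconds)  # 1:30 minutes between actions
        for name in vm_names:
            stop_vm(name)
        sleep(pause_seconds)
```

At 90 seconds per action, one start/stop cycle takes about 3 minutes, so roughly 10 days of mostly continuous running would amount to several thousand cycles.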

Attached are the engine log, the server logs, and a thread dump (out.log, the output of jstack -J-d64 <pid>).
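
To get a quick overview of a dump like out.log, the thread states can be tallied. A minimal sketch, assuming the usual jstack text layout (a quoted thread name on the header line, followed by a "java.lang.Thread.State:" line); the function name and the name_prefix filter are hypothetical, and no particular thread naming scheme for the engine pool is assumed.

```python
import re
from collections import Counter

# Header line of a thread entry starts with the quoted thread name.
THREAD_HEADER_RE = re.compile(r'^"(?P<name>[^"]*)"')
# State line, e.g. "   java.lang.Thread.State: WAITING (parking)".
THREAD_STATE_RE = re.compile(r"java\.lang\.Thread\.State: (?P<state>[A-Z_]+)")


def tally_thread_states(dump_text, name_prefix=""):
    """Count thread states in a jstack dump.

    Only threads whose name starts with name_prefix are counted.
    Threads without a state line (for example JVM-internal threads)
    are skipped.
    """
    counts = Counter()
    current_name = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER_RE.match(line)
        if header:
            current_name = header.group("name")
            continue
        state = THREAD_STATE_RE.search(line)
        if state and current_name is not None:
            if current_name.startswith(name_prefix):
                counts[state.group("state")] += 1
            current_name = None  # count each thread at most once
    return counts
```

Usage would be along the lines of tally_thread_states(open("out.log").read()); a large number of pool threads sharing one WAITING or BLOCKED stack would point at where they are stuck.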

Version-Release number of selected component (if applicable):
rhevm-4.0.6.1-0.1.el7ev.noarch

How reproducible:
Once

Steps to Reproduce:
Not sure

Actual results:
Engine has 587 running threads and no new task can be started.

Expected results:
Threads should not be stuck.

Comment 1 sefi litmanovich 2016-12-06 11:47:48 UTC

*** This bug has been marked as a duplicate of bug 1401585 ***