Bug 1408307
| Summary: | Temporary file descriptor leak after running storage job | ||
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Nir Soffer <nsoffer> |
| Component: | Core | Assignee: | Nir Soffer <nsoffer> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Elad <ebenahar> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.19.0 | CC: | bugs, nsoffer, tnisan, ybronhei |
| Target Milestone: | ovirt-4.1.0-beta | Flags: | rule-engine:
ovirt-4.1+
|
| Target Release: | 4.19.2 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-02-01 14:34:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Nir Soffer
2016-12-22 20:45:07 UTC
After job is done, open files amount in /proc/%VDSM-PID%/fd returns to what it was before it started. Tested for move image and for snapshot cold merge.
========================
Before job started:
========================
Every 2.0s: vdsm-client Host getJobs Wed Jan 18 11:25:48 2017
{}
========================
Every 2.0s: ls -1 /proc/29847/fd | wc -l Wed Jan 18 11:26:08 2017
100
========================
During running job (move image):
========================
Every 2.0s: vdsm-client Host getJobs Wed Jan 18 11:27:08 2017
{
"ff0dc6cb-f9b4-4f88-9b02-763792fd52eb": {
"status": "running",
"progress": 2.08,
"job_type": "storage",
"id": "ff0dc6cb-f9b4-4f88-9b02-763792fd52eb",
"description": "copy_data"
}
}
========================
Every 2.0s: ls -1 /proc/29847/fd | wc -l Wed Jan 18 11:27:19 2017
112
========================
After job is done (move image):
========================
Every 2.0s: vdsm-client Host getJobs Wed Jan 18 11:29:43 2017
{
"ff0dc6cb-f9b4-4f88-9b02-763792fd52eb": {
"status": "done",
"progress": 100.0,
"job_type": "storage",
"id": "ff0dc6cb-f9b4-4f88-9b02-763792fd52eb",
"description": "copy_data"
}
}
========================
Every 2.0s: ls -1 /proc/29847/fd | wc -l Wed Jan 18 11:31:10 2017
100
========================
Tested using
vdsm-4.20.0-231.git4739c81.el7.centos.x86_64
ovirt-engine-4.1.0-0.4.master.20170117182028.git4678f8c.el7.centos.noarch
(In reply to Elad from comment #1) > Tested using > vdsm-4.20.0-231.git4739c81.el7.centos.x86_64 You tested master build - we need to test 4.1 (4.19.x). Thanks for this detailed verification! Thanks Nir,
Tested again with latest 4.1 (vdsm-4.19.1-56.gitb2ac850.el7.centos.x86_64)
========================
Before job started:
========================
Every 2.0s: vdsm-client Host getJobs Wed Jan 18 12:12:30 2017
{}
========================
Every 2.0s: ls -1 /proc/14171/fd | wc -l Wed Jan 18 12:12:56 2017
107
========================
During running job (move image):
========================
Every 2.0s: vdsm-client Host getJobs Wed Jan 18 12:13:44 2017
{
"2a704487-4234-46bb-83c9-f2601a903746": {
"status": "running",
"progress": 9.38,
"job_type": "storage",
"id": "2a704487-4234-46bb-83c9-f2601a903746",
"description": "copy_data"
}
}
========================
Every 2.0s: ls -1 /proc/14171/fd | wc -l Wed Jan 18 12:14:13 2017
163
========================
After job is done (move image):
========================
Every 2.0s: vdsm-client Host getJobs Wed Jan 18 12:16:18 2017
{
"2a704487-4234-46bb-83c9-f2601a903746": {
"status": "done",
"progress": 100.0,
"job_type": "storage",
"id": "2a704487-4234-46bb-83c9-f2601a903746",
"description": "copy_data"
}
}
========================
Every 2.0s: ls -1 /proc/14171/fd | wc -l Wed Jan 18 12:16:28 2017
107
========================
BTW, why is the job being cleared only after an hour? 2017-01-18 11:29:36,039 INFO (tasks/1) [root] Job 'ff0dc6cb-f9b4-4f88-9b02-763792fd52eb' completed (jobs:203) 2017-01-18 11:29:36,040 INFO (tasks/1) [root] Job 'ff0dc6cb-f9b4-4f88-9b02-763792fd52eb' will be deleted in 3600 seconds (jobs:245) (In reply to Elad from comment #4) > BTW, why is the job being cleared only after an hour? This is a tradeoff, ensuring that engine get the result of the job in most cases even if you shut down engine in the middle of an operation, and making it easy to debug the system. The jobs data is very small, keeping last jobs in memory for an hour should not matter to anyone. |