Bug 1038975 - SPM is stopped and pool is disconnected while asynchronous task is scheduled
Summary: SPM is stopped and pool is disconnected while asynchronous task is scheduled
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.4.0
Assignee: Liron Aravot
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2013-12-06 09:41 UTC by Nir Soffer
Modified: 2016-02-10 18:10 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-02 13:01:18 UTC
oVirt Team: Storage
Target Upstream Version:


Attachments
Log of CI job failing (1.01 MB, application/octet-stream)
2013-12-06 09:46 UTC, Nir Soffer

Description Nir Soffer 2013-12-06 09:41:22 UTC
Description of problem:

After stopping spm, disconnecting the storage pool, and then connecting to the storage pool again, a reference to the old pool object is released when a copyImage task is scheduled on a thread. It looks like spm was stopped while an asynchronous task was scheduled.

Version-Release number of selected component (if applicable):
is26 + debugging patch

How reproducible:
Random

Steps to Reproduce:
1. Use this debugging patch: http://gerrit.ovirt.org/#/c/21932/
2. Run jenkins rhevm 3.3 automation coretools two hosts restapi vms nfs rest factory vdsm until it fails with "Low space error" (the error is a symptom of bug 1032925).

Actual results:

- The test fails with "Low space error" (but there is a lot of free space).
- In the log, we can see that the old pool was deleted (__del__) when a copyImage task was committed. It looks like the task thread was holding a reference to the old pool, which had recently been disconnected.
- It looks like the engine stopped spm and disconnected the storage pool while a copyImage task was scheduled.

Expected results:

- The copyImage task should be canceled, or spm stop should fail. We cannot have spm operations scheduled or running when spm is stopped.
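
The second option above (spm stop fails while tasks are scheduled) can be sketched as follows. This is only an illustration of the expected behavior, not vdsm code: the class `SpmRole`, the exception `TasksPending`, and the methods `schedule_task`, `task_finished` and `stop` are hypothetical names invented for this sketch.

```python
import threading


class TasksPending(Exception):
    """Raised when stop is requested while tasks are scheduled or running."""


class SpmRole:
    """Hypothetical stand-in for the SPM role on a host (not vdsm code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = 0      # tasks scheduled or running
        self.active = True

    def schedule_task(self):
        # Refuse new work once the role has been stopped.
        with self._lock:
            if not self.active:
                raise RuntimeError("SPM is stopped")
            self._pending += 1

    def task_finished(self):
        with self._lock:
            self._pending -= 1

    def stop(self):
        # The check and the state change happen under the same lock,
        # so no task can be scheduled between them.
        with self._lock:
            if self._pending:
                raise TasksPending("%d task(s) pending" % self._pending)
            self.active = False
```

With this sketch, calling `stop()` after `schedule_task()` raises `TasksPending`, and it succeeds only once `task_finished()` has been called for every scheduled task.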

Additional info:

This may be an engine issue (stopping spm when it should not), a vdsm issue (allowing spm stop when it should fail), or both.

Comment 1 Nir Soffer 2013-12-06 09:46:34 UTC
Created attachment 833513
Log of CI job failing

Comment 2 Nir Soffer 2013-12-07 23:29:56 UTC
I did not find any issue regarding incorrect stopping of spm in the logs.

Liron, can you verify this and confirm that engine is operating correctly?

Comment 3 Vered Volansky 2013-12-08 05:57:43 UTC
Nir, please provide logs (engine + vdsm).

Comment 4 Nir Soffer 2013-12-08 11:55:46 UTC
(In reply to Vered Volansky from comment #3)
> Nir, please provide logs (engine + vdsm).

I already did:
https://bugzilla.redhat.com/attachment.cgi?id=833513

Comment 5 Nir Soffer 2014-01-02 13:01:18 UTC
We did not find any issue regarding stopping spm or disconnecting from storage. The real problem was that the old pool was kept alive by the thread pool and deleted many minutes after the pool was disconnected. This issue was resolved by http://gerrit.ovirt.org/22136.
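
For readers unfamiliar with this kind of problem, here is a minimal sketch of how a worker thread can keep an old object alive long after its task has finished. It is not the vdsm thread pool and not the patch linked above; `Pool`, `naive_worker`, `careful_worker` and `pool_kept_alive` are hypothetical names, and the behavior shown relies on CPython reference counting.

```python
import gc
import queue
import threading
import weakref


class Pool:
    """Hypothetical stand-in for a storage pool object."""

    def copy_image(self):
        return "copied"


def naive_worker(tasks, done):
    while True:
        # While blocked in get(), 'task' still refers to the previous
        # bound method, and through it to the old Pool instance.
        task = tasks.get()
        if task is None:
            break
        task()
        done.set()


def careful_worker(tasks, done):
    while True:
        task = tasks.get()
        if task is None:
            break
        try:
            task()
        finally:
            task = None  # drop the reference before waiting again
        done.set()


def pool_kept_alive(worker):
    """Return True if the Pool is still alive after its task finished."""
    tasks, done = queue.Queue(), threading.Event()
    thread = threading.Thread(target=worker, args=(tasks, done))
    thread.start()
    pool = Pool()
    ref = weakref.ref(pool)
    tasks.put(pool.copy_image)   # bound method holds a reference to pool
    done.wait()
    del pool                     # the caller "disconnects" from the pool
    gc.collect()
    alive = ref() is not None
    tasks.put(None)              # ask the worker to exit
    thread.join()
    return alive
```

In this sketch `pool_kept_alive(naive_worker)` reports that the object is still alive, because the idle worker's local variable references the finished task until the next task arrives, while `pool_kept_alive(careful_worker)` reports that it was released.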

Comment 6 Nir Soffer 2014-01-02 14:32:58 UTC
To make it clear, after Allon changed the close reason to UPSTREAM:

1. This bug is invalid - there was no such bug - there was no active task when spm was stopped and pool was disconnected.
2. The bug was not fixed in upstream since there was nothing to fix :-)

There seems to be no reasonable close reason in this bugzilla. Hopefully someone can add an INVALID status.
