Description of problem:
After stopping the spm and disconnecting the storage pool, and then connecting to the storage pool again, a reference to the old pool object is released when a copyImage task is scheduled on a thread. It looks like the spm was stopped while an asynchronous task was scheduled.

Version-Release number of selected component (if applicable):
is26 + debugging patch

How reproducible:
Random

Steps to Reproduce:
1. Use this debugging patch: http://gerrit.ovirt.org/#/c/21932/
2. Run jenkins rhevm 3.3 automation coretools two hosts restapi vms nfs rest factory vdsm until it fails with "Low space error" (the error is a symptom of bug 1032925).

Actual results:
- The test fails with "Low space error" (but there is a lot of space).
- In the log, we can see that the old pool was deleted (__del__) when a copyImage task was committed. It looks like the task thread was holding a reference to the old pool that had recently been disconnected.
- It looks like the engine stopped the spm and disconnected the storage pool while the copyImage task was scheduled.

Expected results:
- The copyImage task should be canceled, or the spm stop should fail. We cannot have spm operations scheduled or running when the spm is stopped.

Additional info:
This may be an engine issue (stopping the spm when it should not), a vdsm issue (allowing the spm stop when it should fail), or both.
Created attachment 833513: Log of CI job failing
I did not find any issue regarding incorrect stopping of the spm in the logs. Liron, can you verify this and confirm that the engine is operating correctly?
Nir, please provide logs (engine + vdsm).
(In reply to Vered Volansky from comment #3)
> Nir, please provide logs (engine + vdsm).

I already did: https://bugzilla.redhat.com/attachment.cgi?id=833513
We did not find any issue regarding stopping the spm or disconnecting from storage. The real problem was that the old pool object was kept alive by the thread pool and deleted many minutes after the pool was disconnected. This issue was resolved by http://gerrit.ovirt.org/22136.
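The mechanism described above, where a worker thread keeps the previous task's arguments alive after the task has finished, can be illustrated with a minimal Python sketch. It assumes a generic thread-pool loop whose idle worker leaves the last task bound to local variables while it blocks waiting for the next one. `StoragePool`, `copy_image`, `worker`, `pool_outlives_task` and the `release_refs` flag are hypothetical names, not vdsm code, and dropping the locals is only one way to avoid the leak, not necessarily what change 22136 does.

```python
import gc
import queue
import threading
import weakref


class StoragePool:
    """Hypothetical stand-in for a storage pool object."""

    def __init__(self, name):
        self.name = name


def copy_image(pool):
    """Hypothetical task body; it only touches the pool it was given."""
    return pool.name


def worker(tasks, release_refs):
    """Generic thread-pool worker loop (illustrative, not vdsm's code)."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel: shut the worker down
            tasks.task_done()
            return
        func, args = task
        try:
            func(*args)
        finally:
            if release_refs:
                # Drop every local that points at the task's arguments, so
                # the idle worker does not keep them alive inside get().
                del task, func, args
            tasks.task_done()


def pool_outlives_task(release_refs):
    """Return True if the pool is still alive after its task finished
    and the caller dropped its own reference."""
    tasks = queue.Queue()
    t = threading.Thread(target=worker, args=(tasks, release_refs), daemon=True)
    t.start()

    pool = StoragePool("old-pool")
    ref = weakref.ref(pool)
    tasks.put((copy_image, (pool,)))
    tasks.join()  # the task has run and was marked done
    del pool      # "disconnect": drop the caller's reference
    gc.collect()
    alive = ref() is not None

    tasks.put(None)  # stop the worker
    t.join()
    return alive


if __name__ == "__main__":
    print(pool_outlives_task(release_refs=False))  # True: worker still holds it
    print(pool_outlives_task(release_refs=True))   # False: pool was freed
```

In the leaky variant the old pool is only released once the worker picks up another task or exits, which matches the symptom in the log of `__del__` running long after the disconnect, when some later task was scheduled.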
To make it clear after Allon changed the close reason to UPSTREAM:
1. This bug is invalid; there was no such bug. There was no active task when the spm was stopped and the pool was disconnected.
2. The bug was not fixed upstream, since there was nothing to fix :-)

There seems to be no reasonable close reason in this Bugzilla. Hopefully someone can add an INVALID status.