Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1622352

Summary: Disk remains in Lock State even though transfer_service.finalize() is called.
Product: [oVirt] ovirt-engine
Reporter: Pratik Narode <pratik.narode>
Component: RestAPI
Assignee: Daniel Erez <derez>
Status: CLOSED NOTABUG
QA Contact: Petr Matyáš <pmatyas>
Severity: high
Docs Contact:
Priority: medium
Version: ---
CC: bugs, dfediuck, juan.hernandez, lsvaty, michal.skrivanek, mperina, nsoffer, omachace, pratik.narode, tnisan
Target Milestone: ovirt-4.4.1
Keywords: Reopened
Target Release: ---
Flags: pm-rhel: ovirt-4.4+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-26 09:48:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
- Image showing disk in locked state
- Image showing vm in locked state
- The db query output
- Engine log file
- daemon logs
- vdsm logs
- proxy logs

Description Pratik Narode 2018-08-26 17:07:29 UTC
Created attachment 1478801 [details]
Image showing disk in locked state.

Description of problem:
I am using the oVirt Python SDK APIs to back up virtual machines in oVirt. As part of this, I create a disk transfer request using the transfer service. After the disk is transferred and transfer_service.finalize() is called at the end to release all the resources, the disk still remains locked and its status is shown as finalizing.

Actual results:
The disk stays in a locked state forever, even though transfer_service.finalize() is called.

Expected results:
The disk should get unlocked.
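For reference, the flow I described can be sketched roughly as below. This is a minimal, duck-typed sketch: the `transfer_service` argument is assumed to behave like the SDK's ImageTransferService (`get()` returns an object with a `phase` attribute, `finalize()` releases the ticket and the disk lock), and the phase strings are illustrative stand-ins for the SDK's ImageTransferPhase enum, not actual SDK values.

```python
import time

def run_transfer(transfer_service, copy_data, poll_interval=1.0):
    """Wait for a transfer to become active, copy the data, then finalize.

    `copy_data` is a hypothetical callback that performs the actual
    image download once the transfer leaves the initializing phase.
    """
    transfer = transfer_service.get()
    while transfer.phase == "initializing":
        time.sleep(poll_interval)
        transfer = transfer_service.get()
    try:
        copy_data(transfer)          # actual image download happens here
    finally:
        transfer_service.finalize()  # expected to unlock the disk
    return transfer.phase
```

In my runs, the finalize() call at the end is what fails to unlock the disk.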

Comment 1 Pratik Narode 2018-08-26 18:44:39 UTC
Created attachment 1478841 [details]
Image showing vm in locked state

Also, sometimes even though the disk gets unlocked, the VM remains in a locked state for a long time, or occasionally stays locked permanently. Yet a DB query on ovirt-engine does not show any locked entities.

Comment 2 Pratik Narode 2018-08-26 18:45:32 UTC
Created attachment 1478842 [details]
The db query output

Comment 3 Daniel Erez 2018-08-28 18:21:07 UTC
@Pratik - can you please add engine/vdsm/imageio-proxy/imageio-daemon logs.

Comment 4 Pratik Narode 2018-08-28 18:46:21 UTC
(In reply to Daniel Erez from comment #3)
> @Pratik - can you please add engine/vdsm/imageio-proxy/imageio-daemon logs.

OK, this was about 2-3 days back, but I will try to get the logs if I can. Thanks.

Comment 5 Tal Nisan 2018-10-03 12:00:33 UTC
Hi Pratik,
I'm closing this for now, as we don't have the necessary info. Please reopen if you hit the issue again, and provide logs.

Comment 6 Pratik Narode 2018-10-28 18:56:20 UTC
Created attachment 1498366 [details]
Engine log file

Comment 7 Pratik Narode 2018-10-28 19:15:14 UTC
Hi Tal,

I am reopening this bug because I have faced the same issue again. I have attached the engine logs, which I captured when my disk entered the locked state.
The sequence of operations I performed was:
1. Created a VM snapshot using python-sdk.
2. Created a transfer request for the snapshotted disk using python-sdk.
3. Did the data transfer.
4. Finalized the disk transfer.

Disk Name: Win2k8_Disk1
Disk Id: e00d4b57-e967-4cd9-8226-271367431a23
Image Id: 0cb92b72-f59f-4689-ba08-41048a547c7d
TicketId: d57d8ae1-ceb8-4b42-96a0-4280a2d32cc6
In the engine logs, you will find the entry for the snapshot on line 21.
The transfer request is registered on line 69,
and the transfer errors start on line 115 and continue after that.

Comment 8 Daniel Erez 2018-10-28 19:34:51 UTC
Hi Pratik,

* Can you please attach vdsm/imageio-daemon/imageio-proxy logs?
* Also, if possible, attach the script used for downloading.

Comment 9 Pratik Narode 2018-10-29 05:13:55 UTC
Created attachment 1498407 [details]
daemon logs

Comment 10 Pratik Narode 2018-10-29 05:15:09 UTC
Created attachment 1498408 [details]
vdsm logs

Comment 11 Pratik Narode 2018-10-29 05:16:08 UTC
Created attachment 1498409 [details]
proxy logs

Comment 12 Daniel Erez 2018-10-29 12:25:39 UTC
According to the daemon log [1], the daemon service occasionally shuts down (signal 15). Did you manually restart/kill the service? Is there perhaps insufficient available space on the host? Is it possible to check whether the issue reproduces on a different host?

[1]
2018-10-25 08:42:01,331 INFO    (MainThread) [server] Received signal 15, shutting down
2018-10-25 08:42:01,331 ERROR   (MainThread) [server] Service failed (remote_service=<server.RemoteService object at 0x7f8e599eed10>, control_service=<server.ControlService object at 0x7f8e59884150>, running=False)
Traceback (most recent call last):
  File "/opt/ovirt-imageio/daemon/server.py", line 75, in main
    stop()
  File "/opt/ovirt-imageio/daemon/server.py", line 112, in stop
    imageio_server.stop()
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 106, in stop
    remote_service.stop()
AttributeError: 'NoneType' object has no attribute 'stop'

Comment 13 Pratik Narode 2018-10-29 13:51:07 UTC
Hi Daniel,

No, I don't think the issue is due to the daemon, because when I monitored the daemon log while the transfer was in progress, the transfer completed successfully and the ticket was gracefully removed. Also, this issue occurs on every host in my setup, where I have about 4 different hosts.

Comment 14 Daniel Erez 2018-10-31 09:45:00 UTC
@Pratik - is the vdsm log from around '2018-10-26 03:23:38' still available? According to the engine log [1], it seems TeardownImage failed on vdsm. Also, the download failure seems to occur due to a timeout [2], so there is no need to call finalize() manually in this scenario. Is it possible to attach the script you're using for the download?


[1] 2018-10-26 03:23:38,746-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [ec25a082-5dbc-4103-b1cb-f8b87c4791b1] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host4.triliodata.demo command TeardownImageVDS failed: Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\'  Logical volume ff5b243d-ecd6-4cc9-9eaa-933f54be592a/9776e93c-eb85-4965-830d-597900046d9b in use.\', \'  Logical volume ff5b243d-ecd6-4cc9-9eaa-933f54be592a/b47d15fd-0a08-45d2-9128-e0591ed1bca3 in use.\', \'  Logical volume ff5b243d-ecd6-4cc9-9eaa-933f54be592a/aa3af403-a2af-49ea-98c9-4e9fe680a9c3 in use.\', \'  Logical volume ff5b243d-ecd6-4cc9-9eaa-933f54be592a/9aec6c03-1324-4da3-8c17-ad9c693fbef4 in use.\']\\nff5b243d-ecd6-4cc9-9eaa-933f54be592a/[\'9776e93c-eb85-4965-830d-597900046d9b\', \'6aa99dab-bffc-49c3-ace9-e9d5a6df36f6\', \'0cb92b72-f59f-4689-ba08-41048a547c7d\', \'b47d15fd-0a08-45d2-9128-e0591ed1bca3\', \'aa3af403-a2af-49ea-98c9-4e9fe680a9c3\', \'9aec6c03-1324-4da3-8c17-ad9c693fbef4\', \'505e66cd-cf84-48fb-8411-de92010956a0\']",)',)

[2]
2018-10-26 03:22:26,573-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-76) [ec25a082-5dbc-4103-b1cb-f8b87c4791b1] EVENT_ID: DOWNLOAD_IMAGE_CANCELED_TIMEOUT(1,072), Download was canceled by system. Reason: timeout due to transfer inactivity.

Comment 15 Nir Soffer 2018-10-31 19:21:22 UTC
Pratik, are you running an unmodified 1.4.4 version?

Looking at the 1.4.4 code, I don't see how receiving signal 15 can cause this error:

2018-10-04 07:40:58,803 INFO    (MainThread) [server] Received signal 15, shutting down
2018-10-04 07:40:58,803 ERROR   (MainThread) [server] Service failed (remote_service=<server.RemoteService object at 0x7f64fc426d50>, control_service=<server.ControlService object at 0x7f64fc303190>, running=False)
Traceback (most recent call last):
  File "/opt/ovirt-imageio/daemon/server.py", line 75, in main
    stop()
  File "/opt/ovirt-imageio/daemon/server.py", line 112, in stop
    imageio_server.stop()
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 106, in stop
    remote_service.stop()
AttributeError: 'NoneType' object has no attribute 'stop'

Also, in 1.4.4 the stop() call in main is on line 65:

 64         finally:
 65             stop() 
 66         log.info("Stopped")

So it seems that you are not running release 1.4.4.
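To illustrate the failure mode in the traceback: calling .stop() on a service attribute that was never initialized raises exactly this AttributeError. The class and attribute names below mirror the log, but this is an illustrative sketch, not the actual ovirt-imageio code.

```python
class Server:
    def __init__(self):
        # Set later, once the service actually starts; None until then.
        self.remote_service = None

    def stop(self):
        # Unguarded: raises AttributeError if the service never started,
        # matching "'NoneType' object has no attribute 'stop'" in the log.
        self.remote_service.stop()

    def stop_guarded(self):
        # Defensive variant: a None check makes shutdown idempotent.
        if self.remote_service is not None:
            self.remote_service.stop()
```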

Comment 16 Nir Soffer 2018-10-31 19:47:55 UTC
(In reply to Pratik Narode from comment #7)
> The sequence of operations I performed was:
> 1. Created a VM snapshot using python-sdk.
> 2. Created a transfer request for the snapshotted disk using python-sdk.
> 3. Did the data transfer.
> 4. Finalized the disk transfer.
> 
> Disk Name: Win2k8_Disk1
> Disk Id: e00d4b57-e967-4cd9-8226-271367431a23
> Image Id: 0cb92b72-f59f-4689-ba08-41048a547c7d
> TicketId: d57d8ae1-ceb8-4b42-96a0-4280a2d32cc6
> In the engine logs, you will find the entry for the snapshot on line 21.
> The transfer request is registered on line 69,
> and the transfer errors start on line 115 and continue after that.

Pratik, the daemon log shows that you did not do any transfer, so the system 
probably timed out and paused the transfer. This can explain why finalize 
kept the disk locked.

2018-10-26 03:22:43,154 INFO    (Thread-1) [tickets] [local] ADD ticket={u'uuid': u'd57d8ae1-ceb8-4b42-96a0-4280a2d32cc6', u'ops': [u'read'], u'url': u'file:///rhev/data-center/mnt/blockSD/ff5b243d-ecd6-4cc9-9eaa-933f54be592a/images/e00d4b57-e967-4cd9-8226-271367431a23/0cb92b72-f59f-4689-ba08-41048a547c7d', u'sparse': False, u'timeout': 300, u'size': 1073741824}
...
2018-10-26 03:23:47,152 INFO    (Thread-23) [tickets] [local] EXTEND timeout=300 ticket=d57d8ae1-ceb8-4b42-96a0-4280a2d32cc6
...
2018-10-26 03:24:07,332 INFO    (Thread-29) [tickets] [local] REMOVE ticket=d57d8ae1-ceb8-4b42-96a0-4280a2d32cc6

Additionally, the log line:

    [tasks] Retrieving task cc8cc08f-f14a-4f85-9d60-1a4a361a2499

is not logged by any version of ovirt-imageio. Maybe you are running
a modified version [1] by mistake?

It is fine to adapt ovirt-imageio as you need, as long as you keep 
the GPL licence terms, but we cannot support modified code.

Please reproduce with official ovirt released rpm if you want us to
investigate this issue.

[1] https://github.com/PNarode/ovirt-imageio-1
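Given the analysis above, a defensive pattern on the client side would be to check the transfer phase before finalizing: if the system has already paused or cancelled the transfer (for example on an inactivity timeout), finalize() will not unlock the disk. This is a minimal sketch under assumptions: `transfer_service` is assumed to behave like the SDK's ImageTransferService, and the phase strings stand in for the SDK's ImageTransferPhase enum values.

```python
def finalize_if_active(transfer_service, active_phases=("transferring",)):
    """Finalize only when the transfer is still in an active phase.

    Returns True if finalize() was called, False if the transfer was
    already paused/cancelled and needs operator attention instead.
    """
    transfer = transfer_service.get()
    if transfer.phase in active_phases:
        transfer_service.finalize()
        return True
    return False
```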

Comment 17 Pratik Narode 2018-11-01 07:07:46 UTC
Hello Nir,

Yup, I have customized the code for ovirt-imageio. I will try to reproduce this with the official oVirt released RPM. If it occurs again, or if I am able to reproduce it, I will log the issue again.

Thanks.

Comment 18 Sandro Bonazzola 2019-01-28 09:44:43 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 22 Lukas Svaty 2020-03-26 09:40:02 UTC
I don't believe this flow is used anymore, removing.

Comment 23 Michal Skrivanek 2020-03-26 09:48:35 UTC
(In reply to Pratik Narode from comment #17)
> Hello Nir,
> 
> Yup, I have customized the code for ovirt-imageio. I will try to reproduce
> this with the official oVirt released RPM. If it occurs again, or if I am
> able to reproduce it, I will log the issue again.
> 
> Thanks.

Thanks, closing until then.