Bug 1622352
| Field | Value |
|---|---|
| Summary: | Disk remains in Lock State even though transfer_service.finalize() is called. |
| Product: | [oVirt] ovirt-engine |
| Reporter: | Pratik Narode <pratik.narode> |
| Component: | RestAPI |
| Assignee: | Daniel Erez <derez> |
| Status: | CLOSED NOTABUG |
| QA Contact: | Petr Matyáš <pmatyas> |
| Severity: | high |
| Priority: | medium |
| Version: | --- |
| CC: | bugs, dfediuck, juan.hernandez, lsvaty, michal.skrivanek, mperina, nsoffer, omachace, pratik.narode, tnisan |
| Target Milestone: | ovirt-4.4.1 |
| Target Release: | --- |
| Keywords: | Reopened |
| Flags: | pm-rhel: ovirt-4.4+ |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Doc Type: | If docs needed, set a value |
| Story Points: | --- |
| Last Closed: | 2020-03-26 09:48:35 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| oVirt Team: | Storage |
| Cloudforms Team: | --- |
Created attachment 1478841 [details]
Image showing vm in locked state
Also, sometimes even though the disk gets unlocked, the VM still remains in a locked state for too long, or sometimes even goes into a completely locked state. But then the DB query on ovirt-engine won't show any locked entities.
Created attachment 1478842 [details]
The db query output
@Pratik - can you please add engine/vdsm/imageio-proxy/imageio-daemon logs.

(In reply to Daniel Erez from comment #3)
> @Pratik - can you please add engine/vdsm/imageio-proxy/imageio-daemon logs.

OK. This was about 2-3 days back, but I will try to get the logs for it if I can. Thanks.

Hi Pratik, I'm closing this for now as we don't have the necessary info. Please reopen if you hit the issue again and provide logs.

Created attachment 1498366 [details]
Engine log file
Hi Tal, I am reopening this bug because I have hit the same issue again. I have attached the engine logs captured when my disk entered the locked state. The sequence of operations I performed was:

1. Created a VM snapshot using the Python SDK.
2. Created a transfer request for the snapshotted disk using the Python SDK.
3. Did the data transfer.
4. Finalized the disk transfer.

Disk Name: Win2k8_Disk1
Disk Id: e00d4b57-e967-4cd9-8226-271367431a23
Image Id: 0cb92b72-f59f-4689-ba08-41048a547c7d
TicketId: d57d8ae1-ceb8-4b42-96a0-4280a2d32cc6

In the engine log, at line 21 you will find the entry for the snapshot. The transfer request is registered at line 69, the transfer error starts at line 115, and it continues after that.

Hi Pratik,
* Can you please attach vdsm/imageio-daemon/imageio-proxy logs?
* Also, if possible, attach the script used for downloading.
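As an aside on the four-step flow above: finalize() is asynchronous on the engine side, so a backup script normally polls the disk status after finalizing rather than assuming the lock clears immediately. A minimal polling sketch (not taken from this report; `get_disk_status` is a hypothetical callable standing in for something like `disk_service.get().status` in the Python SDK):

```python
import time


def wait_for_unlock(get_disk_status, timeout=180, interval=1.0):
    """Poll a status callable until the disk leaves the 'locked' state.

    get_disk_status: zero-argument callable returning a status string
    (e.g. 'locked', 'ok'). Returns the first non-locked status seen,
    or raises TimeoutError if the disk stays locked past `timeout`
    seconds -- which is the symptom reported in this bug.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_disk_status()
        if status != "locked":
            return status
        time.sleep(interval)
    raise TimeoutError("disk still locked after %s seconds" % timeout)
```

In a real script the callable would query the engine via the SDK; the helper only encapsulates the wait-with-deadline logic.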
daemon logs
Created attachment 1498408 [details]
vdsm logs
Created attachment 1498409 [details]
proxy logs
According to the daemon log [1], the daemon service occasionally shut down (signal 15). Did you manually restart/kill the service? Is there perhaps insufficient available space on the host? Is it possible to check whether the issue reproduces on a different host?
[1]
```
2018-10-25 08:42:01,331 INFO (MainThread) [server] Received signal 15, shutting down
2018-10-25 08:42:01,331 ERROR (MainThread) [server] Service failed (remote_service=<server.RemoteService object at 0x7f8e599eed10>, control_service=<server.ControlService object at 0x7f8e59884150>, running=False)
Traceback (most recent call last):
  File "/opt/ovirt-imageio/daemon/server.py", line 75, in main
    stop()
  File "/opt/ovirt-imageio/daemon/server.py", line 112, in stop
    imageio_server.stop()
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 106, in stop
    remote_service.stop()
AttributeError: 'NoneType' object has no attribute 'stop'
```
Hi Daniel, no, I don't think the issue is due to the daemon, because when I monitored the daemon log while the transfer was in progress, the transfer completed successfully and the ticket was gracefully removed. Also, this issue occurs on every host in my setup, where I have about 4 different hosts.

@Pratik - is the vdsm log from around '2018-10-26 03:23:38' still available? According to the engine log [1], it seems TeardownImage failed on vdsm. Also, the download failure seems to occur due to a timeout [2], so there is no need to call finalize() manually in this scenario. Is it possible to attach the script you're using for the download?
[1]
```
2018-10-26 03:23:38,746-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [ec25a082-5dbc-4103-b1cb-f8b87c4791b1] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host4.triliodata.demo command TeardownImageVDS failed: Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\' Logical volume ff5b243d-ecd6-4cc9-9eaa-933f54be592a/9776e93c-eb85-4965-830d-597900046d9b in use.\', \' Logical volume ff5b243d-ecd6-4cc9-9eaa-933f54be592a/b47d15fd-0a08-45d2-9128-e0591ed1bca3 in use.\', \' Logical volume ff5b243d-ecd6-4cc9-9eaa-933f54be592a/aa3af403-a2af-49ea-98c9-4e9fe680a9c3 in use.\', \' Logical volume ff5b243d-ecd6-4cc9-9eaa-933f54be592a/9aec6c03-1324-4da3-8c17-ad9c693fbef4 in use.\']\\nff5b243d-ecd6-4cc9-9eaa-933f54be592a/[\'9776e93c-eb85-4965-830d-597900046d9b\', \'6aa99dab-bffc-49c3-ace9-e9d5a6df36f6\', \'0cb92b72-f59f-4689-ba08-41048a547c7d\', \'b47d15fd-0a08-45d2-9128-e0591ed1bca3\', \'aa3af403-a2af-49ea-98c9-4e9fe680a9c3\', \'9aec6c03-1324-4da3-8c17-ad9c693fbef4\', \'505e66cd-cf84-48fb-8411-de92010956a0\']",)',)
```

[2]
```
2018-10-26 03:22:26,573-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-76) [ec25a082-5dbc-4103-b1cb-f8b87c4791b1] EVENT_ID: DOWNLOAD_IMAGE_CANCELED_TIMEOUT(1,072), Download was canceled by system. Reason: timeout due to transfer inactivity.
```
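The inactivity timeout in [2] fires when the client stops reading for too long. A download loop that copies in fixed-size chunks until EOF keeps the transfer active. A generic sketch (an illustration, not a specific imageio API: `src_read` and `dst_write` are hypothetical stand-ins for the HTTP response's read method and the output file's write method):

```python
def copy_in_chunks(src_read, dst_write, chunk_size=8 * 1024 * 1024):
    """Copy from src_read to dst_write until EOF; return total bytes copied.

    Reading continuously in chunks keeps the transfer active, so the
    engine's inactivity timeout does not cancel it mid-download.
    """
    total = 0
    while True:
        chunk = src_read(chunk_size)
        if not chunk:  # empty read signals EOF
            return total
        dst_write(chunk)
        total += len(chunk)
```

If a client creates the transfer but never issues reads, as the daemon log below suggests happened here, the engine eventually cancels the transfer for inactivity regardless of whether finalize() is called.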
Pratik, are you running an unmodified 1.4.4 version?
Looking at the 1.4.4 code, I don't see how receiving signal 15 can cause this error:
```
2018-10-04 07:40:58,803 INFO (MainThread) [server] Received signal 15, shutting down
2018-10-04 07:40:58,803 ERROR (MainThread) [server] Service failed (remote_service=<server.RemoteService object at 0x7f64fc426d50>, control_service=<server.ControlService object at 0x7f64fc303190>, running=False)
Traceback (most recent call last):
  File "/opt/ovirt-imageio/daemon/server.py", line 75, in main
    stop()
  File "/opt/ovirt-imageio/daemon/server.py", line 112, in stop
    imageio_server.stop()
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 106, in stop
    remote_service.stop()
AttributeError: 'NoneType' object has no attribute 'stop'
```
Also, in 1.4.4 the stop() call in main is on line 65:

```
64     finally:
65         stop()
66         log.info("Stopped")
```
So it seems that you are not running release 1.4.4.
(In reply to Pratik Narode from comment #7)
> The sequence of operation I performed are:
> 1. Create a VM Snapshot using python-sdk.
> 2. Created a transfer request for the snapshotted disk using python-sdk
> 3. Did the data transfer.
> 4. Finalized the disk transfer.
>
> Disk Name: Win2k8_Disk1
> Disk Id: e00d4b57-e967-4cd9-8226-271367431a23
> Image Id: 0cb92b72-f59f-4689-ba08-41048a547c7d
> TicketId: d57d8ae1-ceb8-4b42-96a0-4280a2d32cc6
> In engine log's at line no: 21 you will find entry for snapshot.
> The transfer request is register on line no: 69
> And the transfer error starts on line no: 115
> And after that it continues.

Pratik, the daemon log shows that you did not do any transfer, so the system probably timed out and paused the transfer. This can explain why finalize kept the disk locked.

```
2018-10-26 03:22:43,154 INFO (Thread-1) [tickets] [local] ADD ticket={u'uuid': u'd57d8ae1-ceb8-4b42-96a0-4280a2d32cc6', u'ops': [u'read'], u'url': u'file:///rhev/data-center/mnt/blockSD/ff5b243d-ecd6-4cc9-9eaa-933f54be592a/images/e00d4b57-e967-4cd9-8226-271367431a23/0cb92b72-f59f-4689-ba08-41048a547c7d', u'sparse': False, u'timeout': 300, u'size': 1073741824}
...
2018-10-26 03:23:47,152 INFO (Thread-23) [tickets] [local] EXTEND timeout=300 ticket=d57d8ae1-ceb8-4b42-96a0-4280a2d32cc6
...
2018-10-26 03:24:07,332 INFO (Thread-29) [tickets] [local] REMOVE ticket=d57d8ae1-ceb8-4b42-96a0-4280a2d32cc6
```

Additionally, the log line:

```
[tasks] Retrieving task cc8cc08f-f14a-4f85-9d60-1a4a361a2499
```

is not logged by any version of ovirt-imageio. Maybe you are running a modified version by mistake [1]? It is fine to adapt ovirt-imageio as you need, as long as you keep the GPL license terms, but we cannot support modified code. Please reproduce with the official oVirt released rpm if you want us to investigate this issue.

[1] https://github.com/PNarode/ovirt-imageio-1

Hello Nir, yes, I have customized the code for ovirt-imageio. I will try to reproduce this with the official oVirt released rpm. If it occurs again or I am able to reproduce it, I will log the issue again. Thanks.

This bug has not been marked as a blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

I don't believe this flow is used anymore, removing.

(In reply to Pratik Narode from comment #17)
> Hello Nir,
>
> Yup I have customised the code for ovirt-imageio. Will try to reproduce this
> same with official ovirt released rpm. If it occurs again or am able to
> reproduces it then will log the issue again.
>
> Thanks.

Thanks, closing until then.
Created attachment 1478801 [details]
Image showing disk in locked state.

Description of problem:
I am using the oVirt Python SDK APIs for backing up virtual machines in oVirt. To do so, I create a disk transfer request using the transfer service. After the disk is transferred, and transfer_service.finalize() is called at the end to release all the resources, the disk still remains locked and its status is shown as finalized.

Actual results:
The disk stays in the locked state forever even though transfer_service.finalize() is called.

Expected results:
The disk should get unlocked.