Bug 1326900
Summary: OVirt remove single disk snapshot. Task does not end

| Field | Value |
|---|---|
| Product | [oVirt] ovirt-engine |
| Component | Backend.Core |
| Status | CLOSED WORKSFORME |
| Severity | high |
| Priority | unspecified |
| Version | 3.6.4.1 |
| Target Milestone | ovirt-3.6.7 |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Markus Stockhausen <mst> |
| Assignee | Ala Hino <ahino> |
| QA Contact | Aharon Canan <acanan> |
| CC | ahino, amureini, bugs, mst, nsoffer, tnisan, ylavi |
| Flags | tnisan: ovirt-3.6.z? rule-engine: planning_ack? rule-engine: devel_ack? rule-engine: testing_ack? |
| Doc Type | Bug Fix |
| Story Points | --- |
| Last Closed | 2016-05-29 12:06:35 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | Storage |
| Cloudforms Team | --- |
Description (Markus Stockhausen, 2016-04-13 17:32:01 UTC)

Created attachment 1146920 [details]: snap2
Created attachment 1146921 [details]: snap1
Markus, can you attach engine and vdsm logs?

Created attachment 1146924 [details]: engine.log
Created attachment 1146926 [details]: vdsm.log

Et voila.

In the vdsm log we see:

1. We start a live merge:

```
jsonrpc.Executor/1::DEBUG::2016-04-13 19:12:00,527::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.merge' in bridge with {u'topVolUUID': u'a5acfb9f-f774-4f05-a527-d6dd4250ab95', u'vmID': u'0a25649f-97fb-4ced-bbf8-e03159ceb158', u'drive': {u'domainID': u'c86d8409-4dd6-4e30-86dc-b5175a7ceb86', u'volumeID': u'a5acfb9f-f774-4f05-a527-d6dd4250ab95', u'poolID': u'94ed7a19-fade-4bd6-83f2-2cbb2f730b95', u'imageID': u'30338c17-6644-4f9d-9277-2b5fbbface04'}, u'bandwidth': u'0', u'jobUUID': u'c8045708-c52f-453e-989a-60d79d10d108', u'baseVolUUID': u'691778c3-d3af-4263-aea4-fa9556ef8986'}
```

2. The live merge starts without an error:

```
jsonrpc.Executor/1::DEBUG::2016-04-13 19:12:02,118::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'VM.merge' in bridge with True
```

3. Vdsm fails to connect to the qemu channel:

```
VM Channels Listener::DEBUG::2016-04-13 19:12:07,474::guestagent::225::virt.vm::(_connect) vmId=`dd04fad6-69dc-4a76-a0f4-b627cf8dbeb2`::Attempting connection to /var/lib/libvirt/qemu/channels/dd04fad6-69dc-4a76-a0f4-b627cf8dbeb2.com.redhat.rhevm.vdsm
VM Channels Listener::DEBUG::2016-04-13 19:12:07,475::guestagent::239::virt.vm::(_connect) vmId=`dd04fad6-69dc-4a76-a0f4-b627cf8dbeb2`::Failed to connect to /var/lib/libvirt/qemu/channels/dd04fad6-69dc-4a76-a0f4-b627cf8dbeb2.com.redhat.rhevm.vdsm with 11
```

Error 11 is EAGAIN. This error is expected when using non-blocking I/O, so this looks like a misleading log message rather than a real failure.

4. A huge amount of periodic workers is created and discarded:

```
vdsm.Scheduler::DEBUG::2016-04-13 19:12:17,095::executor::209::Executor::(_discard) Worker periodic/9891 discarded
vdsm.Scheduler::DEBUG::2016-04-13 19:12:17,095::executor::157::Executor::(__init__) Starting worker periodic/9900
periodic/9900::DEBUG::2016-04-13 19:12:17,096::executor::171::Executor::(_run) Worker started
periodic/9891::DEBUG::2016-04-13 19:12:19,780::executor::178::Executor::(_run) Worker was discarded
```

Typically there are 4 periodic threads; here we are starting thread 9900 and discarding thread 9891.

5. Error checking the drive watermark:

```
periodic/9894::WARNING::2016-04-13 19:12:20,858::periodic::258::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on [u'542affb0-7a2f-4297-84f4-728a315a6abd']
```

Generally, on the vdsm side these look like issues with libvirt and qemu, expected when we consider bug 1319400. I guess that libvirt is not starting the block job because qemu is not responding, and this fools the engine into thinking that the block job has completed while it has not started yet. This needs deeper inspection on the engine side. Ala, can you take a look?

Regarding whether the block job starts or not: from our system metrics we can see that qemu indeed starts the live merge and runs disk IO. See the attached picture.

Created attachment 1146943 [details]: live merge disk write
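To illustrate why the "Failed to connect ... with 11" DEBUG line above is harmless: on Linux, errno 11 is EAGAIN, which a non-blocking `connect()` on a UNIX socket returns when the connection cannot be completed immediately. A minimal sketch (the socket path is made up for illustration, not the real vdsm channel path):

```python
import errno
import socket

def try_connect(path):
    """Attempt one non-blocking connect to a UNIX socket.

    Returns 0 on success, otherwise the errno of the failure --
    roughly what vdsm's channel listener sees in _connect().
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.setblocking(False)  # non-blocking, like the channel listener
    try:
        sock.connect(path)
        return 0
    except OSError as e:
        # errno.EAGAIN (11 on Linux) means "try again later",
        # not a hard error -- the listener simply retries.
        return e.errno
    finally:
        sock.close()

# On Linux, error code 11 is EAGAIN ("Resource temporarily unavailable").
print(errno.errorcode[errno.EAGAIN])
```

A connect attempt that hits a busy peer's accept queue returns EAGAIN and is retried on the next poll cycle; only persistent failures would indicate a real problem with the guest agent channel.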
Second pass at the vdsm log shows that the merge succeeded:

1. Starting the merge:

```
jsonrpc.Executor/1::DEBUG::2016-04-13 19:12:00,527::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.merge' in bridge with {u'topVolUUID': u'a5acfb9f-f774-4f05-a527-d6dd4250ab95', u'vmID': u'0a25649f-97fb-4ced-bbf8-e03159ceb158', u'drive': {u'domainID': u'c86d8409-4dd6-4e30-86dc-b5175a7ceb86', u'volumeID': u'a5acfb9f-f774-4f05-a527-d6dd4250ab95', u'poolID': u'94ed7a19-fade-4bd6-83f2-2cbb2f730b95', u'imageID': u'30338c17-6644-4f9d-9277-2b5fbbface04'}, u'bandwidth': u'0', u'jobUUID': u'c8045708-c52f-453e-989a-60d79d10d108', u'baseVolUUID': u'691778c3-d3af-4263-aea4-fa9556ef8986'}
```

2. Starting the block job:

```
jsonrpc.Executor/1::INFO::2016-04-13 19:12:00,631::vm::4874::virt.vm::(merge) vmId=`0a25649f-97fb-4ced-bbf8-e03159ceb158`::Starting merge with jobUUID='c8045708-c52f-453e-989a-60d79d10d108'
```

3. Merge finished, trying to pivot to the merged image:

```
Thread-46628::INFO::2016-04-13 19:14:35,854::vm::5159::virt.vm::(tryPivot) vmId=`0a25649f-97fb-4ced-bbf8-e03159ceb158`::Requesting pivot to complete active layer commit (job c8045708-c52f-453e-989a-60d79d10d108)
```

4. Pivot completed, using the new merged image:

```
Thread-46628::INFO::2016-04-13 19:14:35,897::vm::5172::virt.vm::(tryPivot) vmId=`0a25649f-97fb-4ced-bbf8-e03159ceb158`::Pivot completed (job c8045708-c52f-453e-989a-60d79d10d108)
```

5. Synchronizing the volume chain:

```
Thread-46628::INFO::2016-04-13 19:14:35,897::vm::5195::virt.vm::(run) vmId=`0a25649f-97fb-4ced-bbf8-e03159ceb158`::Synchronizing volume chain after live merge (job c8045708-c52f-453e-989a-60d79d10d108)
...
Thread-46628::INFO::2016-04-13 19:14:36,001::vm::5201::virt.vm::(run) vmId=`0a25649f-97fb-4ced-bbf8-e03159ceb158`::Synchronization completed (job c8045708-c52f-453e-989a-60d79d10d108)
```

So this is an issue on the engine side.

We made a lot of fixes in ovirt 3.6.7 to address all the live merge issues we had. Can you test and see if this resolves your issues?
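The five steps above follow the standard active-layer live-merge sequence: start the block commit, wait for qemu to report the job ready, pivot the VM to the merged image, then synchronize the volume chain. A minimal sketch of that control flow, using hypothetical helper names (this is not the real vdsm API, just the ordering visible in the log):

```python
def live_merge(vm, job_id, top, base):
    """Drive an active-layer commit through its phases.

    The phase ordering mirrors the log above: merge -> pivot -> sync.
    All vm.* methods are hypothetical stand-ins for illustration.
    """
    vm.start_commit(job_id, top, base)   # step 1-2: VM.merge starts the block job
    while not vm.job_ready(job_id):      # qemu mirrors data until the job is ready
        pass
    vm.pivot(job_id)                     # step 3-4: switch to the merged image
    vm.sync_volume_chain()               # step 5: drop merged volume from the chain
    return "done"


class FakeVM:
    """Tiny stand-in recording the order of operations."""

    def __init__(self):
        self.events = []
        self._polls = 0

    def start_commit(self, job_id, top, base):
        self.events.append("commit")

    def job_ready(self, job_id):
        self._polls += 1
        return self._polls >= 3  # pretend qemu finishes after 3 polls

    def pivot(self, job_id):
        self.events.append("pivot")

    def sync_volume_chain(self):
        self.events.append("sync")
```

The engine-side bug discussed here is precisely about tracking which of these phases the job is in: if the engine decides the job is gone before the pivot and sync have been reflected in its database, the task never ends even though vdsm completed all five steps.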
Waiting for a response from the submitter to see if the issue is fixed in ovirt 3.6.7.

(In reply to Ala Hino from comment #12)
> Waiting for response from submitter to see if issue fixed in ovirt 3.6.7

All the relevant fixes [that we know about] were included in 3.6.6.

We can't reproduce this on the latest 3.6.6 - closing. If you're able to reproduce it, please reopen and attach the steps to reproduce and updated logs.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.