Bug 1326900
| Summary: | OVirt remove single disk snapshot. Task does not end | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Markus Stockhausen <mst> |
| Component: | Backend.Core | Assignee: | Ala Hino <ahino> |
| Status: | CLOSED WORKSFORME | QA Contact: | Aharon Canan <acanan> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.6.4.1 | CC: | ahino, amureini, bugs, mst, nsoffer, tnisan, ylavi |
| Target Milestone: | ovirt-3.6.7 | Flags: | tnisan: ovirt-3.6.z? rule-engine: planning_ack? rule-engine: devel_ack? rule-engine: testing_ack? |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-05-29 12:06:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description (Markus Stockhausen, 2016-04-13 17:32:01 UTC)
Created attachment 1146920 [details]
snap2
Created attachment 1146921 [details]
snap1
Markus, can you attach engine and vdsm logs?

Created attachment 1146924 [details]
engine.log
Created attachment 1146926 [details]
vdsm.log
Et voilà.

In the vdsm log we see:
1. We start a live merge
jsonrpc.Executor/1::DEBUG::2016-04-13 19:12:00,527::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.merge' in bridge with {u'topVolUUID': u'a5acfb9f-f774-4
f05-a527-d6dd4250ab95', u'vmID': u'0a25649f-97fb-4ced-bbf8-e03159ceb158', u'drive': {u'domainID': u'c86d8409-4dd6-4e30-86dc-b5175a7ceb86', u'volumeID': u'a5acfb9f-f774-4f05-
a527-d6dd4250ab95', u'poolID': u'94ed7a19-fade-4bd6-83f2-2cbb2f730b95', u'imageID': u'30338c17-6644-4f9d-9277-2b5fbbface04'}, u'bandwidth': u'0', u'jobUUID': u'c8045708-c52f
-453e-989a-60d79d10d108', u'baseVolUUID': u'691778c3-d3af-4263-aea4-fa9556ef8986'}
2. Live merge starts without an error
jsonrpc.Executor/1::DEBUG::2016-04-13 19:12:02,118::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'VM.merge' in bridge with True
3. Vdsm fails to connect to the guest agent channel
VM Channels Listener::DEBUG::2016-04-13 19:12:07,474::guestagent::225::virt.vm::(_connect) vmId=`dd04fad6-69dc-4a76-a0f4-b627cf8dbeb2`::Attempting connection to /var/lib/libvirt/qemu/channels/dd04fad6-69dc-4a76-a0f4-b627cf8dbeb2.com.redhat.rhevm.vdsm
VM Channels Listener::DEBUG::2016-04-13 19:12:07,475::guestagent::239::virt.vm::(_connect) vmId=`dd04fad6-69dc-4a76-a0f4-b627cf8dbeb2`::Failed to connect to /var/lib/libvirt/qemu/channels/dd04fad6-69dc-4a76-a0f4-b627cf8dbeb2.com.redhat.rhevm.vdsm with 11
This is EAGAIN (errno 11), which is expected when using non-blocking I/O;
the "Failed to connect" message looks like a misleading log rather than a real failure.
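As an aside, errno 11 (EAGAIN) is the normal "would block, retry later" result on a non-blocking file descriptor. A minimal, self-contained demonstration with plain Python sockets (illustrative only, not vdsm code):

```python
import errno
import socket

# A connected pair of UNIX stream sockets; nothing has been written yet.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
a.setblocking(False)  # non-blocking, like vdsm's guest agent channel

try:
    a.recv(1)  # no data available -> does not block, raises instead
except BlockingIOError as e:
    # EAGAIN means "would block"; the event loop should simply retry,
    # so logging it as "Failed to connect ... with 11" is misleading.
    print(e.errno == errno.EAGAIN)  # True
```

The point is that errno 11 on a non-blocking channel is an expected transient condition, not a connection failure.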
4. A huge number of periodic workers are created and discarded
vdsm.Scheduler::DEBUG::2016-04-13 19:12:17,095::executor::209::Executor::(_discard) Worker periodic/9891 discarded
vdsm.Scheduler::DEBUG::2016-04-13 19:12:17,095::executor::157::Executor::(__init__) Starting worker periodic/9900
periodic/9900::DEBUG::2016-04-13 19:12:17,096::executor::171::Executor::(_run) Worker started
periodic/9891::DEBUG::2016-04-13 19:12:19,780::executor::178::Executor::(_run) Worker was discarded
Typically there are only 4 periodic threads; here we are already starting
worker 9900 and discarding worker 9891.
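The climbing worker numbers follow from the executor's discard-and-replace policy: a blocked worker is abandoned (it just logs "Worker was discarded" when it eventually returns) and a fresh worker with the next sequential name takes its place, so every stall bumps the counter. A simplified sketch of that bookkeeping (hypothetical class, not vdsm's actual Executor):

```python
import itertools

class MiniExecutor:
    """Illustrative discard-and-replace bookkeeping, not vdsm's code."""

    def __init__(self, name="periodic"):
        self._name = name
        self._ids = itertools.count()
        self.log = []

    def start_worker(self):
        worker = "%s/%d" % (self._name, next(self._ids))
        self.log.append("Starting worker %s" % worker)
        return worker

    def discard(self, worker):
        # The stuck worker is abandoned and a replacement is started;
        # every discard therefore increments the worker counter by one.
        self.log.append("Worker %s discarded" % worker)
        return self.start_worker()

ex = MiniExecutor()
w = ex.start_worker()   # periodic/0
for _ in range(3):      # three stuck workers in a row
    w = ex.discard(w)
print(w)                # periodic/3
```

Worker numbers near 9900 thus suggest thousands of such stalls, consistent with libvirt/qemu being unresponsive.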
5. Error checking drive watermark
periodic/9894::WARNING::2016-04-13 19:12:20,858::periodic::258::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on [u'542affb0-7a2f-4297-84f4-728a315a6abd']
Generally, on the vdsm side this looks like issues with libvirt and qemu,
which is expected when we consider bug 1319400.
My guess is that libvirt is not starting the block job because qemu is not
responding, and this fools the engine into thinking that the block job has
completed when it has not started yet.
Needs deeper inspection on the engine side. Ala, can you take a look?
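The hypothesized failure mode can be stated as a small decision function: when polling libvirt, an absent block job is indistinguishable from a finished one unless there is independent confirmation that the job ever ran. A hypothetical illustration (not the engine's actual code):

```python
def block_job_state(active_jobs, job_id, start_confirmed):
    """Illustrative only: how an absent job can be misread as done.

    active_jobs:     job ids libvirt currently reports as running
    job_id:          the merge job being tracked
    start_confirmed: whether the job was ever observed running
    """
    if job_id in active_jobs:
        return "running"
    # The job is absent from libvirt's list. Without confirmation that
    # it ever started, "completed" is a guess -- the hypothesis above
    # is that the engine makes exactly this guess.
    return "completed" if start_confirmed else "unknown"

# Unresponsive qemu: the job never appears and no start was confirmed.
print(block_job_state(set(), "c8045708", start_confirmed=False))  # unknown
```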
Regarding whether the block job starts or not: from our system metrics we can
see that qemu does indeed start the live merge and performs disk I/O. See the
attached picture.

Created attachment 1146943 [details]
live merge disk write
A second pass at the vdsm log shows that the merge succeeded:
1. Starting merge
jsonrpc.Executor/1::DEBUG::2016-04-13 19:12:00,527::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.merge' in bridge with {u'topVolUUID': u'a5acfb9f-f774-4
f05-a527-d6dd4250ab95', u'vmID': u'0a25649f-97fb-4ced-bbf8-e03159ceb158', u'drive': {u'domainID': u'c86d8409-4dd6-4e30-86dc-b5175a7ceb86', u'volumeID': u'a5acfb9f-f774-4f05-
a527-d6dd4250ab95', u'poolID': u'94ed7a19-fade-4bd6-83f2-2cbb2f730b95', u'imageID': u'30338c17-6644-4f9d-9277-2b5fbbface04'}, u'bandwidth': u'0', u'jobUUID': u'c8045708-c52f
-453e-989a-60d79d10d108', u'baseVolUUID': u'691778c3-d3af-4263-aea4-fa9556ef8986'}
2. Starting block job
jsonrpc.Executor/1::INFO::2016-04-13 19:12:00,631::vm::4874::virt.vm::(merge) vmId=`0a25649f-97fb-4ced-bbf8-e03159ceb158`::Starting merge with jobUUID='c8045708-c52f-453e-989a-60d79d10d108'
3. Merge finished, trying to pivot to the merged image
Thread-46628::INFO::2016-04-13 19:14:35,854::vm::5159::virt.vm::(tryPivot) vmId=`0a25649f-97fb-4ced-bbf8-e03159ceb158`::Requesting pivot to complete active layer commit (job c8045708-c52f-453e-989a-60d79d10d108)
4. Pivot completed, using new merged image
Thread-46628::INFO::2016-04-13 19:14:35,897::vm::5172::virt.vm::(tryPivot) vmId=`0a25649f-97fb-4ced-bbf8-e03159ceb158`::Pivot completed (job c8045708-c52f-453e-989a-60d79d10d108)
5. Synchronizing volume chain
Thread-46628::INFO::2016-04-13 19:14:35,897::vm::5195::virt.vm::(run) vmId=`0a25649f-97fb-4ced-bbf8-e03159ceb158`::Synchronizing volume chain after live merge (job c8045708-c52f-453e-989a-60d79d10d108)
...
Thread-46628::INFO::2016-04-13 19:14:36,001::vm::5201::virt.vm::(run) vmId=`0a25649f-97fb-4ced-bbf8-e03159ceb158`::Synchronization completed (job c8045708-c52f-453e-989a-60d79d10d108)
So this is an issue on the engine side.
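The successful sequence above reduces to a linear phase progression per job UUID; a compact sketch of that progression (phase names are illustrative, not vdsm's actual states):

```python
# Phases observed in the vdsm log, in order, for one live merge job.
PHASES = ("merge requested", "block job started", "pivot requested",
          "pivot completed", "chain synchronized")

class LiveMergeJob:
    """Illustrative tracker for the phases seen in the vdsm log."""

    def __init__(self, job_uuid):
        self.job_uuid = job_uuid
        self._idx = -1  # no phase entered yet

    @property
    def phase(self):
        return None if self._idx < 0 else PHASES[self._idx]

    def advance(self):
        if self._idx + 1 >= len(PHASES):
            raise RuntimeError("job %s already finished" % self.job_uuid)
        self._idx += 1
        return self.phase

job = LiveMergeJob("c8045708-c52f-453e-989a-60d79d10d108")
while job.phase != "chain synchronized":
    job.advance()
print(job.phase)  # chain synchronized
```

On the vdsm side every phase completed, which is why the stuck task points at the engine's tracking of the job rather than at the merge itself.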
We made a lot of fixes in oVirt 3.6.7 to address all the known live merge
issues. Can you test and see if this resolves your issue?

Waiting for a response from the submitter to see if the issue is fixed in
oVirt 3.6.7.

(In reply to Ala Hino from comment #12)
> Waiting for response from submitter to see if issue fixed in ovirt 3.6.7

All the relevant fixes [that we know about] were included in 3.6.6.

We can't reproduce this on the latest 3.6.6, so we are closing the bug. If
you're able to reproduce it, please reopen and attach the steps to reproduce
and updated logs.

The needinfo request[s] on this closed bug have been removed as they have
been unresolved for 1000 days.