Bug 1394538

Summary: LiveMerge: block-commit hangs at 100%
Product: [oVirt] vdsm
Reporter: Markus Stockhausen <mst>
Component: Core
Assignee: Ala Hino <ahino>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Raz Tamir <ratamir>
Severity: high
Docs Contact:
Priority: high
Version: 4.17.28
CC: ahino, bugs, michal.skrivanek, mst, tnisan, ylavi
Target Milestone: ovirt-4.1.1
Keywords: ZStream
Target Release: ---
Flags: ahino: needinfo? (mst), rule-engine: ovirt-4.1+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1427180
Environment:
Last Closed: 2017-02-27 14:43:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1427180    
Attachments:
Description    Flags
VDM log        none
engine log     none

Description Markus Stockhausen 2016-11-13 08:25:20 UTC
Description of problem:

We tried to free a storage domain in oVirt 4.0.4, so we used Live Storage Migration heavily. After ~20 successful executions, the 21st failed in the last step of the flow, "delete auto-generated snapshot". The step is initiated but never finishes.

Version-Release number of selected component (if applicable):

oVirt 4.0.4 (current release)
vdsm 4.17.28 (last CentOS release from 3.6.7)

How reproducible:

5%

Steps to Reproduce:
1. Perform a Live Storage Migration.

Actual results:

Task is started in engine but never finishes

Expected results:

Task should finish.

Additional info:

Engine & VDSM logs attached.

Comment 1 Markus Stockhausen 2016-11-13 08:27:41 UTC
Created attachment 1220124
VDM log

Comment 2 Markus Stockhausen 2016-11-13 08:28:10 UTC
Created attachment 1220125
engine log

Comment 3 Markus Stockhausen 2016-11-13 08:29:01 UTC
The live migrated disk that fails in the logs is "colvm21_Disk1"

Comment 4 Markus Stockhausen 2016-11-13 18:23:36 UTC
According to virsh, qemu is still using the snapshot:

virsh # domblklist colvm21
Target     Source
------------------------------------------------
...
vda        /rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/47202573-6e83-42fd-a274-d11f05eca2dd/images/c7590ac7-8ba6-4738-9909-0267c21a056c/4f041c43-74a5-4538-a171-84507e0fdc17
...

qemu-img info 4f041c43-74a5-4538-a171-84507e0fdc17
image: 4f041c43-74a5-4538-a171-84507e0fdc17
file format: qcow2
virtual size: 35G (37580963840 bytes)
disk size: 971M
cluster_size: 65536
backing file: ../c7590ac7-8ba6-4738-9909-0267c21a056c/6471d8f5-4f6d-419a-8432-7ec2c0dc46ae
backing file format: raw
Format specific information:
    compat: 0.10
    refcount bits: 16
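
The check above can also be scripted. The following is only a minimal sketch, assuming the human-readable `qemu-img info` format shown in this comment; the helper names `parse_qemu_img_info` and `has_backing_file` are hypothetical and not part of vdsm or qemu. After a successful merge the active image would be expected to no longer reference the merged layer as its backing file.

```python
def parse_qemu_img_info(text):
    """Parse human-readable `qemu-img info` output into a dict of top-level fields."""
    info = {}
    for line in text.splitlines():
        # Indented lines belong to "Format specific information"; skip them here.
        if not line or line[0].isspace() or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info


def has_backing_file(text):
    """Return True if the image still reports a backing file."""
    return bool(parse_qemu_img_info(text).get("backing file"))


# Sample taken from the output quoted in this comment.
SAMPLE = """\
image: 4f041c43-74a5-4538-a171-84507e0fdc17
file format: qcow2
virtual size: 35G (37580963840 bytes)
disk size: 971M
cluster_size: 65536
backing file: ../c7590ac7-8ba6-4738-9909-0267c21a056c/6471d8f5-4f6d-419a-8432-7ec2c0dc46ae
backing file format: raw
Format specific information:
    compat: 0.10
    refcount bits: 16
"""

if __name__ == "__main__":
    print(has_backing_file(SAMPLE))
```

For the image in this report the check is true, which matches the observation that the snapshot layer is still in use.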

Comment 5 Markus Stockhausen 2016-11-13 18:35:37 UTC
According to virsh, the block-commit is still active at 100%:

virsh # blockjob colvm21 /rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/47202573-6e83-42fd-a274-d11f05eca2dd/images/c7590ac7-8ba6-4738-9909-0267c21a056c/4f041c43-74a5-4538-a171-84507e0fdc17 --info
Active Block Commit: [100 %]

The second VM disk shows no running block job:

virsh # blockjob colvm21 /rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/e74a14d7-3400-47d3-901b-b64e3a9118a4/7aac8b54-dad9-4e47-89fa-68b769ea5435 --info
No current block job for /rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/e74a14d7-3400-47d3-901b-b64e3a9118a4/7aac8b54-dad9-4e47-89fa-68b769ea5435
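
For anyone who wants to watch for this state automatically, here is a rough sketch that classifies the two kinds of `virsh blockjob --info` output quoted above. It assumes the English output format seen in this comment; the function names are hypothetical. As far as libvirt documents it, an active block commit does not end on its own once it reaches 100%: it stays in a synchronized state until something issues a pivot or an abort (for example `virsh blockjob <domain> <path> --pivot`). Whether the missing pivot is the cause of this bug is not established here.

```python
import re

# Matches lines such as "Active Block Commit: [100 %]".
_JOB_RE = re.compile(r"^(?P<kind>[A-Za-z ]+):\s*\[\s*(?P<pct>\d+)\s*%\s*\]")


def parse_blockjob_info(text):
    """Return (job_kind, percent) for a running job, or None if no job is active."""
    text = text.strip()
    if text.startswith("No current block job"):
        return None
    match = _JOB_RE.match(text)
    if match is None:
        raise ValueError("unrecognized blockjob output: %r" % text)
    return match.group("kind").strip(), int(match.group("pct"))


def ready_for_pivot(job):
    """True if the job looks like an active commit that has reached 100%."""
    return job is not None and job == ("Active Block Commit", 100)


if __name__ == "__main__":
    job = parse_blockjob_info("Active Block Commit: [100 %]")
    print(job, ready_for_pivot(job))
```

A job that stays in the `ready_for_pivot` state for a long time, as on the first disk here, is the symptom described in this report; the second disk would parse to `None`.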

Comment 6 Markus Stockhausen 2016-11-13 18:39:37 UTC
qemu version is 2.3.0-31.el7_2.10.1

Comment 7 Markus Stockhausen 2016-11-13 18:40:54 UTC
libvirtd version is 1.2.17-13.el7_2.4

Comment 8 Ala Hino 2017-01-02 08:17:46 UTC
Hi Markus,

As long as the block job runs, the corresponding task will not finish. This is expected. The question here is why the block job seems not to end.

Was there heavy I/O on that specific disk during the migration?
Could this be similar to BZ 1376580, which you also reported?

Comment 9 Sandro Bonazzola 2017-01-25 07:56:59 UTC
4.0.6 was the last oVirt 4.0 release; please re-target this bug.

Comment 11 Tal Nisan 2017-02-27 14:43:20 UTC
Closing, as there has been no new info for nearly 2 months. Feel free to reopen if new data arrives.