Bug 1672699

Summary: [v2v] Cancelling a migration does not stop the creation of volumes, instances and network ports on OSP, or of VMs on RHV
Product: Red Hat CloudForms Management Engine
Reporter: Satoe Imaishi <simaishi>
Component: V2V
Assignee: Fabien Dupont <fdupont>
Status: CLOSED ERRATA
QA Contact: Yadnyawalk Tale <ytale>
Severity: high
Priority: urgent
Docs Contact: Red Hat CloudForms Documentation <cloudforms-docs>
Version: 5.10.0
CC: bthurber, dmetzger, fdupont, jhajyahy, tgolembi, ytale
Target Milestone: GA
Target Release: 5.10.1
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Whiteboard: v2v
Fixed In Version: 5.10.1.0
Doc Type: If docs needed, set a value
Clone Of: 1666799
Last Closed: 2019-03-06 09:50:42 UTC
Cloudforms Team: V2V
Bug Depends On: 1666799
Attachments:
- failed-cleanup-osp-wrapper.log (flags: none)
- failed_volume_osp.png (flags: none)

Comment 2 Satoe Imaishi 2019-02-05 23:22:49 UTC
Backported to the Hammer branch:

commit 0e878b781b1cfdf4b3cd5c35fbfb29578ad30679
Author: Adam Grare <agrare>
Date:   Fri Jan 18 14:37:51 2019 -0500

    Merge pull request #18372 from fdupont-redhat/v2v_fix_virtv2v_kill
    
    V2V - Collect virt-v2v PID from conversion host in kill_virtv2v
    
    (cherry picked from commit 526045d22faae29ad8d74e4625c9f09206d5f97e)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1672699

Comment 4 Yadnyawalk Tale 2019-02-11 11:44:54 UTC
Fixed; tested with RHV 4.2 and CFME 5.10.1.0.20190206171834_d399434.
We are able to cancel an ongoing migration from the UI, and it does stop the migration processes on the host after that.


```
2019-02-11 06:33:08,594:DEBUG: Updated progress: 2.03 (virt-v2v-wrapper:903)
2019-02-11 06:33:18,605:DEBUG: Updated progress: 3.05 (virt-v2v-wrapper:903)
2019-02-11 06:33:28,619:DEBUG: Updated progress: 4.07 (virt-v2v-wrapper:903)
2019-02-11 06:33:43,637:DEBUG: Updated progress: 6.08 (virt-v2v-wrapper:903)
2019-02-11 06:33:53,649:DEBUG: Updated progress: 7.13 (virt-v2v-wrapper:903)
2019-02-11 06:34:08,670:DEBUG: Updated progress: 9.14 (virt-v2v-wrapper:903)
2019-02-11 06:34:13,675:DEBUG: Updated progress: 10.16 (virt-v2v-wrapper:903)


2019-02-11 06:34:23,689:DEBUG: Updated progress: 11.20 (virt-v2v-wrapper:903)
2019-02-11 06:34:33,701:DEBUG: Updated progress: 12.21 (virt-v2v-wrapper:903)
2019-02-11 06:34:33,702:DEBUG: Updated progress: 13.24 (virt-v2v-wrapper:903)
2019-02-11 06:34:38,709:INFO: virt-v2v terminated with return code -15 (virt-v2v-wrapper:1056)
2019-02-11 06:34:38,714:DEBUG: Cleanup phase (virt-v2v-wrapper:1368)
2019-02-11 06:34:38,978:INFO: Canceling transfer id=762d2aeb-c3d4-4c57-a760-004c03f17b43 for disk=0234ae0a-45dc-4e97-8595-81b801e30c74 (virt-v2v-wrapper:468)
2019-02-11 06:34:39,001:INFO: Removing disks: [] (virt-v2v-wrapper:479)
2019-02-11 06:34:39,055:INFO: Removing password files (virt-v2v-wrapper:1376)
2019-02-11 06:34:39,059:INFO: Finished (virt-v2v-wrapper:1398)
```

Cancelled at 2019-02-11 06:34:13,675; after that, virt-v2v stopped the disk migration and the wrapper cleaned up.
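The "return code -15" in the log above is consistent with Python's `subprocess` convention, where a negative return code means the child was terminated by that signal number (15 is SIGTERM on Linux). Assuming the wrapper reports virt-v2v's exit status that way, a small helper like the hypothetical `describe_returncode` below decodes it; it is an illustration, not code from virt-v2v-wrapper.py.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Decode a subprocess-style return code (hypothetical helper).

    Python's subprocess module reports a child killed by signal N
    as returncode -N; non-negative values are ordinary exit statuses.
    """
    if returncode < 0:
        # Map the signal number back to its symbolic name, e.g. 15 -> SIGTERM.
        return f"terminated by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"
```

For example, `describe_returncode(-15)` gives "terminated by SIGTERM", matching a cancellation that sent TERM to virt-v2v.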

Comment 6 CFME Bot 2019-02-13 04:21:36 UTC
New commit detected on ManageIQ/manageiq/hammer:

https://github.com/ManageIQ/manageiq/commit/0e878b781b1cfdf4b3cd5c35fbfb29578ad30679
commit 0e878b781b1cfdf4b3cd5c35fbfb29578ad30679
Author:     Adam Grare <agrare>
AuthorDate: Fri Jan 18 14:37:51 2019 -0500
Commit:     Adam Grare <agrare>
CommitDate: Fri Jan 18 14:37:51 2019 -0500

    Merge pull request #18372 from fdupont-redhat/v2v_fix_virtv2v_kill

    V2V - Collect virt-v2v PID from conversion host in kill_virtv2v

    (cherry picked from commit 526045d22faae29ad8d74e4625c9f09206d5f97e)

    https://bugzilla.redhat.com/show_bug.cgi?id=1672699

 app/models/service_template_transformation_plan_task.rb | 5 +-
 spec/models/service_template_transformation_plan_task_spec.rb | 9 +-
 2 files changed, 4 insertions(+), 10 deletions(-)

Comment 7 Yadnyawalk Tale 2019-02-15 17:11:02 UTC
Created attachment 1535272 [details]
failed-cleanup-osp-wrapper.log

Not fixed for OSP: the instance and port are cleaned up, but the volumes are still there, which is not what we want. Please check the attached wrapper log and screenshot.

Checked with CFME 5.10.1.1.20190212171432_83eb777.
Used the latest brew image of the conversion appliance: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=839959

Comment 8 Yadnyawalk Tale 2019-02-15 17:12:02 UTC
Created attachment 1535273 [details]
failed_volume_osp.png

Checked on OSP 13 deployed on bare metal.

Comment 9 Fabien Dupont 2019-02-15 20:17:05 UTC
Could you also provide the virt-v2v log, please?

Comment 11 Tomáš Golembiovský 2019-02-16 22:39:11 UTC
The problem seems to be related to bug 1668049. Can you confirm you have the latest wrapper deployed?

Comment 12 Yadnyawalk Tale 2019-02-18 11:15:40 UTC
@Tomáš, I used https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=839959, which is the latest conversion appliance; it contains virt-v2v-1.38.2-12.28.lp.el7ev.x86_64. Can you confirm whether that is the latest?

I can see that BZ1668049 describes the issue with the same version of virt-v2v, and that the regex patch fixed it. The current virt-v2v-1.38.2-12.28 does contain that regex patch, but the migration is still failing. Don't you think that is suspicious? Any thoughts?

Comment 13 Fabien Dupont 2019-02-18 11:23:39 UTC
Tomas was asking for the version of the ovirt-ansible-v2v-conversion-host package. This fix is implemented in virt-v2v-wrapper.py, not in virt-v2v.

Comment 14 Yadnyawalk Tale 2019-02-18 11:29:59 UTC
Thanks for the correction, @Fabien. rhosp-v2v-appliance-14.0-20190205.4 contains ovirt-ansible-v2v-conversion-host-1.9.1-2.el7ev.noarch.

Comment 15 Fabien Dupont 2019-02-18 22:32:30 UTC
Thanks. It seems that is the latest build, which includes the regex patch.
@Tomas, any idea?

Comment 16 Tomáš Golembiovský 2019-02-19 14:36:10 UTC
After inspection I have identified two issues:

- The wrapper collects volume IDs only after the conversion, which is of course too late if we need to clean them up. I will modify the wrapper to collect the IDs as soon as possible. That alone should fix this bug, but there is another issue:

- virt-v2v should clean up the volumes itself; at the very least, it has code to do that. At first I assumed that code was buggy, but on a second look the problem may be in how we kill virt-v2v to abort the conversion. In the virt-v2v log attached here I don't see any of the cleanup tasks virt-v2v normally performs; rather, it seems virt-v2v terminated immediately. Was there some change in how we cancel the conversion in CFME?
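The first issue (volume IDs collected too late for cleanup) can be illustrated with a minimal sketch. `ConversionState`, `record_volume` and `cleanup` are hypothetical names that do not reflect the actual virt-v2v-wrapper.py code, and `delete_volume` stands in for whatever OSP call actually removes a volume.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ConversionState:
    """Hypothetical per-conversion state; not the wrapper's real data structure."""
    volume_ids: List[str] = field(default_factory=list)
    finished: bool = False


def record_volume(state: ConversionState, volume_id: str) -> None:
    # Record each volume ID as soon as it is known (during the conversion,
    # not after it), so a cancelled run still knows what to remove.
    if volume_id not in state.volume_ids:
        state.volume_ids.append(volume_id)


def cleanup(state: ConversionState, delete_volume: Callable[[str], None]) -> List[str]:
    # Only a failed or cancelled conversion should remove its volumes.
    if state.finished:
        return []
    removed = []
    for volume_id in state.volume_ids:
        delete_volume(volume_id)
        removed.append(volume_id)
    return removed
```

If the IDs were only recorded after a successful conversion, `state.volume_ids` would still be empty when a cancellation triggers `cleanup`, so nothing would be removed, which is consistent with the leftover volumes reported in Comment 7.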

Comment 17 Fabien Dupont 2019-02-19 15:17:12 UTC
We still kill it in two steps: first a TERM signal, then a KILL signal 30 seconds later. When we implemented that, Richard told us 30 seconds would be enough. We can extend that delay if needed.
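The two-step kill described here (TERM, then KILL after a grace period) can be sketched with Python's standard `subprocess` module. This is an illustration against a local child process, assuming POSIX signals; the real CFME code kills virt-v2v on the remote conversion host and is not shown, and `terminate_then_kill` is a hypothetical name.

```python
import signal
import subprocess


def terminate_then_kill(proc: subprocess.Popen, grace_seconds: float = 30.0) -> int:
    """Send SIGTERM, wait up to grace_seconds, then SIGKILL if still running.

    Returns the child's return code; a negative value -N means the
    child was terminated by signal N.
    """
    if proc.poll() is not None:
        # Already exited; nothing to signal.
        return proc.returncode
    # Ask the process to stop; it may catch SIGTERM and run its own cleanup.
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Still alive after the grace period: SIGKILL cannot be caught or
        # ignored, so the process gets no chance to clean up.
        proc.kill()
        return proc.wait()
```

The grace period matters because SIGTERM is the only point at which virt-v2v can run its own cleanup; anything still running when SIGKILL arrives is stopped without cleaning up.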

Comment 19 Yadnyawalk Tale 2019-02-26 13:26:10 UTC
Fixed! Cancelling a migration now cleans up the volumes and the instance on the OSP side.
Checked with ovirt-ansible-v2v-conversion-host-1.9.1-3.el7ev.noarch and CFME 5.10.1.2.20190219165527_7a4a22b.

The image that needs to be used for the patch: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=849441
Thanks for all the help @Tomáš, @Fabien.

Comment 20 Yadnyawalk Tale 2019-02-27 04:19:34 UTC
Fixed as per Comment #19. Thanks @Dennis.

Comment 22 errata-xmlrpc 2019-03-06 09:50:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0453