Bug 1749347

Summary: [v2v] virt-v2v-wrapper Failed to start transient service unit
Product: Red Hat Enterprise Virtualization Manager Reporter: Ilanit Stein <istein>
Component: v2v-conversion-hostAssignee: Tomáš Golembiovský <tgolembi>
Status: CLOSED ERRATA QA Contact: Maayan Hadasi <mguetta>
Severity: medium Docs Contact:
Priority: unspecified    
Version: unspecifiedCC: dagur, mguetta, mtessun, rdlugyhe, rjones, tgolembi
Target Milestone: ovirt-4.4.0   
Target Release: 4.4.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rhv-4.4.0-29 Doc Type: Bug Fix
Doc Text:
Previously, systemd units from failed conversions were not removed from the host. These could cause collisions and prevent subsequent conversions from starting because the service name was already "in use." The current release fixes this issue. If the conversion fails, the units are explicitly removed so they cannot interfere with subsequent conversions.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-08-04 13:27:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
v2v-import-wrapper.log none

Description Ilanit Stein 2019-09-05 12:10:38 UTC
Description of problem:
v2v VM migration fail on virt-v2v-wrapper log error:

2019-09-05 07:00:46,246:INFO: systemd-run returned: Failed to start transient service unit: Unit run-6566.service already exists.
 (virt-v2v-wrapper:1447)
2019-09-05 07:00:46,247:INFO: Waiting for PID... (virt-v2v-wrapper:1454)
2019-09-05 07:00:46,298:INFO: Running with PID: 6567 (virt-v2v-wrapper:1462)
2019-09-05 07:00:56,310:INFO: have exception: ('Error while monitoring virt-v2v',) {} (virt-v2v-wrapper:1078)
2019-09-05 07:00:56,310:ERROR: Error while monitoring virt-v2v (virt-v2v-wrapper:1079)
Traceback (most recent call last):
  File "/usr/bin/virt-v2v-wrapper.py", line 1892, in wrapper
    with log_parser(not data['daemonize']) as parser:
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/bin/virt-v2v-wrapper.py", line 1726, in log_parser
    parser = OutputParser(duplicate)
  File "/usr/bin/virt-v2v-wrapper.py", line 1161, in __init__
    self._log = open(state.v2v_log, 'rbU')
IOError: [Errno 2] No such file or directory: '/var/log/vdsm/import/v2v-import-20190905T070045-6473.log'


Richard W.M. Jones:
==================
"Actually it looks as if we need to run ‘systemctl reset-failed [service]’
in the wrapper for units which fail (which is a bit rubbish to say the
least).

See also:
https://bugzilla.redhat.com/show_bug.cgi?id=1255432"

Version-Release number of selected component (if applicable):
v2v-conversion-host-wrapper-1.14.2-1.el7ev.noarch


How reproducible:
So far it happened once,
to one VM out of 20 migrated, using ssh transport method


Additional info:
virt-v2v corresponding log was not created (actually the failure is on that this file is missing)

ls -ltr  /run/systemd/system/*.service 
[root@f02-h21-000-r620 import]# systemctl list-units | grep run-
show the same output of 28 run-XXXX.service

Comment 1 Ilanit Stein 2019-09-05 12:11:23 UTC
Created attachment 1611942 [details]
v2v-import-wrapper.log

Comment 2 Tomáš Golembiovský 2019-09-09 11:17:18 UTC
What seems more worrying to me is the fact that auto-generated names can lead to duplicates. Still, it looks like we can fix the behavior by using the --collect argument.

Comment 3 Tomáš Golembiovský 2019-09-09 11:24:20 UTC
But it requires EL8.

Comment 4 Richard W.M. Jones 2019-09-09 12:28:17 UTC
(In reply to Tomáš Golembiovský from comment #3)
> But it requires EL8.

That's annoying.  Would another possibility (for RHEL 7 only) be to generate
our own names, ie adding the --unit=... option?  We can ensure they don't
conflict.

Comment 5 Tomáš Golembiovský 2019-09-09 13:21:09 UTC
(In reply to Richard W.M. Jones from comment #4)
> (In reply to Tomáš Golembiovský from comment #3)
> > But it requires EL8.
> 
> That's annoying.  Would another possibility (for RHEL 7 only) be to generate
> our own names, ie adding the --unit=... option?  We can ensure they don't
> conflict.

It may be good idea to add --unit with uniqe name on both RHEL7/RHEL8. But that solves only a part of the problem. I think we still need to always call `reset-failed` on the failed units to avoid leaving garbage around.

Comment 6 Tomáš Golembiovský 2019-12-04 15:45:03 UTC
Fixed upstream in: https://github.com/oVirt/v2v-conversion-host/commit/d8c9b24dedefe36a43f7aaf30e9c5545f125ee02

Comment 8 Daniel Gur 2020-07-03 08:54:28 UTC
Tomas, As this bug not reproduced, can you provide specific validation instructions or we will close it as not reproduced?

Comment 9 Tomáš Golembiovský 2020-07-08 07:34:59 UTC
What do you mean by "not reproduced"? Which version did you use to reproduce the problem?

It has been fixed in 1.16.0-3.

Comment 10 Daniel Gur 2020-07-09 10:36:03 UTC
So as we talked in the mail. This issue happened once and we don't actually know how to reproduce or to validate it.
Waiting for your input Tomas as per your mail.

Comment 11 Tomáš Golembiovský 2020-07-09 14:12:27 UTC
I just realized that we don't have to reproduce the collision from the original report. We just have to check that there are no stale services after failed conversion. This turns the verification/reproduction from super difficult to super easy. So to verify:

1) start a conversion
2) wait for virt-v2v to start
3) kill the virt-v2v process to fail the conversion
4) from wrapper log find out name of the transient service, the line looks like this:

    2020-06-24 20:02:40,508:INFO: systemd-run returned: b'Running as unit: run-9574.service\n' (runners:135)

5) verify with systemctl status that no such service exists in the system:

   $ systemctl status run-9574.service
   Unit run-9574.service could not be found.


With the fix systemctl in step 5) should report there is no such unit. Without the fix systemctl will report information about the service and that the service failed.

Comment 12 Maayan Hadasi 2020-07-14 13:08:21 UTC
Verified as fixed.

I tested it on RHV 4.3 host - once with v2v-conversion-host-1.16.0.3 (includes fix) and once with v2v-conversion-host-1.15.0-1 (should not include fix)
And both tries have been passed - it means that I am getting the expected behavior that indicates as the bug was fixed for both versions

Verification steps were:
1. start a migration plan (by CFME UI)
2. Wait for virt-v2v to start
3. kill the virt-v2v process to fail the conversion (by 'Cancel Migration' button in CFME or by 'kill' command - both ways have been tried)
4. from wrapper log find out name of the transient service, the line looks like this:

    2020-06-24 20:02:40,508:INFO: systemd-run returned: b'Running as unit: run-9574.service\n' (runners:135)

5. verify with systemctl status that no such service exists in the conversion host:

   $ systemctl status run-9574.service
   Unit run-9574.service could not be found.

Note: the bug was mentioned as fixed in RHV 4.4 -> it was also verified with RHV 4.4 host, v2v-conversion-host-1.16.2-3, same result.
We cannot downgrade the v2v-conversion-host version in this case due to other bugs related to RHV 4.4

Comment 20 errata-xmlrpc 2020-08-04 13:27:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246