Description of problem:

A v2v VM migration fails. The virt-v2v-wrapper log shows the error:

2019-09-05 07:00:46,246:INFO: systemd-run returned: Failed to start transient service unit: Unit run-6566.service already exists. (virt-v2v-wrapper:1447)
2019-09-05 07:00:46,247:INFO: Waiting for PID... (virt-v2v-wrapper:1454)
2019-09-05 07:00:46,298:INFO: Running with PID: 6567 (virt-v2v-wrapper:1462)
2019-09-05 07:00:56,310:INFO: have exception: ('Error while monitoring virt-v2v',) {} (virt-v2v-wrapper:1078)
2019-09-05 07:00:56,310:ERROR: Error while monitoring virt-v2v (virt-v2v-wrapper:1079)
Traceback (most recent call last):
  File "/usr/bin/virt-v2v-wrapper.py", line 1892, in wrapper
    with log_parser(not data['daemonize']) as parser:
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/bin/virt-v2v-wrapper.py", line 1726, in log_parser
    parser = OutputParser(duplicate)
  File "/usr/bin/virt-v2v-wrapper.py", line 1161, in __init__
    self._log = open(state.v2v_log, 'rbU')
IOError: [Errno 2] No such file or directory: '/var/log/vdsm/import/v2v-import-20190905T070045-6473.log'

Richard W.M. Jones:
==================
"Actually it looks as if we need to run ‘systemctl reset-failed [service]’ in the wrapper for units which fail (which is a bit rubbish to say the least). See also: https://bugzilla.redhat.com/show_bug.cgi?id=1255432"

Version-Release number of selected component (if applicable):
v2v-conversion-host-wrapper-1.14.2-1.el7ev.noarch

How reproducible:
So far it happened once, to one VM out of 20 migrated, using the ssh transport method.

Additional info:
The corresponding virt-v2v log was not created (the failure is in fact that this file is missing).

Both of the following:

ls -ltr /run/systemd/system/*.service
[root@f02-h21-000-r620 import]# systemctl list-units | grep run-

show the same output of 28 run-XXXX.service units.
Created attachment 1611942 [details] v2v-import-wrapper.log
What seems more worrying to me is the fact that auto-generated names can lead to duplicates. Still, it looks like we can fix the behavior by using the --collect argument.
But it requires EL8.
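For reference, systemd-run's --collect option makes systemd garbage-collect the transient unit even when it fails, so a failed conversion would not leave a stale run-NNNN.service behind. A minimal sketch of how an invocation could pass it, assuming systemd >= 236 (EL8); the function names and command line here are illustrative, not the wrapper's actual code:

```python
import subprocess

def build_runner_cmd(v2v_cmd, collect=True):
    """Build a systemd-run command line for launching virt-v2v.

    With collect=True (systemd >= 236, i.e. EL8 and newer), systemd
    unloads the transient unit even when it fails, so no failed
    run-NNNN.service lingers in the system.
    """
    cmd = ['systemd-run']
    if collect:
        cmd.append('--collect')
    return cmd + list(v2v_cmd)

def start_virt_v2v(v2v_cmd, collect=True):
    """Launch virt-v2v in a transient systemd unit (requires systemd)."""
    return subprocess.run(build_runner_cmd(v2v_cmd, collect),
                          capture_output=True, text=True)
```

On EL7 (systemd 219) the --collect option is not available, which is the limitation noted above.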
(In reply to Tomáš Golembiovský from comment #3)
> But it requires EL8.

That's annoying. Would another possibility (for RHEL 7 only) be to generate our own names, i.e. adding the --unit=... option? We can ensure they don't conflict.
(In reply to Richard W.M. Jones from comment #4)
> (In reply to Tomáš Golembiovský from comment #3)
> > But it requires EL8.
>
> That's annoying. Would another possibility (for RHEL 7 only) be to generate
> our own names, ie adding the --unit=... option? We can ensure they don't
> conflict.

It may be a good idea to add --unit with a unique name on both RHEL 7 and RHEL 8. But that solves only a part of the problem. I think we still need to always call `reset-failed` on the failed units to avoid leaving garbage around.
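A rough sketch combining both ideas (function names are hypothetical, not the wrapper's actual code): generate a unique --unit name so auto-generated names can never collide, and call `systemctl reset-failed` on the unit when the conversion fails so no garbage is left behind. The use of --wait assumes systemd >= 232 and is a simplification; the wrapper itself monitors the virt-v2v PID instead:

```python
import subprocess
import uuid

def unique_unit_name(prefix='v2v-import'):
    """Generate a collision-free transient unit name, e.g.
    v2v-import-3f2a9....service, instead of relying on systemd's
    auto-generated run-<PID>.service (which this bug shows can clash)."""
    return '%s-%s.service' % (prefix, uuid.uuid4().hex)

def run_in_transient_unit(v2v_cmd):
    """Run a command via systemd-run under a unique unit name and make
    sure a failed unit does not stay loaded in 'failed' state."""
    unit = unique_unit_name()
    cmd = ['systemd-run', '--unit=%s' % unit, '--wait'] + list(v2v_cmd)
    proc = subprocess.run(cmd)
    if proc.returncode != 0:
        # Clear the 'failed' state so systemd unloads the unit;
        # errors are ignored in case the unit is already gone.
        subprocess.run(['systemctl', 'reset-failed', unit])
    return proc.returncode
```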
Fixed upstream in: https://github.com/oVirt/v2v-conversion-host/commit/d8c9b24dedefe36a43f7aaf30e9c5545f125ee02
Tomas,

As this bug was not reproduced, can you provide specific validation instructions, or we will close it as not reproduced?
What do you mean by "not reproduced"? Which version did you use to reproduce the problem? It has been fixed in 1.16.0-3.
So, as we discussed by mail: this issue happened once and we don't actually know how to reproduce or validate it. Waiting for your input, Tomas, as per your mail.
I just realized that we don't have to reproduce the collision from the original report. We just have to check that there are no stale services after a failed conversion. This turns the verification/reproduction from super difficult to super easy.

So to verify:

1) start a conversion
2) wait for virt-v2v to start
3) kill the virt-v2v process to fail the conversion
4) from the wrapper log find out the name of the transient service; the line looks like this:

   2020-06-24 20:02:40,508:INFO: systemd-run returned: b'Running as unit: run-9574.service\n' (runners:135)

5) verify with systemctl status that no such service exists in the system:

   $ systemctl status run-9574.service
   Unit run-9574.service could not be found.

With the fix, systemctl in step 5) should report there is no such unit. Without the fix, systemctl will report information about the service and that the service failed.
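Steps 4) and 5) above can also be scripted. A small sketch (assuming the wrapper log format shown in the example line, and the documented systemctl behavior of exiting with code 4 when a unit cannot be found) that extracts the unit name from a log line and checks whether the unit still exists:

```python
import re
import subprocess

# Matches the unit name in the wrapper's "systemd-run returned" line.
UNIT_RE = re.compile(r"Running as unit: (\S+\.service)")

def unit_from_log_line(line):
    """Extract the transient unit name from a wrapper log line,
    or return None if the line carries no unit information."""
    m = UNIT_RE.search(line)
    return m.group(1) if m else None

def unit_exists(unit):
    """True if systemd still knows about the unit (i.e. the failed
    unit was left behind). systemctl status exits with code 4 when
    the unit cannot be found."""
    rc = subprocess.run(['systemctl', 'status', unit],
                        stdout=subprocess.DEVNULL,
                        stderr=subprocess.DEVNULL).returncode
    return rc != 4
```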
Verified as fixed.

I tested it on a RHV 4.3 host, once with v2v-conversion-host-1.16.0.3 (includes the fix) and once with v2v-conversion-host-1.15.0-1 (should not include the fix). Both runs passed, i.e. I got the expected behavior indicating the bug is fixed, with both versions.

Verification steps were:

1. start a migration plan (via the CFME UI)
2. wait for virt-v2v to start
3. kill the virt-v2v process to fail the conversion (either via the 'Cancel Migration' button in CFME or via the 'kill' command - both ways were tried)
4. from the wrapper log find out the name of the transient service; the line looks like this:

   2020-06-24 20:02:40,508:INFO: systemd-run returned: b'Running as unit: run-9574.service\n' (runners:135)

5. verify with systemctl status that no such service exists on the conversion host:

   $ systemctl status run-9574.service
   Unit run-9574.service could not be found.

Note: the bug was mentioned as fixed in RHV 4.4, so it was also verified on a RHV 4.4 host with v2v-conversion-host-1.16.2-3, with the same result. We cannot downgrade the v2v-conversion-host version in this case due to other bugs related to RHV 4.4.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:3246