Bug 1559750

Summary: ansible playbook doesn't correctly wait for local VM shutdown to complete
Product: [oVirt] ovirt-hosted-engine-setup Reporter: Simone Tiraboschi <stirabos>
Component: General Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE QA Contact: Nikolai Sednev <nsednev>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 2.2.14 CC: bugs, nsednev, yzhao
Target Milestone: ovirt-4.2.2 Keywords: Triaged
Target Release: --- Flags: rule-engine: ovirt-4.2+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-hosted-engine-setup-2.2.15-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-05 09:56:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1458709    

Description Simone Tiraboschi 2018-03-23 08:21:25 UTC
Description of problem:
The ansible playbook shuts down the local VM via the ansible virt module but, according to https://github.com/ansible/ansible/issues/27905 , this doesn't guarantee that the shutdown has completed before the playbook moves on to the next tasks (including copying the disk to the shared storage).

The engine is down and there are no pending operations on the engine VM, and indeed we have never seen side effects, but it would be better to explicitly wait for the shutdown before moving to the next task.

Version-Release number of selected component (if applicable):
2.2.14

How reproducible:
100%

Steps to Reproduce:
1. Deploy hosted-engine
2. Wait for "Shutdown local VM" message
3. Ensure that when the "Undefine local VM" message is reported the VM is really down.

Actual results:
We see "Undefine local VM" while the VM is still shutting down

Expected results:
The playbook explicitly waits for the VM shutdown to complete before moving to the next task

Additional info:
So far we have never seen any consequence of this, but it is better to fix it to ensure that the disk copy is consistent.
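
For illustration only, the kind of explicit wait requested above could look like the following Python sketch. It is not the fix that went into ovirt-hosted-engine-setup; the function names, the timeout and interval values, and the choice of polling `virsh -r domstate` until it reports "shut off" are all assumptions made for this example.

```python
import subprocess
import time


def virsh_domstate(name):
    """Return the libvirt state string for a domain, e.g. 'running' or 'shut off'."""
    result = subprocess.run(
        ["virsh", "-r", "domstate", name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # On a non-zero exit (e.g. unknown domain) report an empty state.
    return result.stdout.strip() if result.returncode == 0 else ""


def wait_for_shutdown(name, get_state=virsh_domstate, timeout=300,
                      interval=5, sleep=time.sleep):
    """Poll until the domain reports 'shut off'.

    Returns True once the domain is shut off, False if `timeout` seconds
    of waiting elapse first. `get_state` and `sleep` are injectable so the
    loop can be exercised without libvirt (hypothetical helper design).
    """
    waited = 0
    while True:
        if get_state(name) == "shut off":
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

A caller would run `wait_for_shutdown("<local VM name>")` after requesting the shutdown and only proceed to undefining the VM and copying the disk if it returns True. The sketch assumes the domain is still defined while it shuts down, so that `domstate` keeps answering.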

Comment 1 Simone Tiraboschi 2018-03-26 12:23:55 UTC
*** Bug 1560419 has been marked as a duplicate of this bug. ***

Comment 2 Nikolai Sednev 2018-03-27 15:08:20 UTC
Not specific to the FC storage type, but deployment failed with these components:
ovirt-hosted-engine-setup-2.2.14-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch

[ INFO  ] TASK [Copy local VM disk to shared storage]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["qemu-img", "convert", "-n", "-O", "raw", "/var/tmp/localvmNeQJ02/images/44b5a089-6ce7-4a23-b4d1-b9dab3fe2687/5ef59632-5a32-4285-95d9-54c59673e2f3", "/rhev/data-center/mnt/blockSD/deb44f68-13bb-4fde-a1b6-e9501fd40cc3/images/d36828ed-ce94-407c-ad77-9391bae1a7d6/d501f4b3-21e8-4cda-b998-c3f7a933a9ff"], "delta": "0:00:00.168324", "end": "2018-03-27 17:47:10.583911", "msg": "non-zero return code", "rc": 1, "start": "2018-03-27 17:47:10.415587", "stderr": "qemu-img: Could not open '/var/tmp/localvmNeQJ02/images/44b5a089-6ce7-4a23-b4d1-b9dab3fe2687/5ef59632-5a32-4285-95d9-54c59673e2f3': Failed to get shared \"write\" lock\nIs another process using the image?", "stderr_lines": ["qemu-img: Could not open '/var/tmp/localvmNeQJ02/images/44b5a089-6ce7-4a23-b4d1-b9dab3fe2687/5ef59632-5a32-4285-95d9-54c59673e2f3': Failed to get shared \"write\" lock", "Is another process using the image?"], "stdout": "", "stdout_lines": []}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] changed: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180327174716.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.


Waiting for ovirt-hosted-engine-setup-2.2.15-1 to arrive at QA.

Comment 3 Nikolai Sednev 2018-03-27 16:39:26 UTC
I still see these during deployment over CLI:
[ INFO  ] TASK [Shutdown local VM]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Wait for local VM shutdown]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Undefine local VM]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Detect spmId]
.
.
.
[ INFO  ] Hosted Engine successfully deployed

The VM was properly shut down and deployment succeeded with these components:
ovirt-hosted-engine-ha-2.2.9-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.15-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Moving to verified.

Comment 4 Sandro Bonazzola 2018-04-05 09:56:41 UTC
This bugzilla is included in the oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.