Bug 1585739 - ovirt-hosted-engine-cleanup causes "The host has been set in non_operational status" during redeployment.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: Tools
Version: 2.2.16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-06-04 15:00 UTC by Nikolai Sednev
Modified: 2018-06-05 14:01 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-05 14:01:52 UTC
oVirt Team: Integration
Embargoed:


Attachments
sosreport from alma04 (9.57 MB, application/x-xz)
2018-06-04 15:00 UTC, Nikolai Sednev


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1542849 0 unspecified CLOSED hosted-engine plugin collects only last log, alphabetically-sorted 2021-02-22 00:41:40 UTC

Internal Links: 1542849

Description Nikolai Sednev 2018-06-04 15:00:42 UTC
Created attachment 1447518
sosreport from alma04

Description of problem:
Running ovirt-hosted-engine-cleanup after an interrupted deployment causes the redeployment to fail with "The host has been set in non_operational status".

[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Wait for the host to be up]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Check host status]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, fix accordingly and re-deploy.\n"}
[ INFO  ] TASK [Fetch logs from the engine VM]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set destination directory path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Create destination directory]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Find the local appliance image]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set local_vm_disk_path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Give the vm time to flush dirty buffers]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Copy engine logs]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch logs from the engine VM]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set destination directory path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Create destination directory]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Find the local appliance image]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set local_vm_disk_path]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Give the vm time to flush dirty buffers]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Copy engine logs]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] ok: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180604174956.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180604173358-ypdpgt.log
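
For triage, the [ ERROR ] lines can be separated from the surrounding task output. The following is only a minimal Python sketch, assuming the console format shown above ("[ LEVEL ] message"); the names LINE_RE and extract_errors are illustrative and are not part of ovirt-hosted-engine-setup.

```python
import re
from typing import List

# Matches console lines of the form "[ ERROR ] message" or "[ INFO  ] message".
# Assumption: the level tag always sits at the start of the line, as in the
# transcript above; lines prefixed with "^C" are deliberately not matched.
LINE_RE = re.compile(r"^\[\s*(?P<level>[A-Z]+)\s*\]\s?(?P<msg>.*)$")


def extract_errors(transcript: str) -> List[str]:
    """Return the messages of all lines tagged ERROR, in their original order."""
    errors = []
    for line in transcript.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("level") == "ERROR":
            errors.append(match.group("msg"))
    return errors


# A few lines copied from the transcript above.
sample = """\
[ INFO  ] TASK [Check host status]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, fix accordingly and re-deploy.\\n"}
[ INFO  ] TASK [Fetch logs from the engine VM]
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
"""

for message in extract_errors(sample):
    print(message)
```

The same function could be pointed at a saved copy of the console output; it does not parse the setup log file itself, whose format is not shown in this report.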


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-2.2.13-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.22-1.el7ev.noarch
rhvm-appliance-4.2-20180601.0.el7.noarch
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Linux 3.10.0-862.3.2.el7.x86_64 #1 SMP Tue May 15 18:22:15 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
100%

Steps to Reproduce:
1. Run hosted-engine --deploy and interrupt the deployment with Ctrl+C at:
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Add host]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Wait for the host to be up]
^C[ ERROR ] Failed to execute stage 'Closing up': SIG2
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
^C[ ERROR ] Failed to execute stage 'Clean up': SIG2
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180604173237.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180604172012-vb0flx.log

2. Run ovirt-hosted-engine-cleanup.
3. Run hosted-engine --deploy again on the same host.

Actual results:
[root@alma04 ~]# ovirt-hosted-engine-cleanup
 This will de-configure the host to run ovirt-hosted-engine-setup from scratch. 
Caution, this operation should be used with care.

Are you sure you want to proceed? [y/n]
y
  -=== Destroy hosted-engine VM ===- 
You must run deploy first
  -=== Killing left-behind HostedEngine processes ===- 
error: failed to get domain 'HostedEngine'
error: Domain not found: no domain with matching name 'HostedEngine'

  -=== Stop HA services ===- 
  -=== Shutdown sanlock ===- 
shutdown force 1 wait 0
shutdown done 0
  -=== Disconnecting the hosted-engine storage domain ===- 
You must run deploy first
  -=== De-configure VDSM networks ===- 
  -=== Stop other services ===- 
  -=== De-configure external daemons ===- 
  -=== Removing configuration files ===- 
? /etc/init/libvirtd.conf already missing
- removing /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml
? /etc/ovirt-hosted-engine/answers.conf already missing
? /etc/ovirt-hosted-engine/hosted-engine.conf already missing
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing
[root@alma04 ~]# hosted-engine --check-deployed
The hosted engine has not been deployed
[root@alma04 ~]# virsh -r list --all
error: failed to connect to the hypervisor
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock-ro': No such file or directory

[root@alma04 ~]# hosted-engine --deploy
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, fix accordingly and re-deploy.\n"}
[ INFO  ] TASK [Fetch logs from the engine VM]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set destination directory path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Create destination directory]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Find the local appliance image]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set local_vm_disk_path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Give the vm time to flush dirty buffers]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Copy engine logs]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch logs from the engine VM]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set destination directory path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Create destination directory]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Find the local appliance image]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set local_vm_disk_path]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Give the vm time to flush dirty buffers]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Copy engine logs]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] ok: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180604174956.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180604173358-ypdpgt.log

Expected results:
Redeployment succeeds.

Additional info:
Sosreport from host is attached.
The deployment did not reach any of the storage configuration steps.
Deployment was made for Node 0.

Comment 1 Simone Tiraboschi 2018-06-05 10:30:49 UTC
Something went wrong at host-deploy time, which is why the setup reported "The host has been set in non_operational status, please check engine logs, fix accordingly and re-deploy."
Unfortunately we need host-deploy logs to understand what really happened.

As per https://bugzilla.redhat.com/1578404, hosted-engine-setup collects them, but due to https://bugzilla.redhat.com/1542849
sos collects only the last file in the directory.

1542849 has been fixed upstream with
https://github.com/sosreport/sos/commit/211475ce45719b7c330452e31906a1162f3729cb#diff-6af329b889d36ce6586bdbee6ce0fa07
but it's still not in sos-3.5-7
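
To illustrate why that matters here, the sketch below contrasts keeping only the alphabetically last file with keeping every file. It is a simplified Python model based solely on the summary of 1542849, not the sos plugin code; the file names and both function names are hypothetical.

```python
from typing import List


def last_alphabetical(names: List[str]) -> List[str]:
    """Model of the reported behavior: keep only the alphabetically last file."""
    return sorted(names)[-1:]


def collect_all(names: List[str]) -> List[str]:
    """Model of the desired behavior: keep every file in the directory."""
    return sorted(names)


# Hypothetical directory listing; real log names will differ.
logs = ["deploy-attempt-1.log", "deploy-attempt-2.log", "engine.log"]

collected = last_alphabetical(logs)
missing = [name for name in collect_all(logs) if name not in collected]

print("collected:", collected)
print("missing:", missing)
```

With several files in the directory, the one that explains the failure may be among the files left out, which matches the situation described in this comment.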

Comment 2 Simone Tiraboschi 2018-06-05 12:17:58 UTC
Unable to reproduce.

Comment 3 Nikolai Sednev 2018-06-05 13:56:34 UTC
I tried to reproduce it twice and could not hit it again.
I think we can close it for now.

Comment 4 Simone Tiraboschi 2018-06-05 14:01:52 UTC
Thanks, closing it as WORKSFORME.
Please reopen it if reproducible.

