Bug 1538934 - [RFE] hosted-engine --vm-status should provide a way to detect and warn about failed deployments
Summary: [RFE] hosted-engine --vm-status should provide a way to detect and warn about...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.2.6
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ovirt-4.2.3
: ---
Assignee: Simone Tiraboschi
QA Contact: Yihui Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1458709
Reported: 2018-01-26 07:58 UTC by Yihui Zhao
Modified: 2018-05-10 06:32 UTC
14 users

Fixed In Version: ovirt-hosted-engine-setup-2.2.18-1.el7ev
Clone Of:
Environment:
Last Closed: 2018-05-10 06:32:31 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.2+
mavital: testing_plan_complete?
ylavi: planning_ack+
sbonazzo: devel_ack+
yzhao: testing_ack+



Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 90382 0 'None' MERGED Detect attempted deployments 2020-09-23 00:31:11 UTC
oVirt gerrit 90390 0 'None' MERGED Detect attempted deployments 2020-09-23 00:31:07 UTC

Description Yihui Zhao 2018-01-26 07:58:00 UTC
Description of problem: 
Cannot get the correct HE-VM status information with CLI or Cockpit.

After deploying HE via Cockpit (otopi-based) failed, I tried to deploy HE again via the CLI with the non-Ansible flow (--noansible).

--------------------------------------------------------------------------------------------
[root@dell-per515-02 ~]# hosted-engine --deploy --noansible
[ INFO  ] Stage: Initializing
[ INFO  ] Generating a temporary VNC password.
[ INFO  ] Stage: Environment setup
          During customization use CTRL-D to abort.
          Continuing will configure this host for serving as hypervisor and create a VM where you have to install the engine afterwards.
          Are you sure you want to continue? (Yes, No)[Yes]: 
          It has been detected that this program is executed through an SSH connection without using screen.
          Continuing with the installation may lead to broken installation if the network connection fails.
          It is highly recommended to abort the installation and run it inside a screen session using command "screen".
          Do you want to continue anyway? (Yes, No)[No]: yes
[ INFO  ] Hardware supports virtualization
          Configuration files: []
          Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180126002300-z4f6rb.log
          Version: otopi-1.7.6 (otopi-1.7.6-1.el7ev)
[ INFO  ] Detecting available oVirt engine appliances
[ INFO  ] Stage: Environment packages setup
[ INFO  ] Stage: Programs detection
[ INFO  ] Stage: Environment setup
[ ERROR ] The following VMs have been found: 63470a43-31ff-4330-977c-6716f38a1fc1
[ ERROR ] Failed to execute stage 'Environment setup': Cannot setup Hosted Engine with other VMs running
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180126002309.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180126002300-z4f6rb.log
-----------------------------------------------------------------------------------------------------

However, when checking the HE VM status, no VM status is reported:
----------------------------------------------------------------
[root@dell-per515-02 ~]# hosted-engine --vm-status
You must run deploy first
-----------------------------------------------------------------



Version-Release number of selected component (if applicable): 
cockpit-ws-157-1.el7.x86_64
cockpit-bridge-157-1.el7.x86_64
cockpit-storaged-157-1.el7.noarch
cockpit-dashboard-157-1.el7.x86_64
cockpit-157-1.el7.x86_64
cockpit-ovirt-dashboard-0.11.5-0.1.el7ev.noarch
cockpit-system-157-1.el7.noarch
ovirt-hosted-engine-setup-2.2.8-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.4-1.el7ev.noarch
rhvm-appliance-4.2-20171219.0.el7.noarch
rhvh-4.2.1.2-0.20180125.0+1

How reproducible: 
100% 


Steps to Reproduce: 
1. Clean install RHVH 4.2.1 (rhvh-4.2.1.2-0.20180125.0+1) with kickstart
2. Deploy HE via Cockpit (otopi-based flow)
3. Redeploy HE via the CLI with the non-Ansible flow (hosted-engine --deploy --noansible)
4. Check the HE VM status (hosted-engine --vm-status)

Actual results: 
1. After step 2, the deployment failed with some issues.

2. After step 3, the deployment failed because other VMs were running (same output as the transcript in the description above).

3. After step 4, checking the HE VM status returns incorrect HostedEngine VM status information:
[root@dell-per515-02 ~]# hosted-engine --vm-status
You must run deploy first

Expected results: 
Get the HE-VM status information successfully.

Additional info: 

In Cockpit, after step 4, the environment also appears to be clean, with no VM running.

Comment 1 Ryan Barry 2018-01-26 10:02:50 UTC
This seems reasonable.

If the engine is not deployed, the status cannot be checked. Please re-open if this persists after a successful deployment.

Comment 2 Yihui Zhao 2018-01-26 10:15:51 UTC
Yes. The issue is that the VM is running (but the engine is not OK).

1. From Cockpit, the user cannot tell whether the environment is clean or a VM is still running.

2. If the first deployment failed and the user redeploys HE with the non-Ansible flow, it raises the error "Cannot setup Hosted Engine with other VMs running".

So how can the user find out the deployment status from Cockpit or the CLI?

Comment 3 Ryan Barry 2018-01-26 10:19:26 UTC
Perhaps we need another status for --vm-status to report that there was a failed deployment.

Comment 4 Yaniv Lavi 2018-02-14 13:32:35 UTC
This doesn't make sense to me, please open a new RFE on the use case, not the solution and we will consider how to best address it.

Comment 5 Yihui Zhao 2018-02-27 02:21:17 UTC
(In reply to Yaniv Lavi from comment #4)
> This doesn't make sense to me, please open a new RFE on the use case, not
> the solution and we will consider how to best address it.

I am confused about what you mean by opening a new RFE.

Comment 6 Ryan Barry 2018-02-27 02:27:42 UTC
The use case here is very clear.

Attempt to deploy with Ansible. A VM is created. Deployment fails for some reason.

The system is now in an inconsistent state. --vm-status shows that it is clean. Trying to deploy HE fails because it is not clean.

--vm-status would, ideally, check whether a VM for Node Zero is running and return some other result if it's present but ha-agent does not think it's deployed.

Without this, the UX in Cockpit doesn't let users know until after a failure.

Yes, users should already know to clean up a failed deployment, but that's true of many bugs/RFEs...
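The check proposed above can be sketched as a small decision function. This is a hypothetical illustration in Python, not the code merged in gerrit 90382/90390: `HostState`, its two fields and `vm_status_message` are invented names, and how the two booleans would actually be collected on a host (for example from vdsm or the HA agent configuration) is deliberately left out. Only the two messages are quoted from this report.

```python
"""Hypothetical sketch of the --vm-status check proposed in Comment 6.

Assumption: the inputs (is a Node Zero/bootstrap VM running, and does the
HA agent consider hosted-engine deployed) are gathered elsewhere. Nothing
here is taken from the real ovirt-hosted-engine-setup code.
"""
from dataclasses import dataclass
from typing import Optional

# Messages quoted from this bug report (Description and Comment 10).
NOT_DEPLOYED_MSG = "You must run deploy first"
ATTEMPTED_DEPLOY_MSG = (
    "It seems like a previous attempt to deploy hosted-engine failed "
    "or it's still in progress. Please clean it up before trying again"
)


@dataclass(frozen=True)
class HostState:
    bootstrap_vm_running: bool  # hypothetical: a Node Zero VM is present
    ha_config_deployed: bool    # hypothetical: ha-agent thinks HE is deployed


def vm_status_message(state: HostState) -> Optional[str]:
    """Return a message to print, or None to report normal HA status."""
    if state.ha_config_deployed:
        # Normal path: --vm-status would go on to print the agents' status.
        return None
    if state.bootstrap_vm_running:
        # The inconsistent state from this bug: a VM exists, HE not deployed.
        return ATTEMPTED_DEPLOY_MSG
    # Truly clean host: keep the original behaviour.
    return NOT_DEPLOYED_MSG


if __name__ == "__main__":
    # The state described in this bug: a failed deployment left a VM running.
    print(vm_status_message(HostState(bootstrap_vm_running=True,
                                      ha_config_deployed=False)))
```

Keeping the old "You must run deploy first" answer for a truly clean host means only the inconsistent state gets the new warning, which matches the single-message approach agreed on in Comments 7 and 8.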

Comment 7 Yedidyah Bar David 2018-03-01 09:49:31 UTC
Do we want only a single true/false flag here? Would it be enough if it output 'It seems like a previous attempt to deploy hosted-engine failed. Please reinstall the OS before trying again'?

IMO the current behavior is reasonable. 'hosted-engine --vm-status' is not designed to analyze this state, and doing a really good job (checking what the status is, what's good, what's bad, what failed, how to fix it, etc.) is a very big project.

If you/we want something in between the above two, please state exactly what. I do not think we want to repeat in '--vm-status' all the checks that '--deploy' does, and remember that the code in '--deploy --noansible' is going to be removed in 4.3, if all goes well.

Also, in this state 'hosted-engine --deploy' fails very quickly after starting, before much interaction with the user, so it does not waste much time or effort.

Comment 8 Ryan Barry 2018-03-01 11:51:17 UTC
In my opinion, it would be enough to output that, yes.

We don't really need it to know exactly what's good and what's bad, just "a previous attempt failed, please clean/redeploy before trying again". We can rely on `hosted-engine --cleanup` to handle the edge cases.

Comment 9 Ying Cui 2018-03-14 09:30:06 UTC
The solution is under discussion. We will provide qa_ack if the fix is in the UI only, or move it to the default QA contact to ack. Thanks.

Comment 10 Yihui Zhao 2018-04-19 07:32:16 UTC
Tested with ovirt-hosted-engine-setup-2.2.18-1.el7ev

If the deployment fails, running 'hosted-engine --vm-status' to check the VM status now gives the following hint:


#hosted-engine --vm-status
It seems like a previous attempt to deploy hosted-engine failed or it's still in progress. Please clean it up before trying again


So, moving to verified.

Comment 11 Sandro Bonazzola 2018-05-10 06:32:31 UTC
This bugzilla is included in oVirt 4.2.3 release, published on May 4th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

