Bug 1528813 - ovirt-hosted-engine-cleanup - does not kill local VM (in case of Ansible based hosted engine deployment)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: Tools
Version: 2.2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: low
Target Milestone: ovirt-4.2.2
Target Release: ---
Assignee: Yedidyah Bar David
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On: 1528816
Blocks: 1458709
Reported: 2017-12-24 11:13 UTC by Yedidyah Bar David
Modified: 2018-03-29 11:09 UTC (History)

Fixed In Version: ovirt-hosted-engine-setup-2.2.13-1
Clone Of:
Environment:
Last Closed: 2018-03-29 11:09:43 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.2+
ylavi: blocker+


Attachments
sosreport (9.20 MB, application/x-xz)
2017-12-24 11:18 UTC, Yedidyah Bar David


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 88451 0 'None' MERGED cleanup: Kill left-behind HostedEngine processes 2020-10-28 17:36:44 UTC
oVirt gerrit 88473 0 'None' MERGED cleanup: Kill left-behind HostedEngine processes 2020-10-28 17:36:59 UTC

Description Yedidyah Bar David 2017-12-24 11:13:32 UTC
Description of problem:

hosted-engine deploy from cockpit failed.

Ran ovirt-hosted-engine-cleanup.

It correctly cleaned the shared storage, and said it also cleaned some other things, but:

1. Left the engine vm running. I manually killed the qemu process to stop it.

2. Left libvirtd down and failing to start.

Output of the script:

========================================================================
[root@lvc7host1 ~]# ovirt-hosted-engine-cleanup
 This will de-configure the host to run ovirt-hosted-engine-setup from scratch. 
Caution, this operation should be used with care.

Are you sure you want to proceed? [y/n]
y
  -=== Destroy hosted-engine VM ===-
You must run deploy first
  -=== Stop HA services ===-
  -=== Shutdown sanlock ===-
shutdown force 1 wait 0
shutdown done 0
  -=== Disconnecting the hosted-engine storage domain ===-
You must run deploy first
  -=== De-configure VDSM networks ===-
  -=== Stop other services ===-
  -=== De-configure external daemons ===-
  -=== Removing configuration files ===-
? /etc/init/libvirtd.conf already missing
- removing /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml
? /etc/ovirt-hosted-engine/answers.conf already missing
? /etc/ovirt-hosted-engine/hosted-engine.conf already missing
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing
========================================================================

Version-Release number of selected component (if applicable):

current master

How reproducible:

Not sure

Steps to Reproduce:
1. Partially(?)/Unsuccessfully deploy hosted-engine
2. Run cleanup
3. Deploy again

Actual results:

Fails

Expected results:
Succeeds

Additional info:

Comment 1 Yedidyah Bar David 2017-12-24 11:18:16 UTC
Created attachment 1371859
sosreport

Comment 2 Yedidyah Bar David 2017-12-24 11:49:34 UTC
(In reply to Yedidyah Bar David from comment #0)
> 2. Left libvirtd down and failing to start.

Opened bug 1528816 for this.

Comment 3 Nikolai Sednev 2018-03-07 08:56:28 UTC
For the reproduction, should this be verified on ansible or vintage?

Comment 4 Yedidyah Bar David 2018-03-07 10:49:18 UTC
(In reply to Nikolai Sednev from comment #3)
> For the reproduction, should this be verified on ansible or vintage?

ansible.

You are welcome to also try vintage; in my tests it already worked there. But now cleanup does not care which flow was used: it will always kill the VM process if one exists.
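
For readers who want to see what "always kill if exists" can look like, here is a minimal sketch. It is not the code merged for this bug. The function names leftover_pids and terminate_pids, the grace period, and the match pattern in the usage note are all assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only; this is NOT the patch merged for this bug.
# Function names and the grace period are hypothetical.

# Print the PIDs whose full command line matches the given pattern.
leftover_pids() {
    pgrep -f "$1"
}

# Send SIGTERM to each PID, wait up to GRACE seconds, then send SIGKILL.
# Usage: terminate_pids GRACE PID...
terminate_pids() {
    local grace=$1 pid waited
    shift
    for pid in "$@"; do
        # The process may exit between lookup and kill (or may not be ours
        # to signal); treat that as "nothing to do" rather than as an error.
        kill -TERM "$pid" 2>/dev/null || continue
        waited=0
        while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
            sleep 1
            waited=$((waited + 1))
        done
        # Still there after the grace period: escalate.
        if kill -0 "$pid" 2>/dev/null; then
            kill -KILL "$pid" 2>/dev/null
        fi
    done
    return 0
}
```

A possible invocation would be `terminate_pids 5 $(leftover_pids 'qemu.*HostedEngine')`. The pattern is a guess at how the VM's qemu process appears in the process list and should be checked on the actual host before use.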

Comment 5 Nikolai Sednev 2018-03-18 17:38:14 UTC
On an environment that was cleanly deployed, then cleaned, then redeployed, I see the issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=1557793.
Could you take a look at it?

Comment 6 Nikolai Sednev 2018-03-18 17:57:06 UTC
Moving this bug to verified as it worked just fine for me on these components:
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.13-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-861.el7.x86_64 #1 SMP Wed Mar 14 10:21:01 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

I started an NFS deployment of Node 0 on the host and interrupted it while the host was being added, in the middle of the addition. I then ran ovirt-hosted-engine-cleanup, which finished successfully:
alma03 ~]# ovirt-hosted-engine-cleanup
 This will de-configure the host to run ovirt-hosted-engine-setup from scratch. 
Caution, this operation should be used with care.

Are you sure you want to proceed? [y/n]
y
  -=== Destroy hosted-engine VM ===- 
You must run deploy first
  -=== Killing left-behind HostedEngine processes ===- 
/usr/sbin/ovirt-hosted-engine-cleanup: line 48: kill: (13758) - No such process
  -=== Stop HA services ===- 
  -=== Shutdown sanlock ===- 
shutdown force 1 wait 0
shutdown done -111
  -=== Disconnecting the hosted-engine storage domain ===- 
You must run deploy first
  -=== De-configure VDSM networks ===- 
  -=== Stop other services ===- 
  -=== De-configure external daemons ===- 
  -=== Removing configuration files ===- 
? /etc/init/libvirtd.conf already missing
? /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml already missing
? /etc/ovirt-hosted-engine/answers.conf already missing
? /etc/ovirt-hosted-engine/hosted-engine.conf already missing
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing

alma03 ~]# hosted-engine --check-deployed
The hosted engine has not been deployed
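
A complementary check, sketched under assumptions: after cleanup, confirm that the paths listed in the "Removing configuration files" section are really gone. The function name report_leftover_files is hypothetical and is not part of ovirt-hosted-engine-cleanup.

```shell
#!/bin/bash
# Sketch only; report_leftover_files is a hypothetical helper.

# Print every given path that still exists (including dangling symlinks).
# Returns non-zero if at least one path is still present.
report_leftover_files() {
    local status=0 path
    for path in "$@"; do
        if [ -e "$path" ] || [ -L "$path" ]; then
            echo "still present: $path"
            status=1
        fi
    done
    return "$status"
}
```

For example, `report_leftover_files /etc/vdsm/vdsm.conf /etc/pki/vdsm/certs/cacert.pem /var/cache/libvirt/qemu` uses paths taken from the output above. An empty result only shows that these files were removed; it says nothing about running processes or services.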

Comment 7 Yedidyah Bar David 2018-03-18 20:44:53 UTC
(In reply to Nikolai Sednev from comment #5)
> I see that on cleanly deployed and then cleaned and then redeployed
> environment these:https://bugzilla.redhat.com/show_bug.cgi?id=1557793.
> Could you take a look on it?

Might look later on, or Simone will.

It's possible that we are waiting on releasing various locks there. Not sure what we need to do, can discuss there.

(In reply to Nikolai Sednev from comment #6)

That's a step forward, but imo verification of a cleanup bug is:

1. Start setup/deploy as relevant
2. Kill it in the middle, make it fail etc.
3. cleanup
4. Start setup/deploy again and see that it finishes successfully.

If you do not do (4.), how can you know if the machine is "clean enough"? (Obviously, even (4.) is not a proof that everything was fully cleaned up).
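
The four steps above can be sketched as a small driver. This is only an illustration: verify_cleanup is a hypothetical name, and the three arguments are names of commands or shell functions that the tester supplies (how to start, interrupt, and rerun the deploy is deliberately left to the caller).

```shell
#!/bin/bash
# Sketch only; verify_cleanup and its arguments are hypothetical.
# Usage: verify_cleanup INTERRUPTED_DEPLOY CLEANUP REDEPLOY
verify_cleanup() {
    local interrupted_deploy=$1 cleanup=$2 redeploy=$3

    # Steps 1-2: run a deploy that is expected to be interrupted or to fail.
    if "$interrupted_deploy"; then
        echo "first deploy unexpectedly succeeded; nothing to verify" >&2
        return 2
    fi

    # Step 3: clean up the host.
    "$cleanup" || { echo "cleanup failed" >&2; return 3; }

    # Step 4: deploy again; only a successful run counts as verification.
    "$redeploy" || { echo "redeploy after cleanup failed" >&2; return 4; }

    echo "redeploy after cleanup succeeded"
}
```

The distinct return codes make it easy to tell which step failed. As noted above, even a successful step 4 does not prove that everything was cleaned up.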

Comment 8 Nikolai Sednev 2018-03-19 05:50:42 UTC
(In reply to Yedidyah Bar David from comment #7)
> That's a step forward, but imo verification of a cleanup bug is:
> 
> 1. Start setup/deploy as relevant
> 2. Kill it in the middle, make it fail etc.
> 3. cleanup
> 4. Start setup/deploy again and see that it finishes successfully.
> 
> If you do not do (4.), how can you know if the machine is "clean enough"?
> (Obviously, even (4.) is not a proof that everything was fully cleaned up).

This is exactly what I did in https://bugzilla.redhat.com/show_bug.cgi?id=1528813#c6.

Comment 9 Yedidyah Bar David 2018-03-19 06:59:09 UTC
(In reply to Nikolai Sednev from comment #8)
> This is what I did exactly in
> https://bugzilla.redhat.com/show_bug.cgi?id=1528813#c6.

Good, thanks.

(You didn't write this, though. You only wrote "hosted-engine --check-deployed", which is not the same thing.)

Comment 10 Nikolai Sednev 2018-03-19 09:02:06 UTC
(In reply to Yedidyah Bar David from comment #9)
> (In reply to Nikolai Sednev from comment #8)
> > This is what I did exactly in
> > https://bugzilla.redhat.com/show_bug.cgi?id=1528813#c6.
> 
> Good, thanks.
> 
> (You didn't write this, though. You only wrote "hosted-engine
> --check-deployed", which is not the same thing.)

You're right, my bad. I forgot to mention that I redeployed on NFS several times after the interruption, each time successfully.

Comment 11 Sandro Bonazzola 2018-03-29 11:09:43 UTC
This bug is included in the oVirt 4.2.2 release, published on March 28th, 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

