Description of problem:

hosted-engine deploy from cockpit failed. Ran ovirt-hosted-engine-cleanup. It correctly cleaned the shared storage, and said it also cleaned some other things, but:

1. Left the engine vm running. I manually killed the qemu process to stop it.
2. Left libvirtd down and failing to start.

Output of the script:
========================================================================
[root@lvc7host1 ~]# ovirt-hosted-engine-cleanup
This will de-configure the host to run ovirt-hosted-engine-setup from scratch.
Caution, this operation should be used with care.

Are you sure you want to proceed? [y/n]
y
-=== Destroy hosted-engine VM ===-
You must run deploy first
-=== Stop HA services ===-
-=== Shutdown sanlock ===-
shutdown force 1 wait 0
shutdown done 0
-=== Disconnecting the hosted-engine storage domain ===-
You must run deploy first
-=== De-configure VDSM networks ===-
-=== Stop other services ===-
-=== De-configure external daemons ===-
-=== Removing configuration files ===-
? /etc/init/libvirtd.conf already missing
- removing /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml
? /etc/ovirt-hosted-engine/answers.conf already missing
? /etc/ovirt-hosted-engine/hosted-engine.conf already missing
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing
========================================================================

Version-Release number of selected component (if applicable):
current master

How reproducible:
Not sure

Steps to Reproduce:
1. Partially(?)/Unsuccessfully deploy hosted-engine
2. Run cleanup
3. Deploy again

Actual results:
Fails

Expected results:
Succeeds

Additional info:
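The two symptoms reported above (an engine VM process that survives cleanup, and libvirtd left down) can be checked for after a cleanup run with something like the following. This is a minimal sketch, not part of ovirt-hosted-engine-cleanup: the function name report_leftovers, the VM_PATTERN regex, and the PGREP/SYSTEMCTL override variables are all assumptions made for illustration, and the regex may need adjusting to match the actual qemu command line on a given host.

```shell
#!/bin/bash
# Hypothetical post-cleanup check; NOT part of ovirt-hosted-engine-cleanup.

# Assumption: the engine VM's qemu command line matches this regex.
VM_PATTERN="${VM_PATTERN:-qemu.*HostedEngine}"

# PGREP and SYSTEMCTL may be overridden, e.g. to stub them out in a dry run.
report_leftovers() {
    local status=0

    # Symptom 1: a qemu process for the engine VM is still alive.
    if "${PGREP:-pgrep}" -f -- "$VM_PATTERN" >/dev/null 2>&1; then
        echo "engine VM process still running"
        status=1
    fi

    # Symptom 2: libvirtd is not running.
    if ! "${SYSTEMCTL:-systemctl}" is-active --quiet libvirtd; then
        echo "libvirtd is not active"
        status=1
    fi

    return "$status"
}
```

Calling report_leftovers right after the cleanup script returns non-zero if either symptom is present, and prints which one.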
Created attachment 1371859: sosreport
(In reply to Yedidyah Bar David from comment #0)
> 2. Left libvirtd down and failing to start.

Opened bug 1528816 for this.
For the reproduction, should this be verified on ansible or vintage?
(In reply to Nikolai Sednev from comment #3)
> For the reproduction, should this be verified on ansible or vintage?

Ansible. You are welcome to try vintage as well; in my tests it already worked there. In any case, cleanup no longer cares which flow was used: it will always kill the engine VM process if one exists.
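The "always kill if it exists" behaviour could look roughly like the sketch below. This is not the code from ovirt-hosted-engine-cleanup; the function name kill_leftover_vm, the PGREP override variable, and the example pattern are assumptions for illustration. The point it shows is tolerating a process that exits between the lookup and the kill, so that a vanished PID is not treated as an error.

```shell
#!/bin/bash
# Hypothetical sketch; NOT the implementation in ovirt-hosted-engine-cleanup.

# Send SIGTERM to every process whose command line matches the given regex.
# PGREP may be overridden, e.g. to stub out the lookup.
kill_leftover_vm() {
    local pattern="$1" pid

    for pid in $("${PGREP:-pgrep}" -f -- "$pattern"); do
        # The process may already be gone by the time we get here;
        # that is the desired end state, so ignore the failure.
        kill "$pid" 2>/dev/null || true
    done

    return 0
}
```

A call such as kill_leftover_vm 'qemu.*HostedEngine' (the pattern is an assumption about the qemu command line) returns success whether or not anything was running.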
On an environment that was cleanly deployed, then cleaned up, and then redeployed, I see the issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=1557793. Could you take a look at it?
Moving this bug to verified, as it worked fine for me on these components:
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.13-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-861.el7.x86_64 #1 SMP Wed Mar 14 10:21:01 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

I started an NFS deployment of Node 0 on the host and interrupted it while the host was being added to the VM, in the middle of the addition. I then ran ovirt-hosted-engine-cleanup, and it finished successfully:

alma03 ~]# ovirt-hosted-engine-cleanup
This will de-configure the host to run ovirt-hosted-engine-setup from scratch.
Caution, this operation should be used with care.

Are you sure you want to proceed? [y/n]
y
-=== Destroy hosted-engine VM ===-
You must run deploy first
-=== Killing left-behind HostedEngine processes ===-
/usr/sbin/ovirt-hosted-engine-cleanup: line 48: kill: (13758) - No such process
-=== Stop HA services ===-
-=== Shutdown sanlock ===-
shutdown force 1 wait 0
shutdown done -111
-=== Disconnecting the hosted-engine storage domain ===-
You must run deploy first
-=== De-configure VDSM networks ===-
-=== Stop other services ===-
-=== De-configure external daemons ===-
-=== Removing configuration files ===-
? /etc/init/libvirtd.conf already missing
? /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml already missing
? /etc/ovirt-hosted-engine/answers.conf already missing
? /etc/ovirt-hosted-engine/hosted-engine.conf already missing
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing

alma03 ~]# hosted-engine --check-deployed
The hosted engine has not been deployed
(In reply to Nikolai Sednev from comment #5)
> I see that on cleanly deployed and then cleaned and then redeployed
> environment these:https://bugzilla.redhat.com/show_bug.cgi?id=1557793.
> Could you take a look on it?

Might look later on, or Simone will.

It's possible that we are waiting on releasing various locks there. Not sure what we need to do, can discuss there.

(In reply to Nikolai Sednev from comment #6)
> Moving this bug to verified as it worked just fine for me on these
> components:
> ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
> ovirt-hosted-engine-setup-2.2.13-1.el7ev.noarch
> rhvm-appliance-4.2-20180202.0.el7.noarch
> Linux 3.10.0-861.el7.x86_64 #1 SMP Wed Mar 14 10:21:01 EDT 2018 x86_64
> x86_64 x86_64 GNU/Linux
> Red Hat Enterprise Linux Server release 7.5 (Maipo)
>
> I've started NFS deployment of Node 0 on host and then interrupted it, while
> host had been added to VM, just in the middle of addition, then ran
> ovirt-hosted-engine-cleanup and it finished successfully:
> alma03 ~]# ovirt-hosted-engine-cleanup
> This will de-configure the host to run ovirt-hosted-engine-setup from
> scratch.
> Caution, this operation should be used with care.
>
> Are you sure you want to proceed? [y/n]
> y
> -=== Destroy hosted-engine VM ===-
> You must run deploy first
> -=== Killing left-behind HostedEngine processes ===-
> /usr/sbin/ovirt-hosted-engine-cleanup: line 48: kill: (13758) - No such
> process
> -=== Stop HA services ===-
> -=== Shutdown sanlock ===-
> shutdown force 1 wait 0
> shutdown done -111
> -=== Disconnecting the hosted-engine storage domain ===-
> You must run deploy first
> -=== De-configure VDSM networks ===-
> -=== Stop other services ===-
> -=== De-configure external daemons ===-
> -=== Removing configuration files ===-
> ? /etc/init/libvirtd.conf already missing
> ? /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml already missing
> ? /etc/ovirt-hosted-engine/answers.conf already missing
> ? /etc/ovirt-hosted-engine/hosted-engine.conf already missing
> - removing /etc/vdsm/vdsm.conf
> - removing /etc/pki/vdsm/certs/cacert.pem
> - removing /etc/pki/vdsm/certs/vdsmcert.pem
> - removing /etc/pki/vdsm/keys/vdsmkey.pem
> - removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
> - removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
> - removing /etc/pki/vdsm/libvirt-spice/server-key.pem
> - removing /etc/pki/CA/cacert.pem
> - removing /etc/pki/libvirt/clientcert.pem
> - removing /etc/pki/libvirt/private/clientkey.pem
> ? /etc/pki/ovirt-vmconsole/*.pem already missing
> - removing /var/cache/libvirt/qemu
> ? /var/run/ovirt-hosted-engine-ha/* already missing
>
> alma03 ~]# hosted-engine --check-deployed
> The hosted engine has not been deployed

That's a step forward, but imo verification of a cleanup bug is:

1. Start setup/deploy as relevant
2. Kill it in the middle, make it fail etc.
3. cleanup
4. Start setup/deploy again and see that it finishes successfully.

If you do not do (4.), how can you know if the machine is "clean enough"? (Obviously, even (4.) is not a proof that everything was fully cleaned up).
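The four verification steps above can be sketched as a small harness. This is only an illustration under stated assumptions: the function name verify_cleanup, the three *_CMD variables, and the 600-second interruption point are made up for the example, and using GNU timeout to send SIGINT is just one way to "kill it in the middle". Both hosted-engine --deploy and ovirt-hosted-engine-cleanup prompt for input by default, so an unattended run would need answers supplied some other way, which this sketch does not attempt.

```shell
#!/bin/bash
# Hypothetical verification harness; all variable and function names are
# assumptions for illustration. Each step is a command string so that it
# can be replaced, e.g. by a different way of interrupting the deploy.

# Steps 1-2: start a deploy and interrupt it part-way (600s is arbitrary).
PARTIAL_DEPLOY_CMD="${PARTIAL_DEPLOY_CMD:-timeout -s INT 600 hosted-engine --deploy}"
# Step 3: cleanup.
CLEANUP_CMD="${CLEANUP_CMD:-ovirt-hosted-engine-cleanup}"
# Step 4: a full deploy that is expected to succeed.
DEPLOY_CMD="${DEPLOY_CMD:-hosted-engine --deploy}"

verify_cleanup() {
    # Word splitting of the command strings below is intentional.

    # Steps 1-2: this deploy is expected to fail or be interrupted.
    $PARTIAL_DEPLOY_CMD || echo "partial deploy exited non-zero, as expected"

    # Step 3: a failing cleanup already means the bug is not fixed.
    if ! $CLEANUP_CMD; then
        echo "cleanup failed"
        return 1
    fi

    # Step 4: only a successful redeploy shows the host was clean enough.
    if $DEPLOY_CMD; then
        echo "redeploy succeeded"
    else
        echo "redeploy failed"
        return 1
    fi
}
```

As the comment notes, a successful step 4 is evidence that the host was clean enough to deploy on, not proof that everything was removed.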
(In reply to Yedidyah Bar David from comment #7)
> (In reply to Nikolai Sednev from comment #5)
> > I see that on cleanly deployed and then cleaned and then redeployed
> > environment these:https://bugzilla.redhat.com/show_bug.cgi?id=1557793.
> > Could you take a look on it?
>
> Might look later on, or Simone will.
>
> It's possible that we are waiting on releasing various locks there. Not sure
> what we need to do, can discuss there.
>
> (In reply to Nikolai Sednev from comment #6)
> > Moving this bug to verified as it worked just fine for me on these
> > components:
> > ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
> > ovirt-hosted-engine-setup-2.2.13-1.el7ev.noarch
> > rhvm-appliance-4.2-20180202.0.el7.noarch
> > Linux 3.10.0-861.el7.x86_64 #1 SMP Wed Mar 14 10:21:01 EDT 2018 x86_64
> > x86_64 x86_64 GNU/Linux
> > Red Hat Enterprise Linux Server release 7.5 (Maipo)
> >
> > I've started NFS deployment of Node 0 on host and then interrupted it, while
> > host had been added to VM, just in the middle of addition, then ran
> > ovirt-hosted-engine-cleanup and it finished successfully:
> > alma03 ~]# ovirt-hosted-engine-cleanup
> > This will de-configure the host to run ovirt-hosted-engine-setup from
> > scratch.
> > Caution, this operation should be used with care.
> >
> > Are you sure you want to proceed? [y/n]
> > y
> > -=== Destroy hosted-engine VM ===-
> > You must run deploy first
> > -=== Killing left-behind HostedEngine processes ===-
> > /usr/sbin/ovirt-hosted-engine-cleanup: line 48: kill: (13758) - No such
> > process
> > -=== Stop HA services ===-
> > -=== Shutdown sanlock ===-
> > shutdown force 1 wait 0
> > shutdown done -111
> > -=== Disconnecting the hosted-engine storage domain ===-
> > You must run deploy first
> > -=== De-configure VDSM networks ===-
> > -=== Stop other services ===-
> > -=== De-configure external daemons ===-
> > -=== Removing configuration files ===-
> > ? /etc/init/libvirtd.conf already missing
> > ? /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml already missing
> > ? /etc/ovirt-hosted-engine/answers.conf already missing
> > ? /etc/ovirt-hosted-engine/hosted-engine.conf already missing
> > - removing /etc/vdsm/vdsm.conf
> > - removing /etc/pki/vdsm/certs/cacert.pem
> > - removing /etc/pki/vdsm/certs/vdsmcert.pem
> > - removing /etc/pki/vdsm/keys/vdsmkey.pem
> > - removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
> > - removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
> > - removing /etc/pki/vdsm/libvirt-spice/server-key.pem
> > - removing /etc/pki/CA/cacert.pem
> > - removing /etc/pki/libvirt/clientcert.pem
> > - removing /etc/pki/libvirt/private/clientkey.pem
> > ? /etc/pki/ovirt-vmconsole/*.pem already missing
> > - removing /var/cache/libvirt/qemu
> > ? /var/run/ovirt-hosted-engine-ha/* already missing
> >
> > alma03 ~]# hosted-engine --check-deployed
> > The hosted engine has not been deployed
>
> That's a step forward, but imo verification of a cleanup bug is:
>
> 1. Start setup/deploy as relevant
> 2. Kill it in the middle, make it fail etc.
> 3. cleanup
> 4. Start setup/deploy again and see that it finishes successfully.
>
> If you do not do (4.), how can you know if the machine is "clean enough"?
> (Obviously, even (4.) is not a proof that everything was fully cleaned up).

This is what I did exactly in https://bugzilla.redhat.com/show_bug.cgi?id=1528813#c6.
(In reply to Nikolai Sednev from comment #8)
> This is what I did exactly in
> https://bugzilla.redhat.com/show_bug.cgi?id=1528813#c6.

Good, thanks.

(You didn't write this, though. You only wrote "hosted-engine --check-deployed", which is not the same thing.)
(In reply to Yedidyah Bar David from comment #9)
> (In reply to Nikolai Sednev from comment #8)
> > This is what I did exactly in
> > https://bugzilla.redhat.com/show_bug.cgi?id=1528813#c6.
>
> Good, thanks.
>
> (You didn't write this, though. You only wrote "hosted-engine
> --check-deployed", which is not the same thing.)

You're right, my bad. I forgot to mention that I redeployed on NFS several times after the interruption, with success.
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.