Bug 1343425 - hosted-engine --upgrade-appliance doesn't do full cleanup/rollback after failure
Summary: hosted-engine --upgrade-appliance doesn't do full cleanup/rollback after failure
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.0.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-06-07 10:17 UTC by Jiri Belka
Modified: 2022-02-25 08:35 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-14 12:51:42 UTC
oVirt Team: Integration
Embargoed:


Attachments


Links
System ID: Red Hat Issue Tracker RHV-44926
Last Updated: 2022-02-25 08:35:37 UTC

Description Jiri Belka 2016-06-07 10:17:18 UTC
Description of problem:

hosted-engine --upgrade-appliance failed with the following:

~~~
[ INFO  ] Still waiting for new engine VM disk to be created. This may take several minutes...
[ ERROR ] Timed out while waiting for the disk to be created. Please check engine logs.
[ ERROR ] Timed out while waiting for the disk to be created. Please check engine logs.
[ ERROR ] Failed to execute stage 'Misc configuration': Failed creating the new engine VM disk
[ INFO  ] Stage: Clean up
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: this system is not reliable, please check the issue, fix and try again
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160607113048-3yonnb.log
~~~

So I gave the command another try, and it failed because of a free-space issue:

~~~
[ ERROR ] On the hosted-engine storage domain there is not enough available space to create a new disk for the new appliance: required 50GiB - available 10GiB. Please extend the hosted-engine storage domain.
[ ERROR ] Failed to execute stage 'Setup validation': Not enough free space on the hosted-engine storage domain
[ INFO  ] Stage: Clean up
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: this system is not reliable, please check the issue, fix and try again
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160607115256-8wqit5.log
~~~
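The second run was rejected by the setup's free-space validation (50 GiB required, 10 GiB available). Before retrying, the same condition can be checked by hand on a file-based (NFS) domain like the one in this report. This is a minimal sketch, assuming a POSIX `df`; `has_free_gib` is a hypothetical helper, and the setup's own calculation may differ (on block storage, for instance, free space lives in the volume group rather than in a mounted file system).

```shell
#!/bin/sh
# Sketch: does DIR (e.g. the mounted hosted-engine storage domain) have at
# least REQUIRED_GIB GiB available? has_free_gib is a hypothetical helper.

# has_free_gib DIR REQUIRED_GIB -> exit status 0 if enough space, 1 otherwise
has_free_gib() {
    local dir=$1 required_gib=$2 avail_kib
    # POSIX df output (-P) in 1024-byte blocks (-k): field 4 of the data
    # line is the available space.
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    echo "available: $(( avail_kib / 1048576 )) GiB, required: $required_gib GiB"
    [ "$avail_kib" -ge $(( required_gib * 1048576 )) ]
}
```

For this report it would be called as `has_free_gib /rhev/data-center/mnt/10.34.63.199:_jbelka_jb-vhe1 50`; a non-zero exit status means the domain must be extended or leftover images removed first.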

It seemed something was occupying free space on the HE shared storage:

~~~
# ( cd /rhev/data-center/mnt/10.34.63.199:_jbelka_jb-vhe1 ; find . -type f -size +100M | xargs ls -lh )
-rw-rw----. 1 vdsm kvm  50G Jun  7 11:49 ./f310489b-a6fe-4f8e-b685-c10d6be57abe/images/0a6e398c-1db7-4cce-b5c5-4121a665d64c/f7242275-7cb3-45bc-9d39-2b754016ebf3
-rw-rw----. 1 vdsm kvm 1.0G Jun  6 15:56 ./f310489b-a6fe-4f8e-b685-c10d6be57abe/images/70ece018-4e4f-4d0e-ae27-e0a38f38c195/1e0b19ec-6bed-4580-8ddc-82cd073c0ec7
-rw-rw----. 1 vdsm kvm  50G Jun  7  2016 ./f310489b-a6fe-4f8e-b685-c10d6be57abe/images/97c39ec3-0318-4dca-9ca5-0af112c80a63/0bc5708f-e522-4c5f-b972-12250178b4bd
-rw-rw----. 1 vdsm kvm  50G Jun  7 08:22 ./f310489b-a6fe-4f8e-b685-c10d6be57abe/images/ddd7b4a7-b20c-46b5-86bb-e1ede5a978cb/bd497503-ae8b-4543-b9d6-730b7e96c804
# date
Tue Jun  7 12:02:22 CEST 2016
~~~

I suppose at least the first one was created by the last run and was not removed after the failure.

~~~
# grep 'add_vm_disk.*vol:' /var/log/ovirt-hosted-engine-setup/*
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160607113048-3yonnb.log:2016-06-07 11:34:51 DEBUG otopi.plugins.gr_he_upgradeappliance.engine.add_vm_disk add_vm_disk._create_disk:221 vol: f7242275-7cb3-45bc-9d39-2b754016ebf3
~~~
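The check above can be automated. This is a minimal sketch, assuming a file-based domain laid out as `<sd_uuid>/images/<image_uuid>/<vol_uuid>` (as in the `find` output) and setup logs containing the `add_vm_disk ... vol:` DEBUG line shown. `find_leftover_volumes` is a hypothetical helper, not part of ovirt-hosted-engine-setup. A volume it reports is not necessarily orphaned, since the disk from a successful run would be listed too, so confirm in the engine before removing anything.

```shell
#!/bin/sh
# Hypothetical helper: report volumes that setup logs say were created by
# add_vm_disk and that still exist on a file-based storage domain.

# find_leftover_volumes LOG_DIR DOMAIN_MOUNT
find_leftover_volumes() {
    local log_dir=$1 mnt=$2
    # Pull the UUID that follows "vol: " out of every matching log line.
    grep -ho 'add_vm_disk.*vol: [0-9a-f-]*' "$log_dir"/*.log 2>/dev/null \
        | sed 's/.*vol: //' | sort -u \
        | while IFS= read -r vol; do
            [ -n "$vol" ] || continue
            # Volume files live at <sd_uuid>/images/<image_uuid>/<vol_uuid>.
            path=$(find "$mnt" -type f -path "*/images/*/$vol" 2>/dev/null | head -n 1)
            if [ -n "$path" ]; then
                # The parent directory name is the image (disk) UUID.
                printf 'vol=%s image=%s path=%s\n' \
                    "$vol" "$(basename "$(dirname "$path")")" "$path"
            fi
        done
}
```

Run as `find_leftover_volumes /var/log/ovirt-hosted-engine-setup /rhev/data-center/mnt/10.34.63.199:_jbelka_jb-vhe1`, it would be expected to report volume `f7242275-7cb3-45bc-9d39-2b754016ebf3` under image `0a6e398c-1db7-4cce-b5c5-4121a665d64c`, matching the first file in the `find` listing.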


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.0.0-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Run hosted-engine --upgrade-appliance and make it fail near the end (see the
   first output above for the setup stage at which it failed).
2. Run hosted-engine again and/or check whether the new image from the last
   run is still present on the shared HE storage.

Actual results:
The shared HE storage appears to be occupied by an image created during a hosted-engine --upgrade-appliance run that failed in one of the last steps.

Expected results:
hosted-engine --upgrade-appliance should perform a full cleanup/rollback on failure, so that a failed run does not leave free space occupied.
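The expected behaviour amounts to the usual undo-on-failure pattern: record each artifact as it is created and, if a later stage fails, undo those records newest first. The shell sketch below only illustrates that pattern; the real tool is otopi/Python-based, and every function name here is hypothetical. `rm` stands in for the real undo action; as Comment 2 explains, for the engine VM disk that action would have to go through the engine.

```shell
#!/bin/sh
# Undo-on-failure sketch; every name here is a hypothetical illustration.

CREATED=""   # paths created so far, newline-separated, newest first

# record PATH: remember an artifact so that rollback can undo it later.
record() {
    CREATED="$1
$CREATED"
}

# rollback: undo every recorded artifact, newest first.
rollback() {
    printf '%s' "$CREATED" | while IFS= read -r path; do
        [ -n "$path" ] && rm -f -- "$path"   # stand-in for the real undo action
    done
    CREATED=""
}

# run_steps STEP...: run step functions in order; on the first failure,
# roll back what earlier steps recorded and return 1.
run_steps() {
    CREATED=""
    for step in "$@"; do
        if ! "$step"; then
            echo "step '$step' failed, rolling back" >&2
            rollback
            return 1
        fi
    done
}

# Example steps: "create the new engine VM disk", then "wait for the disk",
# which times out as in this report.
create_new_disk() {
    f="${WORKDIR:-/tmp}/new-engine-disk"
    : > "$f" && record "$f"
}
wait_for_disk() { return 1; }   # simulated timeout
```

With `WORKDIR=$(mktemp -d)`, `run_steps create_new_disk wait_for_disk` returns 1 and leaves no `new-engine-disk` file behind, which is the behaviour this bug asks for.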

Additional info:

Comment 2 Simone Tiraboschi 2016-06-08 09:21:33 UTC
Doing it automatically is not that simple, since at that point we would need to trigger the disk deletion from the engine.
So we would need the engine VM to start again from the 3.6 disk and then use its REST API to remove the new disk created for the 4.0 engine.

Maybe we can simply print a warning to the user to remember that the failed attempt created a new disk and he can remove it from the web admin interface.
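The reminder suggested above could look roughly like the following minimal sketch, which only prints text and deletes nothing. It assumes that the disk ID known to the engine is the image directory UUID from the `find` output (`0a6e398c-1db7-4cce-b5c5-4121a665d64c` for the volume found in the log) and that the engine serves the standard oVirt REST API under `/ovirt-engine/api`. `warn_leftover_disk`, the engine FQDN argument, `ca.pem` (the engine CA certificate) and the `admin@internal` account are placeholders.

```shell
#!/bin/sh
# Sketch of the proposed warning. It only prints text; it deletes nothing.
# warn_leftover_disk is a hypothetical helper name.

# warn_leftover_disk DISK_ID ENGINE_FQDN
warn_leftover_disk() {
    cat <<EOF
[WARNING] The failed upgrade attempt created a new disk (ID: $1) on the
          hosted-engine storage domain. Once the engine is up again, remove
          it from the web admin interface, or through the REST API, e.g.:
          curl --cacert ca.pem -u admin@internal -X DELETE \\
              https://$2/ovirt-engine/api/disks/$1
EOF
}
```

Printing the command instead of running it keeps the decision with the administrator, who can first confirm in the engine that the disk is really the leftover one.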

Comment 3 Sandro Bonazzola 2016-09-01 07:34:34 UTC
Yaniv, is the comment #2 suggestion acceptable on your side?

> Maybe we can simply print a warning to the user to remember that the failed
> attempt created a new disk and he can remove it from the web admin interface.

Comment 4 Yaniv Lavi 2016-09-13 11:43:53 UTC
(In reply to Sandro Bonazzola from comment #3)
> Yaniv, is the comment #2 suggestion acceptable on your side?
> 
> > Maybe we can simply print a warning to the user to remember that the failed
> > attempt created a new disk and he can remove it from the web admin interface.

Would this increase the likelihood that the next attempts will fail due to the size of the storage domain?

Comment 5 Simone Tiraboschi 2016-10-14 10:01:42 UTC
(In reply to Yaniv Dary from comment #4)
> Would this increase the likelihood that the next attempts will fail due to
> the size of the storage domain?

Yes. The user would eventually have to roll back, start the engine, and delete the past backups from there.

