+++ This bug is an upstream to downstream clone. The original bug is: +++
+++   bug 1403903 +++
======================================================================

Description of problem:
Engine setup failed on the appliance.
Followed the steps from https://bugzilla.redhat.com/show_bug.cgi?id=1399053 in order to verify it, but it failed with:

[ ERROR ] Engine setup failed on the appliance
[ ERROR ] Failed to execute stage 'Closing up': Engine setup failed on the appliance
          Please check its log on the appliance.
[ ERROR ] Hosted Engine upgrade failed: this system is not reliable, you can use --rollback-upgrade option to recover the engine VM disk from a backup
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20161212162707-5ujgti.log

Version-Release number of selected component (if applicable):
Host:
rhevm-appliance-20161130.0-1.el7ev.noarch
vdsm-4.18.18-4.git198e48d.el7ev.x86_64
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-daemon-0.4.0-0.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
ovirt-hosted-engine-ha-2.0.6-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.2.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.4.1-2.el7ev.noarch
ovirt-host-deploy-1.5.3-1.el7ev.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Nov 16 13:15:13 EST 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

How reproducible:
100%

Steps to Reproduce:
1. Get a 3.4.0.6 HE deployed on the host over NFS, with two NFS data storage domains.
2. Make a backup of your engine and copy it to a non-root directory on your host, while the host is set to maintenance.
3. Install rhevm-appliance-20161130.0-1.el7ev.noarch on your host and run hosted-engine --upgrade-appliance.

Actual results:
Upgrade failed.

Expected results:
Upgrade should pass.

Additional info:
See the attached sosreport from the host and ovirt-hosted-engine-setup-20161212162707-5ujgti.log

(Originally by Nikolai Sednev)
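For reference, a minimal shell sketch of the reproduction flow, assuming a non-root backup directory on the host; the directory path and backup file name below are placeholders, not taken from the original report:

  # on the engine VM: take the backup
  engine-backup --mode=backup --file=engine.backup --log=engine-backup.log
  # on the host: copy the backup to a non-root directory (placeholder path)
  scp root@<engine-fqdn>:/root/engine.backup /home/backup/
  # on the host: install the appliance package and start the upgrade
  yum install -y rhevm-appliance
  hosted-engine --upgrade-appliance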
Created attachment 1230829 [details] ovirt-hosted-engine-setup-20161212162707-5ujgti.log (Originally by Nikolai Sednev)
Adding sosreport from host here https://drive.google.com/a/redhat.com/file/d/0B85BEaDBcF88NDByaFhiZ3NwMW8/view?usp=sharing (Originally by Nikolai Sednev)
1. What's the difference between this bz and bug 1399053? 2. Which upgrade documentation are you following? (Originally by Doron Fediuck)
(In reply to Doron Fediuck from comment #3)
> 1. What's the difference between this bz and bug 1399053?
> 2. Which upgrade documentation are you following?

1. I was trying to verify bug 1399053, but since I failed, I opened this bug.
2. I'm following the same steps for the engine's DB backup which worked before, i.e. making the engine DB backup, copying it to the host on which I'm making the upgrade, and then, during the upgrade of the engine, providing the path to the backup files on the host.

(Originally by Nikolai Sednev)
(In reply to Nikolai Sednev from comment #4)
> (In reply to Doron Fediuck from comment #3)
> > 1. What's the difference between this bz and bug 1399053?
> > 2. Which upgrade documentation are you following?
>
> 1. I was trying to verify bug 1399053, but since I failed, I opened this bug.

This is indeed a different issue. The root cause here is:

2016-12-12 16:27:07 DEBUG otopi.plugins.otopi.packagers.dnfpackager dnfpackager._boot:163 Cannot initialize minidnf
Traceback (most recent call last):
  File "/usr/share/otopi/plugins/otopi/packagers/dnfpackager.py", line 150, in _boot
    constants.PackEnv.DNF_DISABLED_PLUGINS
  File "/usr/share/otopi/plugins/otopi/packagers/dnfpackager.py", line 60, in _getMiniDNF
    from otopi import minidnf
  File "/usr/lib/python2.7/site-packages/otopi/minidnf.py", line 16, in <module>
    import dnf
ImportError: No module named dnf

> 2. I'm following the same steps for the engine's DB backup which worked before, i.e. making the engine DB backup, copying it to the host on which I'm making the upgrade, and then, during the upgrade of the engine, providing the path to the backup files on the host.

We're now ensuring upgrade flows use the supported documentation, since there are multiple environments (el6/el7, HE/non-HE, 3.5/3.6). So unless this is a standard backup and restore (to the same version), please use only the relevant upgrade flow.

(Originally by Doron Fediuck)
(In reply to Doron Fediuck from comment #5)
> (In reply to Nikolai Sednev from comment #4)
> > (In reply to Doron Fediuck from comment #3)
> > > 1. What's the difference between this bz and bug 1399053?
> > > 2. Which upgrade documentation are you following?
> >
> > 1. I was trying to verify bug 1399053, but since I failed, I opened this bug.
>
> This is indeed a different issue. The root cause here is:
> 2016-12-12 16:27:07 DEBUG otopi.plugins.otopi.packagers.dnfpackager
> dnfpackager._boot:163 Cannot initialize minidnf
> Traceback (most recent call last):
>   File "/usr/share/otopi/plugins/otopi/packagers/dnfpackager.py", line 150, in _boot
>     constants.PackEnv.DNF_DISABLED_PLUGINS
>   File "/usr/share/otopi/plugins/otopi/packagers/dnfpackager.py", line 60, in _getMiniDNF
>     from otopi import minidnf
>   File "/usr/lib/python2.7/site-packages/otopi/minidnf.py", line 16, in <module>
>     import dnf
> ImportError: No module named dnf
>
> > 2. I'm following the same steps for the engine's DB backup which worked before, i.e. making the engine DB backup, copying it to the host on which I'm making the upgrade, and then, during the upgrade of the engine, providing the path to the backup files on the host.
>
> We're now ensuring upgrade flows use the supported documentation, since
> there are multiple environments (el6/el7, HE/non-HE, 3.5/3.6). So unless
> this is a standard backup and restore (to the same version) please use only
> the relevant upgrade flow.

2. In this case I tried backing up and restoring the same DB on the same appliance, just to see whether the /root issue was fixed, before going further and doing the whole upgrade flow.

(Originally by Nikolai Sednev)
(In reply to Doron Fediuck from comment #5)
> This is indeed a different issue. The root cause here is:
> 2016-12-12 16:27:07 DEBUG otopi.plugins.otopi.packagers.dnfpackager
> dnfpackager._boot:163 Cannot initialize minidnf
> Traceback (most recent call last):
>   File "/usr/share/otopi/plugins/otopi/packagers/dnfpackager.py", line 150, in _boot
>     constants.PackEnv.DNF_DISABLED_PLUGINS
>   File "/usr/share/otopi/plugins/otopi/packagers/dnfpackager.py", line 60, in _getMiniDNF
>     from otopi import minidnf
>   File "/usr/lib/python2.7/site-packages/otopi/minidnf.py", line 16, in <module>
>     import dnf
> ImportError: No module named dnf

This is not critical. On RHEL and CentOS there is no dnf, so yum is used as a fallback.

(Originally by Sandro Bonazzola)
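To confirm on a given host that this dnf error is only the harmless fallback described above (and not the actual failure), one can check whether the dnf Python module is importable at all; this is an illustrative check, not part of the original report:

  # on an el7 host the import is expected to fail, and otopi then falls back to its yum packager
  python -c 'import dnf' 2>/dev/null && echo "dnf module available" || echo "no dnf module, yum fallback expected"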
As per the instructions provided in the failure message:

RuntimeError: Engine setup failed on the appliance
Please check its log on the appliance.

Can you please attach the engine-setup logs from within the appliance?

(Originally by Sandro Bonazzola)
Created attachment 1231169 [details] ovirt-engine-setup-20161212190249-4t6x44.log (Originally by Nikolai Sednev)
The real issue is here:

2016-12-12 19:02:49 DEBUG otopi.ovirt_engine_setup.engine_common.database database.getCredentials:164 dbenv: {'OVESETUP_DWH_DB/database': 'ovirt_engine_history', 'OVESETUP_DWH_DB/host': 'localhost', 'OVESETUP_DWH_DB/port': 5432, 'OVESETUP_DWH_DB/securedHostValidation': False, 'OVESETUP_DWH_DB/secured': False, 'OVESETUP_DWH_DB/password': '1', 'OVESETUP_DWH_DB/user': 'ovirt_engine_history'}
2016-12-12 19:02:49 DEBUG otopi.ovirt_engine_setup.engine_common.database database.execute:177 Database: 'None', Statement: '
    select count(*) as count from pg_catalog.pg_tables where schemaname = 'public';
', args: {}
2016-12-12 19:02:49 DEBUG otopi.ovirt_engine_setup.engine_common.database database.execute:182 Creating own connection
2016-12-12 19:02:49 DEBUG otopi.ovirt_engine_setup.engine_common.database database.getCredentials:189 database connection failed
Traceback (most recent call last):
  File "/usr/share/ovirt-engine/setup/ovirt_engine_setup/engine_common/database.py", line 187, in getCredentials
    ] = self.isNewDatabase()
  File "/usr/share/ovirt-engine/setup/ovirt_engine_setup/engine_common/database.py", line 370, in isNewDatabase
    transaction=False,
  File "/usr/share/ovirt-engine/setup/ovirt_engine_setup/engine_common/database.py", line 191, in execute
    database=database,
  File "/usr/share/ovirt-engine/setup/ovirt_engine_setup/engine_common/database.py", line 125, in connect
    sslmode=sslmode,
  File "/usr/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
    conn = _connect(dsn, connection_factory=connection_factory, async=async)
OperationalError: could not connect to server: Connection refused
    Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5432?
could not connect to server: Connection refused
    Is the server running on host "localhost" (127.0.0.1) and accepting TCP/IP connections on port 5432?

(Originally by Simone Tiraboschi)
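A quick way to confirm this symptom from inside the appliance is to check that PostgreSQL is running and accepting TCP connections on port 5432 before engine-setup probes it; the exact service name may differ between appliance builds, so treat this as a hedged sketch:

  # on the appliance (engine VM)
  systemctl status postgresql
  su - postgres -c "psql -h localhost -p 5432 -c 'select 1'"   # forces a TCP connection, matching the error above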
Nikolai, you uploaded the wrong engine-setup log file.

The real issue was here, in the hosted-engine-setup logs:

2016-12-12 16:57:16 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND |- [ ERROR ] It seems that you are running your engine inside of the hosted-engine VM and are not in "Global Maintenance" mode. In that case you should put the system into the "Global Maintenance" mode before running engine-setup, or the hosted-engine HA agent might kill the machine, which might corrupt your data.

hosted-engine-setup checked for global maintenance mode and it was fine:

2016-12-12 16:27:11 INFO otopi.plugins.gr_he_common.vm.misc misc._late_setup:65 Checking maintenance mode
2016-12-12 16:27:11 DEBUG otopi.plugins.gr_he_common.vm.misc misc._late_setup:68 hosted-engine-status: {'engine_vm_up': True, 'all_host_stats': {1: {'live-data': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=99379 (Mon Dec 12 16:26:40 2016)\nhost-id=1\nscore=3400\nmaintenance=False\nstate=GlobalMaintenance\nstopped=False\n', 'hostname': 'alma04.qa.lab.tlv.redhat.com', 'host-id': 1, 'engine-status': '{"health": "good", "vm": "up", "detail": "up"}', 'score': 3400, 'stopped': False, 'maintenance': False, 'crc32': 'c315b57c', 'host-ts': 99379}}, 'engine_vm_host': 'alma04.qa.lab.tlv.redhat.com', 'global_maintenance': True}

The point is that engine-setup checks for global maintenance mode in the restored DB, not on the host. So, if you really took the DB backup when hosted-engine-setup asked you to create it, you are fine.

Nikolai, were you trying to restore a previous DB taken outside global maintenance mode?

The open gap is that engine-backup is not able to restore a DB taken on a hosted-engine environment when not in maintenance mode.

(Originally by Simone Tiraboschi)
I manually created the DB backup on the engine, using "engine-backup --mode=backup --file=nsednev1 --log=Log_1".
I took the DB backup from the engine and copied it to a non-root folder on the host while the host was not in global maintenance; only then did I set the host to global maintenance. So when I was trying to make the upgrade, the DB contained information about a host that was not in global maintenance.

(Originally by Nikolai Sednev)
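Following Simone's explanation above, a hedged sketch of the order that avoids this situation (the backup file and log names are just the examples from this comment):

  # on the host: enable global maintenance first, so the backed-up DB records it
  hosted-engine --set-maintenance --mode=global
  # on the engine VM: only then take the backup
  engine-backup --mode=backup --file=nsednev1 --log=Log_1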
If the user follows the instructions and creates the backup when asked to by hosted-engine-setup, everything will be fine. (Originally by Simone Tiraboschi)
(In reply to Simone Tiraboschi from comment #13)
> If the user follows the instructions and creates the backup when asked to by
> hosted-engine-setup, everything will be fine.

That's true for migration. For backup/restore, we do not require maintenance mode during backup. Perhaps we should; so far we have tried not to. (Originally by didi)
IMHO the docs at https://access.redhat.com/labs/rhevupgradehelper/ should be updated: the first step there is "Step 1: Stop the ovirt engine service.", while "Step 4: Disable the high-availability agents on all the self-hosted engine hosts. To do this run the following command on any host in the cluster." should come first. (Originally by Nikolai Sednev)
Also, after the engine's DB backup has been copied to the host, the engine service should be started again; otherwise "hosted-engine --upgrade-appliance" will fail, as the service must be up on the engine during the upgrade. (Originally by Nikolai Sednev)
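Putting the two previous comments together, a hedged sketch of the suggested order; the service names are assumed from standard hosted-engine installs and are not quoted from the upgrade helper docs:

  # on each hosted-engine host: stop the HA agents first
  systemctl stop ovirt-ha-agent ovirt-ha-broker
  # on the engine VM: stop the engine, take the backup, copy it to the host
  systemctl stop ovirt-engine
  engine-backup --mode=backup --file=engine.backup --log=backup.log
  # on the engine VM: start the engine again before running hosted-engine --upgrade-appliance,
  # since the engine service must be up during the upgrade
  systemctl start ovirt-engine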
Verified on:
rhevm-4.0.7.4-0.1.el7ev.noarch

# rpm -qa | grep hosted
ovirt-hosted-engine-setup-2.0.4.3-3.el7ev.noarch
ovirt-hosted-engine-ha-2.0.7-2.el7ev.noarch

1. Deploy the HE environment.
2. Add the storage domain to the engine (to start the auto-import process).
3. Wait until the engine has the HE VM.
4. Back up the engine:
   # engine-backup --mode=backup --file=engine.backup --log=engine-backup.log
5. Copy the backup file from the HE VM to the host.
6. Clean the host from the HE deploy (reprovisioning).
7. Run the HE deployment again.
8. Answer No to the question "Automatically execute engine-setup on the engine appliance on first boot (Yes, No)[Yes]? ".
9. Enter the HE VM and copy the backup file from the host to the HE VM.
10. Run the restore command:
    # engine-backup --mode=restore --scope=all --file=engine.backup --log=engine-restore.log --he-remove-storage-vm --he-remove-hosts --restore-permissions --provision-dwh-db --provision-db
11. Run engine setup:
    # engine-setup --offline
12. Finish the HE deployment process.

The engine is up and has the HE SD and HE VM in the active state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2017-0542.html