Description of problem: Yes the title is right :) The engine will fail to start, and therefore make hosted-engine --restore-from-file fail if the backup contains a VM that pinned to the deploy host and that VM also has no icons set. Let me try to explain step by step... 1. User has the following in the backup: A VM that is pinned to a host, without icons: engine=> select vm_guid,vm_name,dedicated_vm_for_vds,small_icon_id,large_icon_id,os from vm_static where vm_name = 'phobos-epd-node1'; vm_guid | vm_name | dedicated_vm_for_vds | small_icon_id | large_icon_id | os --------------------------------------+------------------+---------------------------------------------------------------------------+---------------+---------------+---- 982cce6c-2958-451c-b4fa-69936457da0e | vm.example.com | 2ee05b38-566d-487c-a995-4a4cfac1ff20,68215598-ffbd-4ed0-b2db-927fb281c104 | | | 24 Here is the pinning in vm_host_pinning_map engine=# select * from vm_host_pinning_map ; vm_id | vds_id --------------------------------------+-------------------------------------- 982cce6c-2958-451c-b4fa-69936457da0e | 2ee05b38-566d-487c-a995-4a4cfac1ff20 <---- 982cce6c-2958-451c-b4fa-69936457da0e | 68215598-ffbd-4ed0-b2db-927fb281c104 This is the host: engine=# select vds_id,vds_name,vds_unique_id from vds; vds_id | vds_name | vds_unique_id --------------------------------------+--------------------------+-------------------------------------- 2ee05b38-566d-487c-a995-4a4cfac1ff20 | rhvh1.example | 31393638-3132-5a43-3238-343030395a53 <---- 4f7bf150-e4fb-4479-b2ab-582638b4f803 | rhvh5.example | 31393638-3132-5a43-3238-343930314d35 be4c8207-6798-4415-bf89-6838848e74ca | rhvh3.example | 31393638-3132-5a43-3238-343030395a56 68215598-ffbd-4ed0-b2db-927fb281c104 | rhvh2.example | 31393638-3132-5a43-3238-343030395a51 cbb45084-2295-4f06-89b1-5a2e87038532 | rhvh4.example | 31393638-3132-5a43-3238-343930314d34 65b28e84-16dd-4711-99c3-0172a4fe9c02 | rhvh6.example | 31393638-3132-5a43-3238-343930314d33 (6 rows) 2. User will restore hosted-engine from backup. Takes a backup and is using rhvh1 to restore. The ansible does this: 2020-11-03 10:42:45,007+0200 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Remove host used to redeploy] 2020-11-03 10:42:53,125+0200 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 db_remove_he_host: {'stderr_lines': [], u'changed': True, u'end': u'2020-11-03 10:42:47.144851', u'stdout': u' deletevds \n-----------\n \n(1 row)', u'cmd': [u'scl', u'enable', u'rh-postgresql10', u'--', u'psql', u'-d', u'engine', u'-c', u"SELECT deletevds(vds_id) FROM (SELECT vds_id FROM vds WHERE upper(vds_unique_id)=upper('31393638-3132-5a43-3238-343030395a53')) t"], 'failed': False, u'delta': u'0:00:00.070959', u'stderr': u'', u'rc': 0, 'stdout_lines': [u' deletevds ', u'-----------', u' ', u'(1 row)'], u'start': u'2020-11-03 10:42:47.073892'} 3. Now we are at this state: Here is the pinning in vm_host_pinning_map, the entry is gone: engine=# select * from vm_host_pinning_map ; vm_id | vds_id --------------------------------------+-------------------------------------- 982cce6c-2958-451c-b4fa-69936457da0e | 68215598-ffbd-4ed0-b2db-927fb281c104 And the host is done too. engine=# select vds_id,vds_name,vds_unique_id from vds; vds_id | vds_name | vds_unique_id --------------------------------------+--------------------------+-------------------------------------- 4f7bf150-e4fb-4479-b2ab-582638b4f803 | rhvh5.example | 31393638-3132-5a43-3238-343930314d35 be4c8207-6798-4415-bf89-6838848e74ca | rhvh3.example | 31393638-3132-5a43-3238-343030395a56 68215598-ffbd-4ed0-b2db-927fb281c104 | rhvh2.example | 31393638-3132-5a43-3238-343030395a51 cbb45084-2295-4f06-89b1-5a2e87038532 | rhvh4.example | 31393638-3132-5a43-3238-343930314d34 65b28e84-16dd-4711-99c3-0172a4fe9c02 | rhvh6.example | 31393638-3132-5a43-3238-343930314d33 But here we still have a reference to the deleted host, which will trigger the problem later... engine=> select vm_guid,vm_name,dedicated_vm_for_vds,small_icon_id,large_icon_id,os from vm_static where vm_name = 'phobos-epd-node1'; vm_guid | vm_name | dedicated_vm_for_vds | small_icon_id | large_icon_id | os --------------------------------------+------------------+---------------------------------------------------------------------------+---------------+---------------+---- 982cce6c-2958-451c-b4fa-69936457da0e | vm.example.com | 2ee05b38-566d-487c-a995-4a4cfac1ff20,68215598-ffbd-4ed0-b2db-927fb281c104 | | | 24 So at this point we already see an issue, we have a left behind UUID in dedicated_vm_for_vds (see BZ1431138) 4. Now after the backup is restored, start the engine to continue HE deployment... 5. Engine will run this on startup, to fill the icons of VMs with missing icons: backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/IconLoader.java private void init() { loadIconsToDatabase(); ensureDefaultOsIconExists(); updateVmIconDefaultsTable(); updateVmStaticTable(); } Which calls this: private void updateVmStaticTable() { for (VmStatic vmStatic : vmStaticDao.getAllWithoutIcon()) { <---- load setIconsByOs(vmStatic); <---- set icons vmStaticDao.update(vmStatic); <---- write } And here the problem starts. getAllWithoutIcon() finds the VM above, as it has no icons. Then it sets the icons and calls UpdateVmStatic stored procedure, which, among other things: Create or replace FUNCTION UpdateVmStatic(v_description VARCHAR(4000) , ~~~ -- Update connections to dedicated hosts PERFORM UpdateDedicatedHostsToVm( <----- v_vm_guid, v_dedicated_vm_for_vds); ~~~ And that will try to insert an entry back in vm_host_pinning_map table, due to the dedicated_vm_for_vds still set in vm_static. But inserting that in vm_host_pinning_map is denied, because the host does not exist in vds_static as it was removed for re-deploy, foreign key doesnt exist. And the engine fails to start. 2020-11-03 10:45:20,213+02 ERROR [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 68) [] Error during initialization: org.jboss.weld.exceptions.WeldException: WELD-000049: Unable to invoke private void org.ovirt.engine.core.bll.IconLoader.init() on org.ovirt.engine.core.bll.IconLoader@7640dd06 at org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.invokeMethods(DefaultLifecycleCallbackInvoker.java:85) [weld-core-impl.jar:3.0.6.Final-redhat-00002] at org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.postConstruct(DefaultLifecycleCallbackInvoker.java:66) [weld-core-impl.jar:3.0.6.Final-redhat-00002] at org.jboss.weld.injection.producer.BasicInjectionTarget.postConstruct(BasicInjectionTarget.java:122) [weld-core-impl.jar:3.0.6.Final-redhat-00002] at org.jboss.weld.bean.ManagedBean.create(ManagedBean.java:162) [weld-core-impl.jar:3.0.6.Final-redhat-00002] at org.jboss.weld.contexts.AbstractContext.get(AbstractContext.java:96) [weld-core-impl.jar:3.0.6.Final-redhat-00002] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_232] at org.jboss.threads.JBossThread.run(JBossThread.java:485) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_232] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_232] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_232] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_232] at org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.invokeMethods(DefaultLifecycleCallbackInvoker.java:83) [weld-core-impl.jar:3.0.6.Final-redhat-00002] ... 70 more Caused by: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call updatevmstatic(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}ERROR: insert or update on table "vm_host_pinning_map" violates foreign key constraint "vm_host_pinning_map_vds_id_fkey" Detail: Key (vds_id)=(2ee05b38-566d-487c-a995-4a4cfac1ff20) is not present in table "vds_static". Where: SQL statement "INSERT INTO vm_host_pinning_map ( vm_id, vds_id ) SELECT v_vm_guid, vds_id FROM fnSplitterUuid(v_dedicated_vm_for_vds) AS vds_id" PL/pgSQL function insertdedicatedhoststovm(uuid,text) line 3 at SQL statement SQL statement "SELECT InsertDedicatedHostsToVm(v_vm_guid, v_dedicated_vm_for_vds)" PL/pgSQL function updatededicatedhoststovm(uuid,text) line 7 at PERFORM SQL statement "SELECT UpdateDedicatedHostsToVm( v_vm_guid, v_dedicated_vm_for_vds)" PL/pgSQL function updatevmstatic(character varying,text,integer,integer,integer,integer,uuid,uuid,character varying,uuid,timestamp with time zone,integer,boolean,boolean,boolean,integer,integer,integer,integer,character varying,boolean,boolean,boolean,boolean,character varying,text,integer,integer,integer,integer,integer,integer,character varying,integer,character varying,character varying,character varying,integer,character varying,character varying,integer,uuid,character varying,boolean,boolean,character varying,boolean,uuid,uuid,uuid,uuid,character varying,integer,integer,smallint,character varying,boolean,character varying,boolean,boolean,uuid,boolean,boolean,character varying,integer,character varying,uuid,uuid,uuid,character varying,character varying,character varying,uuid,uuid,boolean) line 83 at PERFORM; nested exception is org.postgresql.util.PSQLException: ERROR: insert or update on table "vm_host_pinning_map" violates foreign key constraint "vm_host_pinning_map_vds_id_fkey" Detail: Key (vds_id)=(2ee05b38-566d-487c-a995-4a4cfac1ff20) is not present in table "vds_static". Where: SQL statement "INSERT INTO vm_host_pinning_map ( vm_id, vds_id ) SELECT v_vm_guid, vds_id FROM fnSplitterUuid(v_dedicated_vm_for_vds) AS vds_id" PL/pgSQL function insertdedicatedhoststovm(uuid,text) line 3 at SQL statement SQL statement "SELECT InsertDedicatedHostsToVm(v_vm_guid, v_dedicated_vm_for_vds)" PL/pgSQL function updatededicatedhoststovm(uuid,text) line 7 at PERFORM SQL statement "SELECT UpdateDedicatedHostsToVm( v_vm_guid, v_dedicated_vm_for_vds)" PL/pgSQL function updatevmstatic(character varying,text,integer,integer,integer,integer,uuid,uuid,character varying,uuid,timestamp with time zone,integer,boolean,boolean,boolean,integer,integer,integer,integer,character varying,boolean,boolean,boolean,boolean,character varying,text,integer,integer,integer,integer,integer,integer,character varying,integer,character varying,character varying,character varying,integer,character varying,character varying,integer,uuid,character varying,boolean,boolean,character varying,boolean,uuid,uuid,uuid,uuid,character varying,integer,integer,smallint,character varying,boolean,character varying,boolean,boolean,uuid,boolean,boolean,character varying,integer,character varying,uuid,uuid,uuid,character varying,character varying,character varying,uuid,uuid,boolean) line 83 at PERFORM at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:247) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.core.JdbcTemplate.translateException(JdbcTemplate.java:1402) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1065) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.core.JdbcTemplate.call(JdbcTemplate.java:1104) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.core.simple.AbstractJdbcCall.executeCallInternal(AbstractJdbcCall.java:414) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.core.simple.AbstractJdbcCall.doExecute(AbstractJdbcCall.java:374) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:198) [spring-jdbc.jar:5.0.4.RELEASE] at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:135) [dal.jar:] at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:130) [dal.jar:] at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeModification(SimpleJdbcCallsHandler.java:76) [dal.jar:] at org.ovirt.engine.core.dao.DefaultGenericDao.update(DefaultGenericDao.java:102) [dal.jar:] at org.ovirt.engine.core.dao.DefaultGenericDao.update(DefaultGenericDao.java:98) [dal.jar:] at org.ovirt.engine.core.bll.IconLoader.updateVmStaticTable(IconLoader.java:90) [bll.jar:] at org.ovirt.engine.core.bll.IconLoader.init(IconLoader.java:74) [bll.jar:] ... 75 more Caused by: org.postgresql.util.PSQLException: ERROR: insert or update on table "vm_host_pinning_map" violates foreign key constraint "vm_host_pinning_map_vds_id_fkey" Detail: Key (vds_id)=(2ee05b38-566d-487c-a995-4a4cfac1ff20) is not present in table "vds_static". Where: SQL statement "INSERT INTO vm_host_pinning_map ( vm_id, vds_id ) SELECT v_vm_guid, vds_id FROM fnSplitterUuid(v_dedicated_vm_for_vds) AS vds_id" PL/pgSQL function insertdedicatedhoststovm(uuid,text) line 3 at SQL statement SQL statement "SELECT InsertDedicatedHostsToVm(v_vm_guid, v_dedicated_vm_for_vds)" PL/pgSQL function updatededicatedhoststovm(uuid,text) line 7 at PERFORM SQL statement "SELECT UpdateDedicatedHostsToVm( v_vm_guid, v_dedicated_vm_for_vds)" PL/pgSQL function updatevmstatic(character varying,text,integer,integer,integer,integer,uuid,uuid,character varying,uuid,timestamp with time zone,integer,boolean,boolean,boolean,integer,integer,integer,integer,character varying,boolean,boolean,boolean,boolean,character varying,text,integer,integer,integer,integer,integer,integer,character varying,integer,character varying,character varying,character varying,integer,character varying,character varying,integer,uuid,character varying,boolean,boolean,character varying,boolean,uuid,uuid,uuid,uuid,character varying,integer,integer,smallint,character varying,boolean,character varying,boolean,boolean,uuid,boolean,boolean,character varying,integer,character varying,uuid,uuid,uuid,character varying,character varying,character varying,uuid,uuid,boolean) line 83 at PERFORM at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2433) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2178) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:306) at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441) at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365) at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155) at org.postgresql.jdbc.PgCallableStatement.executeWithFlags(PgCallableStatement.java:78) at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:144) at org.jboss.jca.adapters.jdbc.CachedPreparedStatement.execute(CachedPreparedStatement.java:303) at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.execute(WrappedPreparedStatement.java:442) at org.springframework.jdbc.core.JdbcTemplate.lambda$call$4(JdbcTemplate.java:1105) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1050) [spring-jdbc.jar:5.0.4.RELEASE] ... 86 more I understand the engine would eventually clean dedicated_vm_for_vds, but here it failed to start so it cannot do it. I have no idea where you want to fix this, assigning to Virt team first... Version-Release number of selected component (if applicable): rhvm-4.3.7.2-0.1.el7.noarch (customer) rhvm-4.4.2.6-0.2.el8ev.noarch (reproduced) How reproducible: * Partially, I'm not sure how the VM with OS 24 (RHEL7-64) has no icons set, but this can happen with other OS that has no icons too, like rhel6 ppc. Steps to Reproduce: 1. Create VM, pin it to a host 2. Remove its icons: UPDATE vm_static SET small_icon_id=NULL,large_icon_id=NULL WHERE vm_guid='833cea8e-5910-42ed-8e57-13bce68950bd'; 3. Run deletevds like --restore-from-file does on the host that the VM is pinned to SELECT deletevds('eaaeb388-16fc-442a-8997-65a7bf1e7c35'); 4. restart ovirt-engine Actual results: * Fail to start ovirt-engine when restored DB contains VM pinned to deleted host and VM has no icons * Leftover entry in vm_static.dedicated_vm_for_vds when deletevds() runs. Expected results: * Engine starts * No left behind UUID in dedicated_vm_for_vds when host is deleted
I haven't found so far a good way to workaround this. The hook enginevm_before_engine_setup is too early (no DB up) and enginevm_after_engine_setup is too late (engine already starting) Any better ideas than the below? * manual intervention via SSH to the VM * editing the backup I was thinking of enginevm_after_engine_setup to do the DB edit and then also restart ovirt-engine after it, but did not have time to test it today.
IIUC it’s removing the host since it’s replaced by the one you’re running restore on. The code does some crude cleanup at db level for that, and it just doesn’t consider this case. It shouldn’t be too hard to add code changing the vds id for pinned vms. I mean...besides the reported issue we’re basically dropping all pinnings to this host, that’s wrong. The whole practice to remove vds feels wrong, but it may be too intrusive to change, dunno.
(In reply to Germano Veit Michel from comment #1) > Any better ideas than the below? > * manual intervention via SSH to the VM > * editing the backup > > I was thinking of enginevm_after_engine_setup to do the DB edit and then > also restart ovirt-engine after it, but did not have time to test it today. maybe it can work without using a hook - just by letting the engine start (and fail), connecting to the HE VM to change the database and restart ovirt-engine
(In reply to Arik from comment #3) > maybe it can work without using a hook - just by letting the engine start > (and fail), connecting to the HE VM to change the database and restart > ovirt-engine Right, that is what I proposed to the customer. The problem is one needs to keep paying attention to the deployment to do it at the right step. Anyway I think its a very corner case, so a more elegant solution is probably not needed. The issue seems to not be new at all, and we have never seen it before AFAICT. Thanks
(In reply to Michal Skrivanek from comment #2) > It shouldn’t be too hard to add code changing the vds id for pinned vms. yeah, but that's just a consequence of improper modeling in the database I think. we can fix it like that for this particular case, however, if it's not too complex I'd prefer to generate vm_static.dedicated_vm_for_vds from vm_host_pinning_map rather than storing the data in both places.
And this is how we get to VMs without icons set in vm_static: https://bugzilla.redhat.com/show_bug.cgi?id=1897422
For anyone that would read it later on and wonder why "select vm_guid,vm_name,dedicated_vm_for_vds,small_icon_id,large_icon_id,os from vm_static" fails to find dedicated_vm_for_vds, from now on you need to use vm_static_view instead of vm_static to get the dedicated_vm_for_vds field
Unable to reconstruct on latest ovirt-engine-setup-4.4.5.10-0.1.el8ev.noarch. I did not found any OS type without an icon. Backup and restore works fine, even with pinned to host VM. ovirt-hosted-engine-setup-2.4.9-4.el8ev.noarch ovirt-hosted-engine-ha-2.4.6-1.el8ev.noarch Linux 4.18.0-240.21.1.el8_3.x86_64 #1 SMP Wed Mar 17 11:34:58 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux Red Hat Enterprise Linux release 8.3 (Ootpa)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) 4.4.z [ovirt-4.4.5] security, bug fix, enhancement), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:1169
Due to QE capacity, we are not going to cover this issue in our automation