Description of problem:
hosted-engine deploy isn't able to restore a hosted engine (from hyperconverged glusterfs to a different storage)

Version-Release number of selected component (if applicable):
ovirt-ansible-hosted-engine-setup-1.0.13-1.el7.noarch
ovirt-hosted-engine-setup-2.3.6-1.el7.noarch

How reproducible:
always

Setup: 4 hosts lvh1..lvh4, where the HE gluster storage runs on lvh2..lvh4. HE runs on lvh4.

Steps to Reproduce:
1. Put lvh3 into maintenance (without stopping glusterfs on this host)
2. Do an engine-backup
3. Put HE into global maintenance mode
4. Power down the HE VM from within the engine
5. On lvh3, run hosted-engine --deploy --restore-from-file=ovirt-engine-backup.tar.gz --config-append=answers.conf

Actual results:
[ INFO ] TASK [ovirt.hosted_engine_setup : Wait until OVF update finishes]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_storage_domains": [{"available": 6570226221056, "backup": false, "block_size": 512, "comment": "", "committed": 57982058496, "critical_space_action_blocker": 5, "data_centers": [{"href": "/ovirt-engine/api/datacenters/57ff41c2-00f4-00e2-02e7-0000000000f7", "id": "57ff41c2-00f4-00e2-02e7-0000000000f7"}], "description": "", "discard_after_delete": false, "disk_profiles": [{"id": "08bd03d0-3f08-49ef-9633-1bbef7b34be9", "name": "hosted_storage"}], "disk_snapshots": [], "disks": [{"id": "d9669327-15ca-4e75-8a52-f504e35240db", "image_id": "f29c7adc-8cba-40f6-9969-0ebd5142d89c", "name": "he_metadata"}, {"id": "1b499bd2-5c50-4906-9e5c-0d46c90c03bc", "image_id": "efb9d6fa-4b38-4245-b2af-411a639fe263", "name": "HostedEngineConfigurationImage"}, {"id": "0c6a2d8a-90dc-4ce7-ae32-a11fddcbc8d7", "image_id": "e5d6e871-e2a1-472a-bcda-e16f1ee8cb02", "name": "he_sanlock"}, {"id": "62572233-3e53-4aac-8733-0303983a208e", "image_id": "6d42f0fb-8c65-4d61-a279-e2d9bbff37c1", "name": "he_virtio_disk"}], "external_status": "ok", "href": "/ovirt-engine/api/storagedomains/d315ea71-d0f7-4638-a5cc-75f1cadc6b27", "id":
"d315ea71-d0f7-4638-a5cc-75f1cadc6b27", "master": false, "name": "hosted_storage", "permissions": [{"id": "4d2d6706-079c-44e2-94f9-10bffeacf506"}, {"id": "57ff41cd-0037-0079-010a-000000000083"}, {"id": "57ff41f8-035e-026a-0360-000000000119"}, {"id": "6b0ed12a-6a78-4c91-aaf2-5a41766d7ef5"}, {"id": "7c841084-dd29-42c4-b8ac-002bc16d5140"}, {"id": "95cba4f2-29fc-11e9-9010-00163e2e0d81"}, {"id": "db55898a-337e-43b6-9e3e-75a93c8a97b2"}, {"id": "de173d32-14d1-40ca-8067-2a1e7700e647"}, {"id": "fddcd04c-5848-4924-bdfa-60f75cbc8421"}], "storage": {"address": "nas.fqdn", "mount_options": "", "nfs_version": "auto", "path": "/volume1/engine", "type": "nfs"}, "storage_connections": [{"id": "47f830cf-9f17-4266-a6a8-d3acb70ea15a"}], "storage_format": "v4", "supports_discard": false, "supports_discard_zeroes_data": false, "templates": [], "type": "data", "used": 3516504473600, "vms": [{"id": "8e5b6556-09b7-4d73-8f23-5674780a8718", "name": "HostedEngine"}], "warning_low_space_indicator": 10, "wipe_after_delete": false}]}, "attempts": 12, "changed": false} [ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook Within temporary vm this probably correlates with engine.log entries: 2019-03-29 11:36:41,437+01 INFO [org.ovirt.engine.core.bll.storage.ovfstore.OvfDataUpdater] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [] Attempting to update VMs/Templates Ovf. 
2019-03-29 11:36:41,439+01 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [6dd0f9] Before acquiring and wait lock 'EngineLock:{exclusiveLocks='[57ff41c2-00f4-00e2-02e7-0000000000f7=OVF_UPDATE]', sharedLocks=''}'
2019-03-29 11:36:41,439+01 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [6dd0f9] Lock-wait acquired to object 'EngineLock:{exclusiveLocks='[57ff41c2-00f4-00e2-02e7-0000000000f7=OVF_UPDATE]', sharedLocks=''}'
2019-03-29 11:36:41,440+01 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [6dd0f9] Running command: ProcessOvfUpdateForStoragePoolCommand internal: true. Entities affected : ID: 57ff41c2-00f4-00e2-02e7-0000000000f7 Type: StoragePool
2019-03-29 11:36:41,495+01 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [6dd0f9] Attempting to update VM OVFs in Data Center 'gematik'
2019-03-29 11:36:41,906+01 WARN [org.ovirt.engine.core.vdsbroker.builder.vminfo.VmInfoBuildUtils] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [6dd0f9] No host NUMA nodes found for vm HostedEngine (8e5b6556-09b7-4d73-8f23-5674780a8718)
2019-03-29 11:36:42,577+01 ERROR [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [6dd0f9] Command 'org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand' failed: Index: 0, Size: 0
2019-03-29 11:36:42,577+01 ERROR [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [6dd0f9] Exception: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at
java.util.ArrayList.rangeCheck(ArrayList.java:657) [rt.jar:1.8.0_201]
at java.util.ArrayList.get(ArrayList.java:433) [rt.jar:1.8.0_201]
at org.ovirt.engine.core.utils.ovf.OvfVmWriter.writeSnapshotsSection(OvfVmWriter.java:135) [utils.jar:]
at org.ovirt.engine.core.utils.ovf.OvfVmWriter.writeHardware(OvfVmWriter.java:108) [utils.jar:]
at org.ovirt.engine.core.utils.ovf.OvfWriter.buildVirtualSystem(OvfWriter.java:190) [utils.jar:]
at org.ovirt.engine.core.utils.ovf.IOvfBuilder.build(IOvfBuilder.java:34) [utils.jar:]
at org.ovirt.engine.core.utils.ovf.OvfManager.exportVm(OvfManager.java:76) [bll.jar:]
at org.ovirt.engine.core.bll.storage.ovfstore.OvfUpdateProcessHelper.generateVmMetadata(OvfUpdateProcessHelper.java:155) [bll.jar:]
at org.ovirt.engine.core.bll.storage.ovfstore.OvfUpdateProcessHelper.buildMetadataDictionaryForVm(OvfUpdateProcessHelper.java:80) [bll.jar:]
at org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand.populateVmsMetadataForOvfUpdate(ProcessOvfUpdateForStoragePoolCommand.java:364) [bll.jar:]
at org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand.updateOvfForVmsOfStoragePool(ProcessOvfUpdateForStoragePoolCommand.java:187) [bll.jar:]
at org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand.executeCommand(ProcessOvfUpdateForStoragePoolCommand.java:121) [bll.jar:]
at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1157) [bll.jar:]
at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1315) [bll.jar:]
at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1964) [bll.jar:]
at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164) [utils.jar:]
at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103) [utils.jar:]
at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1375) [bll.jar:]
at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:419) [bll.jar:]
at org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13) [bll.jar:]
at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:450) [bll.jar:]
at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:432) [bll.jar:]
at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:377) [bll.jar:]
at sun.reflect.GeneratedMethodAccessor296.invoke(Unknown Source) [:1.8.0_201]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_201]
at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_201]
at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:509)
at org.jboss.as.weld.interceptors.Jsr299BindingsInterceptor.delegateInterception(Jsr299BindingsInterceptor.java:79)
at org.jboss.as.weld.interceptors.Jsr299BindingsInterceptor.doMethodInterception(Jsr299BindingsInterceptor.java:89)
at org.jboss.as.weld.interceptors.Jsr299BindingsInterceptor.processInvocation(Jsr299BindingsInterceptor.java:102)
at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:63)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.as.ejb3.component.invocationmetrics.ExecutionTimeInterceptor.processInvocation(ExecutionTimeInterceptor.java:43) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.as.ejb3.component.invocationmetrics.ExecutionTimeInterceptor.processInvocation(ExecutionTimeInterceptor.java:43) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.as.ee.concurrent.ConcurrentContextInterceptor.processInvocation(ConcurrentContextInterceptor.java:45) [wildfly-ee-15.0.1.Final.jar:15.0.1.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:40)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:53)
at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:52)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.as.ejb3.component.singleton.SingletonComponentInstanceAssociationInterceptor.processInvocation(SingletonComponentInstanceAssociationInterceptor.java:53) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:216) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:418) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:148) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:509)
at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:81) [weld-ejb-3.0.5.Final.jar:3.0.5.Final]
at org.jboss.as.weld.ejb.EjbRequestScopeActivationInterceptor.processInvocation(EjbRequestScopeActivationInterceptor.java:89)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.as.ejb3.component.invocationmetrics.WaitTimeInterceptor.processInvocation(WaitTimeInterceptor.java:47) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.as.ejb3.security.SecurityContextInterceptor.processInvocation(SecurityContextInterceptor.java:100) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.as.ejb3.deployment.processors.StartupAwaitInterceptor.processInvocation(StartupAwaitInterceptor.java:22) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:67) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.invocation.ContextClassLoaderInterceptor.processInvocation(ContextClassLoaderInterceptor.java:60)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.invocation.InterceptorContext.run(InterceptorContext.java:438)
at org.wildfly.security.manager.WildFlySecurityManager.doChecked(WildFlySecurityManager.java:618)
at org.jboss.invocation.AccessCheckingInterceptor.processInvocation(AccessCheckingInterceptor.java:57)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:53)
at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:198)
at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:185)
at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:81)
at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view4.runInternalAction(Unknown Source) [bll.jar:]
at sun.reflect.GeneratedMethodAccessor294.invoke(Unknown Source) [:1.8.0_201]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_201]
at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_201]
at org.jboss.weld.util.reflection.Reflections.invokeAndUnwrap(Reflections.java:410) [weld-core-impl-3.0.5.Final.jar:3.0.5.Final]
at org.jboss.weld.module.ejb.EnterpriseBeanProxyMethodHandler.invoke(EnterpriseBeanProxyMethodHandler.java:134) [weld-ejb-3.0.5.Final.jar:3.0.5.Final]
at org.jboss.weld.bean.proxy.EnterpriseTargetBeanInstance.invoke(EnterpriseTargetBeanInstance.java:56) [weld-core-impl-3.0.5.Final.jar:3.0.5.Final]
at org.jboss.weld.module.ejb.InjectionPointPropagatingEnterpriseTargetBeanInstance.invoke(InjectionPointPropagatingEnterpriseTargetBeanInstance.java:68) [weld-ejb-3.0.5.Final.jar:3.0.5.Final]
at org.jboss.weld.bean.proxy.ProxyMethodHandler.invoke(ProxyMethodHandler.java:106) [weld-core-impl-3.0.5.Final.jar:3.0.5.Final]
at org.ovirt.engine.core.bll.BackendCommandObjectsHandler$BackendInternal$BackendLocal$2049259618$Proxy$_$$_Weld$EnterpriseProxy$.runInternalAction(Unknown Source) [bll.jar:]
at
org.ovirt.engine.core.bll.storage.ovfstore.OvfDataUpdater.performOvfUpdateForStoragePool(OvfDataUpdater.java:65) [bll.jar:]
at org.ovirt.engine.core.bll.storage.ovfstore.OvfDataUpdater.updateOvfData(OvfDataUpdater.java:85) [bll.jar:]
at org.ovirt.engine.core.bll.storage.ovfstore.OvfDataUpdater.ovfUpdate(OvfDataUpdater.java:72) [bll.jar:]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_201]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_201]
at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent-1.0.jar:]
at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent-1.0.jar:]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_201]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_201]
at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_201]
at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)
2019-03-29 11:36:42,581+01 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [6dd0f9] Lock freed to object 'EngineLock:{exclusiveLocks='[57ff41c2-00f4-00e2-02e7-0000000000f7=OVF_UPDATE]', sharedLocks=''}'
2019-03-29 11:36:42,581+01 ERROR [org.ovirt.engine.core.bll.storage.ovfstore.OvfDataUpdater] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [6dd0f9] Exception while trying to update or remove VMs/Templates ovf in Data Center
'gematik'.
2019-03-29 11:36:42,581+01 ERROR [org.ovirt.engine.core.bll.storage.ovfstore.OvfDataUpdater] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [6dd0f9] Data Center '{}' domains list for OVF update returned as NULL

Expected results:
hosted-engine-setup finishes successfully

Additional info:
It comes from here:
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/utils/src/main/java/org/ovirt/engine/core/utils/ovf/OvfVmWriter.java#L134
after the restore of a backup. I'm not sure if this is just the result of a VM with an inconsistent snapshot in the backup, or something like that.
I'm not sure if any of my VMs causes this, as I purposely don't have any additional VMs besides HostedEngine on the HE storage.

- name: Wait until OVF update finishes
  ovirt_storage_domain_facts:
    auth: "{{ ovirt_auth }}"
    fetch_nested: true
    nested_attributes:
      - name
      - image_id
      - id
    pattern: "name={{ he_storage_domain_name }}"
  retries: 12
  delay: 10
  until: "ovirt_storage_domains[0].disks | selectattr('name', 'match', '^OVF_STORE$') | list"

As a workaround, I just allowed the task to fail by adding an "ignore_errors: yes", so I could finish my recovery.

Let me know if there are any API calls or SQL scripts to run to get to the bottom of this.
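For anyone hitting the same wall, the workaround amounts to patching that task in the installed ovirt.hosted_engine_setup role so the play continues even when no OVF_STORE disk ever appears. A sketch of the patched task (only the ignore_errors line is new; note this merely masks the OVF update failure rather than fixing it):

```yaml
- name: Wait until OVF update finishes
  ovirt_storage_domain_facts:
    auth: "{{ ovirt_auth }}"
    fetch_nested: true
    nested_attributes:
      - name
      - image_id
      - id
    pattern: "name={{ he_storage_domain_name }}"
  retries: 12
  delay: 10
  until: "ovirt_storage_domains[0].disks | selectattr('name', 'match', '^OVF_STORE$') | list"
  # workaround: let the deploy continue even if the OVF_STORE disks never show up
  ignore_errors: yes
```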
(In reply to André Liebe from comment #2)
> I'm not sure if any of my vms causes this as I purposely don't have any
> additional vms beside HostedEngine on HE-storage

I fear it's due to another VM with a non-existing/inconsistent/... snapshot in the same datacenter (not on the same storage domain).
Seems so. I remember getting OVF update errors for a while when trying to put domains into maintenance, but I've got no clue where this comes from, as I hit many bugs in the past, like bug 1670774 or numerous hyperconverged glusterfs problems.

Is there a sanity-check tool for the database available?
What's the version of ovirt-engine?
That's ovirt-engine-4.3.2.1-1.el7.noarch
Shmuel, thoughts? This doesn't look like the other OVF issues resolved last week
If you need anything more (like a database snapshot, backup file, ...), you'd better tell me now, as I'll be out of office for three weeks starting Friday afternoon (MEST).
If you can reproduce and attach a full log (or grab the engine log for the day this occurred), that would be great
Created attachment 1554236 [details]
engine logs

I just did a reinstall today (just ignore the error messages from the missing old HA storage).

Is there a way to restrict attachment access to employees, so the logs are not public?
(In reply to André Liebe from comment #10)
> Is there a way to restrict attachments access to employees, so logs are non
> public?

Yes, I just uploaded a copy of it as a private attachment. You can simply delete the file you uploaded.
(In reply to Ryan Barry from comment #7)
> Shmuel, thoughts? This doesn't look like the other OVF issues resolved last
> week

The NPE occurs because there is a snapshot with memory and the memory volume does not have a storage ID assigned. This should never happen in a normal situation, as far as I understand.

But I'm not sure it is related to the backup/restore of the hosted engine. Is that broken memory snapshot on the hosted-engine VM? Does engine-backup finish without errors?
I remember seeing this NPE in the logs for a while. I'm not sure when it first appeared, probably sometime around the 4.1->4.2 upgrade. As I had severe problems with gluster, NFS (Synology/BtrFS) and sanlock, I ignored this error in the logs.

> Does engine-backup finishes without errors?

The logs are fine, no errors logged.
Any news on this issue?

I updated my environment yesterday to 4.3.3.7-1.el7, but I still get this NPE when selecting my master storage domain and clicking "Update OVFs".

Additionally, my hosted-engine hosts have started to log this (which I assume to be a side effect of this):

ovirt-ha-agent[28894]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
No reproducer yet.

Do you have a memory snapshot of the HE VM?
No.
But there may be snapshots on other VMs, as I migrated a lot of VMs from (already deleted) gluster and NFS volumes. Is there an SQL statement I could use to check?
(In reply to Shmuel Melamud from comment #13)
> (In reply to Ryan Barry from comment #7)
> > Shmuel, thoughts? This doesn't look like the other OVF issues resolved last
> > week
>
> The NPE occurs because there is a snapshot with memory and the memory volume
> does not have a storage ID assigned. This should never happen in normal
> situation, as far as I understand.
>
> But I'm not sure it is related to the backup/restore of the hosted-engine.
> Is that broken memory snapshot of the hosted-engine VM? Does engine-backup
> finishes without errors?

How do I identify the invalid snapshots (and purge them)?
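I don't know of a ready-made sanity-check tool, but based on the description above (a snapshot with memory whose memory volume has no storage ID), a query along these lines against the engine database might surface candidates. Caveat: the table and column names used here (snapshots.memory_dump_disk_id / memory_metadata_disk_id, images, image_storage_domain_map, vm_static) are my reading of the 4.2/4.3 schema, so please verify them against your own database first, and take a backup before purging anything it finds:

```sql
-- Run inside the engine database, e.g.: su - postgres -c "psql engine"
-- List snapshots that carry a memory dump/metadata disk whose image
-- has no storage domain mapping (i.e. no storage ID assigned).
SELECT s.snapshot_id,
       s.description,
       vs.vm_name,
       s.memory_dump_disk_id,
       s.memory_metadata_disk_id
FROM snapshots s
JOIN vm_static vs ON vs.vm_guid = s.vm_id
JOIN images i ON i.image_group_id IN (s.memory_dump_disk_id,
                                      s.memory_metadata_disk_id)
LEFT JOIN image_storage_domain_map m ON m.image_id = i.image_guid
WHERE m.storage_domain_id IS NULL;
```

Snapshots without memory disks are filtered out by the inner join, so anything this returns is a memory snapshot whose volume the OVF writer could stumble over.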
Okay, fixed it myself by attaching remotely to the engine JVM and stepping through each VM until the NPE happened. Remembering the VM's name, I purged its snapshots from the web GUI.

It would be nice to add a log.debug statement for each VM being handled. On the other hand, I would expect the engine to gracefully handle a missing disk by throwing a useful warning/error/alert in the web GUI.
Created attachment 1584289 [details]
engine log while deleting a snapshot (with missing img-file)
This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you deeply care about it or want to work on it, please assign/target accordingly.
OK, closing. Please reopen if it's still relevant or you want to work on it.