Created attachment 1553484 [details]
Logs

Description of problem:
After restoring the connection between the gluster storage domain and the host with iptables, the VM does not switch back to Up status. On the first attempt the VM moves to Paused status; on the second attempt it moves to NotResponding status.

Version-Release number of selected component (if applicable):
vdsm-4.30.12-1.el7ev.x86_64
ovirt-engine-4.3.3.2-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with 2 gluster disks, install an OS, and write to one of the disks.
2. Run LSM (live storage migration) of the disks from the gluster domain to an iscsi domain (a sketch of the REST API call appears below).
3. Block the connection between the host running the VM and the gluster storage with iptables OUTPUT DROP rules for the three gluster nodes (10.35.83.240, .241, .242; see the block/restore sketch below) - the host moves to Non-Operational.
4. Restore the connection to the gluster storage - the host moves to Up again.

Actual results:
In each attempt to reproduce bug 1566471, the VM moves to Paused or NotResponding status.

1. Paused, ran on host_mixed_3. From the engine log:

2019-04-07 18:01:41,724+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-87) [] EVENT_ID: VM_PAUSED(1,025), VM shir_vm_1 has been paused.
2019-04-07 18:01:41,742+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-87) [] EVENT_ID: VM_PAUSED_ERROR(139), VM shir_vm_1 has been paused due to unknown storage error.

2. NotResponding, ran on host_mixed_2. From the engine log:

2019-04-07 18:51:44,894+03 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engine-Thread-5159) [4d019771] Failed to migrate VM 'shir_vm_2'
2019-04-07 18:52:22,058+03 INFO [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [101faad5] Lock Acquired to object 'EngineLock:{exclusiveLocks='[ebe30960-efcd-4b81-8c6f-262b311893bb=PROVIDER]', sharedLocks=''}'
2019-04-07 18:52:22,079+03 INFO [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [101faad5] Running command: SyncNetworkProviderCommand internal: true.
2019-04-07 18:52:22,257+03 INFO [org.ovirt.engine.core.sso.utils.AuthenticationUtils] (default task-50) [] User admin@internal successfully logged in with scopes: ovirt-app-api ovirt-ext=token-info:authz-search ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate ovirt-ext=token:password-access
2019-04-07 18:52:22,507+03 INFO [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [101faad5] Lock freed to object 'EngineLock:{exclusiveLocks='[ebe30960-efcd-4b81-8c6f-262b311893bb=PROVIDER]', sharedLocks=''}'
2019-04-07 18:52:45,212+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-12) [] VM 'd5bf5866-2f72-4f0e-9997-60dc13d0004b'(shir_vm_2) moved from 'Paused' --> 'Up'
2019-04-07 18:52:45,346+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-12) [] EVENT_ID: VM_RECOVERED_FROM_PAUSE_ERROR(196), VM shir_vm_2 has recovered from paused back to up.
2019-04-07 18:52:50,225+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-21) [] VM 'd5bf5866-2f72-4f0e-9997-60dc13d0004b'(shir_vm_2) moved from 'Up' --> 'NotResponding'
2019-04-07 18:52:50,243+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-21) [] EVENT_ID: VM_NOT_RESPONDING(126), VM shir_vm_2 is not responding.
2019-04-07 18:54:35,730+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-38) [] EVENT_ID: VM_NOT_RESPONDING(126), VM shir_vm_2 is not responding.

Expected results:
The VM should switch back to Up status.

Additional info:
Could not finish verifying bug 1566471, since the VM does not reach Up status, which is required in order to run LSM a second time.
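For step 2, a minimal sketch of triggering the disk move through the REST API. The engine host name, credentials, and ids below are placeholders, and the /disks/{id}/move action is assumed to behave as in the v4 API, where moving a disk attached to a running VM starts an LSM:

    # Placeholder ids -- substitute real values from the environment.
    DISK_ID=<disk-uuid>
    SD_ID=<target-iscsi-domain-uuid>

    # POST the move action against the disk; for a disk attached to a
    # running VM this starts a live storage migration.
    curl -s -k -u admin@internal:password \
         -H 'Content-Type: application/xml' \
         -d "<action><storage_domain id=\"${SD_ID}\"/></action>" \
         "https://engine.example.com/ovirt-engine/api/disks/${DISK_ID}/move"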
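For steps 3-4, note that iptables needs each destination spelled out as a full address. A minimal sketch of the block/restore commands as run on the host, using the three gluster node addresses from the report:

    # Block outbound traffic from the host to each gluster node
    # (the host moves to Non-Operational once storage monitoring fails).
    for node in 10.35.83.240 10.35.83.241 10.35.83.242; do
        iptables -A OUTPUT -d "$node" -j DROP
    done

    # Restore connectivity by deleting the same rules
    # (the host should move back to Up).
    for node in 10.35.83.240 10.35.83.241 10.35.83.242; do
        iptables -D OUTPUT -d "$node" -j DROP
    done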
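The pause/recovery transitions quoted under actual results can be pulled from the engine log directly; a minimal sketch, assuming the default log location on the engine machine:

    # List the relevant audit events in chronological order.
    grep -E 'VM_PAUSED|VM_RECOVERED_FROM_PAUSE_ERROR|VM_NOT_RESPONDING' \
        /var/log/ovirt-engine/engine.log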
Can this be reproduced without running LSM?
Created attachment 1620040 [details]
Engine logs for the full run
Created attachment 1678431 [details]
new_logs
This bug/RFE is more than 2 years old, has not received enough attention so far, and is now flagged as pending close. Please review whether it is still relevant, and provide additional details/justification/patches if you believe it should get more attention for the next oVirt release.
This bug has not received any attention in a long time and is not planned for the foreseeable future; the oVirt development team has no plans to work on it. Please feel free to reopen it if you have a plan for how to contribute this feature/bug fix.