Bug 1697261 - The VM moves to "Paused" or "Not-responding" status after restoring the connection between the Gluster domain and the host with iptables
Summary: The VM moves to "Paused" or "Not-responding" status after restoring the connection between the Gluster domain and the host with iptables
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: future
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nobody
QA Contact: sshmulev
URL:
Whiteboard:
Depends On:
Blocks: 1566471
 
Reported: 2019-04-08 08:35 UTC by Shir Fishbain
Modified: 2022-01-18 11:35 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-29 11:33:06 UTC
oVirt Team: Storage
Embargoed:


Attachments
Logs (2.39 MB, application/zip), 2019-04-08 08:35 UTC, Shir Fishbain
engine logs for the full run (848.60 KB, text/plain), 2019-09-27 12:02 UTC, Fedor Gavrilov
new_logs (1.14 MB, application/zip), 2020-04-13 11:59 UTC, Shir Fishbain

Description Shir Fishbain 2019-04-08 08:35:35 UTC
Created attachment 1553484 [details]
Logs

Description of problem:
After restoring the connection between the Gluster domain and the host (previously blocked with iptables), the VM does not switch back to Up status.
On the first attempt, the VM moves to "Paused" status.
On the second attempt, the VM moves to "Not-responding" status.

Version-Release number of selected component (if applicable):
vdsm-4.30.12-1.el7ev.x86_64
ovirt-engine-4.3.3.2-0.1.el7.noarch

How reproducible:
100%

Steps to reproduce:
1. Create a VM with 2 Gluster disks, install an OS, and write to one of the disks
2. Run LSM (live storage migration) from the Gluster domain to the iSCSI domain
3. Block the connection between the host running the VM and the Gluster storage (iptables -A OUTPUT -d <node IP> -j DROP for each of the 3 nodes of the Gluster domain: 10.35.83.240, .241, .242; see the sketch after these steps) - the host moves to Non-Operational
4. Restore the connection to the Gluster storage - the host moves to Up again
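
The exact iptables invocations are not in the logs; a minimal sketch of how steps 3 and 4 might be done on the host, assuming the three Gluster node addresses listed above:

# Step 3: block outgoing traffic from the host to each Gluster node
for ip in 10.35.83.240 10.35.83.241 10.35.83.242; do
    iptables -A OUTPUT -d "$ip" -j DROP
done

# Step 4: restore the connection by deleting the same rules
for ip in 10.35.83.240 10.35.83.241 10.35.83.242; do
    iptables -D OUTPUT -d "$ip" -j DROP
done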

Actual results:
In each attempt to reproduce bug number 1566471, the VM moves to "Paused" or "Not-responding" status.

1. Paused, ran on host_mixed_3
From engine log:
2019-04-07 18:01:41,724+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-87) [] EVENT_ID: VM_PAUSED(1,025), VM shir_vm_1 has been paused.
2019-04-07 18:01:41,742+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-87) [] EVENT_ID: VM_PAUSED_ERROR(139), VM shir_vm_1 has been paused due to unknown storage error.

2. Not responding, ran on host_mixed_2 
From engine log:
2019-04-07 18:51:44,894+03 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engine-Thread-5159) [4d019771] Failed to migrate VM 'shir_vm_2'
2019-04-07 18:52:22,058+03 INFO  [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [101faad5] Lock Acquired to object 'EngineLock:{exclusiveLocks='[ebe30960-efcd-4b81-8c6f-262b311893bb=PROVIDER]', sharedLocks=''}'
2019-04-07 18:52:22,079+03 INFO  [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [101faad5] Running command: SyncNetworkProviderCommand internal: true.
2019-04-07 18:52:22,257+03 INFO  [org.ovirt.engine.core.sso.utils.AuthenticationUtils] (default task-50) [] User admin@internal successfully logged in with scopes: ovirt-app-api ovirt-ext=token-info:authz-search ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate ovirt-ext=token:password-access
2019-04-07 18:52:22,507+03 INFO  [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-81) [101faad5] Lock freed to object 'EngineLock:{exclusiveLocks='[ebe30960-efcd-4b81-8c6f-262b311893bb=PROVIDER]', sharedLocks=''}'
2019-04-07 18:52:45,212+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-12) [] VM 'd5bf5866-2f72-4f0e-9997-60dc13d0004b'(shir_vm_2) moved from 'Paused' --> 'Up'
2019-04-07 18:52:45,346+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-12) [] EVENT_ID: VM_RECOVERED_FROM_PAUSE_ERROR(196), VM shir_vm_2 has recovered from paused back to up.
2019-04-07 18:52:50,225+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-21) [] VM 'd5bf5866-2f72-4f0e-9997-60dc13d0004b'(shir_vm_2) moved from 'Up' --> 'NotResponding'
2019-04-07 18:52:50,243+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-21) [] EVENT_ID: VM_NOT_RESPONDING(126), VM shir_vm_2 is not responding.
2019-04-07 18:54:35,730+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-38) [] EVENT_ID: VM_NOT_RESPONDING(126), VM shir_vm_2 is not responding.
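
As a side note, while reproducing this the VM state reported by libvirt can be checked directly on the host; a minimal sketch, assuming the VM name shir_vm_2 from the log above and root access on the host:

# Read-only listing of all domains and their current states
virsh -r list --all

# Show the state of the affected VM, including the pause reason (e.g. an I/O error)
virsh -r domstate shir_vm_2 --reason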

Expected results:
The VM should switch back to Up status.

Additional info:
Couldn't finish verifying bug number 1566471, since the VM does not reach "Up" status, which is required to run LSM a second time.

Comment 1 Benny Zlotnik 2019-04-08 14:43:14 UTC
Can this be reproduced without running LSM?

Comment 8 Fedor Gavrilov 2019-09-27 12:02:28 UTC
Created attachment 1620040 [details]
engine logs for the full run

Comment 16 Shir Fishbain 2020-04-13 11:59:53 UTC
Created attachment 1678431 [details]
new_logs

Comment 19 Michal Skrivanek 2021-08-20 08:27:31 UTC
This bug/RFE is more than 2 years old, hasn't received enough attention so far, and is now flagged as pending close.
Please review whether it is still relevant and provide additional details/justification/patches if you believe it should get more attention for the next oVirt release.

Comment 20 Michal Skrivanek 2021-09-29 11:33:06 UTC
This bug hasn't received any attention in a long time, and it isn't planned for the foreseeable future. The oVirt development team has no plans to work on it.
Please feel free to reopen if you have a plan for how to contribute this feature/bug fix.

