Bug 844291 - ovirt-engine-backend [scalability]: Engine sets hosts to NonOperational based on cached information from a Storage-domain which is no longer problematic.
Summary: ovirt-engine-backend [scalability]: Engine sets hosts to NonOperational based...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: x86_64
OS: Linux
high
urgent
Target Milestone: ---
: ---
Assignee: mkublin
QA Contact: Omri Hochman
URL:
Whiteboard: infra
: 729262 (view as bug list)
Depends On: 844656
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-07-30 08:49 UTC by Omri Hochman
Modified: 2016-02-10 19:32 UTC (History)
13 users (show)

Fixed In Version: si19
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
engine.log (669.74 KB, application/octet-stream)
2012-07-30 08:49 UTC, Omri Hochman
no flags Details

Description Omri Hochman 2012-07-30 08:49:14 UTC
Created attachment 601153 [details]
engine.log

ovirt-engine-backend [scalability]: Engine sets hosts to NonOperational based on cached information from a Storage-domain which is no longer problematic.

Description:
*************
I have Setup environment with 6 hosts and 50 iSCSI SD's, one of the hosts is SPM and constantly in 'Up' status, the other 5 hosts are switching between 'Up' and 'Non-Operational' status.  According the engine.logs the hosts have a problem with one of the SDs and there for engine switch the hosts to Non-operational, 
but few minutes later the auto-recovery checks if there's a connection to All Storage-domains and deciding to initiate the hosts back to 'Up'.

Investigation showed: 
**********************
rhevm-engine writes to cache 'problematic Storage-Domains'. engine do not clear the cached list when connection to storage domain resume, there for Engine sets hosts to NonOperational and auto-recovery resume them.

Results: 
********
(5 out of 6) hosts - kept switching between 'Up' to 'non-operational'.
  
Engine.log 
***********
2012-07-30 11:25:52,405 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-29) [cfe82d] vds puma05 reported domain b7491d9b-a0fb-4
323-b787-b782e4695872:3600601601282300032f94e5b78cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational
2012-07-30 11:25:55,334 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-13) [7140b355] starting ProcessDomainRecovery for domai
n d541c3a6-961f-49b3-a6c2-79e8b537f0c8
2012-07-30 11:25:55,363 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-13) [7140b355] vds puma06 reported domain d541c3a6-961f
-49b3-a6c2-79e8b537f0c8:360060160128230006664545578cbe111 as in problem, moving the vds to status NonOperational
2012-07-30 11:25:55,371 WARN  [org.ovirt.engine.core.dal.job.ExecutionMessageDirector] (QuartzScheduler_Worker-13) [7b17c6cf] The message key SetNonOperationalVds is miss
ing from bundles/ExecutionMessages
2012-07-30 11:25:55,410 INFO  [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-13) [7b17c6cf] Running command: SetNonOperationalVdsCommand 
internal: true. Entities affected :  ID: 8b8a000e-d97e-11e1-aaed-97cb62be7a95 Type: VDS
2012-07-30 11:25:55,411 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-13) [7b17c6cf] START, SetVdsStatusVDSCommand(vdsId = 8b8a00
0e-d97e-11e1-aaed-97cb62be7a95, status=NonOperational, nonOperationalReason=STORAGE_DOMAIN_UNREACHABLE), log id: 2cd8cb2c
2012-07-30 11:25:55,431 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-13) [7b17c6cf] FINISH, SetVdsStatusVDSCommand, log id: 2cd8
cb2c
2012-07-30 11:26:02,744 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-72) [2313f95f] starting ProcessDomainRecovery for domai
n 4c84bb36-96fa-4b73-89e1-90f81c394483
2012-07-30 11:26:02,770 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-72) [2313f95f] vds puma09 reported domain 4c84bb36-96fa
-4b73-89e1-90f81c394483:360060160128230006e64545578cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational
2012-07-30 11:26:02,770 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-75) [560acdcf] starting ProcessDomainRecovery for domai
n fc5b0ee4-6542-4c76-bc79-7afa9d2e2c71
2012-07-30 11:26:02,792 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-75) [560acdcf] vds puma09 reported domain fc5b0ee4-6542
-4c76-bc79-7afa9d2e2c71:360060160128230003ef94e5b78cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational
2012-07-30 11:26:02,792 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-69) [18ddfad7] starting ProcessDomainRecovery for domai
n f29a9642-e5c4-474d-8a0a-6cfbc6dcba15
2012-07-30 11:26:02,819 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-69) [18ddfad7] vds puma09 reported domain f29a9642-e5c4
-474d-8a0a-6cfbc6dcba15:3600601601282300044f94e5b78cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational
2012-07-30 11:26:02,819 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-8) [b04efda] starting ProcessDomainRecovery for domain 
fde98d91-61e2-42aa-aef3-5b083b6635dd
2012-07-30 11:26:02,842 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-8) [b04efda] vds puma09 reported domain fde98d91-61e2-4
2aa-aef3-5b083b6635dd:360060160128230000db5096878cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational
2012-07-30 11:30:00,000 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-82) [4cee01f6] Checking autorecoverable hosts
2012-07-30 11:30:00,019 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-82) [4cee01f6] Autorecovering 5 hosts
2012-07-30 11:30:00,019 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-82) [4cee01f6] Autorecovering hosts: 89813a7a-d97e-11e1-a44d-1b8ec9396a72
2012-07-30 11:30:00,024 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [282f2529] Lock Acquired to object EngineLock [exclusiveLocks= key: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 value: VDS
, sharedLocks= ]
2012-07-30 11:30:00,024 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [282f2529] Running command: ActivateVdsCommand internal: true. Entities affected :  ID: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 Type: VDS
2012-07-30 11:30:00,027 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-82) [282f2529] START, SetVdsStatusVDSCommand(vdsId = 89813a7a-d97e-11e1-a44d-1b8ec9396a72, status=Unassigned, nonOperationalReason=NONE), log id: a2adbc
2012-07-30 11:30:00,038 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-82) [282f2529] FINISH, SetVdsStatusVDSCommand, log id: a2adbc
2012-07-30 11:30:00,055 INFO  [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-82) [282f2529] START, ActivateVdsVDSCommand(vdsId = 89813a7a-d97e-11e1-a44d-1b8ec9396a72), log id: 7453bee4
2012-07-30 11:30:00,230 INFO  [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-82) [4a445a46] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected :  ID: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 Type: VDS
2012-07-30 11:30:00,241 INFO  [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-82) [c3af50e] Running command: HandleVdsVersionCommand internal: true. Entities affected :  ID: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 Type: VDS
2012-07-30 11:30:00,244 INFO  [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-82) [c3af50e] FINISH, ActivateVdsVDSCommand, log id: 7453bee4
2012-07-30 11:30:00,296 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [c3af50e] Lock freed to object EngineLock [exclusiveLocks= key: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 value: VDS
, sharedLocks= ]
2012-07-30 11:30:00,298 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-82) [c3af50e] Autorecovering hosts: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388
2012-07-30 11:30:00,304 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [52359e08] Lock Acquired to object EngineLock [exclusiveLocks= key: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 value: VDS
, sharedLocks= ]
2012-07-30 11:30:00,305 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [52359e08] Running command: ActivateVdsCommand internal: true. Entities affected :  ID: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 Type: VDS
2012-07-30 11:30:00,309 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-82) [52359e08] START, SetVdsStatusVDSCommand(vdsId = 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388, status=Unassigned, nonOperationalReason=NONE), log id: 696dbb82
2012-07-30 11:30:00,330 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-82) [52359e08] FINISH, SetVdsStatusVDSCommand, log id: 696dbb82
2012-07-30 11:30:00,338 INFO  [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-82) [52359e08] START, ActivateVdsVDSCommand(vdsId = 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388), log id: 35aa667a
2012-07-30 11:30:00,505 INFO  [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-82) [642ad778] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected :  ID: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 Type: VDS
2012-07-30 11:30:00,516 INFO  [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-82) [7ac0ddab] Running command: HandleVdsVersionCommand internal: true. Entities affected :  ID: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 Type: VDS
2012-07-30 11:30:00,519 INFO  [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-82) [7ac0ddab] FINISH, ActivateVdsVDSCommand, log id: 35aa667a

Comment 2 mkublin 2012-08-02 11:57:48 UTC
http://gerrit.ovirt.org/#/c/6858/

Comment 3 mkublin 2012-08-02 11:58:44 UTC
pushed

Comment 4 Ayal Baron 2012-08-06 20:28:41 UTC
not pushed downstream, moving to post

Comment 9 Barak 2012-10-16 14:37:44 UTC
*** Bug 729262 has been marked as a duplicate of this bug. ***

Comment 10 Omri Hochman 2012-11-07 13:42:27 UTC
Unable to reproduce SI23.


Note You need to log in before you can comment on or make changes to this bug.