Bug 844291 - ovirt-engine-backend [scalability]: Engine sets hosts to NonOperational based on cached information from a Storage-domain which is no longer problematic.
ovirt-engine-backend [scalability]: Engine sets hosts to NonOperational based...
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine (Show other bugs)
3.1.0
x86_64 Linux
high Severity urgent
: ---
: ---
Assigned To: mkublin
Omri Hochman
infra
:
: 729262 (view as bug list)
Depends On: 844656
Blocks:
  Show dependency treegraph
 
Reported: 2012-07-30 04:49 EDT by Omri Hochman
Modified: 2016-02-10 14:32 EST (History)
13 users (show)

See Also:
Fixed In Version: si19
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
engine.log (669.74 KB, application/octet-stream)
2012-07-30 04:49 EDT, Omri Hochman
no flags Details

  None (edit)
Description Omri Hochman 2012-07-30 04:49:14 EDT
Created attachment 601153 [details]
engine.log

ovirt-engine-backend [scalability]: Engine sets hosts to NonOperational based on cached information from a Storage-domain which is no longer problematic.

Description:
*************
I have Setup environment with 6 hosts and 50 iSCSI SD's, one of the hosts is SPM and constantly in 'Up' status, the other 5 hosts are switching between 'Up' and 'Non-Operational' status.  According the engine.logs the hosts have a problem with one of the SDs and there for engine switch the hosts to Non-operational, 
but few minutes later the auto-recovery checks if there's a connection to All Storage-domains and deciding to initiate the hosts back to 'Up'.

Investigation showed: 
**********************
rhevm-engine writes to cache 'problematic Storage-Domains'. engine do not clear the cached list when connection to storage domain resume, there for Engine sets hosts to NonOperational and auto-recovery resume them.

Results: 
********
(5 out of 6) hosts - kept switching between 'Up' to 'non-operational'.
  
Engine.log 
***********
2012-07-30 11:25:52,405 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-29) [cfe82d] vds puma05 reported domain b7491d9b-a0fb-4
323-b787-b782e4695872:3600601601282300032f94e5b78cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational
2012-07-30 11:25:55,334 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-13) [7140b355] starting ProcessDomainRecovery for domai
n d541c3a6-961f-49b3-a6c2-79e8b537f0c8
2012-07-30 11:25:55,363 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-13) [7140b355] vds puma06 reported domain d541c3a6-961f
-49b3-a6c2-79e8b537f0c8:360060160128230006664545578cbe111 as in problem, moving the vds to status NonOperational
2012-07-30 11:25:55,371 WARN  [org.ovirt.engine.core.dal.job.ExecutionMessageDirector] (QuartzScheduler_Worker-13) [7b17c6cf] The message key SetNonOperationalVds is miss
ing from bundles/ExecutionMessages
2012-07-30 11:25:55,410 INFO  [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-13) [7b17c6cf] Running command: SetNonOperationalVdsCommand 
internal: true. Entities affected :  ID: 8b8a000e-d97e-11e1-aaed-97cb62be7a95 Type: VDS
2012-07-30 11:25:55,411 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-13) [7b17c6cf] START, SetVdsStatusVDSCommand(vdsId = 8b8a00
0e-d97e-11e1-aaed-97cb62be7a95, status=NonOperational, nonOperationalReason=STORAGE_DOMAIN_UNREACHABLE), log id: 2cd8cb2c
2012-07-30 11:25:55,431 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-13) [7b17c6cf] FINISH, SetVdsStatusVDSCommand, log id: 2cd8
cb2c
2012-07-30 11:26:02,744 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-72) [2313f95f] starting ProcessDomainRecovery for domai
n 4c84bb36-96fa-4b73-89e1-90f81c394483
2012-07-30 11:26:02,770 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-72) [2313f95f] vds puma09 reported domain 4c84bb36-96fa
-4b73-89e1-90f81c394483:360060160128230006e64545578cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational
2012-07-30 11:26:02,770 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-75) [560acdcf] starting ProcessDomainRecovery for domai
n fc5b0ee4-6542-4c76-bc79-7afa9d2e2c71
2012-07-30 11:26:02,792 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-75) [560acdcf] vds puma09 reported domain fc5b0ee4-6542
-4c76-bc79-7afa9d2e2c71:360060160128230003ef94e5b78cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational
2012-07-30 11:26:02,792 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-69) [18ddfad7] starting ProcessDomainRecovery for domai
n f29a9642-e5c4-474d-8a0a-6cfbc6dcba15
2012-07-30 11:26:02,819 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-69) [18ddfad7] vds puma09 reported domain f29a9642-e5c4
-474d-8a0a-6cfbc6dcba15:3600601601282300044f94e5b78cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational
2012-07-30 11:26:02,819 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-8) [b04efda] starting ProcessDomainRecovery for domain 
fde98d91-61e2-42aa-aef3-5b083b6635dd
2012-07-30 11:26:02,842 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-8) [b04efda] vds puma09 reported domain fde98d91-61e2-4
2aa-aef3-5b083b6635dd:360060160128230000db5096878cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational
2012-07-30 11:30:00,000 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-82) [4cee01f6] Checking autorecoverable hosts
2012-07-30 11:30:00,019 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-82) [4cee01f6] Autorecovering 5 hosts
2012-07-30 11:30:00,019 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-82) [4cee01f6] Autorecovering hosts: 89813a7a-d97e-11e1-a44d-1b8ec9396a72
2012-07-30 11:30:00,024 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [282f2529] Lock Acquired to object EngineLock [exclusiveLocks= key: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 value: VDS
, sharedLocks= ]
2012-07-30 11:30:00,024 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [282f2529] Running command: ActivateVdsCommand internal: true. Entities affected :  ID: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 Type: VDS
2012-07-30 11:30:00,027 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-82) [282f2529] START, SetVdsStatusVDSCommand(vdsId = 89813a7a-d97e-11e1-a44d-1b8ec9396a72, status=Unassigned, nonOperationalReason=NONE), log id: a2adbc
2012-07-30 11:30:00,038 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-82) [282f2529] FINISH, SetVdsStatusVDSCommand, log id: a2adbc
2012-07-30 11:30:00,055 INFO  [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-82) [282f2529] START, ActivateVdsVDSCommand(vdsId = 89813a7a-d97e-11e1-a44d-1b8ec9396a72), log id: 7453bee4
2012-07-30 11:30:00,230 INFO  [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-82) [4a445a46] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected :  ID: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 Type: VDS
2012-07-30 11:30:00,241 INFO  [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-82) [c3af50e] Running command: HandleVdsVersionCommand internal: true. Entities affected :  ID: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 Type: VDS
2012-07-30 11:30:00,244 INFO  [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-82) [c3af50e] FINISH, ActivateVdsVDSCommand, log id: 7453bee4
2012-07-30 11:30:00,296 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [c3af50e] Lock freed to object EngineLock [exclusiveLocks= key: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 value: VDS
, sharedLocks= ]
2012-07-30 11:30:00,298 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-82) [c3af50e] Autorecovering hosts: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388
2012-07-30 11:30:00,304 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [52359e08] Lock Acquired to object EngineLock [exclusiveLocks= key: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 value: VDS
, sharedLocks= ]
2012-07-30 11:30:00,305 INFO  [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [52359e08] Running command: ActivateVdsCommand internal: true. Entities affected :  ID: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 Type: VDS
2012-07-30 11:30:00,309 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-82) [52359e08] START, SetVdsStatusVDSCommand(vdsId = 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388, status=Unassigned, nonOperationalReason=NONE), log id: 696dbb82
2012-07-30 11:30:00,330 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-82) [52359e08] FINISH, SetVdsStatusVDSCommand, log id: 696dbb82
2012-07-30 11:30:00,338 INFO  [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-82) [52359e08] START, ActivateVdsVDSCommand(vdsId = 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388), log id: 35aa667a
2012-07-30 11:30:00,505 INFO  [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-82) [642ad778] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected :  ID: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 Type: VDS
2012-07-30 11:30:00,516 INFO  [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-82) [7ac0ddab] Running command: HandleVdsVersionCommand internal: true. Entities affected :  ID: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 Type: VDS
2012-07-30 11:30:00,519 INFO  [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-82) [7ac0ddab] FINISH, ActivateVdsVDSCommand, log id: 35aa667a
Comment 2 mkublin 2012-08-02 07:57:48 EDT
http://gerrit.ovirt.org/#/c/6858/
Comment 3 mkublin 2012-08-02 07:58:44 EDT
pushed
Comment 4 Ayal Baron 2012-08-06 16:28:41 EDT
not pushed downstream, moving to post
Comment 9 Barak 2012-10-16 10:37:44 EDT
*** Bug 729262 has been marked as a duplicate of this bug. ***
Comment 10 Omri Hochman 2012-11-07 08:42:27 EST
Unable to reproduce SI23.

Note You need to log in before you can comment on or make changes to this bug.