Created attachment 601153 [details] engine.log ovirt-engine-backend [scalability]: Engine sets hosts to NonOperational based on cached information from a Storage-domain which is no longer problematic. Description: ************* I have Setup environment with 6 hosts and 50 iSCSI SD's, one of the hosts is SPM and constantly in 'Up' status, the other 5 hosts are switching between 'Up' and 'Non-Operational' status. According the engine.logs the hosts have a problem with one of the SDs and there for engine switch the hosts to Non-operational, but few minutes later the auto-recovery checks if there's a connection to All Storage-domains and deciding to initiate the hosts back to 'Up'. Investigation showed: ********************** rhevm-engine writes to cache 'problematic Storage-Domains'. engine do not clear the cached list when connection to storage domain resume, there for Engine sets hosts to NonOperational and auto-recovery resume them. Results: ******** (5 out of 6) hosts - kept switching between 'Up' to 'non-operational'. Engine.log *********** 2012-07-30 11:25:52,405 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-29) [cfe82d] vds puma05 reported domain b7491d9b-a0fb-4 323-b787-b782e4695872:3600601601282300032f94e5b78cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational 2012-07-30 11:25:55,334 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-13) [7140b355] starting ProcessDomainRecovery for domai n d541c3a6-961f-49b3-a6c2-79e8b537f0c8 2012-07-30 11:25:55,363 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-13) [7140b355] vds puma06 reported domain d541c3a6-961f -49b3-a6c2-79e8b537f0c8:360060160128230006664545578cbe111 as in problem, moving the vds to status NonOperational 2012-07-30 11:25:55,371 WARN [org.ovirt.engine.core.dal.job.ExecutionMessageDirector] (QuartzScheduler_Worker-13) [7b17c6cf] The message key SetNonOperationalVds is miss ing from bundles/ExecutionMessages 2012-07-30 11:25:55,410 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-13) [7b17c6cf] Running command: SetNonOperationalVdsCommand internal: true. Entities affected : ID: 8b8a000e-d97e-11e1-aaed-97cb62be7a95 Type: VDS 2012-07-30 11:25:55,411 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-13) [7b17c6cf] START, SetVdsStatusVDSCommand(vdsId = 8b8a00 0e-d97e-11e1-aaed-97cb62be7a95, status=NonOperational, nonOperationalReason=STORAGE_DOMAIN_UNREACHABLE), log id: 2cd8cb2c 2012-07-30 11:25:55,431 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-13) [7b17c6cf] FINISH, SetVdsStatusVDSCommand, log id: 2cd8 cb2c 2012-07-30 11:26:02,744 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-72) [2313f95f] starting ProcessDomainRecovery for domai n 4c84bb36-96fa-4b73-89e1-90f81c394483 2012-07-30 11:26:02,770 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-72) [2313f95f] vds puma09 reported domain 4c84bb36-96fa -4b73-89e1-90f81c394483:360060160128230006e64545578cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational 2012-07-30 11:26:02,770 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-75) [560acdcf] starting ProcessDomainRecovery for domai n fc5b0ee4-6542-4c76-bc79-7afa9d2e2c71 2012-07-30 11:26:02,792 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-75) [560acdcf] vds puma09 reported domain fc5b0ee4-6542 -4c76-bc79-7afa9d2e2c71:360060160128230003ef94e5b78cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational 2012-07-30 11:26:02,792 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-69) [18ddfad7] starting ProcessDomainRecovery for domai n f29a9642-e5c4-474d-8a0a-6cfbc6dcba15 2012-07-30 11:26:02,819 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-69) [18ddfad7] vds puma09 reported domain f29a9642-e5c4 -474d-8a0a-6cfbc6dcba15:3600601601282300044f94e5b78cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational 2012-07-30 11:26:02,819 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-8) [b04efda] starting ProcessDomainRecovery for domain fde98d91-61e2-42aa-aef3-5b083b6635dd 2012-07-30 11:26:02,842 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-8) [b04efda] vds puma09 reported domain fde98d91-61e2-4 2aa-aef3-5b083b6635dd:360060160128230000db5096878cbe111 as in problem, vds is in status NonOperational, no need to move to nonoperational 2012-07-30 11:30:00,000 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-82) [4cee01f6] Checking autorecoverable hosts 2012-07-30 11:30:00,019 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-82) [4cee01f6] Autorecovering 5 hosts 2012-07-30 11:30:00,019 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-82) [4cee01f6] Autorecovering hosts: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 2012-07-30 11:30:00,024 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [282f2529] Lock Acquired to object EngineLock [exclusiveLocks= key: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 value: VDS , sharedLocks= ] 2012-07-30 11:30:00,024 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [282f2529] Running command: ActivateVdsCommand internal: true. Entities affected : ID: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 Type: VDS 2012-07-30 11:30:00,027 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-82) [282f2529] START, SetVdsStatusVDSCommand(vdsId = 89813a7a-d97e-11e1-a44d-1b8ec9396a72, status=Unassigned, nonOperationalReason=NONE), log id: a2adbc 2012-07-30 11:30:00,038 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-82) [282f2529] FINISH, SetVdsStatusVDSCommand, log id: a2adbc 2012-07-30 11:30:00,055 INFO [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-82) [282f2529] START, ActivateVdsVDSCommand(vdsId = 89813a7a-d97e-11e1-a44d-1b8ec9396a72), log id: 7453bee4 2012-07-30 11:30:00,230 INFO [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-82) [4a445a46] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected : ID: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 Type: VDS 2012-07-30 11:30:00,241 INFO [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-82) [c3af50e] Running command: HandleVdsVersionCommand internal: true. Entities affected : ID: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 Type: VDS 2012-07-30 11:30:00,244 INFO [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-82) [c3af50e] FINISH, ActivateVdsVDSCommand, log id: 7453bee4 2012-07-30 11:30:00,296 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [c3af50e] Lock freed to object EngineLock [exclusiveLocks= key: 89813a7a-d97e-11e1-a44d-1b8ec9396a72 value: VDS , sharedLocks= ] 2012-07-30 11:30:00,298 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-82) [c3af50e] Autorecovering hosts: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 2012-07-30 11:30:00,304 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [52359e08] Lock Acquired to object EngineLock [exclusiveLocks= key: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 value: VDS , sharedLocks= ] 2012-07-30 11:30:00,305 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (QuartzScheduler_Worker-82) [52359e08] Running command: ActivateVdsCommand internal: true. Entities affected : ID: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 Type: VDS 2012-07-30 11:30:00,309 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-82) [52359e08] START, SetVdsStatusVDSCommand(vdsId = 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388, status=Unassigned, nonOperationalReason=NONE), log id: 696dbb82 2012-07-30 11:30:00,330 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-82) [52359e08] FINISH, SetVdsStatusVDSCommand, log id: 696dbb82 2012-07-30 11:30:00,338 INFO [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-82) [52359e08] START, ActivateVdsVDSCommand(vdsId = 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388), log id: 35aa667a 2012-07-30 11:30:00,505 INFO [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (QuartzScheduler_Worker-82) [642ad778] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected : ID: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 Type: VDS 2012-07-30 11:30:00,516 INFO [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (QuartzScheduler_Worker-82) [7ac0ddab] Running command: HandleVdsVersionCommand internal: true. Entities affected : ID: 8a9ab47c-d97e-11e1-ad7a-43fef5f1b388 Type: VDS 2012-07-30 11:30:00,519 INFO [org.ovirt.engine.core.vdsbroker.ActivateVdsVDSCommand] (QuartzScheduler_Worker-82) [7ac0ddab] FINISH, ActivateVdsVDSCommand, log id: 35aa667a
http://gerrit.ovirt.org/#/c/6858/
pushed
not pushed downstream, moving to post
*** Bug 729262 has been marked as a duplicate of this bug. ***
Unable to reproduce SI23.