Created attachment 1088458
agent log

Description of problem:
ovirt-ha-agent gets killed after some time with the error "Too many errors occurred, giving up. Please review the log and consider filing a bug."

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-1.3.1

How reproducible:
Always

Steps to Reproduce:
1. Set up hosted engine with a gluster volume using "hosted-engine --deploy" on the first host
2. Set up hosted engine with a gluster volume using "hosted-engine --deploy" on the second host
3. Check "service ovirt-ha-agent status"

Actual results:
ovirt-ha-agent service is in the failed state.

Expected results:
ovirt-ha-agent service should be up and running.

Additional info:
The same issue is seen on the third host as well.
Note: "hosted-engine --deploy" failed on the second and third hosts and was fixed with the workaround mentioned in bz#1277010
The agent is designed to quit after several retries, as you can see in the message: "Too many errors occurred, giving up."
Looking at the log file, this seems to be a setup issue, unrelated to the agent. You first need a working environment; only if the agent still quits there would this be an agent issue.
Can you reproduce this issue on a working non-gluster setup?
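For readers unfamiliar with the "quit after several retries" behaviour, here is a minimal sketch of the general pattern. It is not the agent's actual code: `run_with_retries`, `TooManyErrors`, and the `max_errors` and `delay` parameters are hypothetical names chosen for illustration, and the real retry count and delay are not stated in this report.

```python
import time


class TooManyErrors(Exception):
    """Raised when the error budget is exhausted (hypothetical name)."""


def run_with_retries(step, max_errors=3, delay=0.0):
    """Call step() until it succeeds; give up after max_errors failures.

    max_errors and delay are illustrative defaults, not the agent's values.
    """
    errors = 0
    while True:
        try:
            return step()
        except Exception as exc:
            errors += 1
            # Mirrors the wording of the log message quoted in this report.
            print("Error: %r - trying to restart agent" % str(exc))
            if errors >= max_errors:
                raise TooManyErrors(
                    "Too many errors occurred, giving up."
                ) from exc
            time.sleep(delay)
```

With a step that fails every time, the loop retries up to the limit and then raises, which is why a persistent setup problem ends with the service in the failed state rather than retrying forever.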
The real error is this one:

MainThread::ERROR::2015-10-29 15:20:31,256::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'list index out of range' - trying to restart agent

And it's not an agent error: VDSM raises an exception on getImagesList if called on an unattached storage domain, please see:
https://bugzilla.redhat.com/show_bug.cgi?id=1274622

We also have a workaround for it in case we are not able to fix VDSM in time, please see:
https://bugzilla.redhat.com/show_bug.cgi?id=1276650

*** This bug has been marked as a duplicate of bug 1276650 ***
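When reviewing an agent log for the underlying error, it can help to split each line into fields and filter on the level. The sketch below assumes the layout inferred from the single line quoted in this report (thread, level, timestamp, module, line number, logger, then "(function) message", separated by "::"); `LogRecord` and `parse_line` are hypothetical helper names, and other lines in a real log may not follow this layout.

```python
import re
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    thread: str
    level: str
    timestamp: str
    module: str
    lineno: int
    logger: str
    func: str
    message: str


# Trailing field looks like "(func_name) free-form message".
_TAIL = re.compile(r"^\((?P<func>[^)]*)\)\s*(?P<message>.*)$")


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one log line; return None if it does not match the assumed layout."""
    # Split on the first six "::" only, so the message itself may contain "::".
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7 or not parts[4].isdigit():
        return None
    tail = _TAIL.match(parts[6])
    if tail is None:
        return None
    return LogRecord(
        thread=parts[0],
        level=parts[1],
        timestamp=parts[2],
        module=parts[3],
        lineno=int(parts[4]),
        logger=parts[5],
        func=tail["func"],
        message=tail["message"],
    )
```

Keeping only records whose `level` is `ERROR` narrows a long log down to lines like the one quoted above.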