Bug 1277013 - ovirt-ha-agent gets killed after some time
Summary: ovirt-ha-agent gets killed after some time
Keywords:
Status: CLOSED DUPLICATE of bug 1276650
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Agent
Version: 1.3.1
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Martin Sivák
QA Contact: Ilanit Stein
URL:
Whiteboard: integration
Depends On:
Blocks:
 
Reported: 2015-11-02 04:56 UTC by Ramesh N
Modified: 2015-11-02 08:32 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-11-02 08:31:35 UTC
oVirt Team: ---
Embargoed:
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
agent log (276.13 KB, text/plain)
2015-11-02 04:56 UTC, Ramesh N


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1277010 0 high CLOSED hosted-engine --deploy fails in second host when using gluster volume 2021-02-22 00:41:40 UTC

Internal Links: 1277010

Description Ramesh N 2015-11-02 04:56:37 UTC
Created attachment 1088458
agent log

Description of problem:
 
ovirt-ha-agent gets killed after some time with the error "Too many errors occurred, giving up. Please review the log and consider filing a bug."

Version-Release number of selected component (if applicable):

ovirt-hosted-engine-ha-1.3.1

How reproducible:

Always

Steps to Reproduce:
1. Set up hosted engine with a gluster volume using "hosted-engine --deploy" on the first host
2. Set up hosted engine with a gluster volume using "hosted-engine --deploy" on the second host
3. Check "service ovirt-ha-agent status"

Actual results:

The ovirt-ha-agent service is in a failed state.

Expected results:
 
ovirt-ha-agent service should be up and running. 

Additional info:

The same issue is seen on the third host as well.

Note: "hosted-engine --deploy" failed on the second and third hosts and was fixed with the workaround mentioned in bz#1277010.

Comment 1 Doron Fediuck 2015-11-02 07:19:11 UTC
The agent is designed to quit after several retries, as you can see in the
message: "Too many errors occurred, giving up."
Looking at the log file, this seems to be a setup issue, unrelated to
the agent. So you first need to have a working environment, and only then will
this become an issue. Can you reproduce this issue on a working non-gluster setup?
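
The give-up behaviour described in comment 1 can be illustrated with a minimal Python sketch. This is not the actual ovirt-hosted-engine-ha code: run_with_retries, TooManyErrors, failing_step and the max_errors default are hypothetical names and values chosen for illustration. Only the two message texts are taken from this report.

```python
import time


class TooManyErrors(Exception):
    """Raised when the retry budget is exhausted (hypothetical name)."""


def run_with_retries(step, max_errors=3, delay=0.0):
    """Call step() until it succeeds or max_errors failures have occurred.

    max_errors and delay are illustrative values, not the agent's settings.
    """
    errors = 0
    while True:
        try:
            return step()
        except Exception as exc:
            errors += 1
            # Mirrors the shape of the log line quoted in this report.
            print("Error: %r - trying to restart agent" % str(exc))
            if errors >= max_errors:
                raise TooManyErrors(
                    "Too many errors occurred, giving up."
                ) from exc
            time.sleep(delay)


def failing_step():
    # Stand-in for a loop iteration that always hits the same setup problem.
    return [][0]
```

With a step that fails every time, as failing_step does, the loop logs each failure and then raises TooManyErrors. This matches the observation that the service stops on its own when the underlying setup problem persists.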

Comment 2 Simone Tiraboschi 2015-11-02 08:31:35 UTC
The real error is this one:

MainThread::ERROR::2015-10-29 15:20:31,256::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'list index out of range' - trying to restart agent

And it's not an agent error: VDSM raises an exception on getImagesList if called on an unattached storage domain. Please see:
https://bugzilla.redhat.com/show_bug.cgi?id=1274622
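
For readers unfamiliar with the message: 'list index out of range' is the text of Python's standard IndexError, raised when a list is indexed past its end. The sketch below only shows how an empty result produces that message and how a caller could guard against it. It is not the VDSM or agent code, it does not show where the error is actually raised, and first_image is a hypothetical helper.

```python
def first_image(images):
    """Return the first entry of images, or None when the list is empty.

    Hypothetical helper: guards the lookup instead of indexing blindly.
    """
    if not images:
        return None
    return images[0]


# Indexing an empty list directly raises IndexError with the logged text.
try:
    [][0]
except IndexError as exc:
    message = str(exc)

# The guarded lookup returns None for an empty list instead of raising.
empty_result = first_image([])
```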

We also have a workaround for it in case we are not able to fix VDSM in time. Please see:
https://bugzilla.redhat.com/show_bug.cgi?id=1276650

*** This bug has been marked as a duplicate of bug 1276650 ***

