Bug 1124624
Summary: | No error logged when agent restarts (ovirt-hosted-engine-ha-1.2 branch only) | | |
---|---|---|---|
Product: | [Retired] oVirt | Reporter: | Greg Padgett <gpadgett> |
Component: | ovirt-hosted-engine-ha | Assignee: | Greg Padgett <gpadgett> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Nikolai Sednev <nsednev> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | | |
Version: | 3.5 | CC: | amureini, ecohen, gklein, gpadgett, iheim, rbalakri, sbonazzo, yeylon |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | 3.5.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | sla | | |
Fixed In Version: | ovirt-3.5.0_rc1 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2014-10-17 12:33:12 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1123006 | | |
Description
Greg Padgett
2014-07-29 23:37:58 UTC
Hi Greg,
1. I need the exact steps for this bug reproduction.
2. Please specify the file to be checked during failure in HE start.

(In reply to Nikolai Sednev from comment #1)
> Hi Greg,
> 1. I need the exact steps for this bug reproduction.
> 2. Please specify the file to be checked during failure in HE start.

Hi Nikolai,

Here's a way to reproduce that I just confirmed:

1. Stop the ovirt-ha-agent and ovirt-ha-broker services.
2. Start the agent manually:
   /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
   (For this step, avoid using 'service' or 'systemctl' to start the agent, because they will also start the broker, which we are trying to avoid for this test.)
3. Wait and check the log:
   /var/log/ovirt-hosted-engine-ha/agent.log
   for a message like the following:
   ("Error: '<...>' - trying to restart agent")
   If this message appears, the code is good.

In my case, I see this, which took a few minutes to show up in the log:

MainThread::ERROR::2014-08-13 09:38:21,104::agent::172::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to connect to broker, the number of errors has exceeded the limit (10)' - trying to restart agent

After the test is over, just kill the running agent, or if you wait long enough, it will stop itself.

(In reply to Greg Padgett from comment #2)

Seems like we have a fix then. I'm receiving these on both of my hosts, and the engine works well:

MainThread::ERROR::2014-08-13 18:55:23,459::agent::172::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to connect to broker, the number of errors has exceeded the limit (10)' - trying to restart agent
MainThread::WARNING::2014-08-13 18:55:28,465::agent::175::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '3'
MainThread::INFO::2014-08-13 18:55:28,487::hosted_engine::222::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 10.35.64.85
MainThread::INFO::2014-08-13 18:55:28,803::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2014-08-13 18:55:28,803::brokerlink::67::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(connect) Failed to connect to broker: [Errno 2] No such file or directory
MainThread::INFO::2014-08-13 18:55:28,803::brokerlink::69::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(connect) Retrying broker connection in '5' seconds
MainThread::INFO::2014-08-13 18:55:33,808::brokerlink::67::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(connect) Failed to connect to broker: [Errno 2] No such file or directory
MainThread::INFO::2014-08-13 18:55:33,809::brokerlink::69::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(connect) Retrying broker connection in '5' seconds

Both hosts show their HE HA as [N/A] in the web UI, since the broker was stopped manually beforehand.

Components used:
Linux version 2.6.32-431.23.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014
ovirt-engine-setup-3.5.0-0.0.master.20140804172041.git23b558e.el6.noarch
ovirt-engine-setup-base-3.5.0-0.0.master.20140804172041.git23b558e.el6.noarch
libvirt-0.10.2-29.el6_5.10.x86_64
sanlock-2.8-1.el6.x86_64
vdsm-4.16.1-6.gita4a4614.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64

oVirt 3.5 has been released and should include the fix for this issue.
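
For anyone repeating the verification, the log check from step 3 of the reproduction can be sketched as a small Python helper. This is only an illustration, not part of ovirt-hosted-engine-ha: the function names `find_restart_errors` and `scan_log` are hypothetical, and the pattern is modeled solely on the "Error: '<...>' - trying to restart agent" message quoted in this bug, so it may need adjusting if the log format differs.

```python
import re

# Pattern modeled on the message quoted in this bug report:
#   ...::ERROR::... Error: '<reason>' - trying to restart agent
# (assumption: the agent log keeps this wording and the ::ERROR:: level tag)
RESTART_ERROR = re.compile(
    r"::ERROR::.*Error: '(?P<reason>.*)' - trying to restart agent"
)


def find_restart_errors(lines):
    """Return the quoted reason from every agent-restart error line."""
    reasons = []
    for line in lines:
        match = RESTART_ERROR.search(line)
        if match:
            reasons.append(match.group("reason"))
    return reasons


def scan_log(path):
    """Read a log file and return the restart reasons found in it."""
    # errors="replace" keeps the scan going if the log has odd bytes
    with open(path, errors="replace") as log:
        return find_restart_errors(log)
```

Calling `scan_log("/var/log/ovirt-hosted-engine-ha/agent.log")` on a host with the fix should return a non-empty list once the agent has given up on the broker; per the comments above, that can take a few minutes.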