Bug 1332963
| Summary: | IOError from smtplib is not handled | ||
|---|---|---|---|
| Product: | [oVirt] ovirt-hosted-engine-ha | Reporter: | Simone Tiraboschi <stirabos> |
| Component: | Broker | Assignee: | Martin Sivák <msivak> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Nikolai Sednev <nsednev> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 1.3.5.3 | CC: | bugs, mavital, mgoldboi, obockows, rgolan, sbonazzo |
| Target Milestone: | ovirt-3.6.6 | Keywords: | Triaged |
| Target Release: | 1.3.5.5 | Flags: | rule-engine:
ovirt-3.6.z+
mgoldboi: planning_ack+ rgolan: devel_ack+ nsednev: testing_ack+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| URL: | https://polarion.engineering.redhat.com/polarion/#/project/RHEVM3/workitem?id=RHEVM-15420 | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause:
Connection to SMTP server blocked.
Consequence:
HA agent crashed because of unhandled exception in reporting code.
Fix:
Exception handling improved.
Result:
No mail is sent (because the smtp connection is broken), but the agent stays alive and tries again during next monitoring cycle.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-05-30 10:55:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1333143 | ||
Deployed hosted engine on single host from rhevm-appliance-20160515.0-1.el7ev.noarch and everything worked well using scenario as follows:
1)Clean deployment over NFS of hosted engine with SMTP configuration over single host from the appliance.
2)Addition of NFS storage to get auto-import to finish import of the HE-VM.
3)Reject rule implementation on host in order to simulate SMTP server connectivity issue e.g. iptables -A OUTPUT -p tcp --dport 25 -j REJECT.
4)Placed HE in to HA-Global Maintenance via WEBUI, did not received email notification.
5)Observed error in broker.log without ha-agent being restarted:
Thread-1::ERROR::2016-05-16 17:23:40,156::notifications::35::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno 111] C
onnection refused
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 24, in send_email
server = smtplib.SMTP(cfg["smtp-server"], port=cfg["smtp-port"])
File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
(code, msg) = self.connect(host, port)
File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
return socket.create_connection((host, port), timeout)
File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
Thread-6255::INFO::2016-05-16 17:23:41,234::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
agent reported as follows:
MainThread::INFO::2016-05-16 17:23:04,486::hosted_engine::462::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current
state GlobalMaintenance (score: 3400)
MainThread::INFO::2016-05-16 17:23:14,536::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1463408
594.54 type=state_transition detail=GlobalMaintenance-ReinitializeFSM hostname='alma04.qa.lab.tlv.redhat.com'
MainThread::INFO::2016-05-16 17:23:15,769::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification o
f state_transition (GlobalMaintenance-ReinitializeFSM) sent? ignored
MainThread::INFO::2016-05-16 17:23:15,769::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initial
izing VDSM
MainThread::INFO::2016-05-16 17:23:15,822::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_image
s) Connecting the storage
6)Release the HE-VM from the global maintenance via WEBUI and also did not received email notification as SMTP was still blocked, no errors or ha-agent restarted.
7)Released SMTP iptables blocking rule e.g.
# iptables --flush
[root@alma04 ~]# iptables --list
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
8)No email notifications received as they were not saved or stack in queue.
9)Executed again steps 4&6, but this time without SMTP blocking iptables rule.
10)Received 4 notifications over the email and logs indicated this as well:
MainThread::INFO::2016-05-16 17:49:39,219::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification o
f state_transition (EngineUp-GlobalMaintenance) sent? sent
MainThread::INFO::2016-05-16 17:49:39,219::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initial
izing VDSM
Emails:
1)
ovirt-hosted-engine state transition EngineUp-GlobalMaintenance
The state machine changed state.
2)
ovirt-hosted-engine state transition GlobalMaintenance-ReinitializeFSM
The state machine changed state.
3)
ovirt-hosted-engine state transition ReinitializeFSM-EngineStarting
The state machine changed state.
4)
ovirt-hosted-engine state transition EngineStarting-EngineUp
The state machine changed state.
Components on engine:
rhevm-reports-setup-3.6.5.1-1.el6ev.noarch
rhevm-setup-base-3.6.6.2-0.1.el6.noarch
rhevm-websocket-proxy-3.6.6.2-0.1.el6.noarch
rhevm-webadmin-portal-3.6.6.2-0.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-dbscripts-3.6.6.2-0.1.el6.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.6.2-0.1.el6.noarch
rhevm-extensions-api-impl-3.6.6.2-0.1.el6.noarch
rhevm-dwh-setup-3.6.6-1.el6ev.noarch
rhevm-reports-3.6.5.1-1.el6ev.noarch
rhevm-doc-3.6.0-7.el6eng.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.6.2-0.1.el6.noarch
rhevm-setup-plugin-websocket-proxy-3.6.6.2-0.1.el6.noarch
rhevm-branding-rhev-3.6.0-9.el6ev.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-userportal-3.6.6.2-0.1.el6.noarch
rhevm-restapi-3.6.6.2-0.1.el6.noarch
rhevm-backend-3.6.6.2-0.1.el6.noarch
rhevm-setup-3.6.6.2-0.1.el6.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.6.2-0.1.el6.noarch
rhevm-3.6.6.2-0.1.el6.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-cli-3.6.2.0-1.el6ev.noarch
rhevm-lib-3.6.6.2-0.1.el6.noarch
rhevm-dwh-3.6.6-1.el6ev.noarch
rhevm-dependencies-3.6.0-1.el6ev.noarch
rhevm-tools-backup-3.6.6.2-0.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-tools-3.6.6.2-0.1.el6.noarch
rhevm-vmconsole-proxy-helper-3.6.6.2-0.1.el6.noarch
rhevm-sdk-python-3.6.5.0-1.el6ev.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-image-uploader-3.6.0-1.el6ev.noarch
Linux 2.6.32-642.el6.x86_64 #1 SMP Wed Apr 13 00:51:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.8 (Santiago)
Host:
rhevm-appliance-20160515.0-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.4.x86_64
vdsm-4.17.28-0.el7ev.noarch
mom-0.5.3-1.el7ev.noarch
ovirt-vmconsole-host-1.0.2-2.el7ev.noarch
ovirt-hosted-engine-setup-1.3.6.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.13.x86_64
sanlock-3.2.4-2.el7_2.x86_64
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-vmconsole-1.0.2-2.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.5-1.el7ev.noarch
ovirt-vmconsole-host-1.0.2-2.el7ev.noarch
ovirt-hosted-engine-setup-1.3.6.1-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-vmconsole-1.0.2-2.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.5-1.el7ev.noarch
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux 3.10.0-327.18.2.el7.x86_64 #1 SMP Fri Apr 8 05:09:53 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
|
Description of problem: Due to different reasons, when the broker sends a notification email, smtplib can raise an IOError exception that is not correctly handled. The broker simply reports back the exception to agent that logs a confusing message: MainThread::ERROR::2016-05-03 15:27:56,346::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitor <type 'type'>, options {'hostname': 'flex08'}: Request failed: <type 'exceptions.IOError'>' - trying to restart agent and restarts Here the issue: the issue is here: at 2016-05-03 15:27:56,263 ovirt-ha-agent notices a state transition ans try do send a notification email: MainThread::INFO::2016-05-03 15:27:56,263::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1462289276.26 type=state_transition detail=StartState-ReinitializeFSM hostname='flex08' at 2016-05-03 15:27:56,346 the broker fails sending the notification with an IOError: Thread-3193::ERROR::2016-05-03 15:27:56,346::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: "notify time=1462289276.26 type=state_transition detail=StartState-ReinitializeFSM hostname='flex08'" Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle data) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 302, in _dispatch if notifications.notify(**options): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 102, in notify return send_email(smtp_config, email_body) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 31, in send_email message.as_string()) File "/usr/lib64/python2.7/smtplib.py", line 717, in sendmail self.ehlo_or_helo_if_needed() File "/usr/lib64/python2.7/smtplib.py", line 541, in ehlo_or_helo_if_needed if not (200 <= self.ehlo()[0] <= 299): File "/usr/lib64/python2.7/smtplib.py", line 412, in ehlo self.putcmd(self.ehlo_msg, name or self.local_hostname) File "/usr/lib64/python2.7/smtplib.py", line 340, in putcmd self.send(str) File "/usr/lib64/python2.7/smtplib.py", line 324, in send print>>stderr, 'send:', repr(str) IOError: [Errno 32] Broken pipe so at 15:27:56,346 the agent restarts: MainThread::ERROR::2016-05-03 15:27:56,346::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitor <type 'type'>, options {'hostname': 'flex08'}: Request failed: <type 'exceptions.IOError'>' - trying to restart agent Version-Release number of selected component (if applicable): How reproducible: it depends on SMTP server status Steps to Reproduce: 1. deploy hosted-engine with an external SMTP 2. bring down the SMTP server 3. trigger a notification via a state change Actual results: The agent restarts with: MainThread::ERROR::2016-05-03 15:27:56,346::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitor <type 'type'>, options {'hostname': 'flex08'}: Request failed: <type 'exceptions.IOError'>' - trying to restart agent Expected results: The broker correctly handles IOError Additional info: