Bug 1332963 - IOError from smtplib is not handled
Summary: IOError from smtplib is not handled
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Broker
Version: 1.3.5.3
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ovirt-3.6.6
: 1.3.5.5
Assignee: Martin Sivák
QA Contact: Nikolai Sednev
URL: https://polarion.engineering.redhat.c...
Whiteboard:
Depends On:
Blocks: 1333143
TreeView+ depends on / blocked
 
Reported: 2016-05-04 12:30 UTC by Simone Tiraboschi
Modified: 2019-10-10 12:07 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Connection to SMTP server blocked. Consequence: HA agent crashed because of unhandled exception in reporting code. Fix: Exception handling improved. Result: No mail is sent (because the smtp connection is broken), but the agent stays alive and tries again during next monitoring cycle.
Clone Of:
Environment:
Last Closed: 2016-05-30 10:55:07 UTC
oVirt Team: SLA
Embargoed:
rule-engine: ovirt-3.6.z+
mgoldboi: planning_ack+
rgolan: devel_ack+
nsednev: testing_ack+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 57035 0 master MERGED Fix IOError handling when sending notification emails 2016-05-04 12:56:53 UTC
oVirt gerrit 57039 0 ovirt-hosted-engine-ha-1.3 MERGED Fix IOError handling when sending notification emails 2016-05-04 13:43:01 UTC

Description Simone Tiraboschi 2016-05-04 12:30:06 UTC
Description of problem:
Due to different reasons, when the broker sends a notification email, smtplib can raise an IOError exception that is not correctly handled.
The broker simply reports back the exception to agent that logs a confusing message:

MainThread::ERROR::2016-05-03 15:27:56,346::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitor <type 'type'>, options {'hostname': 'flex08'}: Request failed: <type 'exceptions.IOError'>' - trying to restart agent

and restarts

Here the issue:
the issue is here:

at 2016-05-03 15:27:56,263 ovirt-ha-agent notices a state transition ans try do send a notification email:

MainThread::INFO::2016-05-03 15:27:56,263::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1462289276.26 type=state_transition detail=StartState-ReinitializeFSM hostname='flex08'

at 2016-05-03 15:27:56,346 the broker fails sending the notification with an IOError:
Thread-3193::ERROR::2016-05-03 15:27:56,346::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: "notify time=1462289276.26 type=state_transition detail=StartState-ReinitializeFSM hostname='flex08'"
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 302, in _dispatch
    if notifications.notify(**options):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 102, in notify
    return send_email(smtp_config, email_body)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 31, in send_email
    message.as_string())
  File "/usr/lib64/python2.7/smtplib.py", line 717, in sendmail
    self.ehlo_or_helo_if_needed()
  File "/usr/lib64/python2.7/smtplib.py", line 541, in ehlo_or_helo_if_needed
    if not (200 <= self.ehlo()[0] <= 299):
  File "/usr/lib64/python2.7/smtplib.py", line 412, in ehlo
    self.putcmd(self.ehlo_msg, name or self.local_hostname)
  File "/usr/lib64/python2.7/smtplib.py", line 340, in putcmd
    self.send(str)
  File "/usr/lib64/python2.7/smtplib.py", line 324, in send
    print>>stderr, 'send:', repr(str)
IOError: [Errno 32] Broken pipe

so at 15:27:56,346 the agent restarts:
MainThread::ERROR::2016-05-03 15:27:56,346::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitor <type 'type'>, options {'hostname': 'flex08'}: Request failed: <type 'exceptions.IOError'>' - trying to restart agent


Version-Release number of selected component (if applicable):


How reproducible:
it depends on SMTP server status

Steps to Reproduce:
1. deploy hosted-engine with an external SMTP
2. bring down the SMTP server
3. trigger a notification via a state change

Actual results:
The agent restarts with:
MainThread::ERROR::2016-05-03 15:27:56,346::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitor <type 'type'>, options {'hostname': 'flex08'}: Request failed: <type 'exceptions.IOError'>' - trying to restart agent

Expected results:
The broker correctly handles IOError

Additional info:

Comment 1 Nikolai Sednev 2016-05-16 15:09:15 UTC
Deployed hosted engine on single host from rhevm-appliance-20160515.0-1.el7ev.noarch and everything worked well using scenario as follows:
1)Clean deployment over NFS of hosted engine with SMTP configuration over single host from the appliance.
2)Addition of NFS storage to get auto-import to finish import of the HE-VM.
3)Reject rule implementation on host in order to simulate SMTP server connectivity issue e.g. iptables -A OUTPUT -p tcp --dport 25 -j REJECT.
4)Placed HE in to HA-Global Maintenance via WEBUI, did not received email notification.
5)Observed error in broker.log without ha-agent being restarted:
Thread-1::ERROR::2016-05-16 17:23:40,156::notifications::35::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno 111] C
onnection refused
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 24, in send_email
    server = smtplib.SMTP(cfg["smtp-server"], port=cfg["smtp-port"])
  File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
    return socket.create_connection((host, port), timeout)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
Thread-6255::INFO::2016-05-16 17:23:41,234::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established

agent reported as follows:
MainThread::INFO::2016-05-16 17:23:04,486::hosted_engine::462::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current
 state GlobalMaintenance (score: 3400)
MainThread::INFO::2016-05-16 17:23:14,536::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1463408
594.54 type=state_transition detail=GlobalMaintenance-ReinitializeFSM hostname='alma04.qa.lab.tlv.redhat.com'
MainThread::INFO::2016-05-16 17:23:15,769::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification o
f state_transition (GlobalMaintenance-ReinitializeFSM) sent? ignored
MainThread::INFO::2016-05-16 17:23:15,769::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initial
izing VDSM
MainThread::INFO::2016-05-16 17:23:15,822::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_image
s) Connecting the storage


6)Release the HE-VM from the global maintenance via WEBUI and also did not received email notification as SMTP was still blocked, no errors or ha-agent restarted.
7)Released SMTP iptables blocking rule e.g.
# iptables --flush
[root@alma04 ~]# iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

8)No email notifications received as they were not saved or stack in queue.
9)Executed again steps 4&6, but this time without SMTP blocking iptables rule.
10)Received 4 notifications over the email and logs indicated this as well:
MainThread::INFO::2016-05-16 17:49:39,219::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification o
f state_transition (EngineUp-GlobalMaintenance) sent? sent
MainThread::INFO::2016-05-16 17:49:39,219::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initial
izing VDSM

Emails:
1)
ovirt-hosted-engine state transition EngineUp-GlobalMaintenance

The state machine changed state.

2)
ovirt-hosted-engine state transition GlobalMaintenance-ReinitializeFSM

The state machine changed state.

3)

ovirt-hosted-engine state transition ReinitializeFSM-EngineStarting

The state machine changed state.

4)

ovirt-hosted-engine state transition EngineStarting-EngineUp

The state machine changed state.

Components on engine:
rhevm-reports-setup-3.6.5.1-1.el6ev.noarch
rhevm-setup-base-3.6.6.2-0.1.el6.noarch
rhevm-websocket-proxy-3.6.6.2-0.1.el6.noarch
rhevm-webadmin-portal-3.6.6.2-0.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-dbscripts-3.6.6.2-0.1.el6.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.6.2-0.1.el6.noarch
rhevm-extensions-api-impl-3.6.6.2-0.1.el6.noarch
rhevm-dwh-setup-3.6.6-1.el6ev.noarch
rhevm-reports-3.6.5.1-1.el6ev.noarch
rhevm-doc-3.6.0-7.el6eng.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.6.2-0.1.el6.noarch
rhevm-setup-plugin-websocket-proxy-3.6.6.2-0.1.el6.noarch
rhevm-branding-rhev-3.6.0-9.el6ev.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-userportal-3.6.6.2-0.1.el6.noarch
rhevm-restapi-3.6.6.2-0.1.el6.noarch
rhevm-backend-3.6.6.2-0.1.el6.noarch
rhevm-setup-3.6.6.2-0.1.el6.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.6.2-0.1.el6.noarch
rhevm-3.6.6.2-0.1.el6.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-cli-3.6.2.0-1.el6ev.noarch
rhevm-lib-3.6.6.2-0.1.el6.noarch
rhevm-dwh-3.6.6-1.el6ev.noarch
rhevm-dependencies-3.6.0-1.el6ev.noarch
rhevm-tools-backup-3.6.6.2-0.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-tools-3.6.6.2-0.1.el6.noarch
rhevm-vmconsole-proxy-helper-3.6.6.2-0.1.el6.noarch
rhevm-sdk-python-3.6.5.0-1.el6ev.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-image-uploader-3.6.0-1.el6ev.noarch
Linux 2.6.32-642.el6.x86_64 #1 SMP Wed Apr 13 00:51:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.8 (Santiago)

Host:
rhevm-appliance-20160515.0-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.4.x86_64
vdsm-4.17.28-0.el7ev.noarch
mom-0.5.3-1.el7ev.noarch
ovirt-vmconsole-host-1.0.2-2.el7ev.noarch
ovirt-hosted-engine-setup-1.3.6.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.13.x86_64
sanlock-3.2.4-2.el7_2.x86_64
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-vmconsole-1.0.2-2.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.5-1.el7ev.noarch
ovirt-vmconsole-host-1.0.2-2.el7ev.noarch
ovirt-hosted-engine-setup-1.3.6.1-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-vmconsole-1.0.2-2.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.5-1.el7ev.noarch
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux 3.10.0-327.18.2.el7.x86_64 #1 SMP Fri Apr 8 05:09:53 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux


Note You need to log in before you can comment on or make changes to this bug.