Bug 1332963

Summary: IOError from smtplib is not handled
Product: [oVirt] ovirt-hosted-engine-ha Reporter: Simone Tiraboschi <stirabos>
Component: BrokerAssignee: Martin Sivák <msivak>
Status: CLOSED CURRENTRELEASE QA Contact: Nikolai Sednev <nsednev>
Severity: medium Docs Contact:
Priority: medium    
Version: 1.3.5.3CC: bugs, mavital, mgoldboi, obockows, rgolan, sbonazzo
Target Milestone: ovirt-3.6.6Keywords: Triaged
Target Release: 1.3.5.5Flags: rule-engine: ovirt-3.6.z+
mgoldboi: planning_ack+
rgolan: devel_ack+
nsednev: testing_ack+
Hardware: Unspecified   
OS: Unspecified   
URL: https://polarion.engineering.redhat.com/polarion/#/project/RHEVM3/workitem?id=RHEVM-15420
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Connection to SMTP server blocked. Consequence: HA agent crashed because of unhandled exception in reporting code. Fix: Exception handling improved. Result: No mail is sent (because the smtp connection is broken), but the agent stays alive and tries again during next monitoring cycle.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-30 10:55:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1333143    

Description Simone Tiraboschi 2016-05-04 12:30:06 UTC
Description of problem:
Due to different reasons, when the broker sends a notification email, smtplib can raise an IOError exception that is not correctly handled.
The broker simply reports back the exception to agent that logs a confusing message:

MainThread::ERROR::2016-05-03 15:27:56,346::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitor <type 'type'>, options {'hostname': 'flex08'}: Request failed: <type 'exceptions.IOError'>' - trying to restart agent

and restarts

Here the issue:
the issue is here:

at 2016-05-03 15:27:56,263 ovirt-ha-agent notices a state transition ans try do send a notification email:

MainThread::INFO::2016-05-03 15:27:56,263::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1462289276.26 type=state_transition detail=StartState-ReinitializeFSM hostname='flex08'

at 2016-05-03 15:27:56,346 the broker fails sending the notification with an IOError:
Thread-3193::ERROR::2016-05-03 15:27:56,346::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: "notify time=1462289276.26 type=state_transition detail=StartState-ReinitializeFSM hostname='flex08'"
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 302, in _dispatch
    if notifications.notify(**options):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 102, in notify
    return send_email(smtp_config, email_body)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 31, in send_email
    message.as_string())
  File "/usr/lib64/python2.7/smtplib.py", line 717, in sendmail
    self.ehlo_or_helo_if_needed()
  File "/usr/lib64/python2.7/smtplib.py", line 541, in ehlo_or_helo_if_needed
    if not (200 <= self.ehlo()[0] <= 299):
  File "/usr/lib64/python2.7/smtplib.py", line 412, in ehlo
    self.putcmd(self.ehlo_msg, name or self.local_hostname)
  File "/usr/lib64/python2.7/smtplib.py", line 340, in putcmd
    self.send(str)
  File "/usr/lib64/python2.7/smtplib.py", line 324, in send
    print>>stderr, 'send:', repr(str)
IOError: [Errno 32] Broken pipe

so at 15:27:56,346 the agent restarts:
MainThread::ERROR::2016-05-03 15:27:56,346::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitor <type 'type'>, options {'hostname': 'flex08'}: Request failed: <type 'exceptions.IOError'>' - trying to restart agent


Version-Release number of selected component (if applicable):


How reproducible:
it depends on SMTP server status

Steps to Reproduce:
1. deploy hosted-engine with an external SMTP
2. bring down the SMTP server
3. trigger a notification via a state change

Actual results:
The agent restarts with:
MainThread::ERROR::2016-05-03 15:27:56,346::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitor <type 'type'>, options {'hostname': 'flex08'}: Request failed: <type 'exceptions.IOError'>' - trying to restart agent

Expected results:
The broker correctly handles IOError

Additional info:

Comment 1 Nikolai Sednev 2016-05-16 15:09:15 UTC
Deployed hosted engine on single host from rhevm-appliance-20160515.0-1.el7ev.noarch and everything worked well using scenario as follows:
1)Clean deployment over NFS of hosted engine with SMTP configuration over single host from the appliance.
2)Addition of NFS storage to get auto-import to finish import of the HE-VM.
3)Reject rule implementation on host in order to simulate SMTP server connectivity issue e.g. iptables -A OUTPUT -p tcp --dport 25 -j REJECT.
4)Placed HE in to HA-Global Maintenance via WEBUI, did not received email notification.
5)Observed error in broker.log without ha-agent being restarted:
Thread-1::ERROR::2016-05-16 17:23:40,156::notifications::35::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno 111] C
onnection refused
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 24, in send_email
    server = smtplib.SMTP(cfg["smtp-server"], port=cfg["smtp-port"])
  File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
    return socket.create_connection((host, port), timeout)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
Thread-6255::INFO::2016-05-16 17:23:41,234::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established

agent reported as follows:
MainThread::INFO::2016-05-16 17:23:04,486::hosted_engine::462::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current
 state GlobalMaintenance (score: 3400)
MainThread::INFO::2016-05-16 17:23:14,536::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1463408
594.54 type=state_transition detail=GlobalMaintenance-ReinitializeFSM hostname='alma04.qa.lab.tlv.redhat.com'
MainThread::INFO::2016-05-16 17:23:15,769::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification o
f state_transition (GlobalMaintenance-ReinitializeFSM) sent? ignored
MainThread::INFO::2016-05-16 17:23:15,769::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initial
izing VDSM
MainThread::INFO::2016-05-16 17:23:15,822::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_image
s) Connecting the storage


6)Release the HE-VM from the global maintenance via WEBUI and also did not received email notification as SMTP was still blocked, no errors or ha-agent restarted.
7)Released SMTP iptables blocking rule e.g.
# iptables --flush
[root@alma04 ~]# iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

8)No email notifications received as they were not saved or stack in queue.
9)Executed again steps 4&6, but this time without SMTP blocking iptables rule.
10)Received 4 notifications over the email and logs indicated this as well:
MainThread::INFO::2016-05-16 17:49:39,219::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification o
f state_transition (EngineUp-GlobalMaintenance) sent? sent
MainThread::INFO::2016-05-16 17:49:39,219::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initial
izing VDSM

Emails:
1)
ovirt-hosted-engine state transition EngineUp-GlobalMaintenance

The state machine changed state.

2)
ovirt-hosted-engine state transition GlobalMaintenance-ReinitializeFSM

The state machine changed state.

3)

ovirt-hosted-engine state transition ReinitializeFSM-EngineStarting

The state machine changed state.

4)

ovirt-hosted-engine state transition EngineStarting-EngineUp

The state machine changed state.

Components on engine:
rhevm-reports-setup-3.6.5.1-1.el6ev.noarch
rhevm-setup-base-3.6.6.2-0.1.el6.noarch
rhevm-websocket-proxy-3.6.6.2-0.1.el6.noarch
rhevm-webadmin-portal-3.6.6.2-0.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-dbscripts-3.6.6.2-0.1.el6.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.6.2-0.1.el6.noarch
rhevm-extensions-api-impl-3.6.6.2-0.1.el6.noarch
rhevm-dwh-setup-3.6.6-1.el6ev.noarch
rhevm-reports-3.6.5.1-1.el6ev.noarch
rhevm-doc-3.6.0-7.el6eng.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.6.2-0.1.el6.noarch
rhevm-setup-plugin-websocket-proxy-3.6.6.2-0.1.el6.noarch
rhevm-branding-rhev-3.6.0-9.el6ev.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-userportal-3.6.6.2-0.1.el6.noarch
rhevm-restapi-3.6.6.2-0.1.el6.noarch
rhevm-backend-3.6.6.2-0.1.el6.noarch
rhevm-setup-3.6.6.2-0.1.el6.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.6.2-0.1.el6.noarch
rhevm-3.6.6.2-0.1.el6.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-cli-3.6.2.0-1.el6ev.noarch
rhevm-lib-3.6.6.2-0.1.el6.noarch
rhevm-dwh-3.6.6-1.el6ev.noarch
rhevm-dependencies-3.6.0-1.el6ev.noarch
rhevm-tools-backup-3.6.6.2-0.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-tools-3.6.6.2-0.1.el6.noarch
rhevm-vmconsole-proxy-helper-3.6.6.2-0.1.el6.noarch
rhevm-sdk-python-3.6.5.0-1.el6ev.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-image-uploader-3.6.0-1.el6ev.noarch
Linux 2.6.32-642.el6.x86_64 #1 SMP Wed Apr 13 00:51:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.8 (Santiago)

Host:
rhevm-appliance-20160515.0-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.4.x86_64
vdsm-4.17.28-0.el7ev.noarch
mom-0.5.3-1.el7ev.noarch
ovirt-vmconsole-host-1.0.2-2.el7ev.noarch
ovirt-hosted-engine-setup-1.3.6.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.13.x86_64
sanlock-3.2.4-2.el7_2.x86_64
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-vmconsole-1.0.2-2.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.5-1.el7ev.noarch
ovirt-vmconsole-host-1.0.2-2.el7ev.noarch
ovirt-hosted-engine-setup-1.3.6.1-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-vmconsole-1.0.2-2.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.5-1.el7ev.noarch
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux 3.10.0-327.18.2.el7.x86_64 #1 SMP Fri Apr 8 05:09:53 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux