Description of problem: Due to different reasons, when the broker sends a notification email, smtplib can raise an IOError exception that is not correctly handled. The broker simply reports back the exception to agent that logs a confusing message: MainThread::ERROR::2016-05-03 15:27:56,346::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitor <type 'type'>, options {'hostname': 'flex08'}: Request failed: <type 'exceptions.IOError'>' - trying to restart agent and restarts Here the issue: the issue is here: at 2016-05-03 15:27:56,263 ovirt-ha-agent notices a state transition ans try do send a notification email: MainThread::INFO::2016-05-03 15:27:56,263::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1462289276.26 type=state_transition detail=StartState-ReinitializeFSM hostname='flex08' at 2016-05-03 15:27:56,346 the broker fails sending the notification with an IOError: Thread-3193::ERROR::2016-05-03 15:27:56,346::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: "notify time=1462289276.26 type=state_transition detail=StartState-ReinitializeFSM hostname='flex08'" Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle data) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 302, in _dispatch if notifications.notify(**options): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 102, in notify return send_email(smtp_config, email_body) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 31, in send_email message.as_string()) File "/usr/lib64/python2.7/smtplib.py", line 717, in sendmail self.ehlo_or_helo_if_needed() File "/usr/lib64/python2.7/smtplib.py", line 541, in ehlo_or_helo_if_needed if not (200 <= self.ehlo()[0] <= 299): File "/usr/lib64/python2.7/smtplib.py", line 412, in ehlo self.putcmd(self.ehlo_msg, name or self.local_hostname) File "/usr/lib64/python2.7/smtplib.py", line 340, in putcmd self.send(str) File "/usr/lib64/python2.7/smtplib.py", line 324, in send print>>stderr, 'send:', repr(str) IOError: [Errno 32] Broken pipe so at 15:27:56,346 the agent restarts: MainThread::ERROR::2016-05-03 15:27:56,346::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitor <type 'type'>, options {'hostname': 'flex08'}: Request failed: <type 'exceptions.IOError'>' - trying to restart agent Version-Release number of selected component (if applicable): How reproducible: it depends on SMTP server status Steps to Reproduce: 1. deploy hosted-engine with an external SMTP 2. bring down the SMTP server 3. trigger a notification via a state change Actual results: The agent restarts with: MainThread::ERROR::2016-05-03 15:27:56,346::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitor <type 'type'>, options {'hostname': 'flex08'}: Request failed: <type 'exceptions.IOError'>' - trying to restart agent Expected results: The broker correctly handles IOError Additional info:
Deployed hosted engine on single host from rhevm-appliance-20160515.0-1.el7ev.noarch and everything worked well using scenario as follows: 1)Clean deployment over NFS of hosted engine with SMTP configuration over single host from the appliance. 2)Addition of NFS storage to get auto-import to finish import of the HE-VM. 3)Reject rule implementation on host in order to simulate SMTP server connectivity issue e.g. iptables -A OUTPUT -p tcp --dport 25 -j REJECT. 4)Placed HE in to HA-Global Maintenance via WEBUI, did not received email notification. 5)Observed error in broker.log without ha-agent being restarted: Thread-1::ERROR::2016-05-16 17:23:40,156::notifications::35::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno 111] C onnection refused Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 24, in send_email server = smtplib.SMTP(cfg["smtp-server"], port=cfg["smtp-port"]) File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__ (code, msg) = self.connect(host, port) File "/usr/lib64/python2.7/smtplib.py", line 315, in connect self.sock = self._get_socket(host, port, self.timeout) File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket return socket.create_connection((host, port), timeout) File "/usr/lib64/python2.7/socket.py", line 571, in create_connection raise err error: [Errno 111] Connection refused Thread-6255::INFO::2016-05-16 17:23:41,234::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established agent reported as follows: MainThread::INFO::2016-05-16 17:23:04,486::hosted_engine::462::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state GlobalMaintenance (score: 3400) MainThread::INFO::2016-05-16 17:23:14,536::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1463408 594.54 type=state_transition detail=GlobalMaintenance-ReinitializeFSM hostname='alma04.qa.lab.tlv.redhat.com' MainThread::INFO::2016-05-16 17:23:15,769::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification o f state_transition (GlobalMaintenance-ReinitializeFSM) sent? ignored MainThread::INFO::2016-05-16 17:23:15,769::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initial izing VDSM MainThread::INFO::2016-05-16 17:23:15,822::hosted_engine::658::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_image s) Connecting the storage 6)Release the HE-VM from the global maintenance via WEBUI and also did not received email notification as SMTP was still blocked, no errors or ha-agent restarted. 7)Released SMTP iptables blocking rule e.g. # iptables --flush [root@alma04 ~]# iptables --list Chain INPUT (policy ACCEPT) target prot opt source destination Chain FORWARD (policy ACCEPT) target prot opt source destination Chain OUTPUT (policy ACCEPT) target prot opt source destination 8)No email notifications received as they were not saved or stack in queue. 9)Executed again steps 4&6, but this time without SMTP blocking iptables rule. 10)Received 4 notifications over the email and logs indicated this as well: MainThread::INFO::2016-05-16 17:49:39,219::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification o f state_transition (EngineUp-GlobalMaintenance) sent? sent MainThread::INFO::2016-05-16 17:49:39,219::hosted_engine::613::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initial izing VDSM Emails: 1) ovirt-hosted-engine state transition EngineUp-GlobalMaintenance The state machine changed state. 2) ovirt-hosted-engine state transition GlobalMaintenance-ReinitializeFSM The state machine changed state. 3) ovirt-hosted-engine state transition ReinitializeFSM-EngineStarting The state machine changed state. 4) ovirt-hosted-engine state transition EngineStarting-EngineUp The state machine changed state. Components on engine: rhevm-reports-setup-3.6.5.1-1.el6ev.noarch rhevm-setup-base-3.6.6.2-0.1.el6.noarch rhevm-websocket-proxy-3.6.6.2-0.1.el6.noarch rhevm-webadmin-portal-3.6.6.2-0.1.el6.noarch rhevm-spice-client-x64-cab-3.6-7.el6.noarch rhevm-dbscripts-3.6.6.2-0.1.el6.noarch rhevm-setup-plugins-3.6.5-1.el6ev.noarch rhevm-setup-plugin-ovirt-engine-3.6.6.2-0.1.el6.noarch rhevm-extensions-api-impl-3.6.6.2-0.1.el6.noarch rhevm-dwh-setup-3.6.6-1.el6ev.noarch rhevm-reports-3.6.5.1-1.el6ev.noarch rhevm-doc-3.6.0-7.el6eng.noarch rhevm-log-collector-3.6.1-1.el6ev.noarch rhevm-setup-plugin-ovirt-engine-common-3.6.6.2-0.1.el6.noarch rhevm-setup-plugin-websocket-proxy-3.6.6.2-0.1.el6.noarch rhevm-branding-rhev-3.6.0-9.el6ev.noarch rhevm-spice-client-x86-msi-3.6-7.el6.noarch rhevm-spice-client-x64-msi-3.6-7.el6.noarch rhevm-userportal-3.6.6.2-0.1.el6.noarch rhevm-restapi-3.6.6.2-0.1.el6.noarch rhevm-backend-3.6.6.2-0.1.el6.noarch rhevm-setup-3.6.6.2-0.1.el6.noarch rhevm-setup-plugin-vmconsole-proxy-helper-3.6.6.2-0.1.el6.noarch rhevm-3.6.6.2-0.1.el6.noarch rhevm-iso-uploader-3.6.0-1.el6ev.noarch rhevm-cli-3.6.2.0-1.el6ev.noarch rhevm-lib-3.6.6.2-0.1.el6.noarch rhevm-dwh-3.6.6-1.el6ev.noarch rhevm-dependencies-3.6.0-1.el6ev.noarch rhevm-tools-backup-3.6.6.2-0.1.el6.noarch rhevm-spice-client-x86-cab-3.6-7.el6.noarch rhevm-tools-3.6.6.2-0.1.el6.noarch rhevm-vmconsole-proxy-helper-3.6.6.2-0.1.el6.noarch rhevm-sdk-python-3.6.5.0-1.el6ev.noarch rhevm-guest-agent-common-1.0.11-6.el6ev.noarch rhevm-image-uploader-3.6.0-1.el6ev.noarch Linux 2.6.32-642.el6.x86_64 #1 SMP Wed Apr 13 00:51:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux Red Hat Enterprise Linux Server release 6.8 (Santiago) Host: rhevm-appliance-20160515.0-1.el7ev.noarch libvirt-client-1.2.17-13.el7_2.4.x86_64 vdsm-4.17.28-0.el7ev.noarch mom-0.5.3-1.el7ev.noarch ovirt-vmconsole-host-1.0.2-2.el7ev.noarch ovirt-hosted-engine-setup-1.3.6.1-1.el7ev.noarch qemu-kvm-rhev-2.3.0-31.el7_2.13.x86_64 sanlock-3.2.4-2.el7_2.x86_64 ovirt-host-deploy-1.4.1-1.el7ev.noarch ovirt-vmconsole-1.0.2-2.el7ev.noarch ovirt-setup-lib-1.0.1-1.el7ev.noarch ovirt-hosted-engine-ha-1.3.5.5-1.el7ev.noarch ovirt-vmconsole-host-1.0.2-2.el7ev.noarch ovirt-hosted-engine-setup-1.3.6.1-1.el7ev.noarch ovirt-host-deploy-1.4.1-1.el7ev.noarch ovirt-vmconsole-1.0.2-2.el7ev.noarch ovirt-setup-lib-1.0.1-1.el7ev.noarch ovirt-hosted-engine-ha-1.3.5.5-1.el7ev.noarch Red Hat Enterprise Linux Server release 7.2 (Maipo) Linux 3.10.0-327.18.2.el7.x86_64 #1 SMP Fri Apr 8 05:09:53 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux