Bug 1513484

Summary: failure of email message delivery is not reported anywhere
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Martin Bukatovic <mbukatov>
Component: web-admin-tendrl-notifier
Assignee: Nishanth Thomas <nthomas>
Status: CLOSED WONTFIX
QA Contact: sds-qe-bugs
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: gshanmug, mkudlej, sankarshan
Target Milestone: ---
Keywords: Reopened, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-05-08 19:57:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1515276
Bug Blocks:

Description Martin Bukatovic 2017-11-15 14:07:56 UTC
Description of problem
======================

When delivery of an email notification fails, tendrl-notifier doesn't report
the problem anywhere, so it's not possible to find out that something went
wrong (even though the notifier has an opportunity to notice and report the
problem).

Version-Release
===============

tendrl-notifier-1.5.4-2.el7rhgs.noarch

All RHGSWA components:

# rpm -qa | grep tendrl | sort
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.5.4-2.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-3.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-3.el7rhgs.noarch
tendrl-node-agent-1.5.4-2.el7rhgs.noarch
tendrl-notifier-1.5.4-2.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-ui-1.5.4-2.el7rhgs.noarch

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHGSWA using tendrl-ansible.
2. Import any Gluster trusted storage pool with a volume.
3. Enable email alerting (following the documentation) and configure
   tendrl-notifier to send emails via an SMTP server you operate (see
   step 5 to understand why).
4. Perform any action that Tendrl alerting sends an event about,
   e.g. stop a volume or shut down a storage machine, and check that you
   have received the email messages (to validate the setup).
5. Stop the SMTP server tendrl-notifier is talking to.
6. Perform any action that Tendrl alerting sends an event about,
   e.g. stop a volume or shut down a storage machine.
7. Check the logs of tendrl-notifier.

Note: one can use the QE playbook for step 3, see:

* https://github.com/usmqe/usmqe-setup/blob/e7d174ab4c970ee535954a9efdc2e4db245f18cf/test_setup.smtp.yml
  (version I was using when I reported this BZ)
* https://github.com/usmqe/usmqe-setup/blob/master/test_setup.smtp.yml (latest version)

Actual results
==============

Email notifications couldn't be sent, because the SMTP server Tendrl uses is
not running:

```
[root@usm1-client ~]# systemctl status postfix
● postfix.service - Postfix Mail Transport Agent
   Loaded: loaded (/usr/lib/systemd/system/postfix.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2017-11-15 08:40:32 EST; 13min ago
  Process: 25826 ExecStop=/usr/sbin/postfix stop (code=exited, status=0/SUCCESS)
  Process: 25439 ExecStart=/usr/sbin/postfix start (code=exited, status=0/SUCCESS)
  Process: 25436 ExecStartPre=/usr/libexec/postfix/chroot-update (code=exited, status=0/SUCCESS)
  Process: 25433 ExecStartPre=/usr/libexec/postfix/aliasesdb (code=exited, status=0/SUCCESS)
 Main PID: 25511 (code=killed, signal=TERM)
```

Trying to connect to the smtp server from the tendrl server machine:

```
[root@usm1-server ~]# grep email_smtp_server /etc/tendrl/notifier/email.conf.yaml
email_smtp_server: usm1-client.example.com
[root@usm1-server ~]# telnet usm1-client.example.com 25
Trying 10.37.169.25...
telnet: connect to address 10.37.169.25: Connection refused
```

This is expected, because I stopped the SMTP server myself. I list this
example to show the particular failure I tested with.
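
For reference, the telnet check above can be reproduced from a script. The following is a minimal sketch using only the Python standard library; the function name `smtp_port_reachable` is made up for illustration and is not part of Tendrl:

```python
import socket


def smtp_port_reachable(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it only checks that something accepts
    connections on the port, not that it speaks SMTP.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and name resolution errors.
        return False
```

With the SMTP server stopped as described, a call such as `smtp_port_reachable("usm1-client.example.com")` would be expected to return `False`.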

But there is no error reported about this in tendrl-notifier logs:

```
[root@usm1-server ~]# journalctl -u tendrl-notifier -fe
```

The last log line I see contains no info about the problem:

```
Nov 15 03:11:46 mbukatov-usm1-server.usmqe.lab.eng.brq.redhat.com tendrl-notifier[21602]: ramsRegistering atom namespace.tendrl.objects.Cluster.atoms.ValidImportClusterParamsFinding flows in namespace.tendrl.objects.Cluster.flowsRegistering object namespace.tendrl.objects.AlertFinding atoms in namespace.tendrl.objects.Alert.atomsFinding flows in namespace.tendrl.objects.Alert.flowsRegistering object namespace.tendrl.objects.ClusterAlertFinding atoms in namespace.tendrl.objects.ClusterAlert.atomsFinding flows in namespace.tendrl.objects.ClusterAlert.flowsRegistering object namespace.tendrl.objects.ClusterAlertCountersFinding atoms in namespace.tendrl.objects.ClusterAlertCounters.atomsFinding flows in namespace.tendrl.objects.ClusterAlertCounters.flowsRegistering object namespace.tendrl.objects.ClusterNodeContextFinding atoms in namespace.tendrl.objects.ClusterNodeContext.atomsFinding flows in namespace.tendrl.objects.ClusterNodeContext.flowsRegistering object namespace.tendrl.objects.ClusterTendrlContextFinding atoms in namespace.tendrl.objects.ClusterTendrlContext.atomsFinding flows in namespace.tendrl.objects.ClusterTendrlContext.flowsRegistering object namespace.tendrl.objects.CpuFinding atoms in namespace.tendrl.objects.Cpu.atomsFinding flows in namespace.tendrl.objects.Cpu.flowsRegistering object namespace.tendrl.objects.DefinitionFinding atoms in namespace.tendrl.objects.Definition.atomsFinding flows in namespace.tendrl.objects.Definition.flowsRegistering object namespace.tendrl.objects.DetectedClusterFinding atoms in namespace.tendrl.objects.DetectedCluster.atomsFinding flows in namespace.tendrl.objects.DetectedCluster.flowsRegistering object namespace.tendrl.objects.DiskFinding atoms in namespace.tendrl.objects.Disk.atomsFinding flows in namespace.tendrl.objects.Disk.flowsRegistering object namespace.tendrl.objects.JobFinding atoms in namespace.tendrl.objects.Job.atomsFinding flows in namespace.tendrl.objects.Job.flowsRegistering object 
namespace.tendrl.objects.MemoryFinding atoms in namespace.tendrl.objects.Memory.atomsFinding flows in namespace.tendrl.objects.Memory.flowsRegistering object na
```

Nor do I see the error reported anywhere else.

Expected results
================

Tendrl logs the problem in enough detail that the admin will know what
has happened.

At least tendrl-notifier logs should provide the error:

```
[root@usm1-server ~]# journalctl -u tendrl-notifier -fe
```
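
To illustrate the kind of reporting meant here, below is a minimal sketch of sending a message with Python's standard `smtplib` and logging any delivery failure. This is not the actual tendrl-notifier code: the function name `send_notification`, its parameters and the logger name `notifier.email` are assumptions made for the example.

```python
import logging
import smtplib
from email.message import EmailMessage

# Hypothetical logger name, not taken from tendrl-notifier.
log = logging.getLogger("notifier.email")


def send_notification(smtp_host, sender, recipient, subject, body,
                      port=25, timeout=10.0):
    """Try to send one email; log an error and return False on failure."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    try:
        with smtplib.SMTP(smtp_host, port, timeout=timeout) as smtp:
            smtp.send_message(msg)
    except (smtplib.SMTPException, OSError) as exc:
        # OSError covers connection refused, timeouts and DNS failures;
        # SMTPException covers protocol-level rejections.
        log.error("Failed to deliver email notification to %s via %s:%s: %s",
                  recipient, smtp_host, port, exc)
        return False
    return True
```

With a handler like this, the stopped SMTP server from the reproducer would produce an error entry in the service log (and so in `journalctl` output when the logger writes to stderr or the journal) instead of failing silently.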

Comment 2 Martin Bukatovic 2017-11-15 14:11:41 UTC
Additional Information
======================

Apart from this reporting problem, the notifier seems to operate normally. When I enable SNMPv3 trap messages, it sends them successfully while ignoring the fact that the email messages could not be delivered.

Comment 3 RHEL Program Management 2017-11-15 16:52:53 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.