1343651 – skyring service doesn't clearly mark it's own start or shutdown in the log

Bug 1343651 - skyring service doesn't clearly mark it's own start or shutdown in the log

Summary: skyring service doesn't clearly mark it's own start or shutdown in the log

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Storage Console
Classification:	Red Hat Storage
Component:	core
Sub Component:
Version:	2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	3
Assignee:	Nishanth Thomas
QA Contact:	sds-qe-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2016-06-07 15:11 UTC by Martin Bukatovic
Modified:	2017-03-23 04:04 UTC (History)
CC List:	0 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-03-23 04:04:41 UTC
Embargoed:

Attachments	(Terms of Use)

Description Martin Bukatovic 2016-06-07 15:11:05 UTC

Description of problem
======================

Skyring service doesn't clearly log it's own start or shutdown in the logs so
that it's not always immediately obvious when the service was started, stopped
or restarted.

This could be confusing and would make debugging harder when there are some
issues with the service.

Version-Release
===============

rhscon-core-0.0.21-1.el7scon.x86_64

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHSC 2.0 following the documentation.
2. Make sure skyring service is running.
3. Stop it: systemctl stop skyring
4. Start it: systemctl start skyring
5. Check /var/log/skyring.log file
6. Stop it again: systemctl stop skyring
7. Make sure unix socket file /var/run/.skyring-event exists, in other words:
   create one manually on purpose like this:

~~~
# socat UNIX-LISTEN:/var/run/.skyring-event -
^C
# ls -l /var/run/.skyring-event
# lsof /var/run/.skyring-event
#
~~~

8. Start it again: systemctl start skyring (asking skyring to start while it's
   unix domain socket already exists make it shutdown itself outright after
   the start)
9. Check /var/log/skyring.log file

Teardown: remove leftover socket file: rm /var/run/.skyring-event

Actual results
==============

During step #5, I see this in the skyring log file:

~~~
2016-06-07T14:40:29.753+02:00 WARNING  monitoring.go:276 FetchAggregatedStatsFromGraphite] skyring:22419750-9573-422a-87ea-5094da97e134 - Error Failed to get the instant stat of collectd.dhcp-126-84_lab_eng_brq_redhat_com.disk-vda.disk_ops.write resource of dhcp-126-84_lab_eng_brq_redhat_com
Failed to get the instant stat of collectd.dhcp-126-84_lab_eng_brq_redhat_com.disk-vda1.disk_ops.write resource of dhcp-126-84_lab_eng_brq_redhat_com
Failed to get the instant stat of collectd.dhcp-126-84_lab_eng_brq_redhat_com.disk-vda2.disk_ops.write resource of dhcp-126-84_lab_eng_brq_redhat_com
Failed to get the instant stat of collectd.dhcp-126-84_lab_eng_brq_redhat_com.disk-vdb.disk_ops.write resource of dhcp-126-84_lab_eng_brq_redhat_com
Failed to get the instant stat of collectd.dhcp-126-84_lab_eng_brq_redhat_com.disk-vdc.disk_ops.write resource of dhcp-126-84_lab_eng_brq_redhat_com
2016-06-07T14:45:13.388+02:00 INFO     main.go:252 func·005] Signal:%!(EXTRA syscall.Signal=terminated)
2016-06-07T14:46:44.953+02:00 ERROR    auth.go:197 AddUser] Username: admin already exists
2016-06-07T14:46:44.953+02:00 ERROR    auth.go:41 AddDefaultUser] Unable to create default User:user already exists
2016-06-07T14:46:44.954+02:00 INFO     main.go:214 start] Starting the providers
2016-06-07T14:46:44.955+02:00 DEBUG    provider-conf.go:57 LoadProviderConfig] File Name: ceph.conf
2016-06-07T14:46:44.955+02:00 DEBUG    provider-conf.go:57 LoadProviderConfig] File Name: ceph.dat
2016-06-07T14:46:44.955+02:00 DEBUG    provider-conf.go:57 LoadProviderConfig] File Name: ceph.evt
2016-06-07T14:46:44.955+02:00 DEBUG    provider-conf.go:86 LoadProviderConfig] Collection: [{{ceph bigfin 10.1} [] {ceph-installer  true false}}]
2016-06-07T14:46:44.955+02:00 INFO     skyring.go:98 StartProviders] Config:%!(EXTRA []conf.ProviderInfo=[{{ceph bigfin 10.1} [] {ceph-installer  true false}}])
2016-06-07T14:46:44.955+02:00 DEBUG    skyring.go:101 StartProviders] Config:%!(EXTRA conf.ProviderInfo={{ceph bigfin 10.1} [] {ceph-installer  true false}})
2016-06-07T14:46:44.963+02:00 INFO     main.go:221 start] Starting event listener
2016-06-07T14:46:44.964+02:00 INFO     broadcaster.go:61 func·003] Websocket - start listening on 0.0.0.0 : 8081
2016-06-07T14:46:44.964+02:00 INFO     main.go:151 func·004] Started listening on 0.0.0.0:8080
~~~

While there is a hint that skyring service was stopped:

~~~
2016-06-07T14:45:13.388+02:00 INFO     main.go:252 func·005] Signal:%!(EXTRA syscall.Signal=terminated)
~~~

There is no log entry for the shutdown itself.

Moreover there is no entry for skyring service start. The 1st log entry after
the start is just:

~~~
2016-06-07T14:46:44.953+02:00 ERROR    auth.go:197 AddUser] Username: admin already exists
~~~

During step #9: I see this in the log:

~~~
2016-06-07T16:34:16.585+02:00 INFO     main.go:252 func·005] Signal:%!(EXTRA syscall.Signal=terminated)
2016-06-07T16:35:33.145+02:00 ERROR    auth.go:197 AddUser] Username: admin already exists
2016-06-07T16:35:33.145+02:00 ERROR    auth.go:41 AddDefaultUser] Unable to create default User:user already exists
2016-06-07T16:35:33.148+02:00 INFO     main.go:214 start] Starting the providers
2016-06-07T16:35:33.148+02:00 DEBUG    provider-conf.go:57 LoadProviderConfig] File Name: ceph.conf
2016-06-07T16:35:33.148+02:00 DEBUG    provider-conf.go:57 LoadProviderConfig] File Name: ceph.dat
2016-06-07T16:35:33.148+02:00 DEBUG    provider-conf.go:57 LoadProviderConfig] File Name: ceph.evt
2016-06-07T16:35:33.148+02:00 DEBUG    provider-conf.go:86 LoadProviderConfig] Collection: [{{ceph bigfin 10.1} [] {ceph-installer  true false}}]
2016-06-07T16:35:33.148+02:00 INFO     skyring.go:98 StartProviders] Config:%!(EXTRA []conf.ProviderInfo=[{{ceph bigfin 10.1} [] {ceph-installer  true false}}])
2016-06-07T16:35:33.148+02:00 DEBUG    skyring.go:101 StartProviders] Config:%!(EXTRA conf.ProviderInfo={{ceph bigfin 10.1} [] {ceph-installer  true false}})
2016-06-07T16:35:33.151+02:00 INFO     main.go:221 start] Starting event listener
2016-06-07T16:35:33.152+02:00 CRITICAL listener.go:146 StartListener] listen error:listen unix /var/run/.skyring-event: bind: address already in use
2016-06-07T16:35:43.585+02:00 ERROR    auth.go:197 AddUser] Username: admin already exists
2016-06-07T16:35:43.585+02:00 ERROR    auth.go:41 AddDefaultUser] Unable to create default User:user already exists
2016-06-07T16:35:43.587+02:00 INFO     main.go:214 start] Starting the providers
2016-06-07T16:35:43.587+02:00 DEBUG    provider-conf.go:57 LoadProviderConfig] File Name: ceph.conf
2016-06-07T16:35:43.587+02:00 DEBUG    provider-conf.go:57 LoadProviderConfig] File Name: ceph.dat
2016-06-07T16:35:43.587+02:00 DEBUG    provider-conf.go:57 LoadProviderConfig] File Name: ceph.evt
2016-06-07T16:35:43.587+02:00 DEBUG    provider-conf.go:86 LoadProviderConfig] Collection: [{{ceph bigfin 10.1} [] {ceph-installer  true false}}]
2016-06-07T16:35:43.587+02:00 INFO     skyring.go:98 StartProviders] Config:%!(EXTRA []conf.ProviderInfo=[{{ceph bigfin 10.1} [] {ceph-installer  true false}}])
2016-06-07T16:35:43.587+02:00 DEBUG    skyring.go:101 StartProviders] Config:%!(EXTRA conf.ProviderInfo={{ceph bigfin 10.1} [] {ceph-installer  true false}})
2016-06-07T16:35:43.592+02:00 INFO     main.go:221 start] Starting event listener
2016-06-07T16:35:43.593+02:00 CRITICAL listener.go:146 StartListener] listen error:listen unix /var/run/.skyring-event: bind: address already in use
~~~

Note that 1st line in this log excerpt corresponds with step #6, when we
stopped skyring service for the 2nd time via `systemctl stop`.

Then when skyring was started again, it noticed that it's unix domain socket
already exists and based on this it decided to shutdown. Then systemd noticed
that skyring service failed and tried to start it again.

The problem is that the fact that the service has been started and stopped
two times isn't immediately clear from the log example above, because:

 * there is no log entry for skyring start (again)
 * there is no log entry for skyring shutdown

While there is log entry about the signal handler which was involved in the
shutdown (line 1 in log listing above), there is no log entry just for the
shutdown itself, so that the shutdown is logged only when the skyring service
is stopped via signal handler, but not when skyring shutdowns itself on it's
own.

Expected results
================

There is a clear log entry for skyring service start and shutdown, no matter
what triggered the action.

Comment 1 Nishanth Thomas 2016-06-17 10:26:49 UTC

Moving to 3.0

Comment 3 Nishanth Thomas 2016-06-17 10:28:11 UTC

After discussing with pgm and dev, decided to move this to 3.0

Note You need to log in before you can comment on or make changes to this bug.