Description of problem ====================== Skyring service doesn't clearly log it's own start or shutdown in the logs so that it's not always immediately obvious when the service was started, stopped or restarted. This could be confusing and would make debugging harder when there are some issues with the service. Version-Release =============== rhscon-core-0.0.21-1.el7scon.x86_64 How reproducible ================ 100 % Steps to Reproduce ================== 1. Install RHSC 2.0 following the documentation. 2. Make sure skyring service is running. 3. Stop it: systemctl stop skyring 4. Start it: systemctl start skyring 5. Check /var/log/skyring.log file 6. Stop it again: systemctl stop skyring 7. Make sure unix socket file /var/run/.skyring-event exists, in other words: create one manually on purpose like this: ~~~ # socat UNIX-LISTEN:/var/run/.skyring-event - ^C # ls -l /var/run/.skyring-event # lsof /var/run/.skyring-event # ~~~ 8. Start it again: systemctl start skyring (asking skyring to start while it's unix domain socket already exists make it shutdown itself outright after the start) 9. Check /var/log/skyring.log file Teardown: remove leftover socket file: rm /var/run/.skyring-event Actual results ============== During step #5, I see this in the skyring log file: ~~~ 2016-06-07T14:40:29.753+02:00 WARNING monitoring.go:276 FetchAggregatedStatsFromGraphite] skyring:22419750-9573-422a-87ea-5094da97e134 - Error Failed to get the instant stat of collectd.dhcp-126-84_lab_eng_brq_redhat_com.disk-vda.disk_ops.write resource of dhcp-126-84_lab_eng_brq_redhat_com Failed to get the instant stat of collectd.dhcp-126-84_lab_eng_brq_redhat_com.disk-vda1.disk_ops.write resource of dhcp-126-84_lab_eng_brq_redhat_com Failed to get the instant stat of collectd.dhcp-126-84_lab_eng_brq_redhat_com.disk-vda2.disk_ops.write resource of dhcp-126-84_lab_eng_brq_redhat_com Failed to get the instant stat of collectd.dhcp-126-84_lab_eng_brq_redhat_com.disk-vdb.disk_ops.write resource of dhcp-126-84_lab_eng_brq_redhat_com Failed to get the instant stat of collectd.dhcp-126-84_lab_eng_brq_redhat_com.disk-vdc.disk_ops.write resource of dhcp-126-84_lab_eng_brq_redhat_com 2016-06-07T14:45:13.388+02:00 INFO main.go:252 func·005] Signal:%!(EXTRA syscall.Signal=terminated) 2016-06-07T14:46:44.953+02:00 ERROR auth.go:197 AddUser] Username: admin already exists 2016-06-07T14:46:44.953+02:00 ERROR auth.go:41 AddDefaultUser] Unable to create default User:user already exists 2016-06-07T14:46:44.954+02:00 INFO main.go:214 start] Starting the providers 2016-06-07T14:46:44.955+02:00 DEBUG provider-conf.go:57 LoadProviderConfig] File Name: ceph.conf 2016-06-07T14:46:44.955+02:00 DEBUG provider-conf.go:57 LoadProviderConfig] File Name: ceph.dat 2016-06-07T14:46:44.955+02:00 DEBUG provider-conf.go:57 LoadProviderConfig] File Name: ceph.evt 2016-06-07T14:46:44.955+02:00 DEBUG provider-conf.go:86 LoadProviderConfig] Collection: [{{ceph bigfin 10.1} [] {ceph-installer true false}}] 2016-06-07T14:46:44.955+02:00 INFO skyring.go:98 StartProviders] Config:%!(EXTRA []conf.ProviderInfo=[{{ceph bigfin 10.1} [] {ceph-installer true false}}]) 2016-06-07T14:46:44.955+02:00 DEBUG skyring.go:101 StartProviders] Config:%!(EXTRA conf.ProviderInfo={{ceph bigfin 10.1} [] {ceph-installer true false}}) 2016-06-07T14:46:44.963+02:00 INFO main.go:221 start] Starting event listener 2016-06-07T14:46:44.964+02:00 INFO broadcaster.go:61 func·003] Websocket - start listening on 0.0.0.0 : 8081 2016-06-07T14:46:44.964+02:00 INFO main.go:151 func·004] Started listening on 0.0.0.0:8080 ~~~ While there is a hint that skyring service was stopped: ~~~ 2016-06-07T14:45:13.388+02:00 INFO main.go:252 func·005] Signal:%!(EXTRA syscall.Signal=terminated) ~~~ There is no log entry for the shutdown itself. Moreover there is no entry for skyring service start. The 1st log entry after the start is just: ~~~ 2016-06-07T14:46:44.953+02:00 ERROR auth.go:197 AddUser] Username: admin already exists ~~~ During step #9: I see this in the log: ~~~ 2016-06-07T16:34:16.585+02:00 INFO main.go:252 func·005] Signal:%!(EXTRA syscall.Signal=terminated) 2016-06-07T16:35:33.145+02:00 ERROR auth.go:197 AddUser] Username: admin already exists 2016-06-07T16:35:33.145+02:00 ERROR auth.go:41 AddDefaultUser] Unable to create default User:user already exists 2016-06-07T16:35:33.148+02:00 INFO main.go:214 start] Starting the providers 2016-06-07T16:35:33.148+02:00 DEBUG provider-conf.go:57 LoadProviderConfig] File Name: ceph.conf 2016-06-07T16:35:33.148+02:00 DEBUG provider-conf.go:57 LoadProviderConfig] File Name: ceph.dat 2016-06-07T16:35:33.148+02:00 DEBUG provider-conf.go:57 LoadProviderConfig] File Name: ceph.evt 2016-06-07T16:35:33.148+02:00 DEBUG provider-conf.go:86 LoadProviderConfig] Collection: [{{ceph bigfin 10.1} [] {ceph-installer true false}}] 2016-06-07T16:35:33.148+02:00 INFO skyring.go:98 StartProviders] Config:%!(EXTRA []conf.ProviderInfo=[{{ceph bigfin 10.1} [] {ceph-installer true false}}]) 2016-06-07T16:35:33.148+02:00 DEBUG skyring.go:101 StartProviders] Config:%!(EXTRA conf.ProviderInfo={{ceph bigfin 10.1} [] {ceph-installer true false}}) 2016-06-07T16:35:33.151+02:00 INFO main.go:221 start] Starting event listener 2016-06-07T16:35:33.152+02:00 CRITICAL listener.go:146 StartListener] listen error:listen unix /var/run/.skyring-event: bind: address already in use 2016-06-07T16:35:43.585+02:00 ERROR auth.go:197 AddUser] Username: admin already exists 2016-06-07T16:35:43.585+02:00 ERROR auth.go:41 AddDefaultUser] Unable to create default User:user already exists 2016-06-07T16:35:43.587+02:00 INFO main.go:214 start] Starting the providers 2016-06-07T16:35:43.587+02:00 DEBUG provider-conf.go:57 LoadProviderConfig] File Name: ceph.conf 2016-06-07T16:35:43.587+02:00 DEBUG provider-conf.go:57 LoadProviderConfig] File Name: ceph.dat 2016-06-07T16:35:43.587+02:00 DEBUG provider-conf.go:57 LoadProviderConfig] File Name: ceph.evt 2016-06-07T16:35:43.587+02:00 DEBUG provider-conf.go:86 LoadProviderConfig] Collection: [{{ceph bigfin 10.1} [] {ceph-installer true false}}] 2016-06-07T16:35:43.587+02:00 INFO skyring.go:98 StartProviders] Config:%!(EXTRA []conf.ProviderInfo=[{{ceph bigfin 10.1} [] {ceph-installer true false}}]) 2016-06-07T16:35:43.587+02:00 DEBUG skyring.go:101 StartProviders] Config:%!(EXTRA conf.ProviderInfo={{ceph bigfin 10.1} [] {ceph-installer true false}}) 2016-06-07T16:35:43.592+02:00 INFO main.go:221 start] Starting event listener 2016-06-07T16:35:43.593+02:00 CRITICAL listener.go:146 StartListener] listen error:listen unix /var/run/.skyring-event: bind: address already in use ~~~ Note that 1st line in this log excerpt corresponds with step #6, when we stopped skyring service for the 2nd time via `systemctl stop`. Then when skyring was started again, it noticed that it's unix domain socket already exists and based on this it decided to shutdown. Then systemd noticed that skyring service failed and tried to start it again. The problem is that the fact that the service has been started and stopped two times isn't immediately clear from the log example above, because: * there is no log entry for skyring start (again) * there is no log entry for skyring shutdown While there is log entry about the signal handler which was involved in the shutdown (line 1 in log listing above), there is no log entry just for the shutdown itself, so that the shutdown is logged only when the skyring service is stopped via signal handler, but not when skyring shutdowns itself on it's own. Expected results ================ There is a clear log entry for skyring service start and shutdown, no matter what triggered the action.
Moving to 3.0
After discussing with pgm and dev, decided to move this to 3.0