Bug 1310829

Summary: [Calamari] - salt '*' state.highstate failed with error - Failed to restart diamond.service
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Rachana Patel <racpatel>
Component: Calamari
Assignee: Christina Meno <gmeno>
Calamari sub component: Back-end
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Status: CLOSED WONTFIX
Docs Contact: Bara Ancincova <bancinco>
Severity: medium
Priority: unspecified
CC: ceph-eng-bugs, flucifre, gmeno, hnallurv, kdreyer, racpatel
Version: 1.3.2   
Target Milestone: rc   
Target Release: 1.3.4   
Hardware: x86_64   
OS: Linux   
Doc Type: Known Issue
Doc Text:
.The "salt '*' state.highstate" command fails to restart the "diamond" service

The `salt '*' state.highstate` command fails to restart the `diamond` service after installation of Red Hat Ceph Storage because the command cannot load the `diamond.service` unit file. As a consequence, the Calamari web UI does not show any data for the graphs in the `IOPS` and `Usage` sections of the Calamari dashboard.

To work around this issue, restart `diamond` on each node by running the following command as `root`:

----
# /etc/init.d/diamond restart
----

Then run `salt '*' state.highstate` again:

----
# salt '*' state.highstate
----
Last Closed: 2018-03-07 23:46:56 UTC
Type: Bug
Bug Blocks: 1299303    

Description Rachana Patel 2016-02-22 17:55:18 UTC
Description of problem:
=======================
Executed the command "salt '*' state.highstate" as root. It failed on all nodes with the following error:

          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:   
              ----------
              pid:
                  6366
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
                  


Version-Release number of selected component (if applicable):
============================================================
calamari-clients-1.3-2.el7cp.x86_64
calamari-server-1.3.3-1.el7cp.x86_64




How reproducible:
=================
intermittent


Steps to Reproduce:
===================
1. Installed Ceph on a RHEL 7.2 cluster.
2. Connected all nodes to the Calamari server.
3. Executed "salt '*' state.highstate" as root.

[ubuntu@magna034 ~]$ sudo salt '*' state.highstate
magna052.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:   
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:   
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:   
              ----------
              pid:
                  11055
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
                  
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:   
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:   
              ----------
              pid:
                  11058
              retcode:
                  0
              stderr:
                  
              stdout:
                  
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7
magna106.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:   
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:   
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:   
              ----------
              pid:
                  6731
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
                  
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:   
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:   
              ----------
              pid:
                  6734
              retcode:
                  0
              stderr:
                  
              stdout:
                  
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7
magna093.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:   
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:   
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:   
              ----------
              pid:
                  26444
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
                  
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:   
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:   
              ----------
              pid:
                  26447
              retcode:
                  0
              stderr:
                  
              stdout:
                  
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7
magna111.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:   
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:   
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:   
              ----------
              pid:
                  6366
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
                  
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:   
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:   
              ----------
              pid:
                  6369
              retcode:
                  0
              stderr:
                  
              stdout:
                  
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7
magna058.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:   
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:   
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:   
              ----------
              pid:
                  24968
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
                  
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:   
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:   
              ----------
              pid:
                  24971
              retcode:
                  0
              stderr:
                  
              stdout:
                  
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7



Actual results:
===============
The command fails on every node with "Failed to restart diamond.service".
As a result, the Calamari web UI shows no data in the IOPS and Usage graphs on the dashboard, and no data appears for any graph in the graph section.


Expected results:
================
The command should complete without failures.


Workaround:
================
On each node, restarted the service with: sudo /etc/init.d/diamond restart
Then re-ran "salt '*' state.highstate". The output showed no failures, and the graphs started showing data.
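Comment 2 explains that diamond 3 ships no systemd unit file, so "systemctl restart diamond" cannot work and the init script has to be used instead. The sketch below shows that fallback in POSIX shell: use systemctl only when a diamond.service unit file exists, otherwise use the init script from the workaround. The function name diamond_restart_cmd is hypothetical and is not taken from the Calamari salt states; the default search directories (/etc/systemd/system and /usr/lib/systemd/system, the usual RHEL 7 locations) are an assumption.

```shell
#!/bin/sh
# Hypothetical helper, not part of Calamari: choose how to restart diamond.
# diamond 3.x ships only a SysV init script, so "systemctl restart diamond"
# fails with "Unit diamond.service failed to load".

# Print the restart command to use. Optional $1: space-separated list of
# directories to search for diamond.service (assumed default: the standard
# systemd unit directories on RHEL 7).
diamond_restart_cmd() {
    unit_dirs="${1:-/etc/systemd/system /usr/lib/systemd/system}"
    for dir in $unit_dirs; do
        if [ -f "$dir/diamond.service" ]; then
            # A unit file is installed, so systemd can manage the service.
            echo "systemctl restart diamond"
            return 0
        fi
    done
    # No unit file found: fall back to the init script from the workaround.
    echo "/etc/init.d/diamond restart"
}

# Show which command this node would use. To apply it, run as root:
#   sh -c "$(diamond_restart_cmd)"
diamond_restart_cmd
```

To apply the manual workaround on every node at once from the Salt master, one option is Salt's cmd.run execution module, for example `salt '*' cmd.run '/etc/init.d/diamond restart'`, followed by `salt '*' state.highstate`.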

Comment 2 Christina Meno 2016-02-22 20:32:44 UTC
I'm building a workaround because diamond 3 doesn't have systemd control
files. I had built a diamond 4 package but failed to get it into the
product.
If you're interested in adding the new diamond package, it can be found here:
https://chacra.ceph.com/r/calamari/1.3.2/rhel/7/noarch/diamond-4.0.300-0.noarch.rpm
https://chacra.ceph.com/r/calamari/1.3.2/ubuntu/trusty/pool/main/d/diamond/diamond_4.0.300_all.deb

Comment 3 Christina Meno 2016-02-22 23:51:10 UTC
I would recommend the workaround, as it is better tested than diamond 4.
Fixed upstream here: https://github.com/ceph/calamari/tree/wip-1310829

Comment 4 Harish NV Rao 2016-02-23 09:24:08 UTC
Gregory, are you coming up with a workaround that is different from what Rachana already mentioned in the description of this bug? Please clarify.

Comment 6 Christina Meno 2016-02-23 22:56:29 UTC
Harish: no, my workaround is just code that does what Rachana described.

Comment 7 Harish NV Rao 2016-02-24 05:34:06 UTC
Docs team, please add this defect to the Known Issues section of the 1.3.2 release notes. See the description of this BZ for the workaround text.

Comment 10 Mike McCune 2016-03-28 22:39:30 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions