Bug 1310829 - [Calamari] - salt '*' state.highstate failed with error - Failed to restart diamond.service
Status: POST
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Calamari
Version: 1.3.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 1.3.4
Assigned To: Gregory Meno
QA Contact: ceph-qe-bugs
Docs Contact: Bara Ancincova
Depends On:
Blocks: 1299303
Reported: 2016-02-22 12:55 EST by Rachana Patel
Modified: 2016-09-29 04:27 EDT

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
.The "salt '*' state.highstate" command fails to restart the "diamond" service
The `salt '*' state.highstate` command fails to restart the `diamond` service after installation of Red Hat Ceph Storage because the command cannot load the `diamond.service` unit file. As a consequence, the Calamari web UI does not show any data for the graphs in the `IOPS` and `Usage` sections of the Calamari dashboard.

To work around this issue, restart `diamond` on each node by running the following command as `root`:

----
# /etc/init.d/diamond restart
----

Then run `salt '*' state.highstate` again:

----
# salt '*' state.highstate
----
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Rachana Patel 2016-02-22 12:55:18 EST
Description of problem:
=======================
Ran the command "salt '*' state.highstate" as root; it failed on every node with the following error:

  ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:   
              ----------
              pid:
                  6366
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
                  


Version-Release number of selected component (if applicable):
============================================================
calamari-clients-1.3-2.el7cp.x86_64
calamari-server-1.3.3-1.el7cp.x86_64




How reproducible:
=================
intermittent


Steps to Reproduce:
===================
1. Installed Ceph on a RHEL 7.2 cluster.
2. Connected all nodes to the Calamari server.
3. Ran the command "salt '*' state.highstate" as root.

[ubuntu@magna034 ~]$ sudo salt '*' state.highstate
magna052.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:   
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:   
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:   
              ----------
              pid:
                  11055
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
                  
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:   
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:   
              ----------
              pid:
                  11058
              retcode:
                  0
              stderr:
                  
              stdout:
                  
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7
magna106.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:   
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:   
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:   
              ----------
              pid:
                  6731
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
                  
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:   
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:   
              ----------
              pid:
                  6734
              retcode:
                  0
              stderr:
                  
              stdout:
                  
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7
magna093.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:   
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:   
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:   
              ----------
              pid:
                  26444
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
                  
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:   
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:   
              ----------
              pid:
                  26447
              retcode:
                  0
              stderr:
                  
              stdout:
                  
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7
magna111.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:   
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:   
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:   
              ----------
              pid:
                  6366
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
                  
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:   
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:   
              ----------
              pid:
                  6369
              retcode:
                  0
              stderr:
                  
              stdout:
                  
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7
magna058.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:   
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:   
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:   
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:   
              ----------
              pid:
                  24968
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
                  
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:   
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:   
              ----------
              pid:
                  24971
              retcode:
                  0
              stderr:
                  
              stdout:
                  
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7



Actual results:
===============
The command fails on every node with the message "Failed to restart diamond.service".
As a result, the Calamari web UI shows no data for the IOPS or Usage graphs on the dashboard, and no data is shown for any graph in the graph section.
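
With several nodes in the output, it can help to list only the minions whose summary reports a failure. The following is a minimal sketch, assuming the default highstate text output shown in this report has been saved to a file; the function name `failed_minions` and the file name `highstate.log` are hypothetical and not part of Salt or Calamari.

```shell
#!/bin/sh
# Print "<minion> <failed-count>" for every minion whose summary shows failures.
# Assumes the default highstate text output: a minion header is a line with no
# spaces that ends in ":", and each summary contains a "Failed: N" line.
failed_minions() {
    awk '
        # Minion header, e.g. "magna052.ceph.redhat.com:"
        /^[^ ]+:$/ { minion = substr($0, 1, length($0) - 1); next }
        # Summary line, e.g. "Failed:    1"
        /^Failed:/ && $2 > 0 { print minion, $2 }
    ' "$1"
}
```

For example, after saving the output with `sudo salt '*' state.highstate > highstate.log`, running `failed_minions highstate.log` prints one line per minion that reported at least one failed state.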


Expected results:
================
The command should not fail.


workaround:
================
On each node, restarted the service with the command: sudo /etc/init.d/diamond restart
Then re-ran "salt '*' state.highstate". The output showed no failures, and the graphs started showing data.
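
The workaround above can be sketched as a small helper that chooses a restart command per node. This is only an illustration under stated assumptions, not the upstream fix: the function name `diamond_restart_cmd` and the list of unit directories are hypothetical choices, and the function prints the command instead of running it so that it can be reviewed first.

```shell
#!/bin/sh
# Print the command to restart diamond on this node.
# Uses systemd only if a diamond.service unit file is present; otherwise falls
# back to the SysV init script used in the workaround.
# $1: space-separated unit directories to search (assumed defaults below)
# $2: path to the init script
diamond_restart_cmd() {
    unit_dirs=${1:-"/etc/systemd/system /usr/lib/systemd/system"}
    init_script=${2:-/etc/init.d/diamond}
    for dir in $unit_dirs; do
        if [ -f "$dir/diamond.service" ]; then
            echo "systemctl restart diamond"
            return 0
        fi
    done
    if [ -x "$init_script" ]; then
        echo "$init_script restart"
        return 0
    fi
    echo "no diamond unit file or init script found" >&2
    return 1
}
```

On a node affected by this bug, the function is expected to print `/etc/init.d/diamond restart`, which can then be run as root before re-running the highstate.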
Comment 2 Gregory Meno 2016-02-22 15:32:44 EST
I'm building a workaround since diamond 3 doesn't have systemd control
files. I had built a diamond 4 package but failed to get it into the
product.
If you're interested in adding the new diamond packages, they can be found here:
https://chacra.ceph.com/r/calamari/1.3.2/rhel/7/noarch/diamond-4.0.300-0.noarch.rpm
https://chacra.ceph.com/r/calamari/1.3.2/ubuntu/trusty/pool/main/d/diamond/diamond_4.0.300_all.deb
Comment 3 Gregory Meno 2016-02-22 18:51:10 EST
I would recommend the workaround as it's more tested than diamond 4.
Fixed upstream here: https://github.com/ceph/calamari/tree/wip-1310829
Comment 4 Harish NV Rao 2016-02-23 04:24:08 EST
Gregory, are you coming up with a workaround that is different from what Rachana already mentioned in the description of this bug? Please clarify.
Comment 6 Gregory Meno 2016-02-23 17:56:29 EST
Harish: no, my workaround is just code that does what Rachana said.
Comment 7 Harish NV Rao 2016-02-24 00:34:06 EST
Docs team, please add this defect to the known issues section of the 1.3.2 release notes. Please see the description of this BZ for the workaround text.
Comment 10 Mike McCune 2016-03-28 18:39:30 EDT
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please contact mmccune@redhat.com with any questions.
