Description of problem:
=======================
Executed the command 'salt '*' state.highstate' as root and it failed on all nodes with the error:

          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:
              ----------
              pid:
                  6366
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:

Version-Release number of selected component (if applicable):
============================================================
calamari-clients-1.3-2.el7cp.x86_64
calamari-server-1.3.3-1.el7cp.x86_64

How reproducible:
=================
intermittent

Steps to Reproduce:
===================
1. Installed Ceph on a RHEL 7.2 cluster
2. Connected all nodes to the Calamari server
3. Executed the command 'salt '*' state.highstate' as root

[ubuntu@magna034 ~]$ sudo salt '*' state.highstate
magna052.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:
              ----------
              pid:
                  11055
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:
              ----------
              pid:
                  11058
              retcode:
                  0
              stderr:
              stdout:
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7
magna106.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:
              ----------
              pid:
                  6731
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:
              ----------
              pid:
                  6734
              retcode:
                  0
              stderr:
              stdout:
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7
magna093.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:
              ----------
              pid:
                  26444
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:
              ----------
              pid:
                  26447
              retcode:
                  0
              stderr:
              stdout:
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7
magna111.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:
              ----------
              pid:
                  6366
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:
              ----------
              pid:
                  6369
              retcode:
                  0
              stderr:
              stdout:
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7
magna058.ceph.redhat.com:
----------
          ID: diamond
    Function: pkg.installed
      Result: True
     Comment: Package diamond is already installed
     Changes:
----------
          ID: diamond-config
    Function: file.managed
        Name: /etc/diamond/diamond.conf
      Result: True
     Comment: File /etc/diamond/diamond.conf is in the correct state
     Changes:
----------
          ID: diamond-ceph-config
    Function: file.managed
        Name: /etc/diamond/collectors/CephCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/CephCollector.conf is in the correct state
     Changes:
----------
          ID: diamond-network-config
    Function: file.managed
        Name: /etc/diamond/collectors/NetworkCollector.conf
      Result: True
     Comment: File /etc/diamond/collectors/NetworkCollector.conf is in the correct state
     Changes:
----------
          ID: diamond
    Function: cmd.run
        Name: systemctl restart diamond
      Result: False
     Comment: Command "systemctl restart diamond" run
     Changes:
              ----------
              pid:
                  24968
              retcode:
                  6
              stderr:
                  Failed to restart diamond.service: Unit diamond.service failed to load: No such file or directory.
              stdout:
----------
          ID: distribute-osd-crush-location-script
    Function: file.managed
        Name: /usr/bin/calamari-crush-location
      Result: True
     Comment: File /usr/bin/calamari-crush-location is in the correct state
     Changes:
----------
          ID: change-ceph-conf-to-use-our-location-script
    Function: cmd.run
        Name: find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done
      Result: True
     Comment: Command "find /etc/ceph -name '*.conf' | while read conf; do echo; cp "$conf" "$conf.orig"; echo "modifying $conf"; grep -EH 'osd crush update on start = false|osd crush location hook' "$conf" || sed 's/\[global\]/\[global\]\nosd crush location hook = \/usr\/bin\/calamari-crush-location/' -i "$conf"; done" run
     Changes:
              ----------
              pid:
                  24971
              retcode:
                  0
              stderr:
              stdout:
                  modifying /etc/ceph/ceph.conf
                  /etc/ceph/ceph.conf:osd crush location hook = /usr/bin/calamari-crush-location

Summary
------------
Succeeded: 6
Failed:    1
------------
Total:     7

Actual results:
===============
The command fails on each node with "Failed to restart diamond.service". As a result, the Calamari web UI shows no data for the IOPS or Usage graphs on the dashboard, and no data is shown for any graph in the graph section.

Expected results:
================
The command should not fail.

Workaround:
================
On each node, started the service with 'sudo /etc/init.d/diamond restart' and re-ran 'salt '*' state.highstate'. There were no failures in the output and the graphs started showing data.
I'm building a workaround since diamond 3 doesn't have systemd control files. I had built a diamond 4 package but failed to get it into the product. If you're interested in adding the new diamond packages, they can be found here:
https://chacra.ceph.com/r/calamari/1.3.2/rhel/7/noarch/diamond-4.0.300-0.noarch.rpm
https://chacra.ceph.com/r/calamari/1.3.2/ubuntu/trusty/pool/main/d/diamond/diamond_4.0.300_all.deb
I would recommend the workaround as it's more tested than diamond 4. Fixed upstream here: https://github.com/ceph/calamari/tree/wip-1310829
Gregory, are you coming up with a workaround that is different from what Rachana already mentioned in the description of this bug? Please clarify.
Harish: no, my workaround is just code that does what Rachana said.
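For readers who want to see what "code that does what Rachana said" could look like, here is a minimal shell sketch. It is not the code from the wip-1310829 branch. The function names restart_method and restart_diamond and the INIT_SCRIPT override are hypothetical, and using 'systemctl cat' to test whether the unit can be loaded is an assumption on my part; the only parts taken from this bug are the two restart commands.

```shell
#!/bin/sh
# Sketch only: pick a restart path for diamond on a node.
# INIT_SCRIPT is a hypothetical override; the default path is the one
# used in the workaround from the bug description.
INIT_SCRIPT="${INIT_SCRIPT:-/etc/init.d/diamond}"

# Print which restart mechanism looks usable: systemd, sysv, or none.
restart_method() {
    # Assumption: 'systemctl cat' fails when no unit file can be loaded.
    if command -v systemctl >/dev/null 2>&1 \
        && systemctl cat diamond.service >/dev/null 2>&1; then
        echo systemd
    elif [ -x "$INIT_SCRIPT" ]; then
        echo sysv
    else
        echo none
    fi
}

# Restart diamond through whichever mechanism restart_method reports.
restart_diamond() {
    case "$(restart_method)" in
        systemd) systemctl restart diamond ;;
        sysv)    "$INIT_SCRIPT" restart ;;
        *)       echo "diamond: no unit file or init script found" >&2
                 return 1 ;;
    esac
}
```

Sourcing this on a node and calling restart_diamond as root would take the init script path whenever diamond.service cannot be loaded, which is the manual step from the description. After that, 'salt '*' state.highstate' can be re-run from the Calamari server.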
Docs team, please add this defect to the known issues section of the 1.3.2 release notes. Please see the description of this bz for the text of the workaround.
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please see mmccune with any questions.