
Bug 1275636

Summary: 1.3.1: 'service ceph start osd.<id>' displays error messages on a node which has an entry for osd crush location hook in ceph.conf file
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Harish NV Rao <hnallurv>
Component: Calamari
Calamari sub component: Back-end
Assignee: Boris Ranto <branto>
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: branto, ceph-eng-bugs, flucifre, gmeno, kdreyer, mhackett, mmurthy, rcernin, vumrao
Version: 1.3.1
Target Milestone: rc
Target Release: 1.3.4
Hardware: x86_64
OS: Linux
Fixed In Version: RHEL: calamari-server-1.3.5-1.el7cp; Ubuntu: calamari_1.3.5-2redhat1trusty
Doc Type: Bug Fix
Last Closed: 2018-04-03 17:45:17 UTC
Type: Bug

Description Harish NV Rao 2015-10-27 12:04:59 UTC
Description of problem:

I have 3 OSD nodes, two of which have the entry "osd crush location hook = /usr/bin/calamari-crush-location" in their ceph.conf. On these two nodes, if I run "service ceph start osd.<id>", I get the following error messages, although the daemon starts successfully.
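
For reference, a minimal sketch of the ceph.conf fragment in question (the [osd] section placement is assumed; only the hook line itself is taken from the entry quoted above):

[osd]
osd crush location hook = /usr/bin/calamari-crush-location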

[cephuser@magna086 ceph]$ sudo service ceph start osd.4
=== osd.4 ===
ERROR:calamari_osd_location:Error 2 running ceph config-key get:'libust[24687/24687]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
Error ENOENT: error obtaining 'daemon-private/osd.4/v1/calamari/osd_crush_location': (2) No such file or directory'
ERROR:calamari_osd_location:Error 1 running ceph config-key get:'2015-10-23 09:11:00.642080 7fafec7e4700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2015-10-23 09:11:00.642083 7fafec7e4700  0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound'
libust[24763/24763]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
create-or-move updated item name 'osd.4' weight 0.9 at location {host=magna086} to crush map
Starting Ceph osd.4 on magna086...
Running as unit run-24806.service.
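
The call the hook is making can be roughly reproduced by hand (a sketch; the key name is taken from the log above, and the exact options the hook passes are not shown here):

sudo ceph config-key get daemon-private/osd.4/v1/calamari/osd_crush_location

With a valid admin keyring this still returns Error ENOENT when the key was never set, which appears to match the first error above; without a keyring it fails to connect to the cluster, matching the second.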

Version-Release number of selected component (if applicable): 0.94.3


How reproducible:
always

Steps to Reproduce:
1. Ensure /etc/ceph/ceph.conf has the entry for "osd crush location hook" on an OSD node
2. Restart an OSD on that node (see the sketch below)
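
A command-level sketch of the two steps above (the OSD id and the stop/start sequence are examples):

grep "osd crush location hook" /etc/ceph/ceph.conf
sudo service ceph stop osd.4
sudo service ceph start osd.4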

Actual results:
ERROR:calamari_osd_location:Error 2 running ceph config-key get:'libust[24687/24687]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
Error ENOENT: error obtaining 'daemon-private/osd.4/v1/calamari/osd_crush_location': (2) No such file or directory'
ERROR:calamari_osd_location:Error 1 running ceph config-key get:'2015-10-23 09:11:00.642080 7fafec7e4700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2015-10-23 09:11:00.642083 7fafec7e4700  0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound'



Expected results:
The OSD daemon starts without the above-mentioned error messages.

Additional info:

Comment 2 Christina Meno 2015-10-27 23:00:11 UTC
These error messages are legitimate but could be toned down as the OSD starts fine.

The real bug here is why only two of the three OSD nodes are configured with a crush location hook.

Lack of the location hook will only affect users who use Calamari to alter their CRUSH map and place an OSD under a host bucket whose name differs from what $(hostname -s) reports.

In that case, if the OSD were restarted, it might not stick to the previously specified location.
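
To make that concrete, a sketch of the scenario (the custom bucket name is hypothetical; the second command mirrors the create-or-move call visible in the log above):

# placement done through Calamari under a bucket that does not match $(hostname -s):
ceph osd crush create-or-move osd.4 0.9 host=calamari-host-a
# on restart without the hook, the init script's default placement pulls it back:
ceph osd crush create-or-move osd.4 0.9 host=magna086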

I'm suggesting this get fixed in 1.3.2 since it doesn't prevent OSDs from starting or getting placed correctly in most instances.

Comment 7 Vikhyat Umrao 2016-07-19 06:16:23 UTC
Gregory, the upstream patch has not been merged and it is closed. Could you please let us know the correct patch for this bug?

Comment 19 Harish NV Rao 2018-03-14 10:38:36 UTC
@Manohar, Boris and I discussed this over IRC and came up with the following steps:

1) Stop an OSD daemon
2) Copy /opt/calamari/salt/salt/base/calamari-crush-location.py to /usr/bin/calamari-crush-location
3) In ceph.conf, add the entry "osd crush location hook = /usr/bin/calamari-crush-location"
4) Distribute this conf file to all nodes in the cluster
5) Start the stopped OSD daemon
Expected: no error messages should be printed, the OSD should start successfully, and the cluster should show the active+clean state.

Please execute the above steps and, if they work fine, move the BZ to the VERIFIED state.
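
For reference, a command-level sketch of those steps (the OSD id is an example and the chmod step is an assumption; everything else follows the numbered steps above):

sudo service ceph stop osd.4
sudo cp /opt/calamari/salt/salt/base/calamari-crush-location.py /usr/bin/calamari-crush-location
sudo chmod +x /usr/bin/calamari-crush-location
# add to /etc/ceph/ceph.conf and distribute it to every node in the cluster:
#   osd crush location hook = /usr/bin/calamari-crush-location
sudo service ceph start osd.4
ceph -s   # expect active+clean and no hook errors during start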

Comment 21 Boris Ranto 2018-03-14 15:23:23 UTC
New PR:

https://github.com/ceph/calamari/pull/530

Comment 27 errata-xmlrpc 2018-04-03 17:45:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0626