Bug 1275636 - 1.3.1: 'service ceph start osd.<id>' displays error messages on a node which has an entry for osd crush location hook in ceph.conf file
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Calamari
Version: 1.3.1
Hardware: x86_64 Linux
Priority: medium    Severity: medium
Target Milestone: rc
Target Release: 1.3.4
Assigned To: Boris Ranto
QA Contact: ceph-qe-bugs
Docs Contact:
Depends On:
Blocks:
 
Reported: 2015-10-27 08:04 EDT by Harish NV Rao
Modified: 2018-04-03 13:45 EDT
CC: 9 users

See Also:
Fixed In Version: RHEL: calamari-server-1.3.5-1.el7cp Ubuntu: calamari_1.3.5-2redhat1trusty
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-03 13:45:17 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments


External Trackers
Tracker ID                                  Priority  Status  Summary  Last Updated
Github ceph/calamari/pull/530               None      None    None     2018-03-14 12:08 EDT
Red Hat Knowledge Base (Solution) 2449081   None      None    None     2016-07-22 07:23 EDT
Red Hat Product Errata RHBA-2018:0626       None      None    None     2018-04-03 13:45 EDT

Description Harish NV Rao 2015-10-27 08:04:59 EDT
Description of problem:

I have 3 OSD nodes, two of which have the entry "osd crush location hook = /usr/bin/calamari-crush-location" in ceph.conf. On these two nodes, running "service ceph start osd.<id>" prints the following error messages, although the daemon starts successfully.

[cephuser@magna086 ceph]$ sudo service ceph start osd.4
=== osd.4 ===
ERROR:calamari_osd_location:Error 2 running ceph config-key get:'libust[24687/24687]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
Error ENOENT: error obtaining 'daemon-private/osd.4/v1/calamari/osd_crush_location': (2) No such file or directory'
ERROR:calamari_osd_location:Error 1 running ceph config-key get:'2015-10-23 09:11:00.642080 7fafec7e4700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2015-10-23 09:11:00.642083 7fafec7e4700  0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound'
libust[24763/24763]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
create-or-move updated item name 'osd.4' weight 0.9 at location {host=magna086} to crush map
Starting Ceph osd.4 on magna086...
Running as unit run-24806.service.

Version-Release number of selected component (if applicable): 0.94.3


How reproducible:
always

Steps to Reproduce:
1. Ensure /etc/ceph/ceph.conf has the "osd crush location hook" entry on an OSD node
2. Restart an OSD on that node (see the sketch below)
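
For illustration, a minimal sketch of these two steps, assuming the hook script is already installed at /usr/bin/calamari-crush-location; the osd.4 id, the [osd] section placement, and the sysvinit-style service command are placeholders/assumptions:

# 1. ceph.conf on the OSD node carries the hook entry
[osd]
osd crush location hook = /usr/bin/calamari-crush-location

# 2. Restart an OSD on that node
sudo service ceph restart osd.4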

Actual results:
ERROR:calamari_osd_location:Error 2 running ceph config-key get:'libust[24687/24687]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
Error ENOENT: error obtaining 'daemon-private/osd.4/v1/calamari/osd_crush_location': (2) No such file or directory'
ERROR:calamari_osd_location:Error 1 running ceph config-key get:'2015-10-23 09:11:00.642080 7fafec7e4700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2015-10-23 09:11:00.642083 7fafec7e4700  0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound'



Expected results:
The OSD daemon starts without the above error messages.

Additional info:
Comment 2 Gregory Meno 2015-10-27 19:00:11 EDT
These error messages are legitimate but could be toned down as the OSD starts fine.

The real bug here is why only two of the three OSD nodes are configured with a crush location hook.

Lack of the location hook only affects users who alter their CRUSH map with Calamari to place an OSD under a host node whose name differs from what $(hostname -s) reports.

In that case, if the OSD is restarted, it might not stay at the previously specified location.

I'm suggesting this get fixed in 1.3.2 since it doesn't prevent OSDs from starting or getting placed correctly in most instances.
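
For context, a crush location hook is an executable that Ceph runs when an OSD starts; its output (e.g. "host=... root=...") determines where the OSD is placed in the CRUSH map. Below is a rough, hypothetical sketch of the behaviour implied by the log output above (read a location stored by Calamari from the monitor's config-key store, fall back to the local hostname). It is not the shipped calamari-crush-location script, and the argument handling and fallback are assumptions:

#!/bin/sh
# Hypothetical sketch, not the shipped hook. The real hook parses
# --cluster/--id/--type arguments; here the OSD name is taken as $1.
ID="$1"   # e.g. osd.4
if LOC=$(ceph config-key get "daemon-private/${ID}/v1/calamari/osd_crush_location" 2>/dev/null) && [ -n "$LOC" ]; then
    echo "$LOC"
else
    # Default placement when no stored location exists
    echo "host=$(hostname -s) root=default"
fi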
Comment 7 Vikhyat Umrao 2016-07-19 02:16:23 EDT
Gregory, the upstream patch is not merged and the PR is closed. Could you please let us know the correct patch for this bug?
Comment 19 Harish NV Rao 2018-03-14 06:38:36 EDT
@Manohar, Boris and I discussed over IRC and came up with the following steps:

1) Stop an OSD daemon
2) Copy /opt/calamari/salt/salt/base/calamari-crush-location.py to /usr/bin/calamari-crush-location
3) In ceph.conf, add the entry "osd crush location hook = /usr/bin/calamari-crush-location"
4) Distribute this conf file to all nodes in the cluster
5) Start the stopped OSD daemon
Expected: no error messages should be printed, the OSD should start successfully, and the cluster should show the active+clean state.

Please execute the above steps (a command-level sketch follows below); if they work fine, move the BZ to the verified state.
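
For illustration only, the steps might look roughly like this on an OSD node; the osd.4 id, the chmod, the sysvinit-style service commands, and the ceph.conf section placement are assumptions rather than part of the agreed procedure:

# 1) Stop an OSD daemon
sudo service ceph stop osd.4

# 2) Install the hook script shipped with Calamari
sudo cp /opt/calamari/salt/salt/base/calamari-crush-location.py /usr/bin/calamari-crush-location
sudo chmod +x /usr/bin/calamari-crush-location

# 3) Add the hook entry to ceph.conf (placing it under [osd] is an assumption):
#      osd crush location hook = /usr/bin/calamari-crush-location
# 4) Distribute the updated ceph.conf to all nodes in the cluster (e.g. via scp)

# 5) Start the stopped OSD daemon and check cluster state
sudo service ceph start osd.4
ceph -s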
Comment 21 Boris Ranto 2018-03-14 11:23:23 EDT
New PR:

https://github.com/ceph/calamari/pull/530
Comment 27 errata-xmlrpc 2018-04-03 13:45:17 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0626
