Bug 1275636 - 1.3.1: 'service ceph start osd.<id>' displays error messages on a node which has an entry for osd crush location hook in ceph.conf file
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Calamari
Version: 1.3.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 1.3.4
Assignee: Boris Ranto
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-10-27 12:04 UTC by Harish NV Rao
Modified: 2021-06-10 11:02 UTC
CC List: 9 users

Fixed In Version: RHEL: calamari-server-1.3.5-1.el7cp Ubuntu: calamari_1.3.5-2redhat1trusty
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-03 17:45:17 UTC
Embargoed:




Links:
Github ceph calamari pull 530 (last updated 2018-03-14 16:08:50 UTC)
Red Hat Knowledge Base (Solution) 2449081 (last updated 2016-07-22 11:23:27 UTC)
Red Hat Product Errata RHBA-2018:0626 (last updated 2018-04-03 17:45:35 UTC)

Description Harish NV Rao 2015-10-27 12:04:59 UTC
Description of problem:

I have 3 OSD nodes, two of which have the entry "osd crush location hook = /usr/bin/calamari-crush-location" in ceph.conf. On these two nodes, if I run "service ceph start osd.<id>", I get the following error messages, although the daemon starts successfully.
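For reference, a ceph.conf fragment carrying this entry might look like the sketch below (placing it under [osd] is an assumption on my part; the actual conf files from the affected nodes are not attached here):

[osd]
# Executable that reports the daemon's CRUSH location when the OSD starts;
# Calamari points it at its own helper script.
osd crush location hook = /usr/bin/calamari-crush-location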

[cephuser@magna086 ceph]$ sudo service ceph start osd.4
=== osd.4 ===
ERROR:calamari_osd_location:Error 2 running ceph config-key get:'libust[24687/24687]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
Error ENOENT: error obtaining 'daemon-private/osd.4/v1/calamari/osd_crush_location': (2) No such file or directory'
ERROR:calamari_osd_location:Error 1 running ceph config-key get:'2015-10-23 09:11:00.642080 7fafec7e4700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2015-10-23 09:11:00.642083 7fafec7e4700  0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound'
libust[24763/24763]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
create-or-move updated item name 'osd.4' weight 0.9 at location {host=magna086} to crush map
Starting Ceph osd.4 on magna086...
Running as unit run-24806.service.

Version-Release number of selected component (if applicable): 0.94.3


How reproducible:
always

Steps to Reproduce:
1. Ensure /etc/ceph/ceph.conf has the entry for "osd crush location hook" on an OSD node
2. Restart an OSD on that node (a shell sketch follows below)
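As a shell sketch, assuming osd.4 on magna086 from the output above (any OSD id on a node that carries the hook entry should do):

# Confirm the hook entry is present on the OSD node
grep "osd crush location hook" /etc/ceph/ceph.conf

# Restart an OSD on that node and watch the output for the
# calamari_osd_location error messages
sudo service ceph restart osd.4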

Actual results:
ERROR:calamari_osd_location:Error 2 running ceph config-key get:'libust[24687/24687]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
Error ENOENT: error obtaining 'daemon-private/osd.4/v1/calamari/osd_crush_location': (2) No such file or directory'
ERROR:calamari_osd_location:Error 1 running ceph config-key get:'2015-10-23 09:11:00.642080 7fafec7e4700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2015-10-23 09:11:00.642083 7fafec7e4700  0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound'



Expected results:
The OSD daemon starts without the above-mentioned error messages.

Additional info:

Comment 2 Christina Meno 2015-10-27 23:00:11 UTC
These error messages are legitimate but could be toned down, since the OSD starts fine.

The real bug here is why only two of the three OSD nodes are configured with a crush location hook.

Lack of the location hook will only affect users who use Calamari to alter their CRUSH map and place an OSD under a host bucket whose name differs from what $(hostname -s) reports.

In that case, if the OSD got restarted, it might not stay in the previously specified location.
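To illustrate the mechanism: the value of "osd crush location hook" names an executable that the init script runs before "ceph osd crush create-or-move"; per the Ceph hook convention it is passed --cluster, --id and --type arguments and must print the OSD's CRUSH location as key=value pairs on a single line of stdout. A minimal illustrative hook, not the actual Calamari script, could look like this (the rack value is a made-up placeholder):

#!/bin/sh
# Toy crush location hook: ignore the --cluster/--id/--type arguments
# and report a fixed location based on the short hostname.
# "rack1" is a placeholder; the real Calamari hook instead looks up a
# previously stored location via "ceph config-key get", as seen in the
# error messages above.
echo "host=$(hostname -s) rack=rack1 root=default"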

I'm suggesting this get fixed in 1.3.2 since it doesn't prevent OSDs from starting or getting placed correctly in most instances.

Comment 7 Vikhyat Umrao 2016-07-19 06:16:23 UTC
Gregory, the upstream patch was not merged and has been closed. Could you please let us know the correct patch for this bug?

Comment 19 Harish NV Rao 2018-03-14 10:38:36 UTC
@Manohar, Boris and I discussed this over IRC and came up with the following steps:

1) Stop an OSD daemon
2) Copy /opt/calamari/salt/salt/base/calamari-crush-location.py to /usr/bin/calamari-crush-location
3) In ceph.conf, add the entry "osd crush location hook = /usr/bin/calamari-crush-location"
4) Distribute this conf file to all nodes in the cluster
5) Start the stopped OSD daemon
Expected: no error messages should be printed, the OSD should start successfully, and the cluster should show the active+clean state (a shell sketch of these steps follows below).
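A shell sketch of the steps above, assuming osd.4 and the paths from this BZ (the chmod, the [osd] placement of the conf entry, and the node list in the scp loop are my assumptions/placeholders):

# 1) Stop an OSD daemon
sudo service ceph stop osd.4

# 2) Install the Calamari hook script under /usr/bin (chmod is assumed,
#    since the hook has to be executable)
sudo cp /opt/calamari/salt/salt/base/calamari-crush-location.py /usr/bin/calamari-crush-location
sudo chmod +x /usr/bin/calamari-crush-location

# 3) Add the entry to /etc/ceph/ceph.conf, e.g. under [osd]:
#      osd crush location hook = /usr/bin/calamari-crush-location

# 4) Distribute the conf file to all nodes in the cluster (node names
#    below are placeholders)
for node in magna086 magna087 magna088; do
    scp /etc/ceph/ceph.conf root@$node:/etc/ceph/ceph.conf
done

# 5) Start the stopped OSD; expect no calamari_osd_location errors and
#    an active+clean cluster
sudo service ceph start osd.4
sudo ceph -s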

Please execute the above steps and, if they work fine, move the BZ to the VERIFIED state.

Comment 21 Boris Ranto 2018-03-14 15:23:23 UTC
New PR:

https://github.com/ceph/calamari/pull/530

Comment 27 errata-xmlrpc 2018-04-03 17:45:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0626

