Description of problem:
I have 3 OSD nodes, two of which have the entry "osd crush location hook = /usr/bin/calamari-crush-location" in ceph.conf. On these two nodes, running "service ceph start osd.<id>" prints the following error messages, but the daemon starts successfully.

[cephuser@magna086 ceph]$ sudo service ceph start osd.4
=== osd.4 ===
ERROR:calamari_osd_location:Error 2 running ceph config-key get:'libust[24687/24687]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
Error ENOENT: error obtaining 'daemon-private/osd.4/v1/calamari/osd_crush_location': (2) No such file or directory'
ERROR:calamari_osd_location:Error 1 running ceph config-key get:'2015-10-23 09:11:00.642080 7fafec7e4700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2015-10-23 09:11:00.642083 7fafec7e4700  0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound'
libust[24763/24763]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
create-or-move updated item name 'osd.4' weight 0.9 at location {host=magna086} to crush map
Starting Ceph osd.4 on magna086...
Running as unit run-24806.service.

Version-Release number of selected component (if applicable):
0.94.3

How reproducible:
always

Steps to Reproduce:
1. Ensure /etc/ceph/ceph.conf has the entry for "osd crush location hook" on an OSD node
2. Restart an OSD on that node

Actual results:
The ERROR:calamari_osd_location messages shown above are printed during OSD startup.

Expected results:
OSD daemon starts without the above-mentioned error messages

Additional info:
These error messages are legitimate but could be toned down, as the OSD starts fine. The real bug here is why only two of the three OSD nodes are configured with a crush location hook. Lack of the location hook will only affect users who use calamari to alter their CRUSH map and place an OSD under a host node named differently from what $(hostname -s) reports. In that case, if the OSD gets restarted, it might not stick to the location previously specified. I'm suggesting this get fixed in 1.3.2, since it doesn't prevent OSDs from starting or being placed correctly in most cases.
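For context, a crush location hook is simply an executable that Ceph runs at OSD start; whatever the hook prints to stdout (key=value pairs) is used as the daemon's CRUSH location. A minimal illustrative sketch (this is not the calamari hook; the location keys shown are just an example):

```shell
#!/bin/sh
# Minimal crush location hook sketch. Ceph invokes the hook with
# --cluster/--id/--type arguments identifying the starting daemon and
# reads the CRUSH location from its stdout. A hook that always places
# the OSD under the short hostname in the default root could print:
echo "host=$(hostname -s) root=default"
```

The calamari hook does more than this: it looks the location up via "ceph config-key get", which is the call failing in the log above.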
patch upstream: https://github.com/ceph/calamari/pull/435/commits/798625c24d8f8b1710ce03c3824ddef172e2ec35
Gregory, the upstream patch is not merged and the PR is closed. Could you please let us know the correct patch for this bug?
@Manohar, Boris and I discussed this over IRC and came up with the following steps:
1) Stop an OSD daemon
2) Copy /opt/calamari/salt/salt/base/calamari-crush-location.py to /usr/bin/calamari-crush-location
3) In ceph.conf, add the entry "osd crush location hook = /usr/bin/calamari-crush-location"
4) Distribute this conf file to all nodes in the cluster
5) Start the stopped OSD daemon

Expected: no error messages are printed, the OSD starts successfully, and the cluster shows the active+clean state. Please execute the above steps and, if they work fine, move the BZ to verified state.
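The steps above can be sketched as shell commands. The cluster-side commands (stop/start, distribution) require a live cluster and are shown as comments; the osd id is a placeholder, and the conf edit is demonstrated against a scratch copy of ceph.conf rather than the real file:

```shell
# 1) Stop an OSD daemon (placeholder id):
#      sudo service ceph stop osd.<id>
# 2) Install the hook shipped with calamari:
#      sudo cp /opt/calamari/salt/salt/base/calamari-crush-location.py \
#              /usr/bin/calamari-crush-location
#      sudo chmod +x /usr/bin/calamari-crush-location
# 3) Add the hook entry to ceph.conf (demonstrated on a temp copy here):
conf=$(mktemp)
cat >> "$conf" <<'EOF'
[osd]
osd crush location hook = /usr/bin/calamari-crush-location
EOF
# 4) Distribute the conf file to all nodes, then
# 5) start the OSD again and confirm the cluster reports active+clean:
#      sudo service ceph start osd.<id>
#      ceph -s
grep -q 'osd crush location hook' "$conf" && echo "entry present"
rm -f "$conf"
```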
New PR: https://github.com/ceph/calamari/pull/530
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0626