Bug 1224877 - calamari-crush-location causes OSD start failure when sudo is not present
Summary: calamari-crush-location causes OSD start failure when sudo is not present
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Calamari
Version: 1.3.0
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 1.3.1
Assignee: Christina Meno
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Duplicates: 1232381 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-05-26 05:40 UTC by Christina Meno
Modified: 2022-07-09 07:34 UTC (History)
6 users (show)

Fixed In Version: calamari-server-1.3-8.el7cp
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-23 20:21:20 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-1493 0 None None None 2021-09-09 11:44:32 UTC
Red Hat Product Errata RHSA-2015:2066 0 normal SHIPPED_LIVE Moderate: Red Hat Ceph Storage 1.3.1 security, bug fix, and enhancement update 2015-11-24 02:34:55 UTC
Red Hat Product Errata RHSA-2015:2512 0 normal SHIPPED_LIVE Moderate: Red Hat Ceph Storage 1.3.1 security, bug fix, and enhancement update 2016-02-03 03:15:52 UTC

Description Christina Meno 2015-05-26 05:40:24 UTC
Description of problem:
In version 1.3, Calamari installs a crush-location hook in ceph.conf.
This script is run during OSD start to determine where in the CRUSH map to place the OSD. The script depends on sudo being available and able to run ceph password-less.

http://tracker.ceph.com/issues/11559
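The traceback below boils down to subprocess.Popen raising OSError (Errno 2) when the first element of the command line, here sudo, does not exist on the host. A minimal reproduction, independent of Ceph (the binary name is a stand-in, not the actual hook's command):

```python
from subprocess import Popen, PIPE

# Minimal reproduction of the failure mode: Popen raises
# OSError: [Errno 2] No such file or directory
# when the executable -- standing in for a missing sudo -- is not present.
try:
    Popen(["no-such-binary-standing-in-for-sudo", "ceph", "osd", "find", "1"],
          stdout=PIPE, stderr=PIPE)
except OSError as e:
    print("Popen failed:", e)
```

The error is raised before ceph is ever executed, which is why the OSD start script then falls through to an invalid `osd crush create-or-move` invocation with empty location args.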



How reproducible:
Always, when sudo is not present on an OSD host

Steps to Reproduce:
1. yum -y remove sudo
2. service ceph restart osd


Actual results:

[root@vpm041 shadow_man]# service ceph restart osd
=== osd.1 === 
=== osd.1 === 
Stopping Ceph osd.1 on vpm041...kill 26929...kill 26929...done
=== osd.1 === 
Traceback (most recent call last):
  File "/usr/bin/calamari-crush-location", line 91, in <module>
    sys.exit(main())
  File "/usr/bin/calamari-crush-location", line 88, in main
    print get_osd_location(args.id)
  File "/usr/bin/calamari-crush-location", line 47, in get_osd_location
    last_location = get_last_crush_location(osd_id)
  File "/usr/bin/calamari-crush-location", line 27, in get_last_crush_location
    proc = Popen(c, stdout=PIPE, stderr=PIPE)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1308, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Invalid command:  saw 0 of args(<string(goodchars [A-Za-z0-9-_.=])>) [<string(goodchars [A-Za-z0-9-_.=])>...], expected at least 1
osd crush create-or-move <osdname (id|osd.id)> <float[0.0-]> <args> [<args>...] : 
 create entry or move existing entry for <name> <weight> at/to location <args>
Error EINVAL: invalid command
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 0.19 '
[root@vpm041 shadow_man]#

Expected results:
[root@vpm041 shadow_man]# service ceph restart osd
=== osd.1 === 
=== osd.1 === 
Stopping Ceph osd.1 on vpm041...kill 28685...kill 28685...done
=== osd.1 === 
ERROR:calamari_osd_location:Failed to get last crush location. Defaulting to current host vpm041
create-or-move updated item name 'osd.1' weight 0.19 at location {host=vpm041} to crush map
Starting Ceph osd.1 on vpm041...
Running as unit run-29316.service.
[root@vpm041 shadow_man]#
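The expected output above shows the fixed behavior: when the hook cannot read the OSD's last CRUSH location, it logs an error and defaults to the current host instead of crashing. A hypothetical sketch of that fallback (function and command names are illustrative, not the actual calamari-crush-location code):

```python
import socket
from subprocess import Popen, PIPE

def get_osd_location(osd_id, ceph_cmd="ceph"):
    """Sketch of the fixed hook: try to read the OSD's last CRUSH
    location; on any OSError (e.g. a missing sudo/ceph binary),
    fall back to the current hostname."""
    try:
        proc = Popen([ceph_cmd, "osd", "find", str(osd_id)],
                     stdout=PIPE, stderr=PIPE)
        out, _ = proc.communicate()
        if proc.returncode == 0:
            return out.decode()
    except OSError:
        # Mirrors the log line:
        # "Failed to get last crush location. Defaulting to current host ..."
        pass
    return "host={0}".format(socket.gethostname())
```

With this shape, a missing sudo degrades to a sane default location rather than an unhandled traceback and a failed OSD start.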

Additional info:

Comment 2 Christina Meno 2015-05-26 05:44:37 UTC
https://github.com/ceph/calamari/pull/291

Comment 3 Christina Meno 2015-06-16 21:47:43 UTC
*** Bug 1232381 has been marked as a duplicate of this bug. ***

Comment 4 Christina Meno 2015-06-17 01:25:21 UTC
Workaround for any customer that encounters this issue: http://lists.ceph.com/pipermail/ceph-calamari-ceph.com/2015-May/000048.html

Comment 6 Ken Dreyer (Red Hat) 2015-08-12 22:10:17 UTC
Gregory, which build should we use for Ubuntu?

Comment 8 Harish NV Rao 2015-10-20 14:37:51 UTC
Hi Gregory,

I am trying to test the fix by following the steps mentioned in the BZ on a RHEL ceph cluster.

After executing "1. yum -y remove sudo 2. service ceph restart osd", the OSD does not restart. I ran these commands on the OSD host.

I am unable to verify the fix with the steps mentioned in the defect.

Am I doing something wrong, or am I missing something?

Can you please clarify?

Regards,
Harish

Logs:
----------------

[cephuser@magna086 ~]$ sudo service ceph status
=== osd.1 ===
osd.1: running {"version":"0.94.3"}
=== osd.5 ===
osd.5: running {"version":"0.94.3"}
=== osd.4 ===
osd.4: running {"version":"0.94.3"}
[cephuser@magna086 ~]$ sudo yum -y remove sudo
Loaded plugins: langpacks, product-id, subscription-manager
Resolving Dependencies
--> Running transaction check
---> Package sudo.x86_64 0:1.8.6p7-13.el7 will be erased
--> Finished Dependency Resolution
ceph-osd                                                                                                                                                                                    | 4.0 kB  00:00:00    
lab-extras                                                                                                                                                                                  |  951 B  00:00:00    
rhel-7-fcgi-ceph                                                                                                                                                                            |  951 B  00:00:00    

Dependencies Resolved

===================================================================================================================================================================================================================
 Package                                      Arch                                           Version                                                   Repository                                             Size
===================================================================================================================================================================================================================
Removing:
 sudo                                         x86_64                                         1.8.6p7-13.el7                                            @anaconda/7.1                                         2.4 M

Transaction Summary
===================================================================================================================================================================================================================
Remove  1 Package

Installed size: 2.4 M
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Erasing    : sudo-1.8.6p7-13.el7.x86_64                                                                                                                                                                      1/1
warning: /etc/sudoers saved as /etc/sudoers.rpmsave
  Verifying  : sudo-1.8.6p7-13.el7.x86_64                                                                                                                                                                      1/1

Removed:
  sudo.x86_64 0:1.8.6p7-13.el7                                                                                                                                                                                    

Complete!
[cephuser@magna086 ~]$ which sudo
/usr/bin/which: no sudo in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/cephuser/.local/bin:/home/cephuser/bin)
[cephuser@magna086 ~]$ #/etc/init.d/ceph restart osd.4
[cephuser@magna086 ~]$ service ceph restart osd.4
=== osd.4 ===
=== osd.4 ===
Stopping Ceph osd.4 on magna086...kill 25320...bash: line 5: kill: (25320) - Operation not permitted
kill 25320...bash: line 5: kill: (25320) - Operation not permitted
[previous line repeated until interrupted]
^C
[cephuser@magna086 ~]$ service ceph status
=== osd.1 ===
osd.1: running failed: '/usr/bin/ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok version 2>/dev/null'
[cephuser@magna086 ~]$ service ceph status
=== osd.1 ===
osd.1: running failed: '/usr/bin/ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok version 2>/dev/null'

Note: No OSD was restarted and ceph health was ok.

Comment 10 Christina Meno 2015-10-21 04:23:48 UTC
Harish, you'll need to be root to successfully restart the OSD.

Use su (you'll need the root password); alternatively you could run sudo su -.
Then remove sudo.

Comment 11 Harish NV Rao 2015-10-21 07:42:58 UTC
Thanks Gregory!

I was able to verify the fix with your suggestion.

[cephuser@magna086 ~]$ sudo service ceph status
=== osd.1 === 
osd.1: running {"version":"0.94.3"}
=== osd.5 === 
osd.5: running {"version":"0.94.3"}
=== osd.4 === 
osd.4: running {"version":"0.94.3"}
[cephuser@magna086 ~]$ su -
Password: 
Last login: Wed Oct 21 03:22:29 EDT 2015 on pts/0
[root@magna086 ~]# yum -y remove sudo
Loaded plugins: langpacks, product-id, subscription-manager
Resolving Dependencies
--> Running transaction check
---> Package sudo.x86_64 0:1.8.6p7-13.el7 will be erased
--> Finished Dependency Resolution
ceph-osd                                                                                                                                                                                    | 4.0 kB  00:00:00     
lab-extras                                                                                                                                                                                  |  951 B  00:00:00     
rhel-7-fcgi-ceph                                                                                                                                                                            |  951 B  00:00:00     
rhel-7-server-rpms/7Server/x86_64                                                                                                                                                           | 3.7 kB  00:00:00     
rhel-7-server-rpms/7Server/x86_64/updateinfo                                                                                                                                                | 643 kB  00:00:01     
rhel-7-server-rpms/7Server/x86_64/primary_db                                                                                                                                                |  14 MB  00:00:06     

Dependencies Resolved

===================================================================================================================================================================================================================
 Package                                     Arch                                          Version                                                Repository                                                  Size
===================================================================================================================================================================================================================
Removing:
 sudo                                        x86_64                                        1.8.6p7-13.el7                                         @rhel-7-server-rpms                                        2.4 M

Transaction Summary
===================================================================================================================================================================================================================
Remove  1 Package

Installed size: 2.4 M
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Erasing    : sudo-1.8.6p7-13.el7.x86_64                                                                                                                                                                      1/1 
rhel-7-server-rpms/7Server/x86_64/productid                                                                                                                                                 | 1.7 kB  00:00:00     
  Verifying  : sudo-1.8.6p7-13.el7.x86_64                                                                                                                                                                      1/1 

Removed:
  sudo.x86_64 0:1.8.6p7-13.el7                                                                                                                                                                                     

Complete!
[root@magna086 ~]# service ceph status
=== osd.1 === 
osd.1: running {"version":"0.94.3"}
=== osd.5 === 
osd.5: running {"version":"0.94.3"}
=== osd.4 === 
osd.4: running {"version":"0.94.3"}
[root@magna086 ~]# service ceph restart osd.4
=== osd.4 === 
=== osd.4 === 
Stopping Ceph osd.4 on magna086...kill 25320...kill 25320...done
=== osd.4 === 
ERROR:calamari_osd_location:Failed to get last crush location. Defaulting to current host magna086
libust[1545/1545]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
create-or-move updated item name 'osd.4' weight 0.9 at location {host=magna086} to crush map
Starting Ceph osd.4 on magna086...
Running as unit run-1588.service.
[root@magna086 ~]# service ceph status
=== osd.1 === 
osd.1: running {"version":"0.94.3"}
=== osd.5 === 
osd.5: running {"version":"0.94.3"}
=== osd.4 === 
osd.4: running {"version":"0.94.3"}
[root@magna086 ~]# exit
logout

Comment 13 errata-xmlrpc 2015-11-23 20:21:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2512

Comment 14 Siddharth Sharma 2015-11-23 21:53:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2066

