Bug 1224877 - calamari-crush-location causes OSD start failure when sudo is not present
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Calamari (Show other bugs)
Version: 1.3.0
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 1.3.1
Assigned To: Gregory Meno
QA Contact: ceph-qe-bugs
Duplicates: 1232381 (view as bug list)
Depends On:
Blocks:
Reported: 2015-05-26 01:40 EDT by Gregory Meno
Modified: 2018-07-23 01:51 EDT (History)
6 users (show)

See Also:
Fixed In Version: calamari-server-1.3-8.el7cp
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-23 15:21:20 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:2066 normal SHIPPED_LIVE Moderate: Red Hat Ceph Storage 1.3.1 security, bug fix, and enhancement update 2015-11-23 21:34:55 EST
Red Hat Product Errata RHSA-2015:2512 normal SHIPPED_LIVE Moderate: Red Hat Ceph Storage 1.3.1 security, bug fix, and enhancement update 2016-02-02 22:15:52 EST

Description Gregory Meno 2015-05-26 01:40:24 EDT
Description of problem:
In version 1.3, Calamari installs a crush-location hook in ceph.conf.
This script runs during OSD start to determine where in the CRUSH map to place the OSD. The script assumes that sudo is available and able to run ceph password-less.

http://tracker.ceph.com/issues/11559
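The underlying problem is that the hook unconditionally builds its ceph command with a sudo prefix, so Popen crashes with ENOENT on hosts that have no sudo binary. A minimal sketch of the idea behind the fix, written for Python 3 (the actual script runs under Python 2, where distutils.spawn.find_executable would play the role of shutil.which; build_ceph_command is an illustrative name, not the calamari code):

```python
import shutil

def build_ceph_command(args):
    """Build a ceph command line, prefixing sudo only when it is installed.

    Popen(["sudo", ...]) on a host with no sudo binary raises
    OSError: [Errno 2] No such file or directory -- the failure seen in
    this bug's traceback. Checking first avoids the crash and lets the
    hook fall back to running ceph directly (e.g. when already root).
    """
    cmd = ["ceph"] + list(args)
    if shutil.which("sudo"):  # only prepend sudo when it actually exists
        cmd = ["sudo"] + cmd
    return cmd

# The kind of invocation the hook makes to look up an OSD's CRUSH location:
print(build_ceph_command(["osd", "find", "1"]))
```

Whether the prefix is added depends on the host, so callers should treat both shapes of the command list as valid.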



How reproducible:
Always, on any OSD host where sudo is not installed.

Steps to Reproduce:
1. yum -y remove sudo
2. service ceph restart osd


Actual results:

[root@vpm041 shadow_man]# service ceph restart osd
=== osd.1 === 
=== osd.1 === 
Stopping Ceph osd.1 on vpm041...kill 26929...kill 26929...done
=== osd.1 === 
Traceback (most recent call last):
  File "/usr/bin/calamari-crush-location", line 91, in <module>
    sys.exit(main())
  File "/usr/bin/calamari-crush-location", line 88, in main
    print get_osd_location(args.id)
  File "/usr/bin/calamari-crush-location", line 47, in get_osd_location
    last_location = get_last_crush_location(osd_id)
  File "/usr/bin/calamari-crush-location", line 27, in get_last_crush_location
    proc = Popen(c, stdout=PIPE, stderr=PIPE)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1308, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Invalid command:  saw 0 of args(<string(goodchars [A-Za-z0-9-_.=])>) [<string(goodchars [A-Za-z0-9-_.=])>...], expected at least 1
osd crush create-or-move <osdname (id|osd.id)> <float[0.0-]> <args> [<args>...] : 
 create entry or move existing entry for <name> <weight> at/to location <args>
Error EINVAL: invalid command
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 0.19 '
[root@vpm041 shadow_man]#

Expected results:
[root@vpm041 shadow_man]# service ceph restart osd
=== osd.1 === 
=== osd.1 === 
Stopping Ceph osd.1 on vpm041...kill 28685...kill 28685...done
=== osd.1 === 
ERROR:calamari_osd_location:Failed to get last crush location. Defaulting to current host vpm041
create-or-move updated item name 'osd.1' weight 0.19 at location {host=vpm041} to crush map
Starting Ceph osd.1 on vpm041...
Running as unit run-29316.service.
[root@vpm041 shadow_man]#

Additional info:
Comment 2 Gregory Meno 2015-05-26 01:44:37 EDT
https://github.com/ceph/calamari/pull/291
Comment 3 Gregory Meno 2015-06-16 17:47:43 EDT
*** Bug 1232381 has been marked as a duplicate of this bug. ***
Comment 4 Gregory Meno 2015-06-16 21:25:21 EDT
Workaround for any customer that encounters this issue: http://lists.ceph.com/pipermail/ceph-calamari-ceph.com/2015-May/000048.html
Comment 6 Ken Dreyer (Red Hat) 2015-08-12 18:10:17 EDT
Gregory, which build should we use for Ubuntu?
Comment 8 Harish NV Rao 2015-10-20 10:37:51 EDT
Hi Gregory,

I am trying to test the fix by following the steps mentioned in the BZ on a RHEL Ceph cluster.

After executing "1. yum -y remove sudo 2. service ceph restart osd" on the OSD host, the OSD does not restart.

I am unable to verify the fix with the steps mentioned in the defect.

Is there something wrong with what I am doing, or am I missing something?

Can you please clarify?

Regards,
Harish

Logs:
----------------

[cephuser@magna086 ~]$ sudo service ceph status
=== osd.1 ===
osd.1: running {"version":"0.94.3"}
=== osd.5 ===
osd.5: running {"version":"0.94.3"}
=== osd.4 ===
osd.4: running {"version":"0.94.3"}
[cephuser@magna086 ~]$ sudo yum -y remove sudo
Loaded plugins: langpacks, product-id, subscription-manager
Resolving Dependencies
--> Running transaction check
---> Package sudo.x86_64 0:1.8.6p7-13.el7 will be erased
--> Finished Dependency Resolution
ceph-osd                                                                                                                                                                                    | 4.0 kB  00:00:00    
lab-extras                                                                                                                                                                                  |  951 B  00:00:00    
rhel-7-fcgi-ceph                                                                                                                                                                            |  951 B  00:00:00    

Dependencies Resolved

===================================================================================================================================================================================================================
 Package                                      Arch                                           Version                                                   Repository                                             Size
===================================================================================================================================================================================================================
Removing:
 sudo                                         x86_64                                         1.8.6p7-13.el7                                            @anaconda/7.1                                         2.4 M

Transaction Summary
===================================================================================================================================================================================================================
Remove  1 Package

Installed size: 2.4 M
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Erasing    : sudo-1.8.6p7-13.el7.x86_64                                                                                                                                                                      1/1
warning: /etc/sudoers saved as /etc/sudoers.rpmsave
  Verifying  : sudo-1.8.6p7-13.el7.x86_64                                                                                                                                                                      1/1

Removed:
  sudo.x86_64 0:1.8.6p7-13.el7                                                                                                                                                                                    

Complete!
[cephuser@magna086 ~]$ which sudo
/usr/bin/which: no sudo in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/cephuser/.local/bin:/home/cephuser/bin)
[cephuser@magna086 ~]$ #/etc/init.d/ceph restart osd.4
[cephuser@magna086 ~]$ service ceph restart osd.4
=== osd.4 ===
=== osd.4 ===
Stopping Ceph osd.4 on magna086...kill 25320...bash: line 5: kill: (25320) - Operation not permitted
(the previous line repeated until interrupted)
^C
[cephuser@magna086 ~]$ service ceph status
=== osd.1 ===
osd.1: running failed: '/usr/bin/ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok version 2>/dev/null'
[cephuser@magna086 ~]$ service ceph status
=== osd.1 ===
osd.1: running failed: '/usr/bin/ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok version 2>/dev/null'

Note: No OSD was restarted and ceph health was ok.
Comment 10 Gregory Meno 2015-10-21 00:23:48 EDT
Harish, you'll need to be root to successfully restart the OSD.

Use su (you'll need the root password), or alternatively sudo su -,
and then remove sudo.
Comment 11 Harish NV Rao 2015-10-21 03:42:58 EDT
Thanks Gregory!

I was able to verify the fix with your suggestion.

[cephuser@magna086 ~]$ sudo service ceph status
=== osd.1 === 
osd.1: running {"version":"0.94.3"}
=== osd.5 === 
osd.5: running {"version":"0.94.3"}
=== osd.4 === 
osd.4: running {"version":"0.94.3"}
[cephuser@magna086 ~]$ su -
Password: 
Last login: Wed Oct 21 03:22:29 EDT 2015 on pts/0
[root@magna086 ~]# yum -y remove sudo
Loaded plugins: langpacks, product-id, subscription-manager
Resolving Dependencies
--> Running transaction check
---> Package sudo.x86_64 0:1.8.6p7-13.el7 will be erased
--> Finished Dependency Resolution
ceph-osd                                                                                                                                                                                    | 4.0 kB  00:00:00     
lab-extras                                                                                                                                                                                  |  951 B  00:00:00     
rhel-7-fcgi-ceph                                                                                                                                                                            |  951 B  00:00:00     
rhel-7-server-rpms/7Server/x86_64                                                                                                                                                           | 3.7 kB  00:00:00     
rhel-7-server-rpms/7Server/x86_64/updateinfo                                                                                                                                                | 643 kB  00:00:01     
rhel-7-server-rpms/7Server/x86_64/primary_db                                                                                                                                                |  14 MB  00:00:06     

Dependencies Resolved

===================================================================================================================================================================================================================
 Package                                     Arch                                          Version                                                Repository                                                  Size
===================================================================================================================================================================================================================
Removing:
 sudo                                        x86_64                                        1.8.6p7-13.el7                                         @rhel-7-server-rpms                                        2.4 M

Transaction Summary
===================================================================================================================================================================================================================
Remove  1 Package

Installed size: 2.4 M
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Erasing    : sudo-1.8.6p7-13.el7.x86_64                                                                                                                                                                      1/1 
rhel-7-server-rpms/7Server/x86_64/productid                                                                                                                                                 | 1.7 kB  00:00:00     
  Verifying  : sudo-1.8.6p7-13.el7.x86_64                                                                                                                                                                      1/1 

Removed:
  sudo.x86_64 0:1.8.6p7-13.el7                                                                                                                                                                                     

Complete!
[root@magna086 ~]# service ceph status
=== osd.1 === 
osd.1: running {"version":"0.94.3"}
=== osd.5 === 
osd.5: running {"version":"0.94.3"}
=== osd.4 === 
osd.4: running {"version":"0.94.3"}
[root@magna086 ~]# service ceph restart osd.4
=== osd.4 === 
=== osd.4 === 
Stopping Ceph osd.4 on magna086...kill 25320...kill 25320...done
=== osd.4 === 
ERROR:calamari_osd_location:Failed to get last crush location. Defaulting to current host magna086
libust[1545/1545]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
create-or-move updated item name 'osd.4' weight 0.9 at location {host=magna086} to crush map
Starting Ceph osd.4 on magna086...
Running as unit run-1588.service.
[root@magna086 ~]# service ceph status
=== osd.1 === 
osd.1: running {"version":"0.94.3"}
=== osd.5 === 
osd.5: running {"version":"0.94.3"}
=== osd.4 === 
osd.4: running {"version":"0.94.3"}
[root@magna086 ~]# exit
logout
Comment 13 errata-xmlrpc 2015-11-23 15:21:20 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2512
Comment 14 Siddharth Sharma 2015-11-23 16:53:21 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2066
