Description of problem:
In version 1.3, Calamari installs a crush-location hook in ceph.conf. This script is run during OSD start to determine where in the CRUSH map to place the OSD. The script depends on sudo being available and able to run ceph password-less.

http://tracker.ceph.com/issues/11559

How reproducible:
Requires sudo to not be present on an OSD host.

Steps to Reproduce:
1. yum -y remove sudo
2. service ceph restart osd

Actual results:
[root@vpm041 shadow_man]# service ceph restart osd
=== osd.1 ===
=== osd.1 ===
Stopping Ceph osd.1 on vpm041...kill 26929...kill 26929...done
=== osd.1 ===
Traceback (most recent call last):
  File "/usr/bin/calamari-crush-location", line 91, in <module>
    sys.exit(main())
  File "/usr/bin/calamari-crush-location", line 88, in main
    print get_osd_location(args.id)
  File "/usr/bin/calamari-crush-location", line 47, in get_osd_location
    last_location = get_last_crush_location(osd_id)
  File "/usr/bin/calamari-crush-location", line 27, in get_last_crush_location
    proc = Popen(c, stdout=PIPE, stderr=PIPE)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1308, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Invalid command: saw 0 of args(<string(goodchars [A-Za-z0-9-_.=])>) [<string(goodchars [A-Za-z0-9-_.=])>...], expected at least 1
osd crush create-or-move <osdname (id|osd.id)> <float[0.0-]> <args> [<args>...] : create entry or move existing entry for <name> <weight> at/to location <args>
Error EINVAL: invalid command
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 0.19 '
[root@vpm041 shadow_man]#

Expected results:
[root@vpm041 shadow_man]# service ceph restart osd
=== osd.1 ===
=== osd.1 ===
Stopping Ceph osd.1 on vpm041...kill 28685...kill 28685...done
=== osd.1 ===
ERROR:calamari_osd_location:Failed to get last crush location. Defaulting to current host vpm041
create-or-move updated item name 'osd.1' weight 0.19 at location {host=vpm041} to crush map
Starting Ceph osd.1 on vpm041... Running as unit run-29316.service.
[root@vpm041 shadow_man]#

Additional info:
https://github.com/ceph/calamari/pull/291
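The expected behavior above suggests the shape of the fix: the hook should catch the OSError that Popen raises when sudo (or ceph) cannot be executed, log the error, and fall back to the current host instead of crashing. Below is a minimal Python sketch of that pattern, not the actual /usr/bin/calamari-crush-location source; the function names, the `ceph_cmd` parameter, and the `osd find` call are illustrative assumptions.

```python
import logging
import socket
from subprocess import Popen, PIPE

log = logging.getLogger("calamari_osd_location")


def get_last_crush_location(osd_id, ceph_cmd=("sudo", "ceph")):
    # Hypothetical sketch: the real hook shells out through sudo, so a
    # missing /usr/bin/sudo makes Popen raise OSError (Errno 2).
    cmd = list(ceph_cmd) + ["osd", "find", str(osd_id)]
    proc = Popen(cmd, stdout=PIPE, stderr=PIPE)
    out, _ = proc.communicate()
    return out.decode("utf-8", "replace").strip() or None


def get_osd_location(osd_id, ceph_cmd=("sudo", "ceph")):
    try:
        last = get_last_crush_location(osd_id, ceph_cmd)
    except OSError:
        # sudo (or ceph) could not be executed; fall back to the current
        # host so 'osd crush create-or-move' still gets a valid location.
        log.error("Failed to get last crush location. "
                  "Defaulting to current host %s", socket.gethostname())
        last = None
    return last or "host=%s" % socket.gethostname()
```

With sudo absent, a call like get_osd_location(1) would log the error shown in the expected results and return "host=<hostname>" rather than raising the traceback from the actual results.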
*** Bug 1232381 has been marked as a duplicate of this bug. ***
Workaround for any customer that encounters this issue:
http://lists.ceph.com/pipermail/ceph-calamari-ceph.com/2015-May/000048.html
Gregory, which build should we use for Ubuntu?
Hi Gregory,

I am trying to test the fix by following the steps mentioned in the BZ on a RHEL Ceph cluster. After executing "1. yum -y remove sudo" and "2. service ceph restart osd" on the OSD host, the OSD restart is not happening, so I am unable to verify the fix with the steps mentioned in the defect. Is there something I am doing wrong, or am I missing something? Can you please clarify?

Regards,
Harish

Logs:
----------------
[cephuser@magna086 ~]$ sudo service ceph status
=== osd.1 ===
osd.1: running {"version":"0.94.3"}
=== osd.5 ===
osd.5: running {"version":"0.94.3"}
=== osd.4 ===
osd.4: running {"version":"0.94.3"}
[cephuser@magna086 ~]$ sudo yum -y remove sudo
Loaded plugins: langpacks, product-id, subscription-manager
Resolving Dependencies
--> Running transaction check
---> Package sudo.x86_64 0:1.8.6p7-13.el7 will be erased
--> Finished Dependency Resolution
Dependencies Resolved
Removing:
 sudo  x86_64  1.8.6p7-13.el7  @anaconda/7.1  2.4 M
Transaction Summary
Remove  1 Package
Installed size: 2.4 M
Running transaction
  Erasing    : sudo-1.8.6p7-13.el7.x86_64    1/1
warning: /etc/sudoers saved as /etc/sudoers.rpmsave
  Verifying  : sudo-1.8.6p7-13.el7.x86_64    1/1
Removed:
  sudo.x86_64 0:1.8.6p7-13.el7
Complete!
[cephuser@magna086 ~]$ which sudo
/usr/bin/which: no sudo in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/cephuser/.local/bin:/home/cephuser/bin)
[cephuser@magna086 ~]$ #/etc/init.d/ceph restart osd.4
[cephuser@magna086 ~]$ service ceph restart osd.4
=== osd.4 ===
=== osd.4 ===
Stopping Ceph osd.4 on magna086...kill 25320...bash: line 5: kill: (25320) - Operation not permitted
(the "kill 25320...bash: line 5: kill: (25320) - Operation not permitted" line repeats until interrupted)
^C
[cephuser@magna086 ~]$ service ceph status
=== osd.1 ===
osd.1: running
failed: '/usr/bin/ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok version 2>/dev/null'

Note: No OSD was restarted and ceph health was OK.
Harish, you'll need to be root to successfully run the OSD restart:

  su -    # you'll need the root password

Alternatively, you could 'sudo su -' first and then remove sudo.
Thanks Gregory! I was able to verify the fix with your suggestion.

[cephuser@magna086 ~]$ sudo service ceph status
=== osd.1 ===
osd.1: running {"version":"0.94.3"}
=== osd.5 ===
osd.5: running {"version":"0.94.3"}
=== osd.4 ===
osd.4: running {"version":"0.94.3"}
[cephuser@magna086 ~]$ su -
Password:
Last login: Wed Oct 21 03:22:29 EDT 2015 on pts/0
[root@magna086 ~]# yum -y remove sudo
Loaded plugins: langpacks, product-id, subscription-manager
Resolving Dependencies
--> Running transaction check
---> Package sudo.x86_64 0:1.8.6p7-13.el7 will be erased
--> Finished Dependency Resolution
Dependencies Resolved
Removing:
 sudo  x86_64  1.8.6p7-13.el7  @rhel-7-server-rpms  2.4 M
Transaction Summary
Remove  1 Package
Installed size: 2.4 M
Running transaction
  Erasing    : sudo-1.8.6p7-13.el7.x86_64    1/1
  Verifying  : sudo-1.8.6p7-13.el7.x86_64    1/1
Removed:
  sudo.x86_64 0:1.8.6p7-13.el7
Complete!
[root@magna086 ~]# service ceph status
=== osd.1 ===
osd.1: running {"version":"0.94.3"}
=== osd.5 ===
osd.5: running {"version":"0.94.3"}
=== osd.4 ===
osd.4: running {"version":"0.94.3"}
[root@magna086 ~]# service ceph restart osd.4
=== osd.4 ===
=== osd.4 ===
Stopping Ceph osd.4 on magna086...kill 25320...kill 25320...done
=== osd.4 ===
ERROR:calamari_osd_location:Failed to get last crush location. Defaulting to current host magna086
libust[1545/1545]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
create-or-move updated item name 'osd.4' weight 0.9 at location {host=magna086} to crush map
Starting Ceph osd.4 on magna086... Running as unit run-1588.service.
[root@magna086 ~]# service ceph status
=== osd.1 ===
osd.1: running {"version":"0.94.3"}
=== osd.5 ===
osd.5: running {"version":"0.94.3"}
=== osd.4 ===
osd.4: running {"version":"0.94.3"}
[root@magna086 ~]# exit
logout
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2015:2512
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2015:2066