Red Hat Bugzilla – Bug 1269299
Installing Ceph OSDs on RHEL 7.2 seems to work, but does not create any OSDs.
Last modified: 2017-12-12 19:24:05 EST
Created attachment 1080440 [details]
Sample output from ceph-deploy osd create magna045:sdb
Description of problem:
ceph-deploy osd create appears to finish cleanly, but the output contains errors and no OSDs are actually created.
Version-Release number of selected component (if applicable):
RHEL 7.2 -- Ceph 1.2.3
How reproducible:
100% of the time (3 for 3 for me)
Steps to Reproduce:
On two RHEL 7.2 machines:
1. run ice_setup on one machine
2. use ceph-deploy to install Ceph on the other machine (see the sketch after this list for the assumed command sequence)
3. run the ceph-deploy OSD commands:
a. ceph-deploy disk zap magnaXXX:sdb
b. ceph-deploy osd create magnaXXX:sdb
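For reference, here is a minimal sketch of the full command sequence behind these steps. It is assumed from the standard ICE/ceph-deploy workflow rather than copied from this bug's logs, and magna045 is just the host name used in the attachment:
# On the installation (Calamari) node, after mounting the ISO:
$ sudo ice_setup
# Define the cluster and install Ceph on the cluster node:
$ ceph-deploy new magna045
$ ceph-deploy install magna045
$ ceph-deploy mon create-initial
# Prepare and activate an OSD on the cluster node's disk:
$ ceph-deploy disk zap magna045:sdb
$ ceph-deploy osd create magna045:sdb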
The command seems to work, but looking at the output (see the attached file), there
appear to be some problems. The behavior is the same when creating OSDs on sdc
and sdd as well:
$ sudo ceph osd stat
osdmap e1: 0 osds: 0 up, 0 in
$ sudo ceph health
HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
Expected results:
A healthy Ceph cluster should be running.
The problems I refer to in the attachment are lines like:
[WARNIN] partx: /dev/sdb: error adding partitions 1-2
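For what it's worth, a common manual follow-up when ceph-deploy prints that partx warning is to force the kernel to re-read the partition table and then re-activate the data partition. This is a hedged suggestion based on ceph-disk's general behavior, not something taken from this bug's logs; run it on the OSD host and adjust the device name:
$ sudo partx -a /dev/sdb              # ask the kernel to pick up the newly created partitions
$ sudo partprobe /dev/sdb             # alternative way to re-read the partition table
$ sudo ceph-disk list                 # confirm the data/journal partitions were created
$ sudo ceph-disk activate /dev/sdb1   # re-activate the OSD data partition if needed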
Created attachment 1080512 [details]
ceph-deploy mon create-initial output after ceph-deploy install --release firefly was performed.
Created attachment 1080513 [details]
Result after ceph-deploy mon create-initial
This problem (the one with mon create-initial failing) happened because ceph-mon, for some reason, was not found on the monitor site. I ran yum install ceph-mon on that site and this command now works.
Hmmm. ceph-disk is not installed on the ceph cluster site.
I need to run yum install ceph-osd there too.
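For reference, the manual installs described in the last two comments amount to something like the following (a sketch; the package names are the ones mentioned above, using the repos configured by ice_setup):
# On the monitor node, so ceph-deploy mon create-initial can find ceph-mon:
$ sudo yum install ceph-mon
# On the OSD node, which needs the ceph-disk tooling that osd create uses:
$ sudo yum install ceph-osd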
After making the changes in the previous comments, ceph-deploy osd create magnaXXX:sdX now finishes. The output (including the "error adding partitions" messages) looked similar to the output in the first file attached. I also got the message:
[magna045][WARNIN] there is 1 OSD down
[magna045][WARNIN] there is 1 OSD out
after creating the first OSD,
[magna045][WARNIN] there are 2 OSDs down
[magna045][WARNIN] there are 2 OSDs out
after creating the second OSD, and
[magna045][WARNIN] there are 3 OSDs down
[magna045][WARNIN] there are 3 OSDs out
after the third.
This looked ominous to me. Sure enough, ceph health shows:
HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean
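A few diagnostics can show why the OSDs are down and out. This is a hedged sketch; the OSD id and log path are the defaults, not values taken from the attachments:
$ ceph osd tree                                   # shows which OSDs exist and which are down
# On the OSD host (firefly-era sysvinit packaging assumed):
$ sudo service ceph status
$ sudo service ceph start osd.0                   # try starting a specific OSD daemon
$ sudo tail -n 50 /var/log/ceph/ceph-osd.0.log    # look for the reason it did not start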
Raising a new BZ against 1.3
(In reply to Warren from comment #3)
> Created attachment 1080512 [details]
> ceph-deploy mon create-initial output after ceph-deploy install --release
> firefly was performed.
Is "--release firefly" in the RHCS install docs anywhere? I suspect that will grab packages from ceph.com.
Note: this was unclear from reading the comments above:
The first problem that I encountered (OSDs appeared to come up, but ceph osd stat proved otherwise) happened when I did ceph-deploy install magnaXXX.
The later comments (comment #3 to comment #7) describe what happened when I tried the --release firefly option to see whether it would work around the problem.
Created attachment 1080831 [details]
ceph-deploy install output without any subscriptions.
The attachment in comment 12 shows the output of ceph-deploy install when no subscriptions are changed and the ISO is installed via ice_setup.
[magna038][WARNIN] http://magna035.ceph.redhat.com/static/ceph/0.80.8/repodata/repomd.xml: [Errno 14] curl#7 - "Failed connect to magna035.ceph.redhat.com:80; No route to host"
ls /opt/calamari/webapp/content/ceph/0.80.8/repodata shows that the repodata files exist locally on the Calamari node.
So I am guessing that on magna038, http://magna035.ceph.redhat.com/static should point to /opt/calamari/webapp/content but does not.
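Before touching the web-server mapping, it is worth checking basic reachability, since "No route to host" is usually a network or firewall problem rather than a missing file. A hedged diagnostic sketch, using the host names from the log above:
# From magna038:
$ curl -v http://magna035.ceph.redhat.com/static/ceph/0.80.8/repodata/repomd.xml
$ ping -c 3 magna035.ceph.redhat.com
# On magna035, check whether port 80 is open at all:
$ sudo firewall-cmd --list-all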
Ugh. Disregard the previous guess about the /static mapping -- the firewall was messed up...
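For the record, a setup like this typically needs the following firewalld openings on RHEL 7 (run on the respective nodes). The ports are the standard Ceph/Calamari defaults, not values taken from this bug's logs:
# On the Calamari/installation node: HTTP for the yum repo and Calamari UI.
$ sudo firewall-cmd --permanent --add-port=80/tcp
# On the monitor node: the Ceph monitor port.
$ sudo firewall-cmd --permanent --add-port=6789/tcp
# On the OSD node(s): the OSD messenger port range.
$ sudo firewall-cmd --permanent --add-port=6800-7300/tcp
$ sudo firewall-cmd --reload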
Ceph is now up and healthy. The network has two sites. One runs Calamari and was the installation site. The other is a Ceph cluster consisting of one mon and 3 OSDs.
For some reason, the cluster site needed to be rebooted in order for Ceph to become healthy.
Closing. The problems here were firewall issues, improper use of the --release parameter, and confusion caused by bug 1269684, which I have now opened.