Bug 1269299 - installing Ceph OSDs on RHEL 7.2 seems to work, but does not create any osds.
Status: CLOSED NOTABUG
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Installer
Version: 1.2.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 1.3.2
Assigned To: Travis Rhoden
QA Contact: ceph-qe-bugs
Depends On:
Blocks:
 
Reported: 2015-10-06 18:42 EDT by Warren
Modified: 2015-10-07 20:13 EDT
CC List: 4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-07 20:13:45 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Sample output from ceph-deploy osd create magna045:sdb (5.93 KB, text/plain)
2015-10-06 18:42 EDT, Warren
ceph-deploy mon create-initial output after ceph-deploy install --release firefly was performed. (3.05 KB, text/plain)
2015-10-06 23:49 EDT, Warren
Result after ceph-deploy mon create-initial (3.05 KB, text/plain)
2015-10-06 23:51 EDT, Warren
ceph-deploy install output without any subscriptions. (7.68 KB, text/plain)
2015-10-07 18:15 EDT, Warren

Description Warren 2015-10-06 18:42:28 EDT
Created attachment 1080440 [details]
Sample output from ceph-deploy osd create magna045:sdb

Description of problem:

ceph-deploy osd create seems to finish cleanly, but there are errors in the results.

Version-Release number of selected component (if applicable):
RHEL 7.2, Ceph 1.2.3


How reproducible:

100% of the time (3 for 3 for me)

Steps to Reproduce:
On two RHEL 7.2 machines:
1. Run ice_setup on one machine.
2. Use ceph-deploy to install Ceph on the other machine.
3. Run the OSD commands:
         a.    ceph-deploy disk zap magnaXXX:sdb
         b.    ceph-deploy osd create magnaXXX:sdb

The command appears to succeed, but the attached output shows problems. The behavior is the same when creating OSDs on sdc and sdd. (A consolidated command sketch follows below.)
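
For reference, a minimal consolidated sketch of the sequence used here (hostnames and the ISO path are placeholders, and the exact ice_setup invocation may differ depending on where the ISO is mounted):

$ sudo ice_setup -d /mnt/rhcs-iso        # admin node; ISO mount path is an assumption
$ ceph-deploy new magnaXXX               # assumed initial step that creates ceph.conf
$ ceph-deploy install magnaXXX
$ ceph-deploy mon create-initial
$ ceph-deploy disk zap magnaXXX:sdb      # step 3a above
$ ceph-deploy osd create magnaXXX:sdb    # step 3b above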

Actual results:

$ sudo ceph osd stat
     osdmap e1: 0 osds: 0 up, 0 in
$ sudo ceph health
     HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds

Expected results:

A healthy ceph should be running

Additional info:
Comment 2 Warren 2015-10-06 18:46:08 EDT
The problems I refer to in the attachment are lines like:

[WARNIN] partx: /dev/sdb: error adding partitions 1-2
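
If that partx warning means the kernel did not re-read the new partition table, something along these lines would confirm it on the OSD host (a sketch, not taken from the report):

$ lsblk /dev/sdb              # are sdb1/sdb2 visible to the kernel?
$ sudo partx -u /dev/sdb      # ask the kernel to update its view of the partitions
$ sudo partprobe /dev/sdb     # alternative way to re-read the partition table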
Comment 3 Warren 2015-10-06 23:49 EDT
Created attachment 1080512 [details]
ceph-deploy mon create-initial output after ceph-deploy install --release firefly was performed.
Comment 4 Warren 2015-10-06 23:51 EDT
Created attachment 1080513 [details]
Result after ceph-deploy mon create-initial
Comment 5 Warren 2015-10-07 00:03:44 EDT
This problem (the one with mon create-initial failing) happened because ceph-mon, for some reason, was not found on the monitor site.  I ran yum install ceph-mon on that site and this command now works.
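
A quick way to confirm the package state on the monitor node before rerunning the command (a sketch; the package name is taken from the comment above):

$ rpm -q ceph-mon             # prints "package ceph-mon is not installed" when missing
$ sudo yum install ceph-mon
$ which ceph-mon              # the binary should now be on PATH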
Comment 6 Warren 2015-10-07 00:07:19 EDT
Hmmm.  ceph-disk is not installed on the ceph cluster site.

I need to run yum install ceph-osd there too.
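
The same kind of check on the OSD node, assuming (per this comment) that ceph-disk ships in the ceph-osd package on this build:

$ which ceph-disk             # missing before the install
$ rpm -q ceph-osd
$ sudo yum install ceph-osd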
Comment 7 Warren 2015-10-07 00:13:31 EDT
After making the changes in the previous comments, ceph-deploy osd create magnaXXX:sdX now finishes.  The output (including the "error adding partitions" messages) looked similar to the output in the first attached file.  I also got the message:

[magna045][WARNIN] there is 1 OSD down
[magna045][WARNIN] there is 1 OSD out

after creating the first osd

[magna045][WARNIN] there are 2 OSDs down
[magna045][WARNIN] there are 2 OSDs out

after creating the second osd, and


[magna045][WARNIN] there are 3 OSDs down
[magna045][WARNIN] there are 3 OSDs out

after the third.

This looked ominous to me.  Sure enough, ceph health shows:


HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean
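
To see why the newly created OSDs stay down, a few standard places to look on the OSD host (osd.0 below is a placeholder id):

$ sudo ceph osd tree                            # which OSD ids exist and whether they are up/in
$ ls /var/lib/ceph/osd/                         # were the OSD data directories created and mounted?
$ sudo tail -n 50 /var/log/ceph/ceph-osd.0.log  # the OSD log usually says why the daemon exited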
Comment 9 shilpa 2015-10-07 04:27:54 EDT
Raising a new BZ against 1.3
Comment 10 Ken Dreyer (Red Hat) 2015-10-07 10:40:34 EDT
(In reply to Warren from comment #3)
> Created attachment 1080512 [details]
> ceph-deploy mon create-initial output after ceph-deploy install --release
> firefly was performed.

Is "--release firefly" in the RHCS install docs anywhere? I suspect that will grab packages from ceph.com.
Comment 11 Warren 2015-10-07 13:28:17 EDT
Note: to clarify something that is not obvious from the comments above:

The first problem that I encountered (OSDs appeared to come up but ceph osd stat proved otherwise) happened when I did ceph-deploy install magnaXXX.

The later comments (comment #3 through comment #7) describe what happened when I tried the --release firefly option to see whether it would work around the problem.
Comment 12 Warren 2015-10-07 18:15 EDT
Created attachment 1080831 [details]
ceph-deploy install output without any subscriptions.
Comment 13 Warren 2015-10-07 18:20:23 EDT
The attachment in comment 12 shows the output of ceph-deploy install when no subscriptions are changed and the ISO is installed via ice_setup.

[magna038][WARNIN] http://magna035.ceph.redhat.com/static/ceph/0.80.8/repodata/repomd.xml: [Errno 14] curl#7 - "Failed connect to magna035.ceph.redhat.com:80; No route to host"

ls /opt/calamari/webapp/content/ceph/0.80.8/repodata shows

403badb7fb44b02aa6b46b0719b345cde3ecca45a93f1a188c7b173a0049b7bf-primary.xml.gz
5f84e6808d08a334da52ddfb24206b7353fefe9363c4d3ef371729b70a6bd687-filelists.xml.gz
a2b491c13826ca6f9a64d0cdf058c7dc076ef7e39696db67522fe9f613990e99-other.sqlite.bz2
aea9774252295df69cf674eb008a794c696610a9a0be4abc5bc77db3f7e1a690-filelists.sqlite.bz2
d71bddc322fc500bfa263430117301c8586cc52ee284c0d144163698c20d84c8-primary.sqlite.bz2
f4a9c4f95e797f6298ff78a6e89e18fd022b2303e874b8d873b4802d5c428fa3-other.xml.gz
repomd.xml
TRANS.TBL


So I am guessing that on magna038, http://magna035.ceph.redhat.com/static should point to /opt/calamari/webapp/content but does not.
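
A quick reachability check for that guess (assuming curl is available on magna038):

$ curl -I http://magna035.ceph.redhat.com/static/ceph/0.80.8/repodata/repomd.xml   # run on magna038
$ ls -l /opt/calamari/webapp/content/ceph/0.80.8/repodata/repomd.xml               # run on magna035 for comparison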
Comment 14 Warren 2015-10-07 18:31:26 EDT
Ugh.  Regarding the previous comment: I had messed up the firewall.
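
"No route to host" against port 80 is commonly a host firewall rejecting the connection rather than a routing problem. A sketch of the fix, assuming firewalld is what was misconfigured on the calamari host:

$ sudo firewall-cmd --state
$ sudo firewall-cmd --permanent --add-service=http
$ sudo firewall-cmd --reload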
Comment 15 Warren 2015-10-07 19:13:32 EDT
Ceph is now up and healthy.  The network has two sites.  One runs Calamari and was the installation site.  The other is a Ceph cluster consisting of one monitor and three OSDs.

For some reason, this site needed to be rebooted in order for ceph to get healthy.
Comment 16 Warren 2015-10-07 20:13:45 EDT
Closing.  The problems here were firewall issues, improper use of the --release parameter, and confusion caused by bug 1269684, which I have now opened.
