Bug 1269299 - Installing Ceph OSDs on RHEL 7.2 seems to work, but does not create any OSDs
Summary: Installing Ceph OSDs on RHEL 7.2 seems to work, but does not create any OSDs
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Installer
Version: 1.2.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 1.3.2
Assignee: Travis Rhoden
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-10-06 22:42 UTC by Warren
Modified: 2022-02-21 18:19 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-08 00:13:45 UTC
Embargoed:


Attachments
Sample output from ceph-deploy osd create magna045:sdb (5.93 KB, text/plain)
2015-10-06 22:42 UTC, Warren
ceph-deploy mon create-initial output after ceph-deploy install --release firefly was performed. (3.05 KB, text/plain)
2015-10-07 03:49 UTC, Warren
Result after ceph-deploy mon create-initial (3.05 KB, text/plain)
2015-10-07 03:51 UTC, Warren
ceph-deploy install output without any subscriptions. (7.68 KB, text/plain)
2015-10-07 22:15 UTC, Warren


Links
Red Hat Issue Tracker RHCEPH-3348 (last updated 2022-02-21 18:19:23 UTC)

Description Warren 2015-10-06 22:42:28 UTC
Created attachment 1080440 [details]
Sample output from ceph-deploy osd create magna045:sdb

Description of problem:

ceph-deploy osd create seems to finish cleanly, but there are errors in the results.

Version-Release number of selected component (if applicable):
RHEL 7.2, Red Hat Ceph Storage 1.2.3


How reproducible:

100% of the time (3 for 3 for me)

Steps to Reproduce:
On two RHEL 7.2 machines:
1. Run ice_setup on one machine.
2. Use ceph-deploy to install Ceph on the other machine.
3. Create the OSDs with:
         a.    ceph-deploy disk zap magnaXXX:sdb
         b.    ceph-deploy osd create magnaXXX:sdb

The command seems to work, but looking at the output (see the attached file), there
appear to be some problems.  The behavior is the same when creating OSDs on sdc
and sdd as well.

Actual results:

$ sudo ceph osd stat
     osdmap e1: 0 osds: 0 up, 0 in
$ sudo ceph health
     HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds

Expected results:

A healthy Ceph cluster should be running, with the newly created OSDs up and in.

Additional info:
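
For reference, a minimal set of checks (not part of the original report) that can confirm whether any OSDs actually registered after ceph-deploy osd create; the magnaXXX hostnames above are placeholders:

$ sudo ceph osd stat       # on the monitor node; expect "N osds: N up, N in" once OSDs exist
$ sudo ceph-disk list      # on the OSD host; shows whether sdb/sdc/sdd were prepared and activated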

Comment 2 Warren 2015-10-06 22:46:08 UTC
The problems I am referring to in the attachment are lines like:

[WARNIN] partx: /dev/sdb: error adding partitions 1-2

Comment 3 Warren 2015-10-07 03:49:49 UTC
Created attachment 1080512 [details]
ceph-deploy mon create-initial output after ceph-deploy install --release firefly was performed.

Comment 4 Warren 2015-10-07 03:51:17 UTC
Created attachment 1080513 [details]
Result after ceph-deploy mon create-initial

Comment 5 Warren 2015-10-07 04:03:44 UTC
This problem (the one with mon create-initial failing) happened because ceph-mon, for some reason, was not found on the monitor site.  I ran yum install ceph-mon on that site and this command now works.

Comment 6 Warren 2015-10-07 04:07:19 UTC
Hmmm.  ceph-disk is not installed on the ceph cluster site.

I need to run yum install ceph-osd there too.
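
As a sanity check (a sketch based on the packages mentioned in comments 5 and 6, not a prescribed procedure), the relevant packages can be verified on each node before rerunning the mon/osd steps:

$ rpm -q ceph ceph-common ceph-mon ceph-osd   # prints "package ... is not installed" for anything missing
$ sudo yum install ceph-mon                   # on the monitor node, if missing (comment 5)
$ sudo yum install ceph-osd                   # on the OSD node, if missing; per comment 6 this also provides ceph-disk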

Comment 7 Warren 2015-10-07 04:13:31 UTC
After making the changes in the previous comments, ceph-deploy osd create magnaXXX:sdX now finishes.  The output (including the "error adding partitions" messages) looked similar to the output in the first attached file.  I also got the message:

[magna045][WARNIN] there is 1 OSD down
[magna045][WARNIN] there is 1 OSD out

after creating the first OSD,

[magna045][WARNIN] there are 2 OSDs down
[magna045][WARNIN] there are 2 OSDs out

after creating the second OSD, and


[magna045][WARNIN] there are 3 OSDs down
[magna045][WARNIN] there are 3 OSDs out

after the third.

This looked ominous to me.  Sure enough, ceph health shows:


HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean
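
A possible next diagnostic step (not taken from the report): check which hosts the down OSDs map to and inspect the daemon logs on the OSD host; osd.0 below is only an example ID:

$ sudo ceph osd tree                             # lists each OSD, its host, and whether it is up or down
$ sudo tail -n 50 /var/log/ceph/ceph-osd.0.log   # OSD daemon log on the OSD host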

Comment 9 shilpa 2015-10-07 08:27:54 UTC
Raising a new BZ against 1.3

Comment 10 Ken Dreyer (Red Hat) 2015-10-07 14:40:34 UTC
(In reply to Warren from comment #3)
> Created attachment 1080512 [details]
> ceph-deploy mon create-initial output after ceph-deploy install --release
> firefly was performed.

Is "--release firefly" in the RHCS install docs anywhere? I suspect that will grab packages from ceph.com.

Comment 11 Warren 2015-10-07 17:28:17 UTC
Note: this may not be clear from the comments above:

The first problem that I encountered (OSDs appeared to come up, but ceph osd stat proved otherwise) happened when I did ceph-deploy install magnaXXX.

The later comments (comment #3 through comment #7) describe what happened when I tried the --release firefly option to see whether it would work around the problem.

Comment 12 Warren 2015-10-07 22:15:31 UTC
Created attachment 1080831 [details]
ceph-deploy install output without any subscriptions.

Comment 13 Warren 2015-10-07 22:20:23 UTC
The attachment in Comment 12 shows the output of ceph-deploy install when no subscriptions have been set up and the ISO is installed via ice_setup.

[magna038][WARNIN] http://magna035.ceph.redhat.com/static/ceph/0.80.8/repodata/repomd.xml: [Errno 14] curl#7 - "Failed connect to magna035.ceph.redhat.com:80; No route to host"

ls /opt/calamari/webapp/content/ceph/0.80.8/repodata shows

403badb7fb44b02aa6b46b0719b345cde3ecca45a93f1a188c7b173a0049b7bf-primary.xml.gz
5f84e6808d08a334da52ddfb24206b7353fefe9363c4d3ef371729b70a6bd687-filelists.xml.gz
a2b491c13826ca6f9a64d0cdf058c7dc076ef7e39696db67522fe9f613990e99-other.sqlite.bz2
aea9774252295df69cf674eb008a794c696610a9a0be4abc5bc77db3f7e1a690-filelists.sqlite.bz2
d71bddc322fc500bfa263430117301c8586cc52ee284c0d144163698c20d84c8-primary.sqlite.bz2
f4a9c4f95e797f6298ff78a6e89e18fd022b2303e874b8d873b4802d5c428fa3-other.xml.gz
repomd.xml
TRANS.TBL


So I am guessing that on magna038, http://magna035.ceph.redhat.com/static should point to /opt/calamari/webapp/content but does not.
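
One quick way to test that theory from the client side (a suggestion, not something recorded in this report) is to fetch the repomd.xml directly with curl, using the same URL that yum reports:

$ curl -I http://magna035.ceph.redhat.com/static/ceph/0.80.8/repodata/repomd.xml

A "No route to host" failure here points at connectivity (for example a firewall), while an HTTP 404 would point at the /static path mapping on magna035.  As comment 14 notes, the firewall turned out to be the culprit.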

Comment 14 Warren 2015-10-07 22:31:26 UTC
Ugh.  The problem in the previous comment was that I had messed up the firewall...
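
The exact firewall fix is not recorded in this report; on RHEL 7 with firewalld, allowing the HTTP traffic that serves the Calamari-hosted repo would look roughly like the following sketch, run on the repo/Calamari host:

$ sudo firewall-cmd --permanent --add-service=http   # open port 80 for the repo served by magna035
$ sudo firewall-cmd --reload
$ sudo firewall-cmd --list-all                       # verify the http service is now listed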

Comment 15 Warren 2015-10-07 23:13:32 UTC
Ceph is now up and healthy.  The network has two sites.  One runs Calamari and was the installation site.  The other is a Ceph cluster consisting of one mon and 3 OSDs.

For some reason, that site needed to be rebooted in order for Ceph to become healthy.

Comment 16 Warren 2015-10-08 00:13:45 UTC
Closing.  The problems here were firewall issues, improper use of the --release parameter, and confusion caused by bug 1269684, which I have now opened.

