Created attachment 1080440 [details]
Sample output from ceph-deploy osd create magna045:sdb

Description of problem:
ceph-deploy osd create appears to finish cleanly, but the results contain errors.

Version-Release number of selected component (if applicable):
RHEL 7.2 -- Ceph 1.2.3

How reproducible:
100% of the time (3 for 3 for me)

Steps to Reproduce:
On two RHEL 7.2 machines:
1. Run ice_setup on one machine.
2. Use ceph-deploy to install Ceph on the other machine.
3. Run the OSD commands:
   a. ceph-deploy disk zap magnaXXX:sdb
   b. ceph-deploy osd create magnaXXX:sdb

The commands appear to succeed, but the output (see attached file) shows some problems. The behavior is the same when creating OSDs on sdc and sdd as well.

Actual results:
$ sudo ceph osd stat
osdmap e1: 0 osds: 0 up, 0 in
$ sudo ceph health
HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds

Expected results:
A healthy Ceph cluster should be running.

Additional info:
The problems I refer to in the attachment are lines like:

[WARNIN] partx: /dev/sdb: error adding partitions 1-2
Created attachment 1080512 [details]
ceph-deploy mon create-initial output after ceph-deploy install --release firefly was performed.
Created attachment 1080513 [details]
Result after ceph-deploy mon create-initial
This problem (the one with mon create-initial failing) happened because ceph-mon, for some reason, was not installed on the monitor host. After I ran yum install ceph-mon there, the command now works.
Hmmm. ceph-disk is not installed on the Ceph cluster host either. I need to run yum install ceph-osd there too.
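For the record, the missing-package fixes from the last two comments amount to the following (a sketch only; assumes yum-based RHEL hosts with the Ceph repos already configured, as in this setup):

```shell
# On the monitor host: install the mon daemon (was missing, per the comment above)
yum install -y ceph-mon

# On the OSD/cluster host: install the OSD package, which provides ceph-disk
yum install -y ceph-osd
```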
After making the changes in the previous comments, ceph-deploy osd create magnaXXX:sdX now finishes. The output (including the "error adding partitions" messages) looked similar to the output in the first attached file. I also got the messages:

[magna045][WARNIN] there is 1 OSD down
[magna045][WARNIN] there is 1 OSD out

after creating the first OSD,

[magna045][WARNIN] there are 2 OSDs down
[magna045][WARNIN] there are 2 OSDs out

after creating the second, and

[magna045][WARNIN] there are 3 OSDs down
[magna045][WARNIN] there are 3 OSDs out

after the third. This looked ominous to me. Sure enough, ceph health shows:

HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean
Raising a new BZ against 1.3
(In reply to Warren from comment #3)
> Created attachment 1080512 [details]
> ceph-deploy mon create-initial output after ceph-deploy install --release
> firefly was performed.

Is "--release firefly" in the RHCS install docs anywhere? I suspect that will grab packages from ceph.com.
Note: this was unclear from reading the comments. The first problem I encountered (OSDs appeared to come up, but ceph osd stat proved otherwise) happened when I did ceph-deploy install magnaXXX. The later comments (comment #3 through comment #7) describe what happened when I tried the --release firefly option to see whether it was a workaround.
Created attachment 1080831 [details]
ceph-deploy install output without any subscriptions.
The attachment in Comment 12 shows the output of ceph-deploy install when no subscriptions are changed and the ISO is installed from ice_setup. It includes:

[magna038][WARNIN] http://magna035.ceph.redhat.com/static/ceph/0.80.8/repodata/repomd.xml: [Errno 14] curl#7 - "Failed connect to magna035.ceph.redhat.com:80; No route to host"

ls /opt/calamari/webapp/content/ceph/0.80.8/repodata shows:

403badb7fb44b02aa6b46b0719b345cde3ecca45a93f1a188c7b173a0049b7bf-primary.xml.gz
5f84e6808d08a334da52ddfb24206b7353fefe9363c4d3ef371729b70a6bd687-filelists.xml.gz
a2b491c13826ca6f9a64d0cdf058c7dc076ef7e39696db67522fe9f613990e99-other.sqlite.bz2
aea9774252295df69cf674eb008a794c696610a9a0be4abc5bc77db3f7e1a690-filelists.sqlite.bz2
d71bddc322fc500bfa263430117301c8586cc52ee284c0d144163698c20d84c8-primary.sqlite.bz2
f4a9c4f95e797f6298ff78a6e89e18fd022b2303e874b8d873b4802d5c428fa3-other.xml.gz
repomd.xml
TRANS.TBL

So I am guessing that on magna038, http://magna035.ceph.redhat.com/static should point to /opt/calamari/webapp/content but does not.
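One quick way to check that guess (hypothetical diagnostic commands, using the hosts and paths named above) is to compare what the HTTP path serves with what is on disk:

```shell
# On magna038 (the client): is the repo metadata reachable over HTTP?
curl -sI http://magna035.ceph.redhat.com/static/ceph/0.80.8/repodata/repomd.xml

# On magna035 (the calamari host): does the file exist where calamari keeps it?
ls -l /opt/calamari/webapp/content/ceph/0.80.8/repodata/repomd.xml
```

If the file exists but curl cannot reach it, the problem is in the web server mapping or, as it turned out below, the firewall.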
Ugh. Regarding the previous message -- I had messed up the firewall...
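The "No route to host" symptom above is typical of a firewall blocking the port. A minimal firewalld sketch of the kind of fix involved (RHEL 7 assumed; port 80 for the calamari-served repo, 6789 for ceph-mon, and 6800-7300 for OSDs are the stock Ceph defaults):

```shell
# On the calamari/install host: allow HTTP so clients can fetch the repo
firewall-cmd --permanent --add-port=80/tcp

# On the cluster hosts: allow the Ceph monitor and OSD ports
firewall-cmd --permanent --add-port=6789/tcp
firewall-cmd --permanent --add-port=6800-7300/tcp

# Apply the permanent rules to the running firewall
firewall-cmd --reload
```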
Ceph is now up and healthy. The network has two sites: one runs calamari and served as the installation site; the other is a Ceph cluster consisting of one mon and three OSDs. For some reason, the cluster site needed to be rebooted before Ceph became healthy.
Closing. The problems here were firewall issues, improper use of the --release parameter, and confusion caused by bug 1269684, which I have now opened.