Bug 1299978

Summary: Cluster does not achieve active+clean state when 'osd_crush_update_on_start = false' is set during installation
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Rachana Patel <racpatel>
Component: Documentation
Assignee: ceph-docs <ceph-docs>
Status: CLOSED CURRENTRELEASE
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: medium
Priority: unspecified
Version: 1.3.2
CC: adeza, ceph-eng-bugs, hnallurv, hyelloji, jowilkin, kdreyer, ngoswami
Target Milestone: rc
Target Release: 1.3.2
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-03-01 08:23:09 UTC
Bug Blocks: 1249045

Description Rachana Patel 2016-01-19 16:22:42 UTC
Description of problem:
======================
Installed Ceph via CDN and set 'osd_crush_update_on_start = false' as described in the installation guide. The cluster was unable to achieve the 'active+clean' state.

[racpatel@magna048 ~]$ sudo ceph -s
    cluster c9cf8beb-861e-4aba-a1a2-2734623502cf
     health HEALTH_WARN
            64 pgs stuck inactive
            64 pgs stuck unclean
            too few PGs per OSD (21 < min 30)
     monmap e1: 1 mons at {magna090=10.8.128.90:6789/0}
            election epoch 1, quorum 0 magna090
     osdmap e23: 3 osds: 3 up, 3 in
      pgmap v43: 64 pgs, 1 pools, 0 bytes data, 0 objects
            100656 kB used, 2778 GB / 2778 GB avail
                  64 creating



Version-Release number of selected component (if applicable):
============================================================
0.94.5-1.el7cp.x86_64


How reproducible:
=================
always


Steps to Reproduce:
===================
1. Installed Ceph via CDN, following the doc 'https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-installation-guide-rhel/blob/5689dfb78e7c07b15ae2b442c298ac314f591622/quick-ceph-deploy.adoc'

2. Modified the Ceph config file and set 'osd_crush_update_on_start = false', as sketched below.
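
For illustration, a minimal sketch of how the setting would appear in /etc/ceph/ceph.conf (the report does not show the exact file or section used; this option is commonly placed in [global] or [osd]):

[osd]
osd_crush_update_on_start = false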

3. After adding the OSDs, verified the cluster state; it never achieved 'active+clean' (same 'ceph -s' output as in the description above). The 'ceph osd tree' output below shows the OSDs sitting directly under 'root default' with a CRUSH weight of 0 and no host buckets, so CRUSH cannot map any PGs and they stay stuck in 'creating'.


[c1@magna048 ceph-config]$ sudo ceph osd tree
ID WEIGHT TYPE NAME    UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1      0 root default                                   
 0      0 osd.0             up  1.00000          1.00000 
 1      0 osd.1             up  1.00000          1.00000 
 2      0 osd.2             up  1.00000          1.00000 



[racpatel@magna048 ~]$ sudo ceph pg dump
dumped all in format plain
version 43
stamp 2016-01-18 08:37:05.761242
last_osdmap_epoch 23
last_pg_scan 1
full_ratio 0.95
nearfull_ratio 0.85
pg_stat	objects	mip	degr	misp	unf	bytes	log	disklog	state	state_stamp	v	reported	up	up_primary	acting	acting_primary	last_scrub	scrub_stamp	last_deep_scrub	deep_scrub_stamp
0.22	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788131	0'0	2016-01-15 18:02:08.788131
0.21	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788130	0'0	2016-01-15 18:02:08.788130
0.20	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788130	0'0	2016-01-15 18:02:08.788130
0.1f	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788130	0'0	2016-01-15 18:02:08.788130
0.1e	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788129	0'0	2016-01-15 18:02:08.788129
0.1d	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788129	0'0	2016-01-15 18:02:08.788129
0.1c	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788129	0'0	2016-01-15 18:02:08.788129
0.1b	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788128	0'0	2016-01-15 18:02:08.788128
0.1a	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788128	0'0	2016-01-15 18:02:08.788128
0.19	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788128	0'0	2016-01-15 18:02:08.788128
0.18	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788128	0'0	2016-01-15 18:02:08.788128
0.17	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788127	0'0	2016-01-15 18:02:08.788127
0.16	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788127	0'0	2016-01-15 18:02:08.788127
0.15	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788127	0'0	2016-01-15 18:02:08.788127



0.25	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788137	0'0	2016-01-15 18:02:08.788137
0.24	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788137	0'0	2016-01-15 18:02:08.788137
0.23	0	0	0	0	0	0	0	0	creating	0.000000	0'0	0:0	[]	-1	[]	-1	0'0	2016-01-15 18:02:08.788136	0'0	2016-01-15 18:02:08.788136
pool 0	0	0	0	0	0	0	0	0
 sum	0	0	0	0	0	0	0	0
osdstat	kbused	kbavail	kb	hb in	hb out
0	33552	971010736	971044288	[]	[]
1	33552	971010736	971044288	[0,2]	[]
2	33552	971010736	971044288	[0]	[]
 sum	100656	2913032208	2913132864



4. Removed 'osd_crush_update_on_start = false' from the config file, restarted all daemons, and checked the 'osd tree' output again; the hierarchy had still not been built.
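
For reference, restarting the daemons on this sysvinit-based release (RHCS 1.3 / Hammer on RHEL 7) would look something like the following on each node (command form assumed; the report does not record the exact invocation):

$ sudo /etc/init.d/ceph restart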

5. Created the CRUSH hierarchy manually using 'ceph osd crush move'; the resulting tree is shown below, with a sketch of the kind of commands involved after it:
[racpatel@magna100 ~]$ sudo ceph osd tree
ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 2.69998 root default                                        
-4 0.89999     host magna106                                   
 1 0.89999         osd.1          up  1.00000          1.00000 
-2 0.89999     host magna101                                   
 0 0.89999         osd.0          up  1.00000          1.00000 
-3 0.89999     host magna117                                   


After building the hierarchy, the cluster achieved the 'active+clean' state.
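
For illustration, a plausible command sequence for building such a hierarchy by hand (host names and weights taken from the tree above; the report only mentions 'ceph osd crush move', so the exact commands used are assumptions — 'crush set' is one way to both place an OSD and assign it a weight in a single step):

$ sudo ceph osd crush add-bucket magna101 host
$ sudo ceph osd crush move magna101 root=default
$ sudo ceph osd crush set osd.0 0.89999 host=magna101

The same three steps would then be repeated for magna106/osd.1 and magna117/osd.2.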




Actual results:
==============
With 'osd_crush_update_on_start = false' set during installation, the cluster does not achieve the 'active+clean' state.



Additional info:
=================
Performed the installation on another setup without setting 'osd_crush_update_on_start = false' in the config file; there the OSD tree showed the proper hierarchy and the cluster achieved the 'active+clean' state.
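
This matches the default behavior: with 'osd_crush_update_on_start = true' (the default), the OSD startup hook registers each OSD under its host in the CRUSH map with a command along these lines (invocation sketched from the Hammer-era init script; exact arguments may differ):

$ sudo ceph osd crush create-or-move -- 0 0.89999 host=magna101 root=default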

Comment 2 Ken Dreyer (Red Hat) 2016-01-19 18:43:45 UTC
This change was added to the docs in bug 1249045. Maybe we should revert that change? I'm blocking that bug with this one.

Comment 4 Hemanth Kumar 2016-02-03 08:40:36 UTC
The section describing "osd_crush_update_on_start" has been removed from the doc now.
Verified the doc; moving to verified state.