Description of problem:
=======================
Installed Ceph via CDN and set 'osd_crush_update_on_start = false' as mentioned in the install doc. The cluster was unable to achieve the 'active+clean' state.

[racpatel@magna048 ~]$ sudo ceph -s
    cluster c9cf8beb-861e-4aba-a1a2-2734623502cf
     health HEALTH_WARN
            64 pgs stuck inactive
            64 pgs stuck unclean
            too few PGs per OSD (21 < min 30)
     monmap e1: 1 mons at {magna090=10.8.128.90:6789/0}
            election epoch 1, quorum 0 magna090
     osdmap e23: 3 osds: 3 up, 3 in
      pgmap v43: 64 pgs, 1 pools, 0 bytes data, 0 objects
            100656 kB used, 2778 GB / 2778 GB avail
                  64 creating

Version-Release number of selected component (if applicable):
=============================================================
0.94.5-1.el7cp.x86_64

How reproducible:
=================
always

Steps to Reproduce:
===================
1. Installed Ceph via CDN, referring to the doc 'https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-installation-guide-rhel/blob/5689dfb78e7c07b15ae2b442c298ac314f591622/quick-ceph-deploy.adoc'.
2. Modified the Ceph config file and set 'osd_crush_update_on_start = false' (see the ceph.conf sketch after this report).
3. After adding the OSDs, verified the cluster state. It never achieved the 'active+clean' state (same 'ceph -s' output as above). Note that the OSDs have weight 0 and no host buckets in the CRUSH tree, and all PGs are stuck in 'creating':

[c1@magna048 ceph-config]$ sudo ceph osd tree
ID WEIGHT TYPE NAME    UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1      0 root default
 0      0     osd.0         up  1.00000          1.00000
 1      0     osd.1         up  1.00000          1.00000
 2      0     osd.2         up  1.00000          1.00000

[racpatel@magna048 ~]$ sudo ceph pg dump
dumped all in format plain
version 43
stamp 2016-01-18 08:37:05.761242
last_osdmap_epoch 23
last_pg_scan 1
full_ratio 0.95
nearfull_ratio 0.85
pg_stat objects mip degr misp unf bytes log disklog state creating state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
0.22 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788131 0'0 2016-01-15 18:02:08.788131
0.21 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788130 0'0 2016-01-15 18:02:08.788130
0.20 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788130 0'0 2016-01-15 18:02:08.788130
0.1f 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788130 0'0 2016-01-15 18:02:08.788130
0.1e 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788129 0'0 2016-01-15 18:02:08.788129
0.1d 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788129 0'0 2016-01-15 18:02:08.788129
0.1c 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788129 0'0 2016-01-15 18:02:08.788129
0.1b 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788128 0'0 2016-01-15 18:02:08.788128
0.1a 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788128 0'0 2016-01-15 18:02:08.788128
0.19 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788128 0'0 2016-01-15 18:02:08.788128
0.18 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788128 0'0 2016-01-15 18:02:08.788128
0.17 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788127 0'0 2016-01-15 18:02:08.788127
0.16 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788127 0'0 2016-01-15 18:02:08.788127
0.15 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788127 0'0 2016-01-15 18:02:08.788127
0.25 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788137 0'0 2016-01-15 18:02:08.788137
0.24 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788137 0'0 2016-01-15 18:02:08.788137
0.23 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2016-01-15 18:02:08.788136 0'0 2016-01-15 18:02:08.788136
pool 0  0 0 0 0 0 0 0 0
 sum    0 0 0 0 0 0 0 0
osdstat kbused kbavail   kb        hb in hb out
0       33552  971010736 971044288 []    []
1       33552  971010736 971044288 [0,2] []
2       33552  971010736 971044288 [0]   []
 sum    100656 2913032208 2913132864

4. Removed 'osd_crush_update_on_start = false' from the config file, restarted all daemons, and checked the output of 'ceph osd tree' again; the hierarchy had still not been built.
5. Created the CRUSH hierarchy manually using 'ceph osd crush move' (a sketch of such commands is shown after this report):

[racpatel@magna100 ~]$ sudo ceph osd tree
ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 2.69998 root default
-4 0.89999     host magna106
 1 0.89999         osd.1           up  1.00000          1.00000
-2 0.89999     host magna101
 0 0.89999         osd.0           up  1.00000          1.00000
-3 0.89999     host magna117

After this, the cluster achieved the 'active+clean' state.

Actual results:
===============
On setting 'osd_crush_update_on_start = false' during installation, the cluster does not achieve the 'active+clean' state.

Additional info:
================
Performed the installation on another setup without setting 'osd_crush_update_on_start = false' in the config file, and found that 'ceph osd tree' shows the proper hierarchy and the cluster achieved the 'active+clean' state.
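For reference, the change made in step 2 amounts to a ceph.conf snippet like the one below. This is a minimal sketch; placing the option in the [osd] section is an assumption (it is also accepted under [global]):

[osd]
# Prevent OSDs from updating their own CRUSH location on start.
# With this disabled, newly created OSDs are never placed in the CRUSH
# hierarchy, so they keep weight 0 and no host bucket (consistent with
# the 'ceph osd tree' output above), and no PGs can be mapped to them.
osd_crush_update_on_start = false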
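The exact commands used in step 5 were not captured in this report; a minimal sketch of building the hierarchy by hand, using the hostnames and weights from the final 'ceph osd tree' output above, would be:

# Create a host bucket per node and move it under the default root.
sudo ceph osd crush add-bucket magna101 host
sudo ceph osd crush move magna101 root=default
# Place the OSD in its host bucket with a non-zero weight so that
# CRUSH can map PGs to it.
sudo ceph osd crush create-or-move osd.0 0.89999 host=magna101
# Repeat for magna106/osd.1 and magna117/osd.2.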
This change was added to the docs in bug 1249045. Maybe we should revert that change? I'm blocking that bug with this one.
https://gitlab.cee.redhat.com/red-hat-ceph-storage-documentation/doc-Red_Hat_Ceph_Storage_1.3-Installation_Guide_for_Red_Hat_Enterprise_Linux/commit/84838fd0b076b158fe2046b9f89a02ca25a364e7
The section describing "osd_crush_update_on_start" has been removed from the doc now. Verified the doc. Moving to verified state.