Description of problem:

People exploring Ceph for the first time often set up a minimal cluster (I do it for docs all the time). For years we have debated whether a quick startup cluster should have two or three nodes with two or three OSDs. End users who ignore our advice (and 5-year veterans like me who don't want to set up a three-node cluster every time) will encounter health errors that are unintelligible.

For example, running ceph-ansible with default size values and the following configuration:

1. One monitor
2. Two OSDs, each on a separate node

will bring a cluster up and running with the following error:

[root@rhel-mon ~]# ceph health detail
HEALTH_WARN Reduced data availability: 32 pgs inactive; Degraded data redundancy: 32 pgs unclean; too few PGs per OSD (16 < min 30)
PG_AVAILABILITY Reduced data availability: 32 pgs inactive

This is telling the user to add more placement groups, which clearly won't solve the problem. We should have intelligent error messages that give the user some useful information. For example:

A) IF (OSD# < osd_pool_default_size)
   PRINT "osd_pool_default_size is x, but there are only y OSDs. Increase the OSD count to achieve an active+clean state."

B) IF (osd_crush_chooseleaf_type=1 && CRUSH_MAP.nodeCount() < osd_pool_default_size)
   PRINT "The cluster makes (osd_pool_default_size.count) copies across nodes. The cluster has CRUSH_MAP.nodeCount() nodes, which is not enough for the cluster to reach an active+clean state. Add additional nodes to achieve an active+clean state."

We've had this issue for a while, and I think we should resolve it.
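To make the proposal concrete, here is a rough shell sketch of the checks in A) and B), run from the CLI against an existing cluster rather than inside the mon/mgr where the real health check would live. It assumes an admin keyring, uses the pool name "rbd" purely as an example (substitute a real pool), and the awk/wc parsing is illustrative rather than robust:

#!/bin/bash
# Illustrative preflight check; not the proposed mon/mgr implementation.
POOL=rbd                                                    # example pool name; substitute your own
SIZE=$(ceph osd pool get "$POOL" size | awk '{print $2}')   # replica count for the pool
OSDS=$(ceph osd ls | wc -l)                                 # total OSDs known to the cluster
HOSTS=$(ceph osd tree | awk '$3 == "host"' | wc -l)         # host buckets in the CRUSH map

if [ "$OSDS" -lt "$SIZE" ]; then
    # Case A: fewer OSDs than replicas requested
    echo "Pool size is $SIZE, but there are only $OSDS OSDs."
    echo "Add OSDs (or reduce the pool size) to reach an active+clean state."
elif [ "$HOSTS" -lt "$SIZE" ]; then
    # Case B: enough OSDs, but not enough hosts for a host-level failure domain
    echo "Pool size is $SIZE and replicas are placed across hosts, but the CRUSH map"
    echo "only contains $HOSTS host bucket(s). Add nodes, or relax the failure domain"
    echo "to 'osd', to reach an active+clean state."
fi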
Oh come on. Why am I always hitting errors like this?

[ceph-admin@voltaire ceph-cluster]$ ceph -s
  cluster:
    id:     0f0c8dbf-2315-4539-8d2c-05b182e66660
    health: HEALTH_WARN
            Reduced data availability: 256 pgs inactive
            Degraded data redundancy: 256 pgs unclean, 256 pgs degraded, 256 pgs undersized

  services:
    mon: 1 daemons, quorum voltaire
    mgr: voltaire(active)
    mds: cephfs-1/1/1 up {0=voltaire=up:creating}
    osd: 2 osds: 2 up, 2 in

  data:
    pools:   2 pools, 256 pgs
    objects: 0 objects, 0 bytes
    usage:   2114 MB used, 1779 MB / 3893 MB avail
    pgs:     100.000% pgs not active
             256 undersized+degraded+peered

[ceph-admin@voltaire ceph-cluster]$

I'm currently setting up Ceph, Luminous release, all on one single el7 VM, with two extra virtual disks attached and an OSD on each. It looks like I now have to shut down, add another virtual disk to the VM, reboot, and see if I can add another OSD and continue.

Thank you, John Wilkins, for lodging this bug and saving future users the anguish that I'm experiencing right now.
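For anyone else who lands here with a single-node lab: rather than adding a third disk, one possible workaround is to match the replica count to the OSDs you actually have and place replicas across OSDs instead of hosts. This is a sketch only, sensible only on a throwaway test cluster; the pool names below are the usual cephfs defaults and may not match yours (check ceph osd lspools):

# Single-node test clusters only: replicate across OSDs rather than hosts,
# and drop the replica count to the two OSDs that exist.
ceph osd crush rule create-replicated replicated_osd default osd
for pool in cephfs_data cephfs_metadata; do        # assumed pool names
    ceph osd pool set "$pool" crush_rule replicated_osd
    ceph osd pool set "$pool" size 2
done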
Guys, do we have a fix for this? I have 4 OSD nodes, each with 4 SSDs (16 OSDs in total). Every time I create a Ceph cluster (with either 3 OSD nodes or 4 OSD nodes, FileStore or BlueStore), my cluster state is not healthy:

  cluster:
    id:     2476cb8d-e018-4a5d-909f-0aea06ff11d2
    health: HEALTH_WARN
            Reduced data availability: 544 pgs inactive
            Degraded data redundancy: 544 pgs unclean

  services:
    mon: 3 daemons, quorum ceph-node1.rhcs-test-drive.io,ceph-node2.rhcs-test-drive.io,ceph-node3.rhcs-test-drive.io
    mgr: ceph-admin(active)
    mds: cephfs-1/1/1 up {0=ceph-admin=up:active}
    osd: 12 osds: 12 up, 12 in
    rgw: 1 daemon active

  data:
    pools:   6 pools, 544 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
             544 unknown

I tried the following with no luck:
- Increased the PG count for the pools (as it was complaining about too few PGs)
- Tried adding a 4th OSD node
- Tried adding "osd crush update on start = false" on all OSD nodes (read on the ceph mailing list)

Now the interesting part is:

[root@ceph-node2 ~]# ceph health detail
HEALTH_WARN Reduced data availability: 544 pgs inactive; Degraded data redundancy: 544 pgs unclean
PG_AVAILABILITY Reduced data availability: 544 pgs inactive
    pg 1.ad is stuck inactive for 1410.318683, current state unknown, last acting []
    pg 1.ae is stuck inactive for 1410.318683, current state unknown, last acting []
    pg 1.af is stuck inactive for 1410.318683, current state unknown, last acting []
    pg 1.b0 is stuck inactive for 1410.318683, current state unknown, last acting []
    pg 1.b1 is stuck inactive for 1410.318683, current state unknown, last acting []
    pg 1.b2 is stuck inactive for 1410.318683, current state unknown, last acting []

[root@ceph-node2 ~]# ceph pg 1.ad query
{
    "state": "active+clean",
    "snap_trimq": "[]",
    "epoch": 80,
    "up": [
        0,
        10,
        11
    ],
    "acting": [
        0,
        10,
        11
    ],
    "actingbackfill": [
        "0",
        "10",
        "11"
    ],

[root@ceph-node2 ~]# ceph -v
ceph version 12.2.1-40.el7cp (c6d85fd953226c9e8168c9abe81f499d66cc2716) luminous (stable)

ceph-ansible-3.0.14-1

All the PGs are active+clean if I query them, yet ceph -s reports every PG as inactive and unclean (see the output above). Please let me know if you have a workaround for this problem; I can reproduce it 100% of the time.
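(Side note, hedged and possibly unrelated to the original report: in Luminous the PG states shown by ceph -s are reported via the active mgr, so when every PG sits at "unknown" while ceph pg query says active+clean, it is worth checking whether the mgr is actually receiving PG stats before rebuilding anything. A sketch, with the daemon name taken from the status output above:)

# Inspect the mgr map, then restart the active mgr (ceph-admin per the
# status above) and watch whether PG states start being reported.
ceph mgr dump
systemctl restart ceph-mgr@ceph-admin      # run on the node hosting the active mgr
watch ceph -s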
This seems like a valid request; I will create an upstream tracker to track it.
Updating the QA Contact to Hemant. Hemant will reroute this to the appropriate QE Associate. Regards, Giri
Backports are still pending upstream.