Bug 1492248 - Need Better Error Message when OSD count is less than osd_pool_default_size
Summary: Need Better Error Message when OSD count is less than osd_pool_default_size
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: rc
Target Release: 3.*
Assignee: Josh Durgin
QA Contact: Manohar Murthy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-09-15 21:40 UTC by John Wilkins
Modified: 2019-12-11 22:28 UTC (History)
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-11 22:28:56 UTC
Embargoed:




Links
System ID: Ceph Project Bug Tracker 38617 (last updated 2019-03-07 01:36:16 UTC)

Description John Wilkins 2017-09-15 21:40:24 UTC
Description of problem:


People exploring Ceph for the first time often set up a minimal cluster (I do it for docs all the time). For years we have debated whether a quick-start cluster should have two or three nodes with two or three OSDs.

End users who ignore our advice (and 5-year veterans like me who don't want to set up a three-node cluster every time) will encounter health errors that are unintelligible. For example, running ceph-ansible with the default size values and the following configuration:

1. One monitor
2. Two OSDs, each on separate nodes

will bring the cluster up and running with the following error:

[root@rhel-mon ~]# ceph health detail
HEALTH_WARN Reduced data availability: 32 pgs inactive; Degraded data redundancy: 32 pgs unclean; too few PGs per OSD (16 < min 30)
PG_AVAILABILITY Reduced data availability: 32 pgs inactive

This is telling the user to add more placement groups, which clearly won't solve the problem. We should have intelligent error messages that give the user some useful information.

For example: 

A) IF (OSD# < osd_pool_default_size)
   PRINT "osd_pool_default_size is x, but there are only y OSDs. Increase the OSD count to achieve an active+clean state."

B) IF (osd_crush_chooseleaf_type == 1 && CRUSH_MAP.nodeCount() < osd_pool_default_size)
   PRINT "The cluster makes osd_pool_default_size copies across nodes, but the CRUSH map has only CRUSH_MAP.nodeCount() nodes, which is not enough to reach an active+clean state. Add nodes to achieve an active+clean state."

We've had this issue for a while, and I think we should resolve it.

Comment 4 John 2017-11-20 16:17:52 UTC
Oh come on.
Why am I always hitting errors like this?


[ceph-admin@voltaire ceph-cluster]$ ceph -s
  cluster:
    id:     0f0c8dbf-2315-4539-8d2c-05b182e66660
    health: HEALTH_WARN
            Reduced data availability: 256 pgs inactive
            Degraded data redundancy: 256 pgs unclean, 256 pgs degraded, 256 pgs undersized
 
  services:
    mon: 1 daemons, quorum voltaire
    mgr: voltaire(active)
    mds: cephfs-1/1/1 up  {0=voltaire=up:creating}
    osd: 2 osds: 2 up, 2 in
 
  data:
    pools:   2 pools, 256 pgs
    objects: 0 objects, 0 bytes
    usage:   2114 MB used, 1779 MB / 3893 MB avail
    pgs:     100.000% pgs not active
             256 undersized+degraded+peered
 
[ceph-admin@voltaire ceph-cluster]$


I'm currently setting up Ceph, luminous release, all on a single EL7 VM with two extra virtual disks attached and an OSD on each. It looks like I now have to shut down, add another virtual disk to the VM, reboot, and see if I can add another OSD and continue.
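
(For a single-host test setup like this, a commonly cited ceph.conf workaround is to replicate across OSDs instead of hosts and match the pool size to the OSD count. Treat the values below as an illustration rather than a supported layout, and note they have to be in place before the OSDs and pools are created:

[global]
osd pool default size = 2        # only two OSDs exist
osd pool default min size = 1    # allow I/O with a single surviving replica
osd crush chooseleaf type = 0    # place replicas across OSDs, not hosts

Otherwise the default CRUSH rule wants each replica on a different host, which a one-node cluster can never satisfy.)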

Thank you, John Wilkins, for lodging this bug and saving future users the anguish that I'm experiencing right now.

Comment 5 karan singh 2018-01-13 02:14:09 UTC
Guys, do we have a fix for this?

I have 4 OSD nodes, each with 4 SSDs (16 OSDs in total). Every time I create a Ceph cluster (with either 3 or 4 OSD nodes, FileStore or BlueStore), the cluster state is not healthy:


  cluster:
    id:     2476cb8d-e018-4a5d-909f-0aea06ff11d2
    health: HEALTH_WARN
            Reduced data availability: 544 pgs inactive
            Degraded data redundancy: 544 pgs unclean

  services:
    mon: 3 daemons, quorum ceph-node1.rhcs-test-drive.io,ceph-node2.rhcs-test-drive.io,ceph-node3.rhcs-test-drive.io
    mgr: ceph-admin(active)
    mds: cephfs-1/1/1 up  {0=ceph-admin=up:active}
    osd: 12 osds: 12 up, 12 in
    rgw: 1 daemon active

  data:
    pools:   6 pools, 544 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
             544 unknown

I tried the following with no luck:
- Increased the PG count for the pools (as it was complaining about too few PGs)
- Added a 4th OSD node
- Set osd crush update on start = false on all OSD nodes (a suggestion from the Ceph mailing list)

Now the interesting part:

[root@ceph-node2 ~]# ceph health detail
HEALTH_WARN Reduced data availability: 544 pgs inactive; Degraded data redundancy: 544 pgs unclean
PG_AVAILABILITY Reduced data availability: 544 pgs inactive
    pg 1.ad is stuck inactive for 1410.318683, current state unknown, last acting []
    pg 1.ae is stuck inactive for 1410.318683, current state unknown, last acting []
    pg 1.af is stuck inactive for 1410.318683, current state unknown, last acting []
    pg 1.b0 is stuck inactive for 1410.318683, current state unknown, last acting []
    pg 1.b1 is stuck inactive for 1410.318683, current state unknown, last acting []
    pg 1.b2 is stuck inactive for 1410.318683, current state unknown, last acting []



[root@ceph-node2 ~]# ceph pg 1.ad query
{
    "state": "active+clean",
    "snap_trimq": "[]",
    "epoch": 80,
    "up": [
        0,
        10,
        11
    ],
    "acting": [
        0,
        10,
        11
    ],
    "actingbackfill": [
        "0",
        "10",
        "11"
    ],


[root@ceph-node2 ~]# ceph -v
ceph version 12.2.1-40.el7cp (c6d85fd953226c9e8168c9abe81f499d66cc2716) luminous (stable)

ceph-ansible-3.0.14-1


All the PGs are active+clean if I query them individually; however, ceph -s reports all PGs as inactive and unclean (see the output above).


Please let me know if you have a workaround for this problem. I can reproduce it 100% of the time.
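
(A guess rather than a diagnosis: in luminous the PG states and usage shown by ceph -s come from ceph-mgr, while ceph pg <pgid> query asks the primary OSD directly, so "100.000% pgs unknown" with 0 kB usage usually means the mgr is not receiving OSD stats rather than a placement problem. A few commands that can help tell the two cases apart; <id> below is whatever your mgr instance is named:

ceph mgr dump                        # is there an active mgr?
ceph osd df                          # per-OSD usage; all zeros here is suspicious
ceph pg dump_stuck inactive | head   # the cluster-wide view of the stuck PGs
systemctl status ceph-mgr@<id>       # on the mgr host, check the daemon and its log

If those point at the mgr, that is a different problem from the too-few-OSDs case this bug is about.)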

Comment 6 Neha Ojha 2019-03-06 22:59:03 UTC
This seems like a valid request; I will create an upstream tracker to track it.

Comment 7 Giridhar Ramaraju 2019-08-05 13:08:32 UTC
Updating the QA Contact to Hemant. Hemant will reroute this to the appropriate QE Associate.

Regards,
Giri

Comment 8 Giridhar Ramaraju 2019-08-05 13:09:59 UTC
Updating the QA Contact to Hemant. Hemant will reroute this to the appropriate QE Associate.

Regards,
Giri

Comment 9 Josh Durgin 2019-09-27 22:12:11 UTC
Backports are still pending upstream.

