Bug 2214864 - Pools are marked full even if their PGs have no FULL OSDs
Summary: Pools are marked full even if their PGs have no FULL OSDs
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 6.1z1
Assignee: Radoslaw Zarzynski
QA Contact: Pawan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-06-13 23:05 UTC by Harsh Kumar
Modified: 2023-07-31 06:41 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-11 19:51:27 UTC
Embargoed:




Links
Red Hat Issue Tracker RHCEPH-6826 (Last Updated: 2023-06-13 23:07:52 UTC)

Description Harsh Kumar 2023-06-13 23:05:45 UTC
Description of problem:
The scenario is very similar to Red Hat BZ#1560802.
If one or more OSDs are FULL, I/Os are expected to fail for every pool that has any of these OSDs in its acting set. However, a pool that does not contain any of the FULL OSDs in its acting set should still be able to serve I/Os and should NOT be reported as 'full (no space)' in the health warning.
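
For reference, a quick way to cross-check this from the CLI (the PG ID 22.0 and the commands below are only examples based on the reproduction transcript further down) is to compare the OSDs reported full by health detail against the acting set of a PG in the affected pool:

 ceph health detail | grep 'is full'    # which OSDs are actually over the full ratio
 ceph pg map 22.0 -f json-pretty        # up/acting set of a PG in the pool in question
 ceph osd df tree                       # per-OSD utilisation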

Version-Release number of selected component (if applicable):
ceph version 17.2.6-70.el9cp quincy (stable)
ceph version 16.2.10-175.el8cp (8c714c8184d123e241e34f9c0f6abcc1d1858e1c) pacific (stable)

How reproducible:
5/5

Steps to Reproduce:
1. Create a replicated pool with a single PG, for convenience
2. Fetch the acting set of the created pool (say these OSDs are x, y, z)
3. Re-weight the OSDs (x, y, z) in that acting set to 0 so they are not selected when subsequent pools are created
4. Create another single-PG replicated pool and a single-PG EC pool (a single PG is not necessary, it just keeps the example simple)
5. Re-weight the zero-weight OSDs (x, y, z) back to 1
6. Fetch the acting sets of the new replicated and EC pools and confirm they do not contain any of the OSDs from the first pool's acting set (see the sketch after these steps)
7. Either lower the nearfull, backfillfull and full ratios (roughly 0.6, 0.6 and 0.7; the transcript below uses 0.65, 0.69 and 0.70) or keep the default values of 0.85, 0.90 and 0.95
8. Write data to the first replicated pool with any tool until the full ratio is reached
9. Once OSDs x, y, z are full, the cluster health output warns that these OSDs are full and also reports every pool as full, including the pools that do not have these OSDs in their acting set
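
A minimal sketch of the acting-set check from step 6 (assuming jq is available; PG IDs 21.0, 22.0 and 23.0 match the transcript in Additional info and will differ on another cluster):

 for pg in 21.0 22.0 23.0; do
     echo -n "$pg acting: "
     ceph pg map $pg -f json | jq -c '.acting'
 done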


Actual results:
All pools in the cluster are marked full when one or more OSDs become full, including pools whose acting sets do not contain any of the full OSDs

Expected results:
Pools whose acting sets contain full OSDs should stop serving/accepting I/O and be marked full, but pools that do not contain any full OSDs should not be marked full and should continue to serve I/O
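
A rough way to verify this expectation per pool (a sketch, assuming jq; OSDs 2, 4 and 9 are the full OSDs from this reproduction, and the JSON field names may differ slightly between releases) is to count how many acting-set slots of each pool land on a full OSD; only pools with a non-zero count would be expected to be flagged full:

 for pool in $(ceph osd pool ls); do
     n=$(ceph pg ls-by-pool "$pool" -f json | jq '[.pg_stats[].acting[]] | map(select(. == 2 or . == 4 or . == 9)) | length')
     echo "$pool: $n acting-set slots on full OSDs"
 done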

Additional info:
 cephadm -v shell -- ceph osd pool create pool_full_osds 1 1
 cephadm -v shell -- sudo ceph osd pool application enable pool_full_osds rados
 cephadm -v shell -- ceph osd pool set pool_full_osds pg_autoscale_mode off 
 cephadm -v shell -- ceph pg map 21.0 -f json 
 cephadm -v shell -- ceph osd reweight osd.9 0 
 cephadm -v shell -- ceph osd reweight osd.2 0 
 cephadm -v shell -- ceph osd reweight osd.4 0 
 cephadm -v shell -- ceph osd pool create re_pool_test 1 1 
 cephadm -v shell -- sudo ceph osd pool application enable re_pool_test rados 
 cephadm -v shell -- ceph osd pool set re_pool_test pg_autoscale_mode off 
 cephadm -v shell -- ceph osd dump -f json 
 cephadm -v shell -- ceph pg map 22.0 -f json 
 cephadm -v shell -- ceph osd erasure-code-profile set ecprofile_test_ec_pool crush-failure-domain=osd k=4 m=2 plugin=jerasure 
 cephadm -v shell -- ceph osd pool create test_ec_pool 1 1 erasure ecprofile_test_ec_pool 
 cephadm -v shell -- sudo ceph osd pool application enable test_ec_pool rados 
 cephadm -v shell -- ceph osd pool set test_ec_pool pg_autoscale_mode off 
 cephadm -v shell -- ceph pg map 23.0 -f json 
 Acting set of pool_full_osds (PG 21.0): [9, 2, 4] | re_pool_test (PG 22.0): [3, 7, 5] | test_ec_pool (PG 23.0): [10, 8, 14, 6, 12, 1]
 cephadm -v shell -- ceph osd set-nearfull-ratio 0.65 
 cephadm -v shell -- ceph osd set-backfillfull-ratio 0.69 
 cephadm -v shell -- ceph osd set-full-ratio 0.70 
 cephadm -v shell -- ceph osd reweight osd.9 1 
 cephadm -v shell -- ceph osd reweight osd.2 1 
 cephadm -v shell -- ceph osd reweight osd.4 1 

 rados bench -p pool_full_osds 200 write -b 16384KB --no-cleanup --max-objects 1120   (run on 10.0.208.231, timeout 600 s)

 cephadm -v shell -- ceph health detail 

 HEALTH_ERR 3 full osd(s); 10 pool(s) full
[ERR] OSD_FULL: 3 full osd(s)
    osd.2 is full
    osd.4 is full
    osd.9 is full
[WRN] POOL_FULL: 10 pool(s) full
    pool 'device_health_metrics' is full (no space)
    pool 'cephfs.cephfs.meta' is full (no space)
    pool 'cephfs.cephfs.data' is full (no space)
    pool '.rgw.root' is full (no space)
    pool 'default.rgw.log' is full (no space)
    pool 'default.rgw.control' is full (no space)
    pool 'default.rgw.meta' is full (no space)
    pool 'pool_full_osds' is full (no space)
    pool 're_pool_test' is full (no space)
    pool 'test_ec_pool' is full (no space)
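
For completeness, the per-pool full flag can also be inspected directly in the OSD map and in the usage report (a sketch; exact output varies by release):

 ceph osd dump | grep 'flags.*full'    # pools carrying the 'full' flag in the OSD map
 ceph df detail                        # per-pool usage, quota and availability view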

Comment 2 Harsh Kumar 2023-06-14 21:55:44 UTC
@Radoslaw

