Bug 2210278 - OMAP statistics are not gathered even after deep-scrub [NEEDINFO]
Summary: OMAP statistics are not gathered even after deep-scrub
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 6.1
Hardware: x86_64
OS: Linux
Importance: unspecified high
Target Milestone: ---
Target Release: 6.1z2
Assignee: Brad Hubbard
QA Contact: Pawan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-05-26 12:37 UTC by Harsh Kumar
Modified: 2023-07-24 01:59 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
rzarzyns: needinfo? (rfriedma)


Attachments


Links
Red Hat Issue Tracker RHCEPH-6748 (last updated 2023-05-26 12:38:07 UTC)

Description Harsh Kumar 2023-05-26 12:37:24 UTC
Description of problem:
OMAP entries do not show up in the 'ceph df' statistics even after deep-scrubbing of the pool has completed.
Reference (from 'ceph pg dump'):
Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.

Key points -
1. OMAP entries written to a pool show up automatically in the 'ceph df' stats once a large enough number of OMAP entries has been written to the pool (observed previously and still true)
2. OMAP stats should show up once a deep-scrub is performed on the pool (was working previously, no longer true)
3. Even without a deep-scrub, restarting OSDs that are part of the pool's acting PG sets caused OMAP entries to be recognized and displayed in the 'ceph df' stats. (was working previously, no longer true)

As of now, with ceph version 16.2.10-172.el8cp and ceph version 17.2.6-65.el9cp, OMAP entries are accounted for only after both a deep-scrub and an OSD restart are performed.
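
For reference, the OMAP accounting above is checked by reading the per-pool stored_omap / omap_bytes_used counters from 'ceph df detail --format json' (the same fields shown in the JSON dumps further down). A minimal Python sketch of that check, assuming a client node with the ceph CLI and an admin keyring; the pool name is the one from this report:

    # check_pool_omap.py - minimal sketch of the stats check used in this report
    import json
    import subprocess

    def pool_omap_stats(pool_name):
        # 'ceph df detail --format json' reports per-pool stats, including stored_omap
        out = subprocess.check_output(["ceph", "df", "detail", "--format", "json"])
        for pool in json.loads(out)["pools"]:
            if pool["name"] == pool_name:
                return pool["stats"]["stored_omap"], pool["stats"]["omap_bytes_used"]
        raise ValueError("pool %s not found" % pool_name)

    # Expected to become non-zero after deep-scrub; observed to stay (0, 0) until an OSD restart
    print(pool_omap_stats("re_pool_3"))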

Version-Release number of selected component (if applicable):
ceph version 17.2.6-65.el9cp (9b65890b2351d108c4d5fa7a6be7011e9e3d2966) quincy
ceph version 16.2.10-172.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific

How reproducible:
3/3

Steps to Reproduce:
1. Configure a Quincy / Pacific Cluster
2. Create a replicated pool with default config
3. Use the python script attached to write objects and OMAP entries to the pool from a client (a minimal python-rados sketch of the same operation follows the steps)
 - curl -k https://raw.githubusercontent.com/red-hat-storage/cephci/master/utility/generate_omap_entries.py -O
 - pip3 install docopt
 - python3 generate_omap_entries.py --pool <pool-name> --start 0 --end 20 --key-count 1000
4. Once the script has written 20,000 OMAP entries, check the 'ceph df detail' output; the pool will show 20 objects, but OMAP entries will be 0
5. Trigger deep-scrub on the concerned pool
 - ceph osd pool deep-scrub <pool-name>
6. Once deep-scrubbing has completed, check the 'ceph df detail' stats again; the expectation is that OMAP entries are now listed for the pool, but they actually remain 0.
7. Choose an OSD from the acting set of any of the PGs belonging to the pool; log in to the OSD node, then stop and disable the OSD systemd service
8. Use ceph-objectstore-tool to list the OMAP entries made for an object that is part of a PG whose acting set contains the chosen OSD.
 - ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11/ --pgid 6.a "omap_obj_4828_14" list-omap
 All the OMAP entries for the object will be displayed
9. Enable and start the OSD service that was stopped in step 7
10. Check the 'ceph df detail' stats again; OMAP entries will now be visible against the pool.
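
Note on step 3: the cephci script linked above drives the OMAP writes via python-rados. A minimal sketch of the same operation (object and key names here are illustrative, not the exact ones generated by the script):

    # write_omap_sketch.py - illustrative python-rados sketch of step 3
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    ioctx = cluster.open_ioctx("re_pool_3")
    try:
        for obj_num in range(20):                                  # 20 objects, as in step 4
            oid = "omap_obj_%d" % obj_num                          # illustrative object name
            ioctx.write_full(oid, b"omap test object")             # create the object
            keys = tuple("key_%d" % i for i in range(1000))        # 1000 keys per object
            vals = tuple(b"value_%d" % i for i in range(1000))
            with rados.WriteOpCtx() as write_op:
                ioctx.set_omap(write_op, keys, vals)               # queue the omap k/v pairs
                ioctx.operate_write_op(write_op, oid)              # apply them to the object
    finally:
        ioctx.close()
        cluster.shutdown()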

Actual results:
Ceph does not have a mechanism to explicitly monitor and account for OMAP entries on objects in a replicated pool, but it does recognize new OMAP entries once the concerned pool is deep-scrubbed. However, it was observed that OMAP entries were not recognized and accounted for in the 'ceph df detail' stats even after scrubbing and deep-scrubbing the pool multiple times.

Expected results:
OMAP entries should show up in the 'ceph df detail' stats after deep-scrubbing of the concerned pool has completed, without the need for an OSD restart.
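
For completeness, one way to confirm that the deep-scrub has actually completed before re-checking the stats is to wait for the last_deep_scrub_stamp of every PG in the pool to advance. A minimal sketch, assuming the JSON layout of 'ceph pg ls-by-pool ... --format json' (the field names are an assumption and may differ slightly between releases):

    # wait_deep_scrub_sketch.py - sketch only; verify the JSON field names on the target release
    import json
    import subprocess
    import time

    POOL = "re_pool_3"

    def deep_scrub_stamps(pool):
        out = subprocess.check_output(["ceph", "pg", "ls-by-pool", pool, "--format", "json"])
        # assumed layout: {"pg_stats": [{"pgid": "...", "last_deep_scrub_stamp": "..."}, ...]}
        return {pg["pgid"]: pg["last_deep_scrub_stamp"] for pg in json.loads(out)["pg_stats"]}

    before = deep_scrub_stamps(POOL)
    subprocess.check_call(["ceph", "osd", "pool", "deep-scrub", POOL])         # step 5
    while any(deep_scrub_stamps(POOL)[pgid] == stamp for pgid, stamp in before.items()):
        time.sleep(10)                                                         # wait until every PG is re-stamped
    # re-check the pool stats once all PGs have been deep-scrubbed
    subprocess.check_call(["ceph", "df", "detail"])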

Additional info:

List of objects and their PGs in the pool 're_pool_3' - 
# for i in `rados ls -p re_pool_3`; do ceph osd map re_pool_3 $i; done
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_8' -> pg 6.948d8c40 (6.0) -> up ([12,4,10], p12) acting ([12,4,10], p12)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_17' -> pg 6.c82764a8 (6.8) -> up ([11,14,18], p11) acting ([11,14,18], p11)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_15' -> pg 6.c539e064 (6.4) -> up ([12,16,15], p12) acting ([12,16,15], p12)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_9' -> pg 6.e888b26c (6.c) -> up ([0,16,18], p0) acting ([0,16,18], p0)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_3' -> pg 6.54f1f05c (6.1c) -> up ([7,14,15], p7) acting ([7,14,15], p7)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_6' -> pg 6.150be582 (6.2) -> up ([19,3,5], p19) acting ([19,3,5], p19)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_14' -> pg 6.3ae1d28a (6.a) -> up ([11,18,4], p11) acting ([11,18,4], p11)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_7' -> pg 6.eb2ff3ea (6.a) -> up ([11,18,4], p11) acting ([11,18,4], p11)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_11' -> pg 6.33c2583a (6.1a) -> up ([4,8,5], p4) acting ([4,8,5], p4)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_4' -> pg 6.bfbecf9e (6.1e) -> up ([4,13,6], p4) acting ([4,13,6], p4)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_19' -> pg 6.ed1607c5 (6.5) -> up ([19,6,13], p19) acting ([19,6,13], p19)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_16' -> pg 6.f101bce5 (6.5) -> up ([19,6,13], p19) acting ([19,6,13], p19)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_0' -> pg 6.54776933 (6.13) -> up ([3,9,17], p3) acting ([3,9,17], p3)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_5' -> pg 6.4cc7260b (6.b) -> up ([15,1,4], p15) acting ([15,1,4], p15)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_10' -> pg 6.f2fab59b (6.1b) -> up ([17,3,9], p17) acting ([17,3,9], p17)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_18' -> pg 6.7fd10dbb (6.1b) -> up ([17,3,9], p17) acting ([17,3,9], p17)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_1' -> pg 6.9346df27 (6.7) -> up ([3,19,5], p3) acting ([3,19,5], p3)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_12' -> pg 6.a6b8eeb7 (6.17) -> up ([18,10,2], p18) acting ([18,10,2], p18)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_13' -> pg 6.219c2a6f (6.f) -> up ([5,3,17], p5) acting ([5,3,17], p5)
osdmap e107 pool 're_pool_3' (6) object 'omap_obj_4828_2' -> pg 6.b76947ff (6.1f) -> up ([5,12,1], p5) acting ([5,12,1], p5)

The output of ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11/ --pgid 6.a "omap_obj_4828_14" list-omap is attached as omap_keys_omap_obj_4828_14.txt; it contains the list of 1000 OMAP entries present on the object "omap_obj_4828_14".
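
For comparison, the same OMAP keys are also visible from the client side without stopping any OSD, e.g. via 'rados -p re_pool_3 listomapkeys omap_obj_4828_14' or the equivalent python-rados read op, which confirms the entries exist even while 'ceph df' reports stored_omap = 0. A minimal sketch of the python-rados variant:

    # list_omap_sketch.py - count OMAP keys on an object from the client side
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    ioctx = cluster.open_ioctx("re_pool_3")
    try:
        with rados.ReadOpCtx() as read_op:
            # fetch up to 2000 key/value pairs, starting from the beginning, no prefix filter
            omap_iter, ret = ioctx.get_omap_vals(read_op, "", "", 2000)
            ioctx.operate_read_op(read_op, "omap_obj_4828_14")
            keys = [k for k, _ in omap_iter]
        print("omap_obj_4828_14 holds %d omap keys" % len(keys))   # 1000 per the attached list
    finally:
        ioctx.close()
        cluster.shutdown()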


Key Point #3 above talks about OMAP entries getting recognized just by restarting an OSD.
Test logs -
Pacific - http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-WKW5AV/Omap_creations_on_objects_0.log
Quincy - http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1Q1T5A/Omap_creations_on_objects_0.log

The same tests are now failing, as an OSD restart alone is no longer sufficient to refresh the OMAP stats.

'ceph df detail' stats for the pool after writing OMAP entries and deep-scrub -
{
            "name": "re_pool_3",
            "id": 6,
            "stats": {
                "stored": 0,
                "stored_data": 0,
                "stored_omap": 0,
                "objects": 20,
                "kb_used": 0,
                "bytes_used": 0,
                "data_bytes_used": 0,
                "omap_bytes_used": 0,
                "percent_used": 0,
                "max_avail": 178747604992,
                "quota_objects": 0,
                "quota_bytes": 0,
                "dirty": 0,
                "rd": 0,
                "rd_bytes": 0,
                "wr": 20,
                "wr_bytes": 491520,
                "compress_bytes_used": 0,
                "compress_under_bytes": 0,
                "stored_raw": 0,
                "avail_raw": 509430663513
            }
}

'ceph df detail' stats for the pool after writing OMAP entries, deep-scrub, and OSD restart -
{
            "name": "re_pool_3",
            "id": 6,
            "stats": {
                "stored": 26606,
                "stored_data": 0,
                "stored_omap": 26606,
                "objects": 20,
                "kb_used": 78,
                "bytes_used": 79818,
                "data_bytes_used": 0,
                "omap_bytes_used": 79818,
                "percent_used": 1.5668155128878425e-07,
                "max_avail": 169809379328,
                "quota_objects": 0,
                "quota_bytes": 0,
                "dirty": 0,
                "rd": 0,
                "rd_bytes": 0,
                "wr": 20,
                "wr_bytes": 491520,
                "compress_bytes_used": 0,
                "compress_under_bytes": 0,
                "stored_raw": 79818,
                "avail_raw": 509428123993
            }
}

The attachments contain the stdout of 'ceph df detail' and 'ceph pg dump' after every step of the BZ reproduction.
Cluster logs are also attached.

Comment 1 RHEL Program Management 2023-05-26 12:37:36 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

