Bug 1844720
| Summary: | [RGW] Buckets/objects deletion is causing orphan rados objects | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Vikhyat Umrao <vumrao> |
| Component: | RGW | Assignee: | J. Eric Ivancich <ivancich> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Rachana Patel <racpatel> |
| Severity: | high | Docs Contact: | Karen Norteman <knortema> |
| Priority: | high | | |
| Version: | 4.0 | CC: | agunn, anharris, cbodley, ceph-eng-bugs, gsitlani, ivancich, jharriga, jmelvin, kbader, knortema, mbenjamin, mmuench, racpatel, sweil, tchandra, tserlin, twilkins, ukurundw, vereddy, vimishra |
| Target Milestone: | --- | | |
| Target Release: | 5.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Known Issue |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-11-16 18:03:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1816167 | | |

Doc Text:

.Deleting buckets or objects in the Ceph Object Gateway causes orphan RADOS objects
Deleting buckets or objects after the Ceph Object Gateway garbage collection (GC) has processed the GC queue causes large quantities of orphan RADOS objects. These RADOS objects are "leaked" data that belonged to the deleted buckets.
Over time, the number of orphan RADOS objects can fill the data pool and degrade the performance of the storage cluster.
To reclaim the space from these orphan RADOS objects, refer to the link:{object-gw-guide}#finding-orphan-and-leaky-objects_rgw[_Finding orphan and leaky objects_] section of the _{storage-product} Object Gateway Configuration and Administration Guide_.
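
The linked guide is the authoritative reference for the cleanup. As a rough sketch only, assuming the rgw-orphan-list tool that we understand the linked procedure to be built around, and using this cluster's data pool name, the workflow amounts to generating a list of suspected orphans and then removing the confirmed ones:

    # Sketch only -- follow the linked guide for the exact procedure.
    # rgw-orphan-list scans a data pool and writes suspected orphan object
    # names to a timestamped output file in the current directory.
    rgw-orphan-list default.rgw.buckets.data

    # After reviewing that list, remove the confirmed orphans; ORPHAN_LIST
    # stands in for whatever output file the tool produced.
    while read -r obj; do
        rados -p default.rgw.buckets.data rm "$obj"
    done < "$ORPHAN_LIST"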
Description
Vikhyat Umrao
2020-06-06 15:19:33 UTC
During the upgrade, the daemons on one OSD node, along with the mons and mgrs, have been upgraded so far:
[root@f09-h17-b05-5039ms ~]# ceph versions
{
"mon": {
"ceph version 14.2.8-50.el7cp (53387608e81e6aa2487c952a604db06faa5b2cd0) nautilus (stable)": 3
},
"mgr": {
"ceph version 14.2.8-50.el7cp (53387608e81e6aa2487c952a604db06faa5b2cd0) nautilus (stable)": 3
},
"osd": {
"ceph version 14.2.4-51.el7cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)": 264,
"ceph version 14.2.8-50.el7cp (53387608e81e6aa2487c952a604db06faa5b2cd0) nautilus (stable)": 24
},
"mds": {},
"rgw": {
"ceph version 14.2.4-51.el7cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)": 11,
"ceph version 14.2.8-50.el7cp (53387608e81e6aa2487c952a604db06faa5b2cd0) nautilus (stable)": 1
},
"overall": {
"ceph version 14.2.4-51.el7cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)": 275,
"ceph version 14.2.8-50.el7cp (53387608e81e6aa2487c952a604db06faa5b2cd0) nautilus (stable)": 31
}
}
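
To track how far the upgrade has progressed without reading the whole dump, the per-version counts can be pulled out of the same JSON (a small convenience, assuming jq is available on the node):

    # Summarize how many daemons run each Ceph version overall.
    ceph versions | jq '.overall'

    # Or look at the OSDs specifically to see how many are still on the old build.
    ceph versions | jq '.osd'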
We have captured a listing of the rados data pool:

# rados -p default.rgw.buckets.data ls > rados.list.txt
# du -sh rados.list.txt
4.2G    rados.list.txt
[root@f09-h17-b05-5039ms ~]# cat rados.list.txt | wc -l
50004910
[root@f09-h17-b05-5039ms ~]# cat rados.list.txt | grep shadow | wc -l
50004910
[root@f09-h17-b05-5039ms ~]#

The above confirms that all 50M objects are shadow objects.

Here's some preliminary analysis.
The number of orphans listed in /root/rados.list.txt is 50,004,910.
All orphans are "shadow" objects.
It looks like all those objects came from 5 buckets. Here is the result of lopping off everything from "_shadow" onwards, sorting what is left, and running through "uniq -c" (a pipeline that reproduces this tally is sketched just after the counts).
16978508 987371de-e3d9-45cf-b9b8-3c1a19cabd59.11841.1_
3675110 987371de-e3d9-45cf-b9b8-3c1a19cabd59.11856.1_
6175225 987371de-e3d9-45cf-b9b8-3c1a19cabd59.11862.1_
6151689 987371de-e3d9-45cf-b9b8-3c1a19cabd59.21284.1_
17024378 987371de-e3d9-45cf-b9b8-3c1a19cabd59.21287.1_
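
For reference, a one-liner along these lines reproduces the per-bucket tally above; it assumes rados.list.txt is the listing captured earlier and simply trims each shadow object name back to its bucket marker prefix:

    # Keep only the shadow objects, strip everything from "_shadow" onward,
    # then count how many objects share each bucket marker prefix.
    grep shadow rados.list.txt | sed 's/_shadow.*$//' | sort | uniq -c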
In the narratives above, 5 buckets/containers are mentioned and three are mentioned by name -- mycontainers3, mycontainers5, and mycontainers6 (sometimes without the "s" -- mycontainer6).
Some questions...
1. How many buckets were there over the life of this cluster? If only 5 why is there a bucket named "mycontainers6"?
2. Would it be fair to say that we do not know at this point whether this is an issue with:
a) 4.0,
b) 4.1, or
c) the upgrade from 4.0 to 4.1 while the workload is running?
If that's not fair, what above answers the question?
Eric
(In reply to J. Eric Ivancich from comment #20)

Thanks Eric. Responses inline.

> Here's some preliminary analysis.
>
> The number of orphans listed in /root/rados.list.txt is 50,004,910.
>
> All orphans are "shadow" objects.
>
> It looks like all those objects came from 5 buckets. Here is the result of
> lopping off everything from "_shadow" onwards, sorting what is left, and
> running through "uniq -c".
>
> 16978508 987371de-e3d9-45cf-b9b8-3c1a19cabd59.11841.1_
> 3675110 987371de-e3d9-45cf-b9b8-3c1a19cabd59.11856.1_
> 6175225 987371de-e3d9-45cf-b9b8-3c1a19cabd59.11862.1_
> 6151689 987371de-e3d9-45cf-b9b8-3c1a19cabd59.21284.1_
> 17024378 987371de-e3d9-45cf-b9b8-3c1a19cabd59.21287.1_
>
> In the narratives above, 5 buckets/containers are mentioned and three are
> mentioned by name -- mycontainers3, mycontainers5, and mycontainers6
> (sometimes without the "s" -- mycontainer6).
>
> Some questions...
>
> 1. How many buckets were there over the life of this cluster? If only 5 why
> is there a bucket named "mycontainers6"?

Yes. mycontainers6 is from the RHCS 4.1 cluster on which Rachana reproduced the issue; the details are given in comment#15. Everything before comment#15 is from the RHCS 4.0 cluster, which had 5 containers, mycontainers1 through mycontainers5.

> 2. Would it be fair to say that we do not know at this point whether this is
> an issue with:
> a) 4.0,
> b) 4.1, or
> c) the upgrade from 4.0 to 4.1 while the workload is running?
>
> If that's not fair, what above answers the question?

It is reproducible in both RHCS 4.0 and RHCS 4.1; comment#15 covers the RHCS 4.1 mycontainers6 bucket.

-- vikhyat

Thank you, Vikhyat, for clarifying.

I believe I've reproduced the issue on 4.1 with a much more simple test case (i.e., no cosbench). And that will allow me to trace to see what's going on. I'll keep all of you posted.

Eric

(In reply to J. Eric Ivancich from comment #22)
> Thank you, Vikhyat, for clarifying.
>
> I believe I've reproduced the issue on 4.1 with a much more simple test case
> (i.e., no cosbench). And that will allow me to trace to see what's going on.
> I'll keep all of you posted.
>
> Eric

Thank you, Eric.

I apologize. What I thought was a reproducer is not reproducing the issue. Back to the drawing board....

Eric

Thank you, Thomas, for getting the build out so quickly!

Eric

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days