Bug 2227689 - [5.3 backport] [rgw][rfe]: Object reindex tool should recover the index for 'versioned' buckets.
Summary: [5.3 backport] [rgw][rfe]: Object reindex tool should recover the index for 'versioned' buckets.
Keywords:
Status: CLOSED DUPLICATE of bug 2224636
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 5.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 5.3z5
Assignee: Matt Benjamin (redhat)
QA Contact: Madhavi Kasturi
URL:
Whiteboard:
Depends On: 2182385
Blocks:
 
Reported: 2023-07-31 06:36 UTC by Bipin Kunal
Modified: 2023-12-07 04:25 UTC
CC List: 17 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of: 2182385
Environment:
Last Closed: 2023-08-04 06:44:22 UTC
Embargoed:




Links:
Red Hat Issue Tracker RHCEPH-7107 (last updated 2023-07-31 06:37:25 UTC)

Description Bipin Kunal 2023-07-31 06:36:12 UTC
+++ This bug was initially created as a clone of Bug #2182385 +++

Description of problem:

If a versioned bucket created with num_shards 0 is resharded, the metadata is lost.
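
To make the failure mode concrete, here is a minimal sketch of the sequence, assuming a placeholder endpoint, bucket, and object, and a bucket that ended up with num_shards 0 (e.g. via the zone defaults):

~~~
# Hypothetical illustration only; endpoint, bucket, and object names are placeholders,
# and AWS credentials for an RGW user are assumed to be configured.
aws --endpoint-url http://rgw.example.com:8080 s3 mb s3://versioned-bucket
aws --endpoint-url http://rgw.example.com:8080 s3api put-bucket-versioning \
    --bucket versioned-bucket --versioning-configuration Status=Enabled
aws --endpoint-url http://rgw.example.com:8080 s3 cp ./object.bin s3://versioned-bucket/object.bin

# Resharding is the step after which the index entries go missing for affected buckets.
radosgw-admin bucket reshard --bucket=versioned-bucket --num-shards=11

# A subsequent listing comes back empty even though the object data is still in the data pool.
aws --endpoint-url http://rgw.example.com:8080 s3 ls s3://versioned-bucket
~~~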

The rgw-restore-bucket-index tool helps in recovering the index, but it does not yet support versioned buckets.
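
For reference, a minimal sketch of how the existing tool is driven; the bucket name is a placeholder, and the exact arguments should be confirmed against the usage text of the script shipped with the installed release:

~~~
# Sketch only: confirm the accepted arguments against the installed script's usage output.
# Run on a node with an admin keyring; the bucket name is a placeholder.
rgw-restore-bucket-index versioned-bucket

# Afterwards, compare the recovered listing with what the application expects,
# e.g. via a plain S3 listing of the bucket.
aws --endpoint-url http://rgw.example.com:8080 s3 ls s3://versioned-bucket
~~~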


See https://bugzilla.redhat.com/show_bug.cgi?id=2174235#c49 for background.

Version-Release number of selected component (if applicable):


ceph-5.3 latest

How reproducible:

Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Matt Benjamin (redhat) on 2023-03-28 19:33:48 IST ---

Hi Eric,

When you become available, could you review options for tackling index recovery for versioned buckets?

thanks!

Matt

--- Additional comment from Manny on 2023-03-28 20:21:44 IST ---

Raised Severity to Urgent and also linked SFDC #03461245.

BR
Manny

--- Additional comment from Manny on 2023-03-29 22:25:00 IST ---

Hello @mbenjamin and @ivancich ,

I need to update this customer.

1.  Do we know if this recovery is even possible?
2.  If yes, when will it be done?

Please let me know,
Best regards,
Manny

--- Additional comment from Matt Benjamin (redhat) on 2023-03-29 22:29:21 IST ---

(In reply to Manny from comment #3)
> Hello @mbenjamin and @ivancich ,
> 
> I need to update this customer.
> 
> 1.  Do we know if this recovery is even possible?
> 2.  If yes, when will it be done?
> 
> Please let me know,
> Best regards,
> Manny

Hi Manny,

1. yes, it is possible
2. It will be done in some number of days, speculatively, in an upstream form that could be offered as a test fix only. Validation by QE will take further time.

I recall seeing yesterday, somewhere else (and I think it should be in this bz, if you're asking here for info to update the customer), that Universite de Strasbourg had some 2700 NON-versioned buckets and a much smaller number (less than 30?) of versioned buckets.

Matt

--- Additional comment from Manny on 2023-03-29 22:56:35 IST ---

Hello again @mbenjamin ,

The customer provided this excellent summary. In case you don't want to read it all: only 4 versioned buckets lost their metadata.

Best regards,
Manny


Customer's Summary:
~~~
From bucket_list3.txt we generated a bucket_list3.json which contains the metadata of all buckets. Please find the file attached and also bucket_list2.json file based on bucket_list2.txt (when we used radosgw-admin version 14).

We compared the metadata of some buckets and saw that all buckets that were returning an Input/Output error with the previous version of the radosgw-admin command contain .data.bucket_info.layout.current_index.gen = 1:
[root@ceph1-utils case03461245]# jq -r '.[] | select(.data.bucket_info.layout.current_index.gen==1) | .data.bucket_info.bucket.name' /tmp/bucket_list3.json | sort | wc -l
1461

Thus, we assume that the indexes of all buckets with data.bucket_info.layout.current_index.gen = 1 were lost -> 1461 buckets.


Apparently, we can see that 13 buckets have versioning enabled:
[root@ceph1-utils case03461245]# jq '.[] | "\(.data.bucket_info.bucket.name): \(.data.bucket_info.flags)"' /tmp/bucket_list.json | grep -v ": 0"
"sertit-test: 2"
"sertit-rms: 2"
"dnum-ics-iaas-dev-projects: 2"
"s3_sifac_test_FDV: 2"
"medfilm: 6"
"ahversion: 2"
"terraform-fr-cfe-uca-pcscol_sandbox: 2"
"dnum-ics-iaas-projects: 2"
"terraform-eu-sxb-1-pcscolx: 2"
"terraform-eu-sxb-1-pcscol_sandbox: 2"
"goberle: 2"
"spin-1b5946b5-cf0e-4ac1-9d5d-07de8992b2b5: 2"
"dnum-pci-terraform-state: 2"


The list of these 13 buckets contains the 9 buckets you found with your previous command (buckets which do not have index issues).
Thus, we can conclude that only 4 buckets with versioning enabled are impacted by the issue :
[root@ceph1-utils case03461245]# jq '.[] | "\(.data.bucket_info.bucket.name): \(.data.bucket_info.flags): \(.data.bucket_info.layout.current_index.gen)"' /tmp/bucket_list3.json | grep -v ": 0"
"terraform-fr-cfe-uca-pcscol_sandbox: 2: 1"
"terraform-eu-sxb-1-pcscolx: 2: 1"
"terraform-eu-sxb-1-pcscol_sandbox: 2: 1"
"spin-1b5946b5-cf0e-4ac1-9d5d-07de8992b2b5: 2: 1"
~~~
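
For anyone repeating this analysis, the two filters above can be combined into a single jq select over the same metadata dump (same file path and field names as in the customer's commands):

~~~
# Versioned buckets (flags != 0) whose current index generation is 1, i.e. the impacted ones.
jq -r '.[] | select(.data.bucket_info.flags != 0 and .data.bucket_info.layout.current_index.gen == 1)
           | .data.bucket_info.bucket.name' /tmp/bucket_list3.json
~~~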

--- Additional comment from Matt Benjamin (redhat) on 2023-03-29 23:00:28 IST ---

Thanks.  Why is needinfo on me being requested again?

Matt

--- Additional comment from Manny on 2023-03-30 01:38:13 IST ---

Because we need to move to Jira where I can mention you and it doesn't generate a `needs info`. lol, sorry

BR
Manny

--- Additional comment from Manny on 2023-04-18 08:34:16 IST ---

Hello @ivancich and @mbenjamin ,

On Wednesday, it will be 3 weeks. Is there any update? Is there an ETA for when the script will be ready?

Please let us know,
Best regards,
Manny

--- Additional comment from Manny on 2023-05-05 21:09:20 IST ---

Hello @vumrao ,

This is the BZ we spoke about yesterday.

Best regards,
Manny

--- Additional comment from Vikhyat Umrao on 2023-05-05 21:15:12 IST ---

Manny, Matt confirmed in the ENG/Support sync meeting doc (I have added you there) that the prototype of this tool was completed yesterday.

--- Additional comment from Manny on 2023-05-05 23:46:36 IST ---

@vumrao ,

OK, good data point, thank you. I'll just say that there is still no ETA

Also, I'm on PTO, starting 13-May-2023 and ending on 22-May-2023

During that time, he may reach out. He's the actual case owner.

BR
Manny

--- Additional comment from Vikhyat Umrao on 2023-05-06 01:11:49 IST ---

(In reply to Manny from comment #11)
> @vumrao ,
> 
> OK, good data point, thank you. I'll just say that there is still no ETA

Yes, and I think it might be a couple of weeks, Matt - any ETA we can share with the support team?
> 
> Also, I'm on PTO, starting 13-May-2023 and ending on 22-May-2023
> 
> During that time, he may reach out. He's the actual case owner.
> 
> BR
> Manny

--- Additional comment from J. Eric Ivancich on 2023-05-10 19:53:52 IST ---

Update:

I tried to use the CLS calls that would normally be called when writing objects to versioned and non-versioned buckets, but artifacts were left behind related to multi-site, namely log fields. So I'm backtracking and using more basic calls to write to the bucket index directly to avoid the logic associated with other aspects of RGW. I should have the first prototype of that completed today.
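
For context, a couple of generic ways to inspect what ends up in a bucket's index shards while testing such a tool; these commands are not part of the prototype, and the bucket and pool names are placeholders assuming a default zone:

~~~
# Dump the decoded bucket index entries that RGW sees for the bucket.
radosgw-admin bi list --bucket=versioned-bucket | head -40

# The raw index shard objects live in the index pool as .dir.<bucket-marker>... objects,
# and the individual entries are stored as omap keys on those objects.
rados -p default.rgw.buckets.index ls | grep '^\.dir\.'
rados -p default.rgw.buckets.index listomapkeys <index-shard-object-from-previous-command>
~~~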

Eric

--- Additional comment from Ranjini M N on 2023-05-22 14:54:04 IST ---

Hi Eric, 

Can you please let me know if you would like to include this BZ as a known issue for RHCS 5.3z3 release notes? 

If so, can you please provide the doc text? 

Thanks
Ranjini M N

--- Additional comment from Lijo Stephen Thomas on 2023-06-09 15:19:38 IST ---

Hello Eric,
 
Do we have any update on the progress for the recovery tool for the versioned bucket ?

Regards,
Lijo

--- Additional comment from Lijo Stephen Thomas on 2023-06-29 07:23:09 IST ---

Hello Eric,

Revisiting the bug to check on the progress of the recovery tool for versioned buckets. Can you let me know if we have any update here for the customer?


Regards,
Lijo

--- Additional comment from J. Eric Ivancich on 2023-07-07 00:40:17 IST ---

Added 5 commits to ceph-6.1-rhel-patches.

--- Additional comment from errata-xmlrpc on 2023-07-11 09:41:04 IST ---

This bug has been added to advisory RHBA-2023:116292 by Thomas Serlin (tserlin)

--- Additional comment from errata-xmlrpc on 2023-07-11 09:41:06 IST ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2023:116292-01
https://errata.devel.redhat.com/advisory/116292

--- Additional comment from Akash Raj on 2023-07-13 23:40:01 IST ---

Hi Eric.

Please provide the doc type and doc text.

Thanks.

--- Additional comment from J. Eric Ivancich on 2023-07-18 00:28:36 IST ---

(In reply to Akash Raj from comment #20)
> Hi Eric.
> 
> Please provide the doc type and doc text.
> 
> Thanks.

Done; thanks!

--- Additional comment from Lijo Stephen Thomas on 2023-07-18 06:54:51 IST ---

Hello Matt / Eric,

I see the fix is planned for 6.1. Are there any plans to get this backported to 5.3.z, or will the customer have to upgrade to 6.1 to get this recovery tool and recover the objects?


Regards,
Lijo

--- Additional comment from J. Eric Ivancich on 2023-07-18 23:50:51 IST ---

I'm happy to see how complicated a backport to 5.3.z5 would be. I understand 5.3.z4 is going GA tomorrow.

--- Additional comment from Manny on 2023-07-29 21:12:07 IST ---

Hello @ivancich and @mbenjamin and @bkunal @linuxkidd @vumrao ,

Sorry to spam so many of you with `NeedInfo`, but this customer has been waiting for so long, and they are smart Ceph customers. If we can pass them the proper PR link, they can install the tool, fix their 4 buckets, and we can close our case. In fact, that's what they did for the original tool for non-versioned buckets: they got the tool out of a PR once we knew it worked.

I'd like to NOT make them wait for Ceph Common to get released.

Please let us know, thank you

Best regards,
Manny

--- Additional comment from J. Eric Ivancich on 2023-07-30 18:22:23 IST ---

Hi Manny,

Here's the PR link:
    https://github.com/ceph/ceph/pull/51071

There are 3 issues I see. It currently needs a rebase. The backports were not trivial. And the fix has not gone through downstream QA yet.

I can handle the rebase on Monday. But the other issues seem to be challenges.

Eric

--- Additional comment from Manny on 2023-07-30 21:50:53 IST ---

(In reply to J. Eric Ivancich from comment #25)
> Hi Manny,
> 
> Here's the PR link:
>     https://github.com/ceph/ceph/pull/51071
> 
> There are 3 issues I see. It currently needs a rebase. The backports were
> not trivial. And the fix has not gone through downstream QA yet.
> 
> I can handle the rebase on Monday. But the other issues seem to be
> challenges.
> 
> Eric

Thanks, Eric! This sounds way more done than not. Please let us know when we can share this PR link with the customer. Also, if you are so inclined, you can ask Thomas and/or Ken to make a Ceph Common build for them.

Thanks again,
Best regards,
Manny

Comment 2 Bipin Kunal 2023-08-04 06:44:22 UTC

*** This bug has been marked as a duplicate of bug 2224636 ***

Comment 3 Red Hat Bugzilla 2023-12-07 04:25:59 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

