Bug 2227842 - [rgw-ms][object-encryption][multipart upload]: TOOLING: On multisite setup where SSE-S3 configured, there is md5sum mismatch while downloading multipart object.
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW-Multisite
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 6.1z2
Assignee: Casey Bodley
QA Contact: Vidushi Mishra
Docs Contact: Akash Raj
Depends On: 2162337
Blocks: 2192813 2214252 2221020 2228203 2235257
 
Reported: 2023-07-31 16:18 UTC by Matt Benjamin (redhat)
Modified: 2024-03-03 04:25 UTC
CC: 17 users

Fixed In Version: ceph-17.2.6-139.el9cp
Doc Type: Release Note
Doc Text:
.Original multipart uploads can now be identified in multi-site configurations
Previously, a data corruption bug fixed in 6.1z1 affected multipart uploads with server-side encryption in multi-site configurations. With this enhancement, a new tool, `radosgw-admin bucket resync encrypted multipart`, can be used to identify those original multipart uploads. The `LastModified` timestamp of any identified object is incremented by 1 ns to cause peer zones to replicate it again. For multi-site deployments that make any use of server-side encryption, it is recommended that users run this command against every bucket in every zone after all zones have upgraded.
Clone Of: 2162337
Clones: 2228203
Last Closed: 2023-10-12 16:34:33 UTC




Links:
  Red Hat Issue Tracker RHCEPH-7117 (last updated 2023-07-31 16:19:44 UTC)
  Red Hat Knowledge Base (Solution) 7019437 (last updated 2023-08-03 02:54:24 UTC)
  Red Hat Product Errata RHSA-2023:5693 (last updated 2023-10-12 16:35:31 UTC)

Description Matt Benjamin (redhat) 2023-07-31 16:18:04 UTC
+++ This bug was initially created as a clone of Bug #2162337 +++

**
This bz tracks new tooling to re-replicate compressed/encrypted objects potentially affected by offset corruption.

**


Description of problem:
On a multisite setup where SSE-S3 is configured, there is an md5sum mismatch when downloading a multipart object.

Version-Release number of selected component (if applicable):
17.2.5-63.el9cp

How reproducible:
[2/2]

Steps to Reproduce:
1. Deploy a 6.0 multisite setup
2. Configure encryption by setting SSE-S3 values in Ceph on both sites (see the configuration sketch after these steps)
3. Create a bucket and upload a multipart object with the --server-side-encryption parameter from site 1
4. Wait for the sync to complete
5. Download the same object from site 2
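
For step 2, a minimal sketch of the kind of SSE-S3 configuration involved, run on each site (the Vault transit backend and the placeholder address/token file below are assumptions for illustration, not the exact setup used in this report):

# hedged sketch: SSE-S3 via a Vault transit engine; values are placeholders
ceph config set client.rgw rgw_crypt_sse_s3_backend vault
ceph config set client.rgw rgw_crypt_sse_s3_vault_secret_engine transit
ceph config set client.rgw rgw_crypt_sse_s3_vault_addr http://vault.example.com:8200
ceph config set client.rgw rgw_crypt_sse_s3_vault_auth token
ceph config set client.rgw rgw_crypt_sse_s3_vault_token_file /etc/ceph/vault.token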

Actual results:
Downloading the multipart object produces a WARNING message.

WARNING: MD5 signatures do not match: computed=91396133da891e8548a25de732267f08, received=0a5fa92c9bec8158d53f393db954060c

Expected results:
The MD5 signature should match after downloading the object

Additional info:
Below are the command outputs from both sites

Site 1:
[root@extensa008 ~]# md5sum site1\ logs.log 
0a5fa92c9bec8158d53f393db954060c  site1 logs.log
[root@extensa008 ~]# /usr/local/bin/s3cmd put site1\ logs.log s3://ud-tst/new_o2 --server-side-encryption
upload: 'site1 logs.log' -> 's3://ud-tst/new_o2'  [part 1 of 3, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    12.98 MB/s  done
upload: 'site1 logs.log' -> 's3://ud-tst/new_o2'  [part 2 of 3, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    13.06 MB/s  done
upload: 'site1 logs.log' -> 's3://ud-tst/new_o2'  [part 3 of 3, 4MB] [1 of 1]
 4590426 of 4590426   100% in    0s    10.95 MB/s  done
[root@extensa008 ~]# /usr/local/bin/s3cmd info s3://ud-tst/new_o2
s3://ud-tst/new_o2 (object):
   File size: 15076186
   Last mod:  Thu, 19 Jan 2023 08:00:25 GMT
   MIME type: text/plain
   Storage:   STANDARD
   MD5 sum:   0a5fa92c9bec8158d53f393db954060c
   SSE:       AES256
   Policy:    none
   CORS:      none
   ACL:       ud: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1674115130/ctime:1674115095/gid:0/gname:root/md5:0a5fa92c9bec8158d53f393db954060c/mode:33188/mtime:1674115095/uid:0/uname:root




Site 2:
[root@magna061 ~]# /usr/local/bin/s3cmd info s3://ud-tst/new_o2
s3://ud-tst/new_o2 (object):
   File size: 15076186
   Last mod:  Thu, 19 Jan 2023 08:00:25 GMT
   MIME type: text/plain
   Storage:   STANDARD
   MD5 sum:   0a5fa92c9bec8158d53f393db954060c
   SSE:       AES256
   Policy:    none
   CORS:      none
   ACL:       ud: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1674115130/ctime:1674115095/gid:0/gname:root/md5:0a5fa92c9bec8158d53f393db954060c/mode:33188/mtime:1674115095/uid:0/uname:root
[root@magna061 ~]# /usr/local/bin/s3cmd get s3://ud-tst/new_o2 site1\ logs.log
download: 's3://ud-tst/new_o2' -> 'site1 logs.log'  [1 of 1]
 15076186 of 15076186   100% in    2s     6.68 MB/s  done
WARNING: MD5 signatures do not match: computed=91396133da891e8548a25de732267f08, received=0a5fa92c9bec8158d53f393db954060c
[root@magna061 ~]# wc -l site1\ logs.log 
221880 site1 logs.log
[root@magna061 ~]# md5sum site1\ logs.log 
91396133da891e8548a25de732267f08  site1 logs.log

--- Additional comment from RHEL Program Management on 2023-01-20 06:17:24 UTC ---

This bug report has Keywords: Regression or TestBlocker.

Since no regressions or test blockers are allowed between releases, it is being proposed as a blocker for this release.

Please resolve/triage ASAP.

--- Additional comment from Madhavi Kasturi on 2023-01-20 06:57:42 UTC ---

Marking this BZ as a blocker, as the download of the object indicates corruption (md5sum mismatch)

--- Additional comment from Uday kurundwade on 2023-01-20 07:21:25 UTC ---

Additional info:

We tried uploading a big zip file from site 1 and downloaded the same file on site 2.
Once the download completed, we unzipped the .zip file to cross-check for data corruption and encountered the error "central file header signature not found".


Providing command outputs below from both sites

Site 1:

[root@extensa008 ud]# md5sum 0.4.2.c4.zip 
bbc61b47bb298688cc461c39847d335e  0.4.2.c4.zip
[root@extensa008 ud]# /usr/local/bin/s3cmd put 0.4.2.c4.zip s3://ud-tst/0.4.2.c4.zip --server-side-encryption
upload: '0.4.2.c4.zip' -> 's3://ud-tst/0.4.2.c4.zip'  [part 1 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    13.12 MB/s  done
upload: '0.4.2.c4.zip' -> 's3://ud-tst/0.4.2.c4.zip'  [part 2 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    11.12 MB/s  done
upload: '0.4.2.c4.zip' -> 's3://ud-tst/0.4.2.c4.zip'  [part 3 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    11.32 MB/s  done
upload: '0.4.2.c4.zip' -> 's3://ud-tst/0.4.2.c4.zip'  [part 4 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    13.05 MB/s  done
upload: '0.4.2.c4.zip' -> 's3://ud-tst/0.4.2.c4.zip'  [part 5 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    12.01 MB/s  done
upload: '0.4.2.c4.zip' -> 's3://ud-tst/0.4.2.c4.zip'  [part 6 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    12.50 MB/s  done
upload: '0.4.2.c4.zip' -> 's3://ud-tst/0.4.2.c4.zip'  [part 7 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    12.25 MB/s  done
upload: '0.4.2.c4.zip' -> 's3://ud-tst/0.4.2.c4.zip'  [part 8 of 8, 2MB] [1 of 1]
 2750436 of 2750436   100% in    0s    11.67 MB/s  done
[root@extensa008 ud]# /usr/local/bin/s3cmd info s3://ud-tst/0.4.2.c4.zip
s3://ud-tst/0.4.2.c4.zip (object):
   File size: 39450596
   Last mod:  Fri, 20 Jan 2023 07:01:40 GMT
   MIME type: application/zip
   Storage:   STANDARD
   MD5 sum:   bbc61b47bb298688cc461c39847d335e
   SSE:       AES256
   Policy:    none
   CORS:      none
   ACL:       ud: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1674198037/ctime:1674198037/gid:0/gname:root/md5:bbc61b47bb298688cc461c39847d335e/mode:33188/mtime:1638950136/uid:0/uname:root
[root@extensa008 ud]# md5sum 0.4.2.c4.zip 
bbc61b47bb298688cc461c39847d335e  0.4.2.c4.zip
[root@extensa008 ud]# /usr/local/bin/s3cmd get s3://ud-tst/0.4.2.c4.zip 0.4.2.c4_1.zip
download: 's3://ud-tst/0.4.2.c4.zip' -> '0.4.2.c4_1.zip'  [1 of 1]
 39450596 of 39450596   100% in    0s    88.95 MB/s  done
[root@extensa008 ud]# ls
0.4.2.c4_1.zip  0.4.2.c4.zip
[root@extensa008 ud]# md5sum 0.4.2.c4*
bbc61b47bb298688cc461c39847d335e  0.4.2.c4_1.zip
bbc61b47bb298688cc461c39847d335e  0.4.2.c4.zip



Site 2:


[root@magna061 ud]# /usr/local/bin/s3cmd info s3://ud-tst/0.4.2.c4.zip
s3://ud-tst/0.4.2.c4.zip (object):
   File size: 39450596
   Last mod:  Fri, 20 Jan 2023 07:01:40 GMT
   MIME type: application/zip
   Storage:   STANDARD
   MD5 sum:   bbc61b47bb298688cc461c39847d335e
   SSE:       AES256
   Policy:    none
   CORS:      none
   ACL:       ud: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1674198037/ctime:1674198037/gid:0/gname:root/md5:bbc61b47bb298688cc461c39847d335e/mode:33188/mtime:1638950136/uid:0/uname:root
[root@magna061 ud]# /usr/local/bin/s3cmd get s3://ud-tst/0.4.2.c4.zip 0.4.2.c4_1.zip
download: 's3://ud-tst/0.4.2.c4.zip' -> '0.4.2.c4_1.zip'  [1 of 1]
 39450596 of 39450596   100% in    5s     6.69 MB/s  done
WARNING: MD5 signatures do not match: computed=b2dd42385c5645445db64cc4c6c5a8dd, received=bbc61b47bb298688cc461c39847d335e
[root@magna061 ud]# ls
0.4.2.c4_1.zip
[root@magna061 ud]# md5sum 0.4.2.c4_1.zip 
b2dd42385c5645445db64cc4c6c5a8dd  0.4.2.c4_1.zip
[root@magna061 ud]# unzip 0.4.2.c4_1.zip 
Archive:  0.4.2.c4_1.zip
error:  expected central file header signature not found (file #63).
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
   creating: 0.4.2.c4/
  inflating: 0.4.2.c4/3rd-party-licenses.pdf  
   creating: 0.4.2.c4/archive/
  inflating: 0.4.2.c4/BUILD.md       
 extracting: 0.4.2.c4/BUILD.no       
  inflating: 0.4.2.c4/CHANGELOG      
  inflating: 0.4.2.c4/cli.sh         
   creating: 0.4.2.c4/conf/
   creating: 0.4.2.c4/conf/.controller/
  inflating: 0.4.2.c4/conf/.controller/config.ini  
   creating: 0.4.2.c4/conf/.driver/
  inflating: 0.4.2.c4/conf/.driver/config.ini  
  inflating: 0.4.2.c4/conf/ampli-config-sample.xml  
  inflating: 0.4.2.c4/conf/cdmi-base-config-sample.xml  
  inflating: 0.4.2.c4/conf/cdmi-swift-config-sample.xml  
  inflating: 0.4.2.c4/conf/controller-tomcat-server.xml  
  inflating: 0.4.2.c4/conf/controller.conf  
  inflating: 0.4.2.c4/conf/cosbench-users.xml  
  inflating: 0.4.2.c4/conf/delay-stage-config-sample.xml  
  inflating: 0.4.2.c4/conf/driver-tomcat-server.xml  
  inflating: 0.4.2.c4/conf/driver-tomcat-server2.xml  
  inflating: 0.4.2.c4/conf/driver-tomcat-server_template.xml  
 extracting: 0.4.2.c4/conf/driver.conf  
 extracting: 0.4.2.c4/conf/driver_template.conf  
  inflating: 0.4.2.c4/conf/filewriter-config-explanation.txt  
  inflating: 0.4.2.c4/conf/gcs-config-sample.xml  
  inflating: 0.4.2.c4/conf/gcs-service-account-sample.json  
  inflating: 0.4.2.c4/conf/hashcheck.xml  
  inflating: 0.4.2.c4/conf/librados-config-sample.xml  
  inflating: 0.4.2.c4/conf/librados-sample-annotated.xml  
  inflating: 0.4.2.c4/conf/noop-config.xml  
  inflating: 0.4.2.c4/conf/noop-read-config.xml  
  inflating: 0.4.2.c4/conf/noop-write-config.xml  
  inflating: 0.4.2.c4/conf/openio-config-sample.xml  
  inflating: 0.4.2.c4/conf/reusedata.xml  
  inflating: 0.4.2.c4/conf/s3-config-sample.xml  
  inflating: 0.4.2.c4/conf/splitrw.xml  
  inflating: 0.4.2.c4/conf/sproxyd-config-sample.xml  
  inflating: 0.4.2.c4/conf/swift-config-sample.xml  
  inflating: 0.4.2.c4/conf/workload-config.xml  
  inflating: 0.4.2.c4/cosbench-start.sh  
  inflating: 0.4.2.c4/cosbench-stop.sh  
  inflating: 0.4.2.c4/COSBenchAdaptorDevGuide.pdf  
  inflating: 0.4.2.c4/COSBenchUserGuide.pdf  
   creating: 0.4.2.c4/ext/
   creating: 0.4.2.c4/ext/adaptor/
   creating: 0.4.2.c4/ext/adaptor/abc-auth/
  inflating: 0.4.2.c4/ext/adaptor/abc-auth/.classpath  
  inflating: 0.4.2.c4/ext/adaptor/abc-auth/.project  
   creating: 0.4.2.c4/ext/adaptor/abc-auth/.settings/
  inflating: 0.4.2.c4/ext/adaptor/abc-auth/.settings/org.eclipse.jdt.core.prefs  
   creating: 0.4.2.c4/ext/adaptor/abc-auth/bin/
   creating: 0.4.2.c4/ext/adaptor/abc-auth/bin/com/
   creating: 0.4.2.c4/ext/adaptor/abc-auth/bin/com/abc/
   creating: 0.4.2.c4/ext/adaptor/abc-auth/bin/com/abc/api/
   creating: 0.4.2.c4/ext/adaptor/abc-auth/bin/com/abc/api/abcAuth/
  inflating: 0.4.2.c4/ext/adaptor/abc-auth/bin/com/abc/api/abcAuth/AbcAuth.class  
  inflating: 0.4.2.c4/ext/adaptor/abc-auth/bin/com/abc/api/abcAuth/AbcAuthFactory.class  
   creating: 0.4.2.c4/ext/adaptor/abc-auth/bin/com/abc/client/
   creating: 0.4.2.c4/ext/adaptor/abc-auth/bin/com/abc/client/abcAuth/
  inflating: 0.4.2.c4/ext/adaptor/abc-auth/bin/com/abc/client/abcAuth/AbcAuthClient.class  
  inflating: 0.4.2.c4/ext/adaptor/abc-auth/bin/com/abc/client/abcAuth/AbcAuthClientException.class  




Adding setup details below:

Site 1: (root/r)
Admin node: magna007.ceph.redhat.com
s3cmd client node: extensa008.ceph.redhat.com


Site 2: (root/r)
Admin node: magna051.ceph.redhat.com
s3cmd client node: magna061.ceph.redhat.com


Thanks,
Uday

--- Additional comment from Uday kurundwade on 2023-01-20 07:24:12 UTC ---

Adding sync status details from both sites

Site 1:

[root@magna007 ubuntu]# radosgw-admin sync status
          realm 1b0f8d7a-498b-44ef-85d3-7ba21f198b8f (india)
      zonegroup 8c10bc9e-9c44-40be-96c3-773ba09db349 (states)
           zone c487201c-2b0d-4f90-93bc-38733a8b995f (delhi)
zonegroup features enabled: resharding
  metadata sync no sync (zone is master)
      data sync source: 57069ccc-931a-4bb2-a0b7-9dd3d8084249 (kank)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
[root@magna007 ubuntu]# radosgw-admin bucket sync status --bucket ud-tst
          realm 1b0f8d7a-498b-44ef-85d3-7ba21f198b8f (india)
      zonegroup 8c10bc9e-9c44-40be-96c3-773ba09db349 (states)
           zone c487201c-2b0d-4f90-93bc-38733a8b995f (delhi)
         bucket :ud-tst[c487201c-2b0d-4f90-93bc-38733a8b995f.166360.1])

    source zone 57069ccc-931a-4bb2-a0b7-9dd3d8084249 (kank)
  source bucket :ud-tst[c487201c-2b0d-4f90-93bc-38733a8b995f.166360.1])
                incremental sync on 11 shards
                bucket is caught up with source



Site 2:

[root@magna051 ubuntu]# radosgw-admin sync status
          realm 1b0f8d7a-498b-44ef-85d3-7ba21f198b8f (india)
      zonegroup 8c10bc9e-9c44-40be-96c3-773ba09db349 (states)
           zone 57069ccc-931a-4bb2-a0b7-9dd3d8084249 (kank)
zonegroup features enabled: resharding
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: c487201c-2b0d-4f90-93bc-38733a8b995f
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
[root@magna051 ubuntu]# radosgw-admin bucket sync status --bucket ud-tst
          realm 1b0f8d7a-498b-44ef-85d3-7ba21f198b8f (india)
      zonegroup 8c10bc9e-9c44-40be-96c3-773ba09db349 (states)
           zone 57069ccc-931a-4bb2-a0b7-9dd3d8084249 (kank)
         bucket :ud-tst[c487201c-2b0d-4f90-93bc-38733a8b995f.166360.1])

    source zone c487201c-2b0d-4f90-93bc-38733a8b995f (delhi)
  source bucket :ud-tst[c487201c-2b0d-4f90-93bc-38733a8b995f.166360.1])
                incremental sync on 11 shards
                bucket is caught up with source

--- Additional comment from Matt Benjamin (redhat) on 2023-01-20 14:29:57 UTC ---

Hi Folks,

1. It's confusing that, in all of the information provided, there isn't an apples-to-apples comparison of the behavior on site1 and site2.

It appears likely that the object is undamaged when downloaded from site1, but damaged when downloaded from site2.  Please confirm.


2. I assume this test was performed in an equivalent manner on the just-released rhcs-5.3, where it passed.  Please confirm.

Matt

--- Additional comment from Uday kurundwade on 2023-01-23 16:06:27 UTC ---

(In reply to Matt Benjamin (redhat) from comment #5)
> Hi Folks,
> 
> 1. It's confusing that, in all of the information provided, there isn't an
> apples-to-apples comparison of the behavior on site1 and site2.
> 
> It appears likely that the object is undamaged when downloaded from site1,
> but damaged when downloaded from site2.  Please confirm.

Yes... This is correct

> 
> 2. I assume this test was performed in an equivalent manner on the
> just-released rhcs-5.3, where it passed.  Please confirm.
> 

Since this feature was TP (Technology Preview) in 5.3, we have not fully tested it on a 5.3 multisite setup.
I tried the same thing on a recent 5.3 build (16.2.10-106.el8cp) and we are hitting the same issue on 5.3 as well

Thanks,
Uday

--- Additional comment from Casey Bodley on 2023-01-25 13:52:27 UTC ---

@Marcus: is there a per-bucket key being generated here? if so, multisite configurations would need to guarantee that both sites use the same per-bucket keys

--- Additional comment from Matt Benjamin (redhat) on 2023-01-25 14:00:00 UTC ---

(In reply to Casey Bodley from comment #7)
> @Marcus: is there a per-bucket key being generated here? if so, multisite
> configurations would need to guarantee that both sites use the same
> per-bucket keys

Hi Casey,

Would have been nice to explain the scenario you are thinking of in the meeting.

Matt

--- Additional comment from Casey Bodley on 2023-01-25 14:56:13 UTC ---

(In reply to Casey Bodley from comment #7)
> @Marcus: is there a per-bucket key being generated here? if so, multisite
> configurations would need to guarantee that both sites use the same
> per-bucket keys

sorry, i see that these per-bucket keys are stored in the RGW_ATTR_BUCKET_ENCRYPTION_KEY_ID xattr of the bucket info, which would be replicated by metadata sync

on object upload, this per-bucket key is copied into the RGW_ATTR_CRYPT_KEYID object xattr. these encryption attrs do replicate with the objects, and it's these per-object attrs that we use for decryption. so there doesn't appear to be any reliance on metadata sync here
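
As a rough sketch of how one might confirm that on a test object (using the bucket/object names from this report; the grep just filters the attrs section of the output), the per-object crypt attrs should show up identically on both zones:

# hedged sketch: inspect the replicated crypt xattrs on each zone
radosgw-admin object stat --bucket=ud-tst --object=new_o2 | grep 'user.rgw.crypt'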

maybe this is an issue with multipart uploads in general? there's a https://tracker.ceph.com/issues/58473 reporting that "PutBucketEncryption default policy does not apply to multipart uploads"

--- Additional comment from Vidushi Mishra on 2023-01-27 06:55:28 UTC ---

Hi, Casey and Matt,

This issue was seen with 'per-object' encryption and multipart uploads, where the md5sum of the downloaded object does not match on the secondary site.

With a 'put-bucket-encryption policy and multipart uploads', there is already a bug https://bugzilla.redhat.com/show_bug.cgi?id=2153452

Thanks,
Vidushi

--- Additional comment from Veera Raghava Reddy on 2023-03-09 13:09:53 UTC ---

Moving to 6.1

--- Additional comment from Matt Benjamin (redhat) on 2023-03-27 01:13:49 UTC ---

Marcus has claimed verbally that this is likely correct behavior.  Marcus, could you please update here with your explanation?

thanks,

Matt

--- Additional comment from Marcus Watts on 2023-04-05 06:42:22 UTC ---

There are 3 issues here:

1/ etags of encrypted objects are not the md5 checksum of the plaintext data.

2/ etags of multipart objects are not the same as the etag of the same data uploaded as a simple object.

3/ In multisite operation, etags on replicated objects should match the etag of the original object.

Cases 1 + 2 are normal expected behavior of aws s3, and can be observed in ceph independently of multisite operation.

Case 3 needs to be distinguished from cases 1+2.  To detect a violation of this claim, you need to get the etag on the source and destination sites, and see if they differ.  They should not.  In my experimental system, they are the same as they should be.

https://teppen.io/2018/06/23/aws_s3_etags/
describes this behavior and has pointers back to the AWS docs if you want more details.
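
As a sketch of the rule that page describes: a multipart etag is the md5 of the concatenated binary md5 digests of the parts, suffixed with the part count, so it can be recomputed locally (assuming s3cmd's 5MB part size from the transcripts above; the filename is just an example):

# hedged sketch: recompute the expected multipart etag for a local file
split -b 5242880 'site1 logs.log' part_
n=$(ls part_* | wc -l)
md5sum part_* | awk '{print $1}' | xxd -r -p | md5sum | awk -v n="$n" '{print $1 "-" n}'
rm -f part_*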

--- Additional comment from Casey Bodley on 2023-04-05 15:00:10 UTC ---

(In reply to Marcus Watts from comment #13)
> There are 3 issues here:
> 
> 1/ etags of encrypted objects are not the md5 checksum of the plaintext data.
> 
> 2/ etags of multipart objects are not the same as the etag of the same data
> uploaded as a simple object.
> 
> 3/ In multisite operation, etags on replicated objects should match the etag
> of the original object.

multipart uploads get replicated as normal uploads, so i would expect their etags to differ in that case

--- Additional comment from Matt Benjamin (redhat) on 2023-04-05 16:38:38 UTC ---

Setting to modified, as above.

Specifically, QE folks, for any apparent inconsistency, we need a documented workflow to reproduce.

thanks!

Matt

--- Additional comment from errata-xmlrpc on 2023-04-10 21:37:19 UTC ---

This bug has been added to advisory RHBA-2023:112314 by Ken Dreyer (kdreyer)

--- Additional comment from Uday kurundwade on 2023-04-13 05:48:06 UTC ---

Hi Matt,

I am trying to reproduce this issue on latest 6.1 build.

As per comment #13, I will cross-verify the etags of the encrypted multipart object on multisite.

Thanks,
Uday

--- Additional comment from Uday kurundwade on 2023-04-18 11:09:59 UTC ---

Hi,

I am able to reproduce this issue on the latest 6.1 build (17.2.6-10.el9cp).

The etag value of the encrypted object matches on both sites, but when I download the object from the other site, I am seeing the same issue.


Etag info:
Site 1:

{
        "name": "o2",
        "instance": "",
        "ver": {
            "pool": 11,
            "epoch": 176
        },
        "locator": "",
        "exists": "true",
        "meta": {
            "category": 1,
            "size": 39450596,
            "mtime": "2023-04-13T07:20:20.995490Z",
            "etag": "be704c94ef1c492a843315378e644b0a-3",
            "storage_class": "",
            "owner": "ud",
            "owner_display_name": "ud",
            "content_type": "application/zip",
            "accounted_size": 39450596,
            "user_data": "",
            "appendable": "false"
        },
        "tag": "2708f69d-6c52-4aa3-a147-dfbbeb8c55aa.25181.5834175970402933534",
        "flags": 0,
        "pending_map": [],
        "versioned_epoch": 0
    },

Site 2:

{
        "name": "o2",
        "instance": "",
        "ver": {
            "pool": 10,
            "epoch": 222
        },
        "locator": "",
        "exists": "true",
        "meta": {
            "category": 1,
            "size": 39450596,
            "mtime": "2023-04-13T07:20:20.995490Z",
            "etag": "be704c94ef1c492a843315378e644b0a-3",
            "storage_class": "",
            "owner": "ud",
            "owner_display_name": "ud",
            "content_type": "application/zip",
            "accounted_size": 39450596,
            "user_data": "",
            "appendable": "false"
        },
        "tag": "_vD6_SRCg3Lo0JyZ-kQhgmag4EHPOxAN",
        "flags": 0,
        "pending_map": [],
        "versioned_epoch": 0
    },


md5sum info:
Site 1:
[root@ceph-pri-uday-sse-pugpaj-node5 ~]# s3cmd info s3://tst1/o2
s3://tst1/o2 (object):
   File size: 39450596
   Last mod:  Thu, 13 Apr 2023 07:20:20 GMT
   MIME type: application/zip
   Storage:   STANDARD
   MD5 sum:   bbc61b47bb298688cc461c39847d335e
   SSE:       AES256
   Policy:    none
   CORS:      none
   ACL:       ud: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1681370168/ctime:1681370097/gid:0/gname:root/md5:bbc61b47bb298688cc461c39847d335e/mode:33188/mtime:1638950136/uid:0/uname:root
[root@ceph-pri-uday-sse-pugpaj-node5 ~]# s3cmd get s3://tst1/o2 tmp.zip
download: 's3://tst1/o2' -> 'tmp.zip'  [1 of 1]
 39450596 of 39450596   100% in    0s   195.68 MB/s  done
[root@ceph-pri-uday-sse-pugpaj-node5 ~]# md5sum tmp.zip
bbc61b47bb298688cc461c39847d335e  tmp.zip
[root@ceph-pri-uday-sse-pugpaj-node5 ~]# 


Site 2:
[root@ceph-sec-uday-sse-pugpaj-node5 ~]# s3cmd info s3://tst1/o2
s3://tst1/o2 (object):
   File size: 39450596
   Last mod:  Thu, 13 Apr 2023 07:20:20 GMT
   MIME type: application/zip
   Storage:   STANDARD
   MD5 sum:   bbc61b47bb298688cc461c39847d335e
   SSE:       AES256
   Policy:    none
   CORS:      none
   ACL:       ud: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1681370168/ctime:1681370097/gid:0/gname:root/md5:bbc61b47bb298688cc461c39847d335e/mode:33188/mtime:1638950136/uid:0/uname:root
[root@ceph-sec-uday-sse-pugpaj-node5 ~]# s3cmd get s3://tst1/o2 tmp.zip
download: 's3://tst1/o2' -> 'tmp.zip'  [1 of 1]
 39450596 of 39450596   100% in    0s   237.75 MB/s  done
WARNING: MD5 signatures do not match: computed=870299b3fc8f180dc892f6333eaaf377, received=bbc61b47bb298688cc461c39847d335e
[root@ceph-sec-uday-sse-pugpaj-node5 ~]# md5sum tmp.zip 
870299b3fc8f180dc892f6333eaaf377  tmp.zip
[root@ceph-sec-uday-sse-pugpaj-node5 ~]# 



Steps I followed:
1. Deploy a 6.1 multisite setup
2. Configure encryption by setting SSE-S3 values in Ceph on both sites
3. Create a bucket and upload a multipart object with the --server-side-encryption parameter from site 1
4. Wait for the sync to complete
5. Download the same object from site 2

Thanks,
Uday

--- Additional comment from Akash Raj on 2023-05-04 05:39:36 UTC ---

Hi Marcus.

Please provide the doc type and text.

Thanks.

--- Additional comment from daniel parkes on 2023-05-05 15:06:27 UTC ---

Uday just to double check and confirm, on your latest test/reproducer, the mismatch of md5s implies corruption if you try to unzip the downloaded object as you pointed out in comment #3 right?

--- Additional comment from Uday kurundwade on 2023-05-08 09:36:16 UTC ---

(In reply to daniel parkes from comment #20)
> Uday just to double check and confirm, on your latest test/reproducer, the
> mismatch of md5s implies corruption if you try to unzip the downloaded
> object as you pointed out in comment #3 right?

Hi Daniel,

Yes... When I tried to unzip the downloaded object, it was corrupted.

Thanks,
Uday

--- Additional comment from Soumya Koduri on 2023-05-18 08:32:57 UTC ---

Thanks Vidushi for the setup.

Seems like the issue is only with the s3cmd client. The object download works fine with aws cli from the secondary site.


site2:
[root@ceph-sec-mip-7dz5sw-node5 ~]# s3cmd info s3://bucket-1/l-sse1
s3://bucket-1/l-sse1 (object):
   File size: 204800000
   Last mod:  Thu, 18 May 2023 07:36:35 GMT
   MIME type: application/octet-stream
   Storage:   STANDARD
   MD5 sum:   eda9a9889837ac4bc81d6387d92c1bec
   SSE:       AES256
   Policy:    none
   CORS:      none
   ACL:       user1: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1684395334/ctime:1684395334/gid:0/gname:root/md5:eda9a9889837ac4bc81d6387d92c1bec/mode:33188/mtime:1684395334/uid:0/uname:root
[root@ceph-sec-mip-7dz5sw-node5 ~]# 
[root@ceph-sec-mip-7dz5sw-node5 ~]# s3cmd get s3://bucket-1/l-sse1 .
download: 's3://bucket-1/l-sse1' -> './l-sse1'  [1 of 1]
 204800000 of 204800000   100% in    0s   243.48 MB/s  done
WARNING: MD5 signatures do not match: computed=dc0bd5886815f0d4c9b005c19508a613, received=eda9a9889837ac4bc81d6387d92c1bec
[root@ceph-sec-mip-7dz5sw-node5 ~]#


[root@ceph-sec-mip-7dz5sw-node5 ~]# aws --endpoint http://localhost s3 cp s3://bucket-1/l-sse1 al-sse1
download: s3://bucket-1/l-sse1 to ./al-sse1                         
[root@ceph-sec-mip-7dz5sw-node5 ~]# 

@Uday,

Could you please re-verify the test case with a .zip file download using awscli?

--- Additional comment from Soumya Koduri on 2023-05-18 10:14:24 UTC ---

Sorry for the confusion. Even though the download was successful via aws cli, the downloaded objects seem to differ (regular vs. encrypted)

site2:
[root@ceph-sec-mip-7dz5sw-node5 ~]# md5sum al1         (download of regular multipart-upload object)
eda9a9889837ac4bc81d6387d92c1bec  al1
[root@ceph-sec-mip-7dz5sw-node5 ~]# md5sum al-sse1      (download of encrypted multipart-upload object)
dc0bd5886815f0d4c9b005c19508a613  al-sse1
[root@ceph-sec-mip-7dz5sw-node5 ~]# ls -ld al1 al-sse1
-rw-r--r--. 1 root root 204800000 May 18 03:36 al-sse1
-rw-r--r--. 1 root root 204800000 May 18 03:36 al1
[root@ceph-sec-mip-7dz5sw-node5 ~]# diff al1 al-sse1
Binary files al1 and al-sse1 differ
[root@ceph-sec-mip-7dz5sw-node5 ~]#

--- Additional comment from Soumya Koduri on 2023-05-18 11:29:52 UTC ---

(In reply to Casey Bodley from comment #9)
> (In reply to Casey Bodley from comment #7)
> > @Marcus: is there a per-bucket key being generated here? if so, multisite
> > configurations would need to guarantee that both sites use the same
> > per-bucket keys
> 
> sorry, i see that these per-bucket keys are stored in the
> RGW_ATTR_BUCKET_ENCRYPTION_KEY_ID xattr of the bucket info, which would be
> replicated by metadata sync
> 
> on object upload, this per-bucket key is copied into the
> RGW_ATTR_CRYPT_KEYID object xattr. these encryption attrs do replicate with
> the objects, and it's these per-object attrs that we use for decryption. so
> there doesn't appear to be any reliance on metadata sync here
> 
> maybe this is an issue with multipart uploads in general? there's a
> https://tracker.ceph.com/issues/58473 reporting that "PutBucketEncryption
> default policy does not apply to multipart uploads"

The above-mentioned issue seems to have been fixed as part of bug 2153452. However, as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2153452#c23, I see a difference between the upstream PR and the downstream commit -

https://github.com/ceph/ceph/pull/49409/files#diff-2cb514dfecaff8be7cd4a8497053e815bf3a0de3d091c4472a0d134c5421dbc9R3881
https://gitlab.cee.redhat.com/ceph/ceph/-/commit/344a8585872b84e5b2eb4df23aa5c2c97d9eaf38#

The code (create_s3_policy(..)) is re-arranged in the upstream fix but not in the backport. I am not very familiar with the encryption code.

@Marcus & Casey,
Can you please confirm if we need similar changes in downstream 6.1 code as well?

--- Additional comment from Uday kurundwade on 2023-05-18 12:25:06 UTC ---

Clearing need info based on comment #23

--- Additional comment from Casey Bodley on 2023-05-18 12:52:42 UTC ---

(In reply to Soumya Koduri from comment #24)
> (In reply to Casey Bodley from comment #9)
> > (In reply to Casey Bodley from comment #7)
> > > @Marcus: is there a per-bucket key being generated here? if so, multisite
> > > configurations would need to guarantee that both sites use the same
> > > per-bucket keys
> > 
> > sorry, i see that these per-bucket keys are stored in the
> > RGW_ATTR_BUCKET_ENCRYPTION_KEY_ID xattr of the bucket info, which would be
> > replicated by metadata sync
> > 
> > on object upload, this per-bucket key is copied into the
> > RGW_ATTR_CRYPT_KEYID object xattr. these encryption attrs do replicate with
> > the objects, and it's these per-object attrs that we use for decryption. so
> > there doesn't appear to be any reliance on metadata sync here
> > 
> > maybe this is an issue with multipart uploads in general? there's a
> > https://tracker.ceph.com/issues/58473 reporting that "PutBucketEncryption
> > default policy does not apply to multipart uploads"
> 
> The above-mentioned issue seems to have been fixed as part of bug 2153452.
> However, as mentioned in
> https://bugzilla.redhat.com/show_bug.cgi?id=2153452#c23, I see a difference
> between the upstream PR and the downstream commit -
> 
> https://github.com/ceph/ceph/pull/49409/files#diff-
> 2cb514dfecaff8be7cd4a8497053e815bf3a0de3d091c4472a0d134c5421dbc9R3881
> https://gitlab.cee.redhat.com/ceph/ceph/-/commit/
> 344a8585872b84e5b2eb4df23aa5c2c97d9eaf38#
> 
> The code (create_s3_policy(..)) is re-arranged in the upstream fix but not
> in the backport. I am not very familiar with the encryption code.
> 
> @Marcus & Casey,
> Can you please confirm if we need similar changes in downstream 6.1 code as
> well?

i'll defer to Marcus on what belongs downstream, but i did update the s3test case for this in https://github.com/ceph/s3-tests/pull/505 if we need to validate the feature there

--- Additional comment from Matt Benjamin (redhat) on 2023-05-18 12:54:26 UTC ---

Hi Soumya and Casey,

Thanks for triaging the behavior and the downstream vs upstream code.

Marcus, can you please prioritize helping to verify the code that should be present on rhcs-6.1?

Matt

--- Additional comment from Marcus Watts on 2023-05-19 04:32:08 UTC ---

When last I looked at this (march 2023), there were 2 problems,
1.
"bucket encryption policy does not work with multipart uploads" - for which I posted a fix.
2.
some sort of decryption problem that resulted in corrupted data when downloading - which I do not have a fix for.

When I first looked at the decryption problem, I thought there was some kind of race condition going on.  Looking back at my notes (and trying more experiments with my setup that I haven't recycled yet), I no longer think that's so.  I can consistently download the good data from the originating site, and get bad data out of the replication site.

I am using boto3 w/ python in interactive mode for my testing.  I'll post my notes next in this bz in case anyone else wants to try that.

Looks like a straightforward data corruption problem.  Much of the data is decrypted correctly, so I think the metadata is all good.  The file length is correct.  What I do see is semi-periodic binary glop written over the data.  Periods look to be approximately binary intervals, like 2k, 8k, etc.  This looks very much like the access pattern I'd expect to see out of a modern memory allocator, so I suspect this is most likely some sort of use-after-free type problem.

--- Additional comment from Marcus Watts on 2023-05-19 04:55:20 UTC ---

These are my notes on testing out bucket encryption policy and multisite operations.

Note the part that makes and then reads bucket_1 "b1-sses3-2".
The resulting file when downloaded can be verified with:
sort -u /tmp/35mb-b2
(at about line 350 in the notes).  You'll see 7 lines of 22 characters--23 including the newline (which is relatively prime to any power of 2).

After this point, in the python session, it is also possible to do
r = client2.download_file(bucket_1, 'b1-sses3-2', "/tmp/35mb-b2a")
r
which will do the identical client path, but against the replica site, giving a corrupted file.  Then in the shell do
cmp -l /tmp/35mb-b2 /tmp/35mb-b2a
you'll get all the bytes that were corrupted this way.
Looks like 85% of the data is correct.

--- Additional comment from Marcus Watts on 2023-05-23 05:59:02 UTC ---

Upon further investigation, I learned several things about my test case:

1/ the original "part" structure of the upload is eliminated upon replication
2/ the errors start exactly where the first part of the original upload ends
3/ the errors occur in two offset stripes, 16 bytes each out of 4096

Also, my math on how much was wrong is off -- 99% of the data is in fact correct.

The encrypted data is written out in 4k chunks using aes-256-cbc.  16 bytes is exactly one aes-256 block.  So, I believe the errors are due to misaligning aes cbc chunks when repacking data during replication.

I see 2 ways to fix this.  a/ pass "part" information back with the replication, so that the receiving site can start new parts and resync with the cbc chunks as necessary, or b/ augment the on-disk encryption information so that it can store truncated cbc chunks in the middle of a local part.  a/ is likely simpler, so that's what I'll be looking at doing.

--- Additional comment from Matt Benjamin (redhat) on 2023-05-30 13:52:10 UTC ---

Moving to 6.1z1 by agreement in the rhcs-lt call.  It may be possible to move back to rhcs-6.1, depending on fix availability.

Matt

--- Additional comment from errata-xmlrpc on 2023-06-02 09:55:34 UTC ---

This bug has been dropped from advisory RHBA-2023:112314 by Ken Dreyer (kdreyer)

--- Additional comment from Akash Raj on 2023-06-08 09:33:52 UTC ---

Hi Marcus.

Please provide the doc type and text.

Thanks.

--- Additional comment from Ranjini M N on 2023-06-13 11:06:29 UTC ---

Hi Matt and Daniel, 

Can you please provide us the doc text for including it in the RHCS 6.1 release notes as a known issue? 

Thanks
Ranjini M N

--- Additional comment from daniel parkes on 2023-06-14 07:18:35 UTC ---

Hi,

Something along these lines can be added but I would like @Matt also to confirm:

As a part of internal testing, we observed an md5 mismatch of replicated objects when testing rgw's server-side encryption in multisite. This data corruption is specific to s3 multipart uploads with SSE encryption enabled. The corruption only affects the replicated copy. The original object remains intact. 

Encryption of multipart uploads requires special handling around the part boundaries, because each part is uploaded and encrypted separately. In multisite, encrypted multipart uploads are replicated as a single part. As a result, the replicated copy loses its knowledge of the original part boundaries required to decrypt the data correctly, which causes this corruption.

As an immediate solution, multisite users should not use server-side encryption for multipart uploads.

--- Additional comment from Matt Benjamin (redhat) on 2023-06-14 15:51:01 UTC ---

lgtm

--- Additional comment from Marcus Watts on 2023-06-27 04:16:50 UTC ---

The following things are incompatible with multisite replication of encrypted objects:
 /1/ multipart objects
 /2/ appending to objects - because internally this results in something very like a multipart object.
 /3/ compression (even of simple objects) - because the destination site tries to compress the encrypted data, which is wrong.

--- Additional comment from Manny on 2023-06-27 17:52:27 UTC ---

(In reply to Marcus Watts from comment #37)
> The following things are incompatible with multisite replication of
> encrypted objects:
>  /1/ multipart objects
>  /2/ appending to objects - because internally this results in something
> very like a multipart object.
>  /3/ compression (even of simple objects) - because the destination site
> tries to compress the encrypted data, which is wrong.

Hello Marcus,

I read this update and I see that you're ensuring we identify every possible use case which leads to this silent corruption of remote objects.

I have been tasked with writing the KCS for this issue and, given the subject matter, it must be 1000% accurate.
I have the following questions

1.)  Can corruption on the remote Ceph Cluster be found by comparing the ETag when a HEAD request is done, on the local and remote object?
2.)  Regardless of the method to used to find corruption, with RHCS Engineering provide any automation for this?
3.)  I need the procedure (post Ceph upgrade) to fix a corrupt object from the good local object

In case it's helpful, please see (https://access.redhat.com/solutions/7019437) and look at the "Diagnostic Steps", thank you

Best regards,
Manny

--- Additional comment from Casey Bodley on 2023-06-27 18:28:14 UTC ---

there is work in progress on a repair tool in https://github.com/ceph/ceph/pull/51842, but the process has not been reviewed or finalized yet

--- Additional comment from Manny on 2023-06-27 19:16:24 UTC ---

(In reply to Casey Bodley from comment #39)
> there is work in progress on a repair tool in
> https://github.com/ceph/ceph/pull/51842, but the process has not been
> reviewed or finalized yet

Casey,

Would the same tool find the corruption also? Sorry, trying to be hopeful,   :)
If it only remediates, that's fine, just let me know, thank you

Best regards,
Manny

--- Additional comment from Casey Bodley on 2023-06-27 20:15:32 UTC ---

(In reply to Manny from comment #40)
> (In reply to Casey Bodley from comment #39)
> > there is work in progress on a repair tool in
> > https://github.com/ceph/ceph/pull/51842, but the process has not been
> > reviewed or finalized yet
> 
> Casey,
> 
> Would the same tool find the corruption also? Sorry, trying to be hopeful,  
> :)
> If it only remediates, that's fine, just let me know, thank you

as currently implemented, it's a radosgw-admin command that scans the given bucket for corrupted (encrypted+multipart) objects. for any objects that it finds, it reschedules their replication
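
(For reference, a minimal sketch of how that scan might be driven across a whole zone once the tool lands; the exact flags of the in-progress command are an assumption here, not its reviewed interface:)

# hedged sketch: rerun the resync scan over every bucket in the zone
for b in $(radosgw-admin bucket list | jq -r '.[]'); do
  radosgw-admin bucket resync encrypted multipart --bucket="$b"
done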

--- Additional comment from Manny on 2023-06-27 21:46:40 UTC ---

(In reply to Casey Bodley from comment #41)
> (In reply to Manny from comment #40)
> > (In reply to Casey Bodley from comment #39)
> > > there is work in progress on a repair tool in
> > > https://github.com/ceph/ceph/pull/51842, but the process has not been
> > > reviewed or finalized yet
> > 
> > Casey,
> > 
> > Would the same tool find the corruption also? Sorry, trying to be hopeful,  
> > :)
> > If it only remediates, that's fine, just let me know, thank you
> 
> as currently implemented, it's a radosgw-admin command that scans the given
> bucket for corrupted (encrypted+multipart) objects. for any objects that it
> finds, it reschedules their replication

Thanks Casey,

The KCS (https://access.redhat.com/solutions/7019437) is now as complete as it can be for now. If you can, give it a look, you comments are welcome. If you don't want to clutter up the BZ, hit me up on Slack/G Chat, thank you

Best regards,
Manny

--- Additional comment from Marcus Watts on 2023-06-28 07:47:10 UTC ---

I was asked to provide a more verbose explanation so here it is.

- the problem -

Ceph encrypts objects using cbc, in 2k chunks.  The "ivec" for each
2k chunk is uniquely generated using the byte offset of the chunk.
A multipart object has multiple parts, each of which is separately
encrypted using a data offset starting at 0.

When a multisite object is replicated, the original parts structure
is lost, and the object is converted into one new part.  This causes
several problems with encrypted multipart objects.

Firstly, since the data offset is no longer restarted at 0 on each
original part, the wrong ivec will be generated, and 16 bytes will be corrupted
at the start of each 2k chunk for all data after the original
first part.  Secondly, if the part offsets are not a multiple of 2k,
chunks will be offset from the original 2k chunks, and data will be
corrupted at the start of each new 2k chunk, and also at the start of
each old 2k chunk.
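
(A worked sketch of that arithmetic, assuming the 2k chunk size stated above and a 5MB first part purely for illustration:)

# hedged sketch: the wrong ivec corrupts one 16-byte aes block at the
# start of each 2k chunk for all data past the original first part
part=5242880; chunk=2048; blk=16
for i in 0 1 2 3; do
  off=$((part + i * chunk))
  echo "bytes $off-$((off + blk - 1)) corrupted"
done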

Ceph has an extension to the s3 protocol which makes it possible to
append to an existing object.  If this is done, the object becomes a
multipart object, and will be corrupted on replication just like any
other multipart object.

Ceph also supports object compression.  The replication logic can
incorrectly try to compress encrypted objects, which will lead to some
kind of failure - either the object will fail to be replicated,
or if replication succeeds, the object will be corrupted and might not
even be readable.  XXX somebody should check: code is wrong, but I don't
know exactly which way it will break.

- finding corrupted objects -

encrypted objects -- multipart objects have different etag
        RGW_ATTR_PREFIX         user.rgw.etag
encrypted objects have special "mode" attribute.
        RGW_ATTR_CRYPT_MODE     user.rgw.crypt.mode
compression read-back - also mode attribute?
        RGW_ATTR_COMPRESSION    user.rgw.compression
replication - attribute added to any replicated object.
        XXX what is it?

XXX These are the attributes on the rados object- how do they
    appear to the 'HEAD' command and in the s3 protocol?

Casey, above talks of code to prod the replication engine; I think
that won't be useful until the replication path itself is fixed, which
I'm currently working on.

- To avoid: -

don't encrypt multipart objects
don't enable compression and encryption
don't append to encrypted objects

--- Additional comment from Casey Bodley on 2023-06-28 15:22:05 UTC ---

(In reply to Marcus Watts from comment #43)
> 
> - finding corrupted objects -
> 
> encrypted objects -- multipart objects have different etag
>         RGW_ATTR_PREFIX         user.rgw.etag
> encrypted objects have special "mode" attribute.
>         RGW_ATTR_CRYPT_MODE     user.rgw.crypt.mode
> compression read-back - also mode attribute?
>         RGW_ATTR_COMPRESSION    user.rgw.compression
> replication - attribute added to any replicated object.
>         XXX what is it?

the upstream reef release will add a 'user.rgw.amz-replication-status' xattr (and 'x-amz-replication-status' header) to replicated objects. that may already be present in some downstream releases, but only for objects that replicated after upgrading to that release
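
(a sketch of checking that from the client side, in the same aws cli style used earlier in this bz; the endpoint and names are placeholders:)

# hedged sketch: on releases that set the header, HeadObject from the
# peer zone should report it
aws --endpoint http://localhost s3api head-object \
    --bucket bucket-1 --key l-sse1 --query ReplicationStatus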

> 
> XXX These are the attributes on the rados object- how do they
>     appear to the 'HEAD' command and in the s3 protocol?

the response headers would include the 'ETag', and 'x-amz-server-side-encryption' for encrypted objects. compression is transparent to clients, so is not visible to HeadObject/GetObject requests

regarding the etags, multipart uploads do have a special etag format "<md5sum>-<num parts>". multisite replicates these objects as a single part, but i believe it copies the source object's etag. so the -uncorrupted- source object would have a multipart manifest and a multipart etag, while the -corrupted- replicas would have a single-part manifest, a multipart etag, and an encryption header

you can check whether an object's manifest is multipart with:
$ radosgw-admin object stat --bucket=X --object=Y | grep part_id
            "cur_part_id": 1,
            "cur_part_id": 10,

a multipart manifest would have non-zero part ids
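
combining that with the etag signal above, a rough sketch for flagging a suspect replica (bucket/object names are placeholders, and the grep patterns assume the JSON layout shown earlier in this bz):

# hedged sketch: a multipart-style etag ("<md5>-<N>") with no part ids in
# the manifest matches the corrupted-replica signature described above
stat=$(radosgw-admin object stat --bucket=X --object=Y)
if echo "$stat" | grep -Eq '"etag": "[0-9a-f]{32}-[0-9]+' && \
   ! echo "$stat" | grep -q part_id; then
  echo "suspect replica: multipart etag but single-part manifest"
fi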

> 
> Casey, above talks of code to prod the replication engine; I think
> that won't be useful until the replication path itself is fixed, which
> I'm currently working on.

that's right

> 
> - To avoid: -
> 
> don't encrypt multipart objects
> don't enable compression and encryption

as a point of clarification, the capability to combine compression+encryption does not exist on any ceph release

--- Additional comment from jquinn on 2023-06-28 16:18:12 UTC ---

Hi Akash, 

We need to ensure that the documentation that references links to https://access.redhat.com/articles/7019228 is updated to reflect the new KCS https://access.redhat.com/solutions/7019437.  This new KCS was created as a solution rather than an article; we cleaned up the formatting to meet KCS standards and worked with Casey and Marcus to ensure all sections reflect the proper information needed to diagnose and correct the issue.

Please let me know if you have any questions.    

Thanks, 
Joe Quinn

--- Additional comment from Casey Bodley on 2023-06-28 18:02:50 UTC ---

(In reply to Marcus Watts from comment #43)
> 
> Ceph also supports object compression.  The replication logic can
> incorrectly try to compress encrypted objects, which will lead to some
> kind of failure - either the object will fail to be replicated,
> or if replication succeeds, the object will be corrupted and might not
> even be readable.  XXX somebody should check: code is wrong, but I don't
> know exactly which way it will break.

thanks Marcus! Shilpa and I looked into this part, and i've updated https://tracker.ceph.com/issues/57905#note-10 with our findings

--- Additional comment from Thomas Serlin on 2023-07-07 15:08:37 UTC ---

Marcus added the following eight commits to ceph-6.1-rhel-patches today:

4bdc9ccfb05 rgw/crypt: apply rgw_crypt_default_encryption_key by default
39b0edfe8fe rgw: fetch_remote_obj() preserves original part lengths for BlockDecrypt
a1cc5caef2c rgw: BlockDecrypt filter parses manifest parts before construction
38d27d87c6b rgw: fetch_remote_obj() preserves RGW_ATTR_COMPRESSION of encrypted objects
188731951bd rgw: fetch_remote_obj() will never verify etags of encrypted objects
d0a772a006b rgw: rgwx-skip-decrypt also skips decompression of encrypted objects
e58eb66269f rgw: support full object encryption stack on compression
a31894a10e5 rgw/sse-s3: fix bucket encryption of multipart upload


I assume this BZ was meant to move to MODIFIED, so moving.

Thomas

--- Additional comment from errata-xmlrpc on 2023-07-11 04:12:13 UTC ---

This bug has been added to advisory RHBA-2023:116292 by Thomas Serlin (tserlin)

--- Additional comment from Casey Bodley on 2023-07-12 20:40:49 UTC ---

note that there were additional upstream changes related to compression+encryption in https://github.com/ceph/ceph/pull/52300. they require the admin to opt in to the feature, and also prevent it from being enabled in a multisite configuration where some older zones wouldn't replicate the objects correctly

--- Additional comment from Akash Raj on 2023-07-13 18:07:26 UTC ---

Hi Marcus.

Please provide the doc type and doc text.

Thanks.

--- Additional comment from Uday kurundwade on 2023-07-24 09:38:45 UTC ---

On version 17.2.6-98.el9cp, I am able to download the multipart object from the other site without any failure, and the md5sum of the uploaded object matches the downloaded object.

Primary site:
[root@ceph-pri-uday-sse-lsgt6c-node5 ~]# s3cmd put cos.zip s3://uday/a3 --server-side-encryption
upload: 'cos.zip' -> 's3://uday/a3'  [part 1 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    39.00 MB/s  done
upload: 'cos.zip' -> 's3://uday/a3'  [part 2 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    43.68 MB/s  done
upload: 'cos.zip' -> 's3://uday/a3'  [part 3 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    49.07 MB/s  done
upload: 'cos.zip' -> 's3://uday/a3'  [part 4 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    49.62 MB/s  done
upload: 'cos.zip' -> 's3://uday/a3'  [part 5 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    50.53 MB/s  done
upload: 'cos.zip' -> 's3://uday/a3'  [part 6 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    45.13 MB/s  done
upload: 'cos.zip' -> 's3://uday/a3'  [part 7 of 8, 5MB] [1 of 1]
 5242880 of 5242880   100% in    0s    42.26 MB/s  done
upload: 'cos.zip' -> 's3://uday/a3'  [part 8 of 8, 2MB] [1 of 1]
 2750436 of 2750436   100% in    0s    43.95 MB/s  done
[root@ceph-pri-uday-sse-lsgt6c-node5 ~]# s3cmd info s3://uday/a3
s3://uday/a3 (object):
   File size: 39450596
   Last mod:  Mon, 24 Jul 2023 07:26:10 GMT
   MIME type: application/zip
   Storage:   STANDARD
   MD5 sum:   bbc61b47bb298688cc461c39847d335e
   SSE:       AES256
   Policy:    none
   CORS:      none
   ACL:       ud: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1689852160/ctime:1689851543/gid:0/gname:root/md5:bbc61b47bb298688cc461c39847d335e/mode:33188/mtime:1638950136/uid:0/uname:root
[root@ceph-pri-uday-sse-lsgt6c-node5 ~]# md5sum cos.zip 
bbc61b47bb298688cc461c39847d335e  cos.zip
[root@ceph-pri-uday-sse-lsgt6c-node5 ~]# ls
cos.zip
[root@ceph-pri-uday-sse-lsgt6c-node5 ~]# s3cmd get s3://uday/a3
download: 's3://uday/a3' -> './a3'  [1 of 1]
 39450596 of 39450596   100% in    0s   305.37 MB/s  done
[root@ceph-pri-uday-sse-lsgt6c-node5 ~]# ls
a3  cos.zip



Secondary site:
[root@ceph-sec-uday-sse-lsgt6c-node5 ~]# s3cmd info s3://uday/a3
s3://uday/a3 (object):
   File size: 39450596
   Last mod:  Mon, 24 Jul 2023 07:26:10 GMT
   MIME type: application/zip
   Storage:   STANDARD
   MD5 sum:   bbc61b47bb298688cc461c39847d335e
   SSE:       AES256
   Policy:    none
   CORS:      none
   ACL:       ud: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1689852160/ctime:1689851543/gid:0/gname:root/md5:bbc61b47bb298688cc461c39847d335e/mode:33188/mtime:1638950136/uid:0/uname:root
[root@ceph-sec-uday-sse-lsgt6c-node5 ~]# ls
a2
[root@ceph-sec-uday-sse-lsgt6c-node5 ~]# s3cmd get s3://uday/a3
download: 's3://uday/a3' -> './a3'  [1 of 1]
 39450596 of 39450596   100% in    0s   243.15 MB/s  done
[root@ceph-sec-uday-sse-lsgt6c-node5 ~]# ls
a2  a3
[root@ceph-sec-uday-sse-lsgt6c-node5 ~]# md5sum a3
bbc61b47bb298688cc461c39847d335e  a3



Based on these observations, moving this BZ to the verified state

--- Additional comment from Manny on 2023-07-28 23:44:19 UTC ---

Hello @cbodley ,

As RHCS 6.1z1 will GA on or about 02-Aug-2023, our KCS needs to be updated, specifically the improvement to radosgw-admin. From https://github.com/ceph/ceph/pull/51842 it seems the command will be "radosgw-admin bucket resync encrypted multipart".

Currently, the KCS (https://access.redhat.com/solutions/7019437) just makes a promise that details will be provided. I need to fix this before RHCS 6.1z1 goes GA. Please let us know, thank you

Best regards,
Manny

--- Additional comment from Casey Bodley on 2023-07-31 14:11:49 UTC ---

hi Manny, that admin command is working for me but hasn't been reviewed. i'm happy to cherry-pick it somewhere so QE can start testing. Matt, can we create a separate BZ to track that, since this one is already verified?

--- Additional comment from Matt Benjamin (redhat) on 2023-07-31 16:14:58 UTC ---

(In reply to Casey Bodley from comment #53)
> hi Manny, that admin command is working for me but hasn't been reviewed. i'm
> happy to cherry-pick it somewhere so QE can start testing. Matt, can we
> create a separate BZ to track that, since this one is already verified?

sure, if I put that in 6.1z2, can we cherry pick there?

Matt

Comment 9 errata-xmlrpc 2023-10-12 16:34:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security, enhancement, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:5693

Comment 10 Red Hat Bugzilla 2024-03-03 04:25:19 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

