Bug 2024107

Summary:	Retrieval of cached objects with `s3 sync` after change in object size in underlying storage results in an InvalidRange error
Product:	[Red Hat Storage] Red Hat OpenShift Data Foundation	Reporter:	Ben Eli <belimele>
Component:	Multi-Cloud Object Gateway	Assignee:	Danny <dzaken>
Status:	CLOSED ERRATA	QA Contact:	Ben Eli <belimele>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.9	CC:	aindenba, etamir, ikave, kramdoss, mmuench, muagarwa, nbecker, ocs-bugs, odf-bz-bot
Target Milestone:	---	Keywords:	AutomationBackLog
Target Release:	ODF 4.10.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	4.10.0-79	Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-04-13 18:50:37 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Ben Eli 2021-11-17 10:32:22 UTC

Description of problem (please be as detailed as possible and provide log
snippets):
Elaborated in the reproduction steps

Version of all relevant components (if applicable):
Observed in ODF 4.9-rc, potentially happening in all ODF versions supporting cache.

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Partially - cache is not fully working as intended.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?
N/A

If this is a regression, please provide more details to justify this:
We did not test for this case in the past, so it is not possible to know.

Steps to Reproduce:
1. Create a bucket on top of a namespacestore with a cache policy (I used a TTL of 600000)
2. Generate a 1MiB file using dd -
dd if=/dev/urandom of={file path} bs=1M count=1 status=none"
3. Upload the object to the MCG bucket
4. Generate a 10MiB file *that has the same name* using dd -
dd if=/dev/urandom of={file path} bs=1M count=1 status=none"
5. Upload the 10MiB file directly to the underlying storage bucket
6. Try to download the object from the MCG bucket

Actual results:
The download fails with an error - 
(InvalidRange) when calling the GetObject operation: The requested range cannot be satisfied

Expected results:
The download succeeds

Additional info:
We did not run into this until today because in step 4, we always generated another 1MiB file. When both files are the same size (or, both are specifically 1MiB), the problem does not happen.

Comment 2 Ben Eli 2021-11-17 13:29:58 UTC

Correction - step 4 should say
dd if=/dev/urandom of={file path} bs=10M count=1 status=none

bs=10M, not 1M (which is step 2).

Also, the trailing double quotes are not part of the command, and should be ignored.
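
For reference, the two file-generation steps (2 and 4) with this correction applied can be sketched in Python instead of dd. This is only an illustration: the helper name `make_random_file` and the file name `testobj` are made up here, and the uploads in steps 3 and 5, which happen after each file is generated, are not shown.

```python
import os
import tempfile

MIB = 1024 * 1024


def make_random_file(path, size_bytes):
    """Write size_bytes of random data to path and return the resulting size.

    Hypothetical stand-in for `dd if=/dev/urandom of=path bs=<size> count=1`.
    """
    with open(path, "wb") as f:
        f.write(os.urandom(size_bytes))
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "testobj")  # hypothetical object name
    first_size = make_random_file(path, 1 * MIB)    # step 2: bs=1M count=1
    second_size = make_random_file(path, 10 * MIB)  # step 4: bs=10M count=1

print(first_size, second_size)
```

The same path is reused on purpose, since the bug depends on both objects having the same name but different sizes.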

Comment 3 Alexander Indenbaum 2021-11-17 18:31:32 UTC

Tested with noobaa-image: noobaa/noobaa-core:5.9.0-20210722

Tried the described scenario with a large NS-CACHE bucketclass TTL (600000000).

I am using AWS CLI to access both the hub AWS bucket and the MCG cache bucket.

Test:
1. Generated a 1m random file and copied it in (aws s3 cp) through the MCG endpoint
2. Generated a 10m random file and copied it in (aws s3 cp) through the Amazon endpoint, using the same object name as before
3. Copied the object back (aws s3 cp) through the MCG endpoint and got back the 1m file, i.e. from the cache.
4. Interestingly, listing through the MCG endpoint (aws s3 ls) shows 10m, i.e. the hub bucket object size, so there is an inconsistency between GetObject and ListObjects on the MCG endpoint.

Then I reinstalled the system and retried with a tiny NS-CACHE bucketclass TTL (6000). In Step 4, using the MCG endpoint, I got back the 10m file, i.e. from the hub.
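
The two observations above can be captured in a toy model. This is a sketch of the observed behaviour only, not NooBaa code: the class `CacheBucketModel` and its methods are hypothetical, time is a plain integer in the same unit as the TTL, the read time of 10000 is an arbitrary assumption, and an expired entry is simply re-read from the hub.

```python
MIB = 1024 * 1024


class CacheBucketModel:
    """Toy model of a cache bucket in front of a hub bucket (hypothetical)."""

    def __init__(self, ttl):
        self.ttl = ttl    # positive TTL, same unit as the `now` arguments
        self.hub = {}     # key -> bytes stored in the hub bucket
        self.cache = {}   # key -> (bytes, time the entry was cached)

    def put(self, key, body, now):
        """Upload through the cache endpoint: write to hub and cache."""
        self.hub[key] = body
        self.cache[key] = (body, now)

    def put_out_of_band(self, key, body):
        """Upload directly to the hub, bypassing the cache."""
        self.hub[key] = body

    def listed_size(self, key):
        """Size as a listing reports it: always the hub's view."""
        return len(self.hub[key])

    def get(self, key, now):
        """Serve from the cache until the entry is at least `ttl` old."""
        body, cached_at = self.cache[key]
        if now - cached_at >= self.ttl:
            body = self.hub[key]
            self.cache[key] = (body, now)
        return body


def run_scenario(ttl, read_at):
    """Return (listed size, downloaded size) after an out-of-band overwrite."""
    bucket = CacheBucketModel(ttl)
    bucket.put("obj", bytes(1 * MIB), now=0)
    bucket.put_out_of_band("obj", bytes(10 * MIB))
    return bucket.listed_size("obj"), len(bucket.get("obj", now=read_at))


large_ttl = run_scenario(ttl=600000000, read_at=10000)  # listing and GET disagree
tiny_ttl = run_scenario(ttl=6000, read_at=10000)        # both show the hub object
```

With the large TTL the listing reports 10 MiB while the download returns 1 MiB; with the tiny TTL both agree on the hub object.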

@ben eli, could you please describe how you upload and download? What tool/client do you use? Is multipart upload/download with partNumber used?

Thank you!

Comment 4 Ben Eli 2021-11-18 11:10:38 UTC

Hey Alexander - thanks a lot for the in-depth testing and elaborate answer!

I can confirm that I have also noticed the size mismatch when listing the object (10M were shown although the cache should be 1M).
It's good to see that this part is reproducible too.

I debugged the issue further, and narrowed it down to it happening *only* when using the s3 sync command.
s3 cp works as intended, and copies the right file - just like in your testing.

So there seem to be two separate issues:
1. s3 ls states the size of the object as it is in the underlying endpoint, and not the one that's in the cache
2. s3 sync fails to copy the object with the aforementioned error (InvalidRange), while s3 cp works.

We're using AWS CLI 1.18.69, with all default settings.

Comment 5 Alexander Indenbaum 2021-11-25 07:04:21 UTC

Thank you Ben!

The scenario is an out-of-band update of a cached object in the hub with a larger one. By design, NooBaa ListObjects() returns the hub's picture of the world.

The client (AWS CLI) is doing `sync` (https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html): after performing ListObjects(), it issues a ranged GetObject() (https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html). The client uses an open range `X-`, and the NooBaa endpoint throws an InvalidRange exception here (https://github.com/noobaa/noobaa-core/blob/60c5ced7347593b8f488ed0dd1957a46ee40c08a/src/util/http_utils.js#L218), since the range start `X` is greater than the cached object size.

The root cause is inconsistency between ListObjects() and GetObject().
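
A minimal sketch of that failure mode follows. It is not the NooBaa implementation: `resolve_open_range`, `plan_range_starts` and the `InvalidRange` class are hypothetical names, and the 8 MiB chunk size is an assumption chosen for illustration (it matches what I understand to be the AWS CLI default `multipart_chunksize`).

```python
MIB = 1024 * 1024


class InvalidRange(Exception):
    """Stand-in for the S3 InvalidRange (HTTP 416) error."""


def resolve_open_range(start, object_size):
    """Resolve an open-ended range `start-` against an object of object_size bytes.

    Returns the inclusive (first_byte, last_byte) pair, or raises InvalidRange
    when the start offset lies at or beyond the end of the object.
    """
    if start < 0 or start >= object_size:
        raise InvalidRange("The requested range cannot be satisfied")
    return start, object_size - 1


def plan_range_starts(listed_size, chunk_size):
    """Range start offsets a client might use if it trusts the listed size."""
    return list(range(0, listed_size, chunk_size))


listed_size = 10 * MIB  # size reported by ListObjects (hub view)
cached_size = 1 * MIB   # size of the object actually served from the cache

# Assumed chunk size of 8 MiB gives two parts, starting at 0 and at 8 MiB.
starts = plan_range_starts(listed_size, 8 * MIB)

# The first part fits inside the cached object.
first_part = resolve_open_range(starts[0], cached_size)

# The last part starts past the end of the cached object and is rejected.
try:
    resolve_open_range(starts[-1], cached_size)
    last_part_error = None
except InvalidRange as exc:
    last_part_error = str(exc)
```

When both objects are the same size, every planned start offset stays inside the cached object, which would explain why the error never showed up with two 1MiB files.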

Comment 6 Alexander Indenbaum 2021-11-29 09:43:26 UTC

Hello Ben!

I got clarification about the cache's TTL value meaning. The idea is to have 3 options for TTL:
TTL > 0 - means that we wait that time before revalidation
TTL = 0 - means that we always revalidate
TTL < 0 - means that we never revalidate
The last case was meant to support cases where there are no expected out-of-band overwrites to the hub, or the entire dataset is immutable.
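
The three options can be written down as a small decision function. This is a sketch of the semantics as described here, not the NooBaa code: the name `needs_revalidation` is hypothetical, `ttl` and `age` are assumed to be in the same time unit, and treating an entry whose age equals the TTL as due for revalidation is my assumption.

```python
def needs_revalidation(ttl, age):
    """Decide whether a cached object must be revalidated against the hub.

    ttl: configured cache TTL (same unit as age)
    age: time since the cached copy was last validated
    """
    if ttl < 0:
        return False      # never revalidate
    if ttl == 0:
        return True       # always revalidate
    return age >= ttl     # revalidate only once the TTL has elapsed


# A large positive TTL keeps serving the cached copy for a long time.
print(needs_revalidation(600000, 1000))
# TTL = 0 revalidates on every read.
print(needs_revalidation(0, 0))
```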

Could you try using TTL = 0 for this AWS CLI `sync` test?

Comment 7 Ben Eli 2021-12-03 10:02:23 UTC

Hi Alexander,

Thanks for the explanation!

At the moment, the test intentionally covers exactly this case: that the cached object is retrieved after the object in the underlying storage has changed.
We had a call about this, and decided to keep the current test logic that replaces the underlying storage object with one that has different data, but an identical size.
This allows us to continue using "sync", and still verifies that the hashes are different.
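
The check this relies on can be sketched as follows. It is an illustration only, not the actual test code: the helper names are hypothetical, SHA-256 is an arbitrary choice of hash, and the random byte strings stand in for the uploaded and downloaded files.

```python
import hashlib
import os

MIB = 1024 * 1024


def sha256_hex(data):
    """Hex digest used to compare object contents."""
    return hashlib.sha256(data).hexdigest()


def served_from_cache(downloaded, cached, hub_replacement):
    """True if the download matches the cached object and not the replacement."""
    digest = sha256_hex(downloaded)
    return digest == sha256_hex(cached) and digest != sha256_hex(hub_replacement)


cached = os.urandom(1 * MIB)           # object uploaded through MCG
hub_replacement = os.urandom(1 * MIB)  # same size, different data, written to the hub
same_size = len(cached) == len(hub_replacement)
```

Because the sizes match, a size-based range plan stays valid, while the differing hashes still show which copy was served.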

Comment 8 Ben Eli 2021-12-16 11:11:59 UTC

Addendum:
After discussing this further with Alexander, we concluded that this bug should be followed by an action item, perhaps changing the default TTL from a positive integer to a negative one or 0.
Alternatively (or perhaps, in addition), a clarification should be added to the docs, which states that this behavior is expected, and that positive TTL should not be used in cases where out-of-band writes are expected.

Comment 9 Nimrod Becker 2022-01-04 09:47:01 UTC

Ben, did you have a chance to test with TTL 0?

Comment 21 errata-xmlrpc 2022-04-13 18:50:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372