Bug 2290847 - [MCG] Expired object deletion fails with lifecycle errors
Summary: [MCG] Expired object deletion fails with lifecycle errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.16.0
Assignee: Nimrod Becker
QA Contact: Mahesh Shetty
URL:
Whiteboard:
Depends On:
Blocks: 2279742
 
Reported: 2024-06-07 08:36 UTC by Prasad Desala
Modified: 2024-07-24 10:30 UTC
CC List: 5 users

Fixed In Version: 4.16.0-126
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-07-17 13:24:34 UTC
Embargoed:




Links
Github noobaa/noobaa-core pull 8118: Merged - Fixing lifecycle after #8074 (last updated 2024-06-10 14:35:23 UTC)
Github noobaa/noobaa-core pull 8137: open - [Backport to 5.16] Fixing lifecycle schema issues (last updated 2024-06-13 07:07:56 UTC)
Red Hat Product Errata RHSA-2024:4591 (last updated 2024-07-17 13:24:42 UTC)

Description Prasad Desala 2024-06-07 08:36:46 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
==========================================================================
This issue was observed while running the automated system test test_object_expiration_with_disruptions. The cluster has FIPS, KMS, encryption at rest, and huge pages enabled. As part of the test, we set the expiration to 1 day in the bucket lifecycle configuration:

$ s3api get-bucket-lifecycle --bucket=oc-bucket-cf372cfa27044d5190a946a91b671e
urllib3/connectionpool.py:1056: InsecureRequestWarning: Unverified HTTPS request is being made to host 's3-openshift-storage.apps.tdesala-j6.qe.rh-ocs.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
{
    "Rules": [
        {
            "Expiration": {
                "Days": 1
            },
            "ID": "data-expire",
            "Status": "Enabled"
        }
    ]
}
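
For reference, the rule above can be applied with a call equivalent to the following boto3 sketch. This is not the ocs-ci test code; the endpoint URL, credentials, and bucket name are placeholders, not values from this bug.

# Minimal sketch: apply the 1-day "data-expire" rule shown above to an MCG bucket.
# Endpoint, credentials, and bucket name are placeholders/assumptions.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3-openshift-storage.apps.example.com",  # MCG S3 route (placeholder)
    aws_access_key_id="<noobaa-access-key>",
    aws_secret_access_key="<noobaa-secret-key>",
    verify=False,  # matches the unverified-HTTPS warnings in the CLI output above
)

s3.put_bucket_lifecycle_configuration(
    Bucket="oc-bucket-cf372cfa27044d5190a946a91b671e",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "data-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Expiration": {"Days": 1},
            }
        ]
    },
)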

After that, we changed the objects' creation time to 2023 so that the background runner would pick them up for expiration, and reduced the object expiration check interval to 1 minute. However, the objects still had not expired even after a long time:

$ s3 ls s3://oc-bucket-cf372cfa27044d5190a946a91b671e
urllib3/connectionpool.py:1056: InsecureRequestWarning: Unverified HTTPS request is being made to host 's3-openshift-storage.apps.tdesala-j6.qe.rh-ocs.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
2023-06-08 12:08:37    1048576 Random_object0
2023-06-08 12:08:37         16 obj-key-14ecd0b881404431860551f1a4bc8230
2023-06-08 12:08:37         16 obj-key-441d2b5feac44617a2804c862dc19faa
2023-06-08 12:08:37         16 obj-key-51a224dcf3274ca48169dc8c3fe6c67e
2023-06-08 12:08:37         16 obj-key-a47aed0c1a1a436faa640a39779adf4a
2023-06-08 12:08:37         16 obj-key-edb9b92e13b54b5aa36e441bd7b25217
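
To make the failure concrete: with a 1-day expiration rule and creation times backdated to 2023, every object listed above is past its expiry. A boto3 sketch like the following (same placeholder endpoint and credentials as the previous example, not the actual test code) can flag objects the lifecycle worker should already have deleted:

# Sketch: report objects whose LastModified is older than the 1-day expiration,
# i.e. objects that should have been removed by the lifecycle background worker.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3-openshift-storage.apps.example.com",  # placeholder MCG route
    aws_access_key_id="<noobaa-access-key>",
    aws_secret_access_key="<noobaa-secret-key>",
    verify=False,
)

cutoff = datetime.now(timezone.utc) - timedelta(days=1)
resp = s3.list_objects_v2(Bucket="oc-bucket-cf372cfa27044d5190a946a91b671e")
overdue = [o["Key"] for o in resp.get("Contents", []) if o["LastModified"] < cutoff]
print("objects past expiry but not deleted:", overdue)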

A lifecycle failure error is seen in the core logs for the same bucket (6662a2e1697f140022a01119):

Jun-7 6:16:21.629 [BGWorkers/32]    [L0] core.server.bg_services.lifecycle:: LIFECYCLE PROCESSING bucket: SENSITIVE-46d7e558517253bd (bucket id: 6662a2e1697f140022a01119 ) rule { id: 'data-expire', filter: { prefix: '' }, status: 'Enabled', expiration: { days: 1 } }
Jun-7 6:16:21.639 [WebServer/34] [ERROR] core.rpc.rpc_schema:: INVALID_SCHEMA_REPLY SERVER object_api#/methods/delete_multiple_objects_by_filter ERRORS: [ { instancePath: '', schemaPath: 'object_api#/methods/delete_multiple_objects_by_filter/reply/additionalProperties', keyword: 'additionalProperties', params: { additionalProperty: 'objects_deleted' }, message: 'must NOT have additional properties', schema: false, parentSchema: { type: 'object', properties: { num_objects_deleted: { type: 'integer' } }, additionalProperties: false }, data: { objects_deleted: 0 } }, [length]: 1 ] REPLY: { objects_deleted: 0 }
Jun-7 6:16:21.639 [WebServer/34] [ERROR] CONSOLE:: RPC._on_request: ERROR srv object_api.delete_multiple_objects_by_filter reqid 520@wss://localhost:8443/(27a6wcq) connid ws://[::1]:51552/(2f198l4) Error: INVALID_SCHEMA_REPLY SERVER object_api#/methods/delete_multiple_objects_by_filter
    at method_api.validate_reply (/root/node_modules/noobaa-core/src/rpc/rpc_schema.js:129:31)
    at RPC._on_request (/root/node_modules/noobaa-core/src/rpc/rpc.js:351:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Jun-7 6:16:21.640 [BGWorkers/32] [ERROR] core.rpc.rpc:: RPC._request: response ERROR srv object_api.delete_multiple_objects_by_filter reqid 520@wss://localhost:8443/(27a6wcq) connid wss://localhost:8443/(27a6wcq) params { bucket: SENSITIVE-46d7e558517253bd, create_time: 1717654581, prefix: '', size_less: undefined, size_greater: undefined, tags: undefined, limit: 1000 } took [6.1+2.0=8.1] [RpcError: INVALID_SCHEMA_REPLY SERVER object_api#/methods/delete_multiple_objects_by_filter] { rpc_code: 'INVALID_SCHEMA_REPLY' }
Jun-7 6:16:21.640 [BGWorkers/32] [ERROR] core.server.bg_services.lifecycle:: LIFECYCLE FAILED processing [RpcError: INVALID_SCHEMA_REPLY SERVER object_api#/methods/delete_multiple_objects_by_filter] { rpc_code: 'INVALID_SCHEMA_REPLY' } 
Jun-7 6:16:21.640 [BGWorkers/32]    [L0] core.server.bg_services.lifecycle:: LIFECYCLE: END
Jun-7 6:16:21.640 [BGWorkers/32]   [LOG] CONSOLE:: TieringSpillbackWorker: start running
Jun-7 6:16:21.693 [BGWorkers/32]   [LOG] core.util.background_scheduler:: run_background_worker key rotator UNCAUGHT ERROR [Error: ENOENT: no such file or directory, stat '/etc/noobaa-server/root_keys'] { errno: -2, code: 'ENOENT', syscall: 'stat', path: '/etc/noobaa-server/root_keys' } Error: ENOENT: no such file or directory, stat '/etc/noobaa-server/root_keys
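
The failure visible in the log is a reply-schema mismatch: delete_multiple_objects_by_filter replies with objects_deleted, but the reply schema in object_api only allows num_objects_deleted and sets additionalProperties: false, so the RPC layer rejects the reply and the lifecycle run aborts (LIFECYCLE FAILED, then LIFECYCLE: END). NooBaa validates replies with Ajv in Node.js; the Python sketch below uses the jsonschema package only to illustrate the same mismatch, with the schema and data copied from the INVALID_SCHEMA_REPLY error above.

# Illustrative only: reproduce the schema rejection seen in the log.
import jsonschema

reply_schema = {
    "type": "object",
    "properties": {"num_objects_deleted": {"type": "integer"}},
    "additionalProperties": False,
}

try:
    jsonschema.validate({"objects_deleted": 0}, reply_schema)  # the reply from the log
except jsonschema.ValidationError as err:
    # prints: Additional properties are not allowed ('objects_deleted' was unexpected)
    print("reply rejected:", err.message)

jsonschema.validate({"num_objects_deleted": 0}, reply_schema)  # this shape would pass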

Version of all relevant components (if applicable):
OCP: 4.16.0-0.nightly-2024-06-06-064349
ODF: 4.16.0-120

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?
No


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes (2/2)

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
===================
1. Run https://github.com/red-hat-storage/ocs-ci/blob/94822697d154d895a1437b5cea20687ef682472a/tests/cross_functional/system_test/test_object_expiration::TestObjectExpiration::test_object_expiration_with_disruptions

Actual results:
===============
Objects failed to be deleted even after their expiry.

Expected results:
=================
Objects should be deleted successfully after their expiry.

Comment 13 Sunil Kumar Acharya 2024-06-25 12:09:21 UTC
Please update the RDT flag/text appropriately.

Comment 15 errata-xmlrpc 2024-07-17 13:24:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591

