Description of problem:

After disabling notification_v2, an rgw crash was observed on complete-multipart-upload against a bucket with a notification configured for a kafka-ssl endpoint with kafka-ack-level=broker in the topic. The crash reproduces both on a fresh RHCS 8.0 cluster with notification_v2 disabled and on an environment upgraded from 7.1 to 8.0 (where notification_v2 is disabled by default).

rgw crash snippet in rgw logs at debug_level 20:

    -9> 2024-11-21T05:51:47.987+0000 7f490c106640 20 Kafka publish: reused existing topic: cephci-kafka-broker-ack-type-b47c2192b2284487
    -8> 2024-11-21T05:51:47.987+0000 7f490c106640 20 Kafka publish (with callback, tag=171): OK. Queue has: 1 callbacks
    -7> 2024-11-21T05:51:47.990+0000 7f492c947640 20 handle_completion(): completion ok for obj=prefix1key_davidh.198-bucky-3629-1_70
    -6> 2024-11-21T05:51:48.037+0000 7f490c106640 20 Kafka run: ack received with result=Success
    -5> 2024-11-21T05:51:48.037+0000 7f490c106640 20 Kafka run: n/ack received, invoking callback with tag=171
    -4> 2024-11-21T05:51:48.037+0000 7f49b4a57640 20 req 15718117396428226637 0.066999547s s3:complete_multipart get_obj_state: octx=0x562ae0fae620 obj=davidh.198-bucky-3629-1:_multipart_prefix1key_davidh.198-bucky-3629-1_70.2~NG_akG9dWbdusnno4QxYCP0_00k8Y-d.meta state=0x562adff321e8 s->prefetch_data=0
    -3> 2024-11-21T05:51:48.037+0000 7f49b4a57640 20 req 15718117396428226637 0.066999547s s3:complete_multipart get_obj_state: octx=0x562ae0fae620 obj=davidh.198-bucky-3629-1:_multipart_prefix1key_davidh.198-bucky-3629-1_70.2~NG_akG9dWbdusnno4QxYCP0_00k8Y-d.meta state=0x562adff321e8 s->prefetch_data=0
    -2> 2024-11-21T05:51:48.037+0000 7f49b4a57640 20 req 15718117396428226637 0.066999547s s3:complete_multipart prepare_atomic_modification: state is not atomic. state=0x562adff321e8
    -1> 2024-11-21T05:51:48.038+0000 7f49b4a57640 20 req 15718117396428226637 0.067999534s s3:complete_multipart bucket index object: :.dir.9ebac6ff-1b96-47e9-8a41-f975432acaaf.56862.2.3
     0> 2024-11-21T05:51:48.046+0000 7f490c106640 -1 *** Caught signal (Aborted) **
 in thread 7f490c106640 thread_name:kafka_manager

 ceph version 19.2.0-53.el9cp (677d8728b1c91c14d54eedf276ac61de636606f8) squid (stable)
 1: /lib64/libc.so.6(+0x3e730) [0x7f4a3aee6730]
 2: /lib64/libc.so.6(+0x8ba6c) [0x7f4a3af33a6c]
 3: raise()
 4: abort()
 5: /lib64/libc.so.6(+0x29170) [0x7f4a3aed1170]
 6: /lib64/libc.so.6(+0x37217) [0x7f4a3aedf217]
 7: /lib64/libc.so.6(+0x92248) [0x7f4a3af3a248]
 8: (std::_Function_handler<void (int), RGWPubSubKafkaEndpoint::send(rgw_pubsub_s3_event const&, optional_yield)::{lambda(int)#1}>::_M_invoke(std::_Any_data const&, int&&)+0x95) [0x562adaf1d035]
 9: (rgw::kafka::message_callback(rd_kafka_s*, rd_kafka_message_s const*, void*)+0x20f) [0x562adaf873ff]
 10: /lib64/librdkafka.so.1(+0x256ef) [0x7f4a3b62e6ef]
 11: /lib64/librdkafka.so.1(+0x5b862) [0x7f4a3b664862]
 12: rd_kafka_poll()
 13: (rgw::kafka::Manager::run()+0x5a9) [0x562adaf8eff9]
 14: /lib64/libstdc++.so.6(+0xdbad4) [0x7f4a3b283ad4]
 15: /lib64/libc.so.6(+0x89d22) [0x7f4a3af31d22]
 16: /lib64/libc.so.6(+0x10ed40) [0x7f4a3afb6d40]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
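For context on the backtrace: frames 9-13 show rgw::kafka::Manager::run() driving rd_kafka_poll() on the kafka_manager thread, where librdkafka hands the broker ack to rgw::kafka::message_callback, which then invokes the completion lambda registered by RGWPubSubKafkaEndpoint::send() (frame 8); the abort fires inside that lambda. A minimal Python sketch of this librdkafka delivery-report pattern, using the confluent_kafka binding over librdkafka (the broker address, CA path, and topic name are taken from the topic configuration quoted below in Steps to Reproduce; everything else is illustrative, not rgw's code):

from confluent_kafka import Producer  # pip install confluent-kafka

producer = Producer({
    "bootstrap.servers": "localhost:9093",           # kafka-ssl endpoint from the topic
    "security.protocol": "SSL",
    "ssl.ca.location": "/usr/local/kafka/y-ca.crt",  # ca-location from the topic
    "enable.ssl.certificate.verification": "false",  # verify-ssl=false
})

def on_delivery(err, msg):
    # librdkafka invokes this from inside poll()/flush(), on the polling
    # thread, not at produce() time; in rgw that thread is "kafka_manager".
    if err is None:
        print("ack received with result=Success")
    else:
        print(f"n/ack received: {err}")

producer.produce(
    "cephci-kafka-broker-ack-type-b47c2192b2284487",
    value=b'{"Records": []}',  # stand-in for the S3 event payload
    on_delivery=on_delivery,
)
producer.flush(10)  # serves the delivery report, which runs on_delivery

The only point of the sketch is the threading model: the broker ack arrives later, on the poll loop's thread, so any request state the registered callback captures must still be alive when it runs. That is consistent with the abort firing on kafka_manager while the s3:complete_multipart request was executing on a different thread.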
fail log snippet:

Traceback (most recent call last):
  File "/home/cephuser/rgw-tests/ceph-qe-scripts/rgw/v2/tests/s3_swift/test_bucket_notifications.py", line 539, in <module>
    test_exec(config, ssh_con)
  File "/home/cephuser/rgw-tests/ceph-qe-scripts/rgw/v2/tests/s3_swift/test_bucket_notifications.py", line 327, in test_exec
    reusable.upload_mutipart_object(
  File "/home/cephuser/rgw-tests/ceph-qe-scripts/rgw/v2/tests/s3_swift/reusable.py", line 609, in upload_mutipart_object
    mpu.complete(MultipartUpload=parts_info)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/boto3/resources/factory.py", line 581, in do_action
    response = action(self, *args, **kwargs)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/boto3/resources/action.py", line 88, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/client.py", line 569, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/client.py", line 1005, in _make_api_call
    http, parsed_response = self._make_request(
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/client.py", line 1029, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/endpoint.py", line 200, in _send_request
    while self._needs_retry(
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/endpoint.py", line 360, in _needs_retry
    responses = self._event_emitter.emit(
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/hooks.py", line 412, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/hooks.py", line 256, in emit
    return self._emit(event_name, kwargs)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/retryhandler.py", line 207, in __call__
    if self._checker(**checker_kwargs):
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/retryhandler.py", line 284, in __call__
    should_retry = self._should_retry(
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/retryhandler.py", line 320, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/retryhandler.py", line 363, in __call__
    checker_response = checker(
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/retryhandler.py", line 247, in __call__
    return self._check_caught_exception(
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/retryhandler.py", line 416, in _check_caught_exception
    raise caught_exception
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/endpoint.py", line 279, in _do_get_response
    http_response = self._send(request)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/endpoint.py", line 383, in _send
    return self.http_session.send(request)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/httpsession.py", line 493, in send
    raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://10.0.67.212:80/davidh.198-bucky-3629-1/prefix1key_davidh.198-bucky-3629-1_70?uploadId=2~NG_akG9dWbdusnno4QxYCP0_00k8Y-d"

Version-Release number of selected component (if applicable):
ceph version 19.2.0-53.el9cp

How reproducible:
Intermittent

Steps to Reproduce:
1. Deploy an RHCS 8.0 cluster and disable notification_v2, or use an environment upgraded from 7.1 to 8.0.
2. Create an rgw user and a bucket.
3. Create a topic and put a bucket notification on the bucket:

2024-11-21 05:50:47,251 INFO: executing cmd: radosgw-admin topic get --topic cephci-kafka-broker-ack-type-b47c2192b2284487
2024-11-21 05:50:47,583 INFO: cmd excuted
2024-11-21 05:50:47,584 INFO: {
    "owner": "davidh.198",
    "name": "cephci-kafka-broker-ack-type-b47c2192b2284487",
    "dest": {
        "push_endpoint": "kafka://localhost:9093",
        "push_endpoint_args": "Version=2010-03-31&ca-location=/usr/local/kafka/y-ca.crt&kafka-ack-level=broker&use-ssl=true&verify-ssl=false",
        "push_endpoint_topic": "cephci-kafka-broker-ack-type-b47c2192b2284487",
        "stored_secret": false,
        "persistent": false,
        "persistent_queue": "",
        "time_to_live": "None",
        "max_retries": "None",
        "retry_sleep_duration": "None"
    },
    "arn": "arn:aws:sns:default::cephci-kafka-broker-ack-type-b47c2192b2284487",
    "opaqueData": "",
    "policy": ""
}

2024-11-21 05:50:47,619 INFO: get bucket notification for bucket : davidh.198-bucky-3629-1
2024-11-21 05:50:47,667 INFO: bucket notification for bucket: davidh.198-bucky-3629-1 is {
    "ResponseMetadata": {
        "RequestId": "tx000002ecb61d399da34ee-00673eca37-56862-default",
        "HostId": "",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "x-amz-request-id": "tx000002ecb61d399da34ee-00673eca37-56862-default",
            "content-type": "application/xml",
            "server": "Ceph Object Gateway (squid)",
            "content-length": "372",
            "date": "Thu, 21 Nov 2024 05:50:47 GMT",
            "connection": "Keep-Alive"
        },
        "RetryAttempts": 0
    },
    "TopicConfigurations": [
        {
            "Id": "notification-Multipart",
            "TopicArn": "arn:aws:sns:default::cephci-kafka-broker-ack-type-b47c2192b2284487",
            "Events": [
                "s3:ObjectCreated:*",
                "s3:ObjectRemoved:*"
            ],
            "Filter": {
                "Key": {
                    "FilterRules": [
                        {
                            "Name": "prefix",
                            "Value": "prefix1"
                        }
                    ]
                }
            }
        }
    ]
}

4. Create a multipart upload, upload parts, and complete the multipart upload. The rgw crash was observed after a few iterations of multipart object uploads. (A condensed boto3 sketch of steps 2-4 is included at the end of Additional info below.)

Actual results:
rgw crashes on complete-multipart-upload against a bucket whose notification is configured with kafka-ack-level=broker on a kafka-ssl endpoint after notification_v2 is disabled.

Expected results:
rgw should not crash even if notification_v2 is disabled.

Additional info:

fail log on a fresh 8.0 cluster after disabling notification_v2:
http://magna002.ceph.redhat.com/cephci-jenkins/hsm/TFA_squid_kafka_ssl_notif/test_bucket_notification_ssl_kafka_broker_multipart.console.log_fresh_deploy_8.0_disable_notif_v2_iter2

rgw debug logs:
http://magna002.ceph.redhat.com/cephci-jenkins/hsm/TFA_squid_kafka_ssl_notif/rgw_logs_debug_20_with_rgw_crash_log

fail log on an environment upgraded from 7.1 to 8.0:
http://magna002.ceph.redhat.com/cephci-jenkins/hsm/TFA_squid_kafka_ssl_notif/test_bucket_notification_ssl_kafka_broker_multipart.console.log_upgraded_cluster_v2_enabled_disabled_and_enabled
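For reference, a condensed boto3 sketch of reproduction steps 2-4. This is not the QE test script: the rgw endpoint, credentials, iteration count, and part sizing are placeholder assumptions, while the topic attributes, notification configuration, and "prefix1" key prefix mirror the values captured in the logs above.

import boto3

RGW = "http://10.0.67.212:80"  # rgw endpoint seen in the fail log
AUTH = dict(aws_access_key_id="ACCESS_KEY",      # placeholder credentials
            aws_secret_access_key="SECRET_KEY")

s3 = boto3.client("s3", endpoint_url=RGW, **AUTH)
sns = boto3.client("sns", endpoint_url=RGW, region_name="default", **AUTH)

bucket = "davidh.198-bucky-3629-1"
s3.create_bucket(Bucket=bucket)

# Topic attributes correspond to push_endpoint_args in `topic get` above.
arn = sns.create_topic(
    Name="cephci-kafka-broker-ack-type-b47c2192b2284487",
    Attributes={
        "push-endpoint": "kafka://localhost:9093",
        "use-ssl": "true",
        "verify-ssl": "false",
        "ca-location": "/usr/local/kafka/y-ca.crt",
        "kafka-ack-level": "broker",
    },
)["TopicArn"]

s3.put_bucket_notification_configuration(
    Bucket=bucket,
    NotificationConfiguration={"TopicConfigurations": [{
        "Id": "notification-Multipart",
        "TopicArn": arn,
        "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
        "Filter": {"Key": {"FilterRules": [
            {"Name": "prefix", "Value": "prefix1"}]}},
    }]},
)

# Step 4: repeat multipart uploads; the crash is intermittent and was hit
# after a few iterations of complete-multipart-upload.
part = b"x" * (5 * 1024 * 1024)  # 5 MiB minimum part size
for i in range(50):
    key = f"prefix1key_{bucket}_{i}"
    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts = [
        {
            "PartNumber": n,
            "ETag": s3.upload_part(
                Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                PartNumber=n, Body=part,
            )["ETag"],
        }
        for n in (1, 2, 3)
    ]
    s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
        MultipartUpload={"Parts": parts},
    )

Since the topic is non-persistent (persistent: false above), the notification is sent synchronously as part of the request, so the EndpointConnectionError in the fail log is the client-side symptom of rgw aborting mid-request.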
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:9775