Description of problem:
After disabling notification_v2, rgw crashes during complete-multipart-upload on a bucket whose notification topic uses a kafka-ssl endpoint with kafka-ack-level=broker. The same crash is seen on an environment upgraded from 7.1 to 8.0, where notification_v2 is disabled by default.

rgw crash snippet in rgw logs at debug_level 20:

    -9> 2024-11-21T05:51:47.987+0000 7f490c106640 20 Kafka publish: reused existing topic: cephci-kafka-broker-ack-type-b47c2192b2284487
    -8> 2024-11-21T05:51:47.987+0000 7f490c106640 20 Kafka publish (with callback, tag=171): OK. Queue has: 1 callbacks
    -7> 2024-11-21T05:51:47.990+0000 7f492c947640 20 handle_completion(): completion ok for obj=prefix1key_davidh.198-bucky-3629-1_70
    -6> 2024-11-21T05:51:48.037+0000 7f490c106640 20 Kafka run: ack received with result=Success
    -5> 2024-11-21T05:51:48.037+0000 7f490c106640 20 Kafka run: n/ack received, invoking callback with tag=171
    -4> 2024-11-21T05:51:48.037+0000 7f49b4a57640 20 req 15718117396428226637 0.066999547s s3:complete_multipart get_obj_state: octx=0x562ae0fae620 obj=davidh.198-bucky-3629-1:_multipart_prefix1key_davidh.198-bucky-3629-1_70.2~NG_akG9dWbdusnno4QxYCP0_00k8Y-d.meta state=0x562adff321e8 s->prefetch_data=0
    -3> 2024-11-21T05:51:48.037+0000 7f49b4a57640 20 req 15718117396428226637 0.066999547s s3:complete_multipart get_obj_state: octx=0x562ae0fae620 obj=davidh.198-bucky-3629-1:_multipart_prefix1key_davidh.198-bucky-3629-1_70.2~NG_akG9dWbdusnno4QxYCP0_00k8Y-d.meta state=0x562adff321e8 s->prefetch_data=0
    -2> 2024-11-21T05:51:48.037+0000 7f49b4a57640 20 req 15718117396428226637 0.066999547s s3:complete_multipart prepare_atomic_modification: state is not atomic. state=0x562adff321e8
    -1> 2024-11-21T05:51:48.038+0000 7f49b4a57640 20 req 15718117396428226637 0.067999534s s3:complete_multipart bucket index object: :.dir.9ebac6ff-1b96-47e9-8a41-f975432acaaf.56862.2.3
     0> 2024-11-21T05:51:48.046+0000 7f490c106640 -1 *** Caught signal (Aborted) **
 in thread 7f490c106640 thread_name:kafka_manager

 ceph version 19.2.0-53.el9cp (677d8728b1c91c14d54eedf276ac61de636606f8) squid (stable)
 1: /lib64/libc.so.6(+0x3e730) [0x7f4a3aee6730]
 2: /lib64/libc.so.6(+0x8ba6c) [0x7f4a3af33a6c]
 3: raise()
 4: abort()
 5: /lib64/libc.so.6(+0x29170) [0x7f4a3aed1170]
 6: /lib64/libc.so.6(+0x37217) [0x7f4a3aedf217]
 7: /lib64/libc.so.6(+0x92248) [0x7f4a3af3a248]
 8: (std::_Function_handler<void (int), RGWPubSubKafkaEndpoint::send(rgw_pubsub_s3_event const&, optional_yield)::{lambda(int)#1}>::_M_invoke(std::_Any_data const&, int&&)+0x95) [0x562adaf1d035]
 9: (rgw::kafka::message_callback(rd_kafka_s*, rd_kafka_message_s const*, void*)+0x20f) [0x562adaf873ff]
 10: /lib64/librdkafka.so.1(+0x256ef) [0x7f4a3b62e6ef]
 11: /lib64/librdkafka.so.1(+0x5b862) [0x7f4a3b664862]
 12: rd_kafka_poll()
 13: (rgw::kafka::Manager::run()+0x5a9) [0x562adaf8eff9]
 14: /lib64/libstdc++.so.6(+0xdbad4) [0x7f4a3b283ad4]
 15: /lib64/libc.so.6(+0x89d22) [0x7f4a3af31d22]
 16: /lib64/libc.so.6(+0x10ed40) [0x7f4a3afb6d40]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

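To see which thread aborted and what it logged just before, the debug_level 20 lines above can be grouped by thread id. This is a minimal sketch, assuming each event line has the layout `<index>> <timestamp> <thread-id> <level> <message>` as in the snippet; the function names and regular expressions are illustrative and not part of any Ceph tool. The sample text is a subset of the lines quoted above.

```python
import re
from collections import defaultdict

# Assumed layout of one recent-events line in the rgw log, e.g.
#   -5> 2024-11-21T05:51:48.037+0000 7f490c106640 20 Kafka run: ...
LINE_RE = re.compile(
    r"^\s*(?P<idx>-?\d+)>\s+(?P<ts>\S+)\s+(?P<tid>[0-9a-f]+)\s+(?P<lvl>-?\d+)\s+(?P<msg>.*)$"
)
# The crash banner may be split over two lines, so allow any whitespace after "**".
CRASH_RE = re.compile(
    r"Caught signal \((?P<sig>\w+)\) \*\*\s+in thread (?P<tid>[0-9a-f]+) thread_name:(?P<name>\S+)"
)


def group_by_thread(text):
    """Map thread id -> list of (event index, message) in log order."""
    by_thread = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            by_thread[m.group("tid")].append((int(m.group("idx")), m.group("msg")))
    return by_thread


def find_crash(text):
    """Return (signal, thread id, thread name) of the first crash banner, or None."""
    m = CRASH_RE.search(text)
    return (m.group("sig"), m.group("tid"), m.group("name")) if m else None


SAMPLE = """\
    -6> 2024-11-21T05:51:48.037+0000 7f490c106640 20 Kafka run: ack received with result=Success
    -5> 2024-11-21T05:51:48.037+0000 7f490c106640 20 Kafka run: n/ack received, invoking callback with tag=171
    -2> 2024-11-21T05:51:48.037+0000 7f49b4a57640 20 req 15718117396428226637 0.066999547s s3:complete_multipart prepare_atomic_modification: state is not atomic. state=0x562adff321e8
     0> 2024-11-21T05:51:48.046+0000 7f490c106640 -1 *** Caught signal (Aborted) **
 in thread 7f490c106640 thread_name:kafka_manager
"""

crash = find_crash(SAMPLE)
print("crash:", crash)
if crash:
    # Print everything the crashing thread logged, oldest first.
    for idx, msg in group_by_thread(SAMPLE)[crash[1]]:
        print(idx, msg)
```

On the snippet in this report, the crashing thread is kafka_manager, and its last event before the abort is the callback invocation for tag=171, which matches frames 8 and 9 of the backtrace.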
fail log snippet:
Traceback (most recent call last):
  File "/home/cephuser/rgw-tests/ceph-qe-scripts/rgw/v2/tests/s3_swift/test_bucket_notifications.py", line 539, in <module>
    test_exec(config, ssh_con)
  File "/home/cephuser/rgw-tests/ceph-qe-scripts/rgw/v2/tests/s3_swift/test_bucket_notifications.py", line 327, in test_exec
    reusable.upload_mutipart_object(
  File "/home/cephuser/rgw-tests/ceph-qe-scripts/rgw/v2/tests/s3_swift/reusable.py", line 609, in upload_mutipart_object
    mpu.complete(MultipartUpload=parts_info)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/boto3/resources/factory.py", line 581, in do_action
    response = action(self, *args, **kwargs)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/boto3/resources/action.py", line 88, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/client.py", line 569, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/client.py", line 1005, in _make_api_call
    http, parsed_response = self._make_request(
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/client.py", line 1029, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/endpoint.py", line 200, in _send_request
    while self._needs_retry(
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/endpoint.py", line 360, in _needs_retry
    responses = self._event_emitter.emit(
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/hooks.py", line 412, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/hooks.py", line 256, in emit
    return self._emit(event_name, kwargs)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/retryhandler.py", line 207, in __call__
    if self._checker(**checker_kwargs):
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/retryhandler.py", line 284, in __call__
    should_retry = self._should_retry(
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/retryhandler.py", line 320, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/retryhandler.py", line 363, in __call__
    checker_response = checker(
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/retryhandler.py", line 247, in __call__
    return self._check_caught_exception(
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/retryhandler.py", line 416, in _check_caught_exception
    raise caught_exception
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/endpoint.py", line 279, in _do_get_response
    http_response = self._send(request)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/endpoint.py", line 383, in _send
    return self.http_session.send(request)
  File "/home/cephuser/venv/lib64/python3.9/site-packages/botocore/httpsession.py", line 493, in send
    raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://10.0.67.212:80/davidh.198-bucky-3629-1/prefix1key_davidh.198-bucky-3629-1_70?uploadId=2~NG_akG9dWbdusnno4QxYCP0_00k8Y-d"

Version-Release number of selected component (if applicable):
ceph version 19.2.0-53.el9cp

How reproducible:
intermittent

Steps to Reproduce:
1. Deploy an RHCS 8.0 cluster and disable notification_v2, or use an environment upgraded from 7.1 to 8.0.
2. Create an rgw user and a bucket.
3. Create a topic and put bucket notifications:
2024-11-21 05:50:47,251 INFO: executing cmd: radosgw-admin topic get --topic cephci-kafka-broker-ack-type-b47c2192b2284487
2024-11-21 05:50:47,583 INFO: cmd excuted
2024-11-21 05:50:47,584 INFO: {
    "owner": "davidh.198",
    "name": "cephci-kafka-broker-ack-type-b47c2192b2284487",
    "dest": {
        "push_endpoint": "kafka://localhost:9093",
        "push_endpoint_args": "Version=2010-03-31&ca-location=/usr/local/kafka/y-ca.crt&kafka-ack-level=broker&use-ssl=true&verify-ssl=false",
        "push_endpoint_topic": "cephci-kafka-broker-ack-type-b47c2192b2284487",
        "stored_secret": false,
        "persistent": false,
        "persistent_queue": "",
        "time_to_live": "None",
        "max_retries": "None",
        "retry_sleep_duration": "None"
    },
    "arn": "arn:aws:sns:default::cephci-kafka-broker-ack-type-b47c2192b2284487",
    "opaqueData": "",
    "policy": ""
}
2024-11-21 05:50:47,619 INFO: get bucket notification for bucket : davidh.198-bucky-3629-1
2024-11-21 05:50:47,667 INFO: bucket notification for bucket: davidh.198-bucky-3629-1 is {
    "ResponseMetadata": {
        "RequestId": "tx000002ecb61d399da34ee-00673eca37-56862-default",
        "HostId": "",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "x-amz-request-id": "tx000002ecb61d399da34ee-00673eca37-56862-default",
            "content-type": "application/xml",
            "server": "Ceph Object Gateway (squid)",
            "content-length": "372",
            "date": "Thu, 21 Nov 2024 05:50:47 GMT",
            "connection": "Keep-Alive"
        },
        "RetryAttempts": 0
    },
    "TopicConfigurations": [
        {
            "Id": "notification-Multipart",
            "TopicArn": "arn:aws:sns:default::cephci-kafka-broker-ack-type-b47c2192b2284487",
            "Events": [
                "s3:ObjectCreated:*",
                "s3:ObjectRemoved:*"
            ],
            "Filter": {
                "Key": {
                    "FilterRules": [
                        {
                            "Name": "prefix",
                            "Value": "prefix1"
                        }
                    ]
                }
            }
        }
    ]
}
4. Create a multipart upload, upload parts and run complete-multipart-upload. rgw crashed after a few iterations of multipart object uploads.

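Steps 2 to 4 can be sketched with boto3 roughly as follows. This is not the ceph-qe-scripts test itself. The topic name, bucket name, endpoint arguments, notification Id, events and prefix filter are taken from the output in step 3. The helper names, the part sizes, the iteration count, the `region_name` value and the mapping of `push_endpoint_args` onto topic attributes are assumptions made for illustration. The sketch also assumes notification_v2 is already disabled (step 1) and that the rgw user's credentials are supplied by the caller.

```python
# Values mirror the topic and notification shown in step 3; everything else
# (part size, iteration count, region name) is an illustrative placeholder.
TOPIC = "cephci-kafka-broker-ack-type-b47c2192b2284487"
BUCKET = "davidh.198-bucky-3629-1"
PART_SIZE = 5 * 1024 * 1024  # S3 multipart minimum for every part except the last


def topic_attributes():
    """Topic attributes assumed to correspond to push_endpoint and push_endpoint_args."""
    return {
        "push-endpoint": "kafka://localhost:9093",
        "kafka-ack-level": "broker",
        "use-ssl": "true",
        "verify-ssl": "false",
        "ca-location": "/usr/local/kafka/y-ca.crt",
    }


def notification_config(topic_arn):
    """Bucket notification matching the TopicConfigurations entry in step 3."""
    return {
        "TopicConfigurations": [
            {
                "Id": "notification-Multipart",
                "TopicArn": topic_arn,
                "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
                "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "prefix1"}]}},
            }
        ]
    }


def split_parts(data, part_size=PART_SIZE):
    """Split a payload into consecutive chunks of at most part_size bytes."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def reproduce(endpoint_url, access_key, secret_key, iterations=100):
    """Run repeated multipart uploads against the notification-enabled bucket."""
    import boto3  # imported here so the helpers above work without boto3 installed

    creds = dict(endpoint_url=endpoint_url, aws_access_key_id=access_key,
                 aws_secret_access_key=secret_key,
                 region_name="default")  # assumption: zonegroup name, as in the topic ARN
    sns = boto3.client("sns", **creds)
    s3 = boto3.client("s3", **creds)

    s3.create_bucket(Bucket=BUCKET)
    arn = sns.create_topic(Name=TOPIC, Attributes=topic_attributes())["TopicArn"]
    s3.put_bucket_notification_configuration(
        Bucket=BUCKET, NotificationConfiguration=notification_config(arn))

    payload = b"x" * (2 * PART_SIZE + 1024)  # two full parts plus a short last part
    for n in range(iterations):
        key = f"prefix1key_{BUCKET}_{n}"  # "prefix1" matches the notification filter
        upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=key)["UploadId"]
        parts = []
        for num, body in enumerate(split_parts(payload), start=1):
            etag = s3.upload_part(Bucket=BUCKET, Key=key, PartNumber=num,
                                  UploadId=upload_id, Body=body)["ETag"]
            parts.append({"ETag": etag, "PartNumber": num})
        # In this report the client lost its connection to rgw during this call.
        s3.complete_multipart_upload(Bucket=BUCKET, Key=key, UploadId=upload_id,
                                     MultipartUpload={"Parts": parts})
```

A call such as `reproduce("http://10.0.67.212:80", access_key, secret_key)` would target the rgw endpoint seen in the fail log. Since the crash is intermittent, a run that completes without an EndpointConnectionError does not show that the problem is absent.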
Actual results:
rgw crashes during complete-multipart-upload on a bucket whose notifications use a kafka-ssl endpoint with kafka-ack-level=broker, after notification_v2 is disabled.

Expected results:
rgw should not crash when notification_v2 is disabled.

Additional info:
fail log on fresh 8.0 cluster after disabling notification_v2:
http://magna002.ceph.redhat.com/cephci-jenkins/hsm/TFA_squid_kafka_ssl_notif/test_bucket_notification_ssl_kafka_broker_multipart.console.log_fresh_deploy_8.0_disable_notif_v2_iter2

rgw debug logs:
http://magna002.ceph.redhat.com/cephci-jenkins/hsm/TFA_squid_kafka_ssl_notif/rgw_logs_debug_20_with_rgw_crash_log

fail log on an upgraded environment from 7.1 to 8.0:
http://magna002.ceph.redhat.com/cephci-jenkins/hsm/TFA_squid_kafka_ssl_notif/test_bucket_notification_ssl_kafka_broker_multipart.console.log_upgraded_cluster_v2_enabled_disabled_and_enabled