Description of problem:
With kafka-ack-level=broker, object deletion with boto3 fails with a read timeout on the endpoint URL, and many delete notifications are sent for a single object. The first delete notification received for every object has the correct object size; the repeated ones have object size 0.

This issue is seen only with kafka-ack-level=broker and persistent=false.
This issue is not seen on RHCS 6.1 and is observed on RHCS 7.0.

pass log for RHCS 6.1: http://magna002.ceph.redhat.com/cephci-jenkins/test-runs/17.2.6-136/Weekly/rgw/9/tier-2_rgw_test_bucket_notifications/
failure log on RHCS 7.0: http://magna002.ceph.redhat.com/cephci-jenkins/test-runs/18.2.0-6/Weekly/rgw/4/tier-2_rgw_test_bucket_notifications/

Also, deleting all objects from a bucket that is not configured with notifications using a boto3 resource works fine. Moreover, the issue is not seen when deleting objects recursively using aws-cli (a per-object boto3 equivalent is sketched at the end of this report):

AWS_ACCESS_KEY_ID=abc1 AWS_SECRET_ACCESS_KEY=abc1 aws --endpoint-url http://localhost:80 s3 rm s3://notif-bkt6 --recursive

Version-Release number of selected component (if applicable):
ceph version 18.2.0-27.el9cp

How reproducible:
Always

Steps to Reproduce:
1. Deploy a cluster on RHCS 7.0 with an RGW daemon.

2. Create an RGW user:
radosgw-admin user create --display-name "user1" --uid user1 --access_key abc1 --secret_key abc1

3. Create a bucket:
AWS_ACCESS_KEY_ID=abc1 AWS_SECRET_ACCESS_KEY=abc1 aws --endpoint-url http://localhost:80 s3 mb s3://notif-bkt5

4. Create a topic with kafka-ack-level=broker:
AWS_ACCESS_KEY_ID=abc1 AWS_SECRET_ACCESS_KEY=abc1 aws --endpoint-url http://localhost:80 sns create-topic --name=topic_for_delete_testing5 --attributes='{"push-endpoint": "kafka://localhost:9092","kafka-ack-level":"broker", "use-ssl": "false", "verify-ssl": "false"}'

5. Put a bucket notification configuration on the bucket:
AWS_ACCESS_KEY_ID=abc1 AWS_SECRET_ACCESS_KEY=abc1 aws --endpoint-url http://localhost:80 s3api put-bucket-notification-configuration --bucket notif-bkt5 --notification-configuration='{"TopicConfigurations": [{"Id": "notif_for_delete_testing5", "TopicArn": "arn:aws:sns:shared::topic_for_delete_testing5", "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]}]}'

6. Create a random file:
base64 /dev/urandom | head -c 15KB > obj

7. Run the code below to upload objects and then delete all of them at once using a boto3 resource:

import boto3
import time

bucket = 'notif-bkt5'
rgw_conn = boto3.resource(
    "s3",
    aws_access_key_id="abc1",
    aws_secret_access_key="abc1",
    endpoint_url="http://localhost:80"
)
bkt_conn = rgw_conn.Bucket(bucket)

objects_count = 25
print(f"uploading {objects_count} objects in bucket: {bucket}")
for obj_index in range(objects_count):
    obj_conn = bkt_conn.Object(f"prefix1_obj_{obj_index}")
    obj_conn.upload_file('/home/cephuser/obj')

time.sleep(5)
print(f"listing all objects in bucket: {bucket}")
objects_conn = bkt_conn.objects
all_objects = objects_conn.all()
print(f"all objects: {all_objects}")
for obj in all_objects:
    print(f"object_name: {obj.key}")

time.sleep(5)
print(f"deleting all objects in bucket: {bucket}")
response = objects_conn.delete()
print(response)

8. The above code fails at object deletion with a read timeout error (a low-level client sketch of this multi-object delete request is included at the end of this report):
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "http://localhost:80/notif-bkt5?delete"

9. 105 notifications are received altogether for both put and delete. Only 25 of them are ObjectCreated:Put (which are correct); the rest are ObjectRemoved:Delete.

10. Some objects still remain in the bucket after the failed boto3 deletion:
[cephuser@ceph-pri-hmaheswa-ms-rhcs7-0kgilg-node5 ~]$ AWS_ACCESS_KEY_ID=abc1 AWS_SECRET_ACCESS_KEY=abc1 aws --endpoint-url http://localhost:80 s3 ls s3://notif-bkt5
2023-09-15 12:46:07      15000 prefix1_obj_23
2023-09-15 12:46:08      15000 prefix1_obj_24
2023-09-15 12:45:55      15000 prefix1_obj_3
2023-09-15 12:45:55      15000 prefix1_obj_4
2023-09-15 12:45:56      15000 prefix1_obj_5
2023-09-15 12:45:57      15000 prefix1_obj_6
2023-09-15 12:45:57      15000 prefix1_obj_7
2023-09-15 12:45:58      15000 prefix1_obj_8
2023-09-15 12:45:59      15000 prefix1_obj_9
[cephuser@ceph-pri-hmaheswa-ms-rhcs7-0kgilg-node5 ~]$

Actual results:
Object deletion with a boto3 resource failed with a read timeout, and repeated delete notifications were seen for each object.

Expected results:
Object deletion with a boto3 resource succeeds, and only one delete notification is seen for each object.

Additional info:
Manual testing details are present in this doc: https://docs.google.com/document/d/1S3Pp3XIi8BxrjJ-JoaZzVEV0w3aGGs8ZjhZgnFwesME/edit?usp=sharing
rgw_node: 10.0.207.70
creds: cephuser/cephuser ; root/password
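For reference, the objects.delete() call in step 7 is what issues the single multi-object delete request (the POST to "/notif-bkt5?delete" seen in the error) that times out. Below is a minimal low-level client sketch of the same request, assuming the same test endpoint, credentials, bucket, and key names as the reproducer; the timeout and retry settings are illustrative, not part of the original test.

import boto3
from botocore.config import Config

# sketch only: endpoint, credentials, bucket, and key names are taken
# from the reproducer above; read_timeout/retries values are arbitrary
client = boto3.client(
    "s3",
    aws_access_key_id="abc1",
    aws_secret_access_key="abc1",
    endpoint_url="http://localhost:80",
    config=Config(read_timeout=300, retries={"max_attempts": 0}),
)

# a single DeleteObjects (multi-object delete) request, the same
# operation that bkt_conn.objects.delete() performs in step 7
response = client.delete_objects(
    Bucket="notif-bkt5",
    Delete={
        "Objects": [{"Key": f"prefix1_obj_{i}"} for i in range(25)],
        "Quiet": True,
    },
)
print(response)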
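And a per-object deletion sketch that mirrors what "aws s3 rm --recursive" does (one DeleteObject request per key), which did not hit the timeout in the manual testing above; it assumes the same endpoint, credentials, and bucket name.

import boto3

# sketch only: same endpoint, credentials, and bucket as the reproducer
rgw_conn = boto3.resource(
    "s3",
    aws_access_key_id="abc1",
    aws_secret_access_key="abc1",
    endpoint_url="http://localhost:80",
)
bkt_conn = rgw_conn.Bucket("notif-bkt5")

# delete objects one at a time instead of a single multi-object delete;
# each iteration issues its own DeleteObject request
for obj in bkt_conn.objects.all():
    obj.delete()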
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:7780