Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
This project is now read‑only. Starting Monday, February 2, please use https://ibm-ceph.atlassian.net/ for all bug tracking management.

Bug 2305659

Summary: [8.0][minimal-pausing-reshard]: Error during resharding bucket squid-1:(16) Device or resource busy while an incremental sync was in progress.
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Vidushi Mishra <vimishra>
Component: RGW    Assignee: J. Eric Ivancich <ivancich>
Status: CLOSED ERRATA QA Contact: Vidushi Mishra <vimishra>
Severity: high Docs Contact: Rivka Pollack <rpollack>
Priority: unspecified    
Version: 8.0    CC: ceph-eng-bugs, cephqe-warriors, ivancich, mbenjamin, rpollack, tserlin
Target Milestone: ---   
Target Release: 9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-20.1.0-26 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2026-01-29 06:51:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Vidushi Mishra 2024-08-19 04:59:52 UTC
Description of problem:

Error during resharding bucket squid-1:(16) Device or resource busy while an incremental sync was in progress.


Version-Release number of selected component (if applicable):

ceph version 19.1.0-19.0.TEST.bz2303488.el9cp


How reproducible:
1/1

Steps to Reproduce:

1. In a multisite environment, create a bucket 'squid-1' and upload 10M objects bi-directionally (5M <=> 5M).
2. Let the sync complete on both sites; the bucket 'squid-1' should then hold 10M objects.
3. Upload roughly 70M more objects to the bucket bi-directionally.
4. While the incremental sync was still in progress, the RGW process was stopped; after it restarted, resharding of 'squid-1' failed with "(16) Device or resource busy" (see journalctl logs below).
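While reproducing, the multisite sync and reshard state can be inspected with standard `radosgw-admin` commands. This is a sketch against a live cluster, using the bucket name from this report; run from a node on either site:

```shell
# Overall multisite sync state for this zone
radosgw-admin sync status

# Per-bucket sync progress for the affected bucket
radosgw-admin bucket sync status --bucket=squid-1

# Pending dynamic-reshard queue entries, and the reshard
# state of the affected bucket
radosgw-admin reshard list
radosgw-admin reshard status --bucket=squid-1
```

These commands require access to a running Ceph cluster and are shown only to indicate where the in-progress reshard and sync state can be observed.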

-------------- journalctl logs ------------------

Aug 16 08:12:39 extensa027 radosgw[673703]: RGW-SYNC:data:sync:shard[50]: incremental sync on squid-1:893184e0-7f65-4b1d-9e0b-45e08370ce62.58084.1:310shard: 50on gen 2
Aug 16 08:12:39 extensa027 radosgw[673703]: RGW-SYNC:data:sync:shard[50]: incremental sync on squid-1:893184e0-7f65-4b1d-9e0b-45e08370ce62.58084.1:1334shard: 50on gen 2
Aug 16 08:12:39 extensa027 radosgw[673703]: RGW-SYNC:data:sync:shard[50]: incremental sync on squid-1:893184e0-7f65-4b1d-9e0b-45e08370ce62.58084.1:1846shard: 50on gen 2
Aug 16 08:43:12 extensa027 radosgw[673703]: INFO: RGWReshardLock::lock found lock on reshard.0000000003 to be held by another RGW process; skipping for now
Aug 16 08:43:12 extensa027 radosgw[673703]: INFO: RGWReshardLock::lock found lock on reshard.0000000005 to be held by another RGW process; skipping for now
Aug 16 08:43:12 extensa027 radosgw[673703]: INFO: RGWReshardLock::lock found lock on reshard.0000000007 to be held by another RGW process; skipping for now
Aug 16 08:43:12 extensa027 radosgw[673703]: INFO: RGWReshardLock::lock found lock on reshard.0000000009 to be held by another RGW process; skipping for now
Aug 16 08:59:50 extensa027 systemd[1]: Stopping Ceph rgw.india.ms_io.extensa027.akmjht for eb380afa-58d8-11ef-b05f-ac1f6bcba9f0...
Aug 16 08:59:51 extensa027 radosgw[673703]: received  signal: Terminated from /run/podman-init -- /usr/bin/radosgw -n client.rgw.india.ms_io.extensa027.akmjht -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journa>
Aug 16 08:59:51 extensa027 ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0-rgw-india-ms_io-extensa027-akmjht[673699]: 2024-08-16T08:59:51.099+0000 7f76b4472640 -1 received  signal: Terminated from /run/podman-init -- /usr/bin/radosgw -n client.rgw.in>
Aug 16 08:59:51 extensa027 radosgw[673703]: handle_sigterm
Aug 16 08:59:51 extensa027 radosgw[673703]: handle_sigterm set alarm for 120
Aug 16 08:59:51 extensa027 radosgw[673703]: shutting down
Aug 16 08:59:51 extensa027 ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0-rgw-india-ms_io-extensa027-akmjht[673699]: 2024-08-16T08:59:51.099+0000 7f76b7cde880 -1 shutting down
Aug 16 09:00:01 extensa027 bash[1702160]: time="2024-08-16T09:00:01Z" level=warning msg="StopSignal SIGTERM failed to stop container ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0-rgw-india-ms_io-extensa027-akmjht in 10 seconds, resorting to SIGKILL"
Aug 16 09:00:01 extensa027 podman[1702160]: 2024-08-16 09:00:01.140251067 +0000 UTC m=+10.070226320 container died dcfdb35d4093c2584f7edbe5f02363131b971187ddcbf6e1182330c0741fb03c (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha2>
Aug 16 09:00:01 extensa027 podman[1702160]: 2024-08-16 09:00:01.147909526 +0000 UTC m=+10.077884760 container cleanup dcfdb35d4093c2584f7edbe5f02363131b971187ddcbf6e1182330c0741fb03c (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@s>
Aug 16 09:00:01 extensa027 bash[1702160]: ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0-rgw-india-ms_io-extensa027-akmjht
Aug 16 09:00:01 extensa027 podman[1702181]: 2024-08-16 09:00:01.176148472 +0000 UTC m=+0.032068019 container remove dcfdb35d4093c2584f7edbe5f02363131b971187ddcbf6e1182330c0741fb03c (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha>
Aug 16 09:00:01 extensa027 systemd[1]: ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0.ms_io.extensa027.akmjht.service: Main process exited, code=exited, status=137/n/a
Aug 16 09:00:01 extensa027 systemd[1]: ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0.ms_io.extensa027.akmjht.service: Failed with result 'exit-code'.
Aug 16 09:00:01 extensa027 systemd[1]: Stopped Ceph rgw.india.ms_io.extensa027.akmjht for eb380afa-58d8-11ef-b05f-ac1f6bcba9f0.
Aug 16 09:00:01 extensa027 systemd[1]: ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0.ms_io.extensa027.akmjht.service: Consumed 3h 36min 39.595s CPU time.
Aug 16 09:00:01 extensa027 systemd[1]: Starting Ceph rgw.india.ms_io.extensa027.akmjht for eb380afa-58d8-11ef-b05f-ac1f6bcba9f0...
Aug 16 09:00:01 extensa027 podman[1702318]: 2024-08-16 09:00:01.617484117 +0000 UTC m=+0.034128331 container create b98ae76b94da49055f4e45ee64995dbee60f76c384e729d3a10e7b7a2acd27e5 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha>
Aug 16 09:00:01 extensa027 podman[1702318]: 2024-08-16 09:00:01.639559653 +0000 UTC m=+0.056203867 container init b98ae76b94da49055f4e45ee64995dbee60f76c384e729d3a10e7b7a2acd27e5 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha25>
Aug 16 09:00:01 extensa027 podman[1702318]: 2024-08-16 09:00:01.643420351 +0000 UTC m=+0.060064563 container start b98ae76b94da49055f4e45ee64995dbee60f76c384e729d3a10e7b7a2acd27e5 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha2>
Aug 16 09:00:01 extensa027 podman[1702318]: 2024-08-16 09:00:01.606318249 +0000 UTC m=+0.022962460 image pull 652118a6a665ff09967994475a05cf5c97048eb415daf642e588df5bb8bccc11 registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:4d45ed8d5>
Aug 16 09:00:01 extensa027 bash[1702318]: b98ae76b94da49055f4e45ee64995dbee60f76c384e729d3a10e7b7a2acd27e5
Aug 16 09:00:01 extensa027 systemd[1]: Started Ceph rgw.india.ms_io.extensa027.akmjht for eb380afa-58d8-11ef-b05f-ac1f6bcba9f0.
Aug 16 09:00:01 extensa027 radosgw[1702346]: deferred set uid:gid to 167:167 (ceph:ceph)
Aug 16 09:00:01 extensa027 radosgw[1702346]: ceph version 19.1.0-19.0.TEST.bz2303488.el9cp (850a825c38990c3b49474dffc8c191bc3db0a459) squid (rc), process radosgw, pid 2
Aug 16 09:00:01 extensa027 radosgw[1702346]: framework: beast
Aug 16 09:00:01 extensa027 radosgw[1702346]: framework conf key: port, val: 80
Aug 16 09:00:01 extensa027 radosgw[1702346]: init_numa not setting numa affinity
Aug 16 09:00:01 extensa027 radosgw[1702346]: v1 topic migration: starting v1 topic migration..
Aug 16 09:00:01 extensa027 radosgw[1702346]: LDAP not started since no server URIs were provided in the configuration.
Aug 16 09:00:01 extensa027 ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0-rgw-india-ms_io-extensa027-akmjht[1702339]: 2024-08-16T09:00:01.787+0000 7f3a827da880 -1 LDAP not started since no server URIs were provided in the configuration.
Aug 16 09:00:01 extensa027 radosgw[1702346]: rgw main: Lua ERROR: failed to find luarocks
Aug 16 09:00:01 extensa027 radosgw[1702346]: framework: beast
Aug 16 09:00:01 extensa027 radosgw[1702346]: framework conf key: ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
Aug 16 09:00:01 extensa027 radosgw[1702346]: framework conf key: ssl_private_key, val: config://rgw/cert/$realm/$zone.key
Aug 16 09:00:01 extensa027 radosgw[1702346]: starting handler: beast
Aug 16 09:00:01 extensa027 radosgw[1702346]: supplemental groups:
Aug 16 09:00:01 extensa027 radosgw[1702346]:   992 = qat
Aug 16 09:00:01 extensa027 radosgw[1702346]: set uid:gid to 167:167 (ceph:ceph)
Aug 16 09:00:01 extensa027 radosgw[1702346]: mgrc service_daemon_register rgw.59546 metadata {arch=x86_64,ceph_release=squid,ceph_version=ceph version 19.1.0-19.0.TEST.bz2303488.el9cp (850a825c38990c3b49474dffc8c191bc3db0a459) squid (rc),ceph_v>
Aug 16 09:00:01 extensa027 radosgw[1702346]: v1 topic migration: finished v1 topic migration
Aug 16 09:50:01 extensa027 radosgw[1702346]: INFO: RGWReshardLock::lock found lock on squid-1:893184e0-7f65-4b1d-9e0b-45e08370ce62.58084.1 to be held by another RGW process; skipping for now
Aug 16 09:50:01 extensa027 radosgw[1702346]: rgw reshard worker thread: process_entry: Error during resharding bucket squid-1:(16) Device or resource busy
Aug 17 00:00:47 extensa027 radosgw[1702346]: received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
Aug 17 00:00:47 extensa027 ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0-rgw-india-ms_io-extensa027-akmjht[1702339]: 2024-08-17T00:00:47.676+0000 7f3a7ef6e640 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), >
Aug 17 00:00:47 extensa027 radosgw[1702346]: received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
Aug 17 00:00:47 extensa027 ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0-rgw-india-ms_io-extensa027-akmjht[1702339]: 2024-08-17T00:00:47.691+0000 7f3a7ef6e640 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), >
Aug 18 00:00:47 extensa027 radosgw[1702346]: received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
Aug 18 00:00:47 extensa027 ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0-rgw-india-ms_io-extensa027-akmjht[1702339]: 2024-08-18T00:00:47.676+0000 7f3a7ef6e640 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), >
Aug 18 00:00:47 extensa027 radosgw[1702346]: received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
Aug 18 00:00:47 extensa027 ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0-rgw-india-ms_io-extensa027-akmjht[1702339]: 2024-08-18T00:00:47.690+0000 7f3a7ef6e640 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), >
Aug 19 00:00:47 extensa027 radosgw[1702346]: received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
Aug 19 00:00:47 extensa027 ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0-rgw-india-ms_io-extensa027-akmjht[1702339]: 2024-08-19T00:00:47.676+0000 7f3a7ef6e640 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), >
Aug 19 00:00:47 extensa027 radosgw[1702346]: received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
Aug 19 00:00:47 extensa027 ceph-eb380afa-58d8-11ef-b05f-ac1f6bcba9f0-rgw-india-ms_io-extensa027-akmjht[1702339]: 2024-08-19T00:00:47.691+0000 7f3a7ef6e640 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(),
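The `(16)` in the reshard failure message above is the POSIX errno value that RGW propagates back from the failed operation. A quick sanity check in plain Python (nothing Ceph-specific) confirms the mapping:

```python
import errno
import os

# errno 16 is EBUSY, which the C library renders on Linux as
# "Device or resource busy" -- the same text that appears in
# "Error during resharding bucket squid-1:(16) Device or resource busy".
assert errno.EBUSY == 16
print(os.strerror(errno.EBUSY))
```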

Actual results:




Expected results:


Additional info:

Comment 19 errata-xmlrpc 2026-01-29 06:51:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 9.0 Security and Enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2026:1536

Comment 20 Red Hat Bugzilla 2026-02-06 04:25:54 UTC
The needinfo request(s) on this closed bug have been removed because they remained unresolved for 120 days, or because the product is inactive and locked.