Description of problem: [RGW] Files incorrectly uploaded with swift api Version-Release number of selected component (if applicable): Red Hat Ceph Storage 1.3.2 ceph version 0.94.5-12.el7cp (b08a982b961058eae6ee7c6a0efd2666d0bb4b1a) Covered in detail and excellently in upstream tracker: http://tracker.ceph.com/issues/17040 How reproducible: Always if reproduced correctly 1. Number of hosts and osds 2. Erasure code k and m values 3. Erasure code plugin 4. Bucket name and object Two most important point: 5. 6. object size for me it was (size=55042267) when I have seen this issue in my testbed Steps to Reproduce: 1. Setup Ceph cluster with 9 hosts with 18 osds 2. Create .rgw.buckets pool with: # ceph osd erasure-code-profile set bigdata-isa k=5 m=2 plugin=isa technique=reed_sol_van # ceph osd pool create .rgw.buckets 256 256 erasure bigdata-isa pool '.rgw.buckets' created 3.Set in ceph.conf [global] section: in Ceph.conf [global] ... osd_pool_erasure_code_stripe_width = 16384 # ceph osd dump | grep -i .rgw.buckets pool 41 '.rgw.buckets.extra' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 26506 flags hashpspool stripe_width 0 pool 42 '.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 26510 flags hashpspool stripe_width 0 pool 46 '.rgw.buckets' erasure size 7 min_size 5 crush_ruleset 3 object_hash rjenkins pg_num 256 pgp_num 256 last_change 26537 flags hashpspool stripe_width 16480 4. and for All 18 OSDs have below given value set: # ceph daemon osd.28 config show | grep osd_pool_erasure_code_stripe_width "osd_pool_erasure_code_stripe_width": "16384",
- **I am able to reproduce it with specific object size and specific setup** As given below: ------------------------- Testbed details ----------------------------------------------- # ceph osd erasure-code-profile set bigdata-isa k=5 m=2 plugin=isa technique=reed_sol_van # ceph osd erasure-code-profile ls bigdata-isa default myprofile # ceph osd erasure-code-profile get bigdata-isa directory=/usr/lib64/ceph/erasure-code k=5 m=2 plugin=isa technique=reed_sol_van # ceph osd pool create .rgw.buckets 256 256 erasure bigdata-isa pool '.rgw.buckets' created in Ceph.conf [global] ... osd_pool_erasure_code_stripe_width = 16384 # ceph osd dump | grep -i .rgw.buckets pool 41 '.rgw.buckets.extra' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 26506 flags hashpspool stripe_width 0 pool 42 '.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 26510 flags hashpspool stripe_width 0 pool 46 '.rgw.buckets' erasure size 7 min_size 5 crush_ruleset 3 object_hash rjenkins pg_num 256 pgp_num 256 last_change 26537 flags hashpspool stripe_width 16480 All OSDs have below given value set: # ceph daemon osd.28 config show | grep osd_pool_erasure_code_stripe_width "osd_pool_erasure_code_stripe_width": "16384", root rgw { id -25 # do not change unnecessarily # weight 6.660 alg straw2 hash 0 # rjenkins1 item ceph-node-11 weight 0.740 item ceph-node-13 weight 0.740 item ceph-node-5 weight 0.740 item ceph-node-6 weight 0.740 item ceph-node-7 weight 0.740 item ceph-node-4 weight 0.740 item ceph-node-8 weight 0.740 item ceph-node-10 weight 0.740 item ceph-node-9 weight 0.740 } rule .rgw.buckets { ruleset 3 type erasure min_size 3 max_size 7 step set_chooseleaf_tries 5 step set_choose_tries 100 step take rgw step chooseleaf indep 0 type host step emit } # ceph osd tree ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY -25 6.65991 root rgw -3 0.73999 host ceph-node-11 0 0.45000 osd.0 up 1.00000 1.00000 3 0.28999 osd.3 up 1.00000 1.00000 -5 0.73999 host ceph-node-13 2 0.45000 osd.2 up 1.00000 1.00000 5 0.28999 osd.5 up 1.00000 1.00000 -17 0.73999 host ceph-node-5 9 0.45000 osd.9 up 1.00000 1.00000 10 0.28999 osd.10 up 1.00000 1.00000 -18 0.73999 host ceph-node-6 12 0.45000 osd.12 up 1.00000 1.00000 13 0.28999 osd.13 up 1.00000 1.00000 -19 0.73999 host ceph-node-7 15 0.45000 osd.15 up 1.00000 1.00000 16 0.28999 osd.16 up 1.00000 1.00000 -20 0.73999 host ceph-node-4 18 0.45000 osd.18 up 1.00000 1.00000 19 0.28999 osd.19 up 1.00000 1.00000 -22 0.73999 host ceph-node-8 24 0.45000 osd.24 up 1.00000 1.00000 25 0.28999 osd.25 up 1.00000 1.00000 -23 0.73999 host ceph-node-10 27 0.45000 osd.27 up 1.00000 1.00000 28 0.28999 osd.28 up 1.00000 1.00000 -24 0.73999 host ceph-node-9 30 0.45000 osd.30 up 1.00000 1.00000 31 0.28999 osd.31 up 1.00000 1.00000 --------------------------- Testing logs from my testbed -------------------------------- 1. I have created two files with names 456 and 457 below given size. # ll total 3702644 -rw-r--r--. 1 root root 55042267 Aug 22 15:18 456 -rw-r--r--. 1 root root 55042267 Aug 22 15:22 457 2. Uploaded them via swift api: # swift -A http://192.168.2.248:80/auth/1.0 -U testuser:swift -K 'i5dF5PdRI***********qh8tDZzYtzGc' upload evgeny-test 456 456 # swift -A http://192.168.2.248:80/auth/1.0 -U testuser:swift -K 'i5dF5PdRI***********qh8tDZzYtzGc' upload evgeny-test 457 457 3. Listed the files via s3cmd: # vumrao @ vikhyat in ~ [15:22:57] $ s3cmd ls s3://evgeny-test 2016-08-22 09:51 55042267 s3://evgeny-test/456 2016-08-22 09:52 55042267 s3://evgeny-test/457 4. s3cmd get is failing for filename 456: Getting error "ERROR: got unexpected error when trying to read object: -2" # vumrao @ vikhyat in ~ [15:23:01] $ s3cmd get s3://evgeny-test/456 download: 's3://evgeny-test/456' -> './456' [1 of 1] download: 's3://evgeny-test/456' -> './456' [1 of 1] 34065908 of 55042267 61% in 30s 1078.46 kB/s failed WARNING: Retrying failed request: /456 (EOF from S3!) WARNING: Waiting 3 sec... download: 's3://evgeny-test/456' -> './456' [1 of 1] ERROR: S3 error: 404 (Not Found) 5. logs for file : 456 # tail -f client.radosgw.gateway.log 2016-08-22 15:23:09.518354 7fa1eafa5700 1 ====== starting new request req=0x7fa21c043c00 ===== 2016-08-22 15:23:09.536907 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=4705184 stripe_ofs=4705184 part_ofs=0 rule->part_size=0 2016-08-22 15:23:09.537250 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=8899488 stripe_ofs=8899488 part_ofs=0 rule->part_size=0 2016-08-22 15:23:09.537422 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=13093792 stripe_ofs=13093792 part_ofs=0 rule->part_size=0 2016-08-22 15:23:09.537688 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=17288096 stripe_ofs=17288096 part_ofs=0 rule->part_size=0 2016-08-22 15:23:09.648155 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=21482400 stripe_ofs=21482400 part_ofs=0 rule->part_size=0 2016-08-22 15:23:09.704990 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=25676704 stripe_ofs=25676704 part_ofs=0 rule->part_size=0 2016-08-22 15:23:09.738192 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=29871008 stripe_ofs=29871008 part_ofs=0 rule->part_size=0 2016-08-22 15:23:09.750909 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=34065312 stripe_ofs=34065312 part_ofs=0 rule->part_size=0 2016-08-22 15:23:09.863284 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=38259616 stripe_ofs=38259616 part_ofs=0 rule->part_size=0 2016-08-22 15:23:10.000044 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=42453920 stripe_ofs=42453920 part_ofs=0 rule->part_size=0 2016-08-22 15:23:10.047265 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=46648224 stripe_ofs=46648224 part_ofs=0 rule->part_size=0 2016-08-22 15:23:10.085067 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=50842528 stripe_ofs=50842528 part_ofs=0 rule->part_size=0 2016-08-22 15:23:10.085185 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=55036832 stripe_ofs=55036832 part_ofs=0 rule->part_size=0 2016-08-22 15:23:10.085250 7fa1eafa5700 0 RGWObjManifest::operator++(): result: ofs=55042267 stripe_ofs=55042267 part_ofs=0 rule->part_size=0 2016-08-22 15:23:10.091220 7fa252ffd700 0 ERROR: got unexpected error when trying to read object: -2 ^^ Getting error "ERROR: got unexpected error when trying to read object: -2" 2016-08-22 15:23:10.362631 7fa1eafa5700 1 ====== req done req=0x7fa21c043c00 http_status=404 ====== 2016-08-22 15:23:10.362737 7fa1eafa5700 1 civetweb: 0x7fa21c01a600: 192.168.223.112 - - [22/Aug/2016:15:23:09 +0530] "GET /evgeny-test/456 HTTP/1.1" 404 0 - - 2016-08-22 15:23:43.370119 7fa1ea7a4700 1 ====== starting new request req=0x7fa220090c80 ===== 2016-08-22 15:23:43.384741 7fa1ea7a4700 0 RGWObjManifest::operator++(): result: ofs=38259616 stripe_ofs=38259616 part_ofs=510880 rule->part_size=0 2016-08-22 15:23:43.384833 7fa1ea7a4700 0 RGWObjManifest::operator++(): result: ofs=42453920 stripe_ofs=42453920 part_ofs=510880 rule->part_size=0 2016-08-22 15:23:43.384917 7fa1ea7a4700 0 RGWObjManifest::operator++(): result: ofs=46648224 stripe_ofs=46648224 part_ofs=510880 rule->part_size=0 2016-08-22 15:23:43.385012 7fa1ea7a4700 0 RGWObjManifest::operator++(): result: ofs=50842528 stripe_ofs=50842528 part_ofs=510880 rule->part_size=0 2016-08-22 15:23:43.490133 7fa1ea7a4700 0 RGWObjManifest::operator++(): result: ofs=55036832 stripe_ofs=55036832 part_ofs=510880 rule->part_size=0 2016-08-22 15:23:43.522875 7fa1ea7a4700 0 RGWObjManifest::operator++(): result: ofs=55042267 stripe_ofs=55042267 part_ofs=510880 rule->part_size=0 2016-08-22 15:23:43.529380 7fa252ffd700 0 ERROR: got unexpected error when trying to read object: -2 2016-08-22 15:23:43.664445 7fa1ea7a4700 1 ====== req done req=0x7fa220090c80 http_status=404 ====== 2016-08-22 15:23:43.664548 7fa1ea7a4700 1 civetweb: 0x7fa2200176e0: 192.168.223.112 - - [22/Aug/2016:15:23:43 +0530] "GET /evgeny-test/456 HTTP/1.1" 404 0 - - 4. s3cmd get is failing for filename 457: # vumrao @ vikhyat in ~ [15:23:43] C:12 $ s3cmd get s3://evgeny-test/457 download: 's3://evgeny-test/457' -> './457' [1 of 1] download: 's3://evgeny-test/457' -> './457' [1 of 1] 38260212 of 55042267 69% in 30s 1207.09 kB/s failed WARNING: Retrying failed request: /457 (EOF from S3!) WARNING: Waiting 3 sec... download: 's3://evgeny-test/457' -> './457' [1 of 1] ERROR: S3 error: 404 (Not Found) 5. logs for file : 457 2016-08-22 15:25:19.113203 7fa1e9fa3700 1 ====== starting new request req=0x7fa218054390 ===== 2016-08-22 15:25:19.113396 7fa1e9fa3700 1 ====== req done req=0x7fa218054390 http_status=400 ====== 2016-08-22 15:25:19.113510 7fa1e9fa3700 1 civetweb: 0x7fa218017680: 192.168.223.112 - - [22/Aug/2016:15:25:19 +0530] "GET /evgeny-test/457 HTTP/1.1" 400 0 - - 2016-08-22 15:25:19.116793 7fa1e87a0700 1 ====== starting new request req=0x7fa234002410 ===== 2016-08-22 15:25:19.135190 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=4705184 stripe_ofs=4705184 part_ofs=0 rule->part_size=0 2016-08-22 15:25:19.135318 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=8899488 stripe_ofs=8899488 part_ofs=0 rule->part_size=0 2016-08-22 15:25:19.135391 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=13093792 stripe_ofs=13093792 part_ofs=0 rule->part_size=0 2016-08-22 15:25:19.135481 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=17288096 stripe_ofs=17288096 part_ofs=0 rule->part_size=0 2016-08-22 15:25:19.224520 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=21482400 stripe_ofs=21482400 part_ofs=0 rule->part_size=0 2016-08-22 15:25:19.462679 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=25676704 stripe_ofs=25676704 part_ofs=0 rule->part_size=0 2016-08-22 15:25:19.517497 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=29871008 stripe_ofs=29871008 part_ofs=0 rule->part_size=0 2016-08-22 15:25:19.559498 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=34065312 stripe_ofs=34065312 part_ofs=0 rule->part_size=0 2016-08-22 15:25:19.605023 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=38259616 stripe_ofs=38259616 part_ofs=0 rule->part_size=0 2016-08-22 15:25:19.645311 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=42453920 stripe_ofs=42453920 part_ofs=0 rule->part_size=0 2016-08-22 15:25:19.670437 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=46648224 stripe_ofs=46648224 part_ofs=0 rule->part_size=0 2016-08-22 15:25:19.960881 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=50842528 stripe_ofs=50842528 part_ofs=0 rule->part_size=0 2016-08-22 15:25:19.999302 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=55036832 stripe_ofs=55036832 part_ofs=0 rule->part_size=0 2016-08-22 15:25:19.999545 7fa1e87a0700 0 RGWObjManifest::operator++(): result: ofs=55042267 stripe_ofs=55042267 part_ofs=0 rule->part_size=0 2016-08-22 15:25:20.017600 7fa252ffd700 0 ERROR: got unexpected error when trying to read object: -2 ^^ Getting error "ERROR: got unexpected error when trying to read object: -2" 2016-08-22 15:25:20.066954 7fa1e87a0700 1 ====== req done req=0x7fa234002410 http_status=404 ====== 2016-08-22 15:25:20.067045 7fa1e87a0700 1 civetweb: 0x7fa23401aa40: 192.168.223.112 - - [22/Aug/2016:15:25:19 +0530] "GET /evgeny-test/457 HTTP/1.1" 404 0 - - 2016-08-22 15:25:53.341747 7fa1eb7a6700 1 ====== starting new request req=0x7fa22c000ba0 ===== 2016-08-22 15:25:53.357689 7fa1eb7a6700 0 RGWObjManifest::operator++(): result: ofs=42453920 stripe_ofs=42453920 part_ofs=510880 rule->part_size=0 2016-08-22 15:25:53.357744 7fa1eb7a6700 0 RGWObjManifest::operator++(): result: ofs=46648224 stripe_ofs=46648224 part_ofs=510880 rule->part_size=0 2016-08-22 15:25:53.357802 7fa1eb7a6700 0 RGWObjManifest::operator++(): result: ofs=50842528 stripe_ofs=50842528 part_ofs=510880 rule->part_size=0 2016-08-22 15:25:53.357857 7fa1eb7a6700 0 RGWObjManifest::operator++(): result: ofs=55036832 stripe_ofs=55036832 part_ofs=510880 rule->part_size=0 2016-08-22 15:25:53.470335 7fa1eb7a6700 0 RGWObjManifest::operator++(): result: ofs=55042267 stripe_ofs=55042267 part_ofs=510880 rule->part_size=0 2016-08-22 15:25:53.481785 7fa252ffd700 0 ERROR: got unexpected error when trying to read object: -2 2016-08-22 15:25:53.675115 7fa1eb7a6700 1 ====== req done req=0x7fa22c000ba0 http_status=404 ====== 2016-08-22 15:25:53.675200 7fa1eb7a6700 1 civetweb: 0x7fa22c01a600: 192.168.223.112 - - [22/Aug/2016:15:25:53 +0530] "GET /evgeny-test/457 HTTP/1.1" 404 0 - - ---------------------------------
http://tracker.ceph.com/issues/17040#note-3 - As per upstream tracker note-3 When osd_pool_erasure_code_stripe_width is set to default 4096, it is no longer reproducible. - I have tested the same with osd_pool_erasure_code_stripe_width is set to default 4096 it is working fine. So to me also it looks like issue is with object size including with osd_pool_erasure_code_stripe_width value as 16384.
It could be this upstream issue: http://tracker.ceph.com/issues/15745 hammer: https://github.com/ceph/ceph/pull/9407
(In reply to Orit Wasserman from comment #4) > It could be this upstream issue: http://tracker.ceph.com/issues/15745 > > hammer: https://github.com/ceph/ceph/pull/9407 Thank you Orit. I have created test packages with 9407 patch and fix looks good. I am not able to reproduce.
Hi Bara, The text is sufficient, Orit
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-1972.html