Bug 1369013 - [RGW] Files incorrectly uploaded with swift api
Summary: [RGW] Files incorrectly uploaded with swift api
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RGW
Version: 1.3.2
Hardware: x86_64
OS: Linux
high
high
Target Milestone: rc
: 1.3.3
Assignee: Orit Wasserman
QA Contact: Tejas
Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1372735
TreeView+ depends on / blocked
 
Reported: 2016-08-22 10:39 UTC by Vikhyat Umrao
Modified: 2019-11-14 08:59 UTC (History)
9 users (show)

Fixed In Version: RHEL: ceph-0.94.8-1.el7cp Ubuntu: ceph_0.94.8-2redhat1trusty
Doc Type: Bug Fix
Doc Text:
.Ceph Object Gateway now properly uploads files to erasure-coded pools Under certain conditions, Ceph Object Gateway did not properly upload files to an erasure-coded pool by using the SWIFT API. Consequently, such files were broken and an attempt to download them failed with the following error message: ---- ERROR: got unexpected error when trying to read object: -2 ---- The underlying source code has been modified, and Ceph Object Gateway now properly uploads files to erasure-coded pools.
Clone Of:
Environment:
Last Closed: 2016-09-29 13:00:44 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 15745 None None None 2016-08-23 10:17:55 UTC
Ceph Project Bug Tracker 17040 None None None 2016-08-22 10:39:54 UTC
Red Hat Knowledge Base (Solution) 2578081 None None None 2016-08-28 13:58:29 UTC
Red Hat Product Errata RHSA-2016:1972 normal SHIPPED_LIVE Moderate: Red Hat Ceph Storage 1.3.3 security, bug fix, and enhancement update 2016-09-29 16:51:21 UTC

Description Vikhyat Umrao 2016-08-22 10:39:03 UTC
Description of problem:
[RGW] Files incorrectly uploaded with swift api

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 1.3.2
ceph version 0.94.5-12.el7cp (b08a982b961058eae6ee7c6a0efd2666d0bb4b1a)

Covered in detail and excellently in upstream tracker:
http://tracker.ceph.com/issues/17040

How reproducible:
Always if reproduced correctly
1. Number of hosts and osds
2. Erasure code k and m values
3. Erasure code plugin
4. Bucket name and object

Two most important point:

5. 
6. object size for me it was (size=55042267) when I have seen this issue in my testbed

Steps to Reproduce:
1. Setup Ceph cluster with 9 hosts with 18 osds
2. Create .rgw.buckets pool with:

# ceph osd erasure-code-profile set bigdata-isa k=5 m=2 plugin=isa technique=reed_sol_van

# ceph osd pool create .rgw.buckets 256 256 erasure bigdata-isa
pool '.rgw.buckets' created


3.Set in ceph.conf [global] section:

in Ceph.conf

[global]
...
osd_pool_erasure_code_stripe_width = 16384

# ceph osd dump | grep -i .rgw.buckets
pool 41 '.rgw.buckets.extra' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 26506 flags hashpspool stripe_width 0
pool 42 '.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 26510 flags hashpspool stripe_width 0
pool 46 '.rgw.buckets' erasure size 7 min_size 5 crush_ruleset 3 object_hash rjenkins pg_num 256 pgp_num 256 last_change 26537 flags hashpspool stripe_width 16480


4. and for All 18 OSDs have below given value set:

# ceph daemon osd.28 config show | grep osd_pool_erasure_code_stripe_width
    "osd_pool_erasure_code_stripe_width": "16384",

Comment 1 Vikhyat Umrao 2016-08-22 10:53:32 UTC
- **I am able to reproduce it with specific object size and specific setup** As given below:


------------------------- Testbed details -----------------------------------------------


# ceph osd erasure-code-profile set bigdata-isa k=5 m=2 plugin=isa technique=reed_sol_van

# ceph osd erasure-code-profile ls
bigdata-isa
default
myprofile

# ceph osd erasure-code-profile get bigdata-isa
directory=/usr/lib64/ceph/erasure-code
k=5
m=2
plugin=isa
technique=reed_sol_van


# ceph osd pool create .rgw.buckets 256 256 erasure bigdata-isa
pool '.rgw.buckets' created


in Ceph.conf

[global]
...
osd_pool_erasure_code_stripe_width = 16384

# ceph osd dump | grep -i .rgw.buckets
pool 41 '.rgw.buckets.extra' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 26506 flags hashpspool stripe_width 0
pool 42 '.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 26510 flags hashpspool stripe_width 0
pool 46 '.rgw.buckets' erasure size 7 min_size 5 crush_ruleset 3 object_hash rjenkins pg_num 256 pgp_num 256 last_change 26537 flags hashpspool stripe_width 16480


All OSDs have below given value set:

# ceph daemon osd.28 config show | grep osd_pool_erasure_code_stripe_width
    "osd_pool_erasure_code_stripe_width": "16384",



root rgw {
        id -25          # do not change unnecessarily
        # weight 6.660
        alg straw2
        hash 0  # rjenkins1
        item ceph-node-11 weight 0.740
        item ceph-node-13 weight 0.740
        item ceph-node-5 weight 0.740
        item ceph-node-6 weight 0.740
        item ceph-node-7 weight 0.740
        item ceph-node-4 weight 0.740
        item ceph-node-8 weight 0.740
        item ceph-node-10 weight 0.740
        item ceph-node-9 weight 0.740
}

rule .rgw.buckets {
        ruleset 3
        type erasure
        min_size 3
        max_size 7
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take rgw
        step chooseleaf indep 0 type host
        step emit
}



# ceph osd tree
ID  WEIGHT  TYPE NAME                   UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-25 6.65991 root rgw                                                      
 -3 0.73999     host ceph-node-11                                       
  0 0.45000         osd.0                    up  1.00000          1.00000 
  3 0.28999         osd.3                    up  1.00000          1.00000 
 -5 0.73999     host ceph-node-13                                       
  2 0.45000         osd.2                    up  1.00000          1.00000 
  5 0.28999         osd.5                    up  1.00000          1.00000 
-17 0.73999     host ceph-node-5                                        
  9 0.45000         osd.9                    up  1.00000          1.00000 
 10 0.28999         osd.10                   up  1.00000          1.00000 
-18 0.73999     host ceph-node-6                                        
 12 0.45000         osd.12                   up  1.00000          1.00000 
 13 0.28999         osd.13                   up  1.00000          1.00000 
-19 0.73999     host ceph-node-7                                        
 15 0.45000         osd.15                   up  1.00000          1.00000 
 16 0.28999         osd.16                   up  1.00000          1.00000 
-20 0.73999     host ceph-node-4                                        
 18 0.45000         osd.18                   up  1.00000          1.00000 
 19 0.28999         osd.19                   up  1.00000          1.00000 
-22 0.73999     host ceph-node-8                                        
 24 0.45000         osd.24                   up  1.00000          1.00000 
 25 0.28999         osd.25                   up  1.00000          1.00000 
-23 0.73999     host ceph-node-10                                       
 27 0.45000         osd.27                   up  1.00000          1.00000 
 28 0.28999         osd.28                   up  1.00000          1.00000 
-24 0.73999     host ceph-node-9                                        
 30 0.45000         osd.30                   up  1.00000          1.00000 
 31 0.28999         osd.31                   up  1.00000          1.00000 

--------------------------- Testing logs from my testbed --------------------------------

1. I have created two files with names 456 and 457 below given size.

# ll
total 3702644
-rw-r--r--. 1 root root   55042267 Aug 22 15:18 456
-rw-r--r--. 1 root root   55042267 Aug 22 15:22 457

2. Uploaded them via swift api:


# swift -A http://192.168.2.248:80/auth/1.0 -U testuser:swift -K 'i5dF5PdRI***********qh8tDZzYtzGc' upload evgeny-test 456
456

# swift -A http://192.168.2.248:80/auth/1.0 -U testuser:swift -K 'i5dF5PdRI***********qh8tDZzYtzGc' upload evgeny-test 457
457


3. Listed the files via s3cmd:

# vumrao @ vikhyat in ~ [15:22:57] 
$ s3cmd ls s3://evgeny-test
2016-08-22 09:51  55042267   s3://evgeny-test/456
2016-08-22 09:52  55042267   s3://evgeny-test/457

4. s3cmd get is failing for filename 456:
Getting error "ERROR: got unexpected error when trying to read object: -2"


# vumrao @ vikhyat in ~ [15:23:01] 
$ s3cmd get s3://evgeny-test/456
download: 's3://evgeny-test/456' -> './456'  [1 of 1]
download: 's3://evgeny-test/456' -> './456'  [1 of 1]
 34065908 of 55042267    61% in   30s  1078.46 kB/s  failed
WARNING: Retrying failed request: /456 (EOF from S3!)
WARNING: Waiting 3 sec...
download: 's3://evgeny-test/456' -> './456'  [1 of 1]
ERROR: S3 error: 404 (Not Found)

5. logs for file : 456

# tail -f client.radosgw.gateway.log


2016-08-22 15:23:09.518354 7fa1eafa5700  1 ====== starting new request req=0x7fa21c043c00 =====
2016-08-22 15:23:09.536907 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=4705184 stripe_ofs=4705184 part_ofs=0 rule->part_size=0
2016-08-22 15:23:09.537250 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=8899488 stripe_ofs=8899488 part_ofs=0 rule->part_size=0
2016-08-22 15:23:09.537422 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=13093792 stripe_ofs=13093792 part_ofs=0 rule->part_size=0
2016-08-22 15:23:09.537688 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=17288096 stripe_ofs=17288096 part_ofs=0 rule->part_size=0
2016-08-22 15:23:09.648155 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=21482400 stripe_ofs=21482400 part_ofs=0 rule->part_size=0
2016-08-22 15:23:09.704990 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=25676704 stripe_ofs=25676704 part_ofs=0 rule->part_size=0
2016-08-22 15:23:09.738192 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=29871008 stripe_ofs=29871008 part_ofs=0 rule->part_size=0
2016-08-22 15:23:09.750909 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=34065312 stripe_ofs=34065312 part_ofs=0 rule->part_size=0
2016-08-22 15:23:09.863284 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=38259616 stripe_ofs=38259616 part_ofs=0 rule->part_size=0
2016-08-22 15:23:10.000044 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=42453920 stripe_ofs=42453920 part_ofs=0 rule->part_size=0
2016-08-22 15:23:10.047265 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=46648224 stripe_ofs=46648224 part_ofs=0 rule->part_size=0
2016-08-22 15:23:10.085067 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=50842528 stripe_ofs=50842528 part_ofs=0 rule->part_size=0
2016-08-22 15:23:10.085185 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=55036832 stripe_ofs=55036832 part_ofs=0 rule->part_size=0
2016-08-22 15:23:10.085250 7fa1eafa5700  0 RGWObjManifest::operator++(): result: ofs=55042267 stripe_ofs=55042267 part_ofs=0 rule->part_size=0
2016-08-22 15:23:10.091220 7fa252ffd700  0 ERROR: got unexpected error when trying to read object: -2


^^ Getting error "ERROR: got unexpected error when trying to read object: -2"


2016-08-22 15:23:10.362631 7fa1eafa5700  1 ====== req done req=0x7fa21c043c00 http_status=404 ======
2016-08-22 15:23:10.362737 7fa1eafa5700  1 civetweb: 0x7fa21c01a600: 192.168.223.112 - - [22/Aug/2016:15:23:09 +0530] "GET /evgeny-test/456 HTTP/1.1" 404 0 - -


2016-08-22 15:23:43.370119 7fa1ea7a4700  1 ====== starting new request req=0x7fa220090c80 =====
2016-08-22 15:23:43.384741 7fa1ea7a4700  0 RGWObjManifest::operator++(): result: ofs=38259616 stripe_ofs=38259616 part_ofs=510880 rule->part_size=0
2016-08-22 15:23:43.384833 7fa1ea7a4700  0 RGWObjManifest::operator++(): result: ofs=42453920 stripe_ofs=42453920 part_ofs=510880 rule->part_size=0
2016-08-22 15:23:43.384917 7fa1ea7a4700  0 RGWObjManifest::operator++(): result: ofs=46648224 stripe_ofs=46648224 part_ofs=510880 rule->part_size=0
2016-08-22 15:23:43.385012 7fa1ea7a4700  0 RGWObjManifest::operator++(): result: ofs=50842528 stripe_ofs=50842528 part_ofs=510880 rule->part_size=0
2016-08-22 15:23:43.490133 7fa1ea7a4700  0 RGWObjManifest::operator++(): result: ofs=55036832 stripe_ofs=55036832 part_ofs=510880 rule->part_size=0
2016-08-22 15:23:43.522875 7fa1ea7a4700  0 RGWObjManifest::operator++(): result: ofs=55042267 stripe_ofs=55042267 part_ofs=510880 rule->part_size=0
2016-08-22 15:23:43.529380 7fa252ffd700  0 ERROR: got unexpected error when trying to read object: -2



2016-08-22 15:23:43.664445 7fa1ea7a4700  1 ====== req done req=0x7fa220090c80 http_status=404 ======
2016-08-22 15:23:43.664548 7fa1ea7a4700  1 civetweb: 0x7fa2200176e0: 192.168.223.112 - - [22/Aug/2016:15:23:43 +0530] "GET /evgeny-test/456 HTTP/1.1" 404 0 - -


4. s3cmd get is failing for filename 457:

# vumrao @ vikhyat in ~ [15:23:43] C:12
$ s3cmd get s3://evgeny-test/457
download: 's3://evgeny-test/457' -> './457'  [1 of 1]
download: 's3://evgeny-test/457' -> './457'  [1 of 1]
 38260212 of 55042267    69% in   30s  1207.09 kB/s  failed
WARNING: Retrying failed request: /457 (EOF from S3!)
WARNING: Waiting 3 sec...
download: 's3://evgeny-test/457' -> './457'  [1 of 1]
ERROR: S3 error: 404 (Not Found)

5. logs for file : 457

2016-08-22 15:25:19.113203 7fa1e9fa3700  1 ====== starting new request req=0x7fa218054390 =====
2016-08-22 15:25:19.113396 7fa1e9fa3700  1 ====== req done req=0x7fa218054390 http_status=400 ======
2016-08-22 15:25:19.113510 7fa1e9fa3700  1 civetweb: 0x7fa218017680: 192.168.223.112 - - [22/Aug/2016:15:25:19 +0530] "GET /evgeny-test/457 HTTP/1.1" 400 0 - -
2016-08-22 15:25:19.116793 7fa1e87a0700  1 ====== starting new request req=0x7fa234002410 =====
2016-08-22 15:25:19.135190 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=4705184 stripe_ofs=4705184 part_ofs=0 rule->part_size=0
2016-08-22 15:25:19.135318 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=8899488 stripe_ofs=8899488 part_ofs=0 rule->part_size=0
2016-08-22 15:25:19.135391 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=13093792 stripe_ofs=13093792 part_ofs=0 rule->part_size=0
2016-08-22 15:25:19.135481 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=17288096 stripe_ofs=17288096 part_ofs=0 rule->part_size=0
2016-08-22 15:25:19.224520 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=21482400 stripe_ofs=21482400 part_ofs=0 rule->part_size=0
2016-08-22 15:25:19.462679 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=25676704 stripe_ofs=25676704 part_ofs=0 rule->part_size=0
2016-08-22 15:25:19.517497 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=29871008 stripe_ofs=29871008 part_ofs=0 rule->part_size=0
2016-08-22 15:25:19.559498 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=34065312 stripe_ofs=34065312 part_ofs=0 rule->part_size=0
2016-08-22 15:25:19.605023 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=38259616 stripe_ofs=38259616 part_ofs=0 rule->part_size=0
2016-08-22 15:25:19.645311 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=42453920 stripe_ofs=42453920 part_ofs=0 rule->part_size=0
2016-08-22 15:25:19.670437 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=46648224 stripe_ofs=46648224 part_ofs=0 rule->part_size=0
2016-08-22 15:25:19.960881 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=50842528 stripe_ofs=50842528 part_ofs=0 rule->part_size=0
2016-08-22 15:25:19.999302 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=55036832 stripe_ofs=55036832 part_ofs=0 rule->part_size=0
2016-08-22 15:25:19.999545 7fa1e87a0700  0 RGWObjManifest::operator++(): result: ofs=55042267 stripe_ofs=55042267 part_ofs=0 rule->part_size=0
2016-08-22 15:25:20.017600 7fa252ffd700  0 ERROR: got unexpected error when trying to read object: -2

^^ Getting error "ERROR: got unexpected error when trying to read object: -2"


2016-08-22 15:25:20.066954 7fa1e87a0700  1 ====== req done req=0x7fa234002410 http_status=404 ======
2016-08-22 15:25:20.067045 7fa1e87a0700  1 civetweb: 0x7fa23401aa40: 192.168.223.112 - - [22/Aug/2016:15:25:19 +0530] "GET /evgeny-test/457 HTTP/1.1" 404 0 - -


2016-08-22 15:25:53.341747 7fa1eb7a6700  1 ====== starting new request req=0x7fa22c000ba0 =====
2016-08-22 15:25:53.357689 7fa1eb7a6700  0 RGWObjManifest::operator++(): result: ofs=42453920 stripe_ofs=42453920 part_ofs=510880 rule->part_size=0
2016-08-22 15:25:53.357744 7fa1eb7a6700  0 RGWObjManifest::operator++(): result: ofs=46648224 stripe_ofs=46648224 part_ofs=510880 rule->part_size=0
2016-08-22 15:25:53.357802 7fa1eb7a6700  0 RGWObjManifest::operator++(): result: ofs=50842528 stripe_ofs=50842528 part_ofs=510880 rule->part_size=0
2016-08-22 15:25:53.357857 7fa1eb7a6700  0 RGWObjManifest::operator++(): result: ofs=55036832 stripe_ofs=55036832 part_ofs=510880 rule->part_size=0
2016-08-22 15:25:53.470335 7fa1eb7a6700  0 RGWObjManifest::operator++(): result: ofs=55042267 stripe_ofs=55042267 part_ofs=510880 rule->part_size=0
2016-08-22 15:25:53.481785 7fa252ffd700  0 ERROR: got unexpected error when trying to read object: -2
2016-08-22 15:25:53.675115 7fa1eb7a6700  1 ====== req done req=0x7fa22c000ba0 http_status=404 ======
2016-08-22 15:25:53.675200 7fa1eb7a6700  1 civetweb: 0x7fa22c01a600: 192.168.223.112 - - [22/Aug/2016:15:25:53 +0530] "GET /evgeny-test/457 HTTP/1.1" 404 0 - -
---------------------------------

Comment 2 Vikhyat Umrao 2016-08-22 13:51:47 UTC
http://tracker.ceph.com/issues/17040#note-3

- As per upstream tracker note-3

When osd_pool_erasure_code_stripe_width is set to default 4096, it is no longer reproducible.

- I have tested the same with osd_pool_erasure_code_stripe_width is set to default 4096 it is working fine.

So to me also it looks like issue is with object size including with osd_pool_erasure_code_stripe_width value as 16384.

Comment 4 Orit Wasserman 2016-08-22 15:38:01 UTC
It could be this upstream issue: http://tracker.ceph.com/issues/15745

hammer: https://github.com/ceph/ceph/pull/9407

Comment 6 Vikhyat Umrao 2016-08-23 10:17:23 UTC
(In reply to Orit Wasserman from comment #4)
> It could be this upstream issue: http://tracker.ceph.com/issues/15745
> 
> hammer: https://github.com/ceph/ceph/pull/9407

Thank you Orit. I have created test packages with 9407 patch and fix looks good. I am not able to reproduce.

Comment 15 Orit Wasserman 2016-09-22 16:39:19 UTC
Hi Bara,
The text is sufficient,
Orit

Comment 16 errata-xmlrpc 2016-09-29 13:00:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-1972.html


Note You need to log in before you can comment on or make changes to this bug.