Bug 1469137 - aio_read doesn't work when rados_osd_op_timeout is used.
Status: NEW
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 2.4
Hardware: Unspecified OS: Unspecified
Priority: unspecified Severity: unspecified
Target Milestone: rc
Target Release: 2.5
Assigned To: Josh Durgin
QA Contact: ceph-qe-bugs
Depends On:
Blocks:
Reported: 2017-07-10 09:07 EDT by Mehdi ABAAKOUK
Modified: 2017-07-30 11:21 EDT

Type: Bug


Attachments
script to reproduce (1.57 KB, text/plain)
2017-07-10 09:07 EDT, Mehdi ABAAKOUK
The script to reproduce (1.57 KB, text/plain)
2017-07-10 09:27 EDT, Mehdi ABAAKOUK
The script to reproduce (1.66 KB, text/plain)
2017-07-10 09:41 EDT, Mehdi ABAAKOUK


External Trackers
Tracker ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 20616 None None None 2017-07-13 09:48 EDT

Description Mehdi ABAAKOUK 2017-07-10 09:07:37 EDT
Created attachment 1295819 [details]
script to reproduce

Description of problem:

In Gnocchi, we recently encountered data corruption when "rados_osd_op_timeout" is set. After digging, we found that aio_read() doesn't
return the expected data and doesn't return any error.

The issue on Gnocchi side: https://github.com/gnocchixyz/gnocchi/pull/190
This has been worked around by using read() instead of aio_read().
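
A minimal sketch of that kind of workaround, for illustration only (this is not the Gnocchi patch). `read_object_sync` is a hypothetical helper name, and the sketch assumes `ioctx` is an open rados.Ioctx whose blocking read(key, length, offset) returns the bytes read and an empty result at the end of the object:

```python
def read_object_sync(ioctx, name, chunk_size=8192):
    """Read a whole object with blocking read() calls instead of aio_read().

    `ioctx` is assumed to be an open rados.Ioctx, or any object exposing a
    compatible read(key, length, offset) method.
    """
    chunks = []
    offset = 0
    while True:
        data = ioctx.read(name, chunk_size, offset)
        if not data:
            # An empty result is taken to mean the end of the object.
            break
        chunks.append(data)
        offset += len(data)
    return b"".join(chunks)
```

The trade-off is that each read blocks the caller, so any parallelism that aio_read() provided has to come from somewhere else (for example, threads).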


The bug was discovered by RDO CI with Ceph version 10.2.7, but I can reproduce it on many other versions.

How reproducible:

I have attached a Python script to reproduce it.

Actual results:

When "rados_osd_op_timeout" is set, the data returned by aio_read() is corrupted.

Expected results:

No corruption
Comment 2 Mehdi ABAAKOUK 2017-07-10 09:10:02 EDT
Actual output of the script:

 no timeout read(): 'my fancy blob' : True
 with timeout read(): 'my fancy blob' : True
 no timeout aio_read(): 'my fancy blob' : True
 with timeout aio_read(): 'no timeout ai' : False

The last line shows that aio_read doesn't return the expected blob.
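
For readers without access to the attachment, the comparison above can be approximated along these lines. This is a sketch under assumptions, not the attached script: `aio_read_blocking` and `check_blob` are hypothetical helpers, and `ioctx` is assumed to be an open rados.Ioctx whose aio_read(object_name, length, offset, oncomplete) invokes the callback with (completion, data):

```python
import threading


def aio_read_blocking(ioctx, name, length, offset=0, timeout=30.0):
    """Issue aio_read() and wait for its completion callback.

    Returns a (return_value, data) tuple.
    """
    done = threading.Event()
    result = {}

    def on_complete(completion, data):
        # May run in another thread; record the outcome and wake the caller.
        result["rc"] = completion.get_return_value()
        result["data"] = data
        done.set()

    ioctx.aio_read(name, length, offset, on_complete)
    if not done.wait(timeout):
        raise TimeoutError("aio_read callback was not invoked")
    return result["rc"], result["data"]


def check_blob(label, data, expected):
    """Print one comparison line and report whether the data matched."""
    ok = data == expected
    print(" %s: %r : %s" % (label, data, ok))
    return ok
```

Running the same check against one cluster handle configured without "rados_osd_op_timeout" and one configured with it should show whether the two paths disagree.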
Comment 3 Mehdi ABAAKOUK 2017-07-10 09:27 EDT
Created attachment 1295832 [details]
The script to reproduce
Comment 4 Mehdi ABAAKOUK 2017-07-10 09:41 EDT
Created attachment 1295834 [details]
The script to reproduce

I have added the result of rados_aio_get_return_value()
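
As an illustration of what inspecting that value can look like from Python, here is a hedged sketch. `describe_aio_result` is a hypothetical helper; it assumes the completion's return value follows the librados convention for reads (number of bytes read on success, a negative errno on failure). Since the description says no error is reported, a check like this is not expected to catch the bug by itself; the content still has to be compared.

```python
import os


def describe_aio_result(rc, data):
    """Summarise an aio_read() outcome from its completion return value.

    `rc` is the value from completion.get_return_value(); `data` is the
    buffer handed to the callback (None is treated as zero bytes).
    """
    if rc < 0:
        # Negative values are assumed to be negated errno codes.
        return "error: %s" % os.strerror(-rc)
    delivered = 0 if data is None else len(data)
    if delivered != rc:
        return "inconsistent: rc=%d but %d bytes delivered" % (rc, delivered)
    return "ok: %d bytes" % rc
```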
Comment 5 Josh Durgin 2017-07-18 20:28:52 EDT
Retargeting since this only affects jewel and earlier. I'm not sure exactly which commit fixed it.
