Bug 1469137 - aio_read doesn't work when rados_osd_op_timeout is used.
Summary: aio_read doesn't work when rados_osd_op_timeout is used.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 2.4
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: 2.*
Assignee: Josh Durgin
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-07-10 13:07 UTC by Mehdi ABAAKOUK
Modified: 2019-02-09 01:47 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-09 01:47:38 UTC
Embargoed:


Attachments
script to reproduce (1.57 KB, text/plain), 2017-07-10 13:07 UTC, Mehdi ABAAKOUK
The script to reproduce (1.57 KB, text/plain), 2017-07-10 13:27 UTC, Mehdi ABAAKOUK
The script to reproduce (1.66 KB, text/plain), 2017-07-10 13:41 UTC, Mehdi ABAAKOUK


Links
System: Ceph Project Bug Tracker
ID: 20616
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2017-07-13 13:48:20 UTC

Description Mehdi ABAAKOUK 2017-07-10 13:07:37 UTC
Created attachment 1295819 [details]
script to reproduce

Description of problem:

In Gnocchi, we recently encountered data corruption when "rados_osd_op_timeout" is set. After some digging, we found that aio_read() doesn't
return the expected data and doesn't return any error.

The issue on Gnocchi side: https://github.com/gnocchixyz/gnocchi/pull/190
This has been worked around by using read() instead of aio_read().
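
A minimal sketch of that kind of workaround, assuming the Python rados bindings and an already opened ioctx. The helper name read_whole_object and the chunk size are illustrative only and are not taken from the Gnocchi change:

```python
def read_whole_object(ioctx, name, chunk_size=8192):
    """Read an object with blocking read() calls instead of aio_read().

    `ioctx` is assumed to be a rados.Ioctx (or anything exposing the same
    read(key, length, offset) method). The helper name is hypothetical.
    """
    chunks = []
    offset = 0
    while True:
        # Ioctx.read() blocks and returns at most `chunk_size` bytes
        # starting at `offset`; an empty result means end of object.
        data = ioctx.read(name, chunk_size, offset)
        if not data:
            break
        chunks.append(data)
        offset += len(data)
    return b"".join(chunks)
```

The loop issues one extra read() at the end of the object, which keeps the end-of-object check simple at the cost of one more round trip.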


The bug was discovered by RDO CI. The Ceph version was 10.2.7, but I can reproduce it on many other versions.

How reproducible:

I have attached a Python script to reproduce it.

Actual results:

When "rados_osd_op_timeout" is set, the data returned by aio_read() is corrupted.

Expected results:

No corruption

Comment 2 Mehdi ABAAKOUK 2017-07-10 13:10:02 UTC
Actual output of the script:

 no timeout read(): 'my fancy blob' : True
 with timeout read(): 'my fancy blob' : True
 no timeout aio_read(): 'my fancy blob' : True
 with timeout aio_read(): 'no timeout ai' : False

The last line shows that aio_read doesn't return the expected blob.
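
For readers without access to the attachment, the following is a rough sketch of how such a comparison could be written with the Python rados bindings. It is not the attached script. The object name, the helper names (aio_read_and_wait, report, run) and the timeout value are assumptions made for illustration. It also records the completion's return value next to the data.

```python
import threading

BLOB = b"my fancy blob"


def aio_read_and_wait(ioctx, name, length):
    """Issue aio_read() and block until its completion callback has run."""
    done = threading.Event()
    result = {}

    def on_complete(completion, data):
        # The bindings call this with the completion and the bytes read.
        result["data"] = data
        result["rc"] = completion.get_return_value()
        done.set()

    ioctx.aio_read(name, length, 0, on_complete)
    done.wait()
    return result["data"], result["rc"]


def report(label, data):
    """Format one result line: label, data read, whether it matches BLOB."""
    return " %s: %r : %s" % (label, data, data == BLOB)


def run(conffile, pool, timeout=None):
    """Write BLOB, read it back with read() and aio_read(), return the lines.

    `timeout` (seconds) is passed as rados_osd_op_timeout when given.
    """
    import rados  # imported lazily so the helpers above work without Ceph

    conf = {"rados_osd_op_timeout": str(timeout)} if timeout else {}
    prefix = "with timeout" if timeout else "no timeout"
    cluster = rados.Rados(conffile=conffile, conf=conf)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            ioctx.write_full("repro-object", BLOB)  # hypothetical object name
            lines = [report(prefix + " read()",
                            ioctx.read("repro-object", len(BLOB)))]
            data, rc = aio_read_and_wait(ioctx, "repro-object", len(BLOB))
            lines.append(report(prefix + " aio_read()", data) + " (rc=%d)" % rc)
            return lines
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Calling run() once without a timeout and once with one (for example run("/etc/ceph/ceph.conf", "rbd", timeout=5), where both the path and the pool name are placeholders) gives the four lines to compare.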

Comment 3 Mehdi ABAAKOUK 2017-07-10 13:27:08 UTC
Created attachment 1295832 [details]
The script to reproduce

Comment 4 Mehdi ABAAKOUK 2017-07-10 13:41:25 UTC
Created attachment 1295834 [details]
The script to reproduce

I have added the result of rados_aio_get_return_value()

Comment 5 Josh Durgin 2017-07-19 00:28:52 UTC
Retargeting since this only affects jewel and earlier. I'm not sure exactly which commit fixed it.

Comment 6 Josh Durgin 2019-02-09 01:47:38 UTC
Fixed in luminous+later, not severe enough to warrant patching jewel.

