Bug 1469137

Summary: aio_read doesn't work when rados_osd_op_timeout is used.
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Mehdi ABAAKOUK <mabaakou>
Component: RADOS
Assignee: Josh Durgin <jdurgin>
Status: CLOSED CURRENTRELEASE
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: low
Priority: low
Version: 2.4
CC: bhubbard, ceph-eng-bugs, dzafman, jdanjou, kchai, mhackett
Target Milestone: rc
Target Release: 2.*
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-02-09 01:47:38 UTC
Type: Bug
Attachments:
  script to reproduce (flags: none)
  The script to reproduce (flags: none)
  The script to reproduce (flags: none)

Description Mehdi ABAAKOUK 2017-07-10 13:07:37 UTC
Created attachment 1295819
script to reproduce

Description of problem:

In Gnocchi, we recently encountered data corruption when "rados_osd_op_timeout" is set. After digging, we found that aio_read() does not return the expected data, yet it does not report any error either.

The issue on the Gnocchi side: https://github.com/gnocchixyz/gnocchi/pull/190
This has been worked around by using read() instead of aio_read().


The bug was discovered by the RDO CI with Ceph 10.2.7, but I can reproduce it on many other versions.

How reproducible:

I have attached a Python script that reproduces it.
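The attachment itself is not included here, so the following is a minimal sketch of an equivalent reproducer, not the attached script. It assumes the python-rados binding (`rados.Rados(conffile=...)`, `conf_set`, `connect`, `open_ioctx`, `write_full`, `read`, and `aio_read(name, length, offset, oncomplete)` whose callback receives `(completion, data)`). The pool name, object name, and timeout value passed to `run()` are hypothetical, and `FakeIoctx` is a hypothetical in-memory stand-in so the read helpers can be exercised without a cluster.

```python
import threading

BLOB = b"my fancy blob"


def sync_read(ioctx, key, length):
    # Synchronous path: Ioctx.read(key, length, offset).
    return ioctx.read(key, length, 0)


def async_read(ioctx, key, length, wait_seconds=30):
    # Asynchronous path: Ioctx.aio_read() calls oncomplete(completion, data)
    # when the read finishes, possibly on a librados thread.
    result = {}
    done = threading.Event()

    def oncomplete(completion, data):
        result["data"] = data
        result["ret"] = completion.get_return_value()
        done.set()

    # Hold a reference to the completion until the callback has run.
    completion = ioctx.aio_read(key, length, 0, oncomplete)
    if not done.wait(wait_seconds):
        raise RuntimeError("aio_read callback did not run for %r" % (completion,))
    return result["data"], result["ret"]


def run(conffile, pool, key, timeout=None):
    import rados  # python-rados; imported here so the helpers work without it

    cluster = rados.Rados(conffile=conffile)
    if timeout is not None:
        # Client-side option; set it before connect().
        cluster.conf_set("rados_osd_op_timeout", str(timeout))
    cluster.connect()
    label = "no timeout" if timeout is None else "with timeout"
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            ioctx.write_full(key, BLOB)
            data = sync_read(ioctx, key, len(BLOB))
            print(" %s read(): %r : %s" % (label, data, data == BLOB))
            data, ret = async_read(ioctx, key, len(BLOB))
            print(" %s aio_read(): %r : %s (ret=%d)" % (label, data, data == BLOB, ret))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()


class FakeIoctx(object):
    """Hypothetical in-memory stand-in for rados.Ioctx (offline checks only)."""

    class _Completion(object):
        def __init__(self, ret):
            self._ret = ret

        def get_return_value(self):
            return self._ret

    def __init__(self):
        self._objects = {}

    def write_full(self, key, data):
        self._objects[key] = data

    def read(self, key, length=8192, offset=0):
        return self._objects[key][offset:offset + length]

    def aio_read(self, key, length, offset, oncomplete):
        data = self.read(key, length, offset)
        completion = self._Completion(len(data))
        oncomplete(completion, data)  # real librados invokes this asynchronously
        return completion
```

Against a real cluster, calling `run()` once with `timeout=None` and once with a timeout such as `1` should print lines shaped like the output in Comment 2; according to this report, on affected jewel clients the "with timeout aio_read()" line is the one that shows `False`.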

Actual results:

When "rados_osd_op_timeout" is set, the data returned by aio_read() is corrupted.

Expected results:

No corruption: aio_read() returns the same data as read().

Comment 2 Mehdi ABAAKOUK 2017-07-10 13:10:02 UTC
Actual output of the script:

 no timeout read(): 'my fancy blob' : True
 with timeout read(): 'my fancy blob' : True
 no timeout aio_read(): 'my fancy blob' : True
 with timeout aio_read(): 'no timeout ai' : False

The last line shows that, with the timeout set, aio_read() does not return the expected blob: it returns 'no timeout ai' instead of 'my fancy blob'.

Comment 3 Mehdi ABAAKOUK 2017-07-10 13:27:08 UTC
Created attachment 1295832
The script to reproduce

Comment 4 Mehdi ABAAKOUK 2017-07-10 13:41:25 UTC
Created attachment 1295834
The script to reproduce

I have added the result of rados_aio_get_return_value() to the script's output.
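rados_aio_get_return_value() follows the usual librados convention: a non-negative value is the number of bytes read, and a negative value is a negated errno (typically -ETIMEDOUT when an op timeout fires). The helper below is a hypothetical sketch, not part of the attached script, that turns that value into a readable verdict. Per this report, a clean, full-length return value does not prove the buffer is correct, so the contents still have to be compared against what was written.

```python
import errno
import os


def describe_aio_read_result(ret, expected_len):
    # ret: value from rados_aio_get_return_value() / Completion.get_return_value().
    # Negative values are negated errno codes.
    if ret < 0:
        name = errno.errorcode.get(-ret, "UNKNOWN")
        return "error %s (%s)" % (name, os.strerror(-ret))
    if ret != expected_len:
        return "short read: %d of %d bytes" % (ret, expected_len)
    # No error reported; this says nothing about whether the bytes are right.
    return "no error, %d bytes" % ret
```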

Comment 5 Josh Durgin 2017-07-19 00:28:52 UTC
Retargeting, since this only affects jewel and earlier. I'm not sure exactly which commit fixed it.

Comment 6 Josh Durgin 2019-02-09 01:47:38 UTC
Fixed in luminous and later; not severe enough to warrant patching jewel.