Bug 1956963
| Summary: | fio is unusable with ioengine=libaio (does I/O, segfault, no output) | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Alexey Dobriyan <adobriyan> |
| Component: | fio | Assignee: | Eric Sandeen <esandeen> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Priority: | unspecified |
| Version: | 34 | CC: | esandeen, pportant, ykorman |
| Hardware: | Unspecified | OS: | Linux |
| Last Closed: | 2021-09-09 20:05:48 UTC | Type: | Bug |
I wonder if the libaio ioengine should be built in, given the importance of async I/O. Unfortunately, upstream ioengine selection is all or nothing (either every engine is dynamic or none is), at least for now.
Would you be able to check/test 3.26 from rawhide? I think this is probably fixed by this upstream commit:
commit 48ff7df9daea86c82a572b0a840bb8371b6b1a29
Author: Eric Sandeen <sandeen>
Date: Mon Jan 25 13:23:48 2021 -0600

    fio: fix dlopen refcounting of dynamic engines

    ioengine_load() will dlclose the dynamic library if it matches one
    that we've already got open, but this defeats the built-in refcounting
    done by dlopen/dlclose. As each thread exits, it calls free_ioengine(),
    and this may do a final dlclose on a dynamic ioengine that is still
    in use if we don't have the proper reference count.

    Fix this by dropping the explicit dlclose of a "matching" dlopened
    dynamic engine library, and let each dlclose decrement the refcount
    on the engine library as is normal.

    This also adds/modifies a couple of debug messages to help track this.

    Signed-off-by: Eric Sandeen <sandeen>
    Signed-off-by: Jens Axboe <axboe>
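
The reference counting that this commit message relies on can be sketched in a few lines of C. This is only an illustration, not fio code: `still_usable_after_one_close` is a hypothetical helper, and the library and symbol names used to exercise it are assumptions about a typical glibc system. It opens the same library twice, closes one handle, and checks that a symbol can still be resolved through the other handle.

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of fio): open `lib` twice, close one
 * handle, then try to resolve `sym` through the remaining handle.
 *
 * Returns  1 if the symbol is still resolvable after one dlclose(),
 *          0 if it is not,
 *         -1 if the library could not be opened at all.
 */
int still_usable_after_one_close(const char *lib, const char *sym)
{
    void *first = dlopen(lib, RTLD_LAZY);
    if (!first)
        return -1;

    /* A second dlopen() of the same library bumps its reference count. */
    void *second = dlopen(lib, RTLD_LAZY);
    if (!second) {
        dlclose(first);
        return -1;
    }

    /* Drops one reference; the library stays loaded for `second`. */
    dlclose(first);

    int ok = dlsym(second, sym) != NULL;

    /* The last reference goes away here; the library may now be unloaded. */
    dlclose(second);
    return ok;
}
```

For example, `still_usable_after_one_close("libm.so.6", "cos")` would be expected to return 1 on a glibc system. An extra dlclose() beyond the matching dlopen() calls is what lets one thread unload an engine that another thread is still using. On older glibc versions the program may need to be linked with `-ldl`.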
Sorry for not getting that pushed to F34. If you can't test it I'll just push it, it should resolve this issue. I guess I didn't realize that this was never fixed in F34.
I rebuilt fio-3.26-1.fc35.src.rpm and installed it on F34.
It does NOT work; it segfaults in the same place:
    td->io_ops->dlhandle = NULL;
Hrm, thank you for testing. I'll dig into this, I guess it's yet another, different problem w/ the dlopen'd ioengines... I have trouble tracking what's going on with these dynamic engines.
I /think/
dlclose(td->io_ops->dlhandle);
td->io_ops->dlhandle = NULL;
segfaults because the dlclose actually removes the symbol at io_ops, and therefore we can no longer reference io_ops->dlhandle.
It seems that we should simply not try to set dlhandle to NULL after the dlclose.
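
A toy model of that reasoning, in plain C. Everything here is a simplified stand-in rather than fio's real code: the structs keep only the fields mentioned in this bug, and `fake_load_engine`, `fake_close_engine`, `free_ioengine_sketch`, and `close_calls` are hypothetical names. It assumes, as suggested above, that the ops structure lives inside the memory released by closing the handle, which is modelled with malloc/free.

```c
#include <stdlib.h>

/* Simplified stand-ins for fio's structures (assumed layout, not fio's). */
struct ioengine_ops {
    const char *name;
    void *dlhandle;   /* handle of the library that contains this struct */
};

struct thread_data {
    struct ioengine_ops *io_ops;
};

/* Counts how often the fake close ran, for inspection only. */
int close_calls;

/* Stand-in for dlopen(): the ops struct lives inside the "library". */
struct ioengine_ops *fake_load_engine(const char *name)
{
    struct ioengine_ops *ops = malloc(sizeof(*ops));
    if (!ops)
        return NULL;
    ops->name = name;
    ops->dlhandle = ops;   /* closing the handle releases ops itself */
    return ops;
}

/* Stand-in for dlclose(): releases the library, including the ops struct. */
void fake_close_engine(void *handle)
{
    close_calls++;
    free(handle);
}

/* Sketch of the proposed fix: never touch io_ops after the close. */
void free_ioengine_sketch(struct thread_data *td)
{
    if (td->io_ops->dlhandle) {
        /* Read the handle while io_ops still points at valid memory. */
        void *handle = td->io_ops->dlhandle;

        fake_close_engine(handle);
        /* td->io_ops now dangles, so td->io_ops->dlhandle is not written. */
    }

    /* Writing the pointer itself is safe; it is stored in td, not the library. */
    td->io_ops = NULL;
}
```

Adding `td->io_ops->dlhandle = NULL;` back after the close would turn this model into a write to freed memory, which is the analogue of the segfault reported here.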
Deleting the "td->io_ops->dlhandle = NULL" line seems to work.
Yup, thanks for testing it. I sent that upstream.
Fixed in 3.26-2.fc34.
Description of problem:
ioengine=libaio is broken with dlopen'ed fio-engine-* packages

Version-Release number of selected component (if applicable):
fio.x86_64 3.25-3.fc34
fio-engine-libaio.x86_64 3.25-3.fc34

How reproducible:
100%

Steps to Reproduce:

    [global]
    filename=1.dd
    direct=1
    thread
    ioengine=libaio
    iodepth=1
    bs=4096

    [j]
    rw=read
    size=64M

Actual results:

    fio[1521]: segfault at 7f90314458a0 ip 0000558dfaccaa45 sp 00007f9030c2fd40 error 6 in fio[558dfacbb000+76000]
    Code: 00 48 c7 83 18 10 00 00 00 00 00 00 48 8b 78 20 48 85 ff 74 1d f6 05 51 81 16 00 04 75 27 e8 52 1f ff ff 48 8b 83 40 42 04 00 <48> c7 40 20 00 00 00 00 48 c7 83 40 42 04 00 00 00 00 00 5b c3 66

Additional info:

    00000000000279d0 <free_ioengine@@Base>:
    ...
    27a39: e8 52 1f ff ff          call 19990 <dlclose@plt>
    27a3e: 48 8b 83 40 42 04 00    mov rax,QWORD PTR [rbx+0x44240]
    27a45: ***** 48 c7 40 20 00 00 00    mov QWORD PTR [rax+0x20],0x0
    27a4c: 00
    27a4d: 48 c7 83 40 42 04 00    mov QWORD PTR [rbx+0x44240],0x0
    27a54: 00 00 00 00
    27a58: 5b                      pop rbx
    27a59: c3                      ret

RIP corresponds to bogus td->io_ops.

    void free_ioengine(struct thread_data *td)
    {
            dprint(FD_IO, "free ioengine %s\n", td->io_ops->name);

            if (td->eo && td->io_ops->options) {
                    options_free(td->io_ops->options, td->eo);
                    free(td->eo);
                    td->eo = NULL;
            }

            if (td->io_ops->dlhandle) {
                    dprint(FD_IO, "dlclose ioengine %s\n", td->io_ops->name);
                    dlclose(td->io_ops->dlhandle);
    ======>         td->io_ops->dlhandle = NULL;
            }

            td->io_ops = NULL;
    }

ioengine=psync works because sync I/O is builtin.