Description of problem:
ioengine=libaio is broken with dlopen'ed fio-engine-* packages

Version-Release number of selected component (if applicable):
fio.x86_64 3.25-3.fc34
fio-engine-libaio.x86_64 3.25-3.fc34

How reproducible:
100%

Steps to Reproduce:
    [global]
    filename=1.dd
    direct=1
    thread
    ioengine=libaio
    iodepth=1
    bs=4096

    [j]
    rw=read
    size=64M

Actual results:
    fio[1521]: segfault at 7f90314458a0 ip 0000558dfaccaa45 sp 00007f9030c2fd40 error 6 in fio[558dfacbb000+76000]
    Code: 00 48 c7 83 18 10 00 00 00 00 00 00 48 8b 78 20 48 85 ff 74 1d f6 05 51 81 16 00 04 75 27 e8 52 1f ff ff 48 8b 83 40 42 04 00 <48> c7 40 20 00 00 00 00 48 c7 83 40 42 04 00 00 00 00 00 5b c3 66

Additional info:
    00000000000279d0 <free_ioengine@@Base>:
    ...
       27a39:       e8 52 1f ff ff          call   19990 <dlclose@plt>
       27a3e:       48 8b 83 40 42 04 00    mov    rax,QWORD PTR [rbx+0x44240]
       27a45: ***** 48 c7 40 20 00 00 00    mov    QWORD PTR [rax+0x20],0x0
       27a4c:       00
       27a4d:       48 c7 83 40 42 04 00    mov    QWORD PTR [rbx+0x44240],0x0
       27a54:       00 00 00 00
       27a58:       5b                      pop    rbx
       27a59:       c3                      ret

RIP corresponds to bogus td->io_ops.

    void free_ioengine(struct thread_data *td)
    {
            dprint(FD_IO, "free ioengine %s\n", td->io_ops->name);

            if (td->eo && td->io_ops->options) {
                    options_free(td->io_ops->options, td->eo);
                    free(td->eo);
                    td->eo = NULL;
            }

            if (td->io_ops->dlhandle) {
                    dprint(FD_IO, "dlclose ioengine %s\n", td->io_ops->name);
                    dlclose(td->io_ops->dlhandle);
    ======>         td->io_ops->dlhandle = NULL;
            }

            td->io_ops = NULL;
    }

ioengine=psync works because sync I/O is builtin.
I wonder if the libaio ioengine should be builtin, given the importance of async I/O.
Unfortunately the ioengine selection is all or nothing upstream, at least for now.

Would you be able to check/test 3.26 from rawhide? I think this is probably fixed by this upstream commit:

    commit 48ff7df9daea86c82a572b0a840bb8371b6b1a29
    Author: Eric Sandeen <sandeen>
    Date:   Mon Jan 25 13:23:48 2021 -0600

        fio: fix dlopen refcounting of dynamic engines

        ioengine_load() will dlclose the dynamic library if it matches one
        that we've already got open, but this defeats the built-in
        refcounting done by dlopen/dlclose. As each thread exits, it calls
        free_ioengine(), and this may do a final dlclose on a dynamic
        ioengine that is still in use if we don't have the proper
        reference count.

        Fix this by dropping the explicit dlclose of a "matching" dlopened
        dynamic engine library, and let each dlclose decrement the
        refcount on the engine library as is normal.

        This also adds/modifies a couple of debug messages to help track
        this.

        Signed-off-by: Eric Sandeen <sandeen>
        Signed-off-by: Jens Axboe <axboe>

Sorry for not getting that pushed to F34. If you can't test it I'll just push it, it should resolve this issue. I guess I didn't realize that this was never fixed in F34.
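For readers following along, the refcounting problem described in that commit message can be modelled with a toy counter. This is only a sketch under stated assumptions: `toy_lib`, `toy_dlopen`, `toy_dlclose`, `toy_engine_load` and `toy_free_ioengine` are hypothetical stand-ins, not fio or libdl functions, and the `drop_matching` flag is an invented switch that models the extra dlclose of a "matching" library.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a shared library as tracked by the dynamic loader. */
struct toy_lib {
    int refs;     /* loader reference count */
    bool mapped;  /* true while the library is still usable */
};

/* Each open takes one reference, as dlopen does for an already-loaded library. */
static void toy_dlopen(struct toy_lib *lib)
{
    lib->refs++;
    lib->mapped = true;
}

/* Each close drops one reference; the last one unmaps the library. */
static void toy_dlclose(struct toy_lib *lib)
{
    if (--lib->refs == 0)
        lib->mapped = false;
}

/*
 * One thread loading the engine. With drop_matching set, the load closes
 * the library again when it was already open, which models the behaviour
 * the commit message says defeats the loader's refcounting.
 */
static void toy_engine_load(struct toy_lib *lib, bool drop_matching)
{
    bool already_open = lib->refs > 0;

    toy_dlopen(lib);
    if (drop_matching && already_open)
        toy_dlclose(lib);
}

/* One thread exiting: a single close per thread. */
static void toy_free_ioengine(struct toy_lib *lib)
{
    toy_dlclose(lib);
}

/*
 * Two threads load the engine, then one exits. Returns whether the
 * library is still mapped for the thread that keeps running.
 */
static bool toy_survivor_ok(bool drop_matching)
{
    struct toy_lib lib = { 0, false };

    toy_engine_load(&lib, drop_matching);
    toy_engine_load(&lib, drop_matching);
    toy_free_ioengine(&lib);
    return lib.mapped;
}
```

In this model, `toy_survivor_ok(true)` leaves the remaining thread with an unmapped library, while `toy_survivor_ok(false)` keeps one reference alive, which is the effect the commit message describes.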
I rebuilt fio-3.26-1.fc35.src.rpm and installed it onto F34. It does NOT work: it segfaults in the same place, at

    td->io_ops->dlhandle = NULL;
Hrm, thank you for testing. I'll dig into this, I guess it's yet another, different problem w/ the dlopen'd ioengines...
I have trouble tracking what's going on with these dynamic engines. I /think/

    dlclose(td->io_ops->dlhandle);
    td->io_ops->dlhandle = NULL;

segfaults because the dlclose actually removes the symbol at io_ops, and therefore we can no longer reference io_ops->dlhandle. It seems that we should simply not try to set dlhandle to NULL after the dlclose.
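A minimal sketch of that idea, for illustration only. It assumes cut-down, hypothetical versions of the structures, and it replaces the real dlclose() with `mock_dlclose()`, which frees a heap block containing the ops struct to mimic a library being unmapped. `mock_load_engine` and `free_ioengine_sketch` are likewise invented names; this is not the upstream patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-ins for fio's structures. */
struct ioengine_ops {
    const char *name;
    void *dlhandle;  /* handle of the library this struct lives in */
};

struct thread_data {
    struct ioengine_ops *io_ops;
};

/* Counts close calls so the behaviour can be checked. */
static int mock_close_calls;

/*
 * Stand-in for dlclose(): the "library" is a heap block that holds the
 * ops struct, so closing it invalidates the struct, much as unmapping a
 * dynamic engine would.
 */
static int mock_dlclose(void *handle)
{
    mock_close_calls++;
    free(handle);
    return 0;
}

/* Stand-in for loading a dynamic engine. */
static struct ioengine_ops *mock_load_engine(const char *name)
{
    struct ioengine_ops *ops = malloc(sizeof(*ops));

    if (!ops)
        return NULL;
    ops->name = name;
    ops->dlhandle = ops;  /* closing the handle frees ops itself */
    return ops;
}

static void free_ioengine_sketch(struct thread_data *td)
{
    if (td->io_ops->dlhandle) {
        /* The handle is read as an argument before the close happens. */
        mock_dlclose(td->io_ops->dlhandle);
        /* td->io_ops may now point at freed memory: do not write to it. */
    }

    /* Only the pointer held by the thread is cleared. */
    td->io_ops = NULL;
}
```

The only write after the close is to `td->io_ops`, which lives in the thread's own data rather than in the engine library.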
Deleting "td->io_ops->dlhandle = NULL" line seems to work.
Yup, thanks for testing it. I sent that upstream.
fixed in 3.26-2.fc34