Created attachment 1156064
Test code to demonstrate segfault

When lvm2app is used with lvmetad enabled and the caller has no permission to access lvmetad.socket, the lvm2app call (lvm_vg_open here) segfaults:

# ./test vg
/run/lvm/lvmetad.socket: connect failed: Permission denied
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Segmentation fault (core dumped)

(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff7acf68d in _lock_vol (cmd=0x602030, resource=0x7fffffffdfe0 "vg", flags=49, lv_op=LV_NOOP, lv=0x0) at locking/locking.c:275
#2 0x00007ffff7acf99f in lock_vol (cmd=0x602030, vol=0x7fffffffe606 "vg", flags=49, lv=0x0) at locking/locking.c:355
#3 0x00007ffff7b03063 in _vg_lock_and_read (cmd=0x602030, vg_name=0x7fffffffe606 "vg", vgid=0x0, lock_flags=33, status_flags=0, read_flags=0, lockd_state=0) at metadata/metadata.c:5733
#4 0x00007ffff7b0351b in vg_read (cmd=0x602030, vg_name=0x7fffffffe606 "vg", vgid=0x0, read_flags=0, lockd_state=0) at metadata/metadata.c:5854
#5 0x00007ffff7a8e57c in _lvm_vg_open (libh=0x602030, vgname=0x7fffffffe606 "vg", mode=0x4009e7 "r", flags=0) at lvm_vg.c:221
#6 0x00007ffff7a8e5fa in lvm_vg_open (libh=0x602030, vgname=0x7fffffffe606 "vg", mode=0x4009e7 "r", flags=0) at lvm_vg.c:238
#7 0x00000000004008d0 in main (argc=2, argv=0x7fffffffe358) at test.c:22

Bisecting points to this upstream commit (from lvm2 v2.02.151) as the culprit:
5e9e43074a6c5e251ee44768421879b03ad2e530 ("lvmetad: rework command connection setup and checking")
The backtrace above was taken with this commit as HEAD.

I'm also attaching the lvm2app test code, which I compiled with:
gcc -g -O0 -llvm2app test.c -o test

Also, this error message was not issued before the change:
/run/lvm/lvmetad.socket: connect failed: Permission denied
WARNING: Failed to connect to lvmetad. Falling back to device scanning.

If we're not running as root or with sufficient rights, we should probably detect this earlier, if possible.
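A minimal sketch of the kind of early check suggested above, assuming a plain AF_UNIX connect() probe. probe_socket() and the PROBE_* codes are hypothetical names for illustration, not lvm2 functions. It classifies the failure from the errno captured immediately after connect(), so a non-root caller could be told "no permission" up front instead of carrying on with a half-initialized handle.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical result codes for an early daemon-socket probe. */
enum probe_result {
    PROBE_OK,            /* connect() succeeded */
    PROBE_NO_PERMISSION, /* EACCES/EPERM, e.g. running as non-root */
    PROBE_NOT_RUNNING,   /* ENOENT/ECONNREFUSED: nothing listening */
    PROBE_ERROR          /* path too long, socket() failed, other errno */
};

/*
 * Hypothetical helper: connect once to a Unix socket and classify the
 * outcome. errno is read right after the failing connect(), so it
 * cannot be confused with a value left over from earlier calls.
 */
static enum probe_result probe_socket(const char *path)
{
    struct sockaddr_un addr;
    int fd, err;

    if (strlen(path) >= sizeof(addr.sun_path))
        return PROBE_ERROR;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return PROBE_ERROR;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        close(fd);
        return PROBE_OK;
    }
    err = errno; /* capture before close() can overwrite it */
    close(fd);

    switch (err) {
    case EACCES:
    case EPERM:
        return PROBE_NO_PERMISSION;
    case ENOENT:
    case ECONNREFUSED:
        return PROBE_NOT_RUNNING;
    default:
        return PROBE_ERROR;
    }
}
```

A caller could run probe_socket("/run/lvm/lvmetad.socket") and treat PROBE_NO_PERMISSION as "fall back to device scanning" without a scary error. The probe is only advisory: permissions can change between the probe and the real connection, so the real connect() result still has to be checked.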
I ran the test code as a non-root user, which is why I didn't have access to lvmetad.socket.

Anne Mulhern reported this first, so I'm adding her to the CC list. She hit the segfault in a different part of the code, but I think the cause is the same. Citing from her email (yum uses lvm2app through the python binding somewhere):

When I run "yum info yum" not as root, I get an lvm error message:

lvmetad_socket_present failed: Permission denied
WARNING: Failed to connect to lvmetad. Falling back to device scanning.

and then a segfault, and, when I look at the backtrace, it looks like this:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
Missing separate debuginfos, use: debuginfo-install dbus-libs-1.6.12-13.el7.x86_64 dbus-python-1.1.1-9.el7.x86_64 libnl-1.1.4-3.el7.x86_64 m2crypto-0.21.1-17.el7.x86_64 python-ethtool-0.8-5.el7.x86_64 python-simplejson-3.3.3-1.el7.x86_64
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007f7f4aa97d95 in lvm_quit () from /lib64/liblvm2app.so.2.2
#2 0x00007f7f4adc4185 in _liblvm_cleanup () from /usr/lib64/python2.7/site-packages/lvm.so
#3 0x00007f7f5eb0441e in call_ll_exitfuncs () at /usr/src/debug/Python-2.7.5/Python/pythonrun.c:1772
#4 Py_Finalize () at /usr/src/debug/Python-2.7.5/Python/pythonrun.c:563
#5 0x00007f7f5eb03c78 in Py_Exit (sts=sts@entry=0) at /usr/src/debug/Python-2.7.5/Python/pythonrun.c:1781
#6 0x00007f7f5eb03db7 in handle_system_exit () at /usr/src/debug/Python-2.7.5/Python/pythonrun.c:1155
#7 0x00007f7f5eb0407d in handle_system_exit () at /usr/src/debug/Python-2.7.5/Python/pythonrun.c:1177
#8 PyErr_PrintEx (set_sys_last_vars=set_sys_last_vars@entry=1) at /usr/src/debug/Python-2.7.5/Python/pythonrun.c:1165
#9 0x00007f7f5eb0427a in PyErr_Print () at /usr/src/debug/Python-2.7.5/Python/pythonrun.c:1068
#10 0x00007f7f5eb04c9e in PyRun_SimpleFileExFlags (fp=<optimized out>, fp@entry=0x23e8900, filename=filename@entry=0x7ffe52ee363c "/usr/bin/yum", closeit=closeit@entry=1, flags=flags@entry=0x7ffe52ee1370) at /usr/src/debug/Python-2.7.5/Python/pythonrun.c:956
#11 0x00007f7f5eb05093 in PyRun_AnyFileExFlags (fp=fp@entry=0x23e8900, filename=filename@entry=0x7ffe52ee363c "/usr/bin/yum", closeit=closeit@entry=1, flags=flags@entry=0x7ffe52ee1370) at /usr/src/debug/Python-2.7.5/Python/pythonrun.c:756
#12 0x00007f7f5eb15caf in Py_Main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/Python-2.7.5/Modules/main.c:640
#13 0x00007f7f5dd42b15 in __libc_start_main (main=0x4006f0 <main>, argc=4, ubp_av=0x7ffe52ee1538, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe52ee1528) at libc-start.c:274
There are a few issues.

- The actual segfault occurs because init_locking() is not being called in lvm_init() in liblvm2app. lvm_init() returns early here:

    if (stored_errno())
        return (lvm_t) cmd;

  Apparently the expected failures in create_toolcontext() (due to non-root usage) set errno, so lvm_init() returns an incompletely initialized cmd struct to the caller, which then segfaults when it tries to use it. The error conditions from create_toolcontext() need to be checked properly (not by looking at errno!), and if create_toolcontext() does not return a usable cmd struct, lvm_init() must return NULL.

- Since commit 5e9e43074a6, liblvm2app's lvm_init() connects to lvmetad. I've written a patch that changes this by moving lvmetad_connect() out of the lvm_init() code path and into a second init call that is made only when liblvm2app calls actually need it (first patch on temp branch dev-dct-lvmetad-init-1).

- The python code calls lvm_init() as soon as the lvm python module is loaded. This means that programs that import the lvm python module but generally never use lvm still run liblvm2app:lvm_init(), which is quite a bit of lvm code. I've created a patch (second on temp branch dev-dct-lvmetad-init-1) that moves python's liblvm2app:lvm_init() call to the point where lvm is actually used. I have no idea whether it is correct or whether it works, but it's the general idea we want: avoid running lvm code in python programs that are never really going to use it (e.g. yum).

- Non-root usage: there are issues even with normal lvm commands run as non-root. I opened bug 1335294 about this.
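The first issue can be illustrated in a self-contained way. Everything below is hypothetical and far simpler than lvm2's real cmd_context: cmd_ctx, try_optional_daemon() and the two init functions are illustration names only. The buggy variant mirrors the "if (stored_errno()) return cmd;" pattern and hands back a struct whose locking function pointer is still NULL, which is consistent with frame #0 sitting at address 0x0 in the backtrace. The fixed variant ignores errno and decides from return values, returning NULL only when the struct is unusable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for lvm2's cmd context. */
struct cmd_ctx {
    int (*lock_fn)(const char *resource); /* set by init_locking() */
};

/* Stand-in for a real locking backend. */
static int file_lock(const char *resource)
{
    (void)resource;
    return 1; /* pretend the lock was taken */
}

/* Simulates an expected, tolerated failure (EACCES on the socket). */
static void try_optional_daemon(void)
{
    errno = EACCES;
}

static int init_locking(struct cmd_ctx *cmd)
{
    cmd->lock_fn = file_lock;
    return 1;
}

/* Buggy pattern: treats a leftover errno as "stop initializing". */
static struct cmd_ctx *init_buggy(void)
{
    struct cmd_ctx *cmd = calloc(1, sizeof(*cmd));
    if (!cmd)
        return NULL;
    try_optional_daemon();
    if (errno)      /* like stored_errno(): set by a non-fatal failure */
        return cmd; /* half-initialized: lock_fn is still NULL */
    init_locking(cmd);
    return cmd;
}

/* Fixed pattern: check return values; return NULL if unusable. */
static struct cmd_ctx *init_fixed(void)
{
    struct cmd_ctx *cmd = calloc(1, sizeof(*cmd));
    if (!cmd)
        return NULL;
    try_optional_daemon(); /* tolerated: caller falls back to scanning */
    if (!init_locking(cmd)) {
        free(cmd);
        return NULL;
    }
    return cmd;
}
```

In the fixed version the errno left behind by the tolerated failure is simply never consulted; a caller only has to test the returned pointer for NULL.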
Fix for the first issue: https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=87d9406725b23e6c01e55014606ff047d7375951

I'm looking for review/testing of the python change. The liblvm2app lvm_init patch works, but it could be dropped if the python code is changed instead.
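The deferred-initialization idea shared by both patches (connect to lvmetad only when a call needs it, and in python call lvm_init() only when lvm is actually used) can be sketched as a lazily initialized handle. All names here are hypothetical; this is not the code on dev-dct-lvmetad-init-1, and hypothetical_daemon_connect() always fails to model the non-root case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical handle; not the real lvm_t. */
struct lib_handle {
    bool daemon_checked;   /* has the deferred connect been attempted? */
    bool daemon_connected; /* did it succeed? */
    int connect_attempts;  /* counter, for illustration only */
};

enum vg_source { SRC_DAEMON, SRC_SCAN };

/* Stand-in for a real connect; always fails, as for a non-root user. */
static bool hypothetical_daemon_connect(void)
{
    return false;
}

/* First-stage init: cheap setup only, no daemon connection. */
static void lib_init(struct lib_handle *h)
{
    h->daemon_checked = false;
    h->daemon_connected = false;
    h->connect_attempts = 0;
}

/* Second-stage init, run on first use by calls that need the daemon. */
static bool ensure_daemon(struct lib_handle *h)
{
    if (!h->daemon_checked) {
        h->connect_attempts++;
        h->daemon_connected = hypothetical_daemon_connect();
        h->daemon_checked = true;
    }
    return h->daemon_connected;
}

/* An operation that uses the daemon if available, else scans devices. */
static enum vg_source vg_open(struct lib_handle *h)
{
    return ensure_daemon(h) ? SRC_DAEMON : SRC_SCAN;
}
```

A program that only calls lib_init() never touches the socket, and caching the outcome in daemon_checked means the connection is attempted (and any warning printed) at most once per handle.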
RHEL 7 was the end of the line for lvmetad and lvm2app (this was perhaps partly fixed four years ago).