Bug 983218
Summary: libguestfs double free when kernel link fails during launch
Product: [Fedora] Fedora
Component: libguestfs
Version: 19
Status: CLOSED ERRATA
Reporter: Attila Fazekas <afazekas>
Assignee: Richard W.M. Jones <rjones>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: mbooth, rjones, virt-maint
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libguestfs-1.22.4-2.fc19
Doc Type: Bug Fix
Clones: 983690, 983691 (view as bug list)
Bug Blocks: 983690, 983691
Type: Bug
Last Closed: 2013-07-22 00:34:11 UTC
Description (Attila Fazekas, 2013-07-10 18:01:50 UTC)
Thanks -- I also saw two crashes in virt-manager (written in Python) related to libguestfs, only one of which has really been fixed. So this looks serious. Do you know if there's a way to collect a core dump?

To enable core dump creation at boot time, you should change two config files.

/etc/security/limits.conf:
--------------------------
#<domain>  <type>  <item>  <value>
*          soft    core    -1
--------------------------
You may need to restart the services for the change to take effect.

/etc/sysctl.conf:
-----------------
fs.suid_dumpable=1
kernel.core_pattern=/tmp/core.%e.%p.%h.%t
-----------------

You can change these settings at runtime with:

$ sysctl -w kernel.core_pattern=/tmp/core.%e.%p.%h.%t

You should verify that the core dump size limit is not zero (the limits files are processed when you log in):

$ ulimit -c unlimited

To test that a core dump is really created, try causing a segfault.

test.c:

int a[1];
int main() { a[65536] = 42; }

-bash-4.2$ make test
cc test.c -o test
-bash-4.2$ ./test
Segmentation fault (core dumped)

You should now have a new core dump file, named and placed according to kernel.core_pattern.

To generate a backtrace you can use:

$ gdb /usr/bin/python --core=<mycore_file>
...
(gdb) bt

This is the backtrace of one thread:
#0 0x0000003b7d435a19 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x0000003b7d437128 in __GI_abort () at abort.c:90 #2 0x0000003b7d475d47 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x3b7d57db88 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196 #3 0x0000003b7d47d0e8 in malloc_printerr (ptr=<optimized out>, str=0x3b7d57dc40 "double free or corruption (out)", action=3) at malloc.c:4916 #4 _int_free (av=0x3b7d7b9780 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3768 #5 0x00007f298c33b961 in launch_libvirt (g=0x3af3390, libvirt_uri=<optimized out>) at launch-libvirt.c:488 #6 0x00007f298c3357b0 in guestfs__launch (g=g@entry=0x3af3390) at launch.c:88 #7 0x00007f298c2daf6d in guestfs_launch (g=g@entry=0x3af3390) at actions-3.c:142 #8 0x00007f298c5ab969 in py_guestfs_launch (self=<optimized out>, args=<optimized out>) at guestfs-py.c:2352 #9 0x0000003b7f4ddcee in call_function (oparg=<optimized out>, pp_stack=0x7f296bffe230) at /usr/src/debug/Python-2.7.5/Python/ceval.c:4098 #10 PyEval_EvalFrameEx (f=f@entry= Frame 0x3d944e0, for file /usr/lib/python2.7/site-packages/guestfs.py, line 325, in launch (self=<GuestFS(_o=<PyCapsule at remote 0x3fbae70>, _python_return_dict=False) at remote 0x393eb90>), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:2740 #11 0x0000003b7f4dec7d in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x3fc0368, argcount=1, kws=kws@entry=0x7f29a70d6068, kwcount=kwcount@entry=0, defs=defs@entry=0x0, defcount=defcount@entry=0, closure=0x0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3330 #12 0x0000003b7f46dd7d in function_call (func=<function at remote 0x3e5cc08>, arg=(<GuestFS(_o=<PyCapsule at remote 0x3fbae70>, _python_return_dict=False) at remote 0x393eb90>,), kw={}) at /usr/src/debug/Python-2.7.5/Objects/funcobject.c:526 #13 0x0000003b7f449dd3 in PyObject_Call 
(func=func@entry=<function at remote 0x3e5cc08>, arg=arg@entry=(<GuestFS(_o=<PyCapsule at remote 0x3fbae70>, _python_return_dict=False) at remote 0x393eb90>,), kw=kw@entry={}) at /usr/src/debug/Python-2.7.5/Objects/abstract.c:2529 #14 0x0000003b7f4d9f1d in ext_do_call (nk=<optimized out>, na=<optimized out>, flags=<optimized out>, pp_stack=0x7f296bffe4f0, func=<function at remote 0x3e5cc08>) at /usr/src/debug/Python-2.7.5/Python/ceval.c:4411 #15 PyEval_EvalFrameEx ( f=f@entry=Frame 0x7f2964000ef0, for file /usr/lib/python2.7/site-packages/eventlet/tpool.py, line 76, in tworker (reqq=<Queue(unfinished_tasks=42, queue=<collections.deque at remote 0x374c7c0>, maxsize=-1, all_tasks_done=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x3751590>, acquire=<built-in method acquire of thread.lock object at remote 0x3751590>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x3751590>) at remote 0x375dbd0>, mutex=<thread.lock at remote 0x3751590>, not_full=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x3751590>, acquire=<built-in method acquire of thread.lock object at remote 0x3751590>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x3751590>) at remote 0x375db90>, not_empty=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x3751590>, acquire=<built-in method acquire of thread.lock object at remote 0x3751590>, _Conditio...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:2779 #16 0x0000003b7f4dec7d in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x375db68, argcount=1, kws=kws@entry=0x7f29a70d6068, kwcount=kwcount@entry=0, defs=defs@entry=0x0, defcount=defcount@entry=0, closure=0x0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3330 #17 0x0000003b7f46dd7d in function_call (func=<function at remote 0x31967d0>, 
arg=(<Queue(unfinished_tasks=42, queue=<collections.deque at remote 0x374c7c0>, maxsize=-1, all_tasks_done=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x3751590>, acquire=<built-in method acquire of thread.lock object at remote 0x3751590>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x3751590>) at remote 0x375dbd0>, mutex=<thread.lock at remote 0x3751590>, not_full=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x3751590>, acquire=<built-in method acquire of thread.lock object at remote 0x3751590>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x3751590>) at remote 0x375db90>, not_empty=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x3751590>, acquire=<built-in method acquire of thread.lock object at remote 0x3751590>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x3751590>) at remote 0x375d...(truncated), kw={}) at /usr/src/debug/Python-2.7.5/Objects/funcobject.c:526 #18 0x0000003b7f449dd3 in PyObject_Call (func=func@entry=<function at remote 0x31967d0>, ---Type <return> to continue, or q <return> to quit---bt arg=arg@entry=(<Queue(unfinished_tasks=42, queue=<collections.deque at remote 0x374c7c0>, maxsize=-1, all_tasks_done=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x3751590>, acquire=<built-in method acquire of thread.lock object at remote 0x3751590>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x3751590>) at remote 0x375dbd0>, mutex=<thread.lock at remote 0x3751590>, not_full=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x3751590>, acquire=<built-in method acquire of thread.lock object at remote 0x3751590>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x3751590>) at remote 0x375db90>, 
not_empty=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x3751590>, acquire=<built-in method acquire of thread.lock object at remote 0x3751590>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x3751590>) at remote 0x375d...(truncated), kw=kw@entry={}) at /usr/src/debug/Python-2.7.5/Objects/abstract.c:2529 #19 0x0000003b7f4d9f1d in ext_do_call (nk=<optimized out>, na=<optimized out>, flags=<optimized out>, pp_stack=0x7f296bffe7b0, func=<function at remote 0x31967d0>) at /usr/src/debug/Python-2.7.5/Python/ceval.c:4411 #20 PyEval_EvalFrameEx ( f=f@entry=Frame 0x7f2964000d20, for file /usr/lib64/python2.7/threading.py, line 764, in run (self=<Thread(_Thread__ident=139815882323712, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x37515f0>, acquire=<built-in method acquire of thread.lock object at remote 0x37515f0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x37515f0>) at remote 0x375dd50>, _Thread__name='tpool_thread_9', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x37515d0>, acquire=<built-in method acquire of thread.lock object at remote 0x37515d0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x37515d0>) at remote 0x375dc90>) at remote 0x375dc50>, _Thread__stderr=<file at remote 0x7f29a70f11e0>, _Thread__target=<function at remote 0x31967d0>, _Thread__kwargs={}, _Verbose__verbose=False, ...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:2779 #21 0x0000003b7f4dd80c in fast_function (nk=<optimized out>, na=1, n=1, pp_stack=0x7f296bffe910, func=<function at remote 0x17356e0>) at /usr/src/debug/Python-2.7.5/Python/ceval.c:4184 #22 call_function (oparg=<optimized out>, 
pp_stack=0x7f296bffe910) at /usr/src/debug/Python-2.7.5/Python/ceval.c:4119 #23 PyEval_EvalFrameEx ( f=f@entry=Frame 0x7f2964000ae0, for file /usr/lib64/python2.7/threading.py, line 811, in __bootstrap_inner (self=<Thread(_Thread__ident=139815882323712, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x37515f0>, acquire=<built-in method acquire of thread.lock object at remote 0x37515f0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x37515f0>) at remote 0x375dd50>, _Thread__name='tpool_thread_9', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x37515d0>, acquire=<built-in method acquire of thread.lock object at remote 0x37515d0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x37515d0>) at remote 0x375dc90>) at remote 0x375dc50>, _Thread__stderr=<file at remote 0x7f29a70f11e0>, _Thread__target=<function at remote 0x31967d0>, _Thread__kwargs={}, _Verbose__v...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:2740 #24 0x0000003b7f4dd80c in fast_function (nk=<optimized out>, na=1, n=1, pp_stack=0x7f296bffea70, func=<function at remote 0x1735848>) at /usr/src/debug/Python-2.7.5/Python/ceval.c:4184 #25 call_function (oparg=<optimized out>, pp_stack=0x7f296bffea70) at /usr/src/debug/Python-2.7.5/Python/ceval.c:4119 #26 PyEval_EvalFrameEx ( f=f@entry=Frame 0x7f2964000910, for file /usr/lib64/python2.7/threading.py, line 784, in __bootstrap (self=<Thread(_Thread__ident=139815882323712, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x37515f0>, acquire=<built-in method acquire of thread.lock object at remote 0x37515f0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x37515f0>) at remote 
0x375dd50>, _Thread__name='tpool_thread_9', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x37515d0>, acquire=<built-in method acquire of thread.lock object at remote 0x37515d0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x37515d0>) at remote 0x375dc90>) at remote 0x375dc50>, _Thread__stderr=<file at remote 0x7f29a70f11e0>, _Thread__target=<function at remote 0x31967d0>, _Thread__kwargs={}, _Verbose__verbose...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:2740 #27 0x0000003b7f4dec7d in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x375da68, argcount=1, kws=kws@entry=0x0, kwcount=kwcount@entry=0, defs=defs@entry=0x0, defcount=defcount@entry=0, closure=0x0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3330 ---Type <return> to continue, or q <return> to quit--- #28 0x0000003b7f46dca0 in function_call (func=<function at remote 0x1735758>, arg=(<Thread(_Thread__ident=139815882323712, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x37515f0>, acquire=<built-in method acquire of thread.lock object at remote 0x37515f0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x37515f0>) at remote 0x375dd50>, _Thread__name='tpool_thread_9', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x37515d0>, acquire=<built-in method acquire of thread.lock object at remote 0x37515d0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x37515d0>) at remote 0x375dc90>) at remote 0x375dc50>, _Thread__stderr=<file at remote 0x7f29a70f11e0>, 
_Thread__target=<function at remote 0x31967d0>, _Thread__kwargs={}, _Verbose__verbose=False, _Thread__args=(<Queue(unfinished_tasks=42, queue=<collections.deque at remote 0x374c7c0>...(truncated), kw=0x0) at /usr/src/debug/Python-2.7.5/Objects/funcobject.c:526 #29 0x0000003b7f449dd3 in PyObject_Call (func=func@entry=<function at remote 0x1735758>, arg=arg@entry=(<Thread(_Thread__ident=139815882323712, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x37515f0>, acquire=<built-in method acquire of thread.lock object at remote 0x37515f0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x37515f0>) at remote 0x375dd50>, _Thread__name='tpool_thread_9', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x37515d0>, acquire=<built-in method acquire of thread.lock object at remote 0x37515d0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x37515d0>) at remote 0x375dc90>) at remote 0x375dc50>, _Thread__stderr=<file at remote 0x7f29a70f11e0>, _Thread__target=<function at remote 0x31967d0>, _Thread__kwargs={}, _Verbose__verbose=False, _Thread__args=(<Queue(unfinished_tasks=42, queue=<collections.deque at remote 0x374c7c0>...(truncated), kw=kw@entry=0x0) at /usr/src/debug/Python-2.7.5/Objects/abstract.c:2529 #30 0x0000003b7f458555 in instancemethod_call (func=<function at remote 0x1735758>, arg=(<Thread(_Thread__ident=139815882323712, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x37515f0>, acquire=<built-in method acquire of thread.lock object at remote 0x37515f0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x37515f0>) at remote 0x375dd50>, _Thread__name='tpool_thread_9', _Thread__daemonic=True, 
_Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x37515d0>, acquire=<built-in method acquire of thread.lock object at remote 0x37515d0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x37515d0>) at remote 0x375dc90>) at remote 0x375dc50>, _Thread__stderr=<file at remote 0x7f29a70f11e0>, _Thread__target=<function at remote 0x31967d0>, _Thread__kwargs={}, _Verbose__verbose=False, _Thread__args=(<Queue(unfinished_tasks=42, queue=<collections.deque at remote 0x374c7c0>...(truncated), kw=0x0) at /usr/src/debug/Python-2.7.5/Objects/classobject.c:2602 #31 0x0000003b7f449dd3 in PyObject_Call (func=func@entry=<instancemethod at remote 0x3743e10>, arg=arg@entry=(), kw=<optimized out>) at /usr/src/debug/Python-2.7.5/Objects/abstract.c:2529 #32 0x0000003b7f4d8af7 in PyEval_CallObjectWithKeywords (func=<instancemethod at remote 0x3743e10>, arg=(), kw=<optimized out>) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3967 #33 0x0000003b7f50c282 in t_bootstrap (boot_raw=0x37da040) at /usr/src/debug/Python-2.7.5/Modules/threadmodule.c:614 #34 0x0000003b7dc07c53 in start_thread (arg=0x7f296bfff700) at pthread_create.c:308 #35 0x0000003b7d4f513d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 Created attachment 772092 [details]
nova-compute core dump file compressed with xz
(gdb) frame 5 #5 0x00007f298c33b961 in launch_libvirt (g=0x3af3390, libvirt_uri=<optimized out>) at launch-libvirt.c:488 488 free (params.kernel); (gdb) print params.kernel $1 = 0x7f296400c2b0 "\020\027" (gdb) print params $2 = {kernel = 0x7f296400c2b0 "\020\027", initrd = 0x7f296400c2f0 "\020\027", appliance_overlay = 0x0, appliance_dev = '\000' <repeats 63 times>, appliance_index = 0, guestfsd_path = '\000' <repeats 107 times>, console_path = '\000' <repeats 107 times>, enable_svirt = false, is_kvm = false, current_proc_is_root = false} (gdb) print g $3 = (guestfs_h *) 0x3af3390 (gdb) print *g $4 = {next = 0x0, state = CONFIG, verbose = false, trace = false, autosync = true, direct_mode = false, recovery_proc = true, enable_network = false, selinux = false, pgroup = false, close_on_exit = true, smp = 1, memsize = 500, path = 0x3aaf9b0 "/usr/lib64/guestfs", qemu = 0x38a0810 "/usr/bin/qemu-kvm", append = 0x0, qemu_params = 0x0, program = 0x385f0e0 "python", drives = 0x7f2964001770, nr_drives = 1, backend = BACKEND_LIBVIRT, backend_arg = 0x7f29640016c0 "qemu:///system", backend_ops = 0x7f298c58b320 <backend_ops_libvirt>, last_error = 0x7f296400c370 "link: /var/tmp/.guestfs-1002/kernel /var/tmp/.guestfs-1002/kernel.29013: Operation not permitted", last_errnum = 1, tmpdir = 0x7f2964001a60 "/tmp/libguestfsYL1aIv", env_tmpdir = 0x0, int_tmpdir = 0x0, int_cachedir = 0x0, error_cb = 0x0, error_cb_data = 0x0, error_cb_stack = 0x0, abort_cb = 0x3b7d436fe0 <__GI_abort>, events = 0x0, nr_events = 0, fses = 0x0, nr_fses = 0, pda = 0x0, pda_next = 0x0, user_cancel = 0, launch_t = {tv_sec = 1373529780, tv_usec = 533233}, test_fp = 0x0, unique = 0, qemu_img_info_parser = 0, conn = 0x0, msg_next_serial = 1192960, localmountpoint = 0x0, fuse = 0x0, ml_dir_cache_timeout = 0, lsc_ht = 0x0, xac_ht = 0x0, rlc_ht = 0x0, ml_read_only = 0, ml_debug_calls = 0, nr_supported_credentials = 0, supported_credentials = {0, 0, 0, 0, 0, 0, 0, 0, 0}, saved_libvirt_uri = 0x0, 
nr_requested_credentials = 0, requested_credentials = 0x0, direct = {pid = 0, recoverypid = 0, qemu_help = 0x0, qemu_version = 0x0, qemu_devices = 0x0, qemu_version_major = 0, qemu_version_minor = 0, cmdline = 0x0, cmdline_size = 0, virtio_scsi = 0}, virt = {conn = 0x0, dom = 0x0}, virt_selinux_label = 0x0, virt_selinux_imagelabel = 0x0, virt_selinux_norelabel_disks = false}

I can see how a double free could occur. The key observation is the last error message captured in the guestfs handle:

(gdb) print g->last_error
$5 = 0x7f296400c370 "link: /var/tmp/.guestfs-1002/kernel /var/tmp/.guestfs-1002/kernel.29013: Operation not permitted"

This error occurs in hard_link_to_cached_appliance:

https://github.com/libguestfs/libguestfs/blob/af1c53d104180415a8584c48f19fd4ea7df224f5/src/appliance.c#L607

In hard_link_to_cached_appliance, along this error path params.kernel is allocated and then freed (but not set back to NULL). Two levels higher up the stack, the libvirt backend returns from guestfs___build_appliance with an error:

https://github.com/libguestfs/libguestfs/blob/823628d41f898982979ab7dd53656377bef8ce1d/src/launch-libvirt.c#L232

and jumps to the cleanup: label, which frees params.kernel again. A double free. The fix may be to set the kernel pointer to NULL after freeing it the first time, but I need to check this code systematically.

Created attachment 772313 [details]
Reproducer script.

Attached is a reproducer (although I think what it may be reproducing is a related but ever so slightly different bug in the same area of code). It requires a working 'sudo' command in order to run the following:

sudo chattr +i /some/temporary/file

Install perl-Sys-Guestfs, download the attached script, chmod +x the script, and just run it.

If it *segfaults* => bug reproduced.
If it prints "bug 983218 appears to be fixed" => bug fixed.
If it does/prints anything else, the result is INVALID. Please let me know if this happens.
Affects:

- libguestfs 1.20: Fedora 18, RHEL 6.5
- libguestfs 1.22: Fedora 19, RHEL 7.0
- libguestfs 1.23: Fedora Rawhide

Therefore cloning this bug.

Created attachment 772356 [details]
Updated reproducer script.
Upstream fix:

https://github.com/libguestfs/libguestfs/commit/ae78381287771a781f939f26a414fc8cfdc05fd6

Note that although this should fix the crasher, it probably won't fix the underlying problem, which is that nova is doing something dumb/odd with sudo, causing the cache directory to be accessed by multiple UIDs.

libguestfs-1.20.9-3.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/libguestfs-1.20.9-3.fc18

libguestfs-1.22.4-2.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/libguestfs-1.22.4-2.fc19

Package libguestfs-1.20.9-3.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libguestfs-1.20.9-3.fc18'
as soon as you are able to. Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-12876/libguestfs-1.20.9-3.fc18
then log in and leave karma (feedback).

The double free corruption is fixed. Thank you, that was very fast :) But I do not really understand what happened. Can you help me? I would like to know which part of the story was done by which component, and why. Any hint or guess would be helpful.
I tried the below command which triggered the permission issue: $ nova boot MyServer --flavor 1 --image cirros-0.3.1-x86_64-uec --file=test.txt=/etc/passwd I see the following in the nova compute log: 2013-07-14 11:33:02.730 ERROR nova.compute.manager [req-4a4295a6-4576-45a3-9ded-c659ef258b54 admin admin] [instance: c6074884-c22f-4f85-bf3d-beb8e6afb6c9] Error: ['Traceback (most recent call last):\n', ' File "/opt/stack/new/nova/nova/compute/manager.py", line 995, in _build_instance\n set_access_ip=set_access_ip)\n', ' File "/opt/stack/new/nova/nova/compute/manager.py", line 1249, in _spawn\n LOG.exception(_(\'Instance failed to spawn\'), instance=instance)\n', ' File "/opt/stack/new/nova/nova/compute/manager.py", line 1245, in _spawn\n block_device_info)\n', ' File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1594, in spawn\n admin_pass=admin_password)\n', ' File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2001, in _create_image\n instance=instance)\n', ' File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1995, in _create_image\n mandatory=(\'files\',))\n', ' File "/opt/stack/new/nova/nova/virt/disk/api.py", line 294, in inject_data\n fs.setup()\n', ' File "/opt/stack/new/nova/nova/virt/disk/vfs/guestfs.py", line 114, in setup\n {\'imgfile\': self.imgfile, \'e\': e})\n', 'NovaException: Error mounting /opt/stack/data/nova/instances/c6074884-c22f-4f85-bf3d-beb8e6afb6c9/disk with libguestfs (link: /var/tmp/.guestfs-1000/kernel /var/tmp/.guestfs-1000/kernel.17636: Operation not permitted)\n'] $ ls -il /var/tmp/.guestfs-1000 total 854256 658310 -rwxr-xr-x. 1 afazekas libvirtd 64 Jul 14 12:24 checksum 658309 -rw-r--r--. 2 root root 1282048 Jul 14 11:21 initrd 658309 -rw-r--r--. 2 root root 1282048 Jul 14 11:21 initrd.17636 658308 -rw-r--r--. 1 root root 5058520 Jul 14 11:21 kernel 658311 -rw-r--r--. 2 afazekas qemu 4294967296 Jul 14 11:21 root 658311 -rw-r--r--. 
2 afazekas qemu 4294967296 Jul 14 11:21 root.17636

$ ls -ild /var/tmp/.guestfs-1000
658305 drwxr-xr-x. 2 afazekas libvirtd 4096 Jul 14 12:49 /var/tmp/.guestfs-1000

All OpenStack components are expected to run as a non-root user, and all of them have minimal filtering on the sudo commands, but chown is allowed. The live disk images' owner is the service user (now it is 'afazekas', normally it is 'nova'). (Without the file injection arguments, the libguestfs code path is not reached.)

$ ls -li /opt/stack/data/nova/instances/3c45c00f-06a2-4174-9a6a-5ac4e252e1ff
total 18680
407254 -rw-rw----. 1 afazekas qemu 23058 Jul 14 11:41 console.log
407267 -rw-r--r--. 1 afazekas qemu 10485760 Jul 14 11:42 disk
407261 -rw-rw-r--. 1 afazekas qemu 4955792 Jul 14 11:37 kernel
407269 -rw-rw-r--. 1 afazekas libvirtd 1571 Jul 14 11:37 libvirt.xml
407266 -rw-rw-r--. 1 afazekas qemu 3714968 Jul 14 11:37 ramdisk

It is based on the cirros-uec image and has 3 parts:
- an initramfs image
- a kernel
- an empty root filesystem

I guess the permissions could also have been set by libvirtd. Do you think it is possible that libvirtd was the process that initiated the permission change?

At the moment, I do not know why libguestfs tries to create a hard link. I have write permission on the directory and write permission on the real disk file (root). A soft link would be possible. Now, openstack-nova-compute just tried to add a file before boot. Why does the 'kernel' need to be hard linked? Does libguestfs ever change the initrd or the kernel images? Looks like the hard link succeeded for the initrd; at least it has a hard-linked pair.

After the permission issue, libguestfs-test-tool says:

************************************************************
* IMPORTANT NOTICE
*
* When reporting bugs, include the COMPLETE, UNEDITED
* output below in your bug report.
* ************************************************************ PATH=/usr/lib64/ccache:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/afazekas/.local/bin:/home/afazekas/bin SELinux: Permissive library version: 1.22.4fedora=19,release=2.fc19,libvirt guestfs_get_append: (null) guestfs_get_backend: libvirt guestfs_get_autosync: 1 guestfs_get_cachedir: /var/tmp guestfs_get_direct: 0 guestfs_get_memsize: 500 guestfs_get_network: 0 guestfs_get_path: /usr/lib64/guestfs guestfs_get_pgroup: 0 guestfs_get_program: libguestfs-test-tool guestfs_get_qemu: /usr/bin/qemu-kvm guestfs_get_recovery_proc: 1 guestfs_get_selinux: 0 guestfs_get_smp: 1 guestfs_get_tmpdir: /tmp guestfs_get_trace: 0 guestfs_get_verbose: 1 host_cpu: x86_64 Launching appliance, timeout set to 600 seconds. libguestfs: launch: backend=libvirt libguestfs: launch: tmpdir=/tmp/libguestfsnbkYb6 libguestfs: launch: umask=0002 libguestfs: launch: euid=1000 libguestfs: libvirt version = 1000005 (1.0.5) libguestfs: [00001ms] connect to libvirt libguestfs: opening libvirt handle: URI = NULL, auth = virConnectAuthPtrDefault, flags = 0 libguestfs: successfully opened libvirt handle: conn = 0x7fe95e94fbc0 libguestfs: [02888ms] get libvirt capabilities libguestfs: [02900ms] parsing capabilities XML libguestfs: [02902ms] build appliance libguestfs: command: run: supermin-helper libguestfs: command: run: \ --verbose libguestfs: command: run: \ -f checksum libguestfs: command: run: \ /usr/lib64/guestfs/supermin.d libguestfs: command: run: \ x86_64 supermin helper [00000ms] whitelist = (not specified), host_cpu = x86_64, kernel = (null), initrd = (null), appliance = (null) supermin helper [00000ms] inputs[0] = /usr/lib64/guestfs/supermin.d checking modpath /lib/modules/3.9.5-301.fc19.x86_64 is a directory picked vmlinuz-3.9.5-301.fc19.x86_64 because modpath /lib/modules/3.9.5-301.fc19.x86_64 exists checking modpath /lib/modules/3.9.9-301.fc19.x86_64 is a directory picked vmlinuz-3.9.9-301.fc19.x86_64 because modpath 
/lib/modules/3.9.9-301.fc19.x86_64 exists supermin helper [00001ms] finished creating kernel supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d/base.img supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d/daemon.img supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d/hostfiles supermin helper [00044ms] visiting /usr/lib64/guestfs/supermin.d/init.img supermin helper [00044ms] visiting /usr/lib64/guestfs/supermin.d/udev-rules.img supermin helper [00044ms] adding kernel modules supermin helper [00067ms] finished creating appliance libguestfs: checksum of existing appliance: f70642933db4d0be1ed3d2995f2028e98b9bd895dbbbf5494e0fb636d8694b52 libguestfs: error: link: /var/tmp/.guestfs-1000/kernel /var/tmp/.guestfs-1000/kernel.23445: Operation not permitted libguestfs-test-tool: failed to launch appliance libguestfs: closing guestfs handle 0x7fe95e94f550 (state 0) libguestfs: command: run: rm libguestfs: command: run: \ -rf /tmp/libguestfsnbkYb6

Another thing that is not clear to me: nova-compute is a single process (thread group), so the thread group leader PID will be the same for its whole lifetime. The 'initrd', 'kernel', and 'root' names do not seem to be unique, and '/var/tmp/.guestfs-1000' just contains my user ID. Is the currently used directory structure expected to be safe for parallel use?

The system is running in a VM and is using qemu software emulation inside the VM; can I expect any difference on a physical machine or with a KVM nested guest?

This is another, separate issue (as I said in comment 9) so please open another bug about it.

Thank you. I copied my comment here: https://bugzilla.redhat.com/show_bug.cgi?id=984409

libguestfs-1.20.9-3.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

libguestfs-1.22.4-2.fc19 has been pushed to the Fedora 19 stable repository.
If problems still persist, please make note of it in this bug report.