Bug 871058
Summary: | lvmetad crashes when making many snaps of thinp origin |
---|---
Product: | Red Hat Enterprise Linux 6
Component: | lvm2
Version: | 6.4
Reporter: | Nenad Peric <nperic>
Assignee: | Petr Rockai <prockai>
QA Contact: | Cluster QE <mspqa-list>
Status: | CLOSED ERRATA
Severity: | unspecified
Priority: | high
CC: | agk, cmarthal, dwysocha, heinzm, jbrassow, msnitzer, nperic, prajnoha, prockai, thornber, tlavigne, zkabelac
Target Milestone: | rc
Hardware: | Unspecified
OS: | Unspecified
Fixed In Version: | lvm2-2.02.98-8.el6
Doc Type: | Bug Fix
Doc Text: | A race condition in lvmetad could have caused LVM commands to fail intermittently, failing to find a VG that was being updated at the same time by another command. The race condition has been fixed.
Last Closed: | 2013-02-21 08:14:50 UTC
Type: | Bug
Bug Blocks: | 902691
Description (Nenad Peric, 2012-10-29 14:12:02 UTC)
Any way I look at this, it appears to be an out-of-memory problem, in which case an lvmetad crash would be more or less the right thing to do (even though it might be better to handle it in a more controlled manner). Could you please check, if you can reproduce, how much memory is available at the time of the crash? Also, a full dump of lvmetad threads ("thread apply all bt full" in gdb) and maybe an lvmdump after restarting lvmetad would be useful. How many snapshots it took would be good to know, too. Thanks! Petr

System memory usage never exceeded 750 MiB. At the time lvmetad died, 680 MiB of the 4 GiB of system memory was in use. Will test it now with the debugger and attach what I manage to get. The number of snapshots at which it crashes is around 300 (the first time it crashed at 313, the second at 301).

Created attachment 664927 [details]
lvmdump after restart of lvmetad
It crashed on the 159th snapshot. After the crash, here is what it looked like; two restarts of the dead lvmetad were needed:

```
(09:02:17) [root@r6-node01:~]$ vgs
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  VG            #PV #LV #SN Attr   VSize  VFree
  VolGroup        1   2   0 wz--n-  9.51g      0
  snapper_thinp   2 166   0 wz--n- 19.01g 18.00g
(09:09:07) [root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad status
lvmetad dead but pid file exists
(09:09:34) [root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad restart
Signaling LVM metadata daemon to exit:                     [  OK  ]
Starting LVM metadata daemon:                              [  OK  ]
(09:09:38) [root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad status
lvmetad dead but subsys locked
(09:09:40) [root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad restart
Signaling LVM metadata daemon to exit:                     [  OK  ]
Starting LVM metadata daemon:                              [  OK  ]
(09:09:44) [root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad status
lvmetad (pid 30096) is running...
```

--- created lvmdump (attached) ---

Created attachment 664935 [details]
lvmetad running under gdb
Attaching a debug log for lvmetad from gdb ("thread apply all bt full") as well.
Tested with lvm2-2.02.98-6.el6.x86_64. Memory usage is slightly higher and adding snapshots past 100 is slower, but lvmetad still crashes before the test ends (now from around the 400th snapshot onwards). A very partial gdb output:

```
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffeffff700 (LWP 22971)]
dm_escaped_len (str=0x0) at libdm-string.c:224
224             _count_chars(str, &len, &count, '\"', '\\');
(gdb) bt
#0  dm_escaped_len (str=0x0) at libdm-string.c:224
#1  0x00007ffff77b1a71 in _write_value (outline=0x7fffefffec20, v=0x7fffe40052a0) at libdm-config.c:253
#2  0x00007ffff77b1efb in _write_config (n=0x7fffe4005270, only_one=0, outline=0x7fffefffec20, level=0) at libdm-config.c:323
#3  0x00007ffff77b210a in _write_node (cn=0x7fffe4005238, only_one=0, putline=0x4071f0 <buffer_line>, baton=0x7fffefffedb0) at libdm-config.c:342
#4  0x0000000000405b87 in client_thread (baton=0xea8950) at daemon-server.c:395
#5  0x00007ffff757b851 in start_thread (arg=0x7fffeffff700) at pthread_create.c:301
#6  0x00007ffff72c990d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) bt full
#0  dm_escaped_len (str=0x0) at libdm-string.c:224
        len = 1
        count = 0
#1  0x00007ffff77b1a71 in _write_value (outline=0x7fffefffec20, v=0x7fffe40052a0) at libdm-config.c:253
        buf = <value optimized out>
#2  0x00007ffff77b1efb in _write_config (n=0x7fffe4005270, only_one=0, outline=0x7fffefffec20, level=0) at libdm-config.c:323
        v = 0x7fffe40052a0
        space = "\000R\000\344\377\177\000\000H\000\000\000\000\000\000\000\000\004\000\000\000\000\000\000{7{\367\377\177\000\000", <incomplete sequence \360>
        i = <value optimized out>
#3  0x00007ffff77b210a in _write_node (cn=0x7fffe4005238, only_one=0, putline=0x4071f0 <buffer_line>, baton=0x7fffefffedb0) at libdm-config.c:342
        outline = {mem = 0x7fffe40094b0, putline = 0x4071f0 <buffer_line>, putline_baton = 0x7fffefffedb0}
#4  0x0000000000405b87 in client_thread (baton=0xea8950) at daemon-server.c:395
        b = 0xea8950
        req = {cft = 0x7fffe40008e0, buffer = {allocated = 1056, used = 85, mem = 0x7fffe4004af0 "request=\"vg_lookup\"\nuuid =\"EqD8Yd-lFCQ-Ay22-jduQ-vPiB-zXnz-fjqRSM\"\ntoken =\"filter:0\"\n"}}
        res = {error = 0, cft = 0x7fffe4005210, buffer = {allocated = 28, used = 14, mem = 0x7fffe40092d0 "response=\"OK\"\n"}}
#5  0x00007ffff757b851 in start_thread (arg=0x7fffeffff700) at pthread_create.c:301
        __res = <value optimized out>
        pd = 0x7fffeffff700
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737219917568, -2811678909559803517, 140737343148544, 140737219918272, 0, 3, 2811714094447128963, 2811660426340397443}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
#6  0x00007ffff72c990d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
No locals.

Thread 2134 (Thread 0x7ffff6d84700 (LWP 22980)):
#0  __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:373
No locals.
#1  0x00007ffff757e6f7 in _L_unlock_657 () from /lib64/libpthread.so.0
No symbol table info available.
#2  0x00007ffff757e65f in __pthread_mutex_unlock_usercnt (mutex=0x7fffffffe388) at pthread_mutex_unlock.c:52
        ignore = -7288
        type = <value optimized out>
#3  __pthread_mutex_unlock (mutex=0x7fffffffe388) at pthread_mutex_unlock.c:290
No locals.
#4  0x000000000040362b in update_metadata (s=0x7fffffffe340, name=0x7ffff0001cf0 "snapper_thinp", _vgid=0x7ffff0001d98 "EqD8Yd-lFCQ-Ay22-jduQ-vPiB-zXnz-fjqRSM", metadata=<value optimized out>, oldseq=0x7fffffffe400) at lvmetad-core.c:763
        cft = <value optimized out>
        old = <value optimized out>
        retval = <value optimized out>
        seq = <value optimized out>
        haveseq = 1011
        oldname = <value optimized out>
        vgid = <value optimized out>
        cfgname = <value optimized out>
        __PRETTY_FUNCTION__ = "update_metadata"
#5  0x0000000000404b91 in vg_update (s=..., h=..., r=...) at lvmetad-core.c:928
        metadata = 0x7ffff0001d00
        vgid = 0x7ffff0001d98 "EqD8Yd-lFCQ-Ay22-jduQ-vPiB-zXnz-fjqRSM"
        vgname = 0x7ffff0001cf0 "snapper_thinp"
#6  handler (s=..., h=..., r=...) at lvmetad-core.c:1066
        state = 0x7fffffffe340
        rq = <value optimized out>
        token = <value optimized out>
#7  0x0000000000405b18 in client_thread (baton=0xea8cd0) at daemon-server.c:392
        b = 0xea8cd0
        req = {cft = 0x7ffff0001bc0, buffer = {allocated = 34603008, used = 176481, mem = 0x7fffed4fe010 "request=\"vg_update\"\nvgname =\"snapper_thinp\"\nmetadata {\n\tid=\"EqD8Yd-lFCQ-Ay22-jduQ-vPiB-zXnz-fjqRSM\"\n\tseqno=1012\n\tformat=\"lvm2\"\n\tstatus=[\"RESIZEABLE\", \"READ\", \"WRITE\"]\n\tflags=[]\n\textent_size=8192\n\tmax"...}}
        res = {error = 71, cft = 0x0, buffer = {allocated = 0, used = 0, mem = 0x0}}
#8  0x00007ffff757b851 in start_thread (arg=0x7ffff6d84700) at pthread_create.c:301
        __res = <value optimized out>
        pd = 0x7ffff6d84700
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737334757120, -2811678909559803517, 140737343148544, 140737334757824, 0, 3, 2811659442598900099,
```

I have tracked this down to a race between a metadata update and a metadata read on the same VG. The prospective patch is below; the idea is to avoid a period during which an existing VG is not mapped to either the old or the new name. (Note that the brief "blackout" was present even if the name did not actually change.) We instead allow a brief overlap during which a VG exists under both names, i.e. a query for a VG might succeed but the VG disappears before a lock is acquired. This can be encountered in vgremove situations as well, so it shouldn't lead to new problems. So far the patch is doing OK in upstream tests; I'll push it upstream tomorrow or so.
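To make the blackout concrete before the actual patch, here is a minimal, self-contained sketch of the pre-patch ordering (hypothetical stand-in names, not lvmetad code): the name mapping is removed and reinserted, so a concurrent vg_lookup-style reader that lands in the window picks up a NULL name, the same str=0x0 that reaches dm_escaped_len() in the backtrace above.

```c
/*
 * Hypothetical sketch of the "blackout" window; NOT lvmetad code.
 * Build with: gcc -pthread sketch.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
/* Toy stand-in for the vgid -> vgname hash table entry. */
static const char *vgid_to_vgname = "snapper_thinp";

/* Updater, pre-patch ordering: remove the mapping, then reinsert it. */
static void *update_thread(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&map_lock);
	vgid_to_vgname = NULL;                  /* mapping removed... */
	pthread_mutex_unlock(&map_lock);
	/* <-- blackout: the VG still exists but has no name mapped */
	pthread_mutex_lock(&map_lock);
	vgid_to_vgname = "snapper_thinp";       /* ...and reinserted */
	pthread_mutex_unlock(&map_lock);
	return NULL;
}

/* Reader, vg_lookup-style: may observe the window and see NULL. */
static void *lookup_thread(void *arg)
{
	const char *name;

	(void) arg;
	pthread_mutex_lock(&map_lock);
	name = vgid_to_vgname;
	pthread_mutex_unlock(&map_lock);
	if (!name)      /* without the patched !name check, NULL flows on */
		printf("hit the blackout: name is NULL\n");
	else
		printf("found %s\n", name);
	return NULL;
}

int main(void)
{
	pthread_t u, l;

	pthread_create(&u, NULL, update_thread, NULL);
	pthread_create(&l, NULL, lookup_thread, NULL);
	pthread_join(u, NULL);
	pthread_join(l, NULL);
	return 0;
}
```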
```diff
diff --git a/daemons/lvmetad/lvmetad-core.c b/daemons/lvmetad/lvmetad-core.c
index a951b30..242cbdf 100644
--- a/daemons/lvmetad/lvmetad-core.c
+++ b/daemons/lvmetad/lvmetad-core.c
@@ -447,7 +447,8 @@ static response vg_lookup(lvmetad_state *s, request r)
 
 	DEBUGLOG(s, "vg_lookup: updated uuid = %s, name = %s", uuid, name);
 
-	if (!uuid)
+	/* Check the name here. */
+	if (!uuid || !name)
 		return reply_unknown("VG not found");
 
 	cft = lock_vg(s, uuid);
@@ -675,16 +676,14 @@ static int update_metadata(lvmetad_state *s, const char *name, const char *_vgid
 
 	lock_vgid_to_metadata(s);
 	old = dm_hash_lookup(s->vgid_to_metadata, _vgid);
+	oldname = dm_hash_lookup(s->vgid_to_vgname, _vgid);
 	unlock_vgid_to_metadata(s);
 	lock_vg(s, _vgid);
 
 	seq = dm_config_find_int(metadata, "metadata/seqno", -1);
 
-	if (old) {
+	if (old)
 		haveseq = dm_config_find_int(old->root, "metadata/seqno", -1);
-		oldname = dm_hash_lookup(s->vgid_to_vgname, _vgid);
-		assert(oldname);
-	}
 
 	if (seq < 0)
 		goto out;
@@ -736,7 +735,7 @@ static int update_metadata(lvmetad_state *s, const char *name, const char *_vgid
 	if (haveseq >= 0 && haveseq < seq) {
 		INFO(s, "Updating metadata for %s at %d to %d", _vgid, haveseq, seq);
 		/* temporarily orphan all of our PVs */
-		remove_metadata(s, vgid, 1);
+		update_pvid_to_vgid(s, old, "#orphan", 0);
 	}
 
 	lock_vgid_to_metadata(s);
@@ -746,14 +745,17 @@ static int update_metadata(lvmetad_state *s, const char *name, const char *_vgid
 	    dm_hash_insert(s->vgid_to_metadata, vgid, cft) &&
 	    dm_hash_insert(s->vgid_to_vgname, vgid, cfgname) &&
 	    dm_hash_insert(s->vgname_to_vgid, name, (void*) vgid)) ? 1 : 0;
+
+	if (retval && oldname && strcmp(name, oldname))
+		dm_hash_remove(s->vgname_to_vgid, oldname);
+
 	unlock_vgid_to_metadata(s);
 
 	if (retval)
-		/* FIXME: What should happen when update fails */
 		retval = update_pvid_to_vgid(s, cft, vgid, 1);
 
 	unlock_pvid_to_vgid(s);
-out:
+out: /* FIXME: We should probably abort() on partial failures. */
 	if (!retval && cft)
 		dm_config_destroy(cft);
 	unlock_vg(s, _vgid);
```
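As a contrast to the earlier sketch, the following toy model (again hypothetical stand-ins, not lvmetad code) shows the ordering the patch establishes: the new name mapping is inserted before the old one is removed, so a concurrent lookup may briefly see the VG under both names, but never under neither.

```c
/*
 * Sketch of the patched ordering; simplified, hypothetical names,
 * NOT lvmetad code.
 */
#include <pthread.h>
#include <string.h>

#define MAX_ENTRIES 8

struct name_map {                       /* toy vgname -> vgid table */
	const char *names[MAX_ENTRIES];
	const char *vgids[MAX_ENTRIES];
};

static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
static struct name_map vgname_to_vgid;

static void map_insert(struct name_map *m, const char *name, const char *vgid)
{
	int i;

	for (i = 0; i < MAX_ENTRIES; i++)
		if (!m->names[i]) {
			m->names[i] = name;
			m->vgids[i] = vgid;
			return;
		}
}

static void map_remove(struct name_map *m, const char *name)
{
	int i;

	for (i = 0; i < MAX_ENTRIES; i++)
		if (m->names[i] && !strcmp(m->names[i], name))
			m->names[i] = NULL;
}

/* update_metadata-style rename: brief overlap instead of a blackout. */
static void rename_vg(const char *oldname, const char *newname,
		      const char *vgid)
{
	pthread_mutex_lock(&map_lock);
	map_insert(&vgname_to_vgid, newname, vgid);     /* overlap begins */
	if (oldname && strcmp(newname, oldname))
		map_remove(&vgname_to_vgid, oldname);   /* overlap ends */
	pthread_mutex_unlock(&map_lock);
}

int main(void)
{
	map_insert(&vgname_to_vgid, "snapper_thinp", "toy-vgid");
	rename_vg("snapper_thinp", "snapper_thinp_new", "toy-vgid");
	return 0;
}
```

The cost is that a lookup can succeed for a VG whose name disappears before the VG lock is taken; a concurrent vgremove already creates the same situation, and the added `!name` check in vg_lookup() covers it.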
I did a scratch build including the patch above. Nenad, would you give it a try? Thanks. https://brewweb.devel.redhat.com/taskinfo?taskID=5273516

I ran 2x500 snapshots and once 600 snapshots of the thinp origin without problems; lvmetad continued running without issues.

```
Create VG snapper_thinp on r6-node01

============================================================
Iteration 1 of 1 started at Tue Jan 15 11:45:37 CST 2013
============================================================
SCENARIO - [many_snaps]
Create 600 snapshots of an origin volume
Recreating VG and PVs to increase metadata size
Making origin volume
Creating thinpool and corresponding thin virtual volumes (one to be used as an origin)
lvcreate --thinpool POOL -L 1G snapper_thinp
lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n origin
lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n other1
lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n other2
lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n other3
lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n other4
lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n other5
Making 600 snapshots of origin volume
1 lvcreate -s /dev/snapper_thinp/origin -n 600_1
2 lvcreate -s /dev/snapper_thinp/origin -n 600_2
3 lvcreate -s /dev/snapper_thinp/origin -n 600_3
4 lvcreate -s /dev/snapper_thinp/origin -n 600_4
5 lvcreate -s /dev/snapper_thinp/origin -n 600_5
. . .
596 lvcreate -s /dev/snapper_thinp/origin -n 600_596
597 lvcreate -s /dev/snapper_thinp/origin -n 600_597
598 lvcreate -s /dev/snapper_thinp/origin -n 600_598
599 lvcreate -s /dev/snapper_thinp/origin -n 600_599
600 lvcreate -s /dev/snapper_thinp/origin -n 600_600
Removing volume snapper_thinp/600_1
Removing volume snapper_thinp/600_2
Removing volume snapper_thinp/600_3
Removing volume snapper_thinp/600_4
. . .
Removing volume snapper_thinp/600_599
Removing volume snapper_thinp/600_600
Removing thin origin and other virtual thin volumes
Removing thinpool snapper_thinp/POOL
Restoring VG and PVs to default metadata format
Deleting VG snapper_thinp

[root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad status
lvmetad (pid 10678) is running...
[root@r6-node01:~]$
```

The patch is upstream now: http://git.fedorahosted.org/cgit/lvm2.git/commit/?id=15fdd5c90dda7f00f691668c13d5401206d22021

Proposing this one as a blocker. Both the new thinp snapshot support and lvmetad are changing from tech preview to full support in 6.4. Using many snapshots of one origin is the obvious use case that sets thinp snapshots apart from old snapshots, and it is one of the main features users are going to make use of.

Marking verified: 600 snapshots, with and without lvmetad running; no problems appeared. Tested with:

lvm2-2.02.98-7.1.el6.x86_64
device-mapper-1.02.77-7.1.el6.x86_64
kernel-2.6.32-354.el6.x86_64

Correction, tested and verified with the newest LVM packages:

lvm2-2.02.98-8.el6.x86_64
device-mapper-1.02.77-8.el6.x86_64
lvm2-libs-2.02.98-8.el6.x86_64
kernel-2.6.32-354.el6.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html