Bug 871058
Summary: | lvmetad crashes when making many snaps of thinp origin |
---|---
Product: | Red Hat Enterprise Linux 6
Component: | lvm2
Version: | 6.4
Reporter: | Nenad Peric <nperic>
Assignee: | Petr Rockai <prockai>
QA Contact: | Cluster QE <mspqa-list>
Status: | CLOSED ERRATA
Severity: | unspecified
Priority: | high
CC: | agk, cmarthal, dwysocha, heinzm, jbrassow, msnitzer, nperic, prajnoha, prockai, thornber, tlavigne, zkabelac
Target Milestone: | rc
Hardware: | Unspecified
OS: | Unspecified
Fixed In Version: | lvm2-2.02.98-8.el6
Doc Type: | Bug Fix
Doc Text: | A race condition in lvmetad could have caused LVM commands to fail intermittently, failing to find a VG that was being updated at the same time by another command. The race condition has been fixed.
Last Closed: | 2013-02-21 08:14:50 UTC
Type: | Bug
Bug Blocks: | 902691
Description (Nenad Peric, 2012-10-29 14:12:02 UTC)
Any way I look at this, it appears to be an out-of-memory problem, in which case an lvmetad crash would be more or less the right thing to do (even though it might be better to handle it in a more controlled manner). Could you please check, if you can reproduce, how much memory is available at the time of the crash? Also, a full dump of lvmetad threads ("thread apply all bt full" in gdb) and maybe an lvmdump after restarting lvmetad would be useful. How many snapshots it took would be good to know, too. Thanks! Petr

System memory usage never exceeded 750 MiB. At the time lvmetad died, 680 MiB of the 4 GiB of system memory was in use. Will test it now with the debugger and attach what I manage to get. The number of snapshots at which it crashes is around 300 (the first time it crashed at 313, the second at 301).

Created attachment 664927 [details]
lvmdump after restart of lvmetad
It crashed on the 159th snapshot. After the crash, here is what it looked like; two restarts of the dead lvmetad were needed:

```
(09:02:17) [root@r6-node01:~]$ vgs
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  VG            #PV #LV #SN Attr   VSize  VFree
  VolGroup        1   2   0 wz--n-  9.51g      0
  snapper_thinp   2 166   0 wz--n- 19.01g 18.00g
(09:09:07) [root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad status
lvmetad dead but pid file exists
(09:09:34) [root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad restart
Signaling LVM metadata daemon to exit:                     [  OK  ]
Starting LVM metadata daemon:                              [  OK  ]
(09:09:38) [root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad status
lvmetad dead but subsys locked
(09:09:40) [root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad restart
Signaling LVM metadata daemon to exit:                     [  OK  ]
Starting LVM metadata daemon:                              [  OK  ]
(09:09:44) [root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad status
lvmetad (pid 30096) is running...
```

--- created lvmdump (attached) ---

Created attachment 664935 [details]
lvmetad running under gdb
Attaching a debug log for lvmetad from gdb ("thread apply all bt full") as well.
Tested with lvm2-2.02.98-6.el6.x86_64. Memory usage is slightly higher and adding snapshots past 100 is slower, but lvmetad still crashes before the test ends (now from around the 400th snapshot onwards). A very partial gdb output:

```
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffeffff700 (LWP 22971)]
dm_escaped_len (str=0x0) at libdm-string.c:224
224             _count_chars(str, &len, &count, '\"', '\\');
(gdb) bt
#0  dm_escaped_len (str=0x0) at libdm-string.c:224
#1  0x00007ffff77b1a71 in _write_value (outline=0x7fffefffec20, v=0x7fffe40052a0) at libdm-config.c:253
#2  0x00007ffff77b1efb in _write_config (n=0x7fffe4005270, only_one=0, outline=0x7fffefffec20, level=0) at libdm-config.c:323
#3  0x00007ffff77b210a in _write_node (cn=0x7fffe4005238, only_one=0, putline=0x4071f0 <buffer_line>, baton=0x7fffefffedb0) at libdm-config.c:342
#4  0x0000000000405b87 in client_thread (baton=0xea8950) at daemon-server.c:395
#5  0x00007ffff757b851 in start_thread (arg=0x7fffeffff700) at pthread_create.c:301
#6  0x00007ffff72c990d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) bt full
#0  dm_escaped_len (str=0x0) at libdm-string.c:224
        len = 1
        count = 0
#1  0x00007ffff77b1a71 in _write_value (outline=0x7fffefffec20, v=0x7fffe40052a0) at libdm-config.c:253
        buf = <value optimized out>
#2  0x00007ffff77b1efb in _write_config (n=0x7fffe4005270, only_one=0, outline=0x7fffefffec20, level=0) at libdm-config.c:323
        v = 0x7fffe40052a0
        space = "\000R\000\344\377\177\000\000H\000\000\000\000\000\000\000\000\004\000\000\000\000\000\000{7{\367\377\177\000\000", <incomplete sequence \360>
        i = <value optimized out>
#3  0x00007ffff77b210a in _write_node (cn=0x7fffe4005238, only_one=0, putline=0x4071f0 <buffer_line>, baton=0x7fffefffedb0) at libdm-config.c:342
        outline = {mem = 0x7fffe40094b0, putline = 0x4071f0 <buffer_line>, putline_baton = 0x7fffefffedb0}
#4  0x0000000000405b87 in client_thread (baton=0xea8950) at daemon-server.c:395
        b = 0xea8950
        req = {cft = 0x7fffe40008e0, buffer = {allocated = 1056, used = 85, mem = 0x7fffe4004af0 "request=\"vg_lookup\"\nuuid =\"EqD8Yd-lFCQ-Ay22-jduQ-vPiB-zXnz-fjqRSM\"\ntoken =\"filter:0\"\n"}}
        res = {error = 0, cft = 0x7fffe4005210, buffer = {allocated = 28, used = 14, mem = 0x7fffe40092d0 "response=\"OK\"\n"}}
#5  0x00007ffff757b851 in start_thread (arg=0x7fffeffff700) at pthread_create.c:301
        __res = <value optimized out>
        pd = 0x7fffeffff700
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737219917568, -2811678909559803517, 140737343148544, 140737219918272, 0, 3, 2811714094447128963, 2811660426340397443}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
#6  0x00007ffff72c990d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
No locals.

Thread 2134 (Thread 0x7ffff6d84700 (LWP 22980)):
#0  __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:373
No locals.
#1  0x00007ffff757e6f7 in _L_unlock_657 () from /lib64/libpthread.so.0
No symbol table info available.
#2  0x00007ffff757e65f in __pthread_mutex_unlock_usercnt (mutex=0x7fffffffe388) at pthread_mutex_unlock.c:52
        ignore = -7288
        type = <value optimized out>
#3  __pthread_mutex_unlock (mutex=0x7fffffffe388) at pthread_mutex_unlock.c:290
No locals.
#4  0x000000000040362b in update_metadata (s=0x7fffffffe340, name=0x7ffff0001cf0 "snapper_thinp", _vgid=0x7ffff0001d98 "EqD8Yd-lFCQ-Ay22-jduQ-vPiB-zXnz-fjqRSM", metadata=<value optimized out>, oldseq=0x7fffffffe400) at lvmetad-core.c:763
        cft = <value optimized out>
        old = <value optimized out>
        retval = <value optimized out>
        seq = <value optimized out>
        haveseq = 1011
        oldname = <value optimized out>
        vgid = <value optimized out>
        cfgname = <value optimized out>
        __PRETTY_FUNCTION__ = "update_metadata"
#5  0x0000000000404b91 in vg_update (s=..., h=..., r=...) at lvmetad-core.c:928
        metadata = 0x7ffff0001d00
        vgid = 0x7ffff0001d98 "EqD8Yd-lFCQ-Ay22-jduQ-vPiB-zXnz-fjqRSM"
        vgname = 0x7ffff0001cf0 "snapper_thinp"
#6  handler (s=..., h=..., r=...) at lvmetad-core.c:1066
        state = 0x7fffffffe340
        rq = <value optimized out>
        token = <value optimized out>
#7  0x0000000000405b18 in client_thread (baton=0xea8cd0) at daemon-server.c:392
        b = 0xea8cd0
        req = {cft = 0x7ffff0001bc0, buffer = {allocated = 34603008, used = 176481, mem = 0x7fffed4fe010 "request=\"vg_update\"\nvgname =\"snapper_thinp\"\nmetadata {\n\tid=\"EqD8Yd-lFCQ-Ay22-jduQ-vPiB-zXnz-fjqRSM\"\n\tseqno=1012\n\tformat=\"lvm2\"\n\tstatus=[\"RESIZEABLE\", \"READ\", \"WRITE\"]\n\tflags=[]\n\textent_size=8192\n\tmax"...}}
        res = {error = 71, cft = 0x0, buffer = {allocated = 0, used = 0, mem = 0x0}}
#8  0x00007ffff757b851 in start_thread (arg=0x7ffff6d84700) at pthread_create.c:301
        __res = <value optimized out>
        pd = 0x7ffff6d84700
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737334757120, -2811678909559803517, 140737343148544, 140737334757824, 0, 3, 2811659442598900099,
```

I have tracked this down to a race between a metadata update and a metadata read on the same VG. The prospective patch is below; the idea is to avoid a period during which an existing VG is not mapped to either the old or the new name. (Note that the brief "blackout" was present even if the name did not actually change.) We instead allow a brief overlap during which a VG exists under both names, i.e. a query for a VG might succeed but the VG disappears before a lock is acquired. This can be encountered in vgremove situations as well, so it shouldn't lead to new problems. So far the patch is doing OK in upstream tests; I'll push it upstream tomorrow or so.
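To make the blackout concrete before the actual patch, here is a minimal, self-contained sketch of the pre-patch ordering (hypothetical stand-in names, not lvmetad code): the name mapping is removed and reinserted, so a concurrent vg_lookup-style reader that lands in the window picks up a NULL name, the same str=0x0 that reaches dm_escaped_len() in the backtrace above.

```c
/*
 * Hypothetical sketch of the "blackout" window; NOT lvmetad code.
 * Build with: gcc -pthread sketch.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
/* Toy stand-in for the vgid -> vgname hash table entry. */
static const char *vgid_to_vgname = "snapper_thinp";

/* Updater, pre-patch ordering: remove the mapping, then reinsert it. */
static void *update_thread(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&map_lock);
	vgid_to_vgname = NULL;                  /* mapping removed... */
	pthread_mutex_unlock(&map_lock);
	/* <-- blackout: the VG still exists but has no name mapped */
	pthread_mutex_lock(&map_lock);
	vgid_to_vgname = "snapper_thinp";       /* ...and reinserted */
	pthread_mutex_unlock(&map_lock);
	return NULL;
}

/* Reader, vg_lookup-style: may observe the window and see NULL. */
static void *lookup_thread(void *arg)
{
	const char *name;

	(void) arg;
	pthread_mutex_lock(&map_lock);
	name = vgid_to_vgname;
	pthread_mutex_unlock(&map_lock);
	if (!name)      /* without the patched !name check, NULL flows on */
		printf("hit the blackout: name is NULL\n");
	else
		printf("found %s\n", name);
	return NULL;
}

int main(void)
{
	pthread_t u, l;

	pthread_create(&u, NULL, update_thread, NULL);
	pthread_create(&l, NULL, lookup_thread, NULL);
	pthread_join(u, NULL);
	pthread_join(l, NULL);
	return 0;
}
```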
```diff
diff --git a/daemons/lvmetad/lvmetad-core.c b/daemons/lvmetad/lvmetad-core.c
index a951b30..242cbdf 100644
--- a/daemons/lvmetad/lvmetad-core.c
+++ b/daemons/lvmetad/lvmetad-core.c
@@ -447,7 +447,8 @@ static response vg_lookup(lvmetad_state *s, request r)
 
 	DEBUGLOG(s, "vg_lookup: updated uuid = %s, name = %s", uuid, name);
 
-	if (!uuid)
+	/* Check the name here. */
+	if (!uuid || !name)
 		return reply_unknown("VG not found");
 
 	cft = lock_vg(s, uuid);
@@ -675,16 +676,14 @@ static int update_metadata(lvmetad_state *s, const char *name, const char *_vgid
 
 	lock_vgid_to_metadata(s);
 	old = dm_hash_lookup(s->vgid_to_metadata, _vgid);
+	oldname = dm_hash_lookup(s->vgid_to_vgname, _vgid);
 	unlock_vgid_to_metadata(s);
 	lock_vg(s, _vgid);
 
 	seq = dm_config_find_int(metadata, "metadata/seqno", -1);
 
-	if (old) {
+	if (old)
 		haveseq = dm_config_find_int(old->root, "metadata/seqno", -1);
-		oldname = dm_hash_lookup(s->vgid_to_vgname, _vgid);
-		assert(oldname);
-	}
 
 	if (seq < 0)
 		goto out;
@@ -736,7 +735,7 @@ static int update_metadata(lvmetad_state *s, const char *name, const char *_vgid
 	if (haveseq >= 0 && haveseq < seq) {
 		INFO(s, "Updating metadata for %s at %d to %d", _vgid, haveseq, seq);
 		/* temporarily orphan all of our PVs */
-		remove_metadata(s, vgid, 1);
+		update_pvid_to_vgid(s, old, "#orphan", 0);
 	}
 
 	lock_vgid_to_metadata(s);
@@ -746,14 +745,17 @@ static int update_metadata(lvmetad_state *s, const char *name, const char *_vgid
 	    dm_hash_insert(s->vgid_to_metadata, vgid, cft) &&
 	    dm_hash_insert(s->vgid_to_vgname, vgid, cfgname) &&
 	    dm_hash_insert(s->vgname_to_vgid, name, (void*) vgid)) ? 1 : 0;
+
+	if (retval && oldname && strcmp(name, oldname))
+		dm_hash_remove(s->vgname_to_vgid, oldname);
+
 	unlock_vgid_to_metadata(s);
 
 	if (retval)
-		/* FIXME: What should happen when update fails */
 		retval = update_pvid_to_vgid(s, cft, vgid, 1);
 
 	unlock_pvid_to_vgid(s);
-out:
+out: /* FIXME: We should probably abort() on partial failures. */
 	if (!retval && cft)
 		dm_config_destroy(cft);
 	unlock_vg(s, _vgid);
```
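As a contrast to the earlier sketch, the following toy model (again hypothetical stand-ins, not lvmetad code) shows the ordering the patch establishes: the new name mapping is inserted before the old one is removed, so a concurrent lookup may briefly see the VG under both names, but never under neither.

```c
/*
 * Sketch of the patched ordering; simplified, hypothetical names,
 * NOT lvmetad code.
 */
#include <pthread.h>
#include <string.h>

#define MAX_ENTRIES 8

struct name_map {                       /* toy vgname -> vgid table */
	const char *names[MAX_ENTRIES];
	const char *vgids[MAX_ENTRIES];
};

static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
static struct name_map vgname_to_vgid;

static void map_insert(struct name_map *m, const char *name, const char *vgid)
{
	int i;

	for (i = 0; i < MAX_ENTRIES; i++)
		if (!m->names[i]) {
			m->names[i] = name;
			m->vgids[i] = vgid;
			return;
		}
}

static void map_remove(struct name_map *m, const char *name)
{
	int i;

	for (i = 0; i < MAX_ENTRIES; i++)
		if (m->names[i] && !strcmp(m->names[i], name))
			m->names[i] = NULL;
}

/* update_metadata-style rename: brief overlap instead of a blackout. */
static void rename_vg(const char *oldname, const char *newname,
		      const char *vgid)
{
	pthread_mutex_lock(&map_lock);
	map_insert(&vgname_to_vgid, newname, vgid);     /* overlap begins */
	if (oldname && strcmp(newname, oldname))
		map_remove(&vgname_to_vgid, oldname);   /* overlap ends */
	pthread_mutex_unlock(&map_lock);
}

int main(void)
{
	map_insert(&vgname_to_vgid, "snapper_thinp", "toy-vgid");
	rename_vg("snapper_thinp", "snapper_thinp_new", "toy-vgid");
	return 0;
}
```

The cost is that a lookup can succeed for a VG whose name disappears before the VG lock is taken; a concurrent vgremove already creates the same situation, and the added `!name` check in vg_lookup() covers it.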
I did a scratch build including the patch above. Nenad, would you give it a try? Thanks. https://brewweb.devel.redhat.com/taskinfo?taskID=5273516

I ran 2x500 snapshots and once 600 snapshots of the thinp origin without problems; lvmetad continued running without issues.

```
Create VG snapper_thinp on r6-node01

============================================================
Iteration 1 of 1 started at Tue Jan 15 11:45:37 CST 2013
============================================================
SCENARIO - [many_snaps]
Create 600 snapshots of an origin volume
Recreating VG and PVs to increase metadata size
Making origin volume
Creating thinpool and corresponding thin virtual volumes (one to be used as an origin)
lvcreate --thinpool POOL -L 1G snapper_thinp
lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n origin
lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n other1
lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n other2
lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n other3
lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n other4
lvcreate --virtualsize 1G --thinpool snapper_thinp/POOL -n other5
Making 600 snapshots of origin volume
1 lvcreate -s /dev/snapper_thinp/origin -n 600_1
2 lvcreate -s /dev/snapper_thinp/origin -n 600_2
3 lvcreate -s /dev/snapper_thinp/origin -n 600_3
4 lvcreate -s /dev/snapper_thinp/origin -n 600_4
5 lvcreate -s /dev/snapper_thinp/origin -n 600_5
. . .
596 lvcreate -s /dev/snapper_thinp/origin -n 600_596
597 lvcreate -s /dev/snapper_thinp/origin -n 600_597
598 lvcreate -s /dev/snapper_thinp/origin -n 600_598
599 lvcreate -s /dev/snapper_thinp/origin -n 600_599
600 lvcreate -s /dev/snapper_thinp/origin -n 600_600
Removing volume snapper_thinp/600_1
Removing volume snapper_thinp/600_2
Removing volume snapper_thinp/600_3
Removing volume snapper_thinp/600_4
. . .
Removing volume snapper_thinp/600_599
Removing volume snapper_thinp/600_600
Removing thin origin and other virtual thin volumes
Removing thinpool snapper_thinp/POOL
Restoring VG and PVs to default metadata format
Deleting VG snapper_thinp

[root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad status
lvmetad (pid 10678) is running...
[root@r6-node01:~]$
```

The patch is upstream now: http://git.fedorahosted.org/cgit/lvm2.git/commit/?id=15fdd5c90dda7f00f691668c13d5401206d22021

Proposing this one as a blocker. Both the new thinp snapshot support and lvmetad are changing from tech preview to full support in 6.4. Using many snapshots of one origin is the obvious use case that sets thinp snapshots apart from old snapshots, and it is one of the main features users are going to make use of.

Marking verified: 600 snapshots, with and without lvmetad running; no problems appeared. Tested with:

lvm2-2.02.98-7.1.el6.x86_64
device-mapper-1.02.77-7.1.el6.x86_64
kernel-2.6.32-354.el6.x86_64

Correction, tested and verified with the newest LVM packages:

lvm2-2.02.98-8.el6.x86_64
device-mapper-1.02.77-8.el6.x86_64
lvm2-libs-2.02.98-8.el6.x86_64
kernel-2.6.32-354.el6.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html