Bug 823918 - lvconvert segfault while polling for completion and lvmetad stopped at the same time
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: All Linux
Priority: high Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Petr Rockai
QA Contact: Cluster QE
Depends On:
Blocks: 817776
Reported: 2012-05-22 08:57 EDT by Peter Rajnoha
Modified: 2013-02-21 03:10 EST

See Also:
Fixed In Version: lvm2-2.02.98-1.el6
Doc Type: Bug Fix
Doc Text:
When lvmetad was restarted while lvconvert polling was in progress, the polling process crashed. The crash has been fixed. However, the process still terminates, and polling must be restarted manually to observe further progress.
Last Closed: 2013-02-21 03:10:10 EST
Type: Bug
Description Peter Rajnoha 2012-05-22 08:57:52 EDT
Description of problem:
I've just hit this by chance. The lvconvert command segfaults if lvmetad is stopped while lvconvert is waiting for the conversion to complete, provided lvmetad was used earlier in the process. Stopping lvmetad at that point is not a nice thing to do, but it should be handled more gracefully than by ending up with a segfault.

Version-Release number of selected component (if applicable):
lvm2-2.02.95-10.el6.x86_64

How reproducible:
For example, by converting a linear LV to a mirror and stopping lvmetad while the conversion is running:

pvcreate /dev/sda /dev/sdb
vgcreate vg /dev/sda /dev/sdb
lvcreate -l50%FREE vg
lvconvert -m1 --alloc anywhere --corelog vg/lvol0
  vg/lvol0: Converted: 6.5%

--> service lvm2-lvmetad stop (or just kill the lvmetad)

Segmentation fault (core dumped)


Additional info:
May 22 14:50:40 node-a lvmetad[2231]: Failed to handle a client connection.
May 22 14:50:40 node-a lvmetad[2231]: lvmetad shutting down
May 22 14:50:50 node-a lvm[1980]: vg-lvol0 is now in-sync.
May 22 14:50:55 node-a kernel: lvconvert[2357]: segfault at 0 ip 00000000004bf628 sp 00007fff1b049fe0 error 4 in lvm[400000+107000]

(gdb) bt
#0  0x00000000004bf628 in daemon_reply_str (r=..., path=0x4f094f "response", def=0x4f0918 "") at ../include/daemon-client.h:98
#1  0x00000000004c014b in lvmetad_vg_lookup (cmd=0x92a3a0, vgname=0x145411a "vg", vgid=0x0) at cache/lvmetad.c:201
#2  0x0000000000446ef1 in lvmcache_get_vg (cmd=0x92a3a0, vgname=0x145411a "vg", vgid=0x0, precommitted=0) at cache/lvmcache.c:742
#3  0x0000000000488ddd in _vg_read (cmd=0x92a3a0, vgname=0x145411a "vg", vgid=0x0, warnings=1, consistent=0x7fff1b04a384, precommitted=0) at metadata/metadata.c:2953
#4  0x000000000048a28b in vg_read_internal (cmd=0x92a3a0, vgname=0x145411a "vg", vgid=0x0, warnings=1, consistent=0x7fff1b04a384) at metadata/metadata.c:3394
#5  0x000000000048bab4 in _vg_lock_and_read (cmd=0x92a3a0, vg_name=0x145411a "vg", vgid=0x0, lock_flags=36, status_flags=514, misc_flags=1048576) at metadata/metadata.c:4029
#6  0x000000000048bf2c in vg_read (cmd=0x92a3a0, vg_name=0x145411a "vg", vgid=0x0, flags=1048576) at metadata/metadata.c:4133
#7  0x000000000048bf6d in vg_read_for_update (cmd=0x92a3a0, vg_name=0x145411a "vg", vgid=0x0, flags=0) at metadata/metadata.c:4144
#8  0x000000000041833c in _get_lvconvert_vg (cmd=0x92a3a0, name=0x7fff1b04a5b0 "vg/lvol0", uuid=0x7fff1b04a5d0 "zQN4H9CElAXD7E9XJruMGmWqYdeo1Vf7fkRRL1xrtlsOdt6hcVkN24mM5Yrhi1Bl") at lvconvert.c:375
#9  0x000000000042b7fc in _wait_for_single_lv (cmd=0x92a3a0, name=0x7fff1b04a5b0 "vg/lvol0", uuid=0x7fff1b04a5d0 "zQN4H9CElAXD7E9XJruMGmWqYdeo1Vf7fkRRL1xrtlsOdt6hcVkN24mM5Yrhi1Bl", parms=0x7fff1b04a540) at polldaemon.c:205
#10 0x000000000042bdfa in poll_daemon (cmd=0x92a3a0, name=0x7fff1b04a5b0 "vg/lvol0", uuid=0x7fff1b04a5d0 "zQN4H9CElAXD7E9XJruMGmWqYdeo1Vf7fkRRL1xrtlsOdt6hcVkN24mM5Yrhi1Bl", background=0, lv_type=0, poll_fns=0x70a3c0, progress_title=0x4cb2a8 "Converted") at polldaemon.c:353
#11 0x0000000000418b26 in lvconvert_poll (cmd=0x92a3a0, lv=0x1445468, background=0) at lvconvert.c:531
#12 0x000000000041c93d in poll_logical_volume (cmd=0x92a3a0, lv=0x1445468, wait_completion=1) at lvconvert.c:1879
#13 0x000000000041cba7 in lvconvert_single (cmd=0x92a3a0, lp=0x7fff1b04a760) at lvconvert.c:1919
#14 0x000000000041ceb3 in lvconvert (cmd=0x92a3a0, argc=1, argv=0x7fff1b04a9f0) at lvconvert.c:1995
#15 0x0000000000425e60 in lvm_run_command (cmd=0x92a3a0, argc=1, argv=0x7fff1b04a9f0) at lvmcmdline.c:1099
#16 0x0000000000426eae in lvm2_main (argc=6, argv=0x7fff1b04a9c8) at lvmcmdline.c:1468
#17 0x00000000004404d4 in main (argc=6, argv=0x7fff1b04a9c8) at lvm.c:21
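
The kernel line above reports "segfault at 0 ... error 4", which on x86 means a user-mode read of an unmapped page at address 0, and frame #0 places that read in daemon_reply_str while it looks up the "response" field. One plausible reading is that the reply was empty because lvmetad had already gone away. The following is a minimal sketch in C of a defensive accessor for that situation. The types and the function (`struct field`, `struct reply`, `reply_str`) are hypothetical stand-ins for illustration, not lvm2's actual definitions or the actual fix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not lvm2's real types. */
struct field {
    const char *key;
    const char *value;
};

struct reply {
    int error;                  /* non-zero if the request failed */
    const struct field *fields; /* NULL when no reply was received */
    size_t nfields;
};

/*
 * Return the value stored under `key`, or `def` if the reply is
 * missing, failed, or does not contain a usable value for the key.
 * Every pointer is checked before it is dereferenced.
 */
static const char *reply_str(const struct reply *r, const char *key,
                             const char *def)
{
    size_t i;

    if (r == NULL || r->error != 0 || r->fields == NULL)
        return def;

    for (i = 0; i < r->nfields; i++)
        if (r->fields[i].key != NULL && strcmp(r->fields[i].key, key) == 0)
            return r->fields[i].value != NULL ? r->fields[i].value : def;

    return def;
}
```

With a guard like this, a caller comparing the result against an expected string sees the default value and can report an ordinary error instead of crashing.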
Comment 1 RHEL Product and Program Management 2012-07-10 04:24:48 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 2 RHEL Product and Program Management 2012-07-10 19:58:23 EDT
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.
Comment 3 Petr Rockai 2012-10-10 16:04:31 EDT
This should have been fixed by c731bb1ee13565763cc1ac77ed1a01ccea0337ac.
Comment 6 Nenad Peric 2012-12-20 11:07:31 EST
This does not seem to work as expected:

(10:05:53) [root@r6-node01:~]$ vgcreate vg /dev/sda1 /dev/sdb1
  Volume group "vg" successfully created
(10:06:39) [root@r6-node01:~]$ lvcreate -l50%FREE vg -n lv
  Logical volume "lv" created
(10:06:46) [root@r6-node01:~]$ lvconvert -m1 --alloc anywhere --corelog vg/lv
  vg/lv: Converted: 0.0%
  vg/lv: Converted: 6.6%
  vg/lv: Converted: 13.3%

From another console:
(10:06:28) [root@r6-node01:~]$ /etc/init.d/lvm2-lvmetad stop
Signaling LVM metadata daemon to exit:                     [  OK  ]

Back on the main console:


  Volume group "vg" not found
(10:07:36) [root@r6-node01:~]$ 

(10:07:36) [root@r6-node01:~]$ lvs
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  LV      VG       Attr      LSize Pool Origin Data%  Move Log Cpy%Sync Convert
  lv_root VolGroup -wi-ao--- 7.54g                                             
  lv_swap VolGroup -wi-ao--- 1.97g                                             
  lv      vg       mwi-a-m-- 9.99g                                30.73        


But the sync seems to have continued:

(10:08:54) [root@r6-node01:~]$ lvs
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  LV      VG       Attr      LSize Pool Origin Data%  Move Log Cpy%Sync Convert
  lv_root VolGroup -wi-ao--- 7.54g                                             
  lv_swap VolGroup -wi-ao--- 1.97g                                             
  lv      vg       mwi-a-m-- 9.99g                                37.45    

(10:09:36) [root@r6-node01:~]$ lvs -a -o +devices
  WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.
  LV            VG       Attr      LSize Pool Origin Data%  Move Log Cpy%Sync Convert Devices                      
  lv_root       VolGroup -wi-ao--- 7.54g                                              /dev/vda2(0)                 
  lv_swap       VolGroup -wi-ao--- 1.97g                                              /dev/vda2(1930)              
  lv            vg       mwi-a-m-- 9.99g                                39.29         lv_mimage_0(0),lv_mimage_1(0)
  [lv_mimage_0] vg       Iwi-aom-- 9.99g                                              /dev/sda1(0)                 
  [lv_mimage_1] vg       Iwi-aom-- 9.99g                                              /dev/sdb1(0)
Comment 7 Nenad Peric 2012-12-20 11:22:12 EST
Just for clarity, this is what happened on the console running lvconvert:

(10:06:46) [root@r6-node01:~]$ lvconvert -m1 --alloc anywhere --corelog vg/lv
  vg/lv: Converted: 0.0%
  vg/lv: Converted: 6.6%
  vg/lv: Converted: 13.3%

  Volume group "vg" not found
(10:07:36) [root@r6-node01:~]$
Comment 8 Petr Rockai 2012-12-30 10:14:49 EST
Yes, this is a known problem, but different from the segfault. A running LVM process can't switch over between lvmetad and non-lvmetad mode of operation on the fly. This might actually be a problem (especially for pvmove), but is certainly not a segfault as the bug title says. I suggest a new bug (targeted for 6.5) is created for the ability to fall back to non-lvmetad operation on the fly and this one is kept for the segfault (which was a somewhat different problem).
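
The on-the-fly fallback suggested in this comment could take roughly the following shape. This is only a sketch in C under stated assumptions: `struct vg_source`, `vg_lookup`, and the two stub lookups are hypothetical names invented for illustration and do not correspond to lvm2's internal API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lookup callback: returns true and fills *out on success. */
typedef bool (*vg_lookup_fn)(const char *vgname, const char **out);

struct vg_source {
    vg_lookup_fn daemon_lookup; /* cached lookup through a daemon */
    vg_lookup_fn scan_lookup;   /* direct device scan, slower */
    bool daemon_usable;         /* cleared after the first daemon failure */
};

/*
 * Try the daemon while it is usable. On failure, remember that and fall
 * back to scanning instead of reporting "not found" straight away.
 * A real implementation would also need to distinguish "daemon
 * unreachable" from "VG genuinely absent"; this sketch does not.
 */
static bool vg_lookup(struct vg_source *src, const char *vgname,
                      const char **out)
{
    if (src->daemon_usable && src->daemon_lookup != NULL) {
        if (src->daemon_lookup(vgname, out))
            return true;
        src->daemon_usable = false;
    }
    return src->scan_lookup != NULL && src->scan_lookup(vgname, out);
}

/* Stub standing in for a daemon that has been stopped. */
static bool stub_daemon_down(const char *vgname, const char **out)
{
    (void)vgname;
    (void)out;
    return false;
}

/* Stub standing in for a device scan that finds the volume group. */
static bool stub_scan_ok(const char *vgname, const char **out)
{
    *out = vgname;
    return true;
}
```

Keeping the `daemon_usable` flag in the source structure means a long-running poller pays the cost of a failed daemon call only once and then stays in scanning mode for the rest of its lifetime.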
Comment 9 Nenad Peric 2013-01-02 06:58:01 EST
lvconvert no longer segfaults; it just returns to the prompt with the message: Volume group "vg_name" not found.

The conversion keeps running in the background.

Since this is no longer related to a segmentation fault, a new BZ will be opened to describe this situation.

Verified that there is no segfault with:

lvm2-2.02.98-6.el6.x86_64
lvm2-libs-2.02.98-6.el6.x86_64
lvm2-devel-2.02.98-6.el6.x86_64
lvm2-debuginfo-2.02.98-6.el6.x86_64
Comment 10 Nenad Peric 2013-01-02 07:09:24 EST
Created a BZ for 6.5 (Bug 891271) describing the issue with switching over (or rather, not switching over).
Comment 11 errata-xmlrpc 2013-02-21 03:10:10 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html
