886695 – lvmetad causes issues when attempting to create thinp volumes

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 886695 - lvmetad causes issues when attempting to create thinp volumes

Summary: lvmetad causes issues when attempting to create thinp volumes

Keywords:
Status:	CLOSED DUPLICATE of bug 889454
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	lvm2
Sub Component:
Version:	6.4
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Zdenek Kabelac
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2012-12-12 22:05 UTC by Corey Marthaler
Modified:	2013-01-15 14:47 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-01-15 14:47:44 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
log from thin pool creation deadlock w/ lvmetad running (405.17 KB, text/plain) 2012-12-19 16:54 UTC, Corey Marthaler	no flags	Details
View All

Description Corey Marthaler 2012-12-12 22:05:35 UTC

Description of problem:
[cmarthal@silver bin]$ ./snapper_thinp -l /home/msp/cmarthal/work/sts/sts-root -r /usr/tests/sts-rhel6.4 -o hayes-01 -e poolmetadatasizes -i 10 

1 disk(s) to be used:
        hayes-01=/dev/etherd/e1.1

on hayes-01...
dicing /dev/etherd/e1.1 into 2... 
re-reading disks on hayes-01...
Zeroing out the new partitions.../dev/etherd/e1.1p1.../dev/etherd/e1.1p2...
creating lvm devices...
Create 2 PV(s) for snapper_thinp on hayes-01
Create VG snapper_thinp on hayes-01
============================================================
Iteration 1 of 10 started at Wed Dec 12 15:50:55 CST 2012
============================================================
SCENARIO - [poolmetadatasizes]
Create pool volumes of varying pool metadata sizes (--poolmetadatasize)

Creating [4.00m] MDA thinpool and corresponding thin virtual volume and thin snapshot
lvcreate --thinpool POOL_4.00m --poolmetadatasize 4.00m -L 500M snapper_thinp
lvcreate --virtualsize 500M --thinpool snapper_thinp/POOL_4.00m -n origin_4.00m
lvcreate -s /dev/snapper_thinp/origin_4.00m -n snap_4.00m

[...]

Creating [256.00m] MDA thinpool and corresponding thin virtual volume and thin snapshot
lvcreate --thinpool POOL_256.00m --poolmetadatasize 256.00m -L 500M snapper_thinp
lvcreate --virtualsize 500M --thinpool snapper_thinp/POOL_256.00m -n origin_256.00m
  Request to update VG snapper_thinp in lvmetad gave response Connection reset by peer.
couldn't create thin origin LV

real    1m52.533s
user    0m0.197s
sys     0m0.129s


Dec 12 15:51:36 hayes-01 qarshd[3072]: Running cmdline: lvcreate --thinpool POOL_256.00m --poolmetadatasize 256.00m -L 500M snapper_thinp
Dec 12 15:51:39 hayes-01 kernel: device-mapper: thin: Data device (dm-34) discard unsupported: Disabling discard passdown.
Dec 12 15:51:39 hayes-01 lvm[1921]: Monitoring thin snapper_thinp-POOL_256.00m-tpool.
Dec 12 15:51:40 hayes-01 xinetd[1597]: EXIT: qarsh status=0 pid=3072 duration=4(sec)
Dec 12 15:51:40 hayes-01 xinetd[1597]: START: qarsh pid=3106 from=::ffff:10.15.80.47
Dec 12 15:51:40 hayes-01 qarshd[3106]: Talking to peer 10.15.80.47:42631 (IPv4)
Dec 12 15:51:40 hayes-01 qarshd[3106]: Running cmdline: lvcreate --virtualsize 500M --thinpool snapper_thinp/POOL_256.00m -n origin_256.00m
Dec 12 15:51:40 hayes-01 kernel: lvmetad[3105] general protection ip:3d26076205 sp:7fd57abafa80 error:0 in libc-2.12.so[3d26000000+18a000]
Dec 12 15:51:41 hayes-01 abrtd: Directory 'ccpp-2012-12-12-15:51:40-1876' creation detected
Dec 12 15:51:41 hayes-01 abrt[3109]: Saved core dump of pid 1876 (/sbin/lvmetad) to /var/spool/abrt/ccpp-2012-12-12-15:51:40-1876 (47603712 bytes)
Dec 12 15:51:41 hayes-01 udevd-work[2157]: '/sbin/lvm pvscan --cache --activate ay --major 152 --minor 273' unexpected exit with status 0x000d
Dec 12 15:51:41 hayes-01 xinetd[1597]: EXIT: qarsh status=0 pid=3106 duration=1(sec)
Dec 12 15:51:42 hayes-01 abrtd: Package 'lvm2' isn't signed with proper key
Dec 12 15:51:42 hayes-01 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2012-12-12-15:51:40-1876' exited with 1
Dec 12 15:51:42 hayes-01 abrtd: Corrupted or bad directory /var/spool/abrt/ccpp-2012-12-12-15:51:40-1876, deleting


The core was not saved, I'll attempt this again and try to get a back trace.



Version-Release number of selected component (if applicable):
2.6.32-343.el6.x86_64
lvm2-2.02.98-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
lvm2-libs-2.02.98-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
lvm2-cluster-2.02.98-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
udev-147-2.43.el6    BUILT: Thu Oct 11 05:59:38 CDT 2012
device-mapper-1.02.77-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
device-mapper-libs-1.02.77-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
device-mapper-event-1.02.77-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
device-mapper-event-libs-1.02.77-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
cmirror-2.02.98-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012

Comment 1 Corey Marthaler 2012-12-12 22:06:30 UTC

This may be related to bug 886694.

Comment 2 Corey Marthaler 2012-12-13 01:01:10 UTC

Another create failure, no segfault however.

============================================================
Iteration 1 of 10000 started at Wed Dec 12 18:58:23 CST 2012
============================================================
SCENARIO - [poolmetadatasizes]
Create pool volumes of varying pool metadata sizes (--poolmetadatasize)

Creating [4.00m] MDA thinpool and corresponding thin virtual volume and thin snapshot
lvcreate --thinpool POOL_4.00m --poolmetadatasize 4.00m -L 500M snapper_thinp
lvcreate --virtualsize 500M --thinpool snapper_thinp/POOL_4.00m -n origin_4.00m
lvcreate -s /dev/snapper_thinp/origin_4.00m -n snap_4.00m

Creating [16.00m] MDA thinpool and corresponding thin virtual volume and thin snapshot
lvcreate --thinpool POOL_16.00m --poolmetadatasize 16.00m -L 500M snapper_thinp
lvcreate --virtualsize 500M --thinpool snapper_thinp/POOL_16.00m -n origin_16.00m
  Check of thin pool snapper_thinp/POOL_16.00m failed (status:1). Manual repair required (thin_dump --repair /dev/mapper/snapper_thinp-POOL_16.00m_tmeta)!
  Failed to resume POOL_16.00m.
couldn't create thin origin LV

Comment 3 Zdenek Kabelac 2012-12-13 14:19:26 UTC

Scary.

Comment 2 is likely showing something very bad must have happened
before failing lvcreate command.

Has any removed  thin pool kernel target on that device before ?
(Running  dmsetup remove_all out of context ?)

thin _check is executed only when thin pool is activated - in this case some had to either deactivate pool (lvchange -an) or remove its table line ?
(or the lvcreate command right above failed and created inactive pool ??)

Is the 'hw' you are using for testing ok ?

Comment 5 Zdenek Kabelac 2012-12-17 12:40:50 UTC

Can we get   gdb  backtrace of crashing  lvmetad  and  dmeventd  binaries from this test?

Comment 6 Corey Marthaler 2012-12-19 16:52:49 UTC

My latest attempt to reproduce this (with udev debugging on) ended up causing a deadlock, not a segfault. I'll post the log...

Comment 7 Corey Marthaler 2012-12-19 16:54:33 UTC

Created attachment 666223 [details]
log from thin pool creation deadlock w/ lvmetad running

Comment 8 Corey Marthaler 2012-12-19 21:16:00 UTC

Updated to a newer kernel: 2.6.32-348.el6.x86_64

lvm2-2.02.98-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
lvm2-libs-2.02.98-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
lvm2-cluster-2.02.98-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
udev-147-2.43.el6    BUILT: Thu Oct 11 05:59:38 CDT 2012
device-mapper-1.02.77-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
device-mapper-libs-1.02.77-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
device-mapper-event-1.02.77-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
device-mapper-event-libs-1.02.77-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012
cmirror-2.02.98-5.el6    BUILT: Wed Dec 12 09:18:07 CST 2012



More problems with lvmetad + thinp creation:

# taft-01
Creating [16.00m] MDA thinpool and corresponding thin virtual volume and thin snapshot
lvcreate --thinpool POOL_16.00m --poolmetadatasize 16.00m -L 500M snapper_thinp
lvcreate --virtualsize 500M --thinpool snapper_thinp/POOL_16.00m -n origin_16.00m
lvcreate -s /dev/snapper_thinp/origin_16.00m -n snap_16.00m
  device-mapper: create ioctl on snapper_thinp-snap_16.00m failed: Device or resource busy
  Aborting. Failed to activate thin snap_16.00m.
couldn't create snapshot snap_16.00m from origin snapper_thinp/other_16.00m

[root@taft-01 ~]# lvs -a -o +devices
LV                  Attr      LSize   Pool        Origin      Data%   Devices
POOL_16.00m         twi-a-tz- 500.00m                          0.00  POOL_16.00m_tdata(0)
[POOL_16.00m_tdata] Twi-aot-- 500.00m                                /dev/sdc1(125)
[POOL_16.00m_tmeta] ewi-aot--  16.00m                                /dev/sdg1(1)
POOL_4.00m          twi-a-tz- 500.00m                          0.00  POOL_4.00m_tdata(0)
[POOL_4.00m_tdata]  Twi-aot-- 500.00m                                /dev/sdc1(0)
[POOL_4.00m_tmeta]  ewi-aot--   4.00m                                /dev/sdg1(0)
origin_16.00m       Vwi-a-tz- 500.00m POOL_16.00m              0.00
origin_4.00m        Vwi-a-tz- 500.00m POOL_4.00m               0.00
snap_4.00m          Vwi-a-tz- 500.00m POOL_4.00m  origin_4.00m 0.00


# hayes-01
SCENARIO - [poolmetadatasizes]
Create pool volumes of varying pool metadata sizes (--poolmetadatasize)

Creating [4.00m] MDA thinpool and corresponding thin virtual volume and thin snapshot
lvcreate --thinpool POOL_4.00m --poolmetadatasize 4.00m -L 500M snapper_thinp
  device-mapper: resume ioctl on  failed: Invalid argument
  Unable to resume snapper_thinp-POOL_4.00m-tpool (253:5)
  Aborting. Failed to activate thin POOL_4.00m.
  _get_device_info: LVM-CT7bkVXO5vmRIzVxhaUGFonDcrGMWRlNi8UgWTKkwjsvZQtcOBLlYB4ZPHkDxxkw-tpool: device not found
  _get_device_info: LVM-CT7bkVXO5vmRIzVxhaUGFonDcrGMWRlNi8UgWTKkwjsvZQtcOBLlYB4ZPHkDxxkw-tpool: device not found
  snapper_thinp/POOL_4.00m: thin-pool segment monitoring function failed.
couldn't create thinpool

Dec 19 14:21:57 hayes-01 qarshd[2573]: Running cmdline: lvcreate --thinpool POOL_4.00m --poolmetadatasize 4.00m -L 500M snapper_thinp
Dec 19 14:21:59 hayes-01 kernel: device-mapper: thin: Data device (dm-4) discard unsupported: Disabling discard passdown.
Dec 19 14:21:59 hayes-01 kernel: device-mapper: table: 253:6: linear: dm-linear: Device lookup failed
Dec 19 14:22:00 hayes-01 kernel: device-mapper: ioctl: error adding target to table
Dec 19 14:22:00 hayes-01 xinetd[1655]: EXIT: qarsh status=0 pid=2573 duration=3(sec)
Dec 19 14:22:00 hayes-01 kernel: device-mapper: thin: Data device (dm-4) discard unsupported: Disabling discard passdown.
Dec 19 14:22:01 hayes-01 dmeventd[2641]: dmeventd ready for processing.
Dec 19 14:22:01 hayes-01 lvm[2641]: Monitoring thin snapper_thinp-POOL_4.00m-tpool.

Comment 9 Marian Csontos 2012-12-20 16:50:25 UTC

Re: Missing core files: abrt is throwing away core dumps from unsigned packages. Also the configuration option has recently moved. Use

    sed -i '/OpenGPGCheck/ s/yes/no/' /etc/abrt/abrt-action-save-package-data.conf

Comment 10 Zdenek Kabelac 2012-12-20 22:05:10 UTC

Then eliminate  abrtd 

And use this for sysctl


# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1

# set default dir for coredumps
kernel.core_pattern = /tmp/core.%e.%p


(in Fedora it's now located as separate file - i.e.:
/etc/sysctl.d/kernelsysctl.conf)

or just use directly via sysctl.



Also what would be helpful is attached dmesg
and full -vvvv trace of

lvcreate -s /dev/snapper_thinp/origin_16.00m -n snap_16.00m

Comment 11 Marian Csontos 2013-01-07 14:28:55 UTC

This may be a timing issue.

I have seen this in logs several times (not with recent builds) at was always accompanied with failure in dmeventd later on:

    segfault at 2d0 ip 00007f9eda16d670 sp 00007f9eda995dd8 error 4 in libpthread-2.12.so[7f9eda161000+17000]

I do not have the backtrace from lvm process failing, but the dmeventd is dying on timeout when creating new thread - reported as Bug 889651

Another issue is quite a few tests are failing on dmventd being slow to response right after it is spawned by lvconvert - this often takes 10+ seconds, lvconvert command is often reporting event server issue (considered a problem by STS):

    No input from event server.
    centipede-centi_base: event registration failed: Input/output error
    centipede/centi_base: mirror segment monitoring function failed.

The command itself succeeds and eventually the event registration is succesful soon after failure is reported, suggesting this may be a locking issue.

Comment 12 Corey Marthaler 2013-01-09 23:33:57 UTC

This test case is no longer causing any issues with the latest rpms.

2.6.32-353.el6.x86_64

lvm2-2.02.98-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
lvm2-libs-2.02.98-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
lvm2-cluster-2.02.98-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
udev-147-2.43.el6    BUILT: Thu Oct 11 05:59:38 CDT 2012
device-mapper-1.02.77-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
device-mapper-libs-1.02.77-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
device-mapper-event-1.02.77-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
device-mapper-event-libs-1.02.77-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013
cmirror-2.02.98-7.el6    BUILT: Wed Jan  9 03:34:27 CST 2013

Comment 13 Peter Rajnoha 2013-01-10 09:36:20 UTC

(In reply to comment #12)
> This test case is no longer causing any issues with the latest rpms.

Just a note: the only change in latest rpms (not taking into account one lvm1 vs. lvmetad fix) are the fixes for autoactivation to not fire when it should not and addition of proper udev synchronization with "pvscan --cache -aay" that is called from within udev rules to autoactivate volumes.

Comment 14 Peter Rajnoha 2013-01-15 14:47:44 UTC

Closing this bug as it can't be reproduced anymore with latest rpms which include fixes for autoactivation (bug #889454).

If still seen, please, reopen this bug report.

*** This bug has been marked as a duplicate of bug 889454 ***

Note You need to log in before you can comment on or make changes to this bug.