Bug 886695
| Summary: | lvmetad causes issues when attempting to create thinp volumes | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Corey Marthaler <cmarthal> | ||||
| Component: | lvm2 | Assignee: | Zdenek Kabelac <zkabelac> | ||||
| Status: | CLOSED DUPLICATE | QA Contact: | Cluster QE <mspqa-list> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 6.4 | CC: | agk, dwysocha, heinzm, jbrassow, mcsontos, msnitzer, prajnoha, prockai, thornber, zkabelac | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-01-15 14:47:44 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
This may be related to bug 886694. Another create failure, no segfault however. ============================================================ Iteration 1 of 10000 started at Wed Dec 12 18:58:23 CST 2012 ============================================================ SCENARIO - [poolmetadatasizes] Create pool volumes of varying pool metadata sizes (--poolmetadatasize) Creating [4.00m] MDA thinpool and corresponding thin virtual volume and thin snapshot lvcreate --thinpool POOL_4.00m --poolmetadatasize 4.00m -L 500M snapper_thinp lvcreate --virtualsize 500M --thinpool snapper_thinp/POOL_4.00m -n origin_4.00m lvcreate -s /dev/snapper_thinp/origin_4.00m -n snap_4.00m Creating [16.00m] MDA thinpool and corresponding thin virtual volume and thin snapshot lvcreate --thinpool POOL_16.00m --poolmetadatasize 16.00m -L 500M snapper_thinp lvcreate --virtualsize 500M --thinpool snapper_thinp/POOL_16.00m -n origin_16.00m Check of thin pool snapper_thinp/POOL_16.00m failed (status:1). Manual repair required (thin_dump --repair /dev/mapper/snapper_thinp-POOL_16.00m_tmeta)! Failed to resume POOL_16.00m. couldn't create thin origin LV Scary. Comment 2 is likely showing something very bad must have happened before failing lvcreate command. Has any removed thin pool kernel target on that device before ? (Running dmsetup remove_all out of context ?) thin _check is executed only when thin pool is activated - in this case some had to either deactivate pool (lvchange -an) or remove its table line ? (or the lvcreate command right above failed and created inactive pool ??) Is the 'hw' you are using for testing ok ? Can we get gdb backtrace of crashing lvmetad and dmeventd binaries from this test? My latest attempt to reproduce this (with udev debugging on) ended up causing a deadlock, not a segfault. I'll post the log... Created attachment 666223 [details]
log from thin pool creation deadlock w/ lvmetad running
Updated to a newer kernel: 2.6.32-348.el6.x86_64 lvm2-2.02.98-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 lvm2-libs-2.02.98-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 lvm2-cluster-2.02.98-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 udev-147-2.43.el6 BUILT: Thu Oct 11 05:59:38 CDT 2012 device-mapper-1.02.77-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 device-mapper-libs-1.02.77-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 device-mapper-event-1.02.77-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 device-mapper-event-libs-1.02.77-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 cmirror-2.02.98-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 More problems with lvmetad + thinp creation: # taft-01 Creating [16.00m] MDA thinpool and corresponding thin virtual volume and thin snapshot lvcreate --thinpool POOL_16.00m --poolmetadatasize 16.00m -L 500M snapper_thinp lvcreate --virtualsize 500M --thinpool snapper_thinp/POOL_16.00m -n origin_16.00m lvcreate -s /dev/snapper_thinp/origin_16.00m -n snap_16.00m device-mapper: create ioctl on snapper_thinp-snap_16.00m failed: Device or resource busy Aborting. Failed to activate thin snap_16.00m. couldn't create snapshot snap_16.00m from origin snapper_thinp/other_16.00m [root@taft-01 ~]# lvs -a -o +devices LV Attr LSize Pool Origin Data% Devices POOL_16.00m twi-a-tz- 500.00m 0.00 POOL_16.00m_tdata(0) [POOL_16.00m_tdata] Twi-aot-- 500.00m /dev/sdc1(125) [POOL_16.00m_tmeta] ewi-aot-- 16.00m /dev/sdg1(1) POOL_4.00m twi-a-tz- 500.00m 0.00 POOL_4.00m_tdata(0) [POOL_4.00m_tdata] Twi-aot-- 500.00m /dev/sdc1(0) [POOL_4.00m_tmeta] ewi-aot-- 4.00m /dev/sdg1(0) origin_16.00m Vwi-a-tz- 500.00m POOL_16.00m 0.00 origin_4.00m Vwi-a-tz- 500.00m POOL_4.00m 0.00 snap_4.00m Vwi-a-tz- 500.00m POOL_4.00m origin_4.00m 0.00 # hayes-01 SCENARIO - [poolmetadatasizes] Create pool volumes of varying pool metadata sizes (--poolmetadatasize) Creating [4.00m] MDA thinpool and corresponding thin virtual volume and thin snapshot lvcreate --thinpool POOL_4.00m --poolmetadatasize 4.00m -L 500M snapper_thinp device-mapper: resume ioctl on failed: Invalid argument Unable to resume snapper_thinp-POOL_4.00m-tpool (253:5) Aborting. Failed to activate thin POOL_4.00m. _get_device_info: LVM-CT7bkVXO5vmRIzVxhaUGFonDcrGMWRlNi8UgWTKkwjsvZQtcOBLlYB4ZPHkDxxkw-tpool: device not found _get_device_info: LVM-CT7bkVXO5vmRIzVxhaUGFonDcrGMWRlNi8UgWTKkwjsvZQtcOBLlYB4ZPHkDxxkw-tpool: device not found snapper_thinp/POOL_4.00m: thin-pool segment monitoring function failed. couldn't create thinpool Dec 19 14:21:57 hayes-01 qarshd[2573]: Running cmdline: lvcreate --thinpool POOL_4.00m --poolmetadatasize 4.00m -L 500M snapper_thinp Dec 19 14:21:59 hayes-01 kernel: device-mapper: thin: Data device (dm-4) discard unsupported: Disabling discard passdown. Dec 19 14:21:59 hayes-01 kernel: device-mapper: table: 253:6: linear: dm-linear: Device lookup failed Dec 19 14:22:00 hayes-01 kernel: device-mapper: ioctl: error adding target to table Dec 19 14:22:00 hayes-01 xinetd[1655]: EXIT: qarsh status=0 pid=2573 duration=3(sec) Dec 19 14:22:00 hayes-01 kernel: device-mapper: thin: Data device (dm-4) discard unsupported: Disabling discard passdown. Dec 19 14:22:01 hayes-01 dmeventd[2641]: dmeventd ready for processing. Dec 19 14:22:01 hayes-01 lvm[2641]: Monitoring thin snapper_thinp-POOL_4.00m-tpool. Re: Missing core files: abrt is throwing away core dumps from unsigned packages. Also the configuration option has recently moved. Use
sed -i '/OpenGPGCheck/ s/yes/no/' /etc/abrt/abrt-action-save-package-data.conf
Then eliminate abrtd And use this for sysctl # Controls whether core dumps will append the PID to the core filename. # Useful for debugging multi-threaded applications. kernel.core_uses_pid = 1 # set default dir for coredumps kernel.core_pattern = /tmp/core.%e.%p (in Fedora it's now located as separate file - i.e.: /etc/sysctl.d/kernelsysctl.conf) or just use directly via sysctl. Also what would be helpful is attached dmesg and full -vvvv trace of lvcreate -s /dev/snapper_thinp/origin_16.00m -n snap_16.00m This may be a timing issue.
I have seen this in logs several times (not with recent builds) at was always accompanied with failure in dmeventd later on:
segfault at 2d0 ip 00007f9eda16d670 sp 00007f9eda995dd8 error 4 in libpthread-2.12.so[7f9eda161000+17000]
I do not have the backtrace from lvm process failing, but the dmeventd is dying on timeout when creating new thread - reported as Bug 889651
Another issue is quite a few tests are failing on dmventd being slow to response right after it is spawned by lvconvert - this often takes 10+ seconds, lvconvert command is often reporting event server issue (considered a problem by STS):
No input from event server.
centipede-centi_base: event registration failed: Input/output error
centipede/centi_base: mirror segment monitoring function failed.
The command itself succeeds and eventually the event registration is succesful soon after failure is reported, suggesting this may be a locking issue.
This test case is no longer causing any issues with the latest rpms. 2.6.32-353.el6.x86_64 lvm2-2.02.98-7.el6 BUILT: Wed Jan 9 03:34:27 CST 2013 lvm2-libs-2.02.98-7.el6 BUILT: Wed Jan 9 03:34:27 CST 2013 lvm2-cluster-2.02.98-7.el6 BUILT: Wed Jan 9 03:34:27 CST 2013 udev-147-2.43.el6 BUILT: Thu Oct 11 05:59:38 CDT 2012 device-mapper-1.02.77-7.el6 BUILT: Wed Jan 9 03:34:27 CST 2013 device-mapper-libs-1.02.77-7.el6 BUILT: Wed Jan 9 03:34:27 CST 2013 device-mapper-event-1.02.77-7.el6 BUILT: Wed Jan 9 03:34:27 CST 2013 device-mapper-event-libs-1.02.77-7.el6 BUILT: Wed Jan 9 03:34:27 CST 2013 cmirror-2.02.98-7.el6 BUILT: Wed Jan 9 03:34:27 CST 2013 (In reply to comment #12) > This test case is no longer causing any issues with the latest rpms. Just a note: the only change in latest rpms (not taking into account one lvm1 vs. lvmetad fix) are the fixes for autoactivation to not fire when it should not and addition of proper udev synchronization with "pvscan --cache -aay" that is called from within udev rules to autoactivate volumes. Closing this bug as it can't be reproduced anymore with latest rpms which include fixes for autoactivation (bug #889454). If still seen, please, reopen this bug report. *** This bug has been marked as a duplicate of bug 889454 *** |
Description of problem: [cmarthal@silver bin]$ ./snapper_thinp -l /home/msp/cmarthal/work/sts/sts-root -r /usr/tests/sts-rhel6.4 -o hayes-01 -e poolmetadatasizes -i 10 1 disk(s) to be used: hayes-01=/dev/etherd/e1.1 on hayes-01... dicing /dev/etherd/e1.1 into 2... re-reading disks on hayes-01... Zeroing out the new partitions.../dev/etherd/e1.1p1.../dev/etherd/e1.1p2... creating lvm devices... Create 2 PV(s) for snapper_thinp on hayes-01 Create VG snapper_thinp on hayes-01 ============================================================ Iteration 1 of 10 started at Wed Dec 12 15:50:55 CST 2012 ============================================================ SCENARIO - [poolmetadatasizes] Create pool volumes of varying pool metadata sizes (--poolmetadatasize) Creating [4.00m] MDA thinpool and corresponding thin virtual volume and thin snapshot lvcreate --thinpool POOL_4.00m --poolmetadatasize 4.00m -L 500M snapper_thinp lvcreate --virtualsize 500M --thinpool snapper_thinp/POOL_4.00m -n origin_4.00m lvcreate -s /dev/snapper_thinp/origin_4.00m -n snap_4.00m [...] Creating [256.00m] MDA thinpool and corresponding thin virtual volume and thin snapshot lvcreate --thinpool POOL_256.00m --poolmetadatasize 256.00m -L 500M snapper_thinp lvcreate --virtualsize 500M --thinpool snapper_thinp/POOL_256.00m -n origin_256.00m Request to update VG snapper_thinp in lvmetad gave response Connection reset by peer. couldn't create thin origin LV real 1m52.533s user 0m0.197s sys 0m0.129s Dec 12 15:51:36 hayes-01 qarshd[3072]: Running cmdline: lvcreate --thinpool POOL_256.00m --poolmetadatasize 256.00m -L 500M snapper_thinp Dec 12 15:51:39 hayes-01 kernel: device-mapper: thin: Data device (dm-34) discard unsupported: Disabling discard passdown. Dec 12 15:51:39 hayes-01 lvm[1921]: Monitoring thin snapper_thinp-POOL_256.00m-tpool. Dec 12 15:51:40 hayes-01 xinetd[1597]: EXIT: qarsh status=0 pid=3072 duration=4(sec) Dec 12 15:51:40 hayes-01 xinetd[1597]: START: qarsh pid=3106 from=::ffff:10.15.80.47 Dec 12 15:51:40 hayes-01 qarshd[3106]: Talking to peer 10.15.80.47:42631 (IPv4) Dec 12 15:51:40 hayes-01 qarshd[3106]: Running cmdline: lvcreate --virtualsize 500M --thinpool snapper_thinp/POOL_256.00m -n origin_256.00m Dec 12 15:51:40 hayes-01 kernel: lvmetad[3105] general protection ip:3d26076205 sp:7fd57abafa80 error:0 in libc-2.12.so[3d26000000+18a000] Dec 12 15:51:41 hayes-01 abrtd: Directory 'ccpp-2012-12-12-15:51:40-1876' creation detected Dec 12 15:51:41 hayes-01 abrt[3109]: Saved core dump of pid 1876 (/sbin/lvmetad) to /var/spool/abrt/ccpp-2012-12-12-15:51:40-1876 (47603712 bytes) Dec 12 15:51:41 hayes-01 udevd-work[2157]: '/sbin/lvm pvscan --cache --activate ay --major 152 --minor 273' unexpected exit with status 0x000d Dec 12 15:51:41 hayes-01 xinetd[1597]: EXIT: qarsh status=0 pid=3106 duration=1(sec) Dec 12 15:51:42 hayes-01 abrtd: Package 'lvm2' isn't signed with proper key Dec 12 15:51:42 hayes-01 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2012-12-12-15:51:40-1876' exited with 1 Dec 12 15:51:42 hayes-01 abrtd: Corrupted or bad directory /var/spool/abrt/ccpp-2012-12-12-15:51:40-1876, deleting The core was not saved, I'll attempt this again and try to get a back trace. Version-Release number of selected component (if applicable): 2.6.32-343.el6.x86_64 lvm2-2.02.98-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 lvm2-libs-2.02.98-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 lvm2-cluster-2.02.98-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 udev-147-2.43.el6 BUILT: Thu Oct 11 05:59:38 CDT 2012 device-mapper-1.02.77-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 device-mapper-libs-1.02.77-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 device-mapper-event-1.02.77-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 device-mapper-event-libs-1.02.77-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012 cmirror-2.02.98-5.el6 BUILT: Wed Dec 12 09:18:07 CST 2012