Bug 840831
Summary: 3.1 - [vdsm] deadlock after prepareForShutdown is called due to several processes of lvm stuck in kernel (<D)

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 6 |
| Component | vdsm |
| Version | 6.3 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | unspecified |
| Reporter | Haim <hateya> |
| Assignee | Federico Simoncelli <fsimonce> |
| QA Contact | vvyazmin <vvyazmin> |
| CC | abaron, bazulay, ddumas, fsimonce, hateya, iheim, ilvovsky, lnatapov, lpeer, yeylon, ykaul |
| Target Milestone | rc |
| Target Release | --- |
| Keywords | TestOnly |
| Whiteboard | storage |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2012-12-05 07:40:59 UTC |
Description

Haim, 2012-07-17 10:40:17 UTC

Created attachment 598601 [details]
vdsm logs.

---

Haim, please also attach the /var/log/messages log; I saw that it contained several errors related to multipath and iSCSI in general. Thanks.

---

Created attachment 598620 [details]
messages
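The symptom in this bug is lvm processes stuck in uninterruptible sleep (state `D`, waiting in the kernel), as the `ps` output below shows. As an illustrative diagnostic only (not part of vdsm), a short script to enumerate D-state processes on a host could look like this; it reads the state field that follows the parenthesised command name in `/proc/<pid>/stat`:

```python
# Illustrative helper (not part of vdsm): list PIDs of processes in
# uninterruptible sleep (state "D"), the state the stuck lvm commands
# showed in ps. The state field follows the parenthesised comm name
# in /proc/<pid>/stat; rpartition handles comm names containing ")".
import os


def d_state_pids():
    if not os.path.isdir("/proc"):  # non-Linux host: nothing to scan
        return []
    stuck = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % entry) as f:
                stat = f.read()
        except OSError:
            continue  # process exited while we were scanning
        fields = stat.rpartition(")")[2].split()
        if fields and fields[0] == "D":
            stuck.append(int(entry))
    return stuck


if __name__ == "__main__":
    print(sorted(d_state_pids()))
```

On a healthy host this prints an empty list most of the time; during the hang described here it would report the blocked lvm PIDs (24256 and 24275 below).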
Some additional lvm info which might be handy if we move this to lvm:

```
[root@nott-vds2 ~]# ps -lw `pgrep lvm`
F S UID   PID  PPID C PRI NI ADDR   SZ WCHAN  TTY  TIME CMD
4 D   0 24256 24255 0  75 -5    - 8827 blockd ?    0:00 /sbin/lvm vgs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 f
4 D   0 24275 24274 0  75 -5    - 8827 blockd ?    0:00 /sbin/lvm vgs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 f
[root@nott-vds2 ~]# lsof | grep lvm
lvm 24256 root cwd DIR 253,0 4096 2 /
lvm 24256 root rtd DIR 253,0 4096 2 /
lvm 24256 root txt REG 253,0 971464 392064 /sbin/lvm
lvm 24256 root mem REG 253,0 251112 392437 /lib64/libsepol.so.1
lvm 24256 root mem REG 253,0 229024 392479 /lib64/libdevmapper.so.1.02
lvm 24256 root mem REG 253,0 61496 392478 /lib64/libudev.so.0.5.1
lvm 24256 root mem REG 253,0 272008 392490 /lib64/libreadline.so.6.0
lvm 24256 root mem REG 253,0 156872 392355 /lib64/ld-2.12.so
lvm 24256 root mem REG 253,0 22536 392363 /lib64/libdl-2.12.so
lvm 24256 root mem REG 253,0 1918016 392356 /lib64/libc-2.12.so
lvm 24256 root mem REG 253,0 24000 392158 /lib64/libdevmapper-event.so.1.02
lvm 24256 root mem REG 253,0 124624 392372 /lib64/libselinux.so.1
lvm 24256 root mem REG 253,0 138280 392204 /lib64/libtinfo.so.5.7
lvm 24256 root 0r FIFO 0,8 0t0 361938 pipe
lvm 24256 root 1w FIFO 0,8 0t0 361939 pipe
lvm 24256 root 2w FIFO 0,8 0t0 361940 pipe
lvm 24256 root 3u CHR 10,58 0t0 5671 /dev/mapper/control
lvm 24256 root 4rR REG 253,0 0 132526 /var/lock/lvm/V_aa0e6d6e-11ed-42f3-9ca7-d6a4ed86dfa0
lvm 24256 root 5u BLK 69,32 0t0 201979 /dev/sdce
lvm 24275 root cwd DIR 253,0 4096 2 /
lvm 24275 root rtd DIR 253,0 4096 2 /
lvm 24275 root txt REG 253,0 971464 392064 /sbin/lvm
lvm 24275 root mem REG 253,0 251112 392437 /lib64/libsepol.so.1
lvm 24275 root mem REG 253,0 229024 392479 /lib64/libdevmapper.so.1.02
lvm 24275 root mem REG 253,0 61496 392478 /lib64/libudev.so.0.5.1
lvm 24275 root mem REG 253,0 272008 392490 /lib64/libreadline.so.6.0
lvm 24275 root mem REG 253,0 156872 392355 /lib64/ld-2.12.so
lvm 24275 root mem REG 253,0 22536 392363 /lib64/libdl-2.12.so
lvm 24275 root mem REG 253,0 1918016 392356 /lib64/libc-2.12.so
lvm 24275 root mem REG 253,0 24000 392158 /lib64/libdevmapper-event.so.1.02
lvm 24275 root mem REG 253,0 124624 392372 /lib64/libselinux.so.1
lvm 24275 root mem REG 253,0 138280 392204 /lib64/libtinfo.so.5.7
lvm 24275 root 0r FIFO 0,8 0t0 362331 pipe
lvm 24275 root 1w FIFO 0,8 0t0 362332 pipe
lvm 24275 root 2w FIFO 0,8 0t0 362333 pipe
lvm 24275 root 3u CHR 10,58 0t0 5671 /dev/mapper/control
lvm 24275 root 4rR REG 253,0 0 132526 /var/lock/lvm/V_aa0e6d6e-11ed-42f3-9ca7-d6a4ed86dfa0
lvm 24275 root 5u BLK 69,32 0t0 201979 /dev/sdce
```

---

Fede, have you taken a look at this? Do you know what the issue is?

---

Something similar has just been reproduced by putting the iSCSI target offline (server side):

```
# tgt-admin --offline iqn.1994-05.com.redhat:ovirt1
```

VDSM host side:

```
root 22432 1.7 1.1 34592 13692 ? D< 10:46 0:00 /sbin/lvm pvs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ "a%1IET_00010001|1IET_00010002|1IET_00010003|1IET_00010004|1IET_00010005|1IET_00010006|1IET_00010007|1IET_00010008|1IET_00020001|1IET_00020002|1IET_00020003|1IET_00020004|1IET_00020005|1IET_00020006|1IET_00020007%", "r%.*%" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } --noheadings --units b --nosuffix --separator | -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size
```

Backtrace of the stuck lvm process (pid 22432):

```
0x0000003d1d0dac10 in __open_nocancel () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
Thread 1 (process 22432):
#0  0x0000003d1d0dac10 in __open_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x00000000004477eb in open (dev=0x2322658, flags=278528, direct=1, quiet=1) at /usr/include/bits/fcntl2.h:60
#2  dev_open_flags (dev=0x2322658, flags=278528, direct=1, quiet=1) at device/dev-io.c:468
#3  0x000000000044db28 in _passes_lvm_type_device_filter (f=<value optimized out>, dev=0x2322658) at filters/filter.c:157
#4  0x000000000044b514 in _and_p (f=<value optimized out>, dev=0x2322658) at filters/filter-composite.c:26
#5  0x000000000044b93d in _lookup_p (f=<value optimized out>, dev=0x2322658) at filters/filter-persistent.c:308
#6  0x0000000000444b38 in dev_iter_get (iter=0x2314aa0) at device/dev-cache.c:1004
#7  0x000000000043dcb5 in lvmcache_label_scan (cmd=0x17863a0, full_scan=0) at cache/lvmcache.c:695
#8  0x0000000000470f09 in _get_pvs (cmd=0x17863a0, warnings=1, pvslist=0x7fffe9617ed8) at metadata/metadata.c:3705
#9  0x0000000000471473 in get_pvs (cmd=<value optimized out>) at metadata/metadata.c:3769
#10 0x0000000000430406 in process_each_pv (cmd=0x17863a0, argc=0, argv=0x7fffe96184a8, vg=0x0, flags=0, scan_label_only=0, handle=0x1f768e0, process_single_pv=0x42ccd0 <_pvs_single>) at toollib.c:853
#11 0x000000000042c436 in _report (cmd=0x17863a0, argc=0, argv=0x7fffe96184a8, report_type=PVS) at reporter.c:414
#12 0x000000000041fd92 in lvm_run_command (cmd=0x17863a0, argc=0, argv=0x7fffe96184a8) at lvmcmdline.c:1099
#13 0x00000000004228bd in lvm2_main (argc=11, argv=0x7fffe9618450) at lvmcmdline.c:1468
#14 0x0000003d1d01ecdd in __libc_start_main (main=0x438400 <main>, argc=12, ubp_av=0x7fffe9618448, init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, stack_end=0x7fffe9618438) at libc-start.c:226
#15 0x0000000000413cb9 in _start ()
```

On my setup the lvm command got unstuck only after a few minutes (~5). On the vdsm side I'm still investigating why the lvm command was executed in the MainThread:

```
MainThread::DEBUG::2012-07-17 15:51:14,423::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \\"a%1demo-storage-112338299|3514f0c5d6620000d|3514f0c5d6620000e|3514f0c5d6620000f|3514f0c5d66200010|3514f0c5d66200011|3514f0c5d66200012|3514f0c5d66200013|3514f0c5d66200014|3514f0c5d66200015|3514f0c5d66200016|3514f0c5d66200017|3514f0c5d66200018|3514f0c5d66200019|3514f0c5d6620001a|3514f0c5d6620001b|3514f0c5d6620001c|3514f0c5d6620001d|3514f0c5d6620001e|3514f0c5d6620001f|3514f0c5d66200020|3514f0c5d66200021|3514f0c5d66200022|3514f0c5d66200023|3514f0c5d66200024|3514f0c5d66200025|3514f0c5d66200026|3514f0c5d66200027|3514f0c5d66200028|3514f0c5d66200029|3514f0c5d6620002a|3514f0c5d6620002b|3514f0c5d6620002c|3514f0c5d6620002d|3514f0c5d6620002e|3514f0c5d6620002f|3514f0c5d66200030|3514f0c5d66200031|3514f0c5d66200032|3514f0c5d66200033|3514f0c5d66200034|3514f0c5d66200035|3514f0c5d66200036|3514f0c5d66200037|3514f0c5d66200038|3514f0c5d66200039|3514f0c5d6620003a|3514f0c5d6620003b|3514f0c5d6620003c|3514f0c5d6620003d|3514f0c5d6620003e|3514f0c5d6620003f|3514f0c5d66200040|3514f0c5d66200041|3514f0c5d66200042|3514f0c5d66200043|3514f0c5d66200044|3514f0c5d66200045|3514f0c5d66200046|3514f0c5d66200047|3514f0c5d66200048|3514f0c5d66200049|3514f0c5d6620004a|3514f0c5d6620004b|3514f0c5d6620004c|3514f0c5d6620004d|3514f0c5d6620004e|3514f0c5d6620004f|3514f0c5d66200050|3514f0c5d66200051|3514f0c5d66200052|3514f0c5d66200053|3514f0c5d66200054|3514f0c5d66200055|3514f0c5d66200056|3514f0c5d66200057|3514f0c5d66200058|3514f0c5d66200059|3514f0c5d6620005a|3514f0c5d6620005b|3514f0c5d6620005c|3514f0c5d6620005d|3514f0c5d6620005e|3514f0c5d6620005f|3514f0c5d66200060|3514f0c5d66200061|3514f0c5d66200062|3514f0c5d66200063|3514f0c5d66200064|3514f0c5d66200065|3514f0c5d66200066|3514f0c5d66200067|3514f0c5d66200068|3514f0c5d66200069|3514f0c5d6620006a|3514f0c5d6620006b|3514f0c5d6620006c|3514f0c5d6620006d|3514f0c5d6620006e|3514f0c5d6620006f|3514f0c5d66200070%\\", \\"r%.*%\\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free aa0e6d6e-11ed-42f3-9ca7-d6a4ed86dfa0' (cwd None)
```

---

On the vdsm side I didn't see any other lvm command in the MainThread since 4.10-0.366 (~August 28 2012), and they were very frequent before. Moving to ON_QA.

---

The bug wasn't reproduced.

```
[root@purple-vds3 ~]# rpm -q lvm2 device-mapper-multipath vdsm kernel
lvm2-2.02.95-10.el6.x86_64
device-mapper-multipath-0.4.9-56.el6.x86_64
vdsm-4.9.6-39.0.el6_3.x86_64
kernel-2.6.32-274.el6.x86_64
kernel-2.6.32-279.el6.x86_64
kernel-2.6.32-279.15.1.el6.x86_64
```

After blocking the connection to the storage domain, prepareForShutdown was called and succeeded:

```
MainThread::DEBUG::2012-10-29 16:30:57,313::clientIF::148::vds::(prepareForShutdown) cannot run prepareForShutdown concurrently
MainThread::DEBUG::2012-10-29 16:30:58,313::clientIF::148::vds::(prepareForShutdown) cannot run prepareForShutdown concurrently
MainThread::DEBUG::2012-10-29 16:30:59,313::clientIF::148::vds::(prepareForShutdown) cannot run prepareForShutdown concurrently
MainThread::INFO::2012-10-29 16:31:00,446::vdsm::70::vds::(run) I am the actual vdsm 4.9-39.0
```
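The "cannot run prepareForShutdown concurrently" log lines come from a guard that refuses to start a second shutdown while one is already in flight, rather than blocking behind it. A minimal sketch of that pattern, with hypothetical class and attribute names (this is not vdsm's actual implementation), uses a non-blocking lock acquire:

```python
# Minimal sketch (hypothetical names, not vdsm's actual code) of the
# guard behind the "cannot run prepareForShutdown concurrently" lines:
# only one caller may run the shutdown path; concurrent calls bail out
# immediately instead of queuing up behind a possibly stuck teardown.
import threading


class ClientIF(object):
    def __init__(self):
        self._shutdown_lock = threading.Lock()
        self.log = []  # stand-in for the real logger

    def prepareForShutdown(self):
        # acquire(False) never blocks: if another thread already holds
        # the lock, log the refusal and return instead of deadlocking.
        if not self._shutdown_lock.acquire(False):
            self.log.append("cannot run prepareForShutdown concurrently")
            return False
        try:
            pass  # release storage domains, stop worker threads, etc.
        finally:
            self._shutdown_lock.release()
        return True
```

Under this sketch, a caller retrying once per second would produce exactly the one-second cadence of refusals seen in the log above until the in-flight shutdown completes.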