Hide Forgot
Description of problem: have a machine with Iscsi storage connected to several luns, and some lvm commands are stuck in S< for the last 4-5 hours now. this state stuck our hypervisor management system called 'VDSM'. from short debugging cycle held with mbroz, it appears that one process was blocked in semaphore (and other waiting for it in flock(), and udevcomplete_all unblocked it. it also appears that we had lots of cookies\semaphore under dmsetup udevcookies. [root@nott-vds3 ~]# dmsetup udevcookies cookie semid value last_semop_time 0xd4d1360 114720771 1 Wed Jul 6 18:13:44 2011 0xd4d91b8 114753540 1 Wed Jul 6 18:13:44 2011 0xd4de72c 321388549 2 Thu Jul 14 16:48:08 2011 0xd4d79ba 114917382 1 Wed Jul 6 18:14:00 2011 0xd4d421d 272072711 1 Mon Jul 11 14:43:46 2011 0xd4d952e 280821768 1 Mon Jul 11 15:47:15 2011 0xd4d8465 288587785 1 Mon Jul 11 16:36:44 2011 0xd4d8fa5 321421322 1 Thu Jul 14 18:21:06 2011 0xd4df60f 321454091 2 Sun Jul 17 11:42:52 2011 0xd4d8b03 321486860 2 Sun Jul 17 12:10:32 2011 0xd4d7c6d 321519629 26 Sun Jul 17 17:58:12 2011 root@nott-vds3 ~]# ps -w `pgrep lvm` PID TTY STAT TIME COMMAND 1619 ? S< 0:00 /sbin/lvm lvchange --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 filter = [ "a%1Daffi1309719|1Dafna-Export1308598|1EXPORT1310355|1EXP_Dom_Bckup|1Exp-Daf 1931 ? S< 0:00 /sbin/lvm vgck --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 filter = [ "a%1Daffi1309719|1Dafna-Export1308598|1EXPORT1310355|1EXP_Dom_Bckup|1Exp-Daf1310 8895 ? S< 0:00 /sbin/lvm lvchange --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 filter = [ "a%/dev/mapper/1Daffi1309719|/dev/mapper/1Dafna-Export1308598|/dev/mapper/1E 10774 ? S< 0:00 /sbin/lvm pvs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 filter = [ "a%1Daffi1309719|1Dafna-Export1308598|1EXPORT1310355|1EXP_Dom_Bckup|1Exp-Daf13100 11433 ? S< 0:00 /sbin/lvm pvs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 filter = [ "a%1Daffi1309719|1Dafna-Export1308598|1EXPORT1310355|1EXP_Dom_Bckup|1Exp-Daf13100 17550 ? S< 0:00 /sbin/lvm vgck --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 filter = [ "a%1Daffi1309719|1Dafna-Export1308598|1EXPORT1310355|1EXP_Dom_Bckup|1Exp-Daf1310 19435 ? S< 0:00 /sbin/lvm vgck --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 filter = [ "a%1Daffi1309719|1Dafna-Export1308598|1EXPORT1310355|1EXP_Dom_Bckup|1Exp-Daf1310 1619 [<ffffffff811bbd8d>] flock_lock_file_wait+0x18d/0x350 [<ffffffff811bc10b>] sys_flock+0x1bb/0x1d0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff 1931 [<ffffffff811bbd8d>] flock_lock_file_wait+0x18d/0x350 [<ffffffff811bc10b>] sys_flock+0x1bb/0x1d0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff 8895 [<ffffffff811f7c15>] sys_semtimedop+0x725/0x8b0 [<ffffffff811f7db0>] sys_semop+0x10/0x20 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff 10774 [<ffffffff811bbd8d>] flock_lock_file_wait+0x18d/0x350 [<ffffffff811bc10b>] sys_flock+0x1bb/0x1d0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff 11433 [<ffffffff811bbd8d>] flock_lock_file_wait+0x18d/0x350 [<ffffffff811bc10b>] sys_flock+0x1bb/0x1d0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff 17550 [<ffffffff811bbd8d>] flock_lock_file_wait+0x18d/0x350 [<ffffffff811bc10b>] sys_flock+0x1bb/0x1d0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff 19435 [<ffffffff811bbd8d>] flock_lock_file_wait+0x18d/0x350 [<ffffffff811bc10b>] sys_flock+0x1bb/0x1d0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff currently there is no reproducer, but i will try to create one (script).
(In reply to comment #0) > currently there is no reproducer, but i will try to create one (script). If you manage to reproduce that, please try to grab the -vvvv debug log as well. Thanks.
I think i'v managed to reproduce using the following script: #! /usr/bin/python import subprocess import commands import os def attachPid(): pids = commands.getoutput('pgrep lvm') if not pids: print "No LVM process found" attachPid() else: pids = pids.split('\n') for pid in pids: pid = int(pid) os.kill(pid, 19) print "LVM process found (%s)- SIGSTOP sent to it" % pid attachPid() please run this script, along with several heavy lvm commands, and leave it for some time to run. [root@rhev-a8c-02 tmp]# ps -ww `pgrep lvm` PID TTY STAT TIME COMMAND 20728 ? Z< 0:00 [lvm] <defunct>
This reproducer (comment #3) make no sense to me. You are using os.kill(pid, 19) - this send SIGSTOP/SIGCONT, and of course causes program to stop. This is not a bug. What do you want to try do to here? What do you expect from SIGSTOP sent to lvm? man 2 signal: SIGCONT 19,18,25 Cont Continue if stopped SIGSTOP 17,19,23 Stop Stop process The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.
(err, man 7 signal)
If it is problem we tried to debug some time agi, there were some active lvm semaphores. But I think this was relict of some previous testing or upgrade. (For reproducer above - it was stopped, so obviously some semaphores were still present, but after "pkill -SIGCONT lvm" it simply continues, so this was not the former observed problem.)