Bug 847083 - [vdsm] deadlock in supervdsm fails createStorageDomain command (process goes into defunct state)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Saggi Mizrahi
QA Contact: Haim
URL:
Whiteboard: infra
Depends On:
Blocks:
Reported: 2012-08-09 16:12 UTC by Haim
Modified: 2014-01-13 00:53 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-03-06 22:14:25 UTC
oVirt Team: ---
Embargoed:


Attachments
engine + vdsm logs (312.19 KB, application/zip)
2012-08-09 16:15 UTC, Haim

Description Haim 2012-08-09 16:12:17 UTC
Description of problem:

We have a case (http://lists.ovirt.org/pipermail/users/2012-August/003330.html) where createStorageDomain over an NFS connection always fails. There seems to be some kind of lock around supervdsm that makes creation of the persistent dictionary fail (in the logs the thread sleeps for around 50 seconds), and the thread later dies within the oop handling.

Restarting vdsmd doesn't seem to solve the problem.
The NFS connection looks OK, as I was able to mount it and perform I/O on it.
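
For readers unfamiliar with the "defunct" state in the title: on Linux a process is shown as defunct (a zombie, state Z) when it has exited but its parent has not yet collected its exit status. The following is a minimal sketch of that mechanism only, not vdsm or supervdsm code; the helper names proc_state and make_zombie_then_reap are made up for illustration, and it assumes a Linux host with /proc mounted.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        data = f.read()
    # The command name is wrapped in parentheses and may contain spaces,
    # so take the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie_then_reap(timeout=5.0):
    """Fork a child that exits at once; report its state before the parent waits."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Parent: poll until the kernel reports the exited child as a zombie.
    deadline = time.time() + timeout
    state = proc_state(pid)
    while state != "Z" and time.time() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    os.waitpid(pid, 0)  # reaping the child removes the zombie entry
    return state, os.path.exists("/proc/%d" % pid)
```

A parent that is blocked and never reaches its wait call leaves the child in state Z indefinitely, which is consistent with (but does not by itself prove) the deadlock suspected here.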

[root@host1 ~]# ps fax
  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
    3 ?        S      0:00  \_ [ksoftirqd/0]
    6 ?        S      0:00  \_ [migration/0]
    7 ?        S      0:00  \_ [watchdog/0]
    8 ?        S      0:00  \_ [migration/1]
   10 ?        S      0:00  \_ [ksoftirqd/1]
   11 ?        S      0:00  \_ [watchdog/1]
   12 ?        S      0:00  \_ [migration/2]
   14 ?        S      0:00  \_ [ksoftirqd/2]
   15 ?        S      0:00  \_ [watchdog/2]
   16 ?        S      0:00  \_ [migration/3]
   18 ?        S      0:00  \_ [ksoftirqd/3]
   19 ?        S      0:00  \_ [watchdog/3]
   20 ?        S      0:00  \_ [migration/4]
   22 ?        S      0:00  \_ [ksoftirqd/4]
   23 ?        S      0:00  \_ [watchdog/4]
   24 ?        S      0:00  \_ [migration/5]
   26 ?        S      0:00  \_ [ksoftirqd/5]
   27 ?        S      0:00  \_ [watchdog/5]
   28 ?        S      0:00  \_ [migration/6]
   30 ?        S      0:00  \_ [ksoftirqd/6]
   31 ?        S      0:01  \_ [watchdog/6]
   32 ?        S      0:00  \_ [migration/7]
   34 ?        S      0:00  \_ [ksoftirqd/7]
   35 ?        S      0:00  \_ [watchdog/7]
   36 ?        S<     0:00  \_ [cpuset]
   37 ?        S<     0:00  \_ [khelper]
   38 ?        S      0:00  \_ [kdevtmpfs]
   39 ?        S<     0:00  \_ [netns]
   40 ?        S      0:00  \_ [sync_supers]
   41 ?        S      0:00  \_ [bdi-default]
   42 ?        S<     0:00  \_ [kintegrityd]
   43 ?        S<     0:00  \_ [kblockd]
   44 ?        S<     0:00  \_ [ata_sff]
   45 ?        S      0:00  \_ [khubd]
   46 ?        S<     0:00  \_ [md]
   52 ?        S      0:00  \_ [kworker/5:1]
   53 ?        S      0:00  \_ [kworker/6:1]
   54 ?        S      0:01  \_ [kworker/7:1]
   56 ?        S      0:00  \_ [kswapd0]
   57 ?        SN     0:00  \_ [ksmd]
   58 ?        SN     0:00  \_ [khugepaged]
   59 ?        S      0:00  \_ [fsnotify_mark]
   60 ?        S<     0:00  \_ [crypto]
   66 ?        S<     0:00  \_ [kthrotld]
   69 ?        S      0:00  \_ [scsi_eh_0]
   70 ?        S      0:00  \_ [scsi_eh_1]
   71 ?        S      0:00  \_ [scsi_eh_2]
   72 ?        S      0:00  \_ [scsi_eh_3]
   73 ?        S      0:00  \_ [scsi_eh_4]
   74 ?        S      0:00  \_ [scsi_eh_5]
   77 ?        S      0:02  \_ [kworker/u:4]
   78 ?        S      0:01  \_ [kworker/u:5]
   80 ?        S<     0:00  \_ [kpsmoused]
   82 ?        S<     0:00  \_ [deferwq]
  157 ?        S<     0:00  \_ [rpciod]
  175 ?        S<     0:00  \_ [kmpathd]
  177 ?        S<     0:00  \_ [kmpath_handlerd]
  188 ?        S<     0:00  \_ [kdmflush]
  211 ?        S<     0:00  \_ [ttm_swap]
  220 ?        S<     0:00  \_ [kdmflush]
  222 ?        S<     0:00  \_ [kdmflush]
  224 ?        S<     0:00  \_ [kdmflush]
  361 ?        S<     0:00  \_ [ext4-dio-unwrit]
  364 ?        S<     0:00  \_ [loop0]
  371 ?        S<     0:06  \_ [loop1]
  381 ?        S<     0:00  \_ [loop2]
  385 ?        S<     0:00  \_ [kdmflush]
  387 ?        S<     0:00  \_ [ksnaphd]
  388 ?        S<     0:00  \_ [kcopyd]
  400 ?        S<     0:00  \_ [ext4-dio-unwrit]
  452 ?        S      0:00  \_ [kauditd]
  487 ?        S<     0:00  \_ [kvm-irqfd-clean]
  609 ?        S<     0:00  \_ [edac-poller]
  711 ?        S<     0:00  \_ [kdmflush]
  712 ?        S<     0:00  \_ [kdmflush]
  714 ?        S<     0:00  \_ [kdmflush]
  716 ?        S<     0:00  \_ [kdmflush]
  829 ?        S      0:00  \_ [jbd2/dm-7-8]
  830 ?        S<     0:00  \_ [ext4-dio-unwrit]
 1006 ?        S<     0:00  \_ [iscsi_eh]
 1068 ?        S<     0:00  \_ [ib_mcast]
 1069 ?        S<     0:00  \_ [ib_cm]
 1094 ?        S<     0:00  \_ [iw_cm_wq]
 1098 ?        S<     0:00  \_ [ib_addr]
 1121 ?        S<     0:00  \_ [rdma_cm]
 1155 ?        S      0:00  \_ [jbd2/dm-8-8]
 1156 ?        S<     0:00  \_ [ext4-dio-unwrit]
 1157 ?        S<     0:00  \_ [cxgb4]
 1158 ?        S      0:00  \_ [jbd2/dm-9-8]
 1159 ?        S<     0:00  \_ [ext4-dio-unwrit]
 1179 ?        S<     0:00  \_ [cnic_wq]
 1188 ?        S<     0:00  \_ [bnx2i_thread/0]
 1189 ?        S<     0:00  \_ [bnx2i_thread/1]
 1190 ?        S<     0:00  \_ [bnx2i_thread/2]
 1191 ?        S<     0:00  \_ [bnx2i_thread/3]
 1193 ?        S<     0:00  \_ [bnx2i_thread/4]
 1194 ?        S<     0:00  \_ [bnx2i_thread/5]
 1195 ?        S<     0:00  \_ [bnx2i_thread/6]
 1196 ?        S<     0:00  \_ [bnx2i_thread/7]
 1618 ?        S      0:00  \_ [flush-253:8]
 2549 ?        S<     0:00  \_ [bond0]
 2553 ?        S<     0:00  \_ [bond1]
 2557 ?        S<     0:00  \_ [bond2]
 2563 ?        S<     0:00  \_ [bond3]
 2572 ?        S<     0:00  \_ [bond4]
 2895 ?        S<     0:00  \_ [kdmflush]
 3528 ?        S<     0:00  \_ [nfsiod]
 4247 ?        S      0:01  \_ [kworker/3:3]
 4463 ?        S      0:01  \_ [kworker/4:0]
18433 ?        S      0:00  \_ [kworker/0:1]
21092 ?        S      0:00  \_ [kworker/2:0]
22599 ?        S      0:00  \_ [kworker/2:1]
22739 ?        S      0:00  \_ [kworker/4:1]
23048 ?        S      0:00  \_ [kworker/1:1]
23054 ?        S      0:00  \_ [kworker/0:2]
23658 ?        S      0:00  \_ [kworker/3:1]
23660 ?        S      0:00  \_ [kworker/6:0]
23800 ?        S      0:00  \_ [kworker/1:2]
23944 ?        S      0:00  \_ [kworker/7:2]
24853 ?        S      0:00  \_ [kworker/5:2]
25998 ?        S      0:00  \_ [kworker/1:0]
26006 ?        S      0:00  \_ [kworker/2:2]
    1 ?        Ss     0:01 /usr/lib/systemd/systemd
  451 ?        Ss     0:00 /usr/lib/systemd/systemd-journald
  453 ?        Ss     0:00 /usr/lib/udev/udevd
 2749 ?        S      0:00  \_ /usr/lib/udev/udevd
 3530 ?        S      0:00  \_ /usr/lib/udev/udevd
  907 ?        S<sl   0:00 /sbin/auditd -n
  914 ?        Ssl    0:05 /usr/sbin/libvirtd --listen
  931 ?        Ss     0:00 /usr/lib/systemd/systemd-logind
  932 ?        Ssl    0:00 /sbin/rsyslogd -n -c 5
  936 ?        Ss     0:00 /usr/sbin/acpid
  955 ?        Ss     0:00 /usr/sbin/crond -n
  973 ?        Ss     0:01 /usr/sbin/irqbalance
  981 ?        Ssl    0:00 /bin/dbus-daemon --system --address=systemd: --nofork --systemd-activation
  986 ?        S      0:00 /bin/bash /usr/sbin/ksmtuned
26045 ?        S      0:00  \_ sleep 60
 1127 ?        Ssl    0:01 /usr/sbin/collectd -C /etc/collectd.conf -f
 1135 ?        Ss     0:20 /usr/sbin/snmpd -LS0-6d -f
 1331 ?        Ssl    0:00 iscsiuio
 1337 ?        Ss     0:00 iscsid
 1338 ?        S<Ls   0:01 iscsid
 1372 ?        Ss     0:00 /usr/sbin/anytermd -c sudo /usr/bin/virsh console %p -p 81 -u anyterm -s UTF8
 1941 ?        Ss     0:00 /usr/sbin/sshd -D
21468 ?        Ss     0:00  \_ sshd: root@pts/2    
21473 pts/2    Ss     0:00  |   \_ -bash
21548 pts/2    S+     0:00  |       \_ tail -20f vdsm.log
22214 ?        Ss     0:00  \_ sshd: root@pts/3    
22235 pts/3    Ss     0:00      \_ -bash
22918 pts/3    T      0:00          \_ less /var/log/vdsm/vdsm.log
26076 pts/3    R+     0:00          \_ ps fax
 2439 tty1     Ss+    0:00 /sbin/agetty --noclear tty1 38400
 2458 ?        Ss     0:00 /usr/sbin/ntpd -u ntp:ntp -g
 2486 ?        SLs    0:00 wdmd -G sanlock
 2855 ?        SLl    0:01 /sbin/multipathd
24929 ?        S<     0:00 /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon --masterpid /var/run/vdsm/respawn.pid /usr/share/vdsm/vdsm
24932 ?        S<l    0:05  \_ /usr/bin/python /usr/share/vdsm/vdsm
24952 ?        S<     0:00      \_ /usr/bin/sudo -n /usr/bin/python /usr/share/vdsm/supervdsmServer.pyc e531e66d-0621-412e-8ccf-8186050235ce 24932
24953 ?        S<l    0:00      |   \_ /usr/bin/python /usr/share/vdsm/supervdsmServer.pyc e531e66d-0621-412e-8ccf-8186050235ce 24932
25522 ?        Z<     0:00      |       \_ [python] <defunct>
25321 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25322 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25324 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25326 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25327 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25329 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25332 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25334 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25336 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25339 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25340 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25342 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25344 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25347 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25348 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25350 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25352 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25354 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25357 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25358 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25360 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25361 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25363 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25365 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25367 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25370 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25372 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25374 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25376 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25378 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25380 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25381 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25383 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25385 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25387 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25390 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25392 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25395 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25396 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25398 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25400 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25401 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25403 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25405 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25407 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25409 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25411 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25413 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25414 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25416 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25419 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25420 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25423 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25426 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25428 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25431 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25433 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25435 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25436 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25438 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25440 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25442 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25445 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25447 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25448 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25451 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25452 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25454 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25455 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25458 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25459 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25461 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25464 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25466 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25468 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25471 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25472 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25474 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25476 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25478 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25480 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25482 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25485 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25487 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25490 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25492 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25494 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25495 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25497 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25499 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25500 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25502 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25505 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25509 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25510 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25512 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25513 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25515 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
25517 ?        S<     0:00      \_ /usr/bin/python /usr/share/vdsm/vdsm
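
The defunct child of supervdsmServer (PID 25522) is easy to miss in a listing this long. As a hedged sketch, assuming the five-column "ps fax" layout shown above (PID, TTY, STAT, TIME, COMMAND), zombie rows can be picked out by checking for a STAT value that starts with Z. The function name find_defunct and the SAMPLE excerpt are illustrative only.

```python
def find_defunct(ps_output):
    """Return (pid, stat, command) for every zombie row in `ps fax`-style text."""
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID TTY STAT TIME COMMAND
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip the header row and blank lines
        pid, _tty, stat, _time, command = fields
        if stat.startswith("Z"):
            zombies.append((int(pid), stat, command))
    return zombies


# Excerpt of the listing above; a raw string keeps the "\_" tree markers intact.
SAMPLE = r"""
  PID TTY      STAT   TIME COMMAND
24932 ?        S<l    0:05  \_ /usr/bin/python /usr/share/vdsm/vdsm
24953 ?        S<l    0:00      |   \_ /usr/bin/python /usr/share/vdsm/supervdsmServer.pyc e531e66d-0621-412e-8ccf-8186050235ce 24932
25522 ?        Z<     0:00      |       \_ [python] <defunct>
"""

if __name__ == "__main__":
    for pid, stat, command in find_defunct(SAMPLE):
        print(pid, stat, command)
```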

vdsm-4.10.0-6.fc17.x86_64
vdsm-cli-4.10.0-6.fc17.noarch
vdsm-python-4.10.0-6.fc17.x86_64
vdsm-reg-4.10.0-6.fc17.noarch
vdsm-xmlrpc-4.10.0-6.fc17.noarch

Comment 1 Haim 2012-08-09 16:15:08 UTC
Created attachment 603301
engine + vdsm logs

Comment 2 Itamar Heim 2012-08-09 21:51:00 UTC
kernel version in log seems to match bug 845660?

Comment 3 Haim 2012-08-12 06:05:44 UTC
(In reply to comment #2)
> kernel version in log seems to match bug 845660?

I can't really reproduce this since I don't have the environment; maybe Justin can. In any case, what is the question? Are you implying that it's not vdsm's fault but the kernel's? The mount point seems valid and I managed to perform I/O on it manually, so that doesn't seem right.

Comment 4 Justin Clift 2013-03-05 15:15:59 UTC
I think this can be closed as a dup of BZ #845660, which has been resolved.

