Bug 737129 - occasional lockup of kvm guests under heavy i/o load
Summary: occasional lockup of kvm guests under heavy i/o load
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Paolo Bonzini
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-09-09 17:13 UTC by Kevin Fenzi
Modified: 2014-05-09 15:00 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-05-09 15:00:43 UTC



Description Kevin Fenzi 2011-09-09 17:13:43 UTC
Setup: 

rhel6.1 virtual host with disks in a raid5. 
kvm guests (also rhel6.1) using lvm on the raid5. 

Occasionally (we have seen this twice in the last few months), one of the guests stops responding to disk access. You can log in to it, but any command hangs. Other guests on the same host seem unaffected. Destroying the affected guest and restarting it results in it booting up and then behaving the same way. The entire virtual host has to be rebooted before that guest can resume normal operation.

The affected guest is a log host and is often writing lots of logs. 
Additionally, it has always happened around the time the 4am cron jobs run, when the other guests are compressing their logs and the like, resulting in extra i/o for the host. 

On the host in dmesg we see: 

INFO: task qemu-kvm:16991 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-kvm      D 0000000000000006     0 16991      1 0x00000080
 ffff88065a267b08 0000000000000082 ffff88065a267ac8 ffffffffa00040bc
 ffff88065a267ad8 0000000094c638e2 ffff88065a267af8 ffff8805ebd19180
 ffff8808e95c5a78 ffff88065a267fd8 000000000000f598 ffff8808e95c5a78
Call Trace:
 [<ffffffffa00040bc>] ? dm_table_unplug_all+0x5c/0xd0 [dm_mod]
 [<ffffffff81098c99>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff814db3c3>] io_schedule+0x73/0xc0
 [<ffffffff811abfde>] __blockdev_direct_IO+0x70e/0xc40
 [<ffffffff814dac27>] ? thread_return+0x4e/0x777
 [<ffffffff81206052>] ? security_inode_getsecurity+0x22/0x30
 [<ffffffff811a9c27>] blkdev_direct_IO+0x57/0x60
 [<ffffffff811a8df0>] ? blkdev_get_blocks+0x0/0xc0
 [<ffffffff8110de72>] generic_file_direct_write+0xc2/0x190
 [<ffffffff8110f665>] __generic_file_aio_write+0x345/0x480
 [<ffffffff811a937c>] blkdev_aio_write+0x3c/0xa0
 [<ffffffff8117244a>] do_sync_write+0xfa/0x140
 [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81211bef>] ? selinux_file_permission+0xbf/0x150
 [<ffffffff81205096>] ? security_file_permission+0x16/0x20
 [<ffffffff81172748>] vfs_write+0xb8/0x1a0
 [<ffffffff810d1ad2>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81173242>] sys_pwrite64+0x82/0xa0
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

The host is in selinux enforcing mode.
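
For follow-up debugging, backtraces of all blocked (D-state) tasks can be dumped on demand through the magic sysrq interface instead of waiting for the 120-second hung-task warning. A minimal sketch, assuming sysrq is enabled on the host:

echo 1 > /proc/sys/kernel/sysrq      # enable all sysrq functions if needed
echo w > /proc/sysrq-trigger         # 'w' dumps tasks in uninterruptible sleep
dmesg | tail -n 200                  # read back the blocked-task backtraces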

Comment 2 RHEL Product and Program Management 2011-10-07 15:48:04 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 4 Paolo Bonzini 2012-06-11 07:45:11 UTC
Kevin, do you still see this?  Have you ever upgraded your host to 6.2?

Comment 5 Kevin Fenzi 2012-06-11 14:21:03 UTC
We have updated the host to 6.2 and have not seen it since then. 

As far as I am concerned you can close this out. ;) 

Thanks!

Comment 6 Kevin Fenzi 2012-08-20 16:22:59 UTC
Sadly, we are seeing this again now just in the last week or so. ;( 

Host is a fully updated 6.3 

When the guest dies we get: 


INFO: task jbd2/dm-0-8:414 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/dm-0-8   D 0000000000000002     0   414      2 0x00000000
 ffff8802153f9d20 0000000000000046 ffff8802153f9ca0 ffffffff811adba7
 ffff880215248c00 ffff8800282966e8 00000000000575b4 ffff880215307540
 ffff880215307af8 ffff8802153f9fd8 000000000000fb88 ffff880215307af8
Call Trace:
 [<ffffffff811adba7>] ? __set_page_dirty+0x87/0xf0
 [<ffffffff810923be>] ? prepare_to_wait+0x4e/0x80
 [<ffffffffa007481f>] jbd2_journal_commit_transaction+0x19f/0x14b0 [jbd2]
 [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
 [<ffffffff8107e00c>] ? lock_timer_base+0x3c/0x70
 [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa007af78>] kjournald2+0xb8/0x220 [jbd2]
 [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa007aec0>] ? kjournald2+0x0/0x220 [jbd2]
 [<ffffffff81091d66>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff81091cd0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
(similar traces appear for many other tasks)

The host trace is: 

Aug 20 09:45:46 virthost02 kernel: qemu-kvm      D 000000000000000e     0  2836      1 0x00000080
Aug 20 09:45:46 virthost02 kernel: ffff8808e25eda98 0000000000000086 0000000000000000 ffffffffa00041fc
Aug 20 09:45:46 virthost02 kernel: ffff8808e25eda68 0000000021a0b90e ffff8808e25eda88 ffff8806debafe00
Aug 20 09:45:46 virthost02 kernel: ffff8808f2f465f8 ffff8808e25edfd8 000000000000fb88 ffff8808f2f465f8
Aug 20 09:45:46 virthost02 kernel: Call Trace:
Aug 20 09:45:46 virthost02 kernel: [<ffffffffa00041fc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
Aug 20 09:45:46 virthost02 kernel: [<ffffffff814fe0f3>] io_schedule+0x73/0xc0
Aug 20 09:45:46 virthost02 kernel: [<ffffffff811b673e>] __blockdev_direct_IO_newtrunc+0x6fe/0xb90
Aug 20 09:45:46 virthost02 kernel: [<ffffffff812719f9>] ? cpumask_next_and+0x29/0x50
Aug 20 09:45:46 virthost02 kernel: [<ffffffff811b6c2e>] __blockdev_direct_IO+0x5e/0xd0
Aug 20 09:45:46 virthost02 kernel: [<ffffffff811b34e0>] ? blkdev_get_blocks+0x0/0xc0
Aug 20 09:45:46 virthost02 kernel: [<ffffffff812141f2>] ? security_inode_getsecurity+0x22/0x30
Aug 20 09:45:46 virthost02 kernel: [<ffffffff8119f82c>] ? xattr_getsecurity+0x3c/0xa0
Aug 20 09:45:46 virthost02 kernel: [<ffffffff811b4347>] blkdev_direct_IO+0x57/0x60
Aug 20 09:45:46 virthost02 kernel: [<ffffffff811b34e0>] ? blkdev_get_blocks+0x0/0xc0
Aug 20 09:45:46 virthost02 kernel: [<ffffffff81114e62>] generic_file_direct_write+0xc2/0x190
Aug 20 09:45:46 virthost02 kernel: [<ffffffff81116675>] __generic_file_aio_write+0x345/0x480
Aug 20 09:45:46 virthost02 kernel: [<ffffffff8104e309>] ? __wake_up_common+0x59/0x90
Aug 20 09:45:46 virthost02 kernel: [<ffffffff811b3adc>] blkdev_aio_write+0x3c/0xa0
Aug 20 09:45:46 virthost02 kernel: [<ffffffff8117ae9a>] do_sync_write+0xfa/0x140
Aug 20 09:45:46 virthost02 kernel: [<ffffffff81083a12>] ? send_signal+0x42/0x80
Aug 20 09:45:46 virthost02 kernel: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
Aug 20 09:45:46 virthost02 kernel: [<ffffffff810d358d>] ? audit_filter_rules+0x2d/0xdd0
Aug 20 09:45:46 virthost02 kernel: [<ffffffff8121fe8b>] ? selinux_file_permission+0xfb/0x150
Aug 20 09:45:46 virthost02 kernel: [<ffffffff81213236>] ? security_file_permission+0x16/0x20
Aug 20 09:45:46 virthost02 kernel: [<ffffffff8117b198>] vfs_write+0xb8/0x1a0
Aug 20 09:45:46 virthost02 kernel: [<ffffffff810d6b12>] ? audit_syscall_entry+0x272/0x2a0
Aug 20 09:45:46 virthost02 kernel: [<ffffffff8117bc72>] sys_pwrite64+0x82/0xa0
Aug 20 09:45:46 virthost02 kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

Happy to gather more info.

Comment 7 Ademar Reis 2012-08-20 16:53:44 UTC
Please provide the full qemu command line you're using (you can get it from the output of ps -auxw).
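
A couple of equivalent ways to capture that, as a sketch (the qemu-kvm PID is a placeholder):

ps auxww | grep qemu-kvm
tr '\0' ' ' < /proc/<qemu-kvm-pid>/cmdline; echo   # per-process command line straight from /proc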

Comment 8 Kevin Fenzi 2012-08-20 16:54:51 UTC
qemu      2495 34.7 17.8 9932176 6253324 ?     Sl   10:29 133:48 /usr/libexec/qemu-kvm -S -M rhel6.0.0 -enable-kvm -m 8192 -smp 6,sockets=6,cores=1,threads=1 -name log02 -uuid 44f9d10b-4cdf-9ad0-65b7-a91417ca0c6e -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/log02.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/VolGuests00/log02,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/VolGuests00/log02-data,if=none,id=drive-virtio-disk1,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:2b:26:df,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:1 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

Comment 11 Paolo Bonzini 2013-01-29 15:03:00 UTC
Kevin,

can you include the "dmsetup table" output?

Comment 12 Kevin Fenzi 2013-01-31 17:35:12 UTC
dmsetup table for this guest: 

VolGuests00-log02: 0 67108864 linear 9:2 134222848
VolGuests00-log02--data: 0 1048576000 linear 9:2 201331712
VolGuests00-log02--data: 1048576000 838860800 linear 9:2 1323308032

Is that what you wanted?

Comment 13 Paolo Bonzini 2013-02-06 12:22:44 UTC
Sorry no, I wanted those on the host.

Comment 14 Kevin Fenzi 2013-02-06 22:09:21 UTC
That was the output of dmsetup on the host that referred to the guest. 

Do you want all the dmsetup output from the host? Or inside the guest?

Comment 15 Paolo Bonzini 2013-02-07 15:00:22 UTC
All of the host setup, so that I can see the raid configuration.

Comment 16 Kevin Fenzi 2013-02-09 16:35:16 UTC
ok. 

'dmsetup table':

xenGuests-sign--bridge01: 0 62914560 linear 253:10 1075732864
virtWebGuests-fas01: 0 41943040 linear 253:11 2048
VolGuests00-fas02: 0 67108864 linear 9:2 5120
mpathap1: 0 2147504877 linear 253:7 63
VolGroup00-rootvol00: 0 24576000 linear 9:1 4199424
xenGuests-nfs01: 0 20971520 linear 253:10 1830707584
xenGuests-pkgs01: 0 157286400 linear 253:10 1851679104
xenGuests-pkgs01: 157286400 157286400 linear 253:10 2103337344
mpathc: 0 2147518464 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:128 128 
mpathbp1: 0 3774921507 linear 253:8 63
xenGuests-db02: 0 52428800 linear 253:10 740188544
xenGuests-db02: 52428800 41943040 linear 253:10 1788764544
xenGuests-db02: 94371840 62914560 linear 253:10 2008965504
mpathb: 0 3774937088 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:96 128 
VolGuests00-log02: 0 67108864 linear 9:2 134222848
xenGuests-koji2: 0 20971520 linear 253:10 1767793024
VolGuests00-log02--data: 0 1048576000 linear 9:2 201331712
VolGuests00-log02--data: 1048576000 838860800 linear 9:2 1323308032
virtWebGuests-ns04: 0 41943040 linear 253:11 566233088
virtWebGuests-secondary02: 0 41943040 linear 253:11 650119168
mpatha: 0 2147518464 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:112 128 
VolGuests00-memcached03: 0 31457280 linear 9:2 1291850752
virtWebGuests-secondary01: 0 41943040 linear 253:11 608176128
VolGuests00-noc01: 0 41943040 linear 9:2 1249907712
xenGuests-ns04: 0 31457280 linear 253:10 2071880064
xenGuests-relepel01: 0 62914560 linear 253:10 1012818304
xenGuests-cvs2: 0 31457280 linear 253:10 73294208
xenGuests-cvs1: 0 157286400 linear 253:10 855531904
xenGuests-compose--x86--02: 0 629145600 linear 253:10 1138647424
virtWebGuests-lockbox01: 0 104857600 linear 253:11 692062208
virtWebGuests-db05: 0 524288000 linear 253:11 41945088
xenGuests-relepel1--old: 0 62914560 linear 253:10 792617344
xenGuests-cvs--storage: 0 606822400 linear 253:10 104751488
xenGuests-cvs--storage: 606822400 22323200 linear 253:10 717865344
VolGroup00-swap00: 0 4194304 linear 9:1 5120
xenGuests-koji01: 0 41943040 linear 253:10 384

Note that many of these are iscsi volumes that are not in any way active on the host. 

/proc/mdstat: 

Personalities : [raid6] [raid5] [raid4] [raid1] 
md2 : active raid5 sdc3[2] sdf3[6] sda3[0] sde3[4] sdd3[3] sdb3[1]
      1359869440 blocks super 1.1 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 2/3 pages [8KB], 65536KB chunk

md0 : active raid1 sda1[0] sdf1[5] sdb1[1] sdd1[3] sdc1[2] sde1[4]
      511988 blocks super 1.0 [6/6] [UUUUUU]
      
md1 : active raid5 sdb2[1] sdd2[3] sde2[4] sdf2[6] sdc2[2] sda2[0]
      102392320 blocks super 1.1 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

Happy to provide anything further requested.

Comment 17 Robert Stroetgen 2013-09-20 13:35:55 UTC
Rather similar:

/var/log/messages

Sep 18 16:40:00 vmhost-dmz2 kernel: INFO: task qemu-kvm:14670 blocked for more than 120 seconds.
Sep 18 16:40:00 vmhost-dmz2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 18 16:40:00 vmhost-dmz2 kernel: qemu-kvm      D 0000000000000000     0 14670      1 0x00000080
Sep 18 16:40:00 vmhost-dmz2 kernel: ffff880c6d4019b8 0000000000000082 0000000000000000 ffffffffa00043fc
Sep 18 16:40:00 vmhost-dmz2 kernel: ffff8806753bc778 ffff880671bfd200 0000000000000001 000000000000000c
Sep 18 16:40:00 vmhost-dmz2 kernel: ffff880c7248faf8 ffff880c6d401fd8 000000000000fb88 ffff880c7248faf8
Sep 18 16:40:00 vmhost-dmz2 kernel: Call Trace:
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffffa00043fc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff8150e8c3>] io_schedule+0x73/0xc0
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff811bed0e>] __blockdev_direct_IO_newtrunc+0x6de/0xb30
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff811304d6>] ? __pagevec_release+0x26/0x40
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff811bf1be>] __blockdev_direct_IO+0x5e/0xd0
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff811bb590>] ? blkdev_get_blocks+0x0/0xc0
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff811bc657>] blkdev_direct_IO+0x57/0x60
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff811bb590>] ? blkdev_get_blocks+0x0/0xc0
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff8111aa32>] generic_file_direct_write+0xc2/0x190
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff8111c351>] __generic_file_aio_write+0x3a1/0x490
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff811bdc27>] blkdev_aio_write+0x77/0x130
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff811bdbb0>] ? blkdev_aio_write+0x0/0x130
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff81180f2b>] do_sync_readv_writev+0xfb/0x140
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff81180d2a>] ? rw_copy_check_uvector+0x6a/0x120
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff81228ffb>] ? selinux_file_permission+0xfb/0x150
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff8121bed6>] ? security_file_permission+0x16/0x20
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff81181eb6>] do_readv_writev+0xd6/0x1f0
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff810874f6>] ? group_send_sig_info+0x56/0x70
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff8108754f>] ? kill_pid_info+0x3f/0x60
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff81182016>] vfs_writev+0x46/0x60
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff811820d2>] sys_pwritev+0xa2/0xc0
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
Sep 18 16:40:00 vmhost-dmz2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

dmsetup table

mpath-extern-win1: 0 104857600 multipath 0 1 emc 2 1 round-robin 0 2 1 65:144 1 67:0 1 round-robin 0 2 1 8:64 1 68:80 1
mpath-winvrep2: 0 1048365056 linear 253:4 206848
mpath-extern1: 0 104857600 multipath 0 1 emc 2 1 round-robin 0 2 1 8:32 1 68:48 1 round-robin 0 2 1 65:112 1 66:224 1
mpath-wwwp5: 0 45871104 linear 253:11 163844096
mpath-iservicesp5: 0 2923767 linear 253:3 101932488
mpath-winvrep1: 0 204800 linear 253:4 2048
mpath-repositoryp5: 0 2923767 linear 253:9 101932488
mpath-template-centos: 0 41943040 multipath 0 1 emc 2 1 round-robin 0 2 1 65:192 1 67:48 1 round-robin 0 2 1 8:112 1 68:128 1
mpath-extern-win1p2: 0 104648704 linear 253:5 206848
vg_www-lv_swap: 0 33554432 linear 253:32 36784128
vg_www-lv_root: 0 36782080 linear 253:32 2048
vg_www-lv_root: 36782080 120504320 linear 253:32 70338560
mpath-wwwp2: 0 2 linear 253:11 163842048
mpath-iservicesp2: 0 2 linear 253:3 101932425
mpath-extern-win1p1: 0 204800 linear 253:5 2048
mpath-www-neup2: 0 208689152 linear 253:10 1026048
mpath-wwwp1: 0 163840000 linear 253:11 2048
mpath-iservicesp1: 0 101932362 linear 253:3 63
mpath-vufindp5: 0 4093952 linear 253:13 37849088
mpath-repositoryp2: 0 2 linear 253:9 101932425
mpath-extern1p5: 0 12578816 linear 253:2 92276736
mpath-vufind: 0 41943040 multipath 0 1 emc 2 1 round-robin 0 2 1 66:112 1 67:224 1 round-robin 0 2 1 65:32 1 69:48 1
mpath-www-neup1: 0 1024000 linear 253:10 2048
www-typo3: 0 52428800 linear 253:15 384
mpath-repository: 0 209715200 multipath 0 1 emc 2 1 round-robin 0 2 1 65:48 1 69:64 1 round-robin 0 2 1 66:128 1 67:240 1
extern2data-owncloud: 0 314572800 linear 253:12 384
mpath-extern2p5: 0 10063872 linear 253:24 94793728
mpath-wwwdatap1: 0 524285952 linear 253:6 2048
mpath-repositoryp1: 0 101932362 linear 253:9 63
mpath-basisdienste-dmzp5: 0 4093952 linear 253:8 37849088
mpath-basisdienste-dmz: 0 41943040 multipath 0 1 emc 2 1 round-robin 0 2 1 65:16 1 69:32 1 round-robin 0 2 1 66:96 1 67:208 1
mpath-template-centosp2: 0 40916992 linear 253:7 1026048
mpath-extern2data: 0 524288000 multipath 0 1 emc 2 1 round-robin 0 2 1 66:80 1 67:192 1 round-robin 0 2 1 65:0 1 69:16 1
mpath-vufindp2: 0 2 linear 253:13 37847040
mpath-template-centosp1: 0 1024000 linear 253:7 2048
mpath-extern1p2: 0 2 linear 253:2 92276734
mpath-extern2p2: 0 2 linear 253:24 94791680
www-mysql: 0 104849408 linear 253:15 419430784
mpath-vufindp1: 0 37844992 linear 253:13 2048
mpath-winvre: 0 1048576000 multipath 0 1 emc 2 1 round-robin 0 2 1 8:192 1 68:208 1 round-robin 0 2 1 66:16 1 67:128 1
mpath-extern1p1: 0 92272640 linear 253:2 2048
www-www: 0 367001600 linear 253:15 52429184
mpath-basisdienste-dmzp2: 0 2 linear 253:8 37847040
vg_vmhostdmz2-lv_swap: 0 103317504 linear 8:2 104859648
mpath-extern2p1: 0 94789632 linear 253:24 2048
vg_vmhostdmz2-lv_root: 0 104857600 linear 8:2 2048
mpath-wwwdata: 0 524288000 multipath 0 1 emc 2 1 round-robin 0 2 1 65:176 1 67:32 1 round-robin 0 2 1 8:96 1 68:112 1
mpath-basisdienste-dmzp1: 0 37844992 linear 253:8 2048
mpath-www-neu: 0 209715200 multipath 0 1 emc 2 1 round-robin 0 2 1 66:32 1 67:144 1 round-robin 0 2 1 8:208 1 68:224 1
vg_vmhostdmz2-lv_var: 0 374775808 linear 8:2 208177152
mpath-www: 0 209715200 multipath 0 1 emc 2 1 round-robin 0 2 1 65:80 1 69:96 1 round-robin 0 2 1 66:160 1 68:16 1
mpath-iservices: 0 209715200 multipath 0 1 emc 2 1 round-robin 0 2 1 65:96 1 66:208 1 round-robin 0 2 1 8:16 1 68:32 1
mpath-extern2: 0 104857600 multipath 0 1 emc 2 1 round-robin 0 2 1 66:64 1 67:176 1 round-robin 0 2 1 8:240 1 69:0 1

multipath -ll
mpath-extern-win1 (360060160c4b12a008ef989ef7fb4e111) dm-5 DGC,RAID 5
size=50G features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 1:0:1:3  sdz  65:144 active ready  running
| `- 2:0:0:3  sdaw 67:0   active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:0:3  sde  8:64   active ready  running
  `- 2:0:1:3  sdbr 68:80  active ready  running
mpath-extern1 (360060160c4b12a00b24f409162b4e111) dm-2 DGC,RAID 5
size=50G features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 1:0:0:1  sdc  8:32   active ready  running
| `- 2:0:1:1  sdbp 68:48  active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:1:1  sdx  65:112 active ready  running
  `- 2:0:0:1  sdau 66:224 active ready  running
mpath-template-centos (360060160c4b12a0098fad7839a0be211) dm-7 DGC,RAID 5
size=20G features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 1:0:1:6  sdac 65:192 active ready  running
| `- 2:0:0:6  sdaz 67:48  active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:0:6  sdh  8:112  active ready  running
  `- 2:0:1:6  sdbu 68:128 active ready  running
mpath-vufind (360060160c4b12a0048e8aebbbf66e211) dm-13 DGC,RAID 5
size=20G features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 1:0:1:17 sdan 66:112 active ready  running
| `- 2:0:0:17 sdbk 67:224 active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:0:17 sds  65:32  active ready  running
  `- 2:0:1:17 sdcf 69:48  active ready  running
mpath-repository (360060160ceb12a00b8b077029712e011) dm-9 DGC,RAID 5
size=100G features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 1:0:0:18 sdt  65:48  active ready  running
| `- 2:0:1:18 sdcg 69:64  active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:1:18 sdao 66:128 active ready  running
  `- 2:0:0:18 sdbl 67:240 active ready  running
mpath-basisdienste-dmz (360060160c4b12a000aa26e380267e211) dm-8 DGC,RAID 5
size=20G features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 1:0:0:16 sdr  65:16  active ready  running
| `- 2:0:1:16 sdce 69:32  active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:1:16 sdam 66:96  active ready  running
  `- 2:0:0:16 sdbj 67:208 active ready  running
mpath-extern2data (360060160c4b12a00fae56f4cf124e211) dm-12 DGC,VRAID
size=250G features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 1:0:1:15 sdal 66:80  active ready  running
| `- 2:0:0:15 sdbi 67:192 active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:0:15 sdq  65:0   active ready  running
  `- 2:0:1:15 sdcd 69:16  active ready  running
mpath-winvre (360060160c4b12a002851153cad21e211) dm-4 DGC,VRAID
size=500G features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 1:0:0:11 sdm  8:192  active ready  running
| `- 2:0:1:11 sdbz 68:208 active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:1:11 sdah 66:16  active ready  running
  `- 2:0:0:11 sdbe 67:128 active ready  running
mpath-wwwdata (360060160c4b12a008a0239dd9a0be211) dm-6 DGC,VRAID
size=250G features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 1:0:1:5  sdab 65:176 active ready  running
| `- 2:0:0:5  sday 67:32  active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:0:5  sdg  8:96   active ready  running
  `- 2:0:1:5  sdbt 68:112 active ready  running
mpath-www-neu (360060160c4b12a00103dde6c9d22e211) dm-10 DGC,RAID 5
size=100G features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 1:0:1:12 sdai 66:32  active ready  running
| `- 2:0:0:12 sdbf 67:144 active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:0:12 sdn  8:208  active ready  running
  `- 2:0:1:12 sdca 68:224 active ready  running
mpath-www (360060160c4b12a00203d8252b50be311) dm-11 DGC,RAID 5
size=100G features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 1:0:0:20 sdv  65:80  active ready  running
| `- 2:0:1:20 sdci 69:96  active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:1:20 sdaq 66:160 active ready  running
  `- 2:0:0:20 sdbn 68:16  active ready  running
mpath-iservices (360060160ceb12a00bce8ef6f9812e011) dm-3 DGC,RAID 5
size=100G features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 1:0:1:0  sdw  65:96  active ready  running
| `- 2:0:0:0  sdat 66:208 active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:0:0  sdb  8:16   active ready  running
  `- 2:0:1:0  sdbo 68:32  active ready  running
mpath-extern2 (360060160c4b12a00b08040aef024e211) dm-24 DGC,RAID 5
size=50G features='0' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| |- 1:0:1:14 sdak 66:64  active ready  running
| `- 2:0:0:14 sdbh 67:176 active ready  running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 1:0:0:14 sdp  8:240  active ready  running
  `- 2:0:1:14 sdcc 69:0   active ready  running

Comment 18 Paolo Bonzini 2014-05-07 15:11:57 UTC
If you are still seeing it, and you are not using migration, would you mind changing your configuration to use cache='writeback' instead of cache='none'? This would let us narrow the cause down to device-mapper/md (if it still happens) or direct I/O (if it doesn't).
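
The cache mode lives on each disk's <driver> element in the libvirt domain XML. A minimal sketch of the suggested change, using the guest name from comment 8 as an example:

virsh edit log02
#   for each <disk>, change
#     <driver name='qemu' type='raw' cache='none'/>
#   to
#     <driver name='qemu' type='raw' cache='writeback'/>
virsh shutdown log02    # wait for the guest to power off, then
virsh start log02       # restart so the new cache mode takes effect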

An alternative would be to catch a vmcore once it happens (echo c > /proc/sysrq-trigger), but perhaps you cannot shut down the host forcibly.
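
A sketch of the vmcore route on RHEL 6, assuming kdump is installed and a crashkernel= reservation is configured on the host kernel command line:

chkconfig kdump on
service kdump start
echo c > /proc/sysrq-trigger    # when the hang recurs: crash the host, kdump writes a vmcore under /var/crash/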

Comment 19 Kevin Fenzi 2014-05-08 20:26:08 UTC
We are actually no longer seeing this.

The virthost has been up 30 days on 2.6.32-431.11.2.el6.x86_64 with nary an issue.

Comment 20 Paolo Bonzini 2014-05-09 15:00:43 UTC
Closing -- if you can reproduce, please try to follow the instructions in comment 18 to understand which kernel subsystem is causing the bug.  Thanks!

