Bug 1193222 - device failure on exclusively active raid w/ snapshot (on top of cluster VG) leads to deadlock
Summary: device failure on exclusively active raid w/ snapshot (on top of cluster VG) ...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Heinz Mauelshagen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1219222
 
Reported: 2015-02-16 22:10 UTC by Corey Marthaler
Modified: 2021-09-03 12:39 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1219222
Environment:
Last Closed: 2020-02-28 20:41:38 UTC
Target Upstream Version:
Embargoed:


Attachments
log and dump of kern stack from cluster node host-127 (400.46 KB, text/plain)
2015-02-16 22:13 UTC, Corey Marthaler

Description Corey Marthaler 2015-02-16 22:10:09 UTC
Description of problem:
This appears to be 100% reproducible when the exclusively active raid volume has a snapshot, and is not reproducible when a snapshot is absent. Once the deadlock occurs, the CPU load continues to increase.
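The commands below are a minimal sketch of the failing sequence, distilled from the test run that follows. It assumes a three-node cluster with dlm and clvmd already running and a clustered VG named black_bird built on /dev/sda1, /dev/sdd1 and /dev/sdh1; the device and LV names are illustrative, taken from the run below, not a prescribed reproducer.

# Sketch of the failing sequence (names taken from the run below; illustrative only)
# 1. Create a 3-way raid1 LV and activate it exclusively on one node
lvcreate --type raid1 -m 2 -n raidlv -L 500M black_bird /dev/sdd1 /dev/sda1 /dev/sdh1
lvchange -an black_bird/raidlv
lvchange -aey black_bird/raidlv              # exclusive activation on this node

# 2. Wait for full sync, put ext4 on it, mount, and add a snapshot
lvs -o lv_name,copy_percent black_bird       # wait for 100.00
mkfs.ext4 /dev/black_bird/raidlv
mount /dev/black_bird/raidlv /mnt
lvcreate -s -L 250M -n bb_snap1 black_bird/raidlv /dev/sdd1

# 3. Fail one raid leg while I/O is outstanding; the dmeventd repair attempt
#    then deadlocks against the in-flight I/O (see the log below)
echo offline > /sys/block/sdh/device/state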


[root@host-127 ~]# pcs status
Cluster name: STSRHTS16897
Last updated: Mon Feb 16 16:07:07 2015
Last change: Mon Feb 16 15:11:31 2015
Stack: corosync
Current DC: host-129 (3) - partition with quorum
Version: 1.1.12-a14efad
3 Nodes configured
9 Resources configured


Online: [ host-127 host-128 host-129 ]

Full list of resources:

 fence-host-127 (stonith:fence_xvm):    Started host-127 
 fence-host-128 (stonith:fence_xvm):    Started host-128 
 fence-host-129 (stonith:fence_xvm):    Started host-129 
 Clone Set: dlm-clone [dlm]
     Started: [ host-127 host-128 host-129 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ host-127 host-128 host-129 ]

PCSD Status:
  host-127: Online
  host-128: Online
  host-129: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled




black_bird -R $sts_resource_file -E host-127.virt.lab.msp.redhat.com -i 1 -F -w EXT

creating lvm devices...
Create 7 PV(s) for black_bird on host-127
Create VG black_bird on host-127
Enabling raid allocate fault policies on: host-127
================================================================================
Iteration 0.1 started at Mon Feb 16 15:12:13 CST 2015
================================================================================
Scenario kill_random_synced_raid1_2legs: Kill random leg of synced 2 leg raid1 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_random_raid1_2legs_1
* sync:               1
* type:               raid1
* -m |-i value:       2
* leg devices:        /dev/sdd1 /dev/sda1 /dev/sdh1
* spanned legs:        0
* failpv(s):          /dev/sdh1
* additional snap:    /dev/sdd1
* failnode(s):        host-127
* lvmetad:            0
* raid fault policy:  allocate
******************************************************

Creating raids(s) on host-127...
host-127: lvcreate --type raid1 -m 2 -n synced_random_raid1_2legs_1 -L 500M black_bird /dev/sdd1:0-2400 /dev/sda1:0-2400 /dev/sdh1:0-2400
EXCLUSIVELY ACTIVATING RAID on host-127

Current mirror/raid device structure(s):
  LV                                     Attr       LSize   Cpy%Sync Devices
  synced_random_raid1_2legs_1            rwi-a-r--- 500.00m 0.80     synced_random_raid1_2legs_1_rimage_0(0),synced_random_raid1_2legs_1_rimage_1(0),synced_random_raid1_2legs_1_rimage_2(0)
  [synced_random_raid1_2legs_1_rimage_0] Iwi-aor--- 500.00m          /dev/sdd1(1)
  [synced_random_raid1_2legs_1_rimage_1] Iwi-aor--- 500.00m          /dev/sda1(1)
  [synced_random_raid1_2legs_1_rimage_2] Iwi-aor--- 500.00m          /dev/sdh1(1)
  [synced_random_raid1_2legs_1_rmeta_0]  ewi-aor---   4.00m          /dev/sdd1(0)
  [synced_random_raid1_2legs_1_rmeta_1]  ewi-aor---   4.00m          /dev/sda1(0)
  [synced_random_raid1_2legs_1_rmeta_2]  ewi-aor---   4.00m          /dev/sdh1(0)

Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating ext on top of mirror(s) on host-127...
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on host-127...

PV=/dev/sdh1
   synced_random_raid1_2legs_1_rimage_2: 1.0
   synced_random_raid1_2legs_1_rmeta_2: 1.0

Creating a snapshot volume of each of the raids
host-127: lvcreate -L 250M -n bb_snap1 -s black_bird/synced_random_raid1_2legs_1 /dev/sdd1
Writing verification files (checkit) to mirror(s) on...
   ---- host-127 ----

Sleeping 15 seconds to get some outsanding I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
   ---- host-127 ----

Disabling device sdh on host-127

simple pvs failed/segfaulted possible regression of BZ 571963
FI_engine: inject() method failed
[DEADLOCK]
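When the harness reports [DEADLOCK], the state can be confirmed directly on the failing node. The commands below are a rough sketch using standard tools (illustrative only); the blocked-task output they produce is what appears in the syslog excerpt that follows.

# Confirm the hang on the failing node (illustrative only)
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'   # clvmd, xdoio, kworker stuck in uninterruptible sleep
pvs -a                                            # may hang or fail, as in the run above
echo w > /proc/sysrq-trigger                      # dump blocked tasks to the kernel log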




Feb 16 15:12:58 host-127 qarshd[4627]: Running cmdline: echo offline > /sys/block/sdh/device/state
Feb 16 15:12:58 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 1.090000
Feb 16 15:13:01 host-127 kernel: sd 8:0:0:1: rejecting I/O to offline device
Feb 16 15:13:01 host-127 kernel: md: super_written gets error=-5, uptodate=0
Feb 16 15:13:01 host-127 kernel: md/raid1:mdX: Disk failure on dm-7, disabling device.
md/raid1:mdX: Operation continuing on 2 devices.
Feb 16 15:13:01 host-127 lvm[4293]: Device #2 of raid1 array, black_bird-synced_random_raid1_2legs_1-real, has failed.
Feb 16 15:13:01 host-127 kernel: sd 8:0:0:1: rejecting I/O to offline device
Feb 16 15:13:01 host-127 lvm[4293]: /dev/sdh1: read failed after 0 of 4096 at 26838958080: Input/output error
Feb 16 15:13:01 host-127 lvm[4293]: /dev/sdh1: read failed after 0 of 4096 at 26839048192: Input/output error
Feb 16 15:13:01 host-127 lvm[4293]: /dev/sdh1: read failed after 0 of 4096 at 0: Input/output error
Feb 16 15:13:01 host-127 lvm[4293]: /dev/sdh1: read failed after 0 of 4096 at 4096: Input/output error
Feb 16 15:13:01 host-127 lvm[4293]: Couldn't find device with uuid CvXsDs-3qf5-UQzm-UjPt-GbZc-evEe-QLYbp8.
Feb 16 15:13:01 host-127 kernel: sd 8:0:0:1: rejecting I/O to offline device
Feb 16 15:13:02 host-127 kernel: sd 8:0:0:1: rejecting I/O to offline device
Feb 16 15:13:02 host-127 kernel: sd 8:0:0:1: rejecting I/O to offline device
Feb 16 15:13:02 host-127 kernel: device-mapper: raid: Device 2 specified for rebuild: Clearing superblock
Feb 16 15:13:02 host-127 kernel: md/raid1:mdX: active with 2 out of 3 mirrors
Feb 16 15:13:02 host-127 kernel: created bitmap (1 pages) for device mdX
Feb 16 15:13:03 host-127 qarshd[4655]: Running cmdline: pvs -a
Feb 16 15:13:28 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 1.670000
Feb 16 15:13:58 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 2.480000
Feb 16 15:14:28 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 3.080000
Feb 16 15:14:58 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 3.440000
Feb 16 15:15:10 host-127 lvm[4293]: Error locking on node 1: Command timed out
Feb 16 15:15:22 host-127 kernel: INFO: task clvmd:3096 blocked for more than 120 seconds.
Feb 16 15:15:22 host-127 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 16 15:15:22 host-127 kernel: clvmd           D ffff88003fc13680     0  3096      1 0x00000080
Feb 16 15:15:22 host-127 kernel: ffff88002349fa70 0000000000000086 ffff8800234471c0 ffff88002349ffd8
Feb 16 15:15:22 host-127 kernel: ffff88002349ffd8 ffff88002349ffd8 ffff8800234471c0 ffff88003fc13f48
Feb 16 15:15:22 host-127 kernel: ffff88001edbaf00 ffff8800234471c0 0000000000000000 0000000000000000
Feb 16 15:15:22 host-127 kernel: Call Trace:
Feb 16 15:15:22 host-127 kernel: [<ffffffff8160955d>] io_schedule+0x9d/0x130
Feb 16 15:15:22 host-127 kernel: [<ffffffff81203ea3>] do_blockdev_direct_IO+0xc03/0x2620
Feb 16 15:15:22 host-127 kernel: [<ffffffff811ffa60>] ? I_BDEV+0x10/0x10
Feb 16 15:15:22 host-127 kernel: [<ffffffff81205915>] __blockdev_direct_IO+0x55/0x60
Feb 16 15:15:22 host-127 kernel: [<ffffffff811ffa60>] ? I_BDEV+0x10/0x10
Feb 16 15:15:22 host-127 kernel: [<ffffffff812000b7>] blkdev_direct_IO+0x57/0x60
Feb 16 15:15:22 host-127 kernel: [<ffffffff811ffa60>] ? I_BDEV+0x10/0x10
Feb 16 15:15:22 host-127 kernel: [<ffffffff81158213>] generic_file_aio_read+0x6d3/0x750
Feb 16 15:15:22 host-127 kernel: [<ffffffff811e61de>] ? mntput_no_expire+0x3e/0x120
Feb 16 15:15:22 host-127 kernel: [<ffffffff811e62e4>] ? mntput+0x24/0x40
Feb 16 15:15:22 host-127 kernel: [<ffffffff8120063c>] blkdev_aio_read+0x4c/0x70
Feb 16 15:15:22 host-127 kernel: [<ffffffff811c5d6d>] do_sync_read+0x8d/0xd0
Feb 16 15:15:22 host-127 kernel: [<ffffffff811c644c>] vfs_read+0x9c/0x170
Feb 16 15:15:22 host-127 kernel: [<ffffffff811c6f78>] SyS_read+0x58/0xb0
Feb 16 15:15:22 host-127 kernel: [<ffffffff81613da9>] system_call_fastpath+0x16/0x1b
Feb 16 15:15:22 host-127 kernel: INFO: task xdoio:4598 blocked for more than 120 seconds.
Feb 16 15:15:22 host-127 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 16 15:15:22 host-127 kernel: xdoio           D ffff88003fc13680     0  4598   4597 0x00000080
Feb 16 15:15:22 host-127 kernel: ffff88001edf7978 0000000000000082 ffff88001f0ec440 ffff88001edf7fd8
Feb 16 15:15:22 host-127 kernel: ffff88001edf7fd8 ffff88001edf7fd8 ffff88001f0ec440 ffff88003fc13f48
Feb 16 15:15:22 host-127 kernel: ffff88001edf7a00 0000000000000002 ffffffff81155f10 ffff88001edf79f0
Feb 16 15:15:22 host-127 kernel: Call Trace:
Feb 16 15:15:22 host-127 kernel: [<ffffffff81155f10>] ? wait_on_page_read+0x60/0x60
Feb 16 15:15:22 host-127 kernel: [<ffffffff8160955d>] io_schedule+0x9d/0x130
Feb 16 15:15:22 host-127 kernel: [<ffffffff81155f1e>] sleep_on_page+0xe/0x20
Feb 16 15:15:22 host-127 kernel: [<ffffffff8160746b>] __wait_on_bit_lock+0x5b/0xc0
Feb 16 15:15:22 host-127 kernel: [<ffffffff81156038>] __lock_page+0x78/0xa0
Feb 16 15:15:22 host-127 kernel: [<ffffffff81098270>] ? autoremove_wake_function+0x40/0x40
Feb 16 15:15:22 host-127 kernel: [<ffffffff81156974>] __find_lock_page+0x54/0x70
Feb 16 15:15:22 host-127 kernel: [<ffffffff81157462>] grab_cache_page_write_begin+0x62/0xd0
Feb 16 15:15:22 host-127 kernel: [<ffffffffa035723c>] ext4_write_begin+0x9c/0x420 [ext4]
Feb 16 15:15:22 host-127 kernel: [<ffffffff8115648d>] generic_file_buffered_write+0x11d/0x290
Feb 16 15:15:22 host-127 kernel: [<ffffffff811585f5>] __generic_file_aio_write+0x1d5/0x3e0
Feb 16 15:15:22 host-127 kernel: [<ffffffff810abe58>] ? sched_clock_cpu+0xa8/0x100
Feb 16 15:15:22 host-127 kernel: [<ffffffff8115885d>] generic_file_aio_write+0x5d/0xc0
Feb 16 15:15:22 host-127 kernel: [<ffffffffa034cb75>] ext4_file_write+0xb5/0x460 [ext4]
Feb 16 15:15:22 host-127 kernel: [<ffffffff811e22fb>] ? iput+0x3b/0x180
Feb 16 15:15:22 host-127 kernel: [<ffffffff810d1e05>] ? drop_futex_key_refs.isra.13+0x35/0x70
Feb 16 15:15:22 host-127 kernel: [<ffffffff811c5ef9>] do_sync_readv_writev+0x79/0xd0
Feb 16 15:15:22 host-127 kernel: [<ffffffff811c73ee>] do_readv_writev+0xce/0x260
Feb 16 15:15:22 host-127 kernel: [<ffffffffa034cac0>] ? ext4_file_mmap+0x30/0x30 [ext4]
Feb 16 15:15:22 host-127 kernel: [<ffffffff811c5db0>] ? do_sync_read+0xd0/0xd0
Feb 16 15:15:22 host-127 kernel: [<ffffffff81052b0f>] ? kvm_clock_get_cycles+0x1f/0x30
Feb 16 15:15:22 host-127 kernel: [<ffffffff810c895a>] ? __getnstimeofday+0x3a/0xd0
Feb 16 15:15:22 host-127 kernel: [<ffffffff811c7615>] vfs_writev+0x35/0x60
Feb 16 15:15:22 host-127 kernel: [<ffffffff811c776c>] SyS_writev+0x5c/0xd0
Feb 16 15:15:22 host-127 kernel: [<ffffffff81613da9>] system_call_fastpath+0x16/0x1b
Feb 16 15:15:27 host-127 systemd: Starting Session 3 of user root.
Feb 16 15:15:27 host-127 systemd: Started Session 3 of user root.
Feb 16 15:15:27 host-127 systemd-logind: New session 3 of user root.
Feb 16 15:15:28 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 3.660000
Feb 16 15:15:58 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 3.790000
Feb 16 15:16:11 host-127 lvm[4293]: Error locking on node 1: Command timed out
Feb 16 15:16:11 host-127 lvm[4293]: Problem reactivating logical volume black_bird/synced_random_raid1_2legs_1.
Feb 16 15:16:28 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 3.880000
Feb 16 15:16:58 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 3.920000
Feb 16 15:17:22 host-127 kernel: INFO: task kworker/u2:2:263 blocked for more than 120 seconds.
Feb 16 15:17:22 host-127 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 16 15:17:22 host-127 kernel: kworker/u2:2    D ffff88003fc13680     0   263      2 0x00000000
Feb 16 15:17:22 host-127 kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:8)
Feb 16 15:17:22 host-127 kernel: ffff880036cab8d8 0000000000000046 ffff880036c8c440 ffff880036cabfd8
Feb 16 15:17:22 host-127 kernel: ffff880036cabfd8 ffff880036cabfd8 ffff880036c8c440 ffff88003fc13f48
Feb 16 15:17:22 host-127 kernel: ffff880036cab960 0000000000000002 ffffffff81155f10 ffff880036cab950
Feb 16 15:17:22 host-127 kernel: Call Trace:
Feb 16 15:17:22 host-127 kernel: [<ffffffff81155f10>] ? wait_on_page_read+0x60/0x60
Feb 16 15:17:22 host-127 kernel: [<ffffffff8160955d>] io_schedule+0x9d/0x130
Feb 16 15:17:22 host-127 kernel: [<ffffffff81155f1e>] sleep_on_page+0xe/0x20
Feb 16 15:17:22 host-127 kernel: [<ffffffff8160746b>] __wait_on_bit_lock+0x5b/0xc0
Feb 16 15:17:22 host-127 kernel: [<ffffffff81156038>] __lock_page+0x78/0xa0
Feb 16 15:17:22 host-127 kernel: [<ffffffff81098270>] ? autoremove_wake_function+0x40/0x40
Feb 16 15:17:22 host-127 kernel: [<ffffffffa0351f90>] mpage_prepare_extent_to_map+0x2d0/0x2e0 [ext4]
Feb 16 15:17:22 host-127 kernel: [<ffffffffa0356043>] ext4_writepages+0x463/0xd60 [ext4]
Feb 16 15:17:22 host-127 kernel: [<ffffffff81161ac8>] ? generic_writepages+0x58/0x80
Feb 16 15:17:22 host-127 kernel: [<ffffffff81162b6e>] do_writepages+0x1e/0x40
Feb 16 15:17:22 host-127 kernel: [<ffffffff811efe10>] __writeback_single_inode+0x40/0x220
Feb 16 15:17:22 host-127 kernel: [<ffffffff811f0b0e>] writeback_sb_inodes+0x25e/0x420
Feb 16 15:17:22 host-127 kernel: [<ffffffff811f0d6f>] __writeback_inodes_wb+0x9f/0xd0
Feb 16 15:17:22 host-127 kernel: [<ffffffff811f15b3>] wb_writeback+0x263/0x2f0
Feb 16 15:17:22 host-127 kernel: [<ffffffff811e027c>] ? get_nr_inodes+0x4c/0x70
Feb 16 15:17:22 host-127 kernel: [<ffffffff811f2beb>] bdi_writeback_workfn+0x2cb/0x460
Feb 16 15:17:22 host-127 kernel: [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
Feb 16 15:17:22 host-127 kernel: [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
Feb 16 15:17:22 host-127 kernel: [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
Feb 16 15:17:22 host-127 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
Feb 16 15:17:22 host-127 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Feb 16 15:17:22 host-127 kernel: [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0
Feb 16 15:17:22 host-127 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
Feb 16 15:17:22 host-127 kernel: INFO: task clvmd:3096 blocked for more than 120 seconds.
Feb 16 15:17:22 host-127 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 16 15:17:22 host-127 kernel: clvmd           D ffff88003fc13680     0  3096      1 0x00000080
Feb 16 15:17:22 host-127 kernel: ffff88002349fa70 0000000000000086 ffff8800234471c0 ffff88002349ffd8
Feb 16 15:17:22 host-127 kernel: ffff88002349ffd8 ffff88002349ffd8 ffff8800234471c0 ffff88003fc13f48
Feb 16 15:17:22 host-127 kernel: ffff88001edbaf00 ffff8800234471c0 0000000000000000 0000000000000000
Feb 16 15:17:22 host-127 kernel: Call Trace:
Feb 16 15:17:22 host-127 kernel: [<ffffffff8160955d>] io_schedule+0x9d/0x130
Feb 16 15:17:22 host-127 kernel: [<ffffffff81203ea3>] do_blockdev_direct_IO+0xc03/0x2620
Feb 16 15:17:22 host-127 kernel: [<ffffffff811ffa60>] ? I_BDEV+0x10/0x10
Feb 16 15:17:22 host-127 kernel: [<ffffffff81205915>] __blockdev_direct_IO+0x55/0x60
Feb 16 15:17:22 host-127 kernel: [<ffffffff811ffa60>] ? I_BDEV+0x10/0x10
Feb 16 15:17:22 host-127 kernel: [<ffffffff812000b7>] blkdev_direct_IO+0x57/0x60
Feb 16 15:17:22 host-127 kernel: [<ffffffff811ffa60>] ? I_BDEV+0x10/0x10
Feb 16 15:17:22 host-127 kernel: [<ffffffff81158213>] generic_file_aio_read+0x6d3/0x750
Feb 16 15:17:22 host-127 kernel: [<ffffffff811e61de>] ? mntput_no_expire+0x3e/0x120
Feb 16 15:17:22 host-127 kernel: [<ffffffff811e62e4>] ? mntput+0x24/0x40
Feb 16 15:17:22 host-127 kernel: [<ffffffff8120063c>] blkdev_aio_read+0x4c/0x70
Feb 16 15:17:22 host-127 kernel: [<ffffffff811c5d6d>] do_sync_read+0x8d/0xd0
Feb 16 15:17:22 host-127 kernel: [<ffffffff811c644c>] vfs_read+0x9c/0x170
Feb 16 15:17:22 host-127 kernel: [<ffffffff811c6f78>] SyS_read+0x58/0xb0
Feb 16 15:17:22 host-127 kernel: [<ffffffff81613da9>] system_call_fastpath+0x16/0x1b
Feb 16 15:17:28 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 3.950000
Feb 16 15:17:58 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 3.970000
Feb 16 15:18:11 host-127 lvm[4293]: Error locking on node 1: Command timed out
Feb 16 15:18:11 host-127 lvm[4293]: Failed to replace faulty devices in black_bird/synced_random_raid1_2legs_1.
Feb 16 15:18:28 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 3.980000
Feb 16 15:18:58 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 3.990000
Feb 16 15:19:28 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 3.990000
Feb 16 15:19:58 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 4.000000
Feb 16 15:20:11 host-127 lvm[4293]: Error locking on node 1: Command timed out
Feb 16 15:20:28 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 4.000000
Feb 16 15:20:58 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 4.000000

[...]

Feb 16 15:41:59 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 5.000000
Feb 16 15:42:29 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 5.000000
Feb 16 15:42:59 host-127 crmd[2523]: notice: throttle_handle_load: High CPU load detected: 5.000000
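For completeness, once the hang is cleared (or the node rebooted), the offlined leg can be brought back and the array repaired by hand. This is a sketch under the assumption that the device was only set offline in sysfs, as above, and not removed:

# Bring the failed leg back and repair the raid (illustrative only)
echo running > /sys/block/sdh/device/state
pvscan                                               # re-detect the PV
lvconvert --repair black_bird/synced_random_raid1_2legs_1
lvs -a -o +devices black_bird                        # verify all three images are present again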




Version-Release number of selected component (if applicable):
3.10.0-229.el7.x86_64
lvm2-2.02.115-3.el7    BUILT: Wed Jan 28 09:59:01 CST 2015
lvm2-libs-2.02.115-3.el7    BUILT: Wed Jan 28 09:59:01 CST 2015
lvm2-cluster-2.02.115-3.el7    BUILT: Wed Jan 28 09:59:01 CST 2015
device-mapper-1.02.93-3.el7    BUILT: Wed Jan 28 09:59:01 CST 2015
device-mapper-libs-1.02.93-3.el7    BUILT: Wed Jan 28 09:59:01 CST 2015
device-mapper-event-1.02.93-3.el7    BUILT: Wed Jan 28 09:59:01 CST 2015
device-mapper-event-libs-1.02.93-3.el7    BUILT: Wed Jan 28 09:59:01 CST 2015
device-mapper-persistent-data-0.4.1-2.el7    BUILT: Wed Nov 12 12:39:46 CST 2014
cmirror-2.02.115-3.el7    BUILT: Wed Jan 28 09:59:01 CST 2015


How reproducible:
Every time

Comment 1 Corey Marthaler 2015-02-16 22:13:39 UTC
Created attachment 992381 [details]
log and dump of kern stack from cluster node host-127

