Bug 1380833 - mirror allocation/repair deadlock
Summary: mirror allocation/repair deadlock
Keywords:
Status: CLOSED DUPLICATE of bug 1380521
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.3
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: rc
: ---
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-09-30 17:13 UTC by Corey Marthaler
Modified: 2021-09-03 12:38 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-03 18:38:39 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Corey Marthaler 2016-09-30 17:13:08 UTC
Description of problem:
================================================================================
Iteration 0.6 started at Thu Sep 29 17:17:49 CDT 2016
================================================================================
ACTUAL LEG ORDER: /dev/sdc1 /dev/sdb1
Scenario kill_non_primary_leg: Kill non primary leg
********* Mirror info for this scenario *********
* mirrors:            mirror_1
* leg devices:        /dev/sdc1 /dev/sdb1
* log devices:        
* failpv(s):          /dev/sdb1
* failnode(s):        host-117
* lvmetad:            1
* leg fault policy:   allocate
* log fault policy:   allocate
*************************************************


Current mirror/raid device structure(s):
  LV                  Attr       LSize   Cpy%Sync Devices                                  
   mirror_1            mwi-aom---   2.00g 100.00   mirror_1_mimage_0(0),mirror_1_mimage_1(0)
   [mirror_1_mimage_0] iwi-aom---   2.00g          /dev/sdc1(0)                             
   [mirror_1_mimage_1] iwi-aom---   2.00g          /dev/sdb1(0)                             
   root                -wi-ao----   6.20g          /dev/vda2(205)                           
   swap                -wi-ao---- 820.00m          /dev/vda2(0)                             


PV=/dev/sdb1
        mirror_1_mimage_1: 5.1
PV=/dev/sdb1
        mirror_1_mimage_1: 5.1

Writing verification files (checkit) to mirror(s) on...
        ---- host-117 ----

<start name="host-117_mirror_1"  pid="15277" time="Thu Sep 29 17:17:54 2016" type="cmd" />
Sleeping 15 seconds to get some outsanding I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-117 ----

Name             GrpID RgID ObjType ArID ArStart ArSize  RMrg/s WMrg/s R/s  W/s  RSz/s WSz/s   AvgRqSz QSize Util% AWait RdAWait WrAWait
host-117_load        0    0 group      0 284.00m 980.00k   0.00   0.00 0.00 2.00     0 980.00k 490.00k  0.13  6.60 10.50    0.00   10.50
Name             GrpID RgID ObjType RgStart RgSize  #Areas ArSize  ProgID 
host-117_load        0    0 group   284.00m 980.00k      1 980.00k dmstats

Disabling device sdb on host-117rescan device...
  /dev/sdb1: read failed after 0 of 512 at 21467824128: Input/output error
  /dev/sdb1: read failed after 0 of 512 at 21467938816: Input/output error
  /dev/sdb1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdb1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sdb1: read failed after 0 of 2048 at 0: Input/output error


Getting recovery check start time from /var/log/messages: Sep 29 17:18
Attempting I/O to cause mirror down conversion(s) on host-117
dd if=/dev/zero of=/mnt/mirror_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.265292 s, 158 MB/s

[deadlock]


  LV                  VG            Attr       LSize   Cpy%Sync Devices                                  
  mirror_1            revolution_9  mwi-aom---   2.00g 0.00     mirror_1_mimage_0(0),mirror_1_mimage_1(0)
  [mirror_1_mimage_0] revolution_9  Iwi-aom---   2.00g          /dev/sdc1(0)                             
  [mirror_1_mimage_1] revolution_9  Iwi-aom---   2.00g          /dev/sdb1(0)                             




Sep 29 17:18:19 host-117 lvm[3828]: Log device 253:2 has failed (D).
Sep 29 17:18:19 host-117 lvm[3828]: Device failure in revolution_9-mirror_1.
Sep 29 17:18:19 host-117 lvm[3828]: WARNING: Device for PV 3Ho10n-LX0i-OVfp-48NY-l1Md-1is9-mUfiuT not found or rejected by a filter.
Sep 29 17:18:19 host-117 kernel: sd 2:0:0:1: rejecting I/O to offline device
Sep 29 17:18:19 host-117 kernel: sd 2:0:0:1: rejecting I/O to offline device
Sep 29 17:18:19 host-117 lvm[3828]: /dev/sdb1: read failed after 0 of 512 at 4096: Input/output error
Sep 29 17:18:19 host-117 kernel: sd 2:0:0:1: rejecting I/O to offline device
Sep 29 17:18:19 host-117 lvm[3828]: /dev/sdb1: read failed after 0 of 512 at 4096: Input/output error
Sep 29 17:18:19 host-117 lvm[3828]: WARNING: Failed to write an MDA of VG revolution_9.
Sep 29 17:18:19 host-117 kernel: sd 2:0:0:1: rejecting I/O to offline device
Sep 29 17:18:19 host-117 lvm[3828]: WARNING: Failed to write an MDA of VG revolution_9.
Sep 29 17:18:19 host-117 kernel: sd 2:0:0:1: rejecting I/O to offline device
Sep 29 17:18:19 host-117 lvm[3828]: WARNING: Failed to write an MDA of VG revolution_9.
Sep 29 17:18:19 host-117 multipathd: dm-3: remove map (uevent)
Sep 29 17:18:19 host-117 multipathd: dm-3: devmap not registered, can't remove
Sep 29 17:18:19 host-117 kernel: sd 2:0:0:1: rejecting I/O to offline device
Sep 29 17:18:19 host-117 lvm[3828]: WARNING: Failed to write an MDA of VG revolution_9.
Sep 29 17:18:19 host-117 multipathd: dm-3: remove map (uevent)
Sep 29 17:18:19 host-117 multipathd: dm-2: remove map (uevent)
Sep 29 17:18:19 host-117 multipathd: dm-2: devmap not registered, can't remove
Sep 29 17:18:19 host-117 kernel: sd 2:0:0:1: rejecting I/O to offline device
Sep 29 17:18:19 host-117 lvm[3828]: Mirror status: 1 of 2 images failed.
Sep 29 17:18:19 host-117 lvm[3828]: Trying to up-convert to 2 images, 0 logs.
Sep 29 17:18:19 host-117 lvm[3828]: WARNING: Failed to write an MDA of VG revolution_9.
Sep 29 17:18:19 host-117 multipathd: dm-2: remove map (uevent)
Sep 29 17:18:19 host-117 kernel: sd 2:0:0:1: rejecting I/O to offline device
Sep 29 17:18:19 host-117 kernel: device-mapper: raid1: Write error during recovery (error = 0x1)
Sep 29 17:18:19 host-117 kernel: sd 2:0:0:1: rejecting I/O to offline device
Sep 29 17:18:19 host-117 kernel: device-mapper: raid1: Write error during recovery (error = 0x1)
Sep 29 17:18:19 host-117 kernel: sd 2:0:0:1: rejecting I/O to offline device


Sep 29 17:18:34 host-117 lvm[3828]: revolution_9/mirror_1: Converted: 0.00%
Sep 29 17:18:49 host-117 lvm[3828]: revolution_9/mirror_1: Converted: 0.00%
Sep 29 17:19:04 host-117 lvm[3828]: revolution_9/mirror_1: Converted: 0.00%
Sep 29 17:19:19 host-117 lvm[3828]: revolution_9/mirror_1: Converted: 0.00%
Sep 29 17:19:34 host-117 lvm[3828]: revolution_9/mirror_1: Converted: 0.00%
Sep 29 17:19:49 host-117 lvm[3828]: revolution_9/mirror_1: Converted: 0.00%
Sep 29 17:20:01 host-117 systemd: Started Session 3 of user root.
Sep 29 17:20:01 host-117 systemd: Starting Session 3 of user root.
Sep 29 17:20:04 host-117 lvm[3828]: revolution_9/mirror_1: Converted: 0.00%
Sep 29 17:20:19 host-117 lvm[3828]: revolution_9/mirror_1: Converted: 0.00%
Sep 29 17:20:24 host-117 kernel: INFO: task jbd2/dm-6-8:3964 blocked for more than 120 seconds.
Sep 29 17:20:24 host-117 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 29 17:20:24 host-117 kernel: jbd2/dm-6-8     D ffff880028a99400     0  3964      2 0x00000080
Sep 29 17:20:24 host-117 kernel: ffff88003abefc90 0000000000000046 ffff880028081f60 ffff88003abeffd8
Sep 29 17:20:24 host-117 kernel: ffff88003abeffd8 ffff88003abeffd8 ffff880028081f60 ffff88003abefda0
Sep 29 17:20:24 host-117 kernel: ffff88003d3de8c0 ffff880028081f60 ffff88003abefd88 ffff880028a99400
Sep 29 17:20:24 host-117 kernel: Call Trace:
Sep 29 17:20:24 host-117 kernel: [<ffffffff8168b399>] schedule+0x29/0x70
Sep 29 17:20:24 host-117 kernel: [<ffffffffa037d238>] jbd2_journal_commit_transaction+0x248/0x19a0 [jbd2]
Sep 29 17:20:24 host-117 kernel: [<ffffffff810ca86e>] ? account_entity_dequeue+0xae/0xd0
Sep 29 17:20:24 host-117 kernel: [<ffffffff810ce35c>] ? dequeue_entity+0x11c/0x5d0
Sep 29 17:20:24 host-117 kernel: [<ffffffff81060abf>] ? kvm_clock_read+0x1f/0x30
Sep 29 17:20:24 host-117 kernel: [<ffffffff81029569>] ? __switch_to+0xd9/0x4c0
Sep 29 17:20:24 host-117 kernel: [<ffffffff810b1600>] ? wake_up_atomic_t+0x30/0x30
Sep 29 17:20:24 host-117 kernel: [<ffffffff8109807e>] ? try_to_del_timer_sync+0x5e/0x90
Sep 29 17:20:24 host-117 kernel: [<ffffffffa0382e99>] kjournald2+0xc9/0x260 [jbd2]
Sep 29 17:20:24 host-117 kernel: [<ffffffff810b1600>] ? wake_up_atomic_t+0x30/0x30
Sep 29 17:20:24 host-117 kernel: [<ffffffffa0382dd0>] ? commit_timeout+0x10/0x10 [jbd2]
Sep 29 17:20:24 host-117 kernel: [<ffffffff810b052f>] kthread+0xcf/0xe0
Sep 29 17:20:24 host-117 kernel: [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
Sep 29 17:20:24 host-117 kernel: [<ffffffff81696218>] ret_from_fork+0x58/0x90
Sep 29 17:20:24 host-117 kernel: [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
Sep 29 17:20:24 host-117 kernel: INFO: task kworker/u2:2:4786 blocked for more than 120 seconds.
Sep 29 17:20:24 host-117 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 29 17:20:24 host-117 kernel: kworker/u2:2    D ffffffff81689410     0  4786      2 0x00000080
Sep 29 17:20:24 host-117 kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:6)
Sep 29 17:20:24 host-117 kernel: ffff8800224eb830 0000000000000046 ffff880020d48000 ffff8800224ebfd8
Sep 29 17:20:24 host-117 kernel: ffff8800224ebfd8 ffff8800224ebfd8 ffff880020d48000 ffff88003fc16c40
Sep 29 17:20:24 host-117 kernel: 0000000000000000 7fffffffffffffff ffff88003ff5e6e8 ffffffff81689410
Sep 29 17:20:24 host-117 kernel: Call Trace:
Sep 29 17:20:24 host-117 kernel: [<ffffffff81689410>] ? bit_wait+0x50/0x50
Sep 29 17:20:24 host-117 kernel: [<ffffffff8168b399>] schedule+0x29/0x70
Sep 29 17:20:24 host-117 kernel: [<ffffffff81688de9>] schedule_timeout+0x239/0x2d0
Sep 29 17:20:24 host-117 kernel: [<ffffffff815b6513>] ? ip_output+0x73/0xe0
Sep 29 17:20:24 host-117 kernel: [<ffffffff81060aef>] ? kvm_clock_get_cycles+0x1f/0x30
Sep 29 17:20:24 host-117 kernel: [<ffffffff810eaa6c>] ? ktime_get_ts64+0x4c/0xf0
Sep 29 17:20:24 host-117 kernel: [<ffffffff81689410>] ? bit_wait+0x50/0x50
Sep 29 17:20:24 host-117 kernel: [<ffffffff8168a93e>] io_schedule_timeout+0xae/0x130
Sep 29 17:20:24 host-117 kernel: [<ffffffff8168a9d8>] io_schedule+0x18/0x20
Sep 29 17:20:24 host-117 kernel: [<ffffffff81689421>] bit_wait_io+0x11/0x50
Sep 29 17:20:24 host-117 kernel: [<ffffffff81688f45>] __wait_on_bit+0x65/0x90
Sep 29 17:20:24 host-117 kernel: [<ffffffff8117fb61>] wait_on_page_bit+0x81/0xa0
Sep 29 17:20:24 host-117 kernel: [<ffffffff810b16c0>] ? wake_bit_function+0x40/0x40
Sep 29 17:20:24 host-117 kernel: [<ffffffffa039d273>] mpage_prepare_extent_to_map+0x1e3/0x2e0 [ext4]
Sep 29 17:20:24 host-117 kernel: [<ffffffffa03a187b>] ext4_writepages+0x45b/0xd60 [ext4]
Sep 29 17:20:24 host-117 kernel: [<ffffffff812ab683>] ? selinux_socket_sendmsg+0x23/0x30
Sep 29 17:20:24 host-117 kernel: [<ffffffff810c95a8>] ? __enqueue_entity+0x78/0x80
Sep 29 17:20:24 host-117 kernel: [<ffffffff8118d24e>] do_writepages+0x1e/0x40
Sep 29 17:20:24 host-117 kernel: [<ffffffff81228670>] __writeback_single_inode+0x40/0x210
Sep 29 17:20:24 host-117 kernel: [<ffffffff8122935e>] writeback_sb_inodes+0x25e/0x420
Sep 29 17:20:24 host-117 kernel: [<ffffffff81229c9f>] wb_writeback+0xff/0x2f0
Sep 29 17:20:24 host-117 kernel: [<ffffffff810ab776>] ? set_worker_desc+0x86/0xb0
Sep 29 17:20:24 host-117 kernel: [<ffffffff8122bc45>] bdi_writeback_workfn+0x115/0x460
Sep 29 17:20:24 host-117 kernel: [<ffffffff810a7f3b>] process_one_work+0x17b/0x470
Sep 29 17:20:24 host-117 kernel: [<ffffffff810a8d76>] worker_thread+0x126/0x410


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Corey Marthaler 2016-09-30 17:13:40 UTC
3.10.0-510.el7.x86_64

lvm2-2.02.166-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
lvm2-libs-2.02.166-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
lvm2-cluster-2.02.166-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-1.02.135-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-libs-1.02.135-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-event-1.02.135-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-event-libs-1.02.135-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016

Comment 3 Corey Marthaler 2016-10-03 16:10:49 UTC
This is reproducible easily.  It appears that lvm may be using the failed device for allocation of the new device?

================================================================================
Iteration 0.2 started at Mon Oct  3 10:52:58 CDT 2016
================================================================================
ACTUAL LEG ORDER: /dev/sda1 /dev/sdc1 /dev/sde1
Scenario kill_primary_leg: Kill primary leg
********* Mirror info for this scenario *********
* mirrors:            mirror_1
* leg devices:        /dev/sda1 /dev/sdc1 /dev/sde1
* log devices:        /dev/sdg1
* failpv(s):          /dev/sda1
* failnode(s):        host-122
* lvmetad:            1
* leg fault policy:   allocate
* log fault policy:   allocate
*************************************************
                                                                                                                                                                                                                  
Current mirror/raid device structure(s):                                                                                                                                                                          
  LV                  Attr       LSize Cpy%Sync Devices                                                                                                                                                           
  mirror_1            mwi-aom--- 2.00g 100.00   mirror_1_mimage_0(0),mirror_1_mimage_1(0),mirror_1_mimage_2(0)                                                                                                    
  [mirror_1_mimage_0] iwi-aom--- 2.00g          /dev/sda1(0)                                                                                                                                                      
  [mirror_1_mimage_1] iwi-aom--- 2.00g          /dev/sdc1(0)                                                                                                                                                      
  [mirror_1_mimage_2] iwi-aom--- 2.00g          /dev/sde1(0)                                                                                                                                                      
  [mirror_1_mlog]     lwi-aom--- 4.00m          /dev/sdg1(0)

PV=/dev/sda1
        mirror_1_mimage_0: 4.1
PV=/dev/sda1
        mirror_1_mimage_0: 4.1

Writing verification files (checkit) to mirror(s) on...
        ---- host-122 ----

<start name="host-122_mirror_1"  pid="9561" time="Mon Oct  3 10:53:05 2016" type="cmd" />
Sleeping 15 seconds to get some outsanding I/O locks before the failure 
Verifying files (checkit) on mirror(s) on...
        ---- host-122 ----


Disabling device sda on host-122rescan device...
  /dev/sda1: read failed after 0 of 512 at 21467824128: Input/output error
  /dev/sda1: read failed after 0 of 512 at 21467938816: Input/output error
  /dev/sda1: read failed after 0 of 512 at 0: Input/output error
  /dev/sda1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sda1: read failed after 0 of 2048 at 0: Input/output error


Getting recovery check start time from /var/log/messages: Oct  3 10:53
Attempting I/O to cause mirror down conversion(s) on host-122
dd if=/dev/zero of=/mnt/mirror_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.139669 s, 300 MB/s

[root@host-122 ~]# lvs -a -o +devices
  LV                     VG            Attr       LSize    Log             Cpy%Sync Convert                Devices
  mirror_1               revolution_9  cwi-aom---   2.00g                  0.00     [mirror_1_mimagetmp_3] mirror_1_mimagetmp_3(0),mirror_1_mimage_3(0)
  [mirror_1_mimage_1]    revolution_9  iwi-aom---   2.00g                                                  /dev/sdc1(0)
  [mirror_1_mimage_2]    revolution_9  iwi-aom---   2.00g                                                  /dev/sde1(0)
  [mirror_1_mimage_3]    revolution_9  Iwi-aom---   2.00g                                                  /dev/sda1(0)
  [mirror_1_mimagetmp_3] revolution_9  mwi-aom---   2.00g  [mirror_1_mlog] 100.00                          mirror_1_mimage_1(0),mirror_1_mimage_2(0)   
  [mirror_1_mlog]        revolution_9  lwi-aom---   4.00m                                                  /dev/sdg1(0)


# LOG

Oct  3 10:53:27 host-122 qarshd[4743]: Running cmdline: echo offline > /sys/block/sda/device/state
Oct  3 10:53:28 host-122 qarshd[4747]: Running cmdline: pvscan --cache /dev/sda1
Oct  3 10:53:28 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:28 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:28 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:28 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:28 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:28 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
[...]
Oct  3 10:53:28 host-122 lvm[1278]: Log device 253:2 has failed (D).
Oct  3 10:53:28 host-122 lvm[1278]: Device failure in revolution_9-mirror_1.
Oct  3 10:53:28 host-122 lvm[1278]: WARNING: Device for PV EvH7eU-edwm-qhxJ-w3LI-PgGP-s3xE-pt0sGE not found or rejected by a filter.
Oct  3 10:53:28 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:28 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:28 host-122 lvm[1278]: /dev/sda1: read failed after 0 of 512 at 4096: Input/output error
Oct  3 10:53:29 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:29 host-122 lvm[1278]: /dev/sda1: read failed after 0 of 512 at 4096: Input/output error
Oct  3 10:53:29 host-122 lvm[1278]: WARNING: Failed to write an MDA of VG revolution_9.
Oct  3 10:53:29 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:29 host-122 lvm[1278]: WARNING: Failed to write an MDA of VG revolution_9.
Oct  3 10:53:29 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:29 host-122 lvm[1278]: WARNING: Failed to write an MDA of VG revolution_9.
Oct  3 10:53:29 host-122 multipathd: dm-3: remove map (uevent)
Oct  3 10:53:29 host-122 multipathd: dm-3: devmap not registered, can't remove
Oct  3 10:53:29 host-122 multipathd: dm-3: remove map (uevent)
Oct  3 10:53:29 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:29 host-122 lvm[1278]: Mirror status: 1 of 3 images failed.
Oct  3 10:53:29 host-122 lvm[1278]: Trying to up-convert to 3 images, 1 logs.
Oct  3 10:53:29 host-122 lvm[1278]: WARNING: Failed to write an MDA of VG revolution_9.
Oct  3 10:53:29 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:29 host-122 lvm[1278]: WARNING: Failed to write an MDA of VG revolution_9.
Oct  3 10:53:29 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:29 host-122 kernel: device-mapper: raid1: Write error during recovery (error = 0x1)
Oct  3 10:53:30 host-122 lvm[1278]: WARNING: Failed to replace 1 of 3 images in volume revolution_9/mirror_1.
Oct  3 10:53:30 host-122 lvm[1278]: WARNING: Failed to replace 1 of 1 logs in volume revolution_9/mirror_1.
Oct  3 10:53:30 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:30 host-122 kernel: device-mapper: raid1: Write error during recovery (error = 0x1)
Oct  3 10:53:30 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:30 host-122 kernel: device-mapper: raid1: Write error during recovery (error = 0x1)
Oct  3 10:53:30 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:30 host-122 kernel: device-mapper: raid1: Write error during recovery (error = 0x1)
Oct  3 10:53:30 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:30 host-122 kernel: device-mapper: raid1: Write error during recovery (error = 0x1)
Oct  3 10:53:30 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:30 host-122 kernel: device-mapper: raid1: Write error during recovery (error = 0x1)
Oct  3 10:53:30 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:30 host-122 kernel: device-mapper: raid1: Write error during recovery (error = 0x1)
Oct  3 10:53:30 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:30 host-122 kernel: device-mapper: raid1: Write error during recovery (error = 0x1)
Oct  3 10:53:30 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:30 host-122 kernel: device-mapper: raid1: Write error during recovery (error = 0x1)
Oct  3 10:53:30 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:30 host-122 kernel: device-mapper: raid1: Write error during recovery (error = 0x1)
[...]
Oct  3 10:53:41 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:41 host-122 kernel: sd 3:0:0:1: rejecting I/O to offline device
Oct  3 10:53:45 host-122 lvm[1278]: revolution_9/mirror_1: Converted: 0.00%
Oct  3 10:54:00 host-122 lvm[1278]: revolution_9/mirror_1: Converted: 0.00%
Oct  3 10:54:15 host-122 lvm[1278]: revolution_9/mirror_1: Converted: 0.00%
Oct  3 10:54:30 host-122 lvm[1278]: revolution_9/mirror_1: Converted: 0.00%
Oct  3 10:54:45 host-122 lvm[1278]: revolution_9/mirror_1: Converted: 0.00%
Oct  3 10:55:00 host-122 lvm[1278]: revolution_9/mirror_1: Converted: 0.00%
Oct  3 10:55:15 host-122 lvm[1278]: revolution_9/mirror_1: Converted: 0.00%
Oct  3 10:55:30 host-122 lvm[1278]: revolution_9/mirror_1: Converted: 0.00%
Oct  3 10:55:42 host-122 kernel: INFO: task jbd2/dm-6-8:4143 blocked for more than 120 seconds.
Oct  3 10:55:42 host-122 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  3 10:55:42 host-122 kernel: jbd2/dm-6-8     D ffff880027d54400     0  4143      2 0x00000080
Oct  3 10:55:42 host-122 kernel: ffff88003b403c90 0000000000000046 ffff88003aa94e70 ffff88003b403fd8
Oct  3 10:55:42 host-122 kernel: ffff88003b403fd8 ffff88003b403fd8 ffff88003aa94e70 ffff88003b403da0
Oct  3 10:55:42 host-122 kernel: ffff88003bd4a8c0 ffff88003aa94e70 ffff88003b403d88 ffff880027d54400
Oct  3 10:55:42 host-122 kernel: Call Trace:
Oct  3 10:55:42 host-122 kernel: [<ffffffff8168b399>] schedule+0x29/0x70
Oct  3 10:55:42 host-122 kernel: [<ffffffffa03a4238>] jbd2_journal_commit_transaction+0x248/0x19a0 [jbd2]
Oct  3 10:55:42 host-122 kernel: [<ffffffff810ca86e>] ? account_entity_dequeue+0xae/0xd0
Oct  3 10:55:42 host-122 kernel: [<ffffffff810ce35c>] ? dequeue_entity+0x11c/0x5d0
Oct  3 10:55:42 host-122 kernel: [<ffffffff81060abf>] ? kvm_clock_read+0x1f/0x30
Oct  3 10:55:42 host-122 kernel: [<ffffffff81029569>] ? __switch_to+0xd9/0x4c0
Oct  3 10:55:42 host-122 kernel: [<ffffffff810b1600>] ? wake_up_atomic_t+0x30/0x30
Oct  3 10:55:42 host-122 kernel: [<ffffffff8109807e>] ? try_to_del_timer_sync+0x5e/0x90
Oct  3 10:55:42 host-122 kernel: [<ffffffffa03a9e99>] kjournald2+0xc9/0x260 [jbd2]
Oct  3 10:55:42 host-122 kernel: [<ffffffff810b1600>] ? wake_up_atomic_t+0x30/0x30
Oct  3 10:55:42 host-122 kernel: [<ffffffffa03a9dd0>] ? commit_timeout+0x10/0x10 [jbd2]
Oct  3 10:55:42 host-122 kernel: [<ffffffff810b052f>] kthread+0xcf/0xe0
Oct  3 10:55:42 host-122 kernel: [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
Oct  3 10:55:42 host-122 kernel: [<ffffffff81696218>] ret_from_fork+0x58/0x90
Oct  3 10:55:42 host-122 kernel: [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
Oct  3 10:55:42 host-122 kernel: INFO: task kworker/u2:4:4224 blocked for more than 120 seconds.
Oct  3 10:55:42 host-122 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  3 10:55:42 host-122 kernel: kworker/u2:4    D ffffffff81689410     0  4224      2 0x00000080
Oct  3 10:55:42 host-122 kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:6)
Oct  3 10:55:42 host-122 kernel: ffff88001f48f830 0000000000000046 ffff88003aa96dd0 ffff88001f48ffd8
Oct  3 10:55:42 host-122 kernel: ffff88001f48ffd8 ffff88001f48ffd8 ffff88003aa96dd0 ffff88003fc16c40
Oct  3 10:55:42 host-122 kernel: 0000000000000000 7fffffffffffffff ffff88003ff5b6e8 ffffffff81689410
Oct  3 10:55:42 host-122 kernel: Call Trace:
Oct  3 10:55:42 host-122 kernel: [<ffffffff81689410>] ? bit_wait+0x50/0x50
Oct  3 10:55:42 host-122 kernel: [<ffffffff8168b399>] schedule+0x29/0x70
Oct  3 10:55:42 host-122 kernel: [<ffffffff81688de9>] schedule_timeout+0x239/0x2d0
Oct  3 10:55:42 host-122 kernel: [<ffffffff815b6513>] ? ip_output+0x73/0xe0
Oct  3 10:55:42 host-122 kernel: [<ffffffff81060aef>] ? kvm_clock_get_cycles+0x1f/0x30
Oct  3 10:55:42 host-122 kernel: [<ffffffff810eaa6c>] ? ktime_get_ts64+0x4c/0xf0
Oct  3 10:55:42 host-122 kernel: [<ffffffff81689410>] ? bit_wait+0x50/0x50
Oct  3 10:55:42 host-122 kernel: [<ffffffff8168a93e>] io_schedule_timeout+0xae/0x130
Oct  3 10:55:42 host-122 kernel: [<ffffffff8168a9d8>] io_schedule+0x18/0x20
Oct  3 10:55:42 host-122 kernel: [<ffffffff81689421>] bit_wait_io+0x11/0x50
Oct  3 10:55:42 host-122 kernel: [<ffffffff81688f45>] __wait_on_bit+0x65/0x90
Oct  3 10:55:42 host-122 kernel: [<ffffffff8117fb61>] wait_on_page_bit+0x81/0xa0
Oct  3 10:55:42 host-122 kernel: [<ffffffff810b16c0>] ? wake_bit_function+0x40/0x40
Oct  3 10:55:42 host-122 kernel: [<ffffffffa03c4273>] mpage_prepare_extent_to_map+0x1e3/0x2e0 [ext4]
Oct  3 10:55:42 host-122 kernel: [<ffffffffa03c887b>] ext4_writepages+0x45b/0xd60 [ext4]
Oct  3 10:55:42 host-122 kernel: [<ffffffff812ab683>] ? selinux_socket_sendmsg+0x23/0x30
Oct  3 10:55:42 host-122 kernel: [<ffffffff810c95a8>] ? __enqueue_entity+0x78/0x80
Oct  3 10:55:42 host-122 kernel: [<ffffffff8118d24e>] do_writepages+0x1e/0x40
Oct  3 10:55:42 host-122 kernel: [<ffffffff81228670>] __writeback_single_inode+0x40/0x210
Oct  3 10:55:42 host-122 kernel: [<ffffffff8122935e>] writeback_sb_inodes+0x25e/0x420
Oct  3 10:55:42 host-122 kernel: [<ffffffff81229c9f>] wb_writeback+0xff/0x2f0
Oct  3 10:55:42 host-122 kernel: [<ffffffff810ab776>] ? set_worker_desc+0x86/0xb0
Oct  3 10:55:42 host-122 kernel: [<ffffffff8122bc45>] bdi_writeback_workfn+0x115/0x460
Oct  3 10:55:42 host-122 kernel: [<ffffffff810a7f3b>] process_one_work+0x17b/0x470
Oct  3 10:55:42 host-122 kernel: [<ffffffff810a8d76>] worker_thread+0x126/0x410
Oct  3 10:55:42 host-122 kernel: [<ffffffff810a8c50>] ? rescuer_thread+0x460/0x460
Oct  3 10:55:42 host-122 kernel: [<ffffffff810b052f>] kthread+0xcf/0xe0
Oct  3 10:55:42 host-122 kernel: [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
Oct  3 10:55:42 host-122 kernel: [<ffffffff81696218>] ret_from_fork+0x58/0x90
Oct  3 10:55:42 host-122 kernel: [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140

Comment 4 Corey Marthaler 2016-10-03 18:38:39 UTC
same scenarios and appears to be the same issue, marking as dup of 1380521.

*** This bug has been marked as a duplicate of bug 1380521 ***


Note You need to log in before you can comment on or make changes to this bug.