Bug 1292481 - device mapper hung tasks on an openshift/docker system

Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.3
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assigned To: Mike Snitzer
QA Contact: Bruno Goncalves
Keywords: ZStream
Duplicates: 1267966
Blocks: 1296566
Reported: 2015-12-17 09:48 EST by Jeremy Eder
Modified: 2016-11-03 10:44 EDT
CC: 26 users
Fixed In Version: kernel-3.10.0-337.el7
Doc Type: Bug Fix
Clones: 1296566
Last Closed: 2016-11-03 10:44:51 EDT
Type: Bug
Attachments: None
Description Jeremy Eder 2015-12-17 09:48:43 EST
Description of problem:

Several recent reports of hung tasks when running many containers (several dozen, up to 100). This appears to be a new issue; we have been running at that scale for a long time.

http://post-office.corp.redhat.com/archives/aos-devel/2015-December/msg00571.html

The relevant components are:

RHEL7.2 kernel-3.10.0-327
docker-1.8.2-10
thin-lvm via docker-storage-setup
Comment 1 Jeremy Eder 2015-12-17 09:51:12 EST
Here is the last attempted container start before the issue occurred:

Dec 15 07:17:09 ose3-master.example.com systemd[1]: Started docker container 74850fb6e89c5e021eab904829d8539973389cac3683c79bde5e3f34793b1bfc.
Dec 15 07:17:09 ose3-master.example.com systemd[1]: Starting docker container 74850fb6e89c5e021eab904829d8539973389cac3683c79bde5e3f34793b1bfc.
Dec 15 07:17:09 ose3-master.example.com kernel: XFS (dm-77): Unmounting Filesystem
Dec 15 07:17:09 ose3-master.example.com kernel: SELinux: initialized (dev tmpfs, type tmpfs), uses mountpoint labeling
Dec 15 07:17:09 ose3-master.example.com kernel: SELinux: initialized (dev devpts, type devpts), uses mountpoint labeling
Dec 15 07:17:09 ose3-master.example.com kernel: SELinux: initialized (dev tmpfs, type tmpfs), uses mountpoint labeling
Dec 15 07:17:09 ose3-master.example.com kernel: SELinux: initialized (dev tmpfs, type tmpfs), uses mountpoint labeling
Dec 15 07:17:09 ose3-master.example.com docker[7714]: time="2015-12-15T07:17:09.413642640-05:00" level=error msg="Handler for GET /containers/{name:.*}/json returned error: Unknown device 713377cde7341cb5536ec15b9ada9a21bc41ad8e4a18d75b500f63f63c74c4c9"
Dec 15 07:17:09 ose3-master.example.com docker[7714]: time="2015-12-15T07:17:09.413678122-05:00" level=error msg="HTTP Error" err="Unknown device 713377cde7341cb5536ec15b9ada9a21bc41ad8e4a18d75b500f63f63c74c4c9" statusCode=500
Dec 15 07:17:09 ose3-master.example.com kernel: XFS (dm-80): Unmounting Filesystem

We don't see the "Unknown device" error above on any normal/healthy system.
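When docker reports "Unknown device <id>", one cross-check is whether device-mapper still has a thin device for that container id (docker's devicemapper backend names devices `docker-<major>:<minor>-<pool inode>-<container id>`). A hedged sketch: the captured listing below is invented for illustration (reusing ids from the log above); on the affected host you would pipe `dmsetup ls` directly instead of reading a file.

```shell
# Illustrative snapshot of `dmsetup ls` output (file and contents are hypothetical,
# modeled on the container ids seen in the journal above).
cat > /tmp/dmsetup-ls.txt <<'EOF'
docker-253:1-1835222-74850fb6e89c5e021eab904829d8539973389cac3683c79bde5e3f34793b1bfc (253:80)
docker-253:1-1835222-pool (253:2)
EOF

# The id docker complained about with "Unknown device":
id=713377cde7341cb5536ec15b9ada9a21bc41ad8e4a18d75b500f63f63c74c4c9

# If docker's metadata references the id but dm has no such device,
# docker's view and kernel device-mapper state have diverged.
if grep -q "$id" /tmp/dmsetup-ls.txt; then
    echo "dm still has a device for $id"
else
    echo "no dm device for $id -- docker metadata and dm state disagree"
fi
```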

A few minutes later, we see the following tracebacks in dmesg:

[ 1922.389502] INFO: task kworker/u66:0:81 blocked for more than 120 seconds.
[ 1922.396371] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1922.404206] kworker/u66:0   D ffff8805bd5c28e0     0    81      2 0x00000000
[ 1922.411293] Workqueue: kdmremove do_deferred_remove [dm_mod]
[ 1922.416963]  ffff88060e937cf0 0000000000000046 ffff88060e917300 ffff88060e937fd8
[ 1922.424387]  ffff88060e937fd8 ffff88060e937fd8 ffff88060e917300 ffff8805bd5c28d8
[ 1922.431809]  ffff8805bd5c28dc ffff88060e917300 00000000ffffffff ffff8805bd5c28e0
[ 1922.439230] Call Trace:
[ 1922.441676]  [<ffffffff8163b9e9>] schedule_preempt_disabled+0x29/0x70
[ 1922.448120]  [<ffffffff816396e5>] __mutex_lock_slowpath+0xc5/0x1c0
[ 1922.454290]  [<ffffffff81638b4f>] mutex_lock+0x1f/0x2f
[ 1922.459423]  [<ffffffffa0002e9d>] __dm_destroy+0xad/0x340 [dm_mod]
[ 1922.465595]  [<ffffffffa00047e3>] dm_destroy+0x13/0x20 [dm_mod]
[ 1922.471509]  [<ffffffffa0008d6d>] dm_hash_remove_all+0x6d/0x130 [dm_mod]
[ 1922.478203]  [<ffffffffa000b50a>] dm_deferred_remove+0x1a/0x20 [dm_mod]
[ 1922.484806]  [<ffffffffa0000dae>] do_deferred_remove+0xe/0x10 [dm_mod]
[ 1922.491321]  [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[ 1922.497143]  [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[ 1922.502703]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[ 1922.508528]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
[ 1922.513439]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[ 1922.520018]  [<ffffffff81645858>] ret_from_fork+0x58/0x90
[ 1922.525427]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[ 1922.532040] INFO: task docker:77805 blocked for more than 120 seconds.
[ 1922.538610] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1922.546456] docker          D 0000000000000000     0 77805      1 0x00000080
[ 1922.553635]  ffff8805bf213ab0 0000000000000086 ffff880bbb116780 ffff8805bf213fd8
[ 1922.561164]  ffff8805bf213fd8 ffff8805bf213fd8 ffff880bbb116780 ffff8805bf213bf0
[ 1922.568636]  ffff8805bf213bf8 7fffffffffffffff ffff880bbb116780 0000000000000000
[ 1922.576138] Call Trace:
[ 1922.578608]  [<ffffffff8163a909>] schedule+0x29/0x70
[ 1922.583563]  [<ffffffff816385f9>] schedule_timeout+0x209/0x2d0
[ 1922.589386]  [<ffffffff8108e4cd>] ? mod_timer+0x11d/0x240
[ 1922.594774]  [<ffffffff8163acd6>] wait_for_completion+0x116/0x170
[ 1922.600858]  [<ffffffff810b8c10>] ? wake_up_state+0x20/0x20
[ 1922.606430]  [<ffffffff810ab676>] __synchronize_srcu+0x106/0x1a0
[ 1922.612459]  [<ffffffff810ab190>] ? call_srcu+0x70/0x70
[ 1922.617728]  [<ffffffff81219ebf>] ? __sync_blockdev+0x1f/0x40
[ 1922.623479]  [<ffffffff810ab72d>] synchronize_srcu+0x1d/0x20
[ 1922.629157]  [<ffffffffa000318d>] __dm_suspend+0x5d/0x220 [dm_mod]
[ 1922.635346]  [<ffffffffa0004c9a>] dm_suspend+0xca/0xf0 [dm_mod]
[ 1922.641269]  [<ffffffffa0009fe0>] ? table_load+0x380/0x380 [dm_mod]
[ 1922.647560]  [<ffffffffa000a174>] dev_suspend+0x194/0x250 [dm_mod]
[ 1922.653743]  [<ffffffffa0009fe0>] ? table_load+0x380/0x380 [dm_mod]
[ 1922.660025]  [<ffffffffa000aa25>] ctl_ioctl+0x255/0x500 [dm_mod]
[ 1922.666055]  [<ffffffff8112483d>] ? call_rcu_sched+0x1d/0x20
[ 1922.671720]  [<ffffffffa000ace3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[ 1922.677827]  [<ffffffff811f1ef5>] do_vfs_ioctl+0x2e5/0x4c0
[ 1922.683333]  [<ffffffff8128bc6e>] ? file_has_perm+0xae/0xc0
[ 1922.688909]  [<ffffffff811f2171>] SyS_ioctl+0xa1/0xc0
[ 1922.693950]  [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
[The same pair of tracebacks (kworker/u66:0 blocked in do_deferred_remove -> __dm_destroy, and docker:77805 blocked in dm_suspend -> synchronize_srcu) repeats verbatim at roughly 2042s, 2163s, 2283s, and 2403s; the duplicate dumps are omitted here.]
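A quick way to triage watchdog output like the above is to count how often each task is reported; the same two tasks recurring every ~120 seconds suggests a deadlock rather than transiently slow I/O. A minimal sketch (the sample file and path are ours, seeded with lines from this report; on the affected host you would feed the real dmesg capture):

```shell
# Hypothetical saved copy of the watchdog lines shown above.
cat > /tmp/dmesg-sample.txt <<'EOF'
[ 1922.389502] INFO: task kworker/u66:0:81 blocked for more than 120 seconds.
[ 1922.532040] INFO: task docker:77805 blocked for more than 120 seconds.
[ 2042.704387] INFO: task kworker/u66:0:81 blocked for more than 120 seconds.
[ 2042.846690] INFO: task docker:77805 blocked for more than 120 seconds.
EOF

# Pull out the task name following "INFO: task" and count repeats per task.
awk '{for (i = 1; i < NF; i++) if ($i == "task" && $(i-1) == "INFO:") print $(i+1)}' \
    /tmp/dmesg-sample.txt | sort | uniq -c | sort -rn
```

On the sample above this shows both kworker/u66:0:81 and docker:77805 reported twice, matching the repeating pattern in the full log.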
Comment 2 Mike Snitzer 2015-12-17 12:01:55 EST
We need this fix (already upstream): http://git.kernel.org/linus/2a708cff93f1
Comment 5 Mike Snitzer 2015-12-18 14:38:18 EST
*** Bug 1267966 has been marked as a duplicate of this bug. ***
Comment 10 Jeremy Eder 2015-12-22 06:53:55 EST
Test kernel seems to have resolved our deadlock issue.  Cluster is up and stable with the same container count that we had problems with last week (500 on 4 nodes).

3.10.0-327.4.4.el7.bz1292481.x86_64


** PM:  We would like to nominate this for 7.2.z
Comment 12 Jeremy Eder 2016-01-04 08:32:03 EST
Can this be evaluated for 7.2.z please?
Comment 17 Rafael Aquini 2016-01-08 09:12:36 EST
Patch(es) available on kernel-3.10.0-337.el7
Comment 25 Gerard Ryan 2016-02-18 06:34:38 EST
Hi, is there a known date when we can expect this to be pushed to stable repos?
Comment 26 Jeremy Eder 2016-02-18 06:57:19 EST
Hi Gerard -- we shipped this fix on Tuesday Feb 16.

Do a yum update and make sure you get kernel-3.10.0-327.10.1.el7 or later.

There was a different BZ tracking the RHEL7.2.z kernel release:
https://bugzilla.redhat.com/show_bug.cgi?id=1296566

This BZ is to track the fix into RHEL7.3.
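The "kernel-3.10.0-327.10.1.el7 or later" check can be scripted. A minimal sketch, with caveats: the helper name is ours, it only understands RHEL 7's `3.10.0-<build>` kernel strings, and a real deployment should use rpm's proper version comparison rather than this ad-hoc parsing. It treats build 327.10.x+ (the 7.2.z fix, per comment 26) and build 337+ (the 7.3 fix, per comment 17) as patched.

```shell
# Hypothetical helper: does this kernel release string carry the fix?
kernel_has_fix() {
    rel="${1#3.10.0-}"                       # e.g. "327.10.1.el7.x86_64" or "327.el7.x86_64"
    b1=$(echo "$rel" | cut -d. -f1)          # main build number, e.g. 327
    b2=$(echo "$rel" | cut -d. -f2)          # first z-stream digit; "el7" on GA kernels
    case "$b2" in *[!0-9]*|"") b2=0 ;; esac  # non-numeric means no z-stream suffix
    # Fixed in 337.el7 (7.3) and in the 327.10.1.el7 z-stream build.
    [ "$b1" -ge 337 ] || { [ "$b1" -eq 327 ] && [ "$b2" -ge 10 ]; }
}

# Example: check the kernel this shell is running on.
if kernel_has_fix "$(uname -r)"; then
    echo "running kernel includes the fix"
else
    echo "running kernel predates the fix -- run: yum update kernel"
fi
```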
Comment 30 Bruno Goncalves 2016-07-04 04:17:48 EDT
Tested kernel-3.10.0-445.el7; it passed our regression tests and the associated tickets are closed.

Setting SanityOnly, as we are not able to reproduce the problem in our environment.
Comment 32 errata-xmlrpc 2016-11-03 10:44:51 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2574.html
