Bug 1292481
| Field | Value |
| --- | --- |
| Summary | device mapper hung tasks on an openshift/docker system |
| Product | Red Hat Enterprise Linux 7 |
| Reporter | Jeremy Eder <jeder> |
| Component | kernel |
| Kernel sub component | Thin Provisioning |
| Assignee | Mike Snitzer <msnitzer> |
| QA Contact | Bruno Goncalves <bgoncalv> |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| CC | agk, akokshar, aquini, bdonahue, bgoncalv, dhoward, dshaw, dwalsh, ederevea, fedora, hklein, javier.ramirez, jeder, jmagrini, mpatocka, msnitzer, nmurray, pep, pm-rhel, sdodson, stalexan, thornber, tlavigne, uobergfe, vgoyal, xhe |
| Version | 7.3 |
| Keywords | ZStream |
| Target Milestone | rc |
| Hardware | All |
| OS | Linux |
| Fixed In Version | kernel-3.10.0-337.el7 |
| Doc Type | Bug Fix |
| Clones | 1296566 (view as bug list) |
| Bug Blocks | 1296566 |
| Last Closed | 2016-11-03 14:44:51 UTC |
| Type | Bug |
Description (Jeremy Eder, 2015-12-17 14:48:43 UTC)
Here is the last container that was attempted to be started before the issue occurred:

```
Dec 15 07:17:09 ose3-master.example.com systemd[1]: Started docker container 74850fb6e89c5e021eab904829d8539973389cac3683c79bde5e3f34793b1bfc.
Dec 15 07:17:09 ose3-master.example.com systemd[1]: Starting docker container 74850fb6e89c5e021eab904829d8539973389cac3683c79bde5e3f34793b1bfc.
Dec 15 07:17:09 ose3-master.example.com kernel: XFS (dm-77): Unmounting Filesystem
Dec 15 07:17:09 ose3-master.example.com kernel: SELinux: initialized (dev tmpfs, type tmpfs), uses mountpoint labeling
Dec 15 07:17:09 ose3-master.example.com kernel: SELinux: initialized (dev devpts, type devpts), uses mountpoint labeling
Dec 15 07:17:09 ose3-master.example.com kernel: SELinux: initialized (dev tmpfs, type tmpfs), uses mountpoint labeling
Dec 15 07:17:09 ose3-master.example.com kernel: SELinux: initialized (dev tmpfs, type tmpfs), uses mountpoint labeling
Dec 15 07:17:09 ose3-master.example.com docker[7714]: time="2015-12-15T07:17:09.413642640-05:00" level=error msg="Handler for GET /containers/{name:.*}/json returned error: Unknown device 713377cde7341cb5536ec15b9ada9a21bc41ad8e4a18d75b500f63f63c74c4c9"
Dec 15 07:17:09 ose3-master.example.com docker[7714]: time="2015-12-15T07:17:09.413678122-05:00" level=error msg="HTTP Error" err="Unknown device 713377cde7341cb5536ec15b9ada9a21bc41ad8e4a18d75b500f63f63c74c4c9" statusCode=500
Dec 15 07:17:09 ose3-master.example.com kernel: XFS (dm-80): Unmounting Filesystem
```

We do not see the "Unknown device" error above on any normal, healthy system. A few minutes later, we see the following tracebacks in dmesg:

```
[ 1922.389502] INFO: task kworker/u66:0:81 blocked for more than 120 seconds.
[ 1922.396371] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1922.404206] kworker/u66:0   D ffff8805bd5c28e0     0    81      2 0x00000000
[ 1922.411293] Workqueue: kdmremove do_deferred_remove [dm_mod]
[ 1922.416963]  ffff88060e937cf0 0000000000000046 ffff88060e917300 ffff88060e937fd8
[ 1922.424387]  ffff88060e937fd8 ffff88060e937fd8 ffff88060e917300 ffff8805bd5c28d8
[ 1922.431809]  ffff8805bd5c28dc ffff88060e917300 00000000ffffffff ffff8805bd5c28e0
[ 1922.439230] Call Trace:
[ 1922.441676]  [<ffffffff8163b9e9>] schedule_preempt_disabled+0x29/0x70
[ 1922.448120]  [<ffffffff816396e5>] __mutex_lock_slowpath+0xc5/0x1c0
[ 1922.454290]  [<ffffffff81638b4f>] mutex_lock+0x1f/0x2f
[ 1922.459423]  [<ffffffffa0002e9d>] __dm_destroy+0xad/0x340 [dm_mod]
[ 1922.465595]  [<ffffffffa00047e3>] dm_destroy+0x13/0x20 [dm_mod]
[ 1922.471509]  [<ffffffffa0008d6d>] dm_hash_remove_all+0x6d/0x130 [dm_mod]
[ 1922.478203]  [<ffffffffa000b50a>] dm_deferred_remove+0x1a/0x20 [dm_mod]
[ 1922.484806]  [<ffffffffa0000dae>] do_deferred_remove+0xe/0x10 [dm_mod]
[ 1922.491321]  [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[ 1922.497143]  [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[ 1922.502703]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[ 1922.508528]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
[ 1922.513439]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[ 1922.520018]  [<ffffffff81645858>] ret_from_fork+0x58/0x90
[ 1922.525427]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[ 1922.532040] INFO: task docker:77805 blocked for more than 120 seconds.
[ 1922.538610] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1922.546456] docker          D 0000000000000000     0 77805      1 0x00000080
[ 1922.553635]  ffff8805bf213ab0 0000000000000086 ffff880bbb116780 ffff8805bf213fd8
[ 1922.561164]  ffff8805bf213fd8 ffff8805bf213fd8 ffff880bbb116780 ffff8805bf213bf0
[ 1922.568636]  ffff8805bf213bf8 7fffffffffffffff ffff880bbb116780 0000000000000000
[ 1922.576138] Call Trace:
[ 1922.578608]  [<ffffffff8163a909>] schedule+0x29/0x70
[ 1922.583563]  [<ffffffff816385f9>] schedule_timeout+0x209/0x2d0
[ 1922.589386]  [<ffffffff8108e4cd>] ? mod_timer+0x11d/0x240
[ 1922.594774]  [<ffffffff8163acd6>] wait_for_completion+0x116/0x170
[ 1922.600858]  [<ffffffff810b8c10>] ? wake_up_state+0x20/0x20
[ 1922.606430]  [<ffffffff810ab676>] __synchronize_srcu+0x106/0x1a0
[ 1922.612459]  [<ffffffff810ab190>] ? call_srcu+0x70/0x70
[ 1922.617728]  [<ffffffff81219ebf>] ? __sync_blockdev+0x1f/0x40
[ 1922.623479]  [<ffffffff810ab72d>] synchronize_srcu+0x1d/0x20
[ 1922.629157]  [<ffffffffa000318d>] __dm_suspend+0x5d/0x220 [dm_mod]
[ 1922.635346]  [<ffffffffa0004c9a>] dm_suspend+0xca/0xf0 [dm_mod]
[ 1922.641269]  [<ffffffffa0009fe0>] ? table_load+0x380/0x380 [dm_mod]
[ 1922.647560]  [<ffffffffa000a174>] dev_suspend+0x194/0x250 [dm_mod]
[ 1922.653743]  [<ffffffffa0009fe0>] ? table_load+0x380/0x380 [dm_mod]
[ 1922.660025]  [<ffffffffa000aa25>] ctl_ioctl+0x255/0x500 [dm_mod]
[ 1922.666055]  [<ffffffff8112483d>] ? call_rcu_sched+0x1d/0x20
[ 1922.671720]  [<ffffffffa000ace3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[ 1922.677827]  [<ffffffff811f1ef5>] do_vfs_ioctl+0x2e5/0x4c0
[ 1922.683333]  [<ffffffff8128bc6e>] ? file_has_perm+0xae/0xc0
[ 1922.688909]  [<ffffffff811f2171>] SyS_ioctl+0xa1/0xc0
[ 1922.693950]  [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
```

The same two tracebacks (kworker/u66:0 blocked in __dm_destroy on the kdmremove workqueue, and docker blocked in __dm_suspend waiting on synchronize_srcu) then repeat in dmesg every 120 seconds, at approximately 2042, 2163, 2283, and 2403 seconds of uptime, identical apart from the timestamps.

We need this fix (already upstream): http://git.kernel.org/linus/2a708cff93f1

*** Bug 1267966 has been marked as a duplicate of this bug. ***

Test kernel seems to have resolved our deadlock issue. Cluster is up and stable with the same container count that we had problems with last week (500 containers on 4 nodes).
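For readers triaging a similar hang, the two symptoms above (tasks stuck in uninterruptible sleep, and device-mapper devices whose removal never completes) can be surfaced from a shell. This is a generic sketch, not part of the original report; it assumes procps `ps` and the `dmsetup` utility from the device-mapper package are installed, and that (as on RHEL 7) a pending deferred remove is reported in the `dmsetup info` state line:

```shell
#!/bin/sh
# Sketch: surface the two symptoms from the report above.
# Tasks in state D (uninterruptible sleep) are what the kernel's
# hung_task detector flags after 120 seconds.
echo "== Tasks in uninterruptible sleep (state D) =="
ps -eo pid,stat,comm | awk 'NR == 1 || $2 ~ /^D/'

# A device whose removal is deferred shows "DEFERRED REMOVE" in its
# state; -B6 pulls in the preceding lines, including the device name.
echo "== Device-mapper devices pending deferred remove =="
dmsetup info 2>/dev/null | grep -B6 "DEFERRED REMOVE" || echo "(none found)"
```

On a healthy host both sections should come back empty; during this deadlock, the kdmremove worker and the docker daemon both appear in the D-state list.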
3.10.0-327.4.4.el7.bz1292481.x86_64

PM: We would like to nominate this for 7.2.z. Can this be evaluated for 7.2.z, please?

Patch(es) available on kernel-3.10.0-337.el7

Hi, is there a known date when we can expect this to be pushed to the stable repos?

Hi Gerard -- we shipped this fix on Tuesday, Feb 16. Do a yum update and make sure you get kernel-3.10.0-327.10.1.el7 or later. There was a different BZ tracking the RHEL 7.2.z kernel release: https://bugzilla.redhat.com/show_bug.cgi?id=1296566 This BZ tracks the fix into RHEL 7.3.

Tested kernel-3.10.0-445.el7: it passed our regression test and the tickets are closed.

Setting SanityOnly as we are not able to reproduce the problem in our environment.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2574.html
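The advice above (yum update and confirm the running kernel is at or past the fixed build) can be sketched as a small shell check. The version strings come from the comments in this report; the helper name and the comparison logic (GNU `sort -V`) are illustrative assumptions, not an official Red Hat tool:

```shell
#!/bin/sh
# Sketch: report whether a RHEL 7 kernel version string includes the
# deferred-remove deadlock fix. Per the comments above,
# kernel-3.10.0-327.10.1.el7 is the first fixed 7.2.z build.
# check_kernel is a hypothetical helper, not part of any package.
check_kernel() {
    fixed="3.10.0-327.10.1.el7"
    running="${1%.x86_64}"              # drop the arch suffix
    # sort -V orders version strings numerically; if the fixed build
    # sorts first (or equal), the running kernel is at least as new.
    lowest=$(printf '%s\n%s\n' "$fixed" "$running" | sort -V | head -n1)
    if [ "$lowest" = "$fixed" ]; then
        echo "OK: $1 includes the dm deadlock fix"
    else
        echo "VULNERABLE: $1 predates kernel-$fixed, run yum update"
    fi
}

check_kernel "$(uname -r)"
```

Note that this only compares version strings, so a hotfix build such as the 3.10.0-327.4.4.el7.bz1292481 test kernel above would be reported as predating the official fixed build even though it carries the patch.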