Bug 1310661
Summary: | BUG: unable to handle kernel paging request at 65642072 followed by kernel panic | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Marian Csontos <mcsontos> |
Component: | kernel | Assignee: | Mike Snitzer <msnitzer> |
kernel sub component: | Thin Provisioning | QA Contact: | Bruno Goncalves <bgoncalv> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | unspecified | | |
Priority: | unspecified | CC: | agk, bgoncalv, cmarthal, jbrassow, mcsontos, msnitzer, rbednar, thornber |
Version: | 6.7 | Keywords: | Regression |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | i686 | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | kernel-2.6.32-633.el6 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | | |
: | 1347789 | Environment: | |
Last Closed: | 2016-05-10 23:43:26 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 908792 | | |
Description
Marian Csontos
2016-02-22 12:46:51 UTC
FYI, reproducible with the latest 6.8 build, 2.6.32-618.el6.i686.

Similar behaviour seen on a 64-bit system. It seems the difference is that it takes a while to fail after resuming the wrong dm table.

# dmesg | tail -n 30
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
dm-thin D 0000000000000000 0 3238 2 0x00000080
ffff880039d9fc58 0000000000000046 ffffffff81130d87 ffff880039d9fbc0
ffffffff81130ed5 ffff88003a223d80 ffff88003f378900 ffff880039d9fbe0
ffffffff811d6374 0000000100054053 ffff88003aa77ad8 ffff880039d9ffd8
Call Trace:
[<ffffffff81130d87>] ? mempool_free_slab+0x17/0x20
[<ffffffff81130ed5>] ? mempool_free+0x95/0xa0
[<ffffffff811d6374>] ? bio_free+0x64/0x70
[<ffffffff81046f28>] ? pvclock_clocksource_read+0x58/0xd0
[<ffffffff815499b5>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffffa0494440>] ? do_noflush_stop+0x0/0x20 [dm_thin_pool]
[<ffffffff8106b4e3>] ? perf_event_task_sched_out+0x33/0x70
[<ffffffff81549b46>] rwsem_down_read_failed+0x26/0x30
[<ffffffff8100969d>] ? __switch_to+0x7d/0x340
[<ffffffff812a6df4>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff81549044>] ? down_read+0x24/0x30
[<ffffffffa049a337>] dm_pool_changed_this_transaction+0x27/0x90 [dm_thin_pool]
[<ffffffffa0498ac0>] ? do_worker+0x0/0x790 [dm_thin_pool]
[<ffffffffa04991f3>] do_worker+0x733/0x790 [dm_thin_pool]
[<ffffffff81091388>] ? add_timer+0x18/0x30
[<ffffffffa0495d10>] ? do_waker+0x0/0x40 [dm_thin_pool]
[<ffffffffa0498ac0>] ? do_worker+0x0/0x790 [dm_thin_pool]
[<ffffffff8109fdc0>] worker_thread+0x170/0x2a0
[<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8109fc50>] ? worker_thread+0x0/0x2a0
[<ffffffff810a662e>] kthread+0x9e/0xc0
[<ffffffff8100c28a>] child_rip+0xa/0x20
[<ffffffff810a6590>] ? kthread+0x0/0xc0
[<ffffffff8100c280>] ? child_rip+0x0/0x20

# dmsetup status
vg-test: 0 2097152 linear
^^^^command hangs^^^^^

Packages:
2.6.32-616.el6.x86_64
lvm2-2.02.143-1.el6 BUILT: Wed Feb 24 14:59:50 CET 2016
lvm2-libs-2.02.143-1.el6 BUILT: Wed Feb 24 14:59:50 CET 2016
lvm2-cluster-2.02.143-1.el6 BUILT: Wed Feb 24 14:59:50 CET 2016
udev-147-2.71.el6 BUILT: Wed Feb 10 14:07:17 CET 2016
device-mapper-1.02.117-1.el6 BUILT: Wed Feb 24 14:59:50 CET 2016
device-mapper-libs-1.02.117-1.el6 BUILT: Wed Feb 24 14:59:50 CET 2016
device-mapper-event-1.02.117-1.el6 BUILT: Wed Feb 24 14:59:50 CET 2016
device-mapper-event-libs-1.02.117-1.el6 BUILT: Wed Feb 24 14:59:50 CET 2016
device-mapper-persistent-data-0.6.2-0.1.rc1.el6 BUILT: Wed Feb 10 16:52:15 CET 2016
cmirror-2.02.143-1.el6 BUILT: Wed Feb 24 14:59:50 CET 2016

So here comes an oops on the latest 4.5.0-0.rc5.git0.1.fc24.x86_64 running on my rawhide (64-bit C2D T61, bare metal). It essentially needs just some sleep after the metadata device is errored (2 sectors starting with the 6th sector of the metadata device are replaced with the error target):

[ 31.947648] device-mapper: thin: 253:4: metadata operation 'dm_pool_commit_metadata' failed: error = -5
[ 31.958999] device-mapper: thin: 253:4: aborting current metadata transaction
[ 31.968484] device-mapper: thin: 253:4: failed to abort metadata transaction
[ 31.977595] device-mapper: thin: 253:4: switching pool to failure mode
[ 31.986222] device-mapper: thin metadata: couldn't read superblock
[ 31.994410] device-mapper: thin: 253:4: failed to set 'needs_check' flag in metadata
[ 32.004230] device-mapper: thin: 253:4: dm_pool_get_metadata_transaction_id returned -22
[ 70.894044] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 70.895009] IP: [<ffffffff813f4e5e>] __list_add+0x2e/0xf0
[ 70.895009] PGD 0
[ 70.895009] Oops: 0000 [#1] SMP
[ 70.895009] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison libcrc32c xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_analog snd_hda_codec_generic coretemp kvm_intel arc4 iTCO_wdt iTCO_vendor_support kvm ppdev irqbypass iwl3945 iwlegacy i2c_i801 joydev mac80211 r592 cfg80211 snd_hda_intel memstick snd_hda_codec snd_hda_core snd_hwdep snd_seq lpc_ich acpi_cpufreq snd_seq_device e1000e snd_pcm thinkpad_acpi ptp shpchp snd_timer pps_core snd wmi soundcore parport_pc rfkill fjes parport tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace loop sunrpc binfmt_misc i915 sdhci_pci serio_raw sdhci mmc_core ata_generic pata_acpi yenta_socket i2c_algo_bit drm_kms_helper drm video
[ 70.895009] CPU: 0 PID: 87 Comm: kworker/u4:4 Not tainted 4.5.0-0.rc5.git0.1.fc24.x86_64 #1
[ 70.895009] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011
[ 70.895009] Workqueue: dm-thin do_worker [dm_thin_pool]
[ 70.895009] task: ffff880135e28000 ti: ffff8800b97b8000 task.ti: ffff8800b97b8000
[ 70.895009] RIP: 0010:[<ffffffff813f4e5e>] [<ffffffff813f4e5e>] __list_add+0x2e/0xf0
[ 70.895009] RSP: 0018:ffff8800b97bbc88 EFLAGS: 00010246
[ 70.895009] RAX: 00000000ffffffff RBX: ffff8800b97bbcb0 RCX: 0000000000000000
[ 70.895009] RDX: ffff8800ad536830 RSI: 0000000000000000 RDI: ffff8800b97bbcb0
[ 70.895009] RBP: ffff8800b97bbca0 R08: 0000000000000000 R09: 0000000000000000
[ 70.895009] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 70.895009] R13: ffff8800ad536830 R14: 00000000ffffffff R15: ffff8800ad536830
[ 70.895009] FS: 0000000000000000(0000) GS:ffff88013ba00000(0000) knlGS:0000000000000000
[ 70.895009] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 70.895009] CR2: 0000000000000000 CR3: 0000000001c0a000 CR4: 00000000000006f0
[ 70.895009] Stack:
[ 70.895009] ffff8800ad536828 ffff880135e28000 ffff8800ad53682c ffff8800b97bbd00
[ 70.895009] ffffffff817d0ab6 ffff8800b9718000 ffff8800b97bbce0 ffffffff810cf7b9
[ 70.895009] 000000004d9553ba ffff88013bb16dc0 ffff8800ad536828 ffff8800ad536c50
[ 70.895009] Call Trace:
[ 70.895009] [<ffffffff817d0ab6>] __mutex_lock_slowpath+0x96/0x120
[ 70.895009] [<ffffffff810cf7b9>] ? ttwu_do_wakeup+0x19/0xe0
[ 70.895009] [<ffffffff817d0b5f>] mutex_lock+0x1f/0x30
[ 70.895009] [<ffffffffa078f3ce>] dm_tm_issue_prefetches+0x3e/0x70 [dm_persistent_data]
[ 70.895009] [<ffffffffa07a9bd2>] dm_pool_issue_prefetches+0x12/0x14 [dm_thin_pool]
[ 70.895009] [<ffffffffa07a5a4d>] do_worker+0x5d/0x8b0 [dm_thin_pool]
[ 70.895009] [<ffffffff810bbc19>] ? move_linked_works+0x59/0x80
[ 70.895009] [<ffffffff810bd5a1>] ? pwq_activate_delayed_work+0x41/0xb0
[ 70.895009] [<ffffffff810bec57>] process_one_work+0x187/0x440
[ 70.895009] [<ffffffff810bef5e>] worker_thread+0x4e/0x480
[ 70.895009] [<ffffffff810bef10>] ? process_one_work+0x440/0x440
[ 70.895009] [<ffffffff810c4ea8>] kthread+0xd8/0xf0
[ 70.895009] [<ffffffff810c4dd0>] ? kthread_worker_fn+0x180/0x180
[ 70.895009] [<ffffffff817d323f>] ret_from_fork+0x3f/0x70
[ 70.895009] [<ffffffff810c4dd0>] ? kthread_worker_fn+0x180/0x180
[ 70.895009] Code: e5 41 55 41 54 53 48 81 3f a0 86 03 82 48 89 fb 49 89 f4 49 89 d5 74 44 48 81 7f 08 a0 86 03 82 74 3a 4d 8b 45 08 4d 39 e0 75 52 <4d> 8b 04 24 4d 39 c5 75 70 4c 39 e3 0f 84 8a 00 00 00 4c 39 eb
[ 70.895009] RIP [<ffffffff813f4e5e>] __list_add+0x2e/0xf0
[ 70.895009] RSP <ffff8800b97bbc88>
[ 70.895009] CR2: 0000000000000000
[ 70.895009] ---[ end trace 5435170255154967 ]---
[ 71.488024] BUG: unable to handle kernel paging request at ffffffffffffffd8
[ 71.489006] IP: [<ffffffff810c5580>] kthread_data+0x10/0x20
[ 71.489006] PGD 1c0d067 PUD 1c0f067 PMD 0
[ 71.489006] Oops: 0000 [#2] SMP
[ 71.489006] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison libcrc32c xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_analog snd_hda_codec_generic coretemp kvm_intel arc4 iTCO_wdt iTCO_vendor_support kvm ppdev irqbypass iwl3945 iwlegacy i2c_i801 joydev mac80211 r592 cfg80211 snd_hda_intel memstick snd_hda_codec snd_hda_core snd_hwdep snd_seq lpc_ich acpi_cpufreq snd_seq_device e1000e snd_pcm thinkpad_acpi ptp shpchp snd_timer pps_core snd wmi soundcore parport_pc rfkill fjes parport tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace loop sunrpc binfmt_misc i915 sdhci_pci serio_raw sdhci mmc_core ata_generic pata_acpi yenta_socket i2c_algo_bit drm_kms_helper drm video
[ 71.489006] CPU: 0 PID: 87 Comm: kworker/u4:4 Tainted: G D 4.5.0-0.rc5.git0.1.fc24.x86_64 #1
[ 71.489006] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011
[ 71.489006] task: ffff880135e28000 ti: ffff8800b97b8000 task.ti: ffff8800b97b8000
[ 71.489006] RIP: 0010:[<ffffffff810c5580>] [<ffffffff810c5580>] kthread_data+0x10/0x20
[ 71.489006] RSP: 0018:ffff8800b97bb960 EFLAGS: 00010002
[ 71.489006] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 71.489006] RDX: ffff880137004000 RSI: 0000000000000000 RDI: ffff880135e28000
[ 71.489006] RBP: ffff8800b97bb960 R08: ffff880135e280a8 R09: 0000000000000001
[ 71.489006] R10: 00000010a5046b23 R11: 0000000000000000 R12: 0000000000016dc0
[ 71.489006] R13: ffff880135e28658 R14: ffff880135e28000 R15: ffff88013ba16dc0
[ 71.489006] FS: 0000000000000000(0000) GS:ffff88013ba00000(0000) knlGS:0000000000000000
[ 71.489006] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 71.489006] CR2: 0000000000000028 CR3: 0000000001c0a000 CR4: 00000000000006f0
[ 71.489006] Stack:
[ 71.489006] ffff8800b97bb978 ffffffff810bfb41 ffff88013ba16dc0 ffff8800b97bb9c8
[ 71.489006] ffffffff817ce98a 0000000000000001 ffff880100000000 ffff880135e28000
[ 71.489006] ffff8800b97b8000 0000000000000000 ffff8800b97bba18 ffff8800b97bb4b0
[ 71.489006] Call Trace:
[ 71.489006] [<ffffffff810bfb41>] wq_worker_sleeping+0x11/0x90
[ 71.489006] [<ffffffff817ce98a>] __schedule+0x65a/0xa00
[ 71.489006] [<ffffffff817ced6c>] schedule+0x3c/0x90
[ 71.489006] [<ffffffff810a91de>] do_exit+0x7ce/0xb50
[ 71.489006] [<ffffffff8101a97a>] oops_end+0x9a/0xd0
[ 71.489006] [<ffffffff81068ece>] no_context+0x13e/0x390
[ 71.489006] [<ffffffff811113ae>] ? try_to_del_timer_sync+0x5e/0x90
[ 71.489006] [<ffffffff810691a0>] __bad_area_nosemaphore+0x80/0x1f0
[ 71.489006] [<ffffffff81069323>] bad_area_nosemaphore+0x13/0x20
[ 71.489006] [<ffffffff810695e7>] __do_page_fault+0xb7/0x400
[ 71.489006] [<ffffffff81069960>] do_page_fault+0x30/0x80
[ 71.489006] [<ffffffff817d5288>] page_fault+0x28/0x30
[ 71.489006] [<ffffffff813f4e5e>] ? __list_add+0x2e/0xf0
[ 71.489006] [<ffffffff817d0ab6>] __mutex_lock_slowpath+0x96/0x120
[ 71.489006] [<ffffffff810cf7b9>] ? ttwu_do_wakeup+0x19/0xe0
[ 71.489006] [<ffffffff817d0b5f>] mutex_lock+0x1f/0x30
[ 71.489006] [<ffffffffa078f3ce>] dm_tm_issue_prefetches+0x3e/0x70 [dm_persistent_data]
[ 71.489006] [<ffffffffa07a9bd2>] dm_pool_issue_prefetches+0x12/0x14 [dm_thin_pool]
[ 71.489006] [<ffffffffa07a5a4d>] do_worker+0x5d/0x8b0 [dm_thin_pool]
[ 71.489006] [<ffffffff810bbc19>] ? move_linked_works+0x59/0x80
[ 71.489006] [<ffffffff810bd5a1>] ? pwq_activate_delayed_work+0x41/0xb0
[ 71.489006] [<ffffffff810bec57>] process_one_work+0x187/0x440
[ 71.489006] [<ffffffff810bef5e>] worker_thread+0x4e/0x480
[ 71.489006] [<ffffffff810bef10>] ? process_one_work+0x440/0x440
[ 71.489006] [<ffffffff810c4ea8>] kthread+0xd8/0xf0
[ 71.489006] [<ffffffff810c4dd0>] ? kthread_worker_fn+0x180/0x180
[ 71.489006] [<ffffffff817d323f>] ret_from_fork+0x3f/0x70
[ 71.489006] [<ffffffff810c4dd0>] ? kthread_worker_fn+0x180/0x180
[ 71.489006] Code: 87 ab 70 00 e9 53 ff ff ff e8 1d 0a fe ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 87 e0 05 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
[ 71.489006] RIP [<ffffffff810c5580>] kthread_data+0x10/0x20
[ 71.489006] RSP <ffff8800b97bb960>
[ 71.489006] CR2: ffffffffffffffd8
[ 71.489006] ---[ end trace 5435170255154968 ]---
[ 71.489006] Fixing recursive fault but reboot is needed!

Reproduced with this tweak to an existing dmtest:
https://github.com/jthornber/device-mapper-test-suite/commit/b25b5ced93d1fbb94145db8f2166e95f74da2a9a

Fixed with this patch:
https://github.com/jthornber/linux-2.6/commit/596989fe2d510a9b8c40ca59e9c4794b5a754ed1

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
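For readers who want to picture the fault injection mentioned in the comments (two sectors of the thin-pool metadata device replaced with the error target), the following is a minimal sketch of how such a device-mapper table could be generated. It is not the reporter's actual reproducer, which is the dmtest change linked in the comments. The function name, the device path, and the device size are hypothetical, and "6th sector" is assumed here to mean zero-based sector offset 6. The sketch only builds the table text; it does not call dmsetup.

```python
def error_window_table(device: str, total_sectors: int,
                       first_bad: int = 6, bad_count: int = 2) -> str:
    """Return a device-mapper table that passes I/O through to `device`
    (linear target) except for `bad_count` sectors starting at sector
    `first_bad`, which are mapped to the `error` target.

    Each table line has the form: <start> <length> <target> [<args>].
    """
    if first_bad < 0 or bad_count <= 0 or first_bad + bad_count > total_sectors:
        raise ValueError("error window must lie inside the device")

    lines = []
    # Leading healthy region, if the window does not start at sector 0.
    if first_bad > 0:
        lines.append(f"0 {first_bad} linear {device} 0")
    # The failing window: every I/O to these sectors returns an error.
    lines.append(f"{first_bad} {bad_count} error")
    # Trailing healthy region, mapped 1:1 onto the backing device.
    tail_start = first_bad + bad_count
    if tail_start < total_sectors:
        tail_len = total_sectors - tail_start
        lines.append(f"{tail_start} {tail_len} linear {device} {tail_start}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical metadata device path and size (in 512-byte sectors).
    print(error_window_table("/dev/mapper/vg-pool_tmeta", 8192))
```

Loading a table like this over the metadata device and then letting the pool sit idle for a while matches the sequence in the log above, where the metadata commit fails with error -5 roughly 40 seconds before the oops in do_worker.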
*** Bug 1305983 has been marked as a duplicate of this bug. ***

Patch(es) available on kernel-2.6.32-633.el6

I have updated the builder and the test passed. I will let it run a while and report if there are any issues. Thanks.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2016-0855.html