Bug 1484587 - kernel BUG at block/blk-core.c:2054 (raid1 and raid10 resync broken)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 26
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2017-08-23 23:42 UTC by Jonathan Underwood
Modified: 2017-09-13 05:20 UTC (History)

Fixed In Version: kernel-4.12.11-300.fc26 kernel-4.12.11-200.fc25
Last Closed: 2017-09-12 00:23:39 UTC
Type: Bug


Attachments
Patch to apply upstream patch and spec file changes (5.08 KB, patch)
2017-08-29 19:55 UTC, Jonathan Underwood


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Linux Kernel | 196749 | 0 | None | None | None | 2019-03-17 20:24:14 UTC

Description Jonathan Underwood 2017-08-23 23:42:47 UTC
Description of problem:
I keep hitting the bug detailed in the backtrace below. I am running Fedora 26 on an HP MicroServer Gen8. I have a two-disk RAID 1 array, set up with mdadm, comprising two 2TB disks. I set up the RAID array as follows:

mdadm -R --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

I am hitting the bug when attempting to write random data to the md device by doing the following:

cryptsetup open --type plain /dev/md0 container --key-file /dev/random
dd bs=1M if=/dev/zero of=/dev/mapper/container status=progress

At some point during the dd invocation, after a few GB have been written, I see the following bug:



[  464.808442] ------------[ cut here ]------------
[  464.808445] kernel BUG at block/blk-core.c:2054!
[  464.808506] invalid opcode: 0000 [#1] SMP
[  464.808561] Modules linked in: dm_crypt ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support gpio_ich ipmi_ssif intel_uncore intel_rapl_perf raid1 ipmi_si ipmi_devintf hpwdt hpilo ipmi_msghandler acpi_power_meter pcc_cpufreq lpc_ich tpm_tis tpm_tis_core shpchp ie31200_edac tpm i2c_algo_bit
[  464.808724]  drm_kms_helper ttm crc32c_intel drm uas serio_raw usb_storage tg3 ptp pps_core
[  464.808790] CPU: 0 PID: 493 Comm: md0_resync Not tainted 4.12.5-300.fc26.x86_64 #1
[  464.808848] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 11/02/2015
[  464.808906] task: ffff8a68eb41c980 task.stack: ffffaf2881e8c000
[  464.808968] RIP: 0010:generic_make_request+0x2c4/0x2d0
[  464.809025] RSP: 0000:ffffaf2881e8fbe0 EFLAGS: 00010286
[  464.809241] RAX: ffff8a68eb41c980 RBX: ffff8a6836195a00 RCX: 0000000000000000
[  464.809460] RDX: 0000000000000402 RSI: 0000000000000001 RDI: ffff8a6836abedb8
[  464.809681] RBP: ffffaf2881e8fc30 R08: 0000000000000010 R09: 0000000000001000
[  464.809900] R10: ffffaf2881e8fc48 R11: 00000000ffffffff R12: 0000000000000008
[  464.810120] R13: ffff8a683601f600 R14: 0000000000000080 R15: ffff8a68eb8d8000
[  464.810341] FS:  0000000000000000(0000) GS:ffff8a690a600000(0000) knlGS:0000000000000000
[  464.810721] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  464.810938] CR2: 00007fb59a851000 CR3: 000000005d01c000 CR4: 00000000001406f0
[  464.811161] Call Trace:
[  464.811381]  ? bitmap_start_sync+0x56/0x100
[  464.811601]  raid1_sync_request+0xadf/0xb90 [raid1]
[  464.811820]  ? raid1_sync_request+0xadf/0xb90 [raid1]
[  464.812039]  ? is_mddev_idle+0xa6/0x10a
[  464.812256]  md_do_sync+0x8cd/0xee0
[  464.812475]  ? finish_wait+0x80/0x80
[  464.812693]  md_thread+0x125/0x170
[  464.812910]  ? md_thread+0x125/0x170
[  464.813128]  kthread+0x125/0x140
[  464.813346]  ? find_pers+0x70/0x70
[  464.813560]  ? kthread_park+0x60/0x60
[  464.813778]  ? do_syscall_64+0x67/0x140
[  464.814001]  ret_from_fork+0x25/0x30
[  464.814219] Code: bc 24 68 07 00 00 f0 49 83 ac 24 68 07 00 00 01 0f 85 85 fe ff ff 41 ff 94 24 78 07 00 00 e9 78 fe ff ff 4c 89 28 e9 ae fd ff ff <0f> 0b e8 95 6e cc ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 
[  464.814812] RIP: generic_make_request+0x2c4/0x2d0 RSP: ffffaf2881e8fbe0
[  464.815058] ---[ end trace 2b032b91acd7b852 ]---
[  464.820281] ------------[ cut here ]------------
[  464.820511] WARNING: CPU: 0 PID: 493 at kernel/exit.c:785 do_exit+0x51/0xb50
[  464.820734] Modules linked in: dm_crypt ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support gpio_ich ipmi_ssif intel_uncore intel_rapl_perf raid1 ipmi_si ipmi_devintf hpwdt hpilo ipmi_msghandler acpi_power_meter pcc_cpufreq lpc_ich tpm_tis tpm_tis_core shpchp ie31200_edac tpm i2c_algo_bit
[  464.822702]  drm_kms_helper ttm crc32c_intel drm uas serio_raw usb_storage tg3 ptp pps_core
[  464.823095] CPU: 0 PID: 493 Comm: md0_resync Tainted: G      D         4.12.5-300.fc26.x86_64 #1
[  464.823487] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 11/02/2015
[  464.823710] task: ffff8a68eb41c980 task.stack: ffffaf2881e8c000
[  464.823933] RIP: 0010:do_exit+0x51/0xb50
[  464.824151] RSP: 0000:ffffaf2881e8fed8 EFLAGS: 00010202
[  464.824373] RAX: ffffaf2881e8fd80 RBX: ffff8a68eb41c980 RCX: 0000000000000000
[  464.824598] RDX: ffff8a68df4a2400 RSI: 0000000000000000 RDI: ffffffff94ef8d60
[  464.824821] RBP: ffffaf2881e8ff48 R08: 00000000000e9000 R09: 00000000000e8025
[  464.825043] R10: 0000000000000008 R11: 0000000000000000 R12: 000000000000000b
[  464.825266] R13: ffffaf2881e8fb38 R14: 0000000000000000 R15: ffffaf2881e8fb38
[  464.825493] FS:  0000000000000000(0000) GS:ffff8a690a600000(0000) knlGS:0000000000000000
[  464.825880] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  464.826100] CR2: 00007fb59a851000 CR3: 000000005d01c000 CR4: 00000000001406f0
[  464.826322] Call Trace:
[  464.826546]  ? kthread+0x125/0x140
[  464.826767]  rewind_stack_do_exit+0x17/0x20
[  464.826988] Code: 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 e8 a7 b8 06 00 48 8b 83 60 0b 00 00 48 85 c0 74 0e 48 8b 10 48 39 d0 0f 84 60 04 00 00 <0f> ff 65 8b 05 86 4f f6 6b 25 00 ff 1f 00 89 45 9c 0f 85 28 08 
[  464.827576] ---[ end trace 2b032b91acd7b853 ]---

Version-Release number of selected component (if applicable):
kernel-4.12.5-300.fc26.x86_64

How reproducible:
Every time

Steps to Reproduce:
See above
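
For convenience, the steps from the description can be collected into one script. This is only a minimal sketch: the device names are the ones from this report, while the DRY_RUN switch, the run() helper and the reproduce() function are illustrative additions, not part of the original commands. With DRY_RUN left at its default the script only prints what it would do; with DRY_RUN=0 it overwrites /dev/sda1 and /dev/sdb1.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the description above.
# DESTRUCTIVE when DRY_RUN=0: it overwrites /dev/sda1 and /dev/sdb1.
# DRY_RUN, run() and reproduce() are illustrative helpers, not from the report.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    else
        printf 'would run: %s\n' "$*"
    fi
}

reproduce() {
    # Create the two-disk RAID 1 array. The new array begins its initial
    # resync; the BUG in the trace above is hit in the md0_resync thread.
    run mdadm -R --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    # Map the array through dm-crypt in plain mode with a throwaway random key.
    run cryptsetup open --type plain /dev/md0 container --key-file /dev/random
    # Writing zeros through the mapping puts random-looking data on the array.
    run dd bs=1M if=/dev/zero of=/dev/mapper/container status=progress
}

reproduce
```

Run as root with DRY_RUN=0 only on disks whose contents can be discarded.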


Additional info:

Comment 1 Jonathan Underwood 2017-08-24 14:53:50 UTC
Jens Axboe responded that he believes this is fixed upstream by the following commit.

Could this be added to the Fedora 4.12-based kernels?

Fixed by:

commit 0c9d5b127f695818c2c5a3868c1f28ca2969e905
Author: NeilBrown <neilb@suse.com>
Date:   Thu Apr 6 12:06:37 2017 +1000

    md/raid1: avoid reusing a resync bio after error handling.

Comment 2 Laura Abbott 2017-08-24 15:40:04 UTC
That commit is already present in 4.12-based kernels, so there must be something else going on.

Comment 3 Jonathan Underwood 2017-08-25 06:57:38 UTC
This patch is expected to fix the problem. Would it be possible to push a build for f26 with this included for testing?

https://marc.info/?l=linux-raid&m=150362889201103&w=2

Comment 4 Jonathan Underwood 2017-08-29 16:11:56 UTC
Here is a scratch build with that patch included:

https://koji.fedoraproject.org/koji/taskinfo?taskID=21520472

Could you please add this patch to the Fedora kernels until it hits 4.12 stable? It arrived too late for 4.12.10. Without this patch, resync for raid1 and raid10 is broken, which I worry could lead to data loss.

Comment 5 Justin M. Forbes 2017-08-29 16:32:22 UTC
Have you tested this patch? Does it fix the issue for you?

Comment 6 Jonathan Underwood 2017-08-29 17:50:23 UTC
The scratch build didn't complete in time for me to test before work. I'll test it this evening when I get home.

Comment 7 Jonathan Underwood 2017-08-29 19:55:36 UTC
Created attachment 1319777
Patch to apply upstream patch and spec file changes

I have tested the scratch-built kernel 4.12.9-301, and it looks like the added patch has fixed the bug.

Attached is the full patch to add it to the spec file etc., generated with git diff.

Comment 8 Fedora Update System 2017-09-08 20:42:46 UTC
kernel-4.12.11-300.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-6764d16965

Comment 9 Fedora Update System 2017-09-08 20:44:40 UTC
kernel-4.12.11-200.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-a3a8638a60

Comment 10 Fedora Update System 2017-09-10 05:53:28 UTC
kernel-4.12.11-300.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-6764d16965

Comment 11 Fedora Update System 2017-09-10 07:22:50 UTC
kernel-4.12.11-200.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-a3a8638a60

Comment 12 Fedora Update System 2017-09-12 00:23:39 UTC
kernel-4.12.11-300.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 13 Fedora Update System 2017-09-13 05:20:27 UTC
kernel-4.12.11-200.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.
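
To see whether a given machine is already running a kernel at or after the fixed version, something like the following could be used. It is a rough sketch under stated assumptions: kernel_has_fix() and FIXED_VERSION are hypothetical names, only the upstream part of the version string is compared, and the check assumes that any kernel sorting at or after 4.12.11 carries the fix, which this report only confirms for the two builds listed above. It relies on the -V option of GNU sort.

```shell
#!/bin/sh
# Hypothetical helper: does a kernel release string sort at or after the
# upstream version named in "Fixed In Version" (4.12.11)?
FIXED_VERSION="4.12.11"

kernel_has_fix() {
    # Strip the Fedora release suffix, e.g. 4.12.5-300.fc26.x86_64 -> 4.12.5
    upstream="${1%%-*}"
    # sort -V (GNU coreutils) orders version strings numerically per field.
    # If the fixed version sorts first, or both are equal, the given kernel
    # is at least as new as the fixed one.
    lowest=$(printf '%s\n%s\n' "$FIXED_VERSION" "$upstream" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED_VERSION" ]
}

if kernel_has_fix "$(uname -r)"; then
    echo "running kernel is at or after $FIXED_VERSION"
else
    echo "running kernel predates $FIXED_VERSION"
fi
```

Distribution kernels can carry backported patches, so a version comparison like this is only a first approximation.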

