Bug 2058501

Summary: mdadm-4.2-2.el9 regression: kernel panic during reshape
Product: Red Hat Enterprise Linux 9
Reporter: Fine Fan <ffan>
Component: mdadm
Assignee: Nigel Croxon <ncroxon>
Status: VERIFIED ---
QA Contact: Fine Fan <ffan>
Severity: unspecified
Docs Contact:
Priority: high
Version: 9.0
CC: dledford, lmiksik, ncroxon, xni, yizhan
Target Milestone: rc
Keywords: Regression, Triaged
Target Release: 9.1
Hardware: x86_64
OS: Linux
Fixed In Version: kernel-5.14.0-137.el9
Doc Type: If docs needed, set a value
Type: Bug

Description Fine Fan 2022-02-25 07:04:18 UTC
This bug was initially created as a copy of Bug #2058496

I am copying this bug because: 



Description of problem:


Version-Release number of selected component (if applicable):
RHEL-9.0.0-20220223.1
kernel-5.14.0-63.el9.x86_64
mdadm-4.2-2.el9


How reproducible:


Steps to Reproduce:
mdadm --create --run /dev/md0 --level 1 --metadata 1.2 --raid-devices 6  /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5     --spare-devices 1  /dev/loop6

mkfs -t ext4 /dev/md0

mount -t ext4 /dev/md0 /mnt/md_test

dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=100

mdadm --grow -l0 /dev/md0 --backup-file=tmp0
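
The steps above assume that /dev/loop0 through /dev/loop6 already exist and that /mnt/md_test is present; the report does not say how they were set up. A minimal sketch of one possible setup follows. The backing-file directory, the 512M file size, and the run helper are assumptions for illustration, not taken from the report. By default the script only prints the commands; setting RUN=1 as root on a disposable test machine executes them.

```shell
#!/bin/sh
# Hypothetical helper: dry-run by default, prints each command with a "+ " prefix.
# Set RUN=1 (as root, on a throwaway machine) to actually execute the commands.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Assumption: sparse backing files live here; any scratch directory would do.
IMG_DIR=${IMG_DIR:-/tmp/md_test_img}

run mkdir -p "$IMG_DIR" /mnt/md_test

# Seven loop devices: six RAID1 members plus one spare, matching the mdadm --create line.
i=0
while [ "$i" -le 6 ]; do
    # Assumption: 512M per device, comfortably larger than the 100 MiB test file.
    run truncate -s 512M "$IMG_DIR/disk$i.img"
    run losetup "/dev/loop$i" "$IMG_DIR/disk$i.img"
    i=$((i + 1))
done
```

With the loop devices attached, the mdadm, mkfs, mount, dd, and mdadm --grow commands listed above can be run unchanged.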


Actual results:
The server panics.

Expected results:
The server does not panic.

Additional info:

Comment 1 Fine Fan 2022-02-25 08:12:11 UTC
[  773.597483] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  773.604478] #PF: supervisor instruction fetch in kernel mode
[  773.610136] #PF: error_code(0x0010) - not-present page
[  773.615274] PGD 0 P4D 0 
[  773.617807] Oops: 0010 [#1] PREEMPT SMP NOPTI
[  773.622164] CPU: 24 PID: 1954 Comm: jbd2/md0-8 Kdump: loaded Not tainted 5.14.0-63.el9.x86_64 #1
[  773.630943] Hardware name: Dell Inc. PowerEdge R6515/035YY8, BIOS 2.5.5 10/07/2021
[  773.638500] RIP: 0010:0x0
[  773.641130] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[  773.648001] RSP: 0018:ffffb206405afab0 EFLAGS: 00010206
[  773.653225] RAX: 0000000000000000 RBX: 0000000000092800 RCX: 0000000000000080
[  773.660348] RDX: 0000000080000000 RSI: 0000000000000000 RDI: 0000000000092800
[  773.667474] RBP: ffff96e91a7e4718 R08: ffff96e901b24600 R09: 0000000000000400
[  773.674606] R10: 0000000000000000 R11: ffff96e9165aa3f0 R12: 0000000000092c00
[  773.681738] R13: 0000000000000400 R14: ffff96e91a7e4718 R15: ffff96e9012ed4c0
[  773.688860] FS:  0000000000000000(0000) GS:ffff96f01f200000(0000) knlGS:0000000000000000
[  773.696940] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  773.702796] CR2: ffffffffffffffd6 CR3: 0000000427610004 CR4: 0000000000770ee0
[  773.709922] PKRU: 55555554
[  773.712634] Call Trace:
[  773.715079]  mempool_alloc+0x62/0x160
[  773.718747]  ? ext4_map_blocks+0x3aa/0x5e0 [ext4]
[  773.723459]  ? xas_store+0x1d9/0x5f0
[  773.727039]  ? jbd2_transaction_committed+0x55/0x60 [jbd2]
[  773.732523]  bio_alloc_bioset+0x9d/0x330
[  773.736450]  bio_clone_fast+0x1a/0x70
[  773.740115]  md_account_bio+0x39/0x70
[  773.743782]  raid0_make_request+0x9c/0x350 [raid0]
[  773.748573]  md_handle_request+0x12c/0x1c0
[  773.752664]  ? ktime_get+0x38/0x90
[  773.756071]  ? submit_bio_checks+0x1ce/0x5a0
[  773.760342]  md_submit_bio+0x67/0xa0
[  773.763912]  __submit_bio+0x95/0x140
[  773.767483]  __submit_bio_noacct+0x81/0x1e0
[  773.771662]  submit_bh_wbc+0x15c/0x180
[  773.775414]  jbd2_journal_commit_transaction+0xa2c/0x19d0 [jbd2]
[  773.781421]  ? finish_task_switch.isra.0+0xb4/0x290
[  773.786300]  kjournald2+0xaf/0x280 [jbd2]
[  773.790311]  ? do_wait_intr_irq+0xa0/0xa0
[  773.794323]  ? jbd2_journal_release_jbd_inode+0x150/0x150 [jbd2]
[  773.800329]  kthread+0x149/0x170
[  773.803666]  ? set_kthread_struct+0x40/0x40
[  773.807956]  ret_from_fork+0x22/0x30
[  773.811537] Modules linked in: raid0 ext4 mbcache jbd2 raid1 loop rfkill sunrpc dm_multipath intel_rapl_msr dcdbas intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm irqbypass rapl pcspkr dell_smbios dell_wmi_descriptor wmi_bmof ipmi_ssif mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec k10temp i2c_piix4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sd_mod t10_pi sg crct10dif_pclmul ahci crc32_pclmul crc32c_intel libahci libata tg3 ghash_clmulni_intel ccp sp5100_tco wmi dm_mirror dm_region_hash dm_log dm_mod
[  773.863491] CR2: 0000000000000000

Comment 2 XiaoNi 2022-02-28 02:03:15 UTC
This should be fixed by the following patch:

commit 0c031fd37f69deb0cd8c43bbfcfccd62ebd7e952
Author: Xiao Ni <xni>
Date:   Fri Dec 10 17:31:15 2021 +0800

    md: Move alloc/free acct bioset in to personality

Comment 3 Nigel Croxon 2022-03-08 14:28:17 UTC
Submitted into RHEL-9.1.0
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/550 
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2042797
Submitted to: git:redhat/centos-stream/src/kernel/centos-stream-9.git
Patch #22 in the series.

-Nigel

Comment 11 Nigel Croxon 2022-05-02 16:11:49 UTC
Proposed upstream fix
https://www.spinics.net/lists/raid/msg70201.html