Bug 2058496
| Summary: | mdadm-4.2-2.el8 regression panic during reshape | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Fine Fan <ffan> |
| Component: | mdadm | Assignee: | Nigel Croxon <ncroxon> |
| Status: | CLOSED WONTFIX | QA Contact: | Fine Fan <ffan> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | | |
| Version: | 8.6 | CC: | dledford, heinzm, lmiksik, ncroxon, xni, yizhan |
| Target Milestone: | rc | Keywords: | Regression, Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-4.18.0-404.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-08-25 07:28:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Fine Fan
2022-02-25 06:37:05 UTC
[ 374.148492] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 374.156397] PGD 0 P4D 0
[ 374.158933] Oops: 0010 [#1] SMP NOPTI
[ 374.162592] CPU: 26 PID: 2201 Comm: jbd2/md0-8 Kdump: loaded Not tainted 4.18.0-367.el8.x86_64 #1
[ 374.171457] Hardware name: Dell Inc. PowerEdge R6515/035YY8, BIOS 2.5.5 10/07/2021
[ 374.179014] RIP: 0010:0x0
[ 374.181641] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 374.188513] RSP: 0018:ffffa9f60358ba08 EFLAGS: 00010206
[ 374.193739] RAX: 0000000000000000 RBX: 0000000000411200 RCX: ffff9c4b097af7d8
[ 374.200871] RDX: ffff9c4b097af7c8 RSI: 0000000000000000 RDI: 0000000000411200
[ 374.207996] RBP: 0000000000611200 R08: 0000000000000001 R09: 0000000000000001
[ 374.215126] R10: 0000000000000002 R11: 0000000000000400 R12: ffff9c4b097af808
[ 374.222251] R13: ffffa9f60358ba40 R14: ffffffffb993d840 R15: ffff9c4b097af7d8
[ 374.229376] FS: 0000000000000000(0000) GS:ffff9c521f280000(0000) knlGS:0000000000000000
[ 374.237460] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 374.243197] CR2: ffffffffffffffd6 CR3: 0000000673810005 CR4: 0000000000770ee0
[ 374.250320] PKRU: 55555554
[ 374.253026] Call Trace:
[ 374.255504]  mempool_alloc+0x67/0x180
[ 374.259214]  bio_alloc_bioset+0x14a/0x220
[ 374.263225]  bio_clone_fast+0x19/0x60
[ 374.266892]  md_account_bio+0x39/0x80
[ 374.270559]  raid0_make_request+0xa0/0x550 [raid0]
[ 374.275351]  ? blk_throtl_bio+0x252/0xb80
[ 374.279363]  ? finish_wait+0x80/0x80
[ 374.282942]  md_handle_request+0x119/0x190
[ 374.287040]  md_make_request+0x5b/0xb0
[ 374.290785]  generic_make_request+0x25b/0x350
[ 374.295144]  submit_bio+0x3c/0x160
[ 374.298541]  ? bio_add_page+0x42/0x50
[ 374.302207]  submit_bh_wbc+0x16a/0x190
[ 374.305960]  jbd2_journal_commit_transaction+0x6b6/0x1a00 [jbd2]
[ 374.311967]  ? __switch_to_asm+0x41/0x70
[ 374.315892]  ? sk_filter_is_valid_access+0x50/0x60
[ 374.320684]  ? __switch_to+0x10c/0x450
[ 374.324436]  kjournald2+0xbd/0x270 [jbd2]
[ 374.328448]  ? finish_wait+0x80/0x80
[ 374.332020]  ? commit_timeout+0x10/0x10 [jbd2]
[ 374.336466]  kthread+0x10a/0x120
[ 374.339697]  ? set_kthread_struct+0x40/0x40
[ 374.343876]  ret_from_fork+0x22/0x40
[ 374.347456] Modules linked in: raid0 ext4 mbcache jbd2 raid1 loop sunrpc dm_multipath dell_smbios intel_rapl_msr dell_wmi_descriptor wmi_bmof dcdbas intel_rapl_common amd64_edac_mod edac_mce_amd amd_energy kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl pcspkr ipmi_ssif ccp sp5100_tco k10temp i2c_piix4 wmi acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci crc32c_intel libata tg3 i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod
[ 374.399497] CR2: 0000000000000000

This should be fixed by patch
commit 0c031fd37f69deb0cd8c43bbfcfccd62ebd7e952
Author: Xiao Ni <xni>
Date: Fri Dec 10 17:31:15 2021 +0800
md: Move alloc/free acct bioset in to personality
The --backup-file= option should include a directory path with it.

Bad:  mdadm --grow -l0 /dev/md0 --backup-file=tmp0
Good: mdadm --grow -l0 /dev/md0 --backup-file=/tmp/tmp0

Going through this bz, I was wondering why the summary talks about a reshape, as the bug description above shows a takeover from a 6-legged raid1 with one spare device to a 1-legged raid0.
I.e. six out of seven devices are being dropped, with one kept as a raid0 leg (i.e. a linear layout) in the takeover:
# uname -r
4.18.0-372.3.1.el8.x86_64
# Running on virtio-scsi devices, ^ kernel doesn't oops (loop issue, we've seen those before?)
# mdadm -C /dev/md0 -e 1.2 -l1 -n6 /dev/sd[a-f] -x 1 /dev/sdg
mdadm: array /dev/md0 started.
# cat /proc/mdstat
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4]
md0 : active raid1 sdg[6](S) sdf[5] sde[4] sdd[3] sdc[2] sdb[1] sda[0]
523264 blocks super 1.2 [6/6] [UUUUUU]
[=================>...] resync = 85.0% (445888/523264) finish=0.0min speed=222944K/sec
unused devices: <none>
# mkfs -t ext4 /dev/md0
# mount /dev/md0 /mnt
# dd if=/dev/urandom of=/mnt/testfile bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.287026 s, 365 MB/s
# df -h /mnt
Filesystem Size Used Avail Use% Mounted on
/dev/md0 487M 103M 356M 23% /mnt
# mdadm -G /dev/md0 -l0 --backup-file=tmp0
mdadm: level of /dev/md0 changed to raid0
# cat /proc/mdstat
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4]
md0 : active raid0 sde[4]
523264 blocks super 1.2 64k chunks
unused devices: <none>
# ll /mnt
total 102414
drwx------. 2 root root 12288 Mar 30 08:15 lost+found
-rw-r--r--. 1 root root 104857600 Mar 30 08:15 testfile
# mdadm -G /dev/md0 -l0
mdadm: level of /dev/md0 changed to raid0
# cat /proc/mdstat
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4]
md0 : active raid0 sde[4]
523264 blocks super 1.2 64k chunks
unused devices: <none>
Fix has been proposed upstream: https://www.spinics.net/lists/raid/msg70201.html

Test kernel: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45518198 (based on my list of patchset #23)

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.