Bug 2058496
| Summary: | mdadm-4.2-2.el8 regression panic during reshape | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Fine Fan <ffan> |
| Component: | mdadm | Assignee: | Nigel Croxon <ncroxon> |
| Status: | VERIFIED --- | QA Contact: | Fine Fan <ffan> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | ||
| Version: | 8.6 | CC: | dledford, heinzm, lmiksik, ncroxon, xni, yizhan |
| Target Milestone: | rc | Keywords: | Regression, Triaged |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-4.18.0-404.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Fine Fan
2022-02-25 06:37:05 UTC
[ 374.148492] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 374.156397] PGD 0 P4D 0
[ 374.158933] Oops: 0010 [#1] SMP NOPTI
[ 374.162592] CPU: 26 PID: 2201 Comm: jbd2/md0-8 Kdump: loaded Not tainted 4.18.0-367.el8.x86_64 #1
[ 374.171457] Hardware name: Dell Inc. PowerEdge R6515/035YY8, BIOS 2.5.5 10/07/2021
[ 374.179014] RIP: 0010:0x0
[ 374.181641] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 374.188513] RSP: 0018:ffffa9f60358ba08 EFLAGS: 00010206
[ 374.193739] RAX: 0000000000000000 RBX: 0000000000411200 RCX: ffff9c4b097af7d8
[ 374.200871] RDX: ffff9c4b097af7c8 RSI: 0000000000000000 RDI: 0000000000411200
[ 374.207996] RBP: 0000000000611200 R08: 0000000000000001 R09: 0000000000000001
[ 374.215126] R10: 0000000000000002 R11: 0000000000000400 R12: ffff9c4b097af808
[ 374.222251] R13: ffffa9f60358ba40 R14: ffffffffb993d840 R15: ffff9c4b097af7d8
[ 374.229376] FS: 0000000000000000(0000) GS:ffff9c521f280000(0000) knlGS:0000000000000000
[ 374.237460] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 374.243197] CR2: ffffffffffffffd6 CR3: 0000000673810005 CR4: 0000000000770ee0
[ 374.250320] PKRU: 55555554
[ 374.253026] Call Trace:
[ 374.255504] mempool_alloc+0x67/0x180
[ 374.259214] bio_alloc_bioset+0x14a/0x220
[ 374.263225] bio_clone_fast+0x19/0x60
[ 374.266892] md_account_bio+0x39/0x80
[ 374.270559] raid0_make_request+0xa0/0x550 [raid0]
[ 374.275351] ? blk_throtl_bio+0x252/0xb80
[ 374.279363] ? finish_wait+0x80/0x80
[ 374.282942] md_handle_request+0x119/0x190
[ 374.287040] md_make_request+0x5b/0xb0
[ 374.290785] generic_make_request+0x25b/0x350
[ 374.295144] submit_bio+0x3c/0x160
[ 374.298541] ? bio_add_page+0x42/0x50
[ 374.302207] submit_bh_wbc+0x16a/0x190
[ 374.305960] jbd2_journal_commit_transaction+0x6b6/0x1a00 [jbd2]
[ 374.311967] ? __switch_to_asm+0x41/0x70
[ 374.315892] ? sk_filter_is_valid_access+0x50/0x60
[ 374.320684] ? __switch_to+0x10c/0x450
[ 374.324436] kjournald2+0xbd/0x270 [jbd2]
[ 374.328448] ? finish_wait+0x80/0x80
[ 374.332020] ? commit_timeout+0x10/0x10 [jbd2]
[ 374.336466] kthread+0x10a/0x120
[ 374.339697] ? set_kthread_struct+0x40/0x40
[ 374.343876] ret_from_fork+0x22/0x40
[ 374.347456] Modules linked in: raid0 ext4 mbcache jbd2 raid1 loop sunrpc dm_multipath dell_smbios intel_rapl_msr dell_wmi_descriptor wmi_bmof dcdbas intel_rapl_common amd64_edac_mod edac_mce_amd amd_energy kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl pcspkr ipmi_ssif ccp sp5100_tco k10temp i2c_piix4 wmi acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci crc32c_intel libata tg3 i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod
[ 374.399497] CR2: 0000000000000000
This should be fixed by patch
commit 0c031fd37f69deb0cd8c43bbfcfccd62ebd7e952
Author: Xiao Ni <xni>
Date: Fri Dec 10 17:31:15 2021 +0800
md: Move alloc/free acct bioset in to personality
The --backup-file= option should be given a path that includes a directory.
Bad: mdadm --grow -l0 /dev/md0 --backup-file=tmp0
Good: mdadm --grow -l0 /dev/md0 --backup-file=/tmp/tmp0
Coming across this bz, I was wondering why the summary talks about reshape, as the bug description above is about a takeover from a 6-legged raid1 with one spare device to a 1-legged raid0.
I.e. 6 out of 7 devices are dropped and one is kept as the single raid0 leg (i.e. a linear layout) in the takeover:
# uname -r
4.18.0-372.3.1.el8.x86_64
# Running on virtio-scsi devices, the kernel above doesn't oops (loop issue? we've seen those before)
# mdadm -C /dev/md0 -e 1.2 -l1 -n6 /dev/sd[a-f] -x 1 /dev/sdg
mdadm: array /dev/md0 started.
# cat /proc/mdstat
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4]
md0 : active raid1 sdg[6](S) sdf[5] sde[4] sdd[3] sdc[2] sdb[1] sda[0]
523264 blocks super 1.2 [6/6] [UUUUUU]
[=================>...] resync = 85.0% (445888/523264) finish=0.0min speed=222944K/sec
unused devices: <none>
# mkfs -t ext4 /dev/md0
# mount /dev/md0 /mnt
# dd if=/dev/urandom of=/mnt/testfile bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.287026 s, 365 MB/s
# df -h /mnt
Filesystem Size Used Avail Use% Mounted on
/dev/md0 487M 103M 356M 23% /mnt
# mdadm -G /dev/md0 -l0 --backup-file=tmp0
mdadm: level of /dev/md0 changed to raid0
# cat /proc/mdstat
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4]
md0 : active raid0 sde[4]
523264 blocks super 1.2 64k chunks
unused devices: <none>
# ll /mnt
total 102414
drwx------. 2 root root 12288 Mar 30 08:15 lost+found
-rw-r--r--. 1 root root 104857600 Mar 30 08:15 testfile
# mdadm -G /dev/md0 -l0
mdadm: level of /dev/md0 changed to raid0
# cat /proc/mdstat
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4]
md0 : active raid0 sde[4]
523264 blocks super 1.2 64k chunks
unused devices: <none>
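The --backup-file note earlier in this thread (relative tmp0 versus /tmp/tmp0) could be guarded with a small wrapper like the one below. This is only a sketch: `backup_path_ok` and `grow_to_raid0` are hypothetical helper names, not part of mdadm, and it assumes that "a path with a directory" can be read as "an absolute path whose parent directory exists". It does not address the kernel oops itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): accept a --backup-file value only
# if it is an absolute path and its parent directory exists.
backup_path_ok() {
    case "$1" in
        /*) [ -d "$(dirname "$1")" ] ;;   # absolute: check the parent directory
        *)  return 1 ;;                   # relative path such as "tmp0"
    esac
}

# Hypothetical wrapper around the takeover command used in this report.
# It refuses to call mdadm when the backup path fails the check above.
grow_to_raid0() {
    md_dev=$1
    backup=$2
    if ! backup_path_ok "$backup"; then
        echo "refusing: --backup-file needs an absolute path in an existing directory: $backup" >&2
        return 1
    fi
    mdadm --grow "$md_dev" --level=0 --backup-file="$backup"
}
```

With these definitions, `grow_to_raid0 /dev/md0 tmp0` returns an error without invoking mdadm, while `grow_to_raid0 /dev/md0 /tmp/tmp0` runs the same command as the "Good" example.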
Fix has been proposed upstream.
https://www.spinics.net/lists/raid/msg70201.html
Test kernel
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45518198
Based on my list of patchset #23