1232492 – NULL pointer dereference at 0000000000000038 IP: [<ffffffff815f514f>] bitmap_load+0x45f/0x610

Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Version:	22
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance

Reported:	2015-06-16 21:40 UTC by Nate Clark
Modified:	2015-11-03 14:30 UTC
CC List:	6 users
Doc Type:	Bug Fix
Last Closed:	2015-11-03 14:30:33 UTC
Type:	Bug

Attachments
Script to cause crash (301 bytes, application/x-shellscript) 2015-06-16 21:40 UTC, Nate Clark
Full vmcore-dmesg from a crash (155.06 KB, text/plain) 2015-06-16 21:41 UTC, Nate Clark
Script to cause crash (simplified) (140 bytes, application/x-shellscript) 2015-06-16 21:51 UTC, Nate Clark

Description Nate Clark 2015-06-16 21:40:20 UTC

Created attachment 1039666
Script to cause crash

Description of problem:
md hits a NULL pointer dereference in bitmap_load while handling a RUN_ARRAY ioctl, when trying to construct an md device with a corrupt bitmap.

Version-Release number of selected component (if applicable):
kernel-4.0.4-303.fc22.x86_64 and kernel-4.0.5-300.fc22.x86_64

How reproducible:
Happens about 1 in 3 or 4 tries

Steps to Reproduce:
1. Create a GPT partition on each of two disks that extends to the end of the device (e.g. /dev/sdb1 and /dev/sdc1).
2. Create a RAID 1 array from those partitions (e.g. /dev/md127).
3. Add the array to /etc/mdadm.conf (this might not be needed).
4. Add a PROGRAM line to mdadm.conf that calls something which just blocks for a few seconds. A simple shell script containing sleep 5 works fine.
5. Run the attached script, with the device names updated, to trigger the crash.

Actual results:
The kernel panics with a NULL pointer dereference:
[  828.852970] md/raid1:md11: active with 0 out of 2 mirrors
[  828.853060] created bitmap (30 pages) for device md11
[  828.855879] md11: bitmap initialized from disk: read 2 pages, set 59321 of 59615 bits
[  828.855898] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
[  828.864892] IP: [<ffffffff815f4daf>] bitmap_load+0x45f/0x610
[  828.871377] PGD 852cb2067 PUD 853169067 PMD 0
[  828.876656] Oops: 0002 [#1] SMP
[  828.880491] Modules linked in: ip6table_filter ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6_tables iptable_nat nf_nat_ipv4 iptable_mangle iptable_raw softdog ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat xt_CT nf_conntrack bonding intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support i2c_algo_bit crc32c_intel ghash_clmulni_intel ttm drm_kms_helper drm sb_edac edac_core i2c_i801 lpc_ich ipmi_devintf mfd_core mei_me mei ipmi_si tpm_tis shpchp tpm ipmi_msghandler wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc uas usb_storage raid1 ixgbe mpt2sas e1000e isci mdio vxlan libsas ip6_udp_tunnel udp_tunnel raid_class dca scsi_transport_sas ptp pps_core [last unloaded: ip6_tables]
[  828.967552]
[  828.968027] CPU: 9 PID: 491 Comm: mdadm Tainted: P        W  OE   4.0.4-303.fc22.x86_64 #1
[  828.977741] Hardware name: Newisys NDS-SB1EA/NDS-SB1EA, BIOS HDS 9.00 11/13/2014
[  828.986484] task: ffff880852840000 ti: ffff880859794000 task.ti: ffff880859794000
[  828.995322] RIP: 0010:[<ffffffff815f4daf>]  [<ffffffff815f4daf>] bitmap_load+0x45f/0x610
[  829.004924] RSP: 0018:ffff880859797c88  EFLAGS: 00010202
[  829.011135] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000049
[  829.019387] RDX: 0000000000001388 RSI: ffff88087fd2e698 RDI: ffff88084a72f000
[  829.027637] RBP: ffff880859797d18 R08: 000000000000000a R09: 000000000000082c
[  829.035882] R10: ffffffff81f01fed R11: 000000000000082c R12: ffffea00215529c0
[  829.044157] R13: 0000000000000001 R14: 000000000000e8df R15: ffff88085a501700
[  829.052423] FS:  00007effff9a0700(0000) GS:ffff88087fd20000(0000) knlGS:0000000000000000
[  829.061962] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  829.068668] CR2: 0000000000000038 CR3: 0000000857518000 CR4: 00000000000407e0
[  829.076937] Stack:
[  829.079463]  0000000059797ca8 ffff88084a72f350 ffff88084a72f000 000000000000e7b9
[  829.088506]  0000000000000000 0000000000000000 0000000000000000 000000000000e8de
[  829.097564]  000000000000e8df 0000000000000000 0000000008000000 0000000005d72fb0
[  829.106627] Call Trace:
[  829.109639]  [<ffffffff815ef5b0>] do_md_run+0x30/0xa0
[  829.115562]  [<ffffffff815f13de>] md_ioctl+0xe9e/0x1bc0
[  829.121690]  [<ffffffff810fecfd>] ? call_rcu_sched+0x1d/0x20
[  829.128286]  [<ffffffff811b8fc1>] ? shmem_destroy_inode+0x31/0x50
[  829.135377]  [<ffffffff8123a2b7>] ? evict+0x107/0x190
[  829.141300]  [<ffffffff8137cd6f>] blkdev_ioctl+0x1bf/0x7d0
[  829.147707]  [<ffffffff812360c5>] ? dput+0xc5/0x230
[  829.153456]  [<ffffffff81258a5d>] block_ioctl+0x3d/0x50
[  829.159571]  [<ffffffff81232046>] do_vfs_ioctl+0x2c6/0x4d0
[  829.166009]  [<ffffffff8121f14e>] ? ____fput+0xe/0x10
[  829.171932]  [<ffffffff812322d1>] SyS_ioctl+0x81/0xa0
[  829.177852]  [<ffffffff81789749>] system_call_fastpath+0x12/0x17
[  829.184840] Code: ff ff e8 75 28 19 00 f0 41 80 67 78 fd 49 8b 47 30 f0 80 88 b8 01 00 00 20 48 8b 7d 80 48 8b 87 48 01 00 00 48 8b 97 80 03 00 00 <48> 89 50 38 48 8b bf 48 01 00 00 e8 b1 06 ff ff 4c 89 ff e8 59
[  829.211685] RIP  [<ffffffff815f4daf>] bitmap_load+0x45f/0x610
[  829.218457]  RSP <ffff880859797c88>
[  829.222622] CR2: 0000000000000038
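The faulting address can be cross-checked against the Code: line. A minimal sketch, assuming the standard x86-64 encoding: the bytes at the marked instruction, 48 89 50 38, encode mov %rdx,0x38(%rax), a 64-bit store of RDX to RAX+0x38. With RAX = 0 in the register dump, the effective address is 0x38, which matches CR2 and the write fault reported by "Oops: 0002". The two loads just before it (48 8b 87 48 01 00 00 and 48 8b 97 80 03 00 00) read RAX and RDX from offsets 0x148 and 0x380 of the structure in RDI, so the NULL value came from a pointer field of that structure. The decoder below handles only this one instruction form; decode_mov_store_disp8 and effective_address are hypothetical helpers, not part of any kernel tool.

```c
#include <assert.h>
#include <stdint.h>

/* Decode only "REX.W 89 /r" with ModRM mod=01, i.e. MOV r/m64, r64 storing a
 * register to [base + disp8]. Register numbers follow the ModRM encoding
 * (0=rax, 1=rcx, 2=rdx, 3=rbx, 4=rsp, 5=rbp, 6=rsi, 7=rdi); REX.R/REX.B
 * extensions (r8-r15) are not handled. Returns 0 on success, -1 otherwise. */
int decode_mov_store_disp8(const uint8_t insn[4], int *src, int *base, int8_t *disp)
{
    if (insn[0] != 0x48 || insn[1] != 0x89)
        return -1;                      /* not REX.W + MOV r/m64, r64 */
    uint8_t modrm = insn[2];
    if ((modrm >> 6) != 1)
        return -1;                      /* not the [base + disp8] form */
    if ((modrm & 7) == 4)
        return -1;                      /* rm=100 would need a SIB byte */
    *src = (modrm >> 3) & 7;            /* reg field: source register */
    *base = modrm & 7;                  /* rm field: base register */
    *disp = (int8_t)insn[3];            /* sign-extended 8-bit displacement */
    return 0;
}

/* Effective address of the store: base register value plus displacement. */
uint64_t effective_address(uint64_t base_value, int8_t disp)
{
    return base_value + (uint64_t)(int64_t)disp;
}
```

Fed the bytes 48 89 50 38 and RAX = 0, this gives source register 2 (RDX), base register 0 (RAX), displacement 0x38 and an effective address of 0x38, i.e. a write through a NULL pointer to a field at offset 0x38.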


Expected results:
The md device either fails to assemble because of the corruption, or assembles and ignores the corruption.

Additional info:
I have dump files for 4.0.4-303 and 4.0.5-300 if that information is needed. I tried changing bitmap_load so that it does not set the timeout or wake up the thread when mddev->thread is NULL, but that was not the only place that hit a problem. It seems the mddev struct might not be in a consistent state while a call to the program specified in mdadm.conf is still outstanding.
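The workaround described here, skipping the timeout update and the wakeup when mddev->thread is NULL, can be modelled in user space. This is a hypothetical sketch: struct md_thread_model, struct mddev_model and kick_daemon_guarded stand in for the kernel structures and are not kernel code. It only illustrates the guard the reporter tried, which on its own did not make the problem go away; it is not the upstream fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the md thread and device structures. */
struct md_thread_model {
    unsigned long timeout;   /* how long the daemon sleeps between runs */
    int woken;               /* set when a wakeup has been requested */
};

struct mddev_model {
    struct md_thread_model *thread;  /* may be NULL while the array is set up */
    unsigned long daemon_sleep;      /* bitmap daemon interval */
};

/* Set the daemon timeout and request a wakeup, but only if a thread exists.
 * Returns 0 if the thread was updated, -1 if it was NULL and was skipped. */
int kick_daemon_guarded(struct mddev_model *mddev)
{
    if (mddev->thread == NULL)
        return -1;                   /* an unguarded store would fault here */
    mddev->thread->timeout = mddev->daemon_sleep;
    mddev->thread->woken = 1;        /* models md_wakeup_thread() */
    return 0;
}
```

The guard turns the crash at this one site into a skipped wakeup, but any other code path that also sees a half-initialised mddev needs the same treatment, or a fix for the underlying race.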

Comment 1 Nate Clark 2015-06-16 21:41:31 UTC

Created attachment 1039669
Full vmcore-dmesg from a crash

Comment 2 Nate Clark 2015-06-16 21:51:34 UTC

Created attachment 1039670
Script to cause crash (simplified)

It appears the corruption is not actually needed: all that is required is to run mdadm --assemble while a call to the PROGRAM is still outstanding.

Comment 3 Justin M. Forbes 2015-10-20 19:34:38 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 22 kernel bugs.

Fedora 22 has now been rebased to 4.2.3-200.fc22.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 23, and are still experiencing this issue, please change the version to Fedora 23.

If you experience different issues, please open a new bug report for those.

Comment 4 Nate Clark 2015-11-03 14:30:33 UTC

The issue was resolved in kernel 4.2 by commit bd6919228d7e1867ae9e24ab27e3e4a366c87d21, which was backported to 4.1 stable.
