Bug 962243

Summary: unable to handle kernel NULL pointer dereference at (null) [efivarfs_file_read+0x46]
Product: [Fedora] Fedora
Reporter: John Reiser <jreiser>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 19
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, the.ridikulus.rat
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2013-10-08 17:18:11 UTC
Type: Bug
Attachments:
Description                              Flags
anaconda /tmp/syslog                     none
output from "efibootmgr --verbose"       none
strace of efibootmgr write-to-nvram      none

Description John Reiser 2013-05-12 23:03:52 UTC
Description of problem: Kernel handling of nvram on my mainboard fails badly.  UEFI install of Fedora-19-Beta-TC4 failed to set new efi boot target (bug 949786; marked as dup of bug 947142) because of ENOSPC in nvram.  Looking around after that caused two kernel BUGs.


Version-Release number of selected component (if applicable):
kernel-3.9.0-301.fc19.x86_64

How reproducible:


Steps to Reproduce:
1. UEFI install of Fedora-19-Beta-TC4-x86_64 on ASUS P8Z68-V/GEN3.
2. After the installer fails to set the new EFI boot target, try to copy files from /sys/devices/virtual/misc/nvram in order to diagnose the problem.
  
Actual results: /bin/cp fails due to I/O Error.  Anaconda /tmp/syslog says:

21:35:56,600 ALERT kernel:[ 1941.897956] BUG: unable to handle kernel NULL pointer dereference at           (null)
21:35:56,600 ALERT kernel:[ 1941.900186] IP: [<ffffffff81645305>] _raw_spin_lock_irq+0x15/0x40
21:35:56,600 WARNING kernel:[ 1941.902409] PGD 3fdcf9067 PUD 4075fa067 PMD 0
21:35:56,600 WARNING kernel:[ 1941.904637] Oops: 0002 [#1] SMP
21:35:56,600 WARNING kernel:[ 1941.906852] Modules linked in: xfs btrfs zlib_deflate libcrc32c fcoe libfcoe libfc scsi_transport_fc scsi_tgt eeepc_wmi asus_wmi sparse_keymap rfkill mperf microcode lpc_ich mei mfd_core serio_raw i2c_i801 uinput vfat fat radeon mxm_wmi crc32_pclmul crc32c_intel i2c_algo_bit drm_kms_helper ghash_clmulni_intel ttm e1000e drm usb_storage ptp i2c_core pps_core wmi video sunrpc dm_crypt dm_round_robin dm_multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 iscsi_ibft iscsi_boot_sysfs scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi squashfs cramfs
21:35:56,600 WARNING kernel:[ 1941.914395] CPU 1
21:35:56,600 WARNING kernel:[ 1941.914412] Pid: 15597, comm: cat Not tainted 3.9.0-301.fc19.x86_64 #1 System manufacturer System Product Name/P8Z68-V GEN3
21:35:56,600 WARNING kernel:[ 1941.919428] RIP: 0010:[<ffffffff81645305>]  [<ffffffff81645305>] _raw_spin_lock_irq+0x15/0x40
21:35:56,600 WARNING kernel:[ 1941.922010] RSP: 0018:ffff880403993eb0  EFLAGS: 00010082
21:35:56,600 WARNING kernel:[ 1941.924601] RAX: 0000000000000100 RBX: ffff880400adc008 RCX: ffff880403993f50
21:35:56,600 WARNING kernel:[ 1941.927212] RDX: 0000000000010000 RSI: 00000000008d5000 RDI: 0000000000000000
21:35:56,600 WARNING kernel:[ 1941.929831] RBP: ffff880403993eb0 R08: 0000000000000000 R09: 0000000000000000
21:35:56,600 WARNING kernel:[ 1941.932443] R10: 00007fff2546a940 R11: 0000000000000246 R12: ffff880400adc408
21:35:56,600 WARNING kernel:[ 1941.935061] R13: ffff880403993f50 R14: 0000000000000000 R15: ffff880403993f50
21:35:56,600 WARNING kernel:[ 1941.937684] FS:  00007f47af964740(0000) GS:ffff88041ec80000(0000) knlGS:0000000000000000
21:35:56,600 WARNING kernel:[ 1941.940328] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
21:35:56,600 WARNING kernel:[ 1941.942978] CR2: 0000000000000000 CR3: 00000004041e7000 CR4: 00000000000407e0
21:35:56,600 WARNING kernel:[ 1941.945652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
21:35:56,600 WARNING kernel:[ 1941.948322] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
21:35:56,600 WARNING kernel:[ 1941.950987] Process cat (pid: 15597, threadinfo ffff880403992000, task ffff880400649770)
21:35:56,600 WARNING kernel:[ 1941.953667] Stack:
21:35:56,600 WARNING kernel:[ 1941.956330]  ffff880403993f08 ffffffff814f54c6 0000000000010000 00000000008d5000
21:35:56,600 WARNING kernel:[ 1941.959047]  ffff8803ffd7b200 0000000000000000 ffff8803ffd7b200 00000000008d5000
21:35:56,600 WARNING kernel:[ 1941.961778]  ffff880403993f50 0000000000000000 0000000000000fff ffff880403993f38
21:35:56,600 WARNING kernel:[ 1941.964512] Call Trace:
21:35:56,600 WARNING kernel:[ 1941.967229]  [<ffffffff814f54c6>] efivarfs_file_read+0x46/0x180
21:35:56,600 WARNING kernel:[ 1941.969969]  [<ffffffff8119971c>] vfs_read+0x9c/0x170
21:35:56,600 WARNING kernel:[ 1941.972705]  [<ffffffff81199ae9>] sys_read+0x49/0xa0
21:35:56,600 WARNING kernel:[ 1941.975439]  [<ffffffff8164d819>] system_call_fastpath+0x16/0x1b
21:35:56,600 WARNING kernel:[ 1941.978187] Code: 0e 0f 1f 44 00 00 f3 90 0f b6 17 38 d1 75 f7 48 89 f0 5d c3 66 90 66 66 66 66 90 55 48 89 e5 fa 66 66 90 66 66 90 b8 00 01 00 00 <f0> 66 0f c1 07 0f b6 d4 38 c2 74 10 0f 1f 80 00 00 00 00 f3 90
21:35:56,600 ALERT kernel:[ 1941.981333] RIP  [<ffffffff81645305>] _raw_spin_lock_irq+0x15/0x40
21:35:56,600 WARNING kernel:[ 1941.984284]  RSP <ffff880403993eb0>
21:35:56,600 WARNING kernel:[ 1941.987227] CR2: 0000000000000000
21:35:56,600 WARNING kernel:[ 1941.990171] ---[ end trace 2fa02b954c29ef40 ]---
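
For anyone reading the trace above, the "Oops: 0002" value is the x86 page-fault error code, and its low bits can be decoded mechanically. The following is a minimal sketch, not kernel code; the function name decode_page_fault_error is made up for illustration. It uses only the standard x86 meaning of bits 0 through 4.

```python
def decode_page_fault_error(code: int) -> dict:
    """Decode the x86 page-fault error code printed after 'Oops:'.

    Hypothetical helper for illustration; bit meanings follow the
    x86 architecture definition of the page-fault error code.
    """
    return {
        "page_present": bool(code & 0x01),       # 0 means the page was not mapped
        "write_access": bool(code & 0x02),       # 1 means the fault was on a write
        "user_mode": bool(code & 0x04),          # 0 means the CPU was in kernel mode
        "reserved_bit_set": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),
    }


# The trace prints the code in hexadecimal: "Oops: 0002 [#1] SMP"
info = decode_page_fault_error(int("0002", 16))
for field, value in info.items():
    print(f"{field}: {value}")
```

For 0002 this reports a kernel-mode write to a page that is not present, which fits the rest of the trace: CR2 and RDI are both zero, and the fault is inside _raw_spin_lock_irq, called from efivarfs_file_read.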

Further probing causes a second BUG; see the attached syslog.  Probing beyond that causes a hard hang that requires a hardware reset.

Expected results: no problems manipulating nvram to store EFI boot info


Additional info:

There is nothing below /sys/fs/pstore: no Oops or BUG dumps; nothing.

At the time of the Anaconda install failure, there were a few dozen files below /sys/devices/virtual/misc/nvram, most with a long name containing a UUID.  "ls -l" showed that most of them were short, but three or four were some kilobytes long.  Trying to copy them to USB flash memory caused the second BUG.  Trying to cp the ones that had not been copied caused a hard "hang" (infinite loop).
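
The file-by-file probing described above can be sketched as a small script that records which files read cleanly and which fail with an error such as EIO, instead of stopping at the first failure. This is only an illustration: probe_files is a hypothetical name, the directory is passed in by the caller, and on a kernel with this bug a single read may still trigger the BUG or the hang, which no userspace script can prevent.

```python
import errno
import os


def probe_files(directory: str, chunk_size: int = 4096) -> dict:
    """Try to read every regular file in `directory`; report size or errno.

    Hypothetical diagnostic helper; it only reads, it never writes.
    """
    results = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        try:
            total = 0
            # Unbuffered reads, so each read() maps to one read(2) call.
            with open(path, "rb", buffering=0) as fh:
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
            results[name] = f"ok, {total} bytes"
        except OSError as exc:
            # Report the symbolic errno name (for example EIO) when known.
            results[name] = f"failed: {errno.errorcode.get(exc.errno, exc.errno)}"
    return results


if __name__ == "__main__":
    import sys

    for name, status in probe_files(sys.argv[1]).items():
        print(f"{name}: {status}")
```

Run it with the directory to probe as the only argument. Sorting the names makes the last line printed before a hang point at the file that was being read.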

Where is the "raw nvram" device, so that I can copy it byte-for-byte in order to diagnose storage layout?

BIOS is version P8Z68-V-GEN3-ASUS-3603.ROM (Oct/Nov 2012), believed to be the latest available from ASUS.

I will attach complete syslog, output from "efibootmgr --verbose", and strace of failing efibootmgr write-to-nvram.

Read-only "efibootmgr --verbose" hints at mismatch between BIOS and kernel when interpreting storage layout in nvram:
-----
Boot0002* Hard Drive    BIOS(2,0,00)AMGOAMNO........s.K.i.n.g.s.t.o.n.D.T. .R.u.b.b.e.r. .3...0............^?........A........................^?.....@..Gd-.;.A..MQ..L.K.i.n.g.s.t.o.n.D.T. .R.u.b.b.e.r. .3...0...^?...AMBOAMNO........o.H.D.T.7.2.2.5.1.6.D.L.A.3.8.0............^?........A......................^?.....>..Gd-.;.A..MQ..L. . . . . . .D.V.7.K.B.1.C.T.H.E.U.4.K.P...^?...AMBO
-----
That's almost certainly reading from free space.  The displayed BIOS boot menu contains only the harddrive serial number, and in particular nothing of the string  "AMGOAMNO...".
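
One way to read the dotted runs in that output, assuming the tool prints every non-printable byte as "." (an assumption about the display, not something confirmed here): text stored as 16-bit little-endian characters has a zero byte after each ASCII letter, so it shows up with a dot between letters. The sketch below uses a made-up helper, render_printable, to reproduce the effect for one of the device names.

```python
def render_printable(data: bytes) -> str:
    """Show printable ASCII bytes as-is and every other byte as '.'.

    Hypothetical stand-in for how the verbose output is assumed to be drawn.
    """
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)


label = "DT Rubber 3.0"
raw = label.encode("utf-16-le")   # each ASCII character is followed by a 0x00 byte

print(render_printable(raw))      # D.T. .R.u.b.b.e.r. .3...0.
print(raw.decode("utf-16-le"))    # DT Rubber 3.0
```

Under that assumption, "K.i.n.g.s.t.o.n.D.T. .R.u.b.b.e.r. .3...0." is the device name "KingstonDT Rubber 3.0" stored as 16-bit characters. This does not by itself say whether the bytes come from a valid variable or from free space.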

It looks to me like the BIOS re-computes the EFI boot order on every boot, because disconnecting a data cable causes the omission of that device from the boot menu as shown by the BIOS.

Comment 1 John Reiser 2013-05-12 23:04:35 UTC
Created attachment 747018 [details]
anaconda /tmp/syslog

Comment 2 John Reiser 2013-05-12 23:05:10 UTC
Created attachment 747019 [details]
output from "efibootmgr --verbose"

Comment 3 John Reiser 2013-05-12 23:05:53 UTC
Created attachment 747020 [details]
strace of efibootmgr write-to-nvram

Comment 4 Josh Boyer 2013-09-18 20:48:09 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 5 Josh Boyer 2013-10-08 17:18:11 UTC
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.