1758413 – kernel fails to boot on qemu: BUG: kernel NULL pointer dereference, address: 0000000000000ac8

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1758903
Depends On:
Blocks:	TRACKER-bugs-affecting-libguestfs

Reported:	2019-10-04 05:09 UTC by Remi Collet
Modified:	2019-10-14 07:56 UTC
CC List:	22 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2019-10-14 07:56:11 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
build.log (4.29 MB, text/plain) 2019-10-04 09:24 UTC, Richard W.M. Jones	no flags

Description Remi Collet 2019-10-04 05:09:56 UTC

Rebuild for https://fedoraproject.org/wiki/Changes/php74

See https://koji.fedoraproject.org/koji/taskinfo?taskID=38043876

Comment 1 Richard W.M. Jones 2019-10-04 09:23:14 UTC

[    4.058604] EXT4-fs (sdb): mounting ext2 file system using the ext4 subsystem
[    4.070033] BUG: kernel NULL pointer dereference, address: 0000000000000ac8
[    4.070033] #PF: supervisor read access in kernel mode
[    4.070033] #PF: error_code(0x0000) - not-present page
[    4.070033] PGD 2da2f067 P4D 2da2f067 PUD 2da2e067 PMD 0 
[    4.070033] Oops: 0000 [#1] SMP NOPTI
[    4.070033] CPU: 0 PID: 1 Comm: init Not tainted 5.4.0-0.rc1.git0.1.fc32.x86_64 #1
[    4.070033] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[    4.070033] RIP: 0010:mem_cgroup_track_foreign_dirty_slowpath+0x39/0x150
[    4.070033] Code: 48 8b 2d 4a cc 11 01 53 4c 8b 67 38 66 66 66 66 90 49 8b 45 00 48 89 e9 31 db be ff ff ff ff 48 8b 38 49 8d 84 24 e0 0a 00 00 <48> 39 78 e8 74 5a 48 8b 50 f8 48 39 ca 79 0e 44 8b 00 41 83 f8 01
[    4.070033] RSP: 0018:ffffb58ac000bc28 EFLAGS: 00000046
[    4.070033] RAX: 0000000000000ae0 RBX: 0000000000000000 RCX: 00000000fffb7552
[    4.070033] RDX: fffffffffffffff8 RSI: 00000000ffffffff RDI: 0000000000000002
[    4.070033] RBP: 00000000fffb7552 R08: 0000000000000001 R09: 0000000000001000
[    4.070033] R10: ffff921f2d522400 R11: ffff921f093ee100 R12: 0000000000000000
[    4.070033] R13: ffff921f2d437078 R14: ffff921f2e84ac90 R15: 0000000000000000
[    4.070033] FS:  00007ffdba5f7480(0000) GS:ffff921f2f000000(0000) knlGS:0000000000000000
[    4.070033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.070033] CR2: 0000000000000ac8 CR3: 000000002da2a000 CR4: 00000000000006f0
[    4.070033] Call Trace:
[    4.070033]  __set_page_dirty+0x50/0xc0
[    4.070033]  mark_buffer_dirty+0xbe/0xf0
[    4.070033]  ext4_commit_super+0x1bd/0x2c0
[    4.070033]  ext4_setup_super+0x127/0x1d0
[    4.070033]  ext4_fill_super+0x2151/0x3d00
[    4.070033]  ? bdev_name.isra.0+0x50/0xe0
[    4.070033]  ? snprintf+0x49/0x60
[    4.070033]  ? mount_bdev+0x176/0x1a0
[    4.070033]  mount_bdev+0x176/0x1a0
[    4.070033]  ? ext4_calculate_overhead+0x480/0x480
[    4.070033]  legacy_get_tree+0x27/0x40
[    4.070033]  vfs_get_tree+0x25/0xb0
[    4.070033]  do_mount+0x738/0x9f0
[    4.070033]  ksys_mount+0x7e/0xc0
[    4.070033]  __x64_sys_mount+0x21/0x30
[    4.070033]  do_syscall_64+0x5b/0x180
[    4.070033]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    4.070033] RIP: 0033:0x402418
[    4.070033] Code: 00 00 00 48 8d 3d c8 2e 00 00 e8 2a 0d 00 00 c7 05 39 5d 00 00 01 00 00 00 48 83 c4 08 c3 b0 3c b4 00 0f b7 c0 49 89 ca 0f 05 <48> 3d 7c ff ff ff 76 0f f7 d8 50 e8 bd 03 00 00 59 89 08 48 83 c8
[    4.070033] RSP: 002b:00007ffdba5f7338 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[    4.070033] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000402418
[    4.070033] RDX: 00000000004051d8 RSI: 00000000004051e4 RDI: 00000000004051ce
[    4.070033] RBP: 0000000000000015 R08: 00000000004055ec R09: 00007ffdba5f7100
[    4.070033] R10: 0000000000000400 R11: 0000000000000206 R12: 00007ffdba5f7350
[    4.070033] R13: 00007ffdba5f7354 R14: 0044b82fa09b5a53 R15: 0000000000408617
[    4.070033] Modules linked in: libcrc32c crc8 crc7 crc64 crc4 crc_itu_t virtio_fs fuse virtio_mmio virtio_input virtio_balloon virtio_scsi virtio_rpmsg_bus rpmsg_core nd_pmem nd_btt virtio_net net_failover failover virtio_crypto crypto_engine virtio_console virtio_blk libnvdimm crc32_generic
[    4.070033] CR2: 0000000000000ac8
[    4.070033] ---[ end trace cd5d9585454442ff ]---
[    4.070033] RIP: 0010:mem_cgroup_track_foreign_dirty_slowpath+0x39/0x150
[    4.070033] Code: 48 8b 2d 4a cc 11 01 53 4c 8b 67 38 66 66 66 66 90 49 8b 45 00 48 89 e9 31 db be ff ff ff ff 48 8b 38 49 8d 84 24 e0 0a 00 00 <48> 39 78 e8 74 5a 48 8b 50 f8 48 39 ca 79 0e 44 8b 00 41 83 f8 01
[    4.070033] RSP: 0018:ffffb58ac000bc28 EFLAGS: 00000046
[    4.070033] RAX: 0000000000000ae0 RBX: 0000000000000000 RCX: 00000000fffb7552
[    4.070033] RDX: fffffffffffffff8 RSI: 00000000ffffffff RDI: 0000000000000002
[    4.070033] RBP: 00000000fffb7552 R08: 0000000000000001 R09: 0000000000001000
[    4.070033] R10: ffff921f2d522400 R11: ffff921f093ee100 R12: 0000000000000000
[    4.070033] R13: ffff921f2d437078 R14: ffff921f2e84ac90 R15: 0000000000000000
[    4.070033] FS:  00007ffdba5f7480(0000) GS:ffff921f2f000000(0000) knlGS:0000000000000000
[    4.070033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.070033] CR2: 0000000000000ac8 CR3: 000000002da2a000 CR4: 00000000000006f0
[    4.075710] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    4.076032] Kernel Offset: 0x39000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Comment 2 Richard W.M. Jones 2019-10-04 09:24:03 UTC

Created attachment 1622531
build.log

Full log attached.

Comment 3 Richard W.M. Jones 2019-10-04 09:25:21 UTC

https://people.redhat.com/~rjones/qemu-sanity-check/

I wrote this back in 2013, and it would catch these kinds of errors automatically
if deployed.

Comment 4 Jeremy Cline 2019-10-04 16:34:46 UTC

Well, this is the Rawhide kernel just after the merge window; it's fortunate it boots anywhere. I'm not saying upstream *shouldn't* be using continuous integration to catch all of these issues before they're merged, but that's clearly not happening, and from a Fedora packaging perspective I occasionally have to build Rawhide kernels with known (or unknown) issues.

As for testing with qemu-sanity-check in particular, perhaps Major could comment on whether or not that's happening or planned on the Red Hat side of things.

Comment 5 David Hill 2019-10-06 18:29:58 UTC

I hit this issue too. Thanks, Richard, for pointing me to this BZ.

Comment 6 Richard W.M. Jones 2019-10-06 18:37:40 UTC

*** Bug 1758903 has been marked as a duplicate of this bug. ***

Comment 7 Richard W.M. Jones 2019-10-13 11:28:02 UTC

Still failing in 5.4.0-0.rc2.git0.1.fc32.x86_64

Comment 8 Richard W.M. Jones 2019-10-13 16:17:25 UTC

I noticed this has been fixed upstream.  Bisecting shows that it's fixed by this commit:

commit 08d1d0e6d0a00c6e687201774f3bf61177741e80
Author: Baoquan He <bhe>
Date:   Sun Oct 6 17:58:15 2019 -0700

    memcg: only record foreign writebacks with dirty pages when memcg is not disabled
    
    In kdump kernel, memcg usually is disabled with 'cgroup_disable=memory'
    for saving memory.  Now kdump kernel will always panic when dump vmcore
    to local disk:
    
      BUG: kernel NULL pointer dereference, address: 0000000000000ab8
      Oops: 0000 [#1] SMP NOPTI
      CPU: 0 PID: 598 Comm: makedumpfile Not tainted 5.3.0+ #26
      Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
      RIP: 0010:mem_cgroup_track_foreign_dirty_slowpath+0x38/0x140
      Call Trace:
       __set_page_dirty+0x52/0xc0
       iomap_set_page_dirty+0x50/0x90
       iomap_write_end+0x6e/0x270
       iomap_write_actor+0xce/0x170
       iomap_apply+0xba/0x11e
       iomap_file_buffered_write+0x62/0x90
       xfs_file_buffered_aio_write+0xca/0x320 [xfs]
       new_sync_write+0x12d/0x1d0
       vfs_write+0xa5/0x1a0
       ksys_write+0x59/0xd0
       do_syscall_64+0x59/0x1e0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
    
    And this will corrupt the 1st kernel too with 'cgroup_disable=memory'.
    
    Via the trace and with debugging, it is pointing to commit 97b27821b485
    ("writeback, memcg: Implement foreign dirty flushing") which introduced
    this regression.  Disabling memcg causes the null pointer dereference at
    uninitialized data in function mem_cgroup_track_foreign_dirty_slowpath().
    
    Fix it by returning directly if memcg is disabled, but not trying to
    record the foreign writebacks with dirty pages.
    
    Link: http://lkml.kernel.org/r/20190924141928.GD31919@MiWiFi-R3L-srv
    Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
    Signed-off-by: Baoquan He <bhe>
    Acked-by: Michal Hocko <mhocko>
    Cc: Johannes Weiner <hannes>
    Cc: Jan Kara <jack>
    Cc: Tejun Heo <tj>
    Cc: Jens Axboe <axboe>
    Signed-off-by: Andrew Morton <akpm>
    Signed-off-by: Linus Torvalds <torvalds>

 include/linux/memcontrol.h | 3 +++
 1 file changed, 3 insertions(+)

Comment 9 Richard W.M. Jones 2019-10-14 07:56:11 UTC

-rc3 was released yesterday evening, so this is closed as fixed upstream.

airlied
bskeggs
dhill
hdegoede
ichavero
itamar
jarodwilson
jcline
jeremy
jglisse
john.j5live
jonathan
josef
kernel-maint
linville
masami256
mchehab
mhayden
mjg59
pasik
rjones
steved