Bug 1937218 - gfs2: Panic with kernel BUG at fs/gfs2/inode.h:64! on kernel 3.10.0-1160.15.2.el7
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.9
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 7.9
Assignee: Robert Peterson
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-10 07:46 UTC by Reid Wahl
Modified: 2021-07-06 12:42 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-06 12:42:37 UTC
Target Upstream Version:
Embargoed:



Links
System                             ID       Private  Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  5871771  0        None      None    None     2021-03-10 07:53:36 UTC

Description Reid Wahl 2021-03-10 07:46:42 UTC
Description of problem:

A node panics with the following call trace (taken from var/crash/127.0.0.1-2021-03-08-13:05:58/vmcore-dmesg.txt):

[ 5456.045663] ------------[ cut here ]------------
[ 5456.045762] kernel BUG at fs/gfs2/inode.h:64!
[ 5456.045841] invalid opcode: 0000 [#1] SMP 
[ 5456.045895] Modules linked in: gfs2 dlm udp_diag tcp_diag inet_diag vmw_vsock_vmci_transport vsock sb_edac ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd vmw_balloon pcspkr joydev sg parport_pc parport vmw_vmci i2c_piix4 auth_rpcgss sunrpc binfmt_misc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sd_mod drm crc_t10dif crct10dif_generic nfit crct10dif_pclmul crct10dif_common libnvdimm crc32c_intel serio_raw ata_piix libata vmw_pvscsi vmxnet3 drm_panel_orientation_quirks floppy dm_mirror dm_region_hash dm_log dm_mod fuse
[ 5456.046310] CPU: 1 PID: 70374 Comm: kworker/1:2 Kdump: loaded Tainted: G        W      ------------   3.10.0-1160.15.2.el7.x86_64 #1
[ 5456.046372] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[ 5456.046418] Workqueue: delete_workqueue delete_work_func [gfs2]
[ 5456.046463] task: ffff9197b49a4200 ti: ffff91981227c000 task.ti: ffff91981227c000
[ 5456.046601] RIP: 0010:[<ffffffffc08db748>]  [<ffffffffc08db748>] gfs2_add_inode_blocks.part.24+0x14/0x16 [gfs2]
[ 5456.046773] RSP: 0018:ffff91981227fb48  EFLAGS: 00010246
[ 5456.046872] RAX: 0000000000000038 RBX: ffff91974a1e03e0 RCX: 0000000000000000
[ 5456.046976] RDX: 0000000000000000 RSI: ffff91983fc938d8 RDI: ffff91983fc938d8
[ 5456.047107] RBP: ffff91981227fb48 R08: 0000000000000000 R09: ffff9197bbbb9040
[ 5456.047245] R10: 0000000000000629 R11: 0000000000000000 R12: 0000000000046c02
[ 5456.047361] R13: ffff919812026000 R14: ffff91977f55e2a8 R15: ffff9198120250f8
[ 5456.047494] FS:  0000000000000000(0000) GS:ffff91983fc80000(0000) knlGS:0000000000000000
[ 5456.047616] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5456.047712] CR2: 00007f13c4e06684 CR3: 0000000094812000 CR4: 00000000001607e0
[ 5456.047953] Call Trace:
[ 5456.047979]  [<ffffffffc08ab1e8>] punch_hole+0xfa8/0x1140 [gfs2]
[ 5456.048021]  [<ffffffffc08aafd6>] ? punch_hole+0xd96/0x1140 [gfs2]
[ 5456.048062]  [<ffffffffc08ace82>] gfs2_file_dealloc+0x12/0x20 [gfs2]
[ 5456.048108]  [<ffffffffc08d5978>] gfs2_evict_inode+0x528/0x660 [gfs2]
[ 5456.048181]  [<ffffffffc08d561b>] ? gfs2_evict_inode+0x1cb/0x660 [gfs2]
[ 5456.048211]  [<ffffffffa2a6c324>] evict+0xb4/0x180
[ 5456.048232]  [<ffffffffa2a6c75c>] iput+0xfc/0x190
[ 5456.048254]  [<ffffffffc08b92bc>] delete_work_func+0x6c/0x80 [gfs2]
[ 5456.048309]  [<ffffffffa28bde3f>] process_one_work+0x17f/0x440
[ 5456.048347]  [<ffffffffa28bef56>] worker_thread+0x126/0x3c0
[ 5456.048374]  [<ffffffffa28bee30>] ? manage_workers.isra.26+0x2a0/0x2a0
[ 5456.048401]  [<ffffffffa28c5e11>] kthread+0xd1/0xe0
[ 5456.049980]  [<ffffffffa28c5d40>] ? insert_kthread_work+0x40/0x40
[ 5456.051159]  [<ffffffffa2f94df7>] ret_from_fork_nospec_begin+0x21/0x21
[ 5456.052435]  [<ffffffffa28c5d40>] ? insert_kthread_work+0x40/0x40
[ 5456.053625] Code: e2 b9 ea ff ff ff e9 48 fb ff ff 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 8b 47 28 48 89 e5 48 8b b8 50 03 00 00 e8 28 db ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 8b 47 28 48 89 
[ 5456.056340] RIP  [<ffffffffc08db748>] gfs2_add_inode_blocks.part.24+0x14/0x16 [gfs2]
[ 5456.057486]  RSP <ffff91981227fb48>
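
For triage, the frames in a trace like the one above can be pulled out mechanically. The following is a minimal Python sketch and not part of the original report: `parse_frames` and `FRAME_RE` are illustrative names, and the pattern assumes the `[<address>] function+0xoff/0xsize [module]` layout printed by this 3.10 kernel. Frames prefixed with `?` are addresses the unwinder found on the stack but could not confirm as part of the call chain, so the sketch flags them as unreliable.

```python
import re

# One frame line looks like:
#   [ 5456.047979]  [<ffffffffc08ab1e8>] punch_hole+0xfa8/0x1140 [gfs2]
# A "?" before the symbol marks a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(lines):
    """Return one dict per stack frame found in the given log lines."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m is None:
            continue  # not a frame line (for example "Call Trace:")
        frames.append({
            "func": m.group("func"),
            "offset": int(m.group("off"), 16),
            "module": m.group("module"),  # None for core-kernel symbols
            "reliable": m.group("unreliable") is None,
        })
    return frames


# Lines copied from the trace in this report.
sample = [
    "[ 5456.047953] Call Trace:",
    "[ 5456.047979]  [<ffffffffc08ab1e8>] punch_hole+0xfa8/0x1140 [gfs2]",
    "[ 5456.048021]  [<ffffffffc08aafd6>] ? punch_hole+0xd96/0x1140 [gfs2]",
    "[ 5456.048062]  [<ffffffffc08ace82>] gfs2_file_dealloc+0x12/0x20 [gfs2]",
    "[ 5456.048211]  [<ffffffffa2a6c324>] evict+0xb4/0x180",
]
frames = parse_frames(sample)
gfs2_chain = [f["func"] for f in frames if f["reliable"] and f["module"] == "gfs2"]
```

Filtering on `reliable` and `module` leaves only the confirmed gfs2 frames, which is usually the part worth comparing between crashes.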


We have a vmcore in support case 02888285 (also downloaded to supportshell), but I'm not sure how to attach it to the BZ because of its size. This may give you what you need: https://galvatron.cee.redhat.com/manager/537433292


I also see a number of "invalid metadata block" withdrawals in the logs, but they don't appear to happen immediately before the panics -- so if there's a causal relationship, it may be panic -> corruption/withdrawal rather than vice versa. Example:

Mar  9 10:01:04 node01 kernel: GFS2: fsid=MQ-CTMQA1:mqhavolfs.1: fatal: invalid metadata block
GFS2: fsid=MQ-CTMQA1:mqhavolfs.1:   bh = 204479 (magic number)
GFS2: fsid=MQ-CTMQA1:mqhavolfs.1:   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 428
Mar  9 10:01:04 node01 kernel: GFS2: fsid=MQ-CTMQA1:mqhavolfs.1: about to withdraw this file system
Mar  9 10:01:04 node01 kernel: GFS2: fsid=MQ-CTMQA1:mqhavolfs.1: telling LM to unmount
Mar  9 10:01:04 node01 kernel: GFS2: fsid=MQ-CTMQA1:mqhavolfs.1: withdrawn
Mar  9 10:01:04 node01 kernel: CPU: 3 PID: 4179 Comm: gfs2_quotad Kdump: loaded Tainted: G        W      ------------   3.10.0-1160.15.2.el7.x86_64 #1
Mar  9 10:01:04 node01 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Mar  9 10:01:04 node01 kernel: Call Trace:
Mar  9 10:01:04 node01 kernel:  [<ffffffffb9781fba>] dump_stack+0x19/0x1b
Mar  9 10:01:04 node01 kernel:  [<ffffffffc05913d6>] gfs2_lm_withdraw+0x146/0x180 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb9286cb3>] ? unlock_buffer+0x23/0x30
Mar  9 10:01:04 node01 kernel:  [<ffffffffc05916f5>] gfs2_meta_check_ii+0x45/0x50 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc05783e9>] gfs2_meta_indirect_buffer+0xe9/0x150 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0560c44>] __fillup_metapath+0x44/0x90 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0560cca>] fillup_metapath+0x3a/0x50 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0562867>] punch_hole+0x627/0x1140 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90e23c9>] ? pick_next_entity+0xa9/0x190
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90e266c>] ? set_next_entity+0x3c/0xe0
Mar  9 10:01:04 node01 kernel:  [<ffffffffb97878ff>] ? __schedule+0x3af/0x860
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90adb8b>] ? lock_timer_base.isra.38+0x2b/0x50
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90ae09e>] ? try_to_del_timer_sync+0x5e/0x90
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0564e58>] gfs2_truncatei_resume+0x18/0x30 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0571e68>] gfs2_glock_finish_truncate+0x18/0x70 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0586f9d>] gfs2_quotad+0xfd/0x2c0 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c6f00>] ? wake_up_atomic_t+0x30/0x30
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0586ea0>] ? gfs2_wake_up_statfs+0x40/0x40 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c5e11>] kthread+0xd1/0xe0
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c5d40>] ? insert_kthread_work+0x40/0x40
Mar  9 10:01:04 node01 kernel:  [<ffffffffb9794df7>] ret_from_fork_nospec_begin+0x21/0x21
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c5d40>] ? insert_kthread_work+0x40/0x40
Mar  9 10:01:04 node01 kernel: ------------[ cut here ]------------
Mar  9 10:01:04 node01 kernel: Modules linked in: gfs2 dlm udp_diag tcp_diag inet_diag vmw_vsock_vmci_transport vsock sb_edac iosf_mbi ppdev crc32_pclmul ghash_clmulni_intel aesni_intel vmw_balloon lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr sg parport_pc vmw_vmci parport i2c_piix4 auth_rpcgss binfmt_misc sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi sd_mod crc_t10dif crct10dif_generic vmwgfx drm_kms_helper crct10dif_pclmul crct10dif_common crc32c_intel syscopyarea sysfillrect serio_raw sysimgblt fb_sys_fops ttm drm ata_piix vmxnet3 vmw_pvscsi libata nfit libnvdimm drm_panel_orientation_quirks floppy dm_mirror dm_region_hash dm_log dm_mod fuse
Mar  9 10:01:04 node01 kernel: CPU: 3 PID: 4179 Comm: gfs2_quotad Kdump: loaded Tainted: G        W      ------------   3.10.0-1160.15.2.el7.x86_64 #1
Mar  9 10:01:04 node01 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Mar  9 10:01:04 node01 kernel: Call Trace:
Mar  9 10:01:04 node01 kernel:  [<ffffffffb9781fba>] dump_stack+0x19/0x1b
Mar  9 10:01:04 node01 kernel:  [<ffffffffb909b1b8>] __warn+0xd8/0x100
Mar  9 10:01:04 node01 kernel:  [<ffffffffb909b23f>] warn_slowpath_fmt+0x5f/0x80
Mar  9 10:01:04 node01 kernel:  [<ffffffffb928644e>] __brelse+0x2e/0x50
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0560d06>] release_metapath+0x26/0x40 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc056275e>] punch_hole+0x51e/0x1140 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90e23c9>] ? pick_next_entity+0xa9/0x190
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90e266c>] ? set_next_entity+0x3c/0xe0
Mar  9 10:01:04 node01 kernel:  [<ffffffffb97878ff>] ? __schedule+0x3af/0x860
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90adb8b>] ? lock_timer_base.isra.38+0x2b/0x50
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90ae09e>] ? try_to_del_timer_sync+0x5e/0x90
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0564e58>] gfs2_truncatei_resume+0x18/0x30 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0571e68>] gfs2_glock_finish_truncate+0x18/0x70 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0586f9d>] gfs2_quotad+0xfd/0x2c0 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c6f00>] ? wake_up_atomic_t+0x30/0x30
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0586ea0>] ? gfs2_wake_up_statfs+0x40/0x40 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c5e11>] kthread+0xd1/0xe0
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c5d40>] ? insert_kthread_work+0x40/0x40
Mar  9 10:01:04 node01 kernel:  [<ffffffffb9794df7>] ret_from_fork_nospec_begin+0x21/0x21
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c5d40>] ? insert_kthread_work+0x40/0x40
Mar  9 10:01:04 node01 kernel: ---[ end trace 2b40c3728427f56c ]---
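
To check how withdrawals line up with panics in time, the `fatal:` lines can be extracted from the syslog and compared against the kdump directory timestamp. This is a rough Python sketch under several assumptions that the report does not confirm: the names `find_withdrawals` and `crash_time` are hypothetical, the syslog year is taken to be 2021 (syslog lines carry no year), and the crash directory name and the syslog are assumed to use the same time zone and to come from the same node.

```python
import re
from datetime import datetime

# Matches the first line of a GFS2 withdrawal as logged via syslog, e.g.
#   Mar  9 10:01:04 node01 kernel: GFS2: fsid=...: fatal: invalid metadata block
FATAL_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"GFS2: fsid=(?P<fsid>\S+): fatal: (?P<reason>.+)$"
)


def find_withdrawals(lines, year):
    """Return host, fsid, reason and time for each GFS2 'fatal:' line."""
    events = []
    for line in lines:
        m = FATAL_RE.match(line.strip())
        if m is None:
            continue
        # The year is supplied by the caller because syslog omits it.
        when = datetime.strptime(f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S")
        events.append({
            "host": m.group("host"),
            "fsid": m.group("fsid"),
            "reason": m.group("reason"),
            "when": when,
        })
    return events


def crash_time(dirname):
    """Parse a kdump directory name of the form '<address>-YYYY-MM-DD-HH:MM:SS'."""
    return datetime.strptime(dirname.split("-", 1)[1], "%Y-%m-%d-%H:%M:%S")


# Lines copied from the example above; the year 2021 is an assumption.
syslog = [
    "Mar  9 10:01:04 node01 kernel: GFS2: fsid=MQ-CTMQA1:mqhavolfs.1: fatal: invalid metadata block",
    "Mar  9 10:01:04 node01 kernel: GFS2: fsid=MQ-CTMQA1:mqhavolfs.1: about to withdraw this file system",
]
events = find_withdrawals(syslog, year=2021)
panic = crash_time("127.0.0.1-2021-03-08-13:05:58")
gaps = [(e["when"] - panic).total_seconds() for e in events]
```

A positive gap means the withdrawal was logged after the crash dump was taken; a small negative gap would point to a withdrawal shortly before a panic.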

-----

Version-Release number of selected component (if applicable):

kernel 3.10.0-1160.15.2.el7.x86_64

-----

How reproducible:

Intermittent

-----

Steps to Reproduce:

Unknown

-----

Actual results:

Panic and call trace in Description

-----

Expected results:

No panic

-----

Additional info:

Let us know if you need any more info, like filesystem metadata immediately after the panic.

