Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September on pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are migrated only if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user-management inquiry; the e-mail creates a ServiceNow ticket with Red Hat.

Migrated Bugzilla bugs will be moved to status "CLOSED", resolution "MIGRATED", and tagged with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue appears under "Links", has a small "two-footprint" icon next to it, and leads to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link is also shown in a blue banner at the top of the page informing you that the bug has been migrated.

Bug 1937218

Summary: gfs2: Panic with kernel BUG at fs/gfs2/inode.h:64! on kernel 3.10.0-1160.15.2.el7
Product: Red Hat Enterprise Linux 7
Component: kernel
Sub component: GFS/GFS2
Version: 7.9
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: high
Reporter: Reid Wahl <nwahl>
Assignee: Robert Peterson <rpeterso>
QA Contact: cluster-qe <cluster-qe>
Docs Contact:
CC: dwysocha, gfs2-maint, rpeterso, sbradley
Flags: pm-rhel: mirror+
Target Milestone: rc
Target Release: 7.9
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-06 12:42:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Reid Wahl 2021-03-10 07:46:42 UTC
Description of problem:

A node panics with the following call trace (taken from var/crash/127.0.0.1-2021-03-08-13:05:58/vmcore-dmesg.txt):

[ 5456.045663] ------------[ cut here ]------------
[ 5456.045762] kernel BUG at fs/gfs2/inode.h:64!
[ 5456.045841] invalid opcode: 0000 [#1] SMP 
[ 5456.045895] Modules linked in: gfs2 dlm udp_diag tcp_diag inet_diag vmw_vsock_vmci_transport vsock sb_edac ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd vmw_balloon pcspkr joydev sg parport_pc parport vmw_vmci i2c_piix4 auth_rpcgss sunrpc binfmt_misc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sd_mod drm crc_t10dif crct10dif_generic nfit crct10dif_pclmul crct10dif_common libnvdimm crc32c_intel serio_raw ata_piix libata vmw_pvscsi vmxnet3 drm_panel_orientation_quirks floppy dm_mirror dm_region_hash dm_log dm_mod fuse
[ 5456.046310] CPU: 1 PID: 70374 Comm: kworker/1:2 Kdump: loaded Tainted: G        W      ------------   3.10.0-1160.15.2.el7.x86_64 #1
[ 5456.046372] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[ 5456.046418] Workqueue: delete_workqueue delete_work_func [gfs2]
[ 5456.046463] task: ffff9197b49a4200 ti: ffff91981227c000 task.ti: ffff91981227c000
[ 5456.046601] RIP: 0010:[<ffffffffc08db748>]  [<ffffffffc08db748>] gfs2_add_inode_blocks.part.24+0x14/0x16 [gfs2]
[ 5456.046773] RSP: 0018:ffff91981227fb48  EFLAGS: 00010246
[ 5456.046872] RAX: 0000000000000038 RBX: ffff91974a1e03e0 RCX: 0000000000000000
[ 5456.046976] RDX: 0000000000000000 RSI: ffff91983fc938d8 RDI: ffff91983fc938d8
[ 5456.047107] RBP: ffff91981227fb48 R08: 0000000000000000 R09: ffff9197bbbb9040
[ 5456.047245] R10: 0000000000000629 R11: 0000000000000000 R12: 0000000000046c02
[ 5456.047361] R13: ffff919812026000 R14: ffff91977f55e2a8 R15: ffff9198120250f8
[ 5456.047494] FS:  0000000000000000(0000) GS:ffff91983fc80000(0000) knlGS:0000000000000000
[ 5456.047616] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5456.047712] CR2: 00007f13c4e06684 CR3: 0000000094812000 CR4: 00000000001607e0
[ 5456.047953] Call Trace:
[ 5456.047979]  [<ffffffffc08ab1e8>] punch_hole+0xfa8/0x1140 [gfs2]
[ 5456.048021]  [<ffffffffc08aafd6>] ? punch_hole+0xd96/0x1140 [gfs2]
[ 5456.048062]  [<ffffffffc08ace82>] gfs2_file_dealloc+0x12/0x20 [gfs2]
[ 5456.048108]  [<ffffffffc08d5978>] gfs2_evict_inode+0x528/0x660 [gfs2]
[ 5456.048181]  [<ffffffffc08d561b>] ? gfs2_evict_inode+0x1cb/0x660 [gfs2]
[ 5456.048211]  [<ffffffffa2a6c324>] evict+0xb4/0x180
[ 5456.048232]  [<ffffffffa2a6c75c>] iput+0xfc/0x190
[ 5456.048254]  [<ffffffffc08b92bc>] delete_work_func+0x6c/0x80 [gfs2]
[ 5456.048309]  [<ffffffffa28bde3f>] process_one_work+0x17f/0x440
[ 5456.048347]  [<ffffffffa28bef56>] worker_thread+0x126/0x3c0
[ 5456.048374]  [<ffffffffa28bee30>] ? manage_workers.isra.26+0x2a0/0x2a0
[ 5456.048401]  [<ffffffffa28c5e11>] kthread+0xd1/0xe0
[ 5456.049980]  [<ffffffffa28c5d40>] ? insert_kthread_work+0x40/0x40
[ 5456.051159]  [<ffffffffa2f94df7>] ret_from_fork_nospec_begin+0x21/0x21
[ 5456.052435]  [<ffffffffa28c5d40>] ? insert_kthread_work+0x40/0x40
[ 5456.053625] Code: e2 b9 ea ff ff ff e9 48 fb ff ff 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 8b 47 28 48 89 e5 48 8b b8 50 03 00 00 e8 28 db ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 8b 47 28 48 89 
[ 5456.056340] RIP  [<ffffffffc08db748>] gfs2_add_inode_blocks.part.24+0x14/0x16 [gfs2]
[ 5456.057486]  RSP <ffff91981227fb48>


We have a vmcore in support case 02888285 (and downloaded to supportshell), but I'm not sure how to attach it to the BZ due to its size. Maybe this gives you what you need: https://galvatron.cee.redhat.com/manager/537433292
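For context, the BUG at fs/gfs2/inode.h:64 fires from gfs2_add_inode_blocks(), which sanity-checks that a negative block-count delta never drives the inode's i_blocks below zero before applying it. A minimal userspace model of that invariant (illustrative only -- this is not the kernel source; model_inode and add_inode_blocks_ok are made-up names for this sketch):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of the gfs2_add_inode_blocks() invariant: a
 * negative delta must not exceed the current block count. In the
 * kernel, a failed check calls BUG(), producing exactly the
 * "kernel BUG at fs/gfs2/inode.h:64!" panic quoted above. */
typedef struct {
    int64_t i_blocks;   /* inode's allocated-block count */
} model_inode;

/* Returns 1 if the update is legal and applies it;
 * returns 0 (leaving i_blocks untouched) where the kernel would BUG(). */
static int add_inode_blocks_ok(model_inode *ip, int64_t change)
{
    if (change < 0 && ip->i_blocks <= -change)
        return 0;   /* would drive i_blocks to zero or negative */
    ip->i_blocks += change;
    return 1;
}
```

So the panic implies punch_hole() tried to release more blocks than the in-core inode believed it owned, which is consistent with on-disk/in-core block accounting disagreeing.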


I also see a number of "invalid metadata block" withdrawals in the logs, but they don't appear to happen immediately before the panics -- so if there's a causal relationship, it may be panic -> corruption/withdrawal rather than vice versa. Example:

Mar  9 10:01:04 node01 kernel: GFS2: fsid=MQ-CTMQA1:mqhavolfs.1: fatal: invalid metadata block
GFS2: fsid=MQ-CTMQA1:mqhavolfs.1:   bh = 204479 (magic number)
GFS2: fsid=MQ-CTMQA1:mqhavolfs.1:   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 428
Mar  9 10:01:04 node01 kernel: GFS2: fsid=MQ-CTMQA1:mqhavolfs.1: about to withdraw this file system
Mar  9 10:01:04 node01 kernel: GFS2: fsid=MQ-CTMQA1:mqhavolfs.1: telling LM to unmount
Mar  9 10:01:04 node01 kernel: GFS2: fsid=MQ-CTMQA1:mqhavolfs.1: withdrawn
Mar  9 10:01:04 node01 kernel: CPU: 3 PID: 4179 Comm: gfs2_quotad Kdump: loaded Tainted: G        W      ------------   3.10.0-1160.15.2.el7.x86_64 #1
Mar  9 10:01:04 node01 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Mar  9 10:01:04 node01 kernel: Call Trace:
Mar  9 10:01:04 node01 kernel:  [<ffffffffb9781fba>] dump_stack+0x19/0x1b
Mar  9 10:01:04 node01 kernel:  [<ffffffffc05913d6>] gfs2_lm_withdraw+0x146/0x180 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb9286cb3>] ? unlock_buffer+0x23/0x30
Mar  9 10:01:04 node01 kernel:  [<ffffffffc05916f5>] gfs2_meta_check_ii+0x45/0x50 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc05783e9>] gfs2_meta_indirect_buffer+0xe9/0x150 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0560c44>] __fillup_metapath+0x44/0x90 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0560cca>] fillup_metapath+0x3a/0x50 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0562867>] punch_hole+0x627/0x1140 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90e23c9>] ? pick_next_entity+0xa9/0x190
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90e266c>] ? set_next_entity+0x3c/0xe0
Mar  9 10:01:04 node01 kernel:  [<ffffffffb97878ff>] ? __schedule+0x3af/0x860
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90adb8b>] ? lock_timer_base.isra.38+0x2b/0x50
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90ae09e>] ? try_to_del_timer_sync+0x5e/0x90
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0564e58>] gfs2_truncatei_resume+0x18/0x30 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0571e68>] gfs2_glock_finish_truncate+0x18/0x70 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0586f9d>] gfs2_quotad+0xfd/0x2c0 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c6f00>] ? wake_up_atomic_t+0x30/0x30
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0586ea0>] ? gfs2_wake_up_statfs+0x40/0x40 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c5e11>] kthread+0xd1/0xe0
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c5d40>] ? insert_kthread_work+0x40/0x40
Mar  9 10:01:04 node01 kernel:  [<ffffffffb9794df7>] ret_from_fork_nospec_begin+0x21/0x21
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c5d40>] ? insert_kthread_work+0x40/0x40
Mar  9 10:01:04 node01 kernel: ------------[ cut here ]------------
Mar  9 10:01:04 node01 kernel: Modules linked in: gfs2 dlm udp_diag tcp_diag inet_diag vmw_vsock_vmci_transport vsock sb_edac iosf_mbi ppdev crc32_pclmul ghash_clmulni_intel aesni_intel vmw_balloon lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr sg parport_pc vmw_vmci parport i2c_piix4 auth_rpcgss binfmt_misc sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi sd_mod crc_t10dif crct10dif_generic vmwgfx drm_kms_helper crct10dif_pclmul crct10dif_common crc32c_intel syscopyarea sysfillrect serio_raw sysimgblt fb_sys_fops ttm drm ata_piix vmxnet3 vmw_pvscsi libata nfit libnvdimm drm_panel_orientation_quirks floppy dm_mirror dm_region_hash dm_log dm_mod fuse
Mar  9 10:01:04 node01 kernel: CPU: 3 PID: 4179 Comm: gfs2_quotad Kdump: loaded Tainted: G        W      ------------   3.10.0-1160.15.2.el7.x86_64 #1
Mar  9 10:01:04 node01 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Mar  9 10:01:04 node01 kernel: Call Trace:
Mar  9 10:01:04 node01 kernel:  [<ffffffffb9781fba>] dump_stack+0x19/0x1b
Mar  9 10:01:04 node01 kernel:  [<ffffffffb909b1b8>] __warn+0xd8/0x100
Mar  9 10:01:04 node01 kernel:  [<ffffffffb909b23f>] warn_slowpath_fmt+0x5f/0x80
Mar  9 10:01:04 node01 kernel:  [<ffffffffb928644e>] __brelse+0x2e/0x50
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0560d06>] release_metapath+0x26/0x40 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc056275e>] punch_hole+0x51e/0x1140 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90e23c9>] ? pick_next_entity+0xa9/0x190
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90e266c>] ? set_next_entity+0x3c/0xe0
Mar  9 10:01:04 node01 kernel:  [<ffffffffb97878ff>] ? __schedule+0x3af/0x860
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90adb8b>] ? lock_timer_base.isra.38+0x2b/0x50
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90ae09e>] ? try_to_del_timer_sync+0x5e/0x90
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0564e58>] gfs2_truncatei_resume+0x18/0x30 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0571e68>] gfs2_glock_finish_truncate+0x18/0x70 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0586f9d>] gfs2_quotad+0xfd/0x2c0 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c6f00>] ? wake_up_atomic_t+0x30/0x30
Mar  9 10:01:04 node01 kernel:  [<ffffffffc0586ea0>] ? gfs2_wake_up_statfs+0x40/0x40 [gfs2]
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c5e11>] kthread+0xd1/0xe0
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c5d40>] ? insert_kthread_work+0x40/0x40
Mar  9 10:01:04 node01 kernel:  [<ffffffffb9794df7>] ret_from_fork_nospec_begin+0x21/0x21
Mar  9 10:01:04 node01 kernel:  [<ffffffffb90c5d40>] ? insert_kthread_work+0x40/0x40
Mar  9 10:01:04 node01 kernel: ---[ end trace 2b40c3728427f56c ]---
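For context, the "fatal: invalid metadata block ... (magic number)" message from gfs2_meta_indirect_buffer means a buffer read from disk did not carry the GFS2 metadata-header magic (0x01161970), so the filesystem withdrew. A minimal model of that header check (illustrative only -- meta_header_valid and the struct below are made-up names for this sketch, and the real on-disk fields are big-endian):

```c
#include <stdint.h>

#define GFS2_MAGIC 0x01161970u  /* magic carried by every GFS2 metadata block */

/* Illustrative model of the metadata-header validation: every GFS2
 * metadata block starts with a header whose magic and type must match
 * what the reader expects; a mismatch is reported as an "invalid
 * metadata block" and leads to the withdrawal seen in the log above. */
struct model_meta_header {
    uint32_t mh_magic;  /* must be GFS2_MAGIC */
    uint32_t mh_type;   /* metadata block type expected by the caller */
};

/* Returns 1 if the header looks like valid GFS2 metadata of the
 * expected type, 0 if the kernel would report it as invalid. */
static int meta_header_valid(const struct model_meta_header *mh,
                             uint32_t want_type)
{
    return mh->mh_magic == GFS2_MAGIC && mh->mh_type == want_type;
}
```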

-----

Version-Release number of selected component (if applicable):

kernel 3.10.0-1160.15.2.el7.x86_64

-----

How reproducible:

Intermittent

-----

Steps to Reproduce:

Unknown

-----

Actual results:

Panic and call trace in Description

-----

Expected results:

No panic

-----

Additional info:

Let us know if you need any more info, like filesystem metadata immediately after the panic.