Bug 2308364

Summary: [Special Handling] [GSS] [9.2.z panic] kernel BUG at fs/ceph/addr.c:97!
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: kelwhite
Component: ceph
Assignee: Xiubo Li <xiubli>
ceph sub component: CephFS
QA Contact: Elad <ebenahar>
Status: ASSIGNED
Severity: high
Priority: high
CC: assingh, bkunal, bniver, etamir, gjose, mcaldeir, muagarwa, ngangadh, nojha, ofamera, pdonnell, soakley, sostapov, xiubli
Version: 4.13   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Clone Of:
: 2317519
Type: Bug

Description kelwhite 2024-08-28 18:33:33 UTC
Description of problem:

o. Kernel panic with "kernel BUG at fs/ceph/addr.c:97!", seen in multiple events.

  crash> struct module ffffffffc137ee40 | grep "version\|name"
  name = "ceph",
  version = 0x0,
  srcversion = 0xffff9aa40e1c9560 "61825C8C4400603CEB0D471",
  rhelversion = 0xffff9a9d890183e8 "9.2",


crash> bt
PID: 2146740  TASK: ffff8aac36db0000  CPU: 71   COMMAND: "xxxxxx"
 #0 [ffffb47a335f3900] machine_kexec at ffffffffb966c767
 #1 [ffffb47a335f3958] __crash_kexec at ffffffffb97c58ca
 #2 [ffffb47a335f3a18] crash_kexec at ffffffffb97c6a88
 #3 [ffffb47a335f3a20] oops_end at ffffffffb962921b
 #4 [ffffb47a335f3a40] do_trap at ffffffffb96259ae
 #5 [ffffb47a335f3a90] do_error_trap at ffffffffb9625a65
 #6 [ffffb47a335f3ad0] exc_invalid_op at ffffffffba12d71e
 #7 [ffffb47a335f3af0] asm_exc_invalid_op at ffffffffba200b36
    [exception RIP: ceph_dirty_folio+0x183]
    RIP: ffffffffc1259ce3  RSP: ffffb47a335f3ba0  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff8a7ff6414d98  RCX: 0000000000000002
    RDX: 0000000000000000  RSI: ffffd8efa2324d80  RDI: ffff8a7ff6415048
    RBP: ffffd8efa2324d80   R8: 0000000000000003   R9: ffff8a9c001eb678
    R10: 000000000000038e  R11: 0000000000007240  R12: ffff8a7ff6414f10
    R13: 0000000000000001  R14: ffff8a7ff6415048  R15: ffff8a804b2223d0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffb47a335f3bd0] set_page_dirty_lock at ffffffffb98e340e
 #9 [ffffb47a335f3be8] put_bvecs at ffffffffc125429f [ceph]
#10 [ffffb47a335f3c18] ceph_direct_read_write at ffffffffc1254f3b [ceph]
#11 [ffffb47a335f3d18] ceph_read_iter at ffffffffc1256b38 [ceph]
#12 [ffffb47a335f3de8] new_sync_read at ffffffffb99c8c49
#13 [ffffb47a335f3e90] vfs_read at ffffffffb99cb7bc
#14 [ffffb47a335f3ec8] __x64_sys_pread64 at ffffffffb99cb8a0
#15 [ffffb47a335f3f08] do_syscall_64 at ffffffffba12d159
#16 [ffffb47a335f3f50] entry_SYSCALL_64_after_hwframe at ffffffffba2000dc
    RIP: 00007f3ab25163d7  RSP: 00007f3a4e3451b0  RFLAGS: 00000293
    RAX: ffffffffffffffda  RBX: 0000000000000312  RCX: 00007f3ab25163d7
    RDX: 0000000000800000  RSI: 00007f39e68fc000  RDI: 0000000000000312
    RBP: 00007f39e68fc000   R8: 0000000000000000   R9: 0000000774ddb120
    R10: 0000000000000000  R11: 0000000000000293  R12: 0000000000800000
    R13: 0000000000000000  R14: 00007f3a4e3452d0  R15: 00007f3a54154000
    ORIG_RAX: 0000000000000011  CS: 0033  SS: 002b
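
For readers triaging the trace, the user-mode register frame above can be decoded into the arguments of the system call that was in flight. The sketch below assumes the standard x86-64 Linux syscall convention (syscall number in ORIG_RAX, arguments in RDI, RSI, RDX, R10); the helper name `decode_pread64` and the one-entry syscall table are illustrative only and are not part of the crash utility.

```python
# Decode the user-mode register frame from the `bt` output into syscall arguments.
# Assumption: x86-64 Linux syscall ABI (number in ORIG_RAX; args in RDI, RSI, RDX, R10).

# Register values copied from the frame after entry_SYSCALL_64_after_hwframe.
regs = {
    "ORIG_RAX": 0x11,
    "RDI": 0x312,
    "RSI": 0x00007F39E68FC000,
    "RDX": 0x800000,
    "R10": 0x0,
}

# Only the entry needed for this trace; 17 is pread64 on x86-64.
SYSCALL_NAMES = {17: "pread64"}


def decode_pread64(regs):
    """Map the raw registers onto pread64(fd, buf, count, offset)."""
    return {
        "syscall": SYSCALL_NAMES.get(regs["ORIG_RAX"], "unknown"),
        "fd": regs["RDI"],
        "buf": hex(regs["RSI"]),
        "count": regs["RDX"],
        "offset": regs["R10"],
    }


call = decode_pread64(regs)
print(call)
print(f"{call['count'] // (1024 * 1024)} MiB read at offset {call['offset']} on fd {call['fd']}")
```

Read this way, the frame is consistent with the `__x64_sys_pread64` entry in the kernel-side trace: an 8 MiB read at file offset 0 on file descriptor 786.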


Version-Release number of selected component (if applicable):

o. Internal ODF
o. 9.2.z kernel 
o. OCP: 4.14.27
o. ODF: 4.14.6
o. Ceph: 17.2.6-196.el9cp

How reproducible:

o. Multiple events observed so far.

Actual results:

o. The kernel hits the BUG_ON at fs/ceph/addr.c:97 (in ceph_dirty_folio) and panics.

Expected results:

o. The read completes without hitting the BUG_ON, and the kernel does not panic.