
Bug 2375864

Summary: [8.1z] [NFS-Ganesha] [BYOK] After client node reboot during IO, remount succeeds and IO resumes, but the mount path becomes inaccessible
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: sumr
Component: NFS-Ganesha
Assignee: Sachin Punadikar <spunadik>
NFS-Ganesha sub component: Ceph
QA Contact: Manisha Saini <msaini>
Status: CLOSED CURRENTRELEASE
Docs Contact:
Severity: high
Priority: unspecified
CC: bkunal, cephqe-warriors, ffilz, kkeithle, nchillar, ngangadh, spunadik
Version: 8.1
Keywords: External
Target Milestone: ---
Target Release: 8.1z2
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2379788 (view as bug list)
Environment:
Last Closed: 2025-12-03 15:45:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2379788
Bug Blocks:

Description sumr 2025-07-02 09:29:49 UTC
Description of problem:
If the client node is rebooted while IO is in progress on NFS mount paths for exports with a kmip_key, the remount after the reboot succeeds, but the mount path becomes inaccessible once IO is resumed. None of the steps below succeeds at this stage (see the diagnostic sketch after this list):
- Remount
- Remount with a new export
- Remount after an NFS service restart: here the MDS reports the error "2 clients failing to respond to capability release"
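
When the MDS reports "clients failing to respond to capability release", the client sessions holding the stale capabilities can be inspected on the active MDS. A minimal diagnostic sketch (the MDS daemon name below is the active rank 0 from the logs in this report; it would differ on another cluster):

# Show the full health warning, including the offending client IDs
ceph health detail
# List client sessions on the active MDS; each entry reports num_caps
ceph tell mds.cephfs.ceph-sumar-nfs-byok-f9le1w-node4.ipmqnq session ls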

Logs:

[root@ceph-sumar-nfs-byok-f9le1w-node8 ~]# ceph orch ps --refresh|grep cephfs-nfs
nfs.cephfs-nfs.0.2.ceph-sumar-nfs-byok-f9le1w-node2.rlcnoo  ceph-sumar-nfs-byok-f9le1w-node2            *:2049       running (14m)     2s ago  23m     127M        -  6.5               b8860365707a  4b0b1987ce25  

[root@ceph-sumar-nfs-byok-f9le1w-node8 ~]# ceph fs status
cephfs - 4 clients
======
RANK  STATE                        MDS                           ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  cephfs.ceph-sumar-nfs-byok-f9le1w-node4.ipmqnq  Reqs:  403 /s   104k   102k   269   2281   
 1    active  cephfs.ceph-sumar-nfs-byok-f9le1w-node7.gphhce  Reqs:    0 /s    10     13     11      0   
       POOL           TYPE     USED  AVAIL  
cephfs.cephfs.meta  metadata   747M  93.3G  
cephfs.cephfs.data    data    1310M  93.3G  
                 STANDBY MDS                    
cephfs.ceph-sumar-nfs-byok-f9le1w-node5.ueiwno  
cephfs.ceph-sumar-nfs-byok-f9le1w-node6.lvjjam  
cephfs.ceph-sumar-nfs-byok-f9le1w-node3.fvavbu  
MDS version: ceph version 19.2.1-227.el9cp (f8cd1c06d7d97a64a3ab3705bf22b98b86721cba) squid (stable)
[root@ceph-sumar-nfs-byok-f9le1w-node8 ~]# 
Broadcast message from root@ceph-sumar-nfs-byok-f9le1w-node8 on pts/1 (Wed 2025-07-02 08:08:28 UTC):

The system will reboot now!

[root@ceph-sumar-nfs-byok-f9le1w-node8 ~]# mount -t nfs 10.0.64.200:/cephfs_sv4 /mnt/cephfs_sv4
[root@ceph-sumar-nfs-byok-f9le1w-node8 ~]# mount -t nfs 10.0.64.200:/cephfs_sv5 /mnt/cephfs_sv5
[root@ceph-sumar-nfs-byok-f9le1w-node8 ~]# ceph -s
  cluster:
    id:     54e956dc-4ff2-11f0-bde2-fa163e3d39fc
    health: HEALTH_OK

[root@ceph-sumar-nfs-byok-f9le1w-node8 ~]# ceph fs status
cephfs - 2 clients
======
RANK  STATE                        MDS                           ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  cephfs.ceph-sumar-nfs-byok-f9le1w-node4.ipmqnq  Reqs:    0 /s   106k   102k   267   2162   

[root@ceph-sumar-nfs-byok-f9le1w-node8 ~]# ceph-fuse -n client.admin /mnt/cephfs -r / --client_fs=cephfs
2025-07-02T08:21:44.541+0000 7f348f31d580 -1 init, newargv = 0x55dea15653b0 newargc=15
ceph-fuse[1990]: starting ceph client
ceph-fuse[1990]: starting fuse
[root@ceph-sumar-nfs-byok-f9le1w-node8 ~]# ls -l /mnt/cephfs/volumes/_nogroup/sv4/
.meta                                 70f6e2ef-3e5c-4a77-9df0-14c5790c9aa9/ 
[root@ceph-sumar-nfs-byok-f9le1w-node8 ~]# ls -l /mnt/cephfs/volumes/_nogroup/sv4/70f6e2ef-3e5c-4a77-9df0-14c5790c9aa9/
total 12178
drwxr-xr-x. 5 root root 11141120 Jul  2 08:11 PujuUHAIlcHqkIO5g,JXGBQ9O,9vbItIM0IfzqZnNG8

[root@ceph-sumar-nfs-byok-f9le1w-node8 ~]# cd /mnt/cephfs_sv4


^C^Z^Z^Z^Z^Z^Z^Z^C^C^C^C^C

[root@ceph-sumar-nfs-byok-f9le1w-node8 cephuser]# mount -t nfs 10.0.64.200:/cephfs_sv6 /mnt/cephfs_sv6



^C
[root@ceph-sumar-nfs-byok-f9le1w-node8 cephuser]# ceph orch restart nfs.cephfs-nfs  
Scheduled to restart nfs.cephfs-nfs.0.2.ceph-sumar-nfs-byok-f9le1w-node2.rlcnoo on host 'ceph-sumar-nfs-byok-f9le1w-node2'

[root@ceph-sumar-nfs-byok-f9le1w-node8 cephuser]# mount -t nfs 10.0.64.200:/cephfs_sv6 /mnt/cephfs_sv6
^C
[root@ceph-sumar-nfs-byok-f9le1w-node8 cephuser]# ceph fs subvolume create cephfs sv7;ceph fs subvolume getpath cephfs sv7


^CInterrupted

[root@ceph-sumar-nfs-byok-f9le1w-node8 cephuser]# ceph -s
  cluster:
    id:     54e956dc-4ff2-11f0-bde2-fa163e3d39fc
    health: HEALTH_WARN
            2 clients failing to respond to capability release
            1 MDSs report slow requests
 
  services:
    mon: 3 daemons, quorum ceph-sumar-nfs-byok-f9le1w-node1-installer,ceph-sumar-nfs-byok-f9le1w-node3,ceph-sumar-nfs-byok-f9le1w-node2 (age 25h)
    mgr: ceph-sumar-nfs-byok-f9le1w-node1-installer.womqwd(active, since 2d), standbys: ceph-sumar-nfs-byok-f9le1w-node2.mpubty
    mds: 2/2 daemons up, 3 standby
    osd: 16 osds: 16 up (since 96m), 16 in (since 9d)
 
  data:
    volumes: 1/1 healthy
    pools:   5 pools, 593 pgs
    objects: 102.16k objects, 655 MiB
    usage:   14 GiB used, 306 GiB / 320 GiB avail
    pgs:     593 active+clean
 
  io:
    client:   85 B/s rd, 0 op/s rd, 0 op/s wr
 
[root@ceph-sumar-nfs-byok-f9le1w-node8 cephuser]# ceph -s
  cluster:
    id:     54e956dc-4ff2-11f0-bde2-fa163e3d39fc
    health: HEALTH_WARN
            2 clients failing to respond to capability release
            1 MDSs report slow requests
 
  services:
    mon: 3 daemons, quorum ceph-sumar-nfs-byok-f9le1w-node1-installer,ceph-sumar-nfs-byok-f9le1w-node3,ceph-sumar-nfs-byok-f9le1w-node2 (age 25h)
    mgr: ceph-sumar-nfs-byok-f9le1w-node1-installer.womqwd(active, since 2d), standbys: ceph-sumar-nfs-byok-f9le1w-node2.mpubty
    mds: 2/2 daemons up, 3 standby
    osd: 16 osds: 16 up (since 97m), 16 in (since 9d)
 
  data:
    volumes: 1/1 healthy
    pools:   5 pools, 593 pgs
    objects: 102.16k objects, 655 MiB
    usage:   14 GiB used, 306 GiB / 320 GiB avail
    pgs:     593 active+clean
 
  io:
    client:   85 B/s rd, 0 op/s rd, 0 op/s wr

[root@ceph-sumar-nfs-byok-f9le1w-node8 cephuser]# ceph health detail
HEALTH_OK

[root@ceph-sumar-nfs-byok-f9le1w-node8 cephuser]# ceph nfs export create cephfs cephfs-nfs /cephfs_sv7 cephfs --path /volumes/_nogroup/sv7/22e81ad9-ab4f-4cd4-8b49-0e0a10bf98f3
{
  "bind": "/cephfs_sv7",
  "cluster": "cephfs-nfs",
  "fs": "cephfs",
  "mode": "RW",
  "path": "/volumes/_nogroup/sv7/22e81ad9-ab4f-4cd4-8b49-0e0a10bf98f3"
}
[root@ceph-sumar-nfs-byok-f9le1w-node8 cephuser]# mount -t nfs 10.0.64.200:/cephfs_sv6 /mnt/cephfs_sv6

^C

[root@ceph-sumar-nfs-byok-f9le1w-node8 cephuser]# mkdir /mnt/cephfs_sv7
[root@ceph-sumar-nfs-byok-f9le1w-node8 cephuser]# mount -t nfs 10.0.64.200:/cephfs_sv7 /mnt/cephfs_sv7

Version-Release number of selected component (if applicable): 19.2.1-227.el9cp


How reproducible:


Steps to Reproduce:
1. Run IO on an NFS mount created from an export having a kmip_key
2. Reboot the client node while the IO is running
3. After the reboot, remount the NFS exports and verify that IO can resume (a shell sketch of these steps follows)
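
A minimal shell sketch of these steps (assuming an export /cephfs_sv4 created with a kmip_key already exists and 10.0.64.200 is the NFS service address, as in the logs above; the dd workload is only illustrative):

# Step 1: mount the export and start IO in the background
mkdir -p /mnt/cephfs_sv4
mount -t nfs 10.0.64.200:/cephfs_sv4 /mnt/cephfs_sv4
dd if=/dev/urandom of=/mnt/cephfs_sv4/io_test bs=1M count=10240 &

# Step 2: reboot the client node while the IO is still running
reboot

# Step 3: after the node is back up, remount and try to resume IO
mount -t nfs 10.0.64.200:/cephfs_sv4 /mnt/cephfs_sv4
ls /mnt/cephfs_sv4    # in the failure case this hangs and never returns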

Actual results: IO resume fails, and the mount path becomes inaccessible after the remount. After an NFS service restart, a mount of a new export also does not succeed.
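
To confirm the inaccessible state without wedging an interactive shell, the access check can be bounded (a sketch; the 30-second bound is arbitrary):

timeout 30 ls /mnt/cephfs_sv4 || echo "mount path is hung or inaccessible"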


Expected results: IO should resume on the NFS mount path after the client node reboot, and the mount path should remain accessible.


Additional info: NFS debug logs will be shared shortly.