Bug 2249814 - userspace cephfs client can crash when upgrading from RHCS 6 to 7 (or from RHCS 5 -> 6)
Summary: userspace cephfs client can crash when upgrading from RHCS 6 to 7 (or from RH...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 6.1z3
Assignee: Venky Shankar
QA Contact: sumr
Docs Contact: Disha Walvekar
URL:
Whiteboard:
Depends On: 2247174
Blocks: 2247624 2249844
 
Reported: 2023-11-15 13:13 UTC by Venky Shankar
Modified: 2023-12-12 13:56 UTC
10 users

Fixed In Version: ceph-17.2.6-161.el9cp
Doc Type: Bug Fix
Doc Text:
.User-space Ceph File System (CephFS) works as expected post upgrade
Previously, the user-space CephFS client would sometimes crash during a cluster upgrade. This occurred because stale MDS feature bits were held on the user-space side. With this fix, the user-space CephFS client holds updated MDS feature bits, which allows clients to work as expected after a cluster upgrade.
Clone Of: 2247174
Environment:
Last Closed: 2023-12-12 13:56:10 UTC
Embargoed:
dwalveka: needinfo-


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 55240 0 None None None 2023-11-23 13:49:39 UTC
Ceph Project Bug Tracker 63188 0 None None None 2023-11-15 13:13:03 UTC
Red Hat Issue Tracker RHCEPH-7918 0 None None None 2023-11-15 13:14:14 UTC
Red Hat Product Errata RHSA-2023:7740 0 None None None 2023-12-12 13:56:16 UTC

Description Venky Shankar 2023-11-15 13:13:04 UTC
+++ This bug was initially created as a clone of Bug #2247174 +++

Description of problem:

Details here: https://tracker.ceph.com/issues/63188
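
Per the tracker and the doc text above, the crash stems from the client acting on stale MDS feature bits. A hedged sketch of how the currently reported feature bits can be inspected with standard ceph CLI commands (shown for illustration only; not part of the original report):

# features reported by connected clients, mons, osds and mds
ceph features

# per-session view from the active MDS, including client metadata
ceph tell mds.<active-mds> session ls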


How reproducible:

Very likely.

Steps to Reproduce:
1. Start with RHCS 5 client and MDS

2. Upgrade the (userspace) client to RHCS 7 (say, ceph-mgr)

3. Continue using the client

4. Upgrade the MDS (one active MDS should suffice)

5. The userspace client (ceph-mgr in this case) should crash.
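
For reference, a minimal shell sketch of this reproduction flow on a cephadm-managed cluster (the container image reference, client name, and mount path are placeholders, not values from this BZ):

# 1) start from an RHCS 6 cluster with CephFS and a userspace client
ceph fs status
ceph-fuse -n client.admin --client_fs cephfs /mnt/cephfs

# 2) upgrade only the client packages to the RHCS 7 build
dnf update -y ceph-common ceph-fuse

# 3) keep IO running on the mount, then upgrade the cluster (and its MDS) via cephadm
ceph orch upgrade start --image <rhcs7-container-image>
ceph orch upgrade status

# 4) without the fix, the userspace client (e.g. ceph-mgr or ceph-fuse) may crash
#    once the upgraded MDS becomes active; check for new crash reports
ceph crash ls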

--- Additional comment from Venky Shankar on 2023-10-31 05:34:08 UTC ---

Start with RHCS 6.1 client/MDS.

--- Additional comment from Venky Shankar on 2023-10-31 05:43:33 UTC ---

https://gitlab.cee.redhat.com/ceph/ceph/-/merge_requests/408

--- Additional comment from Venky Shankar on 2023-10-31 14:54:31 UTC ---

(In reply to Venky Shankar from comment #2)
> https://gitlab.cee.redhat.com/ceph/ceph/-/merge_requests/408

Merged.

--- Additional comment from  on 2023-10-31 19:45:27 UTC ---

Builds are ready for testing. We need a qa_ack+ in order to attach this BZ to the errata advisory and move to ON_QA.

--- Additional comment from errata-xmlrpc on 2023-11-01 04:52:19 UTC ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2023:118213-01
https://errata.devel.redhat.com/advisory/118213

--- Additional comment from errata-xmlrpc on 2023-11-01 04:52:26 UTC ---

This bug has been added to advisory RHBA-2023:118213 by Thomas Serlin (tserlin)

--- Additional comment from sumr on 2023-11-02 11:12:55 UTC ---

QA Test Plan:
- Repeat the steps used to reproduce:

1. Setup 5.3 RHCS with CephFS config. Run IO.
2. Upgrade Ceph Client to RHCS 7 build with fix, continue IO
3. Upgrade Ceph Nodes to RHCS 7 build with fix.
4. Verify Ceph is Healthy, no crash seen with Client Ceph-mgr and IO can continue.
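
A hedged sketch of the checks behind step 4, using standard ceph CLI commands (the filesystem name, client name, and mount point are assumptions for illustration):

ceph -s                      # overall cluster health
ceph health detail           # details for any HEALTH_WARN items
ceph fs status cephfs        # MDS/filesystem state after the upgrade
ceph crash ls                # no new client/daemon crash reports expected

# IO from the upgraded userspace client should still work
ceph-fuse -n client.admin --client_fs cephfs /mnt/cephfs
dd if=/dev/zero of=/mnt/cephfs/post_upgrade_io_check bs=1M count=100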

--- Additional comment from Venky Shankar on 2023-11-03 05:25:20 UTC ---

(In reply to sumr from comment #7)
> QA Test Plan:
> - Repeat steps used to reproduce,
> 
> 1. Setup 5.3 RHCS with CephFS config. Run IO.

Use the latest RHCS6 please.

> 2. Upgrade Ceph Client to RHCS 7 build with fix, continue IO
> 3. Upgrade Ceph Nodes to RHCS 7 build with fix.
> 4. Verify Ceph is Healthy, no crash seen with Client Ceph-mgr and IO can
> continue.

--- Additional comment from sumr on 2023-11-07 12:32:54 UTC ---

(In reply to Venky Shankar from comment #8)
> (In reply to sumr from comment #7)
> > QA Test Plan:
> > - Repeat steps used to reproduce,
> > 
> > 1. Setup 5.3 RHCS with CephFS config. Run IO.
> 
> Use the latest RHCS6 please.
> 
> > 2. Upgrade Ceph Client to RHCS 7 build with fix, continue IO
> > 3. Upgrade Ceph Nodes to RHCS 7 build with fix.
> > 4. Verify Ceph is Healthy, no crash seen with Client Ceph-mgr and IO can
> > continue.

Hi Venky,

Executed the test steps planned for fix verification, as mentioned above, with an upgrade from RHCS 6 build 17.2.6-153 to RHCS 7 build 18.2.0-118.

Test Steps:
1. Setup latest RHCS 6 with CephFS config. Run IO.
2. Upgrade Ceph Client to RHCS 7 build with fix, continue IO
3. Upgrade Ceph Nodes to RHCS 7 build with fix.
4. Verify Ceph is Healthy, no crash seen with Client Ceph-mgr and IO can
continue.

Result summary:
1. Ceph is healthy after the upgrade, but not immediately. Immediately after the upgrade, Ceph was in HEALTH_WARN due to the filesystem being in a degraded state; after a few minutes of recovery, Ceph and the MDS were healthy.
2. The existing ceph-fuse mount point became stale with the error "Cannot send after transport endpoint shutdown"; a remount at a new mount point was required to continue IO. IO could be continued on the new mount point. No other error or crash was seen on the cluster or client side.

ASK: Please confirm whether the behaviour seen post-upgrade is acceptable, as the MDS auto-recovers and the client gets blocklisted, but IO continues on the new mount.

Complete Logs: http://magna002.ceph.redhat.com/ceph-qe-logs/suma/bz_verify/bz_2247174_verification.log

Snippet for post-upgrade state:

Cluster view:

Ceph status immediately after upgrade,

2023-11-07 05:26:34,479 (cephci.cephadm.test_cephadm_upgrade) [INFO] - cephci.ceph.ceph.py:725 -   cluster:
    id:     43b73854-7d47-11ee-9931-fa163ef75022
    health: HEALTH_WARN
            1 filesystem is degraded
            1 filesystem is online with fewer MDS than max_mds
            Degraded data redundancy: 34/6081 objects degraded (0.559%), 3 pgs degraded
 
  services:
    mon: 3 daemons, quorum ceph-sumar-regression-9s46lr-node1-installer,ceph-sumar-regression-9s46lr-node3,ceph-sumar-regression-9s46lr-node2 (age 8m)
    mgr: ceph-sumar-regression-9s46lr-node1-installer.flqshy(active, since 8m), standbys: ceph-sumar-regression-9s46lr-node2.vaqqrv
    mds: 1/1 daemons up, 2 standby
    osd: 12 osds: 12 up (since 75s), 12 in (since 113m)
 
  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   4 pools, 135 pgs
    objects: 2.03k objects, 5.4 GiB
    usage:   25 GiB used, 155 GiB / 180 GiB avail
    pgs:     34/6081 objects degraded (0.559%)
             131 active+clean
             3   active+recovery_wait+degraded
             1   active+recovering

2023-11-07 05:26:56,808 (cephci.cephadm.test_cephadm_upgrade) [INFO] - cephci.ceph.ceph_admin.__init__.py:242 - service_type: mds
service_id: cephfs
service_name: mds.cephfs
placement:
  label: mds
status:
  created: '2023-11-07T08:33:53.741699Z'
  last_refresh: '2023-11-07T10:25:48.333470Z'
  running: 3
  size: 3

[root@ceph-sumar-regression-9s46lr-node6 cephfs]# ceph status
  cluster:
    id:     43b73854-7d47-11ee-9931-fa163ef75022
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-sumar-regression-9s46lr-node1-installer,ceph-sumar-regression-9s46lr-node3,ceph-sumar-regression-9s46lr-node2 (age 28m)
    mgr: ceph-sumar-regression-9s46lr-node1-installer.flqshy(active, since 28m), standbys: ceph-sumar-regression-9s46lr-node2.vaqqrv
    mds: 2/2 daemons up, 1 standby
    osd: 12 osds: 12 up (since 21m), 12 in (since 2h)
 
  data:
    volumes: 1/1 healthy
    pools:   3 pools, 49 pgs
    objects: 1.04k objects, 1.6 GiB
    usage:   13 GiB used, 167 GiB / 180 GiB avail
    pgs:     49 active+clean

Client View:

[root@ceph-sumar-regression-9s46lr-node6 cephfs]# ls
ls: cannot open directory '.': Cannot send after transport endpoint shutdown
[root@ceph-sumar-regression-9s46lr-node6 ~]# ceph-fuse -n client.ceph-sumar-regression-9s46lr-node6 --client_fs cephfs /mnt/cephfs_1
2023-11-07T06:46:39.327-0500 7f64e6e39480 -1 init, newargv = 0x5605dc8c5f60 newargc=15
ceph-fuse[42555]: starting ceph client
ceph-fuse[42555]: starting fuse
[root@ceph-sumar-regression-9s46lr-node6 ~]# cd /mnt/cephfs_1
[root@ceph-sumar-regression-9s46lr-node6 cephfs_1]# ls
fio_file_512M      smallfile_dir18  smallfile_dir30   smallfile_dir311  smallfile_dir323  smallfile_dir335  smallfile_dir347  smallfile_dir359  smallfile_dir46  smallfile_dir58  smallfile_dir7   smallfile_dir81  smallfile_dir93
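
For reference, a minimal sketch of how the blocklist/stale-mount situation above can be confirmed and recovered from (the client name and mount points are the ones used in this run; the exact workflow shown is an assumption, not what the test harness ran):

# check whether the client entity was blocklisted by the cluster
ceph osd blocklist ls

# the stale ceph-fuse mount can no longer be used; detach it lazily
fusermount -uz /mnt/cephfs        # fusermount3 -uz on fuse3-based systems

# remount (as done above with the new mount point) and resume IO
ceph-fuse -n client.ceph-sumar-regression-9s46lr-node6 --client_fs cephfs /mnt/cephfs_1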

--- Additional comment from sumr on 2023-11-08 05:47:22 UTC ---

(In reply to sumr from comment #9)
> [...]
> ASK: Please confirm whether the behaviour seen post-upgrade is acceptable, as
> the MDS auto-recovers and the client gets blocklisted, but IO continues on
> the new mount.

I have added client side logs for further debugging,

logs - http://magna002.ceph.redhat.com/ceph-qe-logs/suma/bz_verify/system_logs/

snippet:

2023-11-07T05:26:58.840-0500 7f3544ff9640 -1 client.24439 I was blocklisted at osd epoch 471

INFO: task ceph-fuse:42425 blocked for more than 1228 seconds.
[13394.110644]       Not tainted 5.14.0-284.30.1.el9_2.x86_64 #1
[13394.111298] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13394.112095] task:ceph-fuse       state:D stack:    0 pid:42425 ppid:     1 flags:0x00004006
[13394.112941] Call Trace:
[13394.113372]  <TASK>
[13394.113862]  __schedule+0x248/0x620
[13394.114402]  schedule+0x2d/0x60
[13394.114918]  request_wait_answer+0x131/0x220 [fuse]
[13394.115519]  ? cpuacct_percpu_seq_show+0x10/0x10
[13394.116108]  fuse_simple_request+0x19f/0x310 [fuse]
[13394.116808]  fuse_statfs+0xd8/0x140 [fuse]
[13394.117380]  statfs_by_dentry+0x64/0x90
[13394.117971]  user_statfs+0x57/0xc0
[13394.118461]  __do_sys_statfs+0x20/0x60
[13394.119003]  do_syscall_64+0x59/0x90
[13394.119537]  ? exc_page_fault+0x62/0x150
[13394.120154]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
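
For anyone triaging similar reports, a small sketch of where this evidence comes from (the log path pattern below is the default ceph client log location and is an assumption for this setup):

# client-side evidence: libcephfs logs the blocklist event
grep -i blocklisted /var/log/ceph/ceph-client.*.log

# kernel-side evidence: processes stuck on the dead FUSE mount appear as hung tasks
dmesg | grep -A 15 'blocked for more than'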

--- Additional comment from Venky Shankar on 2023-11-09 05:56:48 UTC ---

(In reply to sumr from comment #9)
> [...]
> ASK: Please confirm whether the behaviour seen post-upgrade is acceptable, as
> the MDS auto-recovers and the client gets blocklisted, but IO continues on
> the new mount.

Do you see this behaviour in other upgrade tests?

[...]

> I have added client side logs for further debugging,
> [...]
> 2023-11-07T05:26:58.840-0500 7f3544ff9640 -1 client.24439 I was blocklisted
> at osd epoch 471
> 
> INFO: task ceph-fuse:42425 blocked for more than 1228 seconds.
> [...]

Seems like the mount was unresponsive and that caused the MDS to blocklist the client. Do you see other (kernel) client getting blocklisted? Was only this (ceph-fuse) client being used for IO during upgrade?

--- Additional comment from sumr on 2023-11-09 06:20:58 UTC ---

(In reply to Venky Shankar from comment #11)
> [...]
> Seems like the mount was unresponsive and that caused the MDS to blocklist
> the client. Do you see other (kernel) client getting blocklisted? Was only
> this (ceph-fuse) client being used for IO during upgrade?

> Do you see this behaviour in other upgrade tests?
No. Existing upgrade regression tests run IO during the Ceph cluster upgrade, and Ceph status stays healthy with IO. In this case, the only new step was that the client was upgraded before the cluster upgrade, with IO running.

> Do you see other (kernel) client getting blocklisted?
Only a ceph-fuse mount was covered; a kernel mount was not created. If you don't need this system for further debugging, I can rerun the same QA steps with a kernel mount too.

--- Additional comment from Venky Shankar on 2023-11-09 06:30:21 UTC ---

(In reply to sumr from comment #12)
> [...]
> Only a ceph-fuse mount was covered; a kernel mount was not created. If you
> don't need this system for further debugging, I can rerun the same QA steps
> with a kernel mount too.

Yes, please. Additionally, you could test using both RHCS 5/6 builds. See if this (blocklisted client) is consistently reproducible. Also, use both the user-space and the kernel driver in the test.
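
For the rerun, a minimal sketch of mounting the filesystem with both drivers side by side (monitor address, client name, and secret file are placeholders):

# userspace (FUSE) client
ceph-fuse -n client.admin --client_fs cephfs /mnt/cephfs_fuse

# kernel client
mount -t ceph <mon-host>:6789:/ /mnt/cephfs_kernel -o name=admin,secretfile=/etc/ceph/admin.secret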

--- Additional comment from sumr on 2023-11-09 12:13:32 UTC ---

(In reply to Venky Shankar from comment #13)
> [...]
> Yes, please. Additionally, you could test using both RHCS 5/6 builds. See if
> this (blocklisted client) is consistently reproducible. Also, use both the
> user-space and the kernel driver in the test.

Repro is in progress. I tried once, but the repro attempt was not successful, i.e., the cluster was healthy after the upgrade.
Retrying; will copy the logs to the magna server when it is accessible.

--- Additional comment from sumr on 2023-11-13 09:47:52 UTC ---

(In reply to sumr from comment #14)
> [...]
> Repro is in progress. I tried once, but the repro attempt was not successful,
> i.e., the cluster was healthy after the upgrade.
> Retrying; will copy the logs to the magna server when it is accessible.

I could not reproduce the issue in two attempts.

Logs:
http://magna002.ceph.redhat.com/ceph-qe-logs/suma/bz_verify/bz_2247174_client_blocklist_repro1.log
http://magna002.ceph.redhat.com/ceph-qe-logs/suma/bz_verify/bz_2247174_client_blocklist_repro2.log

As Ceph status is healthy and the client had no issues after the upgrade, as per the QA test plan, marking this BZ as VERIFIED.

Comment 1 Venky Shankar 2023-11-15 13:15:08 UTC
backport PR: https://github.com/ceph/ceph/pull/54244

Comment 24 Venky Shankar 2023-11-28 10:08:58 UTC
Doc text update. PTAL.

Comment 27 errata-xmlrpc 2023-12-12 13:56:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:7740

