Created attachment 1433973 [details]
fuse client log

Description of problem:
While running MDS failover tests, at the start of the test, IOs were running from 2 fuse and 2 kernel clients to fill up the cluster. While running IOs from a fuse client, I observed this message in ceph status:

===================================
  cluster:
    id:     40469cc1-e467-4a60-a122-d6b7716f7fd5
    health: HEALTH_WARN
            1 clients failing to respond to capability release
            1 MDSs report slow requests

  services:
    mon: 1 daemons, quorum ceph-jenkins3-build-run201-node1-monmgrinstaller
    mgr: ceph-jenkins3-build-run201-node1-monmgrinstaller(active)
    mds: cephfs-2/2/2 up {0=ceph-jenkins3-build-run201-node4-mds=up:active,1=ceph-jenkins3-build-run201-node3-mds=up:active}, 2 up:standby
    osd: 12 osds: 12 up, 12 in

  data:
    pools:   3 pools, 192 pgs
    objects: 21683 objects, 6622 MB
    usage:   22236 MB used, 326 GB / 347 GB avail
    pgs:     192 active+clean
=====================================

There was also no client IO information in ceph status. This test was running on VMs.

IO tools used: Crefi, fio. On the fuse client, Crefi was used, and it hung.

Version-Release number of selected component (if applicable):
ceph version 12.2.4-10.el7cp (03fd19535b3701f3322c68b5f424335d6fc8dd66) luminous (stable)
OS - Red Hat Enterprise Linux Server release 7.4 (Maipo)

How reproducible:
Always

Steps to Reproduce:
1. Set up a Ceph cluster with 4 MDS (2 active, 2 standby), 4 clients (2 fuse, 2 kernel), 1 mon+mgr, 3 OSDs.
2. Fill up the cluster with IOs.
3. Fail the active MDS daemons one after the other while IOs are running.
(A command-level sketch of these steps is included at the end of this report.)

Actual results:
IO hung on the fuse client. I observed this in the fuse client log:

2018-05-09 08:59:15.098969 7fb5d49ec700  0 -- 172.16.115.35:0/83230076 >> 172.16.115.77:6800/562710057 conn(0x56346dc4d800 :-1 s=STATE_OPEN pgs=24 cs=1 l=0).fault initiating reconnect

Expected results:
No IO failures; MDS failover should succeed.

Additional info:
Before running this test, another MDS failover test with different directory pinning had run and completed without any issues.

Log of this test: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1525864967774/cephfs-mds-failover_0.log
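For reference, a command-level sketch of the reproduction steps above. This is an assumed sequence, not the exact invocations used in this run: the mount points, the Crefi install path, and the Crefi/fio arguments are illustrative.

# 1. Confirm two active MDS daemons and the standbys
ceph fs status cephfs
ceph status

# 2. Drive IO from the clients to fill the cluster
#    (Crefi path/arguments are assumptions; fio arguments are illustrative)
python /opt/Crefi/crefi.py /mnt/cephfs_fuse1 --fop create -n 10000 &
fio --name=fill --directory=/mnt/cephfs_kernel1 --rw=write --size=4G --numjobs=4 &

# 3. Fail the active MDS ranks one after the other while IO is running
ceph mds fail 0
# wait for a standby to take over rank 0, then fail the other active rank
ceph mds fail 1

# Observe the hang: health warnings and the slow requests reported by the MDS
ceph health detail
ceph daemon mds.<id> dump_ops_in_flight   # run on the active MDS node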
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2375