Bug 1576551 - [CephFS] IOs from fuse-clients were hung
Summary: [CephFS] IOs from fuse-clients were hung
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: CephFS
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z5
Target Release: 3.0
Assignee: Patrick Donnelly
QA Contact: Rishabh Dave
Docs Contact: Shreekar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-09 17:29 UTC by Shreekar
Modified: 2018-08-09 18:27 UTC (History)
7 users

Fixed In Version: RHEL: ceph-12.2.4-32 Ubuntu: ceph_12.2.4-36redhat1xenial
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-09 18:27:11 UTC
Target Upstream Version:


Attachments (Terms of Use)
fuse client log (22.41 KB, text/plain)
2018-05-09 17:29 UTC, Shreekar


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:2375 0 None None None 2018-08-09 18:27:42 UTC

Description Shreekar 2018-05-09 17:29:55 UTC
Created attachment 1433973 [details]
fuse client log

Description of problem:
While running MDS failover tests, I/Os were running from 2 FUSE and 2 kernel clients at the start of the test to fill up the cluster. While I/Os were running from a FUSE client, I observed this message in the ceph status output:
===================================
  cluster:
    id:     40469cc1-e467-4a60-a122-d6b7716f7fd5
    health: HEALTH_WARN
            1 clients failing to respond to capability release
            1 MDSs report slow requests
 
  services:
    mon: 1 daemons, quorum ceph-jenkins3-build-run201-node1-monmgrinstaller
    mgr: ceph-jenkins3-build-run201-node1-monmgrinstaller(active)
    mds: cephfs-2/2/2 up  {0=ceph-jenkins3-build-run201-node4-mds=up:active,1=ceph-jenkins3-build-run201-node3-mds=up:active}, 2 up:standby
    osd: 12 osds: 12 up, 12 in
 
  data:
    pools:   3 pools, 192 pgs
    objects: 21683 objects, 6622 MB
    usage:   22236 MB used, 326 GB / 347 GB avail
    pgs:     192 active+clean
=====================================
There was also no client I/O information in the ceph status output. This test was run on VMs.
IO tools used: Crefi, fio
Crefi, which was running on the FUSE client, was the tool that hung.

Version-Release number of selected component (if applicable):
ceph version 12.2.4-10.el7cp (03fd19535b3701f3322c68b5f424335d6fc8dd66) luminous (stable)
OS: Red Hat Enterprise Linux Server release 7.4 (Maipo)

How reproducible:
Always

Steps to Reproduce:
1. Set up a Ceph cluster with 4 MDS daemons (2 active, 2 standby), 4 clients (2 FUSE, 2 kernel), 1 mon+mgr, and 3 OSD nodes.
2. Fill up the cluster with I/Os.
3. Fail the active MDS daemons, one after the other, while I/Os are running.
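The failover step above can be sketched as follows. This is a hedged sketch, not the exact procedure from the test run: it assumes a systemd-managed deployment, and the MDS instance name is taken from this run's ceph status output purely for illustration.

```shell
# Identify the currently active MDS ranks for the filesystem.
ceph fs status cephfs

# On the node hosting an active MDS, stop the daemon to force a failover.
# (Assumes systemd management; the instance name below is illustrative.)
systemctl stop ceph-mds@ceph-jenkins3-build-run201-node4-mds

# Watch a standby take over the rank, then repeat for the other active MDS.
ceph -s
ceph fs status cephfs
```

Note that these commands act on a live cluster, so the client I/O load (Crefi, fio) must already be running when the daemon is stopped.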

Actual results:
I/O hung on the FUSE client. I observed this in the FUSE client log:
2018-05-09 08:59:15.098969 7fb5d49ec700  0 -- 172.16.115.35:0/83230076 >> 172.16.115.77:6800/562710057 conn(0x56346dc4d800 :-1 s=STATE_OPEN pgs=24 cs=1 l=0).fault initiating reconnect
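When this state is hit, the MDS and client side can be inspected as follows. A hedged sketch: the MDS instance name is reused from this run's ceph status output, and the client admin-socket path assumes a default ceph-fuse deployment, so both are illustrative.

```shell
# Show which client is reported as failing to release capabilities.
ceph health detail

# On the active MDS node: list in-flight (slow) requests and client sessions.
# (MDS instance name is illustrative, taken from this cluster's status output.)
ceph daemon mds.ceph-jenkins3-build-run201-node4-mds dump_ops_in_flight
ceph daemon mds.ceph-jenkins3-build-run201-node4-mds session ls

# On the FUSE client: dump its MDS sessions via the admin socket.
# (The socket filename varies with the client id; this path is an assumption.)
ceph daemon /var/run/ceph/ceph-client.admin.asok mds_sessions
```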


Expected results:
No I/O failures; MDS failover should succeed.

Additional info:
Before running this test, another MDS failover test with different directory pinning had been run and completed without any issues. Log of that test: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1525864967774/cephfs-mds-failover_0.log

Comment 13 errata-xmlrpc 2018-08-09 18:27:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2375

