I was creating around 1 million files using "dd" on a ceph-fuse mount point. After 70k files, I was seeing a lot of slowness in file creation, and "ceph -w" showed "mds0: Client magna118: failing to respond to cache pressure". I could not even do "ls" on the mount point. Below is the health status:

    cluster 27b4d1a0-a522-4866-b344-4ea5b101c8bd
     health HEALTH_WARN
            72 pgs degraded
            142 pgs stuck unclean
            72 pgs undersized
            recovery 1022647/4927032 objects degraded (20.756%)
            recovery 359102/4927032 objects misplaced (7.288%)
            mds0: Client magna118: failing to respond to cache pressure
     monmap e5: 1 mons at {magna070=10.8.128.70:6789/0}
            election epoch 63, quorum 0 magna070
      fsmap e357: 1/1/1 up {0=magna070=up:active}
     osdmap e448: 11 osds: 11 up, 11 in; 70 remapped pgs
            flags sortbitwise
      pgmap v185880: 496 pgs, 10 pools, 1607 GB data, 1603 kobjects
            68391 MB used, 10121 GB / 10187 GB avail
            1022647/4927032 objects degraded (20.756%)
            359102/4927032 objects misplaced (7.288%)
                 354 active+clean
                  72 active+undersized+degraded
                  70 active+remapped

I saw there was a discussion going on upstream about a similar issue, and there was a suggestion to change the MDS cache size: "mds cache size = 2000000". Should I go ahead and make this change, or would you prefer to check the machine?

I got a reply from Gregory Farnum via email:

>> How long did this file creation take? The MDS/CephFS doesn't throttle requests, so it's more than capable of overwhelming the underlying storage. A create is going to be 2 RADOS ops even before you do any file data writeback, and 11 OSDs on hard drives probably won't be able to keep up with the MDS' create rate. Increasing the cache size might reduce the cache pressure for a good while, but it won't change how slow things might get.

The file creation up to 70k took almost 8 hours, and the size of each file is around 1 MB. So, tracking this from Bugzilla.
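For reference, this is the change I would make, as a sketch only: the value is the one from the upstream thread, and the [mds] section placement and the injectargs form with rank 0 are my assumptions about how to apply it, not something confirmed there.

    # ceph.conf on the MDS host (value taken from the upstream suggestion)
    [mds]
        mds cache size = 2000000

    # or injected at runtime without restarting the MDS:
    ceph tell mds.0 injectargs '--mds_cache_size 2000000'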
This looks like it's happening because of OSD problems (you have "142 pgs stuck unclean") -- if OSD requests for some files are stuck from the client, then the client will pin those files and refuse to drop them from its cache. Please see if you can reproduce this issue on a system where the OSDs are healthy.
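To confirm the OSDs and PGs are actually healthy before re-testing, the standard CLI checks should be enough, e.g.:

    ceph -s                        # overall cluster state
    ceph health detail             # lists stuck/degraded PGs with reasons
    ceph pg dump_stuck unclean     # should return nothing once recovery finishes
    ceph osd tree                  # all OSDs should be "up" and "in"

All PGs should be active+clean and "dump_stuck" should come back empty before you start the create workload again.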
Hi, I have reproduced the issue on a setup where the OSDs are healthy. I started creating a lot of files; right now 1191609 have been created, and creation is continuing slowly. I also did "ls -l" on the CephFS mounted directory and waited for more than 30 minutes, but did not get any output.

ceph -w:

    cluster 358a0147-68dc-40c9-966d-b174441f0d70
     health HEALTH_WARN
            mds0: Client magna049:0 failing to respond to cache pressure
     monmap e1: 1 mons at {a=10.8.128.45:6789/0}
            election epoch 3, quorum 0 a
      fsmap e130: 1/1/1 up {0=a=up:active}
     osdmap e7: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v20089: 72 pgs, 3 pools, 1163 GB data, 1163 kobjects
            15712 MB used, 8364 GB / 8379 GB avail
                  72 active+clean

ceph mds stat:

    e130: 1/1/1 up {0=a=up:active}
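For what it's worth, the creation workload was roughly the following loop; the mount path and file count here are illustrative, not the exact script I ran.

    #!/bin/bash
    # create many 1 MB files on the ceph-fuse mount point
    MNT=/mnt/cephfs/test        # illustrative mount path
    mkdir -p "${MNT}"
    for i in $(seq 1 1200000); do
        dd if=/dev/zero of="${MNT}/file_${i}" bs=1M count=1 2>/dev/null
    done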
Again, you've got a small cluster and are creating files as fast as the MDS can do so from a client. This is a scenario that would benefit from rate-limiting, which doesn't exist yet, so it's not surprising you're seeing things get slow. Also, 72 PGs (presumably 24 each on 3 pools) across 9 OSDs is not an appropriate number, which isn't helping. But even if you configured RADOS properly, you would still be able to overwhelm the system by pumping IO into it as quickly as possible for long enough. You could do the same thing on any system that doesn't rate-limit its inputs. We will probably add rate-limiting in the future, but not now.
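As a rough rule of thumb (general guidance, not a hard rule): total PGs across the cluster should be about (number of OSDs * 100) / replica size, rounded to a power of two, then divided across pools weighted by how much data each will hold. With 9 OSDs and 3x replication that works out to roughly 9 * 100 / 3 = 300, so something on the order of 256-512 PGs total, with most of them on the CephFS data pool rather than 24 per pool. Raising pg_num on an existing pool looks like this (the pool name here is just an example, not necessarily yours):

    ceph osd pool set cephfs_data pg_num 256
    ceph osd pool set cephfs_data pgp_num 256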