I was creating around 1 million files using "dd" on a ceph-fuse mount point. After 70k files, I was seeing a lot of slowness in file creation, and "ceph -w" showed "mds0: Client magna118: failing to respond to cache pressure". I could not even do "ls" on the mount point. Below is the health status:

    cluster 27b4d1a0-a522-4866-b344-4ea5b101c8bd
     health HEALTH_WARN
            72 pgs degraded
            142 pgs stuck unclean
            72 pgs undersized
            recovery 1022647/4927032 objects degraded (20.756%)
            recovery 359102/4927032 objects misplaced (7.288%)
            mds0: Client magna118: failing to respond to cache pressure
     monmap e5: 1 mons at {magna070=10.8.128.70:6789/0}
            election epoch 63, quorum 0 magna070
      fsmap e357: 1/1/1 up {0=magna070=up:active}
     osdmap e448: 11 osds: 11 up, 11 in; 70 remapped pgs
            flags sortbitwise
      pgmap v185880: 496 pgs, 10 pools, 1607 GB data, 1603 kobjects
            68391 MB used, 10121 GB / 10187 GB avail
            1022647/4927032 objects degraded (20.756%)
            359102/4927032 objects misplaced (7.288%)
                 354 active+clean
                  72 active+undersized+degraded
                  70 active+remapped

I saw there was a discussion going on upstream about a similar issue, and there was a suggestion to change the MDS cache size: "mds cache size = 2000000". Should I go ahead and make this change, or would you prefer to check the machine?

I got a reply from Gregory Farnum via email:

>> How long did this file creation take? The MDS/CephFS doesn't throttle requests, so it's more than capable of overwhelming the underlying storage. A create is going to be 2 RADOS ops even before you do any file data writeback, and 11 OSDs on hard drives probably won't be able to keep up with the MDS' create rate. Increasing the cache size might reduce the cache pressure for a good while, but it won't change how slow things might get.

The file creation up to 70k took almost 8 hours, and the size of each file is around 1 MB. So, tracking this from Bugzilla.
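For reference, this is the change I would make, as a sketch only: the value is the one from the upstream thread, and the [mds] section placement and the injectargs form with rank 0 are my assumptions about how to apply it, not something confirmed there.

    # ceph.conf on the MDS host (value taken from the upstream suggestion)
    [mds]
        mds cache size = 2000000

    # or injected at runtime without restarting the MDS:
    ceph tell mds.0 injectargs '--mds_cache_size 2000000'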
This looks like it's happening because of OSD problems (you have "142 pgs stuck unclean") -- if OSD requests for some files are stuck from the client, then the client will pin those files and refuse to drop them from its cache. Please see if you can reproduce this issue on a system where the OSDs are healthy.
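To confirm the OSDs and PGs are actually healthy before re-testing, the standard CLI checks should be enough, e.g.:

    ceph -s                        # overall cluster state
    ceph health detail             # lists stuck/degraded PGs with reasons
    ceph pg dump_stuck unclean     # should return nothing once recovery finishes
    ceph osd tree                  # all OSDs should be "up" and "in"

All PGs should be active+clean and "dump_stuck" should come back empty before you start the create workload again.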
Hi, I have reproduced the issue on a setup where the OSDs are healthy. I started creating a lot of files; right now 1191609 have been created, and creation is continuing slowly. I also did "ls -l" on the CephFS mounted directory and waited for more than 30 minutes, but did not get any output.

ceph -w:

    cluster 358a0147-68dc-40c9-966d-b174441f0d70
     health HEALTH_WARN
            mds0: Client magna049:0 failing to respond to cache pressure
     monmap e1: 1 mons at {a=10.8.128.45:6789/0}
            election epoch 3, quorum 0 a
      fsmap e130: 1/1/1 up {0=a=up:active}
     osdmap e7: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v20089: 72 pgs, 3 pools, 1163 GB data, 1163 kobjects
            15712 MB used, 8364 GB / 8379 GB avail
                  72 active+clean

ceph mds stat:

    e130: 1/1/1 up {0=a=up:active}
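For what it's worth, the creation workload was roughly the following loop; the mount path and file count here are illustrative, not the exact script I ran.

    #!/bin/bash
    # create many 1 MB files on the ceph-fuse mount point
    MNT=/mnt/cephfs/test        # illustrative mount path
    mkdir -p "${MNT}"
    for i in $(seq 1 1200000); do
        dd if=/dev/zero of="${MNT}/file_${i}" bs=1M count=1 2>/dev/null
    done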
Again, you've got a small cluster and are creating files as fast as the MDS can do so from a client. This is a scenario that would benefit from rate-limiting, which doesn't exist yet, so it's not surprising you're seeing things get slow. Also, 72 PGs (presumably 24 each on 3 pools) across 9 OSDs is not an appropriate number, which isn't helping. But even if you configured RADOS properly, you would still be able to overwhelm the system by pumping IO into it as quickly as possible for long enough. You could do the same thing on any system that doesn't rate-limit its inputs. We will probably add rate-limiting in the future, but not now.
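As a rough rule of thumb (general guidance, not a hard rule): total PGs across the cluster should be about (number of OSDs * 100) / replica size, rounded to a power of two, then divided across pools weighted by how much data each will hold. With 9 OSDs and 3x replication that works out to roughly 9 * 100 / 3 = 300, so something on the order of 256-512 PGs total, with most of them on the CephFS data pool rather than 24 per pool. Raising pg_num on an existing pool looks like this (the pool name here is just an example, not necessarily yours):

    ceph osd pool set cephfs_data pg_num 256
    ceph osd pool set cephfs_data pgp_num 256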