Bug 1342402
| Summary: | Ceph-FS IOPs slowed down after adding OSDS | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | rakesh-gm <rgowdege> |
| Component: | CephFS | Assignee: | John Spray <john.spray> |
| Status: | CLOSED NOTABUG | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 2.0 | CC: | ceph-eng-bugs, john.spray, kdreyer, rgowdege, tchandra |
| Target Milestone: | rc | | |
| Target Release: | 2.2 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-01-06 12:08:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
It's not immediately clear that this is a CephFS issue. Has the same test ever been done while using e.g. an RBD image? You mention restarting the MDS (which probably wasn't necessary); was the performance different before/after restarting it? Also, just a sanity check: the wget is from somewhere local, right, not from an internet source that might have slowed down on its own?

(In reply to John Spray from comment #2)

> It's not immediately clear that this is a cephfs issue. Has the same test
> ever been done while using e.g. an RBD image?

I have not tested with RBD, so I don't know the behaviour there.

> You mention restarting the MDS (which probably wasn't necessary), was the
> performance different before/after restarting that?

I was installing the OSDs using ceph-ansible, so the restart of the MDS was not in my control. The IOPS definitely slowed before the MDS restart, as the three additional OSD daemons were added.

> Also, just a sanity check, the wget is from somewhere local right, not from
> an internet source that might have slowed down on its own?

The wget is not from a local source; I was downloading an ISO from fedora.org. I downloaded the same ISO several times, redirecting the file name using -O, and the speeds on the previous nodes were close together, in the MB/s range. I saw the same issue while removing OSDs, but I can't confirm that now as I am still running the test.

OK, so for things to slow down while you add/remove OSDs is completely normal and expected: the OSDs are doing backfilling, which consumes some of their bandwidth. Once the status shows all the PGs as active+clean again, performance should go back to normal. Of course, if you have removed OSDs, the overall system bandwidth will be lower afterwards.

I notice that you have a relatively small number of PGs. This will lead to unpredictable performance.
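The rule of thumb behind that PG-count advice can be sketched as a quick calculation: target roughly 100 PGs per OSD, divide by the replica count, and round up to a power of two. This is a sketch only; `suggested_pg_num` is a hypothetical helper, and the 3x replication used in the example is an assumption (the report does not state the pool size):

```python
def suggested_pg_num(num_osds: int, replica_count: int, pgs_per_osd: int = 100) -> int:
    """Round (num_osds * pgs_per_osd) / replica_count up to the next power of two.

    Hypothetical helper illustrating the ~100-PGs-per-OSD rule of thumb;
    not part of any Ceph tooling.
    """
    target = (num_osds * pgs_per_osd) / replica_count
    pg_num = 1
    while pg_num < target:
        pg_num *= 2
    return pg_num

# For the 12-OSD cluster in this report, assuming 3x replication:
print(suggested_pg_num(12, 3))  # 512
```

By this estimate a single data pool on this cluster would want on the order of 512 PGs, rather than the 360 PGs spread across 8 pools shown in the status output below.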
See the documentation for choosing the number of PGs in your pools: http://docs.ceph.com/docs/jewel/rados/operations/placement-groups/#a-preselection-of-pg-num

Any performance-related test needs to be done with a dependable source for the data; wgets from the internet are just too unpredictable. It is better to use a benchmarking tool (such as fio) to generate a consistent load -- ask around in the QE group for advice on such benchmarks.

Still in needinfo -> re-targeting to 2.2.

No response from the reporter, and no evidence this is a real bug. Closing.
Description of problem:

I was doing a wget of a file, and at that point I started to add a mon and OSDs using ceph-ansible. IOPS stayed in the MB/s range while the mon was added and reached quorum. Then the OSD addition completed; at the end of this installation process the ceph MDS was restarted. Ceph recovered to the active+clean state, but IOPS then dropped to KB/s and remained there until the download of the file completed. Note that I had the same FS mounted via ceph-fuse and the kernel client on different nodes.

Some of the outputs:

```
76% [==============================================================================================================> ] 1,635,711,544 237KB/s eta 30m 24s

[root@magna070 ubuntu]# ceph -w
    cluster 27b4d1a0-a522-4866-b344-4ea5b101c8bd
     health HEALTH_OK
     monmap e2: 2 mons at {magna041=10.8.128.41:6789/0,magna070=10.8.128.70:6789/0}
            election epoch 12, quorum 0,1 magna041,magna070
      fsmap e29: 1/1/1 up {0=magna070=up:active}
     osdmap e225: 12 osds: 12 up, 12 in
            flags sortbitwise
      pgmap v9801: 360 pgs, 8 pools, 20684 MB data, 5280 objects
            63026 MB used, 11052 GB / 11114 GB avail
                 360 active+clean
  client io 850 kB/s wr, 0 op/s rd, 0 op/s wr

2016-06-03 07:57:08.579814 mon.0 [INF] pgmap v9800: 360 pgs: 360 active+clean; 20684 MB data, 63018 MB used, 11052 GB / 11114 GB avail; 429 kB/s wr, 0 op/s
2016-06-03 07:57:09.926819 mon.0 [INF] pgmap v9801: 360 pgs: 360 active+clean; 20684 MB data, 63026 MB used, 11052 GB / 11114 GB avail; 850 kB/s wr, 0 op/s
2016-06-03 07:57:10.985164 mon.0 [INF] pgmap v9802: 360 pgs: 360 active+clean; 20684 MB data, 63034 MB used, 11052 GB / 11114 GB avail
2016-06-03 07:57:14.293977 mon.0 [INF] pgmap v9803: 360 pgs: 360 active+clean; 20684 MB data, 63034 MB used, 11052 GB / 11114 GB avail
```

Attached MDS log file.
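As a side note, the "wait for active+clean" condition discussed in the comments can be checked programmatically against pgmap summary lines like the ones quoted above. This is a sketch; `all_pgs_active_clean` is a hypothetical helper, and the parsing assumes the `N pgs: M active+clean` line format shown in this report:

```python
import re

def all_pgs_active_clean(pgmap_line: str) -> bool:
    """Return True if every PG in a pgmap summary line is active+clean.

    Hypothetical helper; assumes a line shaped like
    'pgmap v9801: 360 pgs: 360 active+clean; ...' as quoted in this report.
    """
    total = re.search(r"(\d+)\s+pgs", pgmap_line)
    clean = re.search(r"(\d+)\s+active\+clean", pgmap_line)
    if not total or not clean:
        return False
    return int(total.group(1)) == int(clean.group(1))

# Using one of the monitor log lines from this report:
line = "pgmap v9801: 360 pgs: 360 active+clean; 20684 MB data"
print(all_pgs_active_clean(line))  # True: backfill is finished
```

A script polling `ceph -s` with this check could confirm when backfill has completed and performance should be back to baseline.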