Bug 2314844 - [CephFS][ODF] CephFS Performance Degradation with MongoDB Database Workload [NEEDINFO]
Summary: [CephFS][ODF] CephFS Performance Degradation with MongoDB Database Workload
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Venky Shankar
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-09-26 09:11 UTC by Manuel Gotin
Modified: 2024-10-21 04:39 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
vshankar: needinfo? (mgotin)


Attachments
log data (1.28 MB, application/zip)
2024-09-26 09:11 UTC, Manuel Gotin


Links:
Red Hat Issue Tracker OCSBZM-9325 (Private: 0, Priority: None, Status: None, Summary: None, Last Updated: 2024-10-07 13:03:06 UTC)

Description Manuel Gotin 2024-09-26 09:11:24 UTC
Created attachment 2048790 [details]
log data

> Description of problem:

MongoDB experiences extremely high latencies when using CephFS as the storage backend in ODF. The Yahoo! Cloud Serving Benchmark (YCSB) reports 10-20x higher latencies for CephFS compared to Ceph RBD:

- - - - - - - - - - - - - - - - - - - - - - -
# CephFS (PVC: ocs-storagecluster-cephfs)
$ cat bench_cephfs.log | grep AverageLatency | grep READ
[READ], AverageLatency(us), 17249.01316388889

# CephRBD (PVC: ocs-storagecluster-ceph-rbd)
$ cat bench_cephrbd.log | grep AverageLatency | grep READ
[READ], AverageLatency(us), 808.0933466666667
- - - - - - - - - - - - - - - - - - - - - - -
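
For reference, a YCSB run of this kind can be driven roughly as follows. The workload file, record/operation counts and the MongoDB connection URL below are illustrative assumptions, not the exact parameters behind the numbers above:

- - - - - - - - - - - - - - - - - - - - - - -
# Load the dataset, then run the workload against MongoDB (parameters illustrative)
$ ./bin/ycsb.sh load mongodb -s -P workloads/workloada \
      -p mongodb.url="mongodb://mongodb.mongo-bench.svc:27017/ycsb" \
      -p recordcount=1000000
$ ./bin/ycsb.sh run mongodb -s -P workloads/workloada \
      -p mongodb.url="mongodb://mongodb.mongo-bench.svc:27017/ycsb" \
      -p operationcount=1000000 > bench_cephfs.log
- - - - - - - - - - - - - - - - - - - - - - -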

Looking at resource usage in the openshift-storage namespace, one of the OSDs reports disproportionate CPU activity, whereas the CephFS-specific pods (MDS) are not heavily utilized:

- - - - - - - - - - - - - - - - - - - - - - -
# CephFS
$ cat oc_top_cephfs.log | grep openshift-storage | tr -s ' ' | grep -e osd -e mds
[...]
openshift-storage rook-ceph-osd-0-c6868b5cf-k47l5 845m 1362Mi
openshift-storage rook-ceph-osd-2-748987d888-5rh84 429m 1177Mi
openshift-storage rook-ceph-osd-1-74686d895f-sgfgl 349m 1163Mi
openshift-storage rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-648bffc6qdmvp 10m 33Mi
openshift-storage rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-86f47b7c24mbj 9m 40Mi
[...]

# CephRBD
$ cat oc_top_cephrbd.log | grep openshift-storage | tr -s ' ' | grep -e osd
[...]
openshift-storage rook-ceph-osd-1-7695b46c4f-xhspv 542m 2136Mi
openshift-storage rook-ceph-osd-2-5f7fbbc9bc-ztsgp 358m 1950Mi
openshift-storage rook-ceph-osd-0-7f5b6f7794-jsld9 318m 2048Mi
[...]
- - - - - - - - - - - - - - - - - - - - - - -
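
The per-pod figures above can be captured with something along these lines (the exact invocation and filters used to produce oc_top_*.log are an assumption):

- - - - - - - - - - - - - - - - - - - - - - -
# Snapshot CPU/memory usage of the Ceph pods while the benchmark is running
$ oc adm top pods -n openshift-storage | tr -s ' ' | grep -e osd -e mds
- - - - - - - - - - - - - - - - - - - - - - -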

We also observe that 2-3x more disk I/O occurs in the CephFS case:

- - - - - - - - - - - - - - - - - - - - - - -
# CephFS
$ cat kvm-host_cephfs.sar.out | tr -s ' ' | grep dm | grep Average
          DEV  tps     rkB/s     ...
Average: dm-3 2405.38 363871.41 ...
Average: dm-4 1149.02 162065.92 ...
Average: dm-5 1396.91 176241.56 ...

# CephRBD
$ cat kvm-host_cephrbd.sar.out | tr -s ' ' | grep dm | grep Average
         DEV  tps     rkB/s     ...
Average: dm-3 1958.09 57859.56 2499.29 0.00 30.83 0.21 0.11 81.37
Average: dm-4 1588.28 45954.98 2496.40 0.00 30.51 0.17 0.11 75.41
Average: dm-5 2526.14 74985.87 2494.49 0.00 30.67 0.27 0.11 86.93
- - - - - - - - - - - - - - - - - - - - - - -
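
The statistics above match the layout of sar's block-device report; a sketch of how such data can be collected on the KVM host (interval and sample count are assumptions):

- - - - - - - - - - - - - - - - - - - - - - -
# Record block-device activity (-d) every 5 seconds, 120 samples (~10 min)
$ sar -d 5 120 > kvm-host_cephfs.sar.out
- - - - - - - - - - - - - - - - - - - - - - -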

Looking at the OCP node hosting the MongoDB pod, we observe 4x more network traffic in the CephFS case:

- - - - - - - - - - - - - - - - - - - - - - -
# CephFS
$ cat worker-mongodb_cephfs.sar.out | grep enp0s25 | grep Average | tr -s ' '
         IFACE   rxpck/s   txpck/s rxkB/s    ...
Average: enp0s25 534928.53 8458.99 766989.82 ..

# CephRBD
$ cat worker-mongodb_cephrbd.sar.out | grep enp0s25 | grep Average | tr -s ' '
         IFACE   rxpck/s   txpck/s rxkB/s    ...
Average: enp0s25 160476.36 34438.07 201791.45 ...
- - - - - - - - - - - - - - - - - - - - - - -
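
The interface statistics above correspond to sar's network device report; they can be gathered on the worker node roughly like this (interval and sample count are assumptions):

- - - - - - - - - - - - - - - - - - - - - - -
# Record per-interface network statistics every 5 seconds, 120 samples
$ sar -n DEV 5 120 > worker-mongodb_cephfs.sar.out
- - - - - - - - - - - - - - - - - - - - - - -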

We conclude that Ceph is misbehaving in the CephFS+MongoDB case, since disk and network I/O are highly utilized while latency and throughput degrade severely. This issue was introduced in ODF 4.13 (Ceph version 17.2.6-70.el9cp); it cannot be reproduced with older releases.


> Version of all relevant components (if applicable):

- ODF 4.16
- MongoDB 7.0.12

This issue has been present since ODF 4.13 (Ceph version 17.2.6-70.el9cp). After downgrading to ODF 4.12 (Ceph version 16.2.10-94.el8cp), MongoDB on CephFS performs as expected.
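
One way to confirm which Ceph build an ODF deployment is running, assuming the optional rook-ceph-tools deployment has been enabled in the cluster:

- - - - - - - - - - - - - - - - - - - - - - -
# Report the Ceph version of each running daemon
$ oc -n openshift-storage rsh deploy/rook-ceph-tools ceph versions
- - - - - - - - - - - - - - - - - - - - - - -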

> Does this issue impact your ability to continue to work with the product?

This impacts database workloads on CephFS PV/PVCs. The excessive network and disk I/O can also degrade ODF performance for other workloads.

> Is there any workaround available to the best of your knowledge?

-


> Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

1

> Is this issue reproducible?

This issue is reproducible on ODF since ODF 4.13.
It is also reproducible on other platforms, e.g. the s390x architecture.

> Can this issue be reproduced from the UI?

-

> If this is a regression, please provide more details to justify this:

-


> Steps to Reproduce:

1. Deploy MongoDB using a CephFS PVC (see the PVC sketch below)
2. Run a read/write workload against MongoDB, e.g. via YCSB
3. Observe high latencies
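
A minimal PVC sketch for step 1 (namespace, claim name and size are illustrative; the storage class is the default CephFS class created by ODF):

- - - - - - - - - - - - - - - - - - - - - - -
# Create a CephFS-backed PVC to be mounted by the MongoDB pod
$ cat <<'EOF' | oc apply -n mongo-bench -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 50Gi
  storageClassName: ocs-storagecluster-cephfs
EOF
- - - - - - - - - - - - - - - - - - - - - - -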


> Actual results:

MongoDB average read latency >17 ms

> Expected results:

MongoDB average read latency <1 ms

> Additional info:

We could not reproduce this issue with traditional I/O workloads, e.g. fio or dbench. We are aware that using CephFS PV/PVCs for database workloads is not recommended (https://access.redhat.com/solutions/7003415). Are these two issues related?
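
For comparison, the kind of traditional fio job that did not show the regression on a CephFS-backed mount looks roughly like this (all parameters are illustrative, not the exact job we ran):

- - - - - - - - - - - - - - - - - - - - - - -
# Plain 4k random-read job against a directory on the CephFS PVC mount
$ fio --name=randread --directory=/var/lib/mongo-bench \
      --rw=randread --bs=4k --size=2g --numjobs=4 --direct=1 \
      --time_based --runtime=120 --group_reporting
- - - - - - - - - - - - - - - - - - - - - - -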

Comment 2 Sunil Kumar Acharya 2024-10-07 13:00:38 UTC
Moving the non-blocker BZs out of ODF-4.17.0. If you believe this is a blocker issue, please feel free to propose it back to ODF-4.17.0 as blocker with justification note.

Comment 4 Manuel Gotin 2024-10-18 12:00:24 UTC
I noticed a needinfo flag set on this issue. What information exactly is needed?

