Bug 1879866 - [Tracker for Bug 1881316] FIO results on CephFS are up to 30% degraded from 4.5
Summary: [Tracker for Bug 1881316] FIO results on CephFS are up to 30% degraded from 4.5
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat
Component: ceph
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Patrick Donnelly
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks: 1881316
Reported: 2020-09-17 08:55 UTC by Avi Liani
Modified: 2021-08-24 07:50 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1881316 (view as bug list)
Environment:
Last Closed: 2020-11-17 15:34:59 UTC
Target Upstream Version:


Comment 2 Yaniv Kaul 2020-09-17 09:57:32 UTC
So trying to understand the important items here:
1. It's on VMware LSO - what about other platforms?
2. What's the difference between the Ceph versions?
3. Is it on the same *OCP* versions?
4. RHCOS version?
5. I assume 'append' is the main issue?

How's Ceph doing? (ceph status would be nice to see!)

Comment 3 Avi Liani 2020-09-17 10:09:16 UTC
All logs and must-gather can be found at: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1879866/
(In reply to Yaniv Kaul from comment #2)
> So trying to understand the important items here:
> 1. It's on VMware LSO - what about other platforms?
Not tested yet.

> 2. What's the difference between the Ceph versions?
Same Ceph version.

> 3. Is it on the same *OCP* versions?
Yes, it is on the same OCP; only OCS was upgraded.

> 4. RHCOS version?

> 5. I assume 'append' is the main issue?
You are referring to the smallfile test, while this BZ is about the FIO test, one page above in the report.
> 
> How's Ceph doing? (ceph status would be nice to see!)


All logs and must-gather can be found at: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1879866/

Comment 4 Avi Liani 2020-09-17 10:11:20 UTC
(In reply to Avi Liani from comment #3)
> All logs and must-gather can be found at:
> http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1879866/
> (In reply to Yaniv Kaul from comment #2)
> > So trying to understand the important items here:
> > 1. It's on VMware LSO - what about other platforms?
> Not tested yet.
> 
> > 2. What's the difference between the Ceph versions?
> Same Ceph version.
> 
> > 3. Is it on the same *OCP* versions?
> Yes, it is on the same OCP; only OCS was upgraded.
> 
> > 4. RHCOS version?

# oc get node -o wide
NAME              STATUS   ROLES    AGE   VERSION           INTERNAL-IP    EXTERNAL-IP    OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
compute-0         Ready    worker   3d    v1.18.3+6c42de8   10.1.160.85    10.1.160.85    Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)   4.18.0-193.14.3.el8_2.x86_64   cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8
compute-1         Ready    worker   3d    v1.18.3+6c42de8   10.1.160.105   10.1.160.105   Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)   4.18.0-193.14.3.el8_2.x86_64   cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8
compute-2         Ready    worker   3d    v1.18.3+6c42de8   10.1.160.141   10.1.160.141   Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)   4.18.0-193.14.3.el8_2.x86_64   cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8
control-plane-0   Ready    master   3d    v1.18.3+6c42de8   10.1.160.88    10.1.160.88    Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)   4.18.0-193.14.3.el8_2.x86_64   cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8
control-plane-1   Ready    master   3d    v1.18.3+6c42de8   10.1.160.86    10.1.160.86    Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)   4.18.0-193.14.3.el8_2.x86_64   cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8
control-plane-2   Ready    master   3d    v1.18.3+6c42de8   10.1.160.146   10.1.160.146   Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)   4.18.0-193.14.3.el8_2.x86_64   cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8

> 
> > 5. I assume 'append' is the main issue?
> You are referring to the smallfile test, while this BZ is about the FIO test,
> one page above in the report.
> > 
> > How's Ceph doing? (ceph status would be nice to see!)
> 
> 
> All logs and must-gather can be found at :
> http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1879866/

Comment 5 Patrick Donnelly 2020-09-22 18:42:17 UTC
(In reply to Avi Liani from comment #3)
> All logs and must-gather can be found at:
> http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1879866/
> (In reply to Yaniv Kaul from comment #2)
> > So trying to understand the important items here:
> > 1. It's on VMware LSO - what about other platforms?
> Not tested yet.
> 
> > 2. What's the difference between the Ceph versions?
> Same Ceph version.
>
> > 3. Is it on the same *OCP* versions?
> Yes, it is on the same OCP; only OCS was upgraded.

OCS 4.5 -> 4.6 but the Ceph version did not change?
 
> > 4. RHCOS version?
> 
> > 5. I assume 'append' is the main issue?
> You are referring to the smallfile test, while this BZ is about the FIO test,
> one page above in the report.
> > 
> > How's Ceph doing? (ceph status would be nice to see!)
> 
> 
> All logs and must-gather can be found at :
> http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1879866/

How can I access this machine to examine the logs? HTTP is unusable for analyzing logs.

If the Ceph version did not change, there's no reason to believe that there is a bug in CephFS (bz1881316). I looked at:

https://docs.google.com/document/d/1thRo0AGK2af2ECUGiLOBQ28UQtzWLX-2iYxeSfdRy9s/edit#

This appears to be data from a single run. There really should be at least three runs to do a proper analysis.

Finally, this is data-path I/O, where the MDS is minimally involved and the client I/O pattern is almost the same as with RBD. We have done extensive performance testing in the past that confirms this. I suspect this is a transient performance hiccup of some kind.
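
To illustrate the point about needing at least three runs, here is a minimal Python sketch that compares per-run FIO results between two versions and checks whether the measured change exceeds the run-to-run variation. The function names, the IOPS figures, and the "change must exceed the larger coefficient of variation" rule are all assumptions made for illustration; they are not taken from the report or from any OCS tooling, and the rule is a crude heuristic rather than a statistical test.

```python
from statistics import mean, stdev


def summarize(runs):
    """Return (mean, coefficient of variation in percent) for per-run results."""
    if len(runs) < 3:
        raise ValueError("need at least three runs per version")
    m = mean(runs)
    return m, 100.0 * stdev(runs) / m


def compare(baseline_runs, candidate_runs):
    """Compare two sets of runs and report the change relative to the noise."""
    base_mean, base_cv = summarize(baseline_runs)
    cand_mean, cand_cv = summarize(candidate_runs)
    change_pct = 100.0 * (cand_mean - base_mean) / base_mean
    # Heuristic (an assumption): treat the change as noise unless it is
    # larger than the worse of the two run-to-run variations.
    noise_pct = max(base_cv, cand_cv)
    return {
        "change_pct": change_pct,
        "noise_pct": noise_pct,
        "beyond_noise": abs(change_pct) > noise_pct,
    }


# Hypothetical IOPS values, three runs per version; not real measurements.
ocs_45_runs = [1000.0, 1040.0, 980.0]
ocs_46_runs = [720.0, 700.0, 710.0]

result = compare(ocs_45_runs, ocs_46_runs)
print(result)
```

With a single run per version, `summarize` refuses to produce a result, which mirrors the objection above: one data point cannot separate a regression from a transient hiccup.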

Comment 6 Scott Ostapovicz 2020-10-05 13:21:30 UTC
Assigning to you, Patrick, to track the info you requested (needinfo). If it is not a bug, please close it.

Comment 12 Mudit Agarwal 2020-10-20 08:12:18 UTC
Moving it out of 4.6; please bring it back if there is sufficient data.

