Bug 1879866

Summary: [Tracker for Bug 1881316] FIO results on CephFS are up to 30% degraded from 4.5
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Component: ceph
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: unspecified
Status: CLOSED NOTABUG
Last Closed: 2020-11-17 15:34:59 UTC
Reporter: Avi Liani <alayani>
Assignee: Patrick Donnelly <pdonnell>
QA Contact: Raz Tamir <ratamir>
CC: assingh, bniver, ekuric, jijoy, madam, muagarwa, ocs-bugs, pdonnell, sostapov
Keywords: AutomationTriaged, Performance
Type: Bug
Doc Type: No Doc Update
Bug Blocks: 1881316

Comment 2 Yaniv Kaul 2020-09-17 09:57:32 UTC
So trying to understand the important items here:
1. It's on VMware LSO - what about other platforms?
2. What's the difference between the Ceph versions?
3. Is it on the same *OCP* versions?
4. RHCOS version?
5. I assume 'append' is the main issue?

How's Ceph doing? (ceph status would be nice to see!)
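
For reference, a minimal sketch of how this information could be gathered (assuming the Rook toolbox is deployed as deploy/rook-ceph-tools in the openshift-storage namespace; names may differ per cluster):

# oc get clusterversion                                              # OCP version
# oc get node -o wide                                                # RHCOS and kernel versions
# oc -n openshift-storage rsh deploy/rook-ceph-tools ceph status    # cluster health
# oc -n openshift-storage rsh deploy/rook-ceph-tools ceph versions  # per-daemon Ceph versions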

Comment 3 Avi Liani 2020-09-17 10:09:16 UTC
All logs and must-gather can be found at: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1879866/

(In reply to Yaniv Kaul from comment #2)
> So trying to understand the important items here:
> 1. It's on VMware LSO - what about other platforms?
Not tested yet.

> 2. What's the difference between the Ceph versions?
Same Ceph version.

> 3. Is it on the same *OCP* versions?
Yes, it is on the same OCP; only OCS was upgraded.

> 4. RHCOS version?

> 5. I assume 'append' is the main issue?
You are referring to the smallfile test, while this BZ is about the FIO test one page above in the report.
> 
> How's Ceph doing? (ceph status would be nice to see!)


All logs and must-gather can be found at: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1879866/

Comment 4 Avi Liani 2020-09-17 10:11:20 UTC
(In reply to Avi Liani from comment #3)
> > 4. RHCOS version?

# oc get node -o wide
NAME              STATUS   ROLES    AGE   VERSION           INTERNAL-IP    EXTERNAL-IP    OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
compute-0         Ready    worker   3d    v1.18.3+6c42de8   10.1.160.85    10.1.160.85    Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)   4.18.0-193.14.3.el8_2.x86_64   cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8
compute-1         Ready    worker   3d    v1.18.3+6c42de8   10.1.160.105   10.1.160.105   Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)   4.18.0-193.14.3.el8_2.x86_64   cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8
compute-2         Ready    worker   3d    v1.18.3+6c42de8   10.1.160.141   10.1.160.141   Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)   4.18.0-193.14.3.el8_2.x86_64   cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8
control-plane-0   Ready    master   3d    v1.18.3+6c42de8   10.1.160.88    10.1.160.88    Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)   4.18.0-193.14.3.el8_2.x86_64   cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8
control-plane-1   Ready    master   3d    v1.18.3+6c42de8   10.1.160.86    10.1.160.86    Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)   4.18.0-193.14.3.el8_2.x86_64   cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8
control-plane-2   Ready    master   3d    v1.18.3+6c42de8   10.1.160.146   10.1.160.146   Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)   4.18.0-193.14.3.el8_2.x86_64   cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8

Comment 5 Patrick Donnelly 2020-09-22 18:42:17 UTC
(In reply to Avi Liani from comment #3)
> > 2. What's the difference between the Ceph versions?
> Same ceph version
>
> > 3. Is it on the same *OCP* versions?
> Yes, it is on the same OCP; only OCS was upgraded.

OCS 4.5 -> 4.6 but the Ceph version did not change?
 
> All logs and must-gather can be found at:
> http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1879866/

How can I access this machine to examine the logs? HTTP is unusable for analyzing logs.

If the Ceph version did not change, there's no reason to believe that there is a bug in CephFS (bz1881316). I looked at:

https://docs.google.com/document/d/1thRo0AGK2af2ECUGiLOBQ28UQtzWLX-2iYxeSfdRy9s/edit#

This appears to be data from a single run. There really should be at least three runs to do a proper analysis.
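
For illustration, a minimal sketch of collecting repeated samples (assuming a hypothetical fio job file cephfs-job.fio that matches the workload in the report):

# for i in 1 2 3; do fio cephfs-job.fio --output-format=json --output=run-$i.json; done

The three JSON outputs can then be averaged and checked for run-to-run variance before comparing 4.5 and 4.6.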

Finally, these are data-path I/Os in which the MDS is minimally involved, and the client I/O pattern will be almost the same as with RBD. We have done extensive performance testing in the past that affirms this. I suspect this is a transient performance hiccup of some kind.

Comment 6 Scott Ostapovicz 2020-10-05 13:21:30 UTC
Assigning to you, Patrick, to track the info you requested (needinfo). If it is not a bug, please close it.

Comment 12 Mudit Agarwal 2020-10-20 08:12:18 UTC
Moving this out of 4.6; please bring it back if there is sufficient data.