Bug 2185532

Summary: [DR] OSD crash with OOM when removing data
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: RADOS
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Reporter: Mudit Agarwal <muagarwa>
Assignee: Neha Ojha <nojha>
QA Contact: Pawan <pdhiran>
CC: bhubbard, bniver, ceph-eng-bugs, cephqe-warriors, ebenahar, ekuric, jdurgin, jespy, kramdoss, kseeger, mmuench, muagarwa, nojha, prsurve, rsussman, shberry, sostapov, vumrao
Keywords: AutomationBackLog, Performance
Target Milestone: ---
Target Release: 6.1z2
Doc Type: If docs needed, set a value
Clone Of: 2021931
Bug Blocks: 2021931
Last Closed: 2023-08-14 07:57:43 UTC

Description Mudit Agarwal 2023-04-10 06:57:03 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

OSD pod crashes due to OOM.
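To confirm that the restarts are OOM kills rather than ordinary crashes, the container status can be checked. A minimal sketch, assuming the default openshift-storage namespace and the usual Rook OSD pod label (app=rook-ceph-osd); both are assumptions, not taken from this cluster:

$ oc -n openshift-storage get pods -l app=rook-ceph-osd
$ oc -n openshift-storage get pods -l app=rook-ceph-osd \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'

A pod that was OOM killed reports "OOMKilled" as the last terminated reason.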

Version of all relevant components (if applicable):

OCP v4.8 / ODF v4.9 

$ ceph version
ceph version 16.2.0-143.el8cp (0e2c6f9639c37a03e55885fb922dc0cb1b5173cb) pacific (stable)

Default ODF installation - default limits/requests for ODF pods. 
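The memory requests/limits the OSD pods actually received can be read back from their deployments. A minimal sketch, again assuming the default openshift-storage namespace and Rook's app=rook-ceph-osd label:

$ oc -n openshift-storage get deploy -l app=rook-ceph-osd \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[0].resources}{"\n"}{end}'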


Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?

Yes

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

3

Is this issue reproducible?
I hit this issue twice, but I do not have a clear reproducer.

- The first time, the issue happened when I expanded the cluster from 3 to 6 OSD nodes. At that point the cluster was in the state described in https://bugzilla.redhat.com/show_bug.cgi?id=2021079.
  Since the issue started to become visible when I expanded the cluster, it looks similar to what is logged in https://bugzilla.redhat.com/show_bug.cgi?id=2008420.

- The second time, the issue happened as described in "Steps to Reproduce" below.

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:

Below are the steps which led to this issue (a rough command-line sketch follows the list).
1. Install OCP/ODF on 2 clusters with the versions above and use ACM to set up ODF mirroring between the clusters.
2. Create a few hundred pods (in this case 600), attach a 5 GB PVC to each pod, and write 1 GB of data per pod. This results in roughly 600 GB written to the Ceph backend.
3. Check that the images are replicated between the OCP/ODF clusters (I used "rbd -p storagecluster-cephblockpool ls") and check the "ceph df" output - it should be the same on both clusters.
4. On the first cluster, delete all pods and VolumeReplication resources - this triggers data deletion on the Ceph backend.
5. After step 4, some of the OSDs hit OOM and restart, leaving the cluster in an unstable state. I have seen this problem on both of the clusters involved, but not at the same time.
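A rough command-line sketch of steps 2-4. The namespace, pod/PVC manifests, mount path and StorageClass are illustrative placeholders; only the pool name and the VolumeReplication resources come from the report:

# Step 2: one 5 Gi RBD-backed PVC per pod, 1 GiB of data written from each pod
$ for i in $(seq 1 600); do
    oc -n test-ns create -f pod-with-pvc-$i.yaml   # PVC: 5Gi, storageClassName: ocs-storagecluster-ceph-rbd (assumed)
    oc -n test-ns exec pod-$i -- dd if=/dev/zero of=/mnt/data/file bs=1M count=1024
  done

# Step 3: compare replicated images and usage on both clusters
$ rbd -p storagecluster-cephblockpool ls
$ ceph df

# Step 4: delete the pods and their VolumeReplication resources to trigger deletion on the Ceph backend
$ oc -n test-ns delete pod --all
$ oc -n test-ns delete volumereplication --all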

Actual results:
OSD(s) crash due to OOM.

Expected results:

OSD(s) should not crash due to OOM.

Additional info:

osd logs: 
http://perf148b.perf.lab.eng.bos.redhat.com/osd_crash_bz/
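In addition to the logs, per-daemon memory accounting can be pulled live. A hedged sketch using standard Ceph commands, run from the toolbox pod shown below; osd.5 (the daemon marked down in the osd tree) is used as an example id:

$ ceph config get osd osd_memory_target     # effective per-OSD memory target
$ ceph tell osd.5 dump_mempools             # per-pool memory usage inside the OSD
$ ceph tell osd.5 heap stats                # tcmalloc heap statistics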

oc rsh -n openshift-storage $TOOLS_POD
sh-4.4$ ceph df
--- RAW STORAGE ---
CLASS    SIZE   AVAIL     USED  RAW USED  %RAW USED
ssd    18 TiB  16 TiB  1.7 TiB   1.7 TiB       9.37
TOTAL  18 TiB  16 TiB  1.7 TiB   1.7 TiB       9.37
 
--- POOLS ---
POOL                                    ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
storagecluster-cephblockpool             1  128  723 GiB  175.64k  1.7 TiB  11.02    5.0 TiB
storagecluster-cephfilesystem-metadata   2   32  503 KiB       22  1.5 MiB      0    4.7 TiB
storagecluster-cephfilesystem-data0      3  128      0 B        0      0 B      0    4.5 TiB
device_health_metrics                    4    1  1.2 MiB       12  2.3 MiB      0    6.7 TiB
sh-4.4$ ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME                         STATUS  REWEIGHT  PRI-AFF
 -1         18.00000  root default                                               
 -5         18.00000      region us-west-2                                       
-10          6.00000          zone us-west-2a                                    
 -9          6.00000              host ip-10-0-134-115                           
  0    ssd   2.00000                  osd.0                 up   1.00000  1.00000
  3    ssd   2.00000                  osd.3                 up   1.00000  1.00000
  6    ssd   2.00000                  osd.6                 up   1.00000  1.00000
 -4          6.00000          zone us-west-2b                                    
 -3          6.00000              host ip-10-0-168-65                            
  1    ssd   2.00000                  osd.1                 up   1.00000  1.00000
  5    ssd   2.00000                  osd.5               down   1.00000  1.00000
  8    ssd   2.00000                  osd.8                 up   1.00000  1.00000
-14          6.00000          zone us-west-2c                                    
-13          6.00000              host ip-10-0-212-246                           
  2    ssd   2.00000                  osd.2                 up   1.00000  1.00000
  4    ssd   2.00000                  osd.4                 up   1.00000  1.00000
  7    ssd   2.00000                  osd.7                 up   1.00000  1.00000
  
  
  
sh-4.4$ ceph -s
  cluster:
    id:     d559afcb-accb-4431-a689-2e0555bf4b2b
    health: HEALTH_WARN
            1 osds down
            Slow OSD heartbeats on back (longest 9222.186ms)
            Slow OSD heartbeats on front (longest 8944.892ms)
            Degraded data redundancy: 54696/525393 objects degraded (10.410%), 45 pgs degraded, 92 pgs undersized
            snap trim queue for 5 pg(s) >= 32768 (mon_osd_snap_trim_queue_warn_on)
 
  services:
    mon:        3 daemons, quorum a,b,c (age 6d)
    mgr:        a(active, since 6d)
    mds:        1/1 daemons up, 1 hot standby
    osd:        9 osds: 8 up (since 80s), 9 in (since 2d)
    rbd-mirror: 1 daemon active (1 hosts)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 289 pgs
    objects: 175.13k objects, 570 GiB
    usage:   1.7 TiB used, 16 TiB / 18 TiB avail
    pgs:     54696/525393 objects degraded (10.410%)
             135 active+clean
             48  active+clean+snaptrim_wait
             47  active+undersized
             45  active+undersized+degraded
             14  active+clean+snaptrim
 
  io:
    client:   29 KiB/s rd, 20 KiB/s wr, 33 op/s rd, 75 op/s wr
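The HEALTH_WARN above also shows a large snap trim queue, which fits an OOM during bulk deletion. If snap trimming turns out to be the memory driver, one possible mitigation (not verified against this bug) is to throttle it with the standard OSD options:

$ ceph config set osd osd_snap_trim_sleep 2                 # sleep (seconds) between snap trim ops, default 0
$ ceph config set osd osd_pg_max_concurrent_snap_trims 1    # concurrent trims per PG, default 2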

Comment 5 Scott Ostapovicz 2023-07-12 12:37:41 UTC
Missed the 6.1 z1 window.  Retargeting to 6.1 z2.