Description of problem (please be as detailed as possible and provide log
snippets):
OSD pod crashes due to OOM
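For reference, the OOM kill itself can be confirmed on the affected pod; a minimal check, assuming the default Rook labels and a placeholder pod name:
# List the OSD pods, then look at the last terminated state of the osd container
oc -n openshift-storage get pods -l app=rook-ceph-osd
# "Last State" shows Reason: OOMKilled / Exit Code: 137 for the crashed container
oc -n openshift-storage describe pod <rook-ceph-osd-pod> | grep -A5 'Last State'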
Version of all relevant components (if applicable):
OCP v4.8 / ODF v4.9
$ ceph version
ceph version 16.2.0-143.el8cp (0e2c6f9639c37a03e55885fb922dc0cb1b5173cb) pacific (stable)
Default ODF installation - default limits/requests for ODF pods.
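For reference, the defaults in play can be read back with something like the following (the pod name is a placeholder; the container name "osd" and the osd_memory_target option are assumed from a default Rook/ODF deployment):
# Requests/limits on the osd container of one OSD pod
oc -n openshift-storage get pod <rook-ceph-osd-pod> -o jsonpath='{.spec.containers[?(@.name=="osd")].resources}'
# Memory target the OSD daemon itself aims for, read from the toolbox pod
oc rsh -n openshift-storage $TOOLS_POD ceph config get osd osd_memory_target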
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes
Is there any workaround available to the best of your knowledge?
NA
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3
Can this issue be reproduced?
I hit this issue 2 times, but do not have a clear reproducer:
- The first time, the issue happened when I expanded the cluster from 3 -> 6 OSD nodes. At that point the cluster was in the state described in https://bugzilla.redhat.com/show_bug.cgi?id=2021079. Since the issue became visible when I expanded the cluster, it looks similar to what is logged in https://bugzilla.redhat.com/show_bug.cgi?id=2008420.
- The second time, the issue happened as described in "Steps to Reproduce" below.
Can this issue be reproduced from the UI?
No
If this is a regression, please provide more details to justify this:
NA
Steps to Reproduce:
Below are the steps which led to this issue.
1. Install OCP/ODF on 2 clusters with the above versions and use ACM to set up ODF mirroring between the clusters.
2. Create hundreds of pods (in this case it was 600), attach a 5 GB PVC to each pod, and write 1 GB of data per pod. This results in roughly 600 GB written to the Ceph backend (a rough sketch is shown after this list).
3. Check that the images are replicated between the OCP/ODF clusters (I used "rbd -p storagecluster-cephblockpool ls") and check the "ceph df" output - it should be the same on both clusters (see the commands after this list).
4. On the first cluster, delete all pods and VolumeReplication resources - this triggers data deletion on the Ceph backend (also shown after this list).
5. After step 4, some of the OSDs hit the OOM issue and restart, leaving the cluster in an unstable state. I have seen this problem on both clusters involved, but not at the same time.
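For step 2, a rough sketch of the pod/PVC creation; the resource names, namespace, writer image, and storage class are placeholders (the default ODF RBD class is assumed), not necessarily what the original test used:
# Create 600 pods, each with its own 5Gi RBD-backed PVC, each writing ~1 GiB
for i in $(seq 1 600); do
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc-$i
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ocs-storagecluster-ceph-rbd
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-$i
spec:
  containers:
  - name: writer
    image: registry.access.redhat.com/ubi8/ubi
    command: ["sh", "-c", "dd if=/dev/zero of=/data/file bs=1M count=1024 && sleep infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc-$i
EOF
done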
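For steps 3 and 4, the checks and cleanup look roughly like this (run the toolbox commands on both clusters; the delete commands assume they are run in the namespace holding the test workload and that the VolumeReplication CRs live there as well):
# Step 3: compare the replicated images and overall usage on both clusters
oc rsh -n openshift-storage $TOOLS_POD rbd -p storagecluster-cephblockpool ls | wc -l
oc rsh -n openshift-storage $TOOLS_POD ceph df
# Step 4: on the first cluster, remove the test pods and their replication objects
oc delete pod --all
oc delete volumereplication --all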
Actual results:
OSD(s) crash due to OOM
Expected results:
OSD(s) should not crash due to OOM
Additional info:
osd logs:
http://perf148b.perf.lab.eng.bos.redhat.com/osd_crash_bz/
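Container logs of a restarted OSD can also be pulled directly with something like the following (pod name is a placeholder; --previous returns the log of the crashed container instance):
oc -n openshift-storage get pods -l app=rook-ceph-osd
oc -n openshift-storage logs <rook-ceph-osd-pod> -c osd --previous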
oc rsh -n openshift-storage $TOOLS_POD
sh-4.4$ ceph df
--- RAW STORAGE ---
CLASS  SIZE    AVAIL   USED     RAW USED  %RAW USED
ssd    18 TiB  16 TiB  1.7 TiB   1.7 TiB       9.37
TOTAL  18 TiB  16 TiB  1.7 TiB   1.7 TiB       9.37

--- POOLS ---
POOL                                    ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
storagecluster-cephblockpool             1  128  723 GiB  175.64k  1.7 TiB  11.02    5.0 TiB
storagecluster-cephfilesystem-metadata   2   32  503 KiB       22  1.5 MiB      0    4.7 TiB
storagecluster-cephfilesystem-data0      3  128      0 B        0      0 B      0    4.5 TiB
device_health_metrics                    4    1  1.2 MiB       12  2.3 MiB      0    6.7 TiB
sh-4.4$ ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME                      STATUS  REWEIGHT  PRI-AFF
 -1         18.00000  root default
 -5         18.00000      region us-west-2
-10          6.00000          zone us-west-2a
 -9          6.00000              host ip-10-0-134-115
  0    ssd   2.00000                  osd.0              up   1.00000  1.00000
  3    ssd   2.00000                  osd.3              up   1.00000  1.00000
  6    ssd   2.00000                  osd.6              up   1.00000  1.00000
 -4          6.00000          zone us-west-2b
 -3          6.00000              host ip-10-0-168-65
  1    ssd   2.00000                  osd.1              up   1.00000  1.00000
  5    ssd   2.00000                  osd.5            down   1.00000  1.00000
  8    ssd   2.00000                  osd.8              up   1.00000  1.00000
-14          6.00000          zone us-west-2c
-13          6.00000              host ip-10-0-212-246
  2    ssd   2.00000                  osd.2              up   1.00000  1.00000
  4    ssd   2.00000                  osd.4              up   1.00000  1.00000
  7    ssd   2.00000                  osd.7              up   1.00000  1.00000
sh-4.4$ ceph -s
  cluster:
    id:     d559afcb-accb-4431-a689-2e0555bf4b2b
    health: HEALTH_WARN
            1 osds down
            Slow OSD heartbeats on back (longest 9222.186ms)
            Slow OSD heartbeats on front (longest 8944.892ms)
            Degraded data redundancy: 54696/525393 objects degraded (10.410%), 45 pgs degraded, 92 pgs undersized
            snap trim queue for 5 pg(s) >= 32768 (mon_osd_snap_trim_queue_warn_on)

  services:
    mon:        3 daemons, quorum a,b,c (age 6d)
    mgr:        a(active, since 6d)
    mds:        1/1 daemons up, 1 hot standby
    osd:        9 osds: 8 up (since 80s), 9 in (since 2d)
    rbd-mirror: 1 daemon active (1 hosts)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 289 pgs
    objects: 175.13k objects, 570 GiB
    usage:   1.7 TiB used, 16 TiB / 18 TiB avail
    pgs:     54696/525393 objects degraded (10.410%)
             135 active+clean
             48  active+clean+snaptrim_wait
             47  active+undersized
             45  active+undersized+degraded
             14  active+clean+snaptrim

  io:
    client: 29 KiB/s rd, 20 KiB/s wr, 33 op/s rd, 75 op/s wr
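If more detail is needed, the crash records and a memory breakdown of a running OSD can be pulled from the same toolbox pod; something along these lines (the crash ID is a placeholder, osd.0 is just an example of an OSD that is still up):
sh-4.4$ ceph crash ls                  # list recorded daemon crashes
sh-4.4$ ceph crash info <crash-id>     # details for one crash
sh-4.4$ ceph tell osd.0 dump_mempools  # per-mempool memory usage of a running OSD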