Bug 1599507

Summary: [Continuous OSD memory usage growth in a HEALTH_OK cluster] RGW workload makes OSD memory explode
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vikhyat Umrao <vumrao>
Component: RADOS
Assignee: Kefu Chai <kchai>
Status: CLOSED ERRATA
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: medium
Docs Contact: Aron Gunn <agunn>
Priority: medium
Version: 2.5
CC: agunn, ceph-eng-bugs, ceph-qe-bugs, dzafman, hnallurv, jdurgin, kchai, nojha, tchandra, tserlin
Target Milestone: z1
Keywords: CodeChange
Target Release: 2.5
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.10-23.el7cp Ubuntu: ceph_10.2.10-20redhat1
Doc Type: Bug Fix
Doc Text:
.Reduce OSD memory usage for Ceph Object Gateway workloads
The OSD memory usage was tuned to reduce unnecessary usage, especially for Ceph Object Gateway workloads.
Story Points: ---
Clone Of:
: 1599856 1599859 (view as bug list)
Environment:
Last Closed: 2018-07-26 18:06:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1536401

Description Vikhyat Umrao 2018-07-10 00:39:16 UTC
Description of problem:
Continuous OSD memory usage growth in a HEALTH_OK cluster.

 health HEALTH_OK
     monmap e1: 3 mons at {mon032-node=192.168.1.124:6789/0,mon033-node=192.168.1.189:6789/0,mon034-node=192.168.1.252:6789/0}
            election epoch 74, quorum 0,1,2 mon032-node,mon033-node,mon034-node
     osdmap e109308: 266 osds: 266 up, 266 in
            flags require_jewel_osds
      pgmap v34020501: 11208 pgs, 19 pools, 63034 GB data, 28983 kobjects
            185 TB used, 1415 TB / 1601 TB avail
               11206 active+clean
                   2 active+clean+scrubbing


The only notable thing I see so far:

- The sortbitwise and recovery_deletes flags are not set (see the check sketched below).

From the configuration side:

- The PG count looks good: 130-150 PGs/OSD, which is not too high and in line with the usual recommendation.
- These are data OSDs.
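
A minimal sketch of how these points can be re-checked with standard commands (assumed here, not taken from the collected sosreport):

$ ceph osd dump | grep flags      # shows which osdmap flags (e.g. sortbitwise) are currently set
$ ceph osd df                     # the PGS column gives the per-OSD PG count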




$ cat sos_commands/process/ps_auxwww | grep ceph-osd
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND

ceph      577907  0.7  1.0 2788452 1390624 ?     Ssl  Jun20 101:57 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
ceph      628569  0.8  1.0 2793660 1383116 ?     Ssl  Jun20 105:38 /usr/bin/ceph-osd -f --cluster ceph --id 32 --setuser ceph --setgroup ceph
ceph     1437216 11.7  4.0 7959848 5282616 ?     Ssl  May25 5884:42 /usr/bin/ceph-osd -f --cluster ceph --id 69 --setuser ceph --setgroup ceph
ceph     1449074 13.3  2.9 6766348 3902308 ?     Ssl  May25 6666:02 /usr/bin/ceph-osd -f --cluster ceph --id 99 --setuser ceph --setgroup ceph
ceph     2423656 12.1  4.9 9440528 6511880 ?     Ssl  May20 6932:06 /usr/bin/ceph-osd -f --cluster ceph --id 216 --setuser ceph --setgroup ceph
ceph     2423667 10.4  1.6 4959508 2140976 ?     Ssl  May20 5969:04 /usr/bin/ceph-osd -f --cluster ceph --id 275 --setuser ceph --setgroup ceph
ceph     2423710 14.1  2.7 6722064 3636048 ?     Ssl  May20 8082:32 /usr/bin/ceph-osd -f --cluster ceph --id 129 --setuser ceph --setgroup ceph
ceph     2423713 12.3  2.9 6622516 3837800 ?     Ssl  May20 7047:32 /usr/bin/ceph-osd -f --cluster ceph --id 190 --setuser ceph --setgroup ceph
ceph     2423714 17.9  7.1 12669624 9321828 ?    Ssl  May20 10277:10 /usr/bin/ceph-osd -f --cluster ceph --id 248 --setuser ceph --setgroup ceph
ceph     2423715 16.2  2.2 6013768 2988384 ?     Ssl  May20 9299:55 /usr/bin/ceph-osd -f --cluster ceph --id 160 --setuser ceph --setgroup ceph

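A minimal sketch of how the per-OSD resident memory can be ranked and heap usage inspected (assumed diagnostic commands, not part of the collected sosreport):

$ ps -eo pid,rss,args | grep [c]eph-osd | sort -k2 -n | tail     # OSD processes with the largest RSS last
$ ceph tell osd.248 heap stats                                   # tcmalloc heap statistics for one of the suspect OSDs
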

/dev/sdd1                             779890608    2735620  777154988   1% /var/lib/ceph/osd/ceph-5
/dev/sdk1                            7811388396 1124562080 6686826316  15% /var/lib/ceph/osd/ceph-216
/dev/sdf1                            7811388396  785395816 7025992580  11% /var/lib/ceph/osd/ceph-69
/dev/sdg1                            7811388396  891768084 6919620312  12% /var/lib/ceph/osd/ceph-99
/dev/sdh1                            7811388396  990338308 6821050088  13% /var/lib/ceph/osd/ceph-129
/dev/sdm1                            7811388396  867845728 6943542668  12% /var/lib/ceph/osd/ceph-275
/dev/sdj1                            7811388396  875274412 6936113984  12% /var/lib/ceph/osd/ceph-190
/dev/sde1                             779890608    3192556  776698052   1% /var/lib/ceph/osd/ceph-32
/dev/sdl1                            7811388396  906215476 6905172920  12% /var/lib/ceph/osd/ceph-248
/dev/sdi1                            7811388396  919347392 6892041004  12% /var/lib/ceph/osd/ceph-160

OSD.32 and OSD.5 are SSD OSDs.

The OSDs with the highest %MEM are 248, 216, and 69.

perf top is failing on these three OSDs with:
[22620.236007] perf: interrupt took too long (5014 > 4975), lowering kernel.perf_event_max_sample_rate to 39000
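
(For reference, the invocation assumed here is the standard per-process form, profiling a suspect OSD by its PID from the ps output above, e.g. osd.248:)

$ perf top -p 2423714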

OMAP sizes:
===============

du -sh /var/lib/ceph/osd/ceph-*/current/omap
323M    /var/lib/ceph/osd/ceph-129/current/omap
336M    /var/lib/ceph/osd/ceph-160/current/omap
193M    /var/lib/ceph/osd/ceph-190/current/omap

172M    /var/lib/ceph/osd/ceph-216/current/omap
360M    /var/lib/ceph/osd/ceph-248/current/omap
307M    /var/lib/ceph/osd/ceph-69/current/omap

280M    /var/lib/ceph/osd/ceph-275/current/omap
1.9G    /var/lib/ceph/osd/ceph-32/current/omap
1.4G    /var/lib/ceph/osd/ceph-5/current/omap
304M    /var/lib/ceph/osd/ceph-99/current/omap

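A minimal sketch for ranking the omap directories on a node by size (assumed; equivalent to the du output above):

$ du -sm /var/lib/ceph/osd/ceph-*/current/omap | sort -n     # sizes in MB, largest omap directories last
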

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 2.4 async
ceph-osd-10.2.7-48.el7cp.x86_64

How reproducible:
Always in the customer environment.

Comment 19 errata-xmlrpc 2018-07-26 18:06:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2261