Description of problem:

TCMALLOC tuning to adjust the thread cache is very much part of bare-metal Ceph OSD processes.

On bare-metal systems:

# ps e -p $(pgrep ceph-os)
/usr/bin/ceph-osd -f --cluster ceph --id 11 --setuser ceph --setgroup ceph LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin CLUSTER=ceph TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 CEPH_AUTO_RESTART_ON_UPGRADE=no

~]# cat /proc/1329464/environ    -> ceph-osd pid on bare metal
LANG=en_US.UTF-8PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/binCLUSTER=cephTCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728CEPH_AUTO_RESTART_ON_UPGRADE=no

As we can see above, TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is well defined on bare metal. But when ceph-osd runs in containers, this option is at present neither passed on the command line nor set as an environment variable, so for Ceph OSDs in containers this tuning does not take effect. This can cause performance degradation.

This tuning was added to the bare-metal configuration via https://bugzilla.redhat.com/show_bug.cgi?id=1297502

Version-Release number of selected component (if applicable):
RHCS 3.0

How reproducible:
Deploy Ceph in containers using the latest RHCS image, via ceph-ansible.

Steps to Reproduce:
1. Find the ceph-osd container process:

# ps -ef | grep ceph-osd | grep devsdo
root 1278554 1278377 0 11:47 ? 00:00:00 /usr/bin/docker-current run --rm --net=host --privileged=true --pid=host -v /var/lib/ceph:/var/lib/ceph -v /etc/ceph:/etc/ceph -v /dev:/dev -v /etc/localtime:/etc/localtime:ro --device=/dev/sdo --device=/dev/sdo1 -e OSD_JOURNAL=/dev/nvme0n1 -e OSD_DEVICE=/dev/sdo -e CLUSTER=ceph -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE -e OSD_JOURNAL_SIZE=15120 -e OSD_FILESTORE=1 --name=ceph-osd-smerf02-10g-devsdo -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhceph:ceph-3.0-rhel-7-docker-candidate-64440-20171009211408

2. Check the environment of the ceph-osd process running in the container:

# docker exec dc68d1025629 cat /proc/1278554/environ
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/binPWD=/LANG=en_US.UTF-8SHLVL=1

As seen above, this option is missing when running in containers.

3. Look at the environment variables for ceph-osd running in the container:

# docker exec dc68d1025629 env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=smerf02-10g
OSD_JOURNAL=/dev/nvme0n1
OSD_DEVICE=/dev/sdo
CLUSTER=ceph
CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE
OSD_JOURNAL_SIZE=15120
OSD_FILESTORE=1
container=docker
HOME=/root

We do not see any TCMALLOC setting here.
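Until ceph-ansible passes the variable itself, one possible manual workaround is to set it in the container's environment with docker's -e flag, using the same value as on bare metal. A minimal sketch (the docker invocation is only illustrative; the device and image arguments are elided and must come from your own deployment):

```shell
# 134217728 bytes is 128 MiB, the value the bare-metal config uses.
TCMALLOC_BYTES=$((128 * 1024 * 1024))
echo "TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=${TCMALLOC_BYTES}"

# Illustrative only -- add the -e flag to your existing OSD container
# invocation (other arguments elided):
# docker run --rm -e TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=${TCMALLOC_BYTES} ... rhceph:...
```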
Release-notes please.
The variable is "TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES", not "TCMalloc".
The fix is included in https://github.com/ceph/ceph-ansible/releases/tag/v3.0.5. Ken, could you please build a new package? Thanks.
You can unblock the situation by manually applying commit https://github.com/ceph/ceph-ansible/pull/2128/commits/ab7eb79212f90edbfe29faf40dac5d209c7a70a9 to your current code and then proceed with verifying this bug.
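Applying a single upstream commit to a local checkout is a plain git cherry-pick. Below is a self-contained sketch of that workflow using two throwaway local repositories as stand-ins for upstream ceph-ansible and your checkout (the file name and commit messages are invented for illustration; the real commit to pick is ab7eb79212f90edbfe29faf40dac5d209c7a70a9 from PR 2128):

```shell
set -e
tmp=$(mktemp -d)

# "upstream" repo carrying the fix on top of a base commit
git init -q "$tmp/upstream"
cd "$tmp/upstream"
git config user.email tester@example.com
git config user.name tester
echo base > site.yml
git add site.yml && git commit -qm "base"
echo "pass TCMALLOC env var to OSD containers" >> site.yml
git add site.yml && git commit -qm "osd: pass TCMALLOC env var"
fix=$(git rev-parse HEAD)   # stand-in for ab7eb79...

# local checkout, rewound so it lacks the fix
git clone -q "$tmp/upstream" "$tmp/local"
cd "$tmp/local"
git config user.email tester@example.com
git config user.name tester
git reset -q --hard HEAD~1

# apply just that one commit, then confirm the change landed
git cherry-pick "$fix"
grep TCMALLOC site.yml
```

In a real deployment you would run the cherry-pick inside your ceph-ansible checkout after fetching the commit from the upstream repository.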
$ sudo docker exec ceph-osd-magna005-sdc env | grep TCMALLOC
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=102400

$ sudo docker exec ceph-osd-magna005-sdc cat /proc/10986/environ
HOSTNAME=magna005OSD_DEVICE=/dev/sdcOLDPWD=/var/lib/ceph/tmp/tmp.S4SjQnk1ohLC_ALL=CPATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/binPWD=/rootOSD_JOURNAL=SHLVL=0HOME=/rootTCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728CLUSTER=ceph_3OSD_DMCRYPT=1CEPH_DAEMON=osd_ceph_disk_activatecontainer=dockerOSD_FILESTORE=1

$ ps aux | grep TCMALLOC
root 10061 0.0 0.0 811796 14980 ? Sl 06:14 0:00 /usr/bin/docker-current run --rm --net=host --privileged=true --pid=host --memory=1g --cpu-quota=100000 -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -v /var/lib/ceph:/var/lib/ceph -v /etc/ceph:/etc/ceph -e OSD_JOURNAL= -e OSD_FILESTORE=1 -e OSD_DMCRYPT=1 -e CLUSTER=ceph_3 -e OSD_DEVICE=/dev/sdb -e TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=102400 -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE --name=ceph-osd-magna005-sdb brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhceph:ceph-3.0-rhel-7-docker-candidate-61072-20171104225422

Looks good to me with ceph-ansible-3.0.9-1.el7cp.noarch; moving to VERIFIED state.

Regards,
Vasishta
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387