Hide Forgot
Description of problem: teamdctl team state appears to consumed memory if run over a period of time. Version-Release number of selected component (if applicable): teamd How reproducible: Every time Steps to Reproduce: 1.repeatedly query "teamdctl ateam state" 2.check ps -auwwx for memory consumption 3. Actual results: Customer notes upto 2GB of memory ussage over 24hr. I found memory consumption go from .2% to 1.3% in just under an hour. Expected results: memory usage to state the same. Additional info: Customer provided a script for how they are tracking this. # cat team_mem_leak.sh #!/bin/bash while true do /usr/bin/teamdctl ateam0 state > /dev/null usleep 100000 /usr/bin/date "+%Y-%m-%dT%H:%M:%S $(ps -auwwx | fgrep /usr/bin/teamd |fgrep -v fgrep)" done This does have some impact as the customer is using SolarFlare ptp daemon with a team interface, sfptpd runs 'teamdctl ateam0 state' to determine the state of the team. here are the results of my test withthe customers script on the mentioned teamd package. # head -5 mem_leak.txt | column -t USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND 2019-04-29T13:58:29 root 716 0.0 0.2 33672 2076 ? S 13:51 0:00 /usr/bin/teamd -o -n -U -D -N -t ateam0 -c {"hwaddr": "DE:19:5C:DB:52:DC", "link_watch": {"name": "ethtool"}, "runner": {"name": "activebackup"}} root 4323 0.0 0.0 112660 864 ttyS0 R+ 13:58 0:00 grep -F /usr/bin/teamd 2019-04-29T13:58:29 root 716 0.0 0.2 33672 2076 ? S 13:51 0:00 /usr/bin/teamd -o -n -U -D -N -t ateam0 -c {"hwaddr": "DE:19:5C:DB:52:DC", "link_watch": {"name": "ethtool"}, "runner": {"name": "activebackup"}} 2019-04-29T13:58:29 root 716 0.0 0.2 33672 2076 ? S 13:51 0:00 /usr/bin/teamd -o -n -U -D -N -t ateam0 -c {"hwaddr": "DE:19:5C:DB:52:DC", "link_watch": {"name": "ethtool"}, "runner": {"name": "activebackup"}} ---------------------snip------------------------ root 25202 0.0 0.0 112660 860 ttyS0 R 14:40 0:00 grep -F /usr/bin/teamd 2019-04-29T14:40:43 root 716 0.1 1.3 45452 13852 ? S 13:51 0:03 /usr/bin/teamd -o -n -U -D -N -t ateam0 -c {"hwaddr": "DE:19:5C:DB:52:DC", "link_watch": {"name": "ethtool"}, "runner": {"name": "activebackup"}} 2019-04-29T14:40:44 root 716 0.1 1.3 45452 13852 ? S 13:51 0:03 /usr/bin/teamd -o -n -U -D -N -t ateam0 -c {"hwaddr": "DE:19:5C:DB:52:DC", "link_watch": {"name": "ethtool"}, "runner": {"name": "activebackup"}} 2019-04-29T14:40:44 root 716 0.1 1.3 45452 13852 ? S 13:51 0:03 /usr/bin/teamd -o -n -U -D -N -t ateam0 -c {"hwaddr": "DE:19:5C:DB:52:DC", "link_watch": {"name": "ethtool"}, "runner": {"name": "activebackup"}} root 25223 0.0 0.0 112660 864 ttyS0 R 14:40 0:00 grep -F /usr/bin/teamd [root@firewalld ~]# tail -2 mem_leak.txt | column -t 2019-04-29T14:40:44 root 716 0.1 1.3 45452 13852 ? S 13:51 0:03 /usr/bin/teamd -o -n -U -D -N -t ateam0 -c {"hwaddr": "DE:19:5C:DB:52:DC", "link_watch": {"name": "ethtool"}, "runner": {"name": "activebackup"}} root 25223 0.0 0.0 112660 864 ttyS0 R 14:40 0:00 grep -F /usr/bin/teamd
I could reproduce this issue easily on my local host, but it seems the memleak detector tool, valgrind can't work on teamd. by reviewing the libteam code, I'm thinking it could be caused by the jasson lib use, need to confirm it later.
The leak happened in teamd_config_actual_dump() when calling get_port_obj(), not sure why the later json_decref() couldn't free all its children objs. It seems a jansson lib issue.
(In reply to Xin Long from comment #5) > The leak happened in teamd_config_actual_dump() when calling get_port_obj(), > not sure why the later json_decref() couldn't free all its children objs. It > seems a jansson lib issue. teamd_config_actual_dump shouldn't have used json_object_set() ,which can json_incref the child object causing later json_decref parent object not to be able to free it. This can be verified by: # cat jmemleak.c #include <stdio.h> #include <jansson.h> int main() { json_t *root, *child, *gchild_1, *gchild_2; root = json_object(); child = json_object(); gchild_1 = json_object(); gchild_2 = json_object(); json_object_set(root, "ports", child); json_object_set(child, "eth1", gchild_1); json_object_set(child, "eth2", gchild_2); json_decref(root); return 0; } # gcc jmemleak.c -o jmemleak -ljansson # valgrind --leak-check=full ./jmemleak ==20039== Memcheck, a memory error detector ==20039== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==20039== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info ==20039== Command: ./jmemleak ==20039== ==20039== ==20039== HEAP SUMMARY: ==20039== in use at exit: 706 bytes in 8 blocks ==20039== total heap usage: 11 allocs, 3 frees, 960 bytes allocated ==20039== ==20039== 706 (72 direct, 634 indirect) bytes in 1 blocks are definitely lost in loss record 8 of 8 ==20039== at 0x4C29E63: malloc (vg_replace_malloc.c:309) ==20039== by 0x4E3EC6A: json_object (value.c:53) ==20039== by 0x4006BC: main (in /tmp/jmemleak) ==20039== ==20039== LEAK SUMMARY: ==20039== definitely lost: 72 bytes in 1 blocks ==20039== indirectly lost: 634 bytes in 7 blocks ==20039== possibly lost: 0 bytes in 0 blocks ==20039== still reachable: 0 bytes in 0 blocks ==20039== suppressed: 0 bytes in 0 blocks ==20039== ==20039== For counts of detected and suppressed errors, rerun with: -v ==20039== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) Fix in libteam (change to use json_object_set_new which doesn't do json_incref): diff --git a/teamd/teamd_config.c b/teamd/teamd_config.c index 69b25de..34fef1f 100644 --- a/teamd/teamd_config.c +++ b/teamd/teamd_config.c @@ -84,7 +84,7 @@ static int get_port_obj(json_t **pport_obj, json_t *config_json, ports_obj = json_object(); if (!ports_obj) return -ENOMEM; - err = json_object_set(config_json, "ports", ports_obj); + err = json_object_set_new(config_json, "ports", ports_obj); if (err) { json_decref(ports_obj); return -ENOMEM; @@ -95,7 +95,7 @@ static int get_port_obj(json_t **pport_obj, json_t *config_json, port_obj = json_object(); if (!port_obj) return -ENOMEM; - err = json_object_set(ports_obj, port_name, port_obj); + err = json_object_set_new(ports_obj, port_name, port_obj); if (err) { json_decref(port_obj); return -ENOMEM;
https://lists.fedoraproject.org/archives/list/libteam@lists.fedorahosted.org/thread/6XWQ5GXGZNZUUVI7465S7SGMERW74KJN/
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=23213580
Hi Long, Why i can't reproduce this bz using below script, can you tell me how to reproduce it? while true do /usr/bin/teamdctl ateam0 state > /dev/null usleep 100000 /usr/bin/date "+%Y-%m-%dT%H:%M:%S $(ps -auwwx | fgrep /usr/bin/teamd |fgrep -v fgrep)" done
reproduced: 2019-10-21T04:59:18 root 27348 0.7 0.2 133052 67856 ? S 03:54 0:29 teamd -d -t ateam0 -c {"link_watch" : {"name" : "ethtool"}} ^C [root@hp-dl388g8-22 ~]# rpm -q libteam libteam-1.27-9.el7.x86_64 verified: 2019-10-21T04:59:22 root 28164 1.0 0.0 66576 1396 ? S 03:58 0:38 teamd -d -t ateam0 -c {"link_watch" : {"name" : "ethtool"}} ^C [root@dell-per740-19 ~]# rpm -q libteam libteam-1.29-1.el7.x86_64
reproducer: ip link add aeth0 type dummy teamd -d -t ateam0 -c '{"link_watch" : {"name" : "ethtool"}}' teamdctl ateam0 port add aeth0 while true; do teamdctl ateam0 config dump actual > /dev/null date "+%Y-%m-%dT%H:%M:%S $(ps -auwwx | grep teamd |grep -v grep)" done
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1133