Bug 2091158 - cockpit-bridge --privileged has a memory leak when open for extended periods of time.
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: cockpit
Version: 8.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Martin Pitt
QA Contact: Jan Ščotka
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-27 17:11 UTC by Tom Crider
Modified: 2023-06-29 18:39 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-123581 0 None None None 2022-05-27 17:13:03 UTC

Comment 7 Martin Pitt 2022-06-21 11:04:22 UTC
Thank you Tom for your detailed report!

Allison, I have some trouble making sense of any of the leaks. The initial ones ("still reachable") like these:

==00:00:32:23.619 124531== 1 bytes in 1 blocks are still reachable in loss record 1 of 2,364
==00:00:32:23.619 124531==    at 0x4C37135: malloc (vg_replace_malloc.c:381)
==00:00:32:23.620 124531==    by 0x5497475: g_malloc (gmem.c:99)
==00:00:32:23.620 124531==    by 0x54B0DB2: g_strdup (gstrfuncs.c:363)
==00:00:32:23.620 124531==    by 0x5211634: g_param_spec_string (gparamspecs.c:2266)
==00:00:32:23.620 124531==    by 0x15433F: cockpit_web_server_class_init (cockpitwebserver.c:357)
==00:00:32:23.620 124531==    by 0x15433F: cockpit_web_server_class_intern_init (cockpitwebserver.c:85)
==00:00:32:23.620 124531==    by 0x52229DA: type_class_init_Wm (gtype.c:2233)
==00:00:32:23.620 124531==    by 0x52229DA: g_type_class_ref (gtype.c:2948)
==00:00:32:23.620 124531==    by 0x52091C1: g_object_new_valist (gobject.c:2074)
==00:00:32:23.620 124531==    by 0x52092AC: g_object_new (gobject.c:1642)
==00:00:32:23.620 124531==    by 0x130184: cockpit_packages_new (cockpitpackages.c:1233)
==00:00:32:23.620 124531==    by 0x11F4EE: setup_router (bridge.c:296)
==00:00:32:23.620 124531==    by 0x11EC64: run_bridge (bridge.c:396)
==00:00:32:23.620 124531==    by 0x11EC64: main (bridge.c:643)

... seem harmless: they are allocated at startup, are singletons, and are simply *never* freed, as they live as long as the bridge is running.

So I took a closer look at the first "definitely lost" one:

==00:00:32:23.812 124531== 1,161,332 (606,384 direct, 554,948 indirect) bytes in 12,633 blocks are definitely lost in loss record 2,363 of 2,364
==00:00:32:23.812 124531==    at 0x4C37135: malloc (vg_replace_malloc.c:381)
==00:00:32:23.812 124531==    by 0x5497475: g_malloc (gmem.c:99)
==00:00:32:23.812 124531==    by 0x54AF086: g_slice_alloc (gslice.c:1026)
==00:00:32:23.812 124531==    by 0x54CEF55: g_variant_get_child_value (gvariant-core.c:1041)
==00:00:32:23.812 124531==    by 0x12175F: build_json_dictionary (cockpitdbusjson.c:705)
==00:00:32:23.812 124531==    by 0x12175F: build_json (cockpitdbusjson.c:863)
==00:00:32:23.812 124531==    by 0x121CCA: build_json_array_or_tuple (cockpitdbusjson.c:676)
==00:00:32:23.812 124531==    by 0x12165A: build_json (cockpitdbusjson.c:881)
==00:00:32:23.812 124531==    by 0x123380: build_json_body (cockpitdbusjson.c:960)
==00:00:32:23.812 124531==    by 0x123380: build_json_signal (cockpitdbusjson.c:985)
==00:00:32:23.812 124531==    by 0x123380: on_signal_message (cockpitdbusjson.c:2615)
==00:00:32:23.812 124531==    by 0x4F098B7: emit_signal_instance_in_idle_cb (gdbusconnection.c:3721)
==00:00:32:23.812 124531==    by 0x548E27A: g_idle_dispatch (gmain.c:5579)
==00:00:32:23.812 124531==    by 0x549195C: g_main_dispatch (gmain.c:3193)
==00:00:32:23.812 124531==    by 0x549195C: g_main_context_dispatch (gmain.c:3873)
==00:00:32:23.812 124531==    by 0x5491D17: g_main_context_iterate.isra.21 (gmain.c:3946)
==00:00:32:23.812 124531==    by 0x5491DAF: g_main_context_iteration (gmain.c:4007)
==00:00:32:23.812 124531==    by 0x11ED8B: run_bridge (bridge.c:426)
==00:00:32:23.812 124531==    by 0x11ED8B: main (bridge.c:643)

on_signal_message() creates the object and unrefs it at the end, so that balances. send_with_barrier() also refs it, but on_wait_complete() unrefs it again. Without a batch, cockpit_dbus_cache_barrier() immediately calls the callback (on_wait_complete), so that looks safe. If there is a batch, the callback gets queued up and barrier_progress() eventually calls it; in either case, cockpitdbuscache.c never even inspects its userdata.
Does that tell you anything?
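
To compare the loss records in a Valgrind log like the ones quoted above, the header lines can be totalled per leak kind. The following is only a minimal sketch and not part of cockpit: the function name `summarize_loss_records` and the regular expression are assumptions for illustration, written against the header format visible in this report.

```python
import re
from collections import defaultdict

# Matches memcheck loss-record header lines such as:
#   "==... 124531== 1 bytes in 1 blocks are still reachable in loss record 1 of 2,364"
#   "==... 124531== 1,161,332 (606,384 direct, 554,948 indirect) bytes in 12,633
#    blocks are definitely lost in loss record 2,363 of 2,364"
# The regex is an assumption based on the lines quoted in this bug.
LOSS_RE = re.compile(
    r"==\s+(?P<bytes>[\d,]+)\s+(?:\([^)]*\)\s+)?bytes in (?P<blocks>[\d,]+) blocks "
    r"are (?P<kind>still reachable|definitely lost|indirectly lost|possibly lost) "
    r"in loss record"
)


def summarize_loss_records(lines):
    """Sum bytes and blocks per leak kind over all loss-record header lines."""
    totals = defaultdict(lambda: {"bytes": 0, "blocks": 0})
    for line in lines:
        match = LOSS_RE.search(line)
        if match is None:
            continue  # stack frames and other output are ignored
        entry = totals[match.group("kind")]
        entry["bytes"] += int(match.group("bytes").replace(",", ""))
        entry["blocks"] += int(match.group("blocks").replace(",", ""))
    return dict(totals)
```

Feeding it the lines of a saved log (for example `summarize_loss_records(open(path))`, where `path` is wherever the log was written) separates the startup singletons reported as "still reachable" from the records that are "definitely lost".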

Comment 9 Tom Bouwman 2022-12-26 12:54:50 UTC
I am also having this problem.
I am running Fedora 37.
I recently (2022-12-14) upgraded from Fedora 36. I am not sure whether it was already a problem in F36.

Upgrade dates and versions:
2022-12-01: cockpit-bridge-279-1.fc36.x86_64 --> cockpit-bridge-280-1.fc36.x86_64
2022-12-14: cockpit-bridge-280-1.fc36.x86_64 --> cockpit-bridge-281-1.fc37.x86_64 (through DNF System Upgrade)
2022-12-19: cockpit-bridge-281-1.fc37.x86_64 --> cockpit-bridge-282-1.fc37.x86_64

What kind of info would you like to have?

Comment 10 Tom Bouwman 2023-01-09 20:46:59 UTC
I cannot reproduce the problem after booting to kernel 6.0.15-300.fc37.x86_64 on January 1st 2023.

Comment 11 Tom Bouwman 2023-01-23 20:13:40 UTC
The problem is back.
Yesterday (Jan 22nd 2023) I upgraded cockpit-bridge-282-1.fc37.x86_64 --> cockpit-bridge-283-1.fc37.x86_64 and rebooted to kernel 6.1.6-200.fc37.x86_64.

What kind of info would you like to have?

Comment 12 Tom Bouwman 2023-01-23 20:30:51 UTC
It is relatively easy to reproduce.

1. Log in to Cockpit
2. This will bring you to Overview
3. Click on View metrics and history
4. Start top and sort on %MEM
5. Wait some time
6. All available memory will go to cockpit-bridge

Comment 13 Tom Bouwman 2023-02-01 16:02:31 UTC
top - 17:01:41 up 13:01,  3 users,  load average: 0.27, 0.31, 0.25
Tasks: 466 total,   2 running, 464 sleeping,   0 stopped,   0 zombie
%Cpu0  :  3.5 us,  0.6 sy,  0.0 ni, 95.1 id,  0.8 wa,  0.0 hi,  0.1 si,  0.0 st
%Cpu1  :  5.1 us,  1.5 sy,  0.0 ni, 92.5 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  2.2 us,  1.4 sy,  0.0 ni, 95.9 id,  0.5 wa,  0.0 hi,  0.1 si,  0.0 st
%Cpu3  :  3.1 us,  2.1 sy,  0.0 ni, 94.0 id,  0.8 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8100940 total,   137224 free,  7819316 used,   144400 buff/cache
KiB Swap: 24878072 total, 24202668 free,   675404 used.    75276 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                      
  46129 root      20   0 7156720   6.5g   6536 R  10.8  84.6   6:48.96 cockpit-bridge                                                               
  46158 root      20   0  388220 221076   6220 S   0.0   2.7   1:29.46 /usr/libexec/cockpit-pcp                                                     
  20439 root      20   0  995240  63256   9712 S   0.0   0.8   0:31.87 /usr/libexec/packagekitd                                                     
   1513 dbus      20   0   42736  38296    916 S   1.7   0.5   1:47.48 dbus-broker --log 4 --controller 9 --machine-id 37a41eb83b7f43ce8e61400ddda+ 
  46104 cockpit+  20   0  193980  35328   7092 S   0.0   0.4   0:54.56 /usr/libexec/cockpit-ws --for-tls-proxy --port=0                             
      1 root      20   0  173380  12180   4844 S   2.4   0.2   3:07.26 /usr/lib/systemd/systemd rhgb --switched-root --system --deserialize 31      
   1587 root      20   0 1637080  12140   7540 S   0.0   0.1   0:05.10 /usr/sbin/libvirtd --timeout 120                                             
  46117 root      20   0   20192  12112   9160 S   0.2   0.1   0:01.62 /usr/lib/systemd/systemd --user

Comment 14 Tom Bouwman 2023-06-28 12:10:38 UTC
I am no longer able to reproduce this on cockpit-bridge-294.1-1.fc37.x86_64.

This may be closed as could not reproduce.

Comment 15 Martin Pitt 2023-06-28 12:17:03 UTC
Thanks Tom Bouwman. Unfortunately I was never able to reproduce it either, but let's leave the last word to Tom Crider as the original reporter.

Going forward, we will soon switch Fedora to our new Python bridge reimplementation, which hopefully fixes this as well :-) But we won't introduce that into RHEL 8, it's too big of a change.

Comment 18 Tom Crider 2023-06-29 18:39:59 UTC
(In reply to Tom Bouwman from comment #14)
> Not able to reproduce on cockpit-bridge-294.1-1.fc37.x86_64 anymore
> 
> This may be closed as could not reproduce.

Hi Tom,

Unfortunately, this bug was originally opened for RHEL 8, not Fedora. We cannot currently validate it as resolved on RHEL, and a memory leak is generally something that needs to be fixed.

While I am glad to hear you're no longer experiencing the issue on Fedora, I don't feel closing the bug would be a good call here without validating whether or not it's still a problem in RHEL.

For future reference, a separate bug should be opened for Fedora, with a reference to the RHEL bug, as they are different distributions and versions and need to be tracked separately.

As for the RHEL side, I am working to check whether the issue persists, as there have been 20+ version revisions of the RHEL cockpit-bridge package since the issue was reported over a year ago.

Martin, thank you again for your ongoing work on this; it has been greatly appreciated!



Tom Crider, RHCE
Software Maintenance Engineer
Global Support Services - North America
Red Hat, Inc. 
1.888.GO.REDHAT

