Bug 1618375
| Summary: | rpc.gssd memory use grows unbounded when user accesses krb5 mount without having kerberos credentials |
|---|---|
| Product: | Red Hat Enterprise Linux 7 |
| Component: | gssproxy |
| Version: | 7.4 |
| Reporter: | Frank Sorenson <fsorenso> |
| Assignee: | Robbie Harwood <rharwood> |
| QA Contact: | ipa-qe <ipa-qe> |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | high |
| CC: | dwysocha, fs-qe, gsierohu, jshivers, ksiddiqu, ndehadra, rharwood, xzhou, yoyang |
| Target Milestone: | rc |
| Target Release: | --- |
| Hardware: | Unspecified |
| OS: | Unspecified |
| URL: | https://pagure.io/gssproxy/pull-request/233 |
| Doc Type: | Bug Fix |
| Doc Text: | Cause: gssproxy stored statuses were not freed on request termination. Consequence: accesses to Kerberized NFS mounts without krb5 credentials caused rpc.gssd to leak memory. Fix: change the local storage method for statuses. Result: no more memory leak. |
| Cloned To: | 1631564 (view as bug list) |
| Bug Depends On: | 1631564 |
| Bug Blocks: | 1647919 |
| Type: | Bug |
| Last Closed: | 2019-08-06 12:39:15 UTC |
I'm thinking this could be due to fragmentation in the memory allocations.

Created attachment 1484480 [details]
graph of RSS when access attempts fail
The graph shows approximately 2,500 measurements of rpc.gssd RSS usage across more than 50,000 NFS access attempts (sampled every 20 attempts). RSS usage appears to grow linearly.
For the 50,500-iteration test, RSS increased from 2648 KB to 17832 KB, a difference of 15184 KB (roughly 15,184,000 bytes):
$ echo $((15184000 / 50500))
300
so roughly 300 bytes are leaked per iteration.
After running gssd under valgrind memcheck and making 3,238 failing access attempts, the RSS increase itself is not particularly meaningful, because of the memory valgrind itself needs. However, the leak records are. Here are the largest per-iteration leaks:
(9 bytes/iteration)
==00:00:05:52.270 29112== 29,142 bytes in 3,238 blocks are indirectly lost in loss record 77 of 81
==00:00:05:52.270 29112== at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==00:00:05:52.270 29112== by 0x7AB4BF3: gp_memdup (gp_conv.c:15)
==00:00:05:52.270 29112== by 0x7AB4C8A: gp_conv_octet_string (gp_conv.c:33)
==00:00:05:52.270 29112== by 0x7AB5C64: gp_copy_gssx_status_alloc (gp_conv.c:555)
==00:00:05:52.270 29112== by 0x7AB87D5: gpm_save_status (gpm_display_status.c:18)
==00:00:05:52.270 29112== by 0x7AB93F8: gpm_acquire_cred (gpm_acquire_cred.c:112)
==00:00:05:52.270 29112== by 0x7ABE40F: gssi_acquire_cred_from (gpp_acquire_cred.c:165)
==00:00:05:52.270 29112== by 0x508DADF: gss_add_cred_from (g_acquire_cred.c:455)
==00:00:05:52.270 29112== by 0x508E128: gss_acquire_cred_from (g_acquire_cred.c:190)
==00:00:05:52.270 29112== by 0x508E343: gss_acquire_cred (g_acquire_cred.c:107)
==00:00:05:52.270 29112== by 0x11023E: gssd_acquire_krb5_cred (krb5_util.c:1364)
==00:00:05:52.270 29112== by 0x1122DD: gssd_acquire_user_cred (krb5_util.c:1384)
==00:00:05:52.270 29112== by 0x10F271: krb5_not_machine_creds (gssd_proc.c:508)
==00:00:05:52.270 29112== by 0x10F7F3: process_krb5_upcall (gssd_proc.c:647)
==00:00:05:52.270 29112== by 0x10FF38: handle_gssd_upcall (gssd_proc.c:814)
==00:00:05:52.270 29112== by 0x5C05E24: start_thread (pthread_create.c:308)
==00:00:05:52.270 29112== by 0x5F1234C: clone (clone.S:113)
This is allocated in gp_copy_gssx_status_alloc() in gssproxy src/gp_conv.c:
int gp_copy_gssx_status_alloc(gssx_status *in, gssx_status **out)
{
    gssx_status *o;
    int ret;

    o = calloc(1, sizeof(gssx_status));
    if (!o) {
        return ENOMEM;
    }

    o->major_status = in->major_status;
    o->minor_status = in->minor_status;

    if (in->mech.octet_string_len) {
        ret = gp_conv_octet_string(in->mech.octet_string_len,
                                   in->mech.octet_string_val,
                                   &o->mech);
(27 bytes/iteration)
==00:00:05:52.270 29112== 87,426 bytes in 3,238 blocks are indirectly lost in loss record 79 of 81
==00:00:05:52.270 29112== at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==00:00:05:52.270 29112== by 0x7AB4BF3: gp_memdup (gp_conv.c:15)
==00:00:05:52.270 29112== by 0x7AB5B86: gp_copy_utf8string (gp_conv.c:532)
==00:00:05:52.270 29112== by 0x7AB5CBC: gp_copy_gssx_status_alloc (gp_conv.c:572)
==00:00:05:52.270 29112== by 0x7AB87D5: gpm_save_status (gpm_display_status.c:18)
==00:00:05:52.270 29112== by 0x7AB93F8: gpm_acquire_cred (gpm_acquire_cred.c:112)
==00:00:05:52.270 29112== by 0x7ABE40F: gssi_acquire_cred_from (gpp_acquire_cred.c:165)
==00:00:05:52.270 29112== by 0x508DADF: gss_add_cred_from (g_acquire_cred.c:455)
==00:00:05:52.270 29112== by 0x508E128: gss_acquire_cred_from (g_acquire_cred.c:190)
==00:00:05:52.270 29112== by 0x508E343: gss_acquire_cred (g_acquire_cred.c:107)
==00:00:05:52.270 29112== by 0x11023E: gssd_acquire_krb5_cred (krb5_util.c:1364)
==00:00:05:52.270 29112== by 0x1122DD: gssd_acquire_user_cred (krb5_util.c:1384)
==00:00:05:52.270 29112== by 0x10F271: krb5_not_machine_creds (gssd_proc.c:508)
==00:00:05:52.270 29112== by 0x10F7F3: process_krb5_upcall (gssd_proc.c:647)
==00:00:05:52.270 29112== by 0x10FF38: handle_gssd_upcall (gssd_proc.c:814)
==00:00:05:52.270 29112== by 0x5C05E24: start_thread (pthread_create.c:308)
==00:00:05:52.270 29112== by 0x5F1234C: clone (clone.S:113)
Also allocated in gp_copy_gssx_status_alloc() in gssproxy src/gp_conv.c:
    if (in->minor_status_string.utf8string_len) {
        ret = gp_copy_utf8string(&in->minor_status_string,
                                 &o->minor_status_string);
(66 bytes/iteration)
==00:00:05:52.270 29112== 213,708 bytes in 3,238 blocks are indirectly lost in loss record 80 of 81
==00:00:05:52.270 29112== at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==00:00:05:52.270 29112== by 0x7AB4BF3: gp_memdup (gp_conv.c:15)
==00:00:05:52.270 29112== by 0x7AB5B86: gp_copy_utf8string (gp_conv.c:532)
==00:00:05:52.270 29112== by 0x7AB5C9C: gp_copy_gssx_status_alloc (gp_conv.c:564)
==00:00:05:52.270 29112== by 0x7AB87D5: gpm_save_status (gpm_display_status.c:18)
==00:00:05:52.270 29112== by 0x7AB93F8: gpm_acquire_cred (gpm_acquire_cred.c:112)
==00:00:05:52.270 29112== by 0x7ABE40F: gssi_acquire_cred_from (gpp_acquire_cred.c:165)
==00:00:05:52.270 29112== by 0x508DADF: gss_add_cred_from (g_acquire_cred.c:455)
==00:00:05:52.270 29112== by 0x508E128: gss_acquire_cred_from (g_acquire_cred.c:190)
==00:00:05:52.270 29112== by 0x508E343: gss_acquire_cred (g_acquire_cred.c:107)
==00:00:05:52.270 29112== by 0x11023E: gssd_acquire_krb5_cred (krb5_util.c:1364)
==00:00:05:52.270 29112== by 0x1122DD: gssd_acquire_user_cred (krb5_util.c:1384)
==00:00:05:52.270 29112== by 0x10F271: krb5_not_machine_creds (gssd_proc.c:508)
==00:00:05:52.270 29112== by 0x10F7F3: process_krb5_upcall (gssd_proc.c:647)
==00:00:05:52.270 29112== by 0x10FF38: handle_gssd_upcall (gssd_proc.c:814)
==00:00:05:52.270 29112== by 0x5C05E24: start_thread (pthread_create.c:308)
==00:00:05:52.270 29112== by 0x5F1234C: clone (clone.S:113)
Also allocated in gp_copy_gssx_status_alloc() in gssproxy src/gp_conv.c:
    if (in->major_status_string.utf8string_len) {
        ret = gp_copy_utf8string(&in->major_status_string,
                                 &o->major_status_string);
(198 bytes/iteration)
==00:00:05:52.270 29112== 641,124 (310,848 direct, 330,276 indirect) bytes in 3,238 blocks are definitely lost in loss record 81 of 81
==00:00:05:52.271 29112== at 0x4C2B955: calloc (vg_replace_malloc.c:711)
==00:00:05:52.271 29112== by 0x7AB5BEE: gp_copy_gssx_status_alloc (gp_conv.c:546)
==00:00:05:52.271 29112== by 0x7AB87D5: gpm_save_status (gpm_display_status.c:18)
==00:00:05:52.271 29112== by 0x7AB93F8: gpm_acquire_cred (gpm_acquire_cred.c:112)
==00:00:05:52.271 29112== by 0x7ABE40F: gssi_acquire_cred_from (gpp_acquire_cred.c:165)
==00:00:05:52.271 29112== by 0x508DADF: gss_add_cred_from (g_acquire_cred.c:455)
==00:00:05:52.271 29112== by 0x508E128: gss_acquire_cred_from (g_acquire_cred.c:190)
==00:00:05:52.271 29112== by 0x508E343: gss_acquire_cred (g_acquire_cred.c:107)
==00:00:05:52.271 29112== by 0x11023E: gssd_acquire_krb5_cred (krb5_util.c:1364)
==00:00:05:52.271 29112== by 0x1122DD: gssd_acquire_user_cred (krb5_util.c:1384)
==00:00:05:52.271 29112== by 0x10F271: krb5_not_machine_creds (gssd_proc.c:508)
==00:00:05:52.271 29112== by 0x10F7F3: process_krb5_upcall (gssd_proc.c:647)
==00:00:05:52.271 29112== by 0x10FF38: handle_gssd_upcall (gssd_proc.c:814)
==00:00:05:52.271 29112== by 0x5C05E24: start_thread (pthread_create.c:308)
==00:00:05:52.271 29112== by 0x5F1234C: clone (clone.S:113)
int gp_copy_gssx_status_alloc(gssx_status *in, gssx_status **out)
{
    gssx_status *o;
    int ret;

    o = calloc(1, sizeof(gssx_status));
These loss records add up to the observed 300 bytes/iteration:
$ echo $(( 9 + 27 + 66 + 198 ))
300
So all of these are allocated in the same gssproxy function, and they consist of the gssx_status 'out' structure plus the buffers attached to it.
The copy of the data is made in gssproxy src/client/gpm_display_status.c:
/* Thread local storage for return status.
 * FIXME: it's not the most portable construct, so may need fixing in future */
__thread gssx_status *tls_last_status = NULL;

void gpm_save_status(gssx_status *status)
{
    int ret;

    if (tls_last_status) {
        xdr_free((xdrproc_t)xdr_gssx_status, (char *)tls_last_status);
        free(tls_last_status);
    }

    ret = gp_copy_gssx_status_alloc(status, &tls_last_status);
    if (ret) {
        /* make sure tls_last_status is zeroed on error */
        tls_last_status = NULL;
    }
}
Okay, so the tls data is not getting freed.
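To make the lifetime problem concrete, here is a minimal, self-contained analogue (illustrative only; none of these names are gssproxy's): a bare __thread pointer means only the pointer slot is per-thread, so the heap block it points to outlives the thread unless something frees it first.

/* tls_leak_demo.c: build with 'gcc -pthread tls_leak_demo.c' and run under
 * 'valgrind --leak-check=full' to see one lost block per exited thread. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static __thread char *tls_last_message = NULL;

static void save_message(const char *msg)
{
    free(tls_last_message);          /* frees the previous message only */
    tls_last_message = strdup(msg);  /* new heap block; outlives the thread */
}

static void *worker(void *arg)
{
    (void)arg;
    save_message("permission denied");
    return NULL;  /* thread exits; the saved block is never freed */
}

int main(void)
{
    /* One short-lived thread per failed access attempt, as in rpc.gssd's
     * upcall handling, leaks one saved message each: linear RSS growth. */
    for (int i = 0; i < 3238; i++) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(t, NULL);
    }
    puts("done; check valgrind's leak summary");
    return 0;
}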
Moving this over to gssproxy
The thread-local variable 'tls_last_status' itself is cleaned up when the thread exits; however, that is just a single pointer. The heap data allocated for the 'gssx_status' to which it points, as well as the heap data allocated for the contents of that 'gssx_status', live in the process-wide heap and so are not freed when the thread exits.
This data is then leaked (300 bytes per thread in a simple test), with gssd RSS usage growing linearly and without bound.
At some point before the thread exits, the 'gssx_status' to which 'tls_last_status' points, along with the buffers attached to it, needs to be freed.
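One way to arrange that, sketched below as an assumption (the Doc Text above says only that the fix changed "the local storage method for statuses"; the actual change is in the pull request linked in the URL field), is to replace the bare __thread pointer with a pthread key whose destructor runs at thread exit. All identifiers here are illustrative, not gssproxy's:

/* Sketch: per-thread status storage with automatic cleanup at thread exit. */
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t last_status_key;
static pthread_once_t last_status_once = PTHREAD_ONCE_INIT;

/* The pthreads runtime calls this at thread exit with the thread's stored
 * value, if that value is non-NULL. In gssproxy this is where xdr_free()
 * on the nested fields plus free() on the struct would go. */
static void free_last_status(void *val)
{
    free(val);
}

static void make_key(void)
{
    (void)pthread_key_create(&last_status_key, free_last_status);
}

void save_status_sketch(void *new_status_copy)
{
    (void)pthread_once(&last_status_once, make_key);

    /* Drop any status saved earlier on this thread, as gpm_save_status
     * already does on overwrite, then remember the new one. */
    free_last_status(pthread_getspecific(last_status_key));
    (void)pthread_setspecific(last_status_key, new_status_copy);
}

With an arrangement like this, each short-lived upcall thread's last saved status is released automatically when the thread exits, which closes exactly the gap described above.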
In my testing, the memory was allocated when gpm_acquire_cred() called gpm_save_status(); for example:
==00:00:00:26.868 19968== 40,656 bytes in 616 blocks are definitely lost in loss record 77 of 77
==00:00:00:26.868 19968== at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==00:00:00:26.868 19968== by 0x7AB4EC3: gp_memdup (gp_conv.c:15)
==00:00:00:26.868 19968== by 0x7AB5E56: gp_copy_utf8string (gp_conv.c:532)
==00:00:00:26.868 19968== by 0x7AB5F44: gp_copy_gssx_status_alloc (gp_conv.c:560)
==00:00:00:26.868 19968== by 0x7AB8AFE: gpm_save_status (gpm_display_status.c:27)
==00:00:00:26.869 19968== by 0x7AB9728: gpm_acquire_cred (gpm_acquire_cred.c:112)
==00:00:00:26.869 19968== by 0x7ABE73F: gssi_acquire_cred_from (gpp_acquire_cred.c:165)
==00:00:00:26.869 19968== by 0x508DADF: gss_add_cred_from (g_acquire_cred.c:455)
==00:00:00:26.869 19968== by 0x508E128: gss_acquire_cred_from (g_acquire_cred.c:190)
==00:00:00:26.869 19968== by 0x508E343: gss_acquire_cred (g_acquire_cred.c:107)
==00:00:00:26.869 19968== by 0x11025E: gssd_acquire_krb5_cred (krb5_util.c:1364)
==00:00:00:26.869 19968== by 0x1122FD: gssd_acquire_user_cred (krb5_util.c:1384)
==00:00:00:26.869 19968== by 0x10F2B1: krb5_not_machine_creds (gssd_proc.c:508)
==00:00:00:26.869 19968== by 0x10F833: process_krb5_upcall (gssd_proc.c:647)
==00:00:00:26.869 19968== by 0x10FF63: handle_gssd_upcall (gssd_proc.c:814)
==00:00:00:26.869 19968== by 0x5C05E24: start_thread (pthread_create.c:308)
==00:00:00:26.869 19968== by 0x5F1234C: clone (clone.S:113)
Versions:
gssproxy-0.7.0-21.el7.x86_64
krb5-libs-1.15.1-34.el7.x86_64
nfs-utils-1.3.0-0.59.el7.x86_64
Created attachment 1484954 [details]
patch to free the tls_last_status memory
This patch frees the memory attached to tls_last_status, as well as tls_last_status itself.
RSS usage shows no increase over 250,000 bad accesses.
I believe it retains tls_last_status for as long as it is needed for useful output; however, I'm not familiar enough with the code to be certain.
Thanks for your investigation, Frank! This definitely seems to be a bug in gssproxy. I'll try to get a fix ready in a bit.

Verified.

rpm version:
============
[root@host-8-245-164 ~]# rpm -q gssproxy
gssproxy-0.7.0-26.el7.x86_64
[root@host-8-245-164 ~]#

Console output:
================
[root@host-8-245-164 ~]# df
Filesystem                          1K-blocks    Used Available Use% Mounted on
devtmpfs                              1916856       0   1916856   0% /dev
tmpfs                                 1940128       4   1940124   1% /dev/shm
tmpfs                                 1940128   33056   1907072   2% /run
tmpfs                                 1940128       0   1940128   0% /sys/fs/cgroup
/dev/vda1                            41931756 1839436  40092320   5% /
tmpfs                                  388028       0    388028   0% /run/user/0
host-8-247-42.testrelm.test:/export 41931776 1846784  40084992   5% /nfsdir
[root@host-8-245-164 ~]# klist
Ticket cache: KEYRING:persistent:0:krb_ccache_YxDhu54
Default principal: host/host-8-245-164.testrelm.test

Valid starting       Expires              Service principal
12/31/1969 19:00:00  12/31/1969 19:00:00  Encrypted/Credentials/v1@X-GSSPROXY:
[root@host-8-245-164 ~]# ls /nfsdir
test.txt
[root@host-8-245-164 ~]# su - testuser1
Last login: Mon May 13 12:17:18 EDT 2019 on pts/0
su: warning: cannot change directory to /home/testuser1: No such file or directory
-sh-4.2$ ls /nfsdir
ls: cannot access /nfsdir: Permission denied
-sh-4.2$ while true ; do ls /nfsdir ; done
ls: cannot access /nfsdir: Permission denied
ls: cannot access /nfsdir: Permission denied
ls: cannot access /nfsdir: Permission denied
ls: cannot access /nfsdir: Permission denied
ls: cannot access /nfsdir: Permission denied
ls: cannot access /nfsdir: Permission denied
...
ls: cannot access /nfsdir: Permission denied
^C
-sh-4.2$ logout
[root@host-8-245-164 ~]# rpm -q gssproxy
gssproxy-0.7.0-26.el7.x86_64
[root@host-8-245-164 ~]#
[root@host-8-245-164 ~]# while true ; do echo "$(date): $(ps h -C rpc.gssd -o size,vsize,share,rss,sz,trs)" ; sleep 1 ; done
Mon May 13 12:23:03 EDT 2019: 74156 137352 - 4544 34338 76
Mon May 13 12:23:04 EDT 2019: 74156 137352 - 4544 34338 76
Mon May 13 12:23:05 EDT 2019: 74156 137352 - 4544 34338 76
Mon May 13 12:23:06 EDT 2019: 74156 137352 - 4544 34338 76
Mon May 13 12:23:07 EDT 2019: 74156 137352 - 4544 34338 76
......
Mon May 13 12:23:15 EDT 2019: 74156 137352 - 4544 34338 76
Mon May 13 12:23:17 EDT 2019: 74156 137352 - 4544 34338 76
......
Mon May 13 12:25:03 EDT 2019: 74156 137352 - 4544 34338 76
Mon May 13 12:25:04 EDT 2019: 74156 137352 - 4544 34338 76
Mon May 13 12:25:05 EDT 2019: 74156 137352 - 4544 34338 76
[root@host-8-245-164 ~]#

RSS holds steady at 4544 KB while the failing-access loop runs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2050
Description of problem:

If a user accesses a sec=krb5 mount but does not have valid credentials, rpc.gssd RSS increases with each attempt to access the mount.

Version-Release number of selected component (if applicable):
nfs-utils-1.3.0-0.59.el7.x86_64
kernel-3.10.0-693.11.6.el7.x86_64

How reproducible:
easy

Steps to Reproduce:
# mount vm1:/exports /mnt/vm1 -o vers=4.1,sec=krb5
# su - user1
[user1@vm2 ~]$ while true ; do ls /mnt/vm1 ; done
ls: cannot access /mnt/vm1: Permission denied
ls: cannot access /mnt/vm1: Permission denied
ls: cannot access /mnt/vm1: Permission denied
...

Actual results:
# while true ; do echo "$(date): $(ps h -C rpc.gssd -o size,vsize,share,rss,sz,trs)" ; sleep 1 ; done
Thu Aug 16 08:08:09 CDT 2018: 408 42384 - 1320 10596 76
Thu Aug 16 08:08:10 CDT 2018: 408 42384 - 1320 10596 76
Thu Aug 16 08:08:11 CDT 2018: 408 42384 - 1320 10596 76
Thu Aug 16 08:08:12 CDT 2018: 408 42384 - 1320 10596 76
Thu Aug 16 08:08:13 CDT 2018: 408 42384 - 1320 10596 76
Thu Aug 16 08:08:14 CDT 2018: 408 42384 - 1320 10596 76
*** 'ls' loop started
Thu Aug 16 08:08:15 CDT 2018: 74140 116116 - 1996 29029 76
Thu Aug 16 08:08:16 CDT 2018: 74140 116116 - 2072 29029 76
Thu Aug 16 08:08:18 CDT 2018: 74140 116116 - 2156 29029 76
Thu Aug 16 08:08:19 CDT 2018: 74140 116116 - 2240 29029 76
Thu Aug 16 08:08:20 CDT 2018: 74140 116116 - 2324 29029 76
Thu Aug 16 08:08:21 CDT 2018: 74140 116116 - 2392 29029 76
Thu Aug 16 08:08:22 CDT 2018: 74140 116116 - 2468 29029 76
Thu Aug 16 08:08:23 CDT 2018: 74140 116116 - 2552 29029 76
...
Thu Aug 16 08:12:24 CDT 2018: 74140 116116 - 15852 29029 76
Thu Aug 16 08:12:25 CDT 2018: 74140 116116 - 15920 29029 76
Thu Aug 16 08:12:26 CDT 2018: 74140 116116 - 15988 29029 76
...
Thu Aug 16 08:28:40 CDT 2018: 288268 330244 - 83300 82561 76
Thu Aug 16 08:28:41 CDT 2018: 288268 330244 - 83388 82561 76
Thu Aug 16 08:28:42 CDT 2018: 288268 330244 - 83472 82561 76
Thu Aug 16 08:28:43 CDT 2018: 288268 330244 - 83560 82561 76
Thu Aug 16 08:28:44 CDT 2018: 288268 330244 - 83644 82561 76
*** 'ls' loop stopped
Thu Aug 16 08:28:45 CDT 2018: 288268 330244 - 83704 82561 76
Thu Aug 16 08:28:46 CDT 2018: 288268 330244 - 83704 82561 76
                                             ^^^^ RSS

Expected results:
memory usage remains constant, or at least does not continue increasing

Additional info:
Memory growth does not occur when rpc.gssd is started directly; it occurs only when rpc.gssd is started through systemd.

rpc.gssd started directly from the command line:
Thu Aug 16 08:30:48 CDT 2018: 408 42384 - 1308 10596 76
Thu Aug 16 08:30:49 CDT 2018: 408 42384 - 1308 10596 76
Thu Aug 16 08:30:50 CDT 2018: 408 42384 - 1308 10596 76
Thu Aug 16 08:30:51 CDT 2018: 408 42384 - 1308 10596 76
Thu Aug 16 08:30:52 CDT 2018: 74140 116116 - 1788 29029 76
Thu Aug 16 08:30:53 CDT 2018: 74140 116116 - 1788 29029 76
Thu Aug 16 08:30:54 CDT 2018: 74140 116116 - 1788 29029 76
Thu Aug 16 08:30:55 CDT 2018: 74140 116116 - 1788 29029 76
...
Thu Aug 16 08:33:05 CDT 2018: 74140 116116 - 1788 29029 76
Thu Aug 16 08:33:06 CDT 2018: 74140 116116 - 1788 29029 76
Thu Aug 16 08:33:07 CDT 2018: 74140 116116 - 1788 29029 76
                                  ^^^^ RSS
Memory grows when the user process begins accessing the mount, but RSS does not continually increase.