Red Hat Bugzilla – Bug 855357
cat /proc/pid/numa_maps seems to lock processes while generating its data, which is disruptive
Last modified: 2014-06-02 09:22:31 EDT
Description of problem:
We run database (mysqld) servers with quite a large amount of memory (192GB) and have been having problems with accessing /proc/pid/numa_maps interfering with the mysqld process that was being "monitored".
Version-Release number of selected component (if applicable):
[ Running CentOS but reporting upstream. ]
# cat /etc/redhat-release
CentOS release 5.6 (Final)
Linux my-hostname 2.6.18-238.el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
Completely. Several servers were affected by this.
Sorry I don't have a better test case but this is what we see:
Steps to Reproduce:
1. Start mysqld with innodb_buffer_pool_size = 160G (on a 192GB box)
2. Have a process doing continual connects (with a 1 second timeout) to the mysqld server
3. Run numa-maps-summary.pl < /proc/<mysqld_pid>/numa_maps
Observe how this generates connect timeouts while the numa-maps script is running, and not while it isn't.
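The "continual connects" process in step 2 could be sketched roughly as below. This is a minimal illustration, not the tool actually used: the function names `probe_once` and `probe_forever`, the 0.1 second pause between attempts, and port 3306 are all assumptions. It relies on the MySQL server sending its handshake packet first, so the probe waits for one byte rather than only completing the TCP connect (which the kernel can accept even while mysqld is stalled).

```python
import socket
import time


def probe_once(host, port, timeout=1.0):
    """Connect and wait for the first byte from the server.

    Returns (succeeded, elapsed_seconds). The timeout applies separately
    to the connect and to the read, so a failure can take up to twice
    the timeout value.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # A MySQL server speaks first, so one received byte means it responded.
            ok = len(sock.recv(1)) == 1
    except OSError:
        # Timeouts and refused connections are both OSError subclasses.
        ok = False
    return ok, time.monotonic() - start


def probe_forever(host, port, timeout=1.0, interval=0.1):
    """Print one line per attempt so stalls can be lined up with numa_maps reads."""
    while True:
        ok, elapsed = probe_once(host, port, timeout)
        stamp = time.strftime("%H:%M:%S")
        status = "ok" if ok else "FAILED"
        print(f"{stamp} {status} {elapsed * 1000:.1f} ms", flush=True)
        time.sleep(interval)
```

Running `probe_forever("127.0.0.1", 3306)` in one terminal and reading /proc/<mysqld_pid>/numa_maps in another should show whether failed attempts cluster around the reads.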
Ideally, cat'ing the /proc/pid/numa_maps file should not block the process that the file refers to. While this may be expected kernel behaviour, reading the numa_maps information can be important for debugging memory usage, and if reading it is disruptive it cannot be used for that purpose.
This is not a mysqld bug, as the issue only occurs when the proc file is being accessed. My guess is that cat'ing the numa_maps file generates the required information dynamically and locks the relevant process while doing so. This might not be noticed normally, but with the short connect timeout it is quite disruptive.
I don't think cat'ing /proc/pid/numa_maps locks the process being inspected for long periods of time. It does, and must, however take the mm->page_table_lock while it's looking at the pages mapped into the vma for each region, so they can't change underneath it while it's walking. I don't have much of a reproducer to see this problem happening; can you come up with something that is stand-alone? Also, RHEL6 looks pretty similar to the upstream kernel in this area. Are you seeing this problem with the upstream kernel as well?
My guess is the problem is simply caused by the page walk of the ~170GB process taking longer than the 2-second connect timeout configured on the mysql client.
That said, considering that the mysql connects are normally accepted in milliseconds, the change is intrusive. I'd like to be able to read the numa_maps memory layout in a way which does not have this side effect.
Let me see if I can reproduce this on a similarly configured CentOS 6.2 server.
And at the same time I'll get a large (~256GB) system, write a program that maps most of the memory, and time a cat /proc/<pid>/numa_maps of that process. If the time is excessive, I'll evaluate where it's spending that time. I just don't know what can be done about it, since there are locking requirements involved.
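A stand-alone test along these lines might look like the sketch below. It is only an illustration of the idea, not the program that was actually written: the helper names `map_and_touch` and `time_read` are hypothetical, and a Python loop touching one byte per page will be slow for hundreds of gigabytes, so a C equivalent may be more practical at that scale.

```python
import mmap
import time


def map_and_touch(size_bytes):
    """Create an anonymous mapping and write one byte into every page."""
    region = mmap.mmap(-1, size_bytes)  # fileno -1 requests anonymous memory
    for offset in range(0, size_bytes, mmap.PAGESIZE):
        region[offset] = 1  # touching the page makes the kernel back it
    return region


def time_read(path):
    """Read a file to the end; return (elapsed_seconds, bytes_read)."""
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(1 << 16)
            if not chunk:
                break
            total += len(chunk)
    return time.monotonic() - start, total
```

One way to use it: call `map_and_touch` in one process, keep that process alive and note its pid, then call `time_read("/proc/<pid>/numa_maps")` from a second process (run as the same user or as root) to see how long the walk takes as the mapping grows.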
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).