Description of problem:
The default values of /proc/sys/net/ipv4/udp_mem are calculated from available physical memory without taking into consideration the hugepages= value on the boot line. This can lead to system lockups when large numbers of UDP sockets with large maximum socket buffers are active, as network buffers can consume all available physical memory. This happened to one of our systems, and we are now configuring this setting via /etc/sysctl.conf.

Version-Release number of selected component (if applicable):
2.6.18-194.8.1.el5
2.6.18-238.5.1.el5

How reproducible:
Configure a system to boot with hugepages= set to half of physical memory. Observe the values in /proc/sys/net/ipv4/udp_mem and notice that they do not take account of the fact that half of physical memory is unavailable.

Additional info:
The fact that hugepages can be increased and decreased after boot can cause further trouble, but in practice it is usually best to set hugepages= in the boot parameters, and most administrators probably do it this way. The tunable should take the boot value into consideration, but probably should not track changes made to the huge page allocation after boot. The /proc/sys/net/ipv4/tcp_mem value does not appear to have as aggressive a default as the UDP value and does not appear to be affected by this issue.
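A concrete version of the reproduction above, as a sketch (assuming a 16 GB machine with 2 MB hugepages; numbers will differ per system):

  # Kernel command line reserves half of a 16 GB machine as hugepages:
  #   ... hugepages=4096
  # After boot, the udp_mem defaults are still scaled to the full 16 GB:
  cat /proc/sys/net/ipv4/udp_mem
  grep -E 'MemTotal|HugePages_Total' /proc/meminfo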
The problem is worse than it looks. It turns out that the differential 'slab' memory consumption per /proc/meminfo is double the apparent socket memory consumption as reported by netstat -nau | awk '/^udp/ {t+= $2} END {print t}', so even without hugepages= in the mix, the default 'udp_mem' limit can easily lead to a frozen system. The commit limit reported by /proc/meminfo appears to be a good starting point for setting 'udp_mem'. We are now taking this value, subtracting 1 GB from it, dividing the result by two, and using that for the 'udp_mem' max; since the tunable is expressed in pages, one also has to divide by the 4096-byte page size for the final value. The "min" and "pressure" settings appear to have no obvious effect, so we are setting them both to 7/8 of the max just so something rational appears in those fields.
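As a sketch, the calculation we are using looks like this (the 1 GB headroom and the 7/8 ratio are our own choices, and 4096 bytes is the page size on this hardware):

  # Derive udp_mem from CommitLimit in /proc/meminfo (reported in kB).
  commit_kb=$(awk '/^CommitLimit:/ {print $2}' /proc/meminfo)
  max_pages=$(( (commit_kb - 1048576) / 2 / 4 ))   # subtract 1 GB, halve, kB -> 4 kB pages
  min_pages=$(( max_pages * 7 / 8 ))               # "min" and "pressure" at 7/8 of max
  sysctl -w net.ipv4.udp_mem="$min_pages $min_pages $max_pages"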
Do you have an example of this in action? Just because those values don't scale back when you allocate lots of hugepage memory doesn't imply an OOM condition. UDP streams will simply drop frames if additional RAM cannot be allocated to hold incoming data buffers. Can you please provide sysrq-t and sysrq-m output (or, better still, a vmcore) from a system that was hung under the described circumstances?
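In case it helps, a minimal sketch of capturing that output on a live system (assuming the magic sysrq interface is available; on a hard-hung box you would need the keyboard Alt-SysRq sequence or a serial console instead):

  # Dump task and memory state into the kernel log.
  echo 1 > /proc/sys/kernel/sysrq
  echo t > /proc/sysrq-trigger    # sysrq-t: task list with kernel stack traces
  echo m > /proc/sysrq-trigger    # sysrq-m: memory usage summary
  dmesg > sysrq-output.txt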
Locking up a system was how I found this--it was not an academic exercise. The bug requires that all customers running our application tune this parameter. I forgot to reference the upstream report I submitted. The kernel developers created a fix for this some time ago; I am putting the link in the referenced-bug section.
I understand that locking up the system was how you found this; I'm asking for the notes from that lockup so we can see if there's a better fix for the problem, other than just running away from it, which is all that the patch you reference does. It's still entirely possible to exhaust system memory with that change: you just have to allocate hugepages on an idle system after tcp/udp/etc. initialize during boot (or do any of the other things that Eric suggested in the thread accompanying the change: http://marc.info/?l=linux-mm&m=131001118631770&w=2). It's really too late in the RHEL5 life cycle to tweak system-wide defaults without a very clear problem and a correspondingly clear fix, and this doesn't really fit the bill. If you can demonstrate what the lockup was, however, we might be able to manage fixing that more directly.
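To illustrate that scenario, a sketch (assuming 2 MB hugepages; exact values are machine-dependent):

  # udp_mem is computed once at boot from the memory available at that moment;
  # growing the hugepage pool afterwards does not shrink it.
  cat /proc/sys/net/ipv4/udp_mem          # note the boot-time limits
  echo 4096 > /proc/sys/vm/nr_hugepages   # remove ~8 GB from general use
  cat /proc/sys/net/ipv4/udp_mem          # limits are unchanged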
The problem is fixed easily enough by tuning 'udp_mem'. There is no other way to solve the issue, as we have about a thousand UDP sockets that all must have large socket buffer allocations, so memory is potentially oversubscribed. In extreme high-load scenarios where the hardware is under-provisioned (an unfortunate reality), lockups have occurred in production. The point of the bug report is that the default value of 'udp_mem' is way too high (i.e. useless) and leaves systems vulnerable to lockups when certain types of loads are configured. Perhaps put the change into RHEL 6 if it's not there already. We don't need an official fix; I submitted this as a courtesy.
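For anyone hitting the same thing, the workaround is just a persistent setting in /etc/sysctl.conf (the numbers below are placeholders; derive them per machine as described above, then apply with 'sysctl -p'):

  # Cap UDP buffer memory well below physical RAM.
  # Values are in 4 kB pages: "min pressure max".
  net.ipv4.udp_mem = 200704 200704 229376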
OK, well, thank you for the courtesy, but what is too high for you is potentially not high enough for other existing users, and as such adjusting defaults for your environment may lead to regressions for other users. Since the fix in either case is to appropriately tune system resources, let's just leave it all alone.