Bug 575874
Summary: ntpd can die due to ulimit on 64bit arch
Product: Red Hat Enterprise Linux 5
Reporter: Martin Poole <mpoole>
Component: ntp
Assignee: Miroslav Lichvar <mlichvar>
Status: CLOSED ERRATA
QA Contact: qe-baseos-daemons
Severity: medium
Priority: medium
Version: 5.4
CC: aaron.grewell, azelinka, dswegen, dushy2010, edoutreleau, jbastian, jwest, mcermak, ovasik, pveiga, rdassen, tao
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Doc Text: The ntpd daemon could terminate unexpectedly due to a low memory lock limit. With this update, the memory lock limit has been doubled.
Last Closed: 2011-07-21 06:42:46 UTC
Description
Martin Poole
2010-03-22 15:46:53 UTC
The memory locking is enabled only when the -m option is used. This was fixed in Fedora by doubling the limit.

Actually, in the 4.2.2p1 version memory is locked unconditionally. It might be useful to backport the option, as running ntpd in locked memory is rarely needed.

This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

Any chance we can get an update released for this? Given the proliferation of x86_64 hosts, it's only becoming a bigger problem as time goes on.

This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.

Hi folks,

Could you let us know the version in which the fix will be provided? Or, if a fix for this bugzilla is already available in some version, please provide the version.

Thanks,
Dushyant

Created attachment 474441: increase memlock rlimit
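Since this ntpd version locks all of its memory, the size that matters is how close the process is to its memlock limit. The following is only a rough sketch of such a check, not part of the attached patch: the helper names `status_kb` and `memlock_headroom_kb` are hypothetical, and the sample numbers are made up for illustration.

```shell
# Print the value (in kB) of one field, e.g. "VmSize", from
# /proc/<pid>/status text read on stdin. (Hypothetical helper.)
status_kb() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Headroom in kB between a memlock limit given in bytes and a
# process size given in kB. (Hypothetical helper.)
memlock_headroom_kb() {
    limit_bytes=$1
    vmsize_kb=$2
    echo $(( $limit_bytes / 1024 - $vmsize_kb ))
}

# Illustrative input with made-up numbers, not taken from a real host.
sample_status='Name: ntpd
VmSize: 40000 kB
VmLck: 40000 kB'

vmsize=$(printf '%s\n' "$sample_status" | status_kb VmSize)
memlock_headroom_kb 67108864 "$vmsize"   # 67108864 bytes = 64 MB limit
```

On a live system the first helper could be fed from `/proc/<pid>/status` of the running ntpd instead of the sample text.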
*** Bug 672571 has been marked as a duplicate of this bug. ***

Hi Miroslav,

Adding the patch hasn't solved the issue, ntpd still dies with the following errors:

    15 Feb 16:00:44 ntpd[6400]: synchronized to 172.22.32.6, stratum 3
    15 Feb 16:40:44 ntpd[6400]: Exiting: No more memory!
    15 Feb 17:23:45 ntpd[4382]: Attemping to register mDNS
    15 Feb 17:23:45 ntpd[4382]: Unable to register mDNS
    15 Feb 17:23:45 ntpd[4384]: logging to file /localdisk1/var/log/ntpd.log

Is there anything else that could be done to solve the issue? If it wasn't already clear, we're using an x86_64 cluster.

Thanks,
Dushyant

What values do you see in /proc/`pidof ntpd`/status in the VmSize field? Is it increasing over time?

Created attachment 483227: ntpd.log during the stage when it dies
I've attached a few lines of ntpd.log from the stage when it dies.
I notice that in each case in ntpd.log where ntpd dies with "Exiting: No more memory", the preceding lines contain the following message:
"ntp_io: estimated max descriptors: 1024, initial socket boundary: 16".
In the cases where ntpd is restarted and does not fail, the same "ntp_io" line is different. The new, healthy daemon logs this instead:
"ntp_io: estimated max descriptors: 65536, initial socket boundary: 16"
Is there something fishy about why the estimated max descriptors is 1024 before it dies and 65536 after it is restarted?
The "estimated max descriptors" number is the result of the getdtablesize() glibc call, and ntpd does not use it for anything other than printing it in the log message. What VmSize values do you see? Are they close to 64MB?

The VmSize values are not close to 64MB yet, about 39-40MB right now. The ulimit is not staying persistent at 65536 across reboots. I believe the patch that you attached doesn't work if the system is rebooted, because after a reboot the max descriptors/ulimit goes back to 1024, which leads to "Exiting: No more memory!" and the ntpd daemon dying after a while. I tried changing a parameter in /etc/security/limits.conf as follows:

    soft nofile 65536

But after a reboot, the value changes to 1024 again in ntp, although ulimit shows the correct value of 65536. Could you let me know from which configuration file ntp (the getdtablesize() function) reads the value 1024, and where we could modify it to keep the number of open file descriptors persistent across reboots, so that ntp shows the correct/modified value?

The maximum number of descriptors shouldn't be related to the memory lock limit. ntpd needs only a few descriptors, one for each local address and maybe a few for logs, etc. How exactly is the ntpd service restarted? Is it possible that a different ntpd binary is running after reboot?

The ntpd daemon is started automatically once the system is rebooted. We observed that the max descriptors that ntp takes after reboot is 1024. After some time we get "Exiting: No more memory" and ntpd dies. We restarted the ntp daemon using 'service ntpd restart'. After the restart the max descriptors changed to 65536.

Can you please attach your ntp.conf?

Created attachment 486964: ntp.conf

ntp.conf file attached
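As far as I know, glibc's getdtablesize() simply reports the soft open-files limit of the calling process, and a process inherits that limit from whatever started it rather than reading it from a configuration file of its own. The sketch below illustrates the inheritance with a child shell; the function name `nofile_seen_by_child` is hypothetical, and 256 is an arbitrary example value.

```shell
# Start a child process under a given soft open-files limit and print
# the limit the child sees. getdtablesize() called in that child would
# normally report the same number. (Hypothetical helper.)
nofile_seen_by_child() {
    ( ulimit -S -n "$1" && sh -c 'ulimit -S -n' )
}

ulimit -S -n               # soft limit of the current shell
nofile_seen_by_child 256   # a child started under a lower soft limit
```

This would be consistent with a daemon started at boot logging 1024 while one restarted from a root shell with a raised ulimit logs 65536, though that is an inference from the thread and not something confirmed here.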
dushy2010, it is probably a different limit than memlock which is causing the problem. It seems the limits are different when the service is started on boot and when it is started from a root shell. Are there any modifications in /etc/security/limits.conf or any ulimit calls in shell configuration files such as /etc/profile? Can you please post the content of /proc/`pidof ntpd`/limits when ntpd is started on boot and when it's restarted?

    [root@cu1admin1 ~]# ps aux|grep ntp
    ntp 4384 0.0 0.0 32112 7580 ? SLs Feb15 4:14 ntpd -u ntp:ntp -p /var/run/ntpd.pid -x -l /localdisk1/var/log/ntpd.log
    root 5711 0.0 0.0 61148 788 pts/190 S+ 07:12 0:00 grep ntp
    [root@cu1admin1 ~]# cat /proc/4384/status
    Name: ntpd
    State: S (sleeping)
    SleepAVG: 98%
    Tgid: 4384
    Pid: 4384
    PPid: 1
    TracerPid: 0
    Uid: 38 38 38 38
    Gid: 38 38 38 38
    FDSize: 64
    Groups: 38
    VmPeak: 32132 kB
    VmSize: 32112 kB
    VmLck: 32112 kB
    VmHWM: 7600 kB
    VmRSS: 7580 kB
    VmData: 2996 kB
    VmStk: 84 kB
    VmExe: 460 kB
    VmLib: 3800 kB
    VmPTE: 84 kB
    StaBrk: 2b2d56d3d000 kB
    Brk: 2b2d56d80000 kB
    StaStk: 7fffbfe6ee30 kB
    Threads: 1
    SigQ: 0/4096
    SigPnd: 0000000000000000
    ShdPnd: 0000000000000000
    SigBlk: 0000000000000000
    SigIgn: 0000000000001000
    SigCgt: 0000000180006a47
    CapInh: 0000000002000000
    CapPrm: 0000000002000000
    CapEff: 0000000002000000
    Cpus_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000fff
    Mems_allowed: 00000000,00000003
    [root@cu1admin1 ~]#

This output is taken from a stable "restarted" ntp daemon. I looked at the collectl logs for shortly after this daemon was started manually: they showed 31MB VSZ and 7MB RSZ on 15th Feb, and collectl today shows the same quantities. I don't believe there is an increase over time. But remember that I mentioned that this "restarted" daemon always seems stable, whereas after a restart of the server the daemon eventually dies.
So, looking at the collectl data to understand the memory sizing of the original init.d daemon started automatically during boot of cu0admin1, I see that it began life after system boot with 39MB VSZ and 15MB RSZ. It continued to have that same sizing until it eventually died. Looking elsewhere, 39MB/15MB seems to be the standard sizing for an ntp daemon started by init.d, whereas 31MB/7MB seems to be the standard sizing for an ntp daemon started manually by root.

And the difference in /proc/`pidof ntpd`/limits? According to the ntpd log, at least the maximum number of descriptors should be different, 1024 and 65536.

Before even applying your patch on a fresh server, please have a look at the limits:

    # cat /proc/`pidof ntpd`/limits
    Limit                     Soft Limit           Hard Limit           Units
    Max cpu time              unlimited            unlimited            seconds
    Max file size             unlimited            unlimited            bytes
    Max data size             unlimited            unlimited            bytes
    Max stack size            204800               unlimited            bytes
    Max core file size        0                    unlimited            bytes
    Max resident set          unlimited            unlimited            bytes
    Max processes             36351                36351                processes
    Max open files            1024                 1024                 files
    Max locked memory         33554432             33554432             bytes
    Max address space         unlimited            unlimited            bytes
    Max file locks            unlimited            unlimited            locks
    Max pending signals       36351                36351                signals
    Max msgqueue size         819200               819200               bytes
    Max nice priority         0                    0
    Max realtime priority     0                    0

After applying your patch, have a look at the limits:

    # cat /proc/`pidof ntpd`/limits
    Limit                     Soft Limit           Hard Limit           Units
    Max cpu time              unlimited            unlimited            seconds
    Max file size             unlimited            unlimited            bytes
    Max data size             unlimited            unlimited            bytes
    Max stack size            204800               unlimited            bytes
    Max core file size        0                    unlimited            bytes
    Max resident set          unlimited            unlimited            bytes
    Max processes             36351                36351                processes
    Max open files            1024                 8192                 files
    Max locked memory         33554432             33554432             bytes
    Max address space         unlimited            unlimited            bytes
    Max file locks            unlimited            unlimited            locks
    Max pending signals       36351                36351                signals
    Max msgqueue size         819200               819200               bytes
    Max nice priority         0                    0
    Max realtime priority     0                    0

As you can see, the open files limit changes to 8192. And after a restart of the server, please have a look at the limits:

    # cat /proc/`pidof ntpd`/limits
    Limit                     Soft Limit           Hard Limit           Units
    Max cpu time              unlimited            unlimited            seconds
    Max file size             unlimited            unlimited            bytes
    Max data size             unlimited            unlimited            bytes
    Max stack size            204800               unlimited            bytes
    Max core file size        0                    unlimited            bytes
    Max resident set          unlimited            unlimited            bytes
    Max processes             36351                36351                processes
    Max open files            1024                 1024                 files
    Max locked memory         33554432             33554432             bytes
    Max address space         unlimited            unlimited            bytes
    Max file locks            unlimited            unlimited            locks
    Max pending signals       36351                36351                signals
    Max msgqueue size         819200               819200               bytes
    Max nice priority         0                    0
    Max realtime priority     0                    0

So something goes wrong after the machine is restarted, and the value that the patch changes is not persistent. Maybe changing the ulimit value inside the init.d/ntpd script would permanently keep the ulimit at 8192/65536 or some other value higher than 1024.

That's odd. According to the ntpd log, max open files should be 65536, but your output shows only 1024 (the soft limit is the one enforced by the kernel; the hard limit is just the allowed maximum for the soft limit). Also, with the patch applied, max locked memory should be 64MB, not 32MB. This is the output I get here, and it doesn't change after manually restarting the service.
    Limit                     Soft Limit           Hard Limit           Units
    Max cpu time              unlimited            unlimited            seconds
    Max file size             unlimited            unlimited            bytes
    Max data size             unlimited            unlimited            bytes
    Max stack size            204800               unlimited            bytes
    Max core file size        0                    unlimited            bytes
    Max resident set          unlimited            unlimited            bytes
    Max processes             4095                 4095                 processes
    Max open files            1024                 1024                 files
    Max locked memory         67108864             67108864             bytes
    Max address space         unlimited            unlimited            bytes
    Max file locks            unlimited            unlimited            locks
    Max pending signals       4095                 4095                 signals
    Max msgqueue size         819200               819200               bytes
    Max nice priority         0                    0
    Max realtime priority     0                    0

For the moment this is what I've done:

    # diff -ruN /etc/init.d/ntpd.orig /etc/init.d/ntpd
    --- /etc/init.d/ntpd.orig 2011-03-10 14:52:44.000000000 +0530
    +++ /etc/init.d/ntpd 2011-03-10 14:54:12.000000000 +0530
    @@ -91,7 +91,8 @@
     [ "$NETWORKING" = "no" ] && exit 1
     
     readconf;
    -
    + # Modifying ulimit value so ntp takes 65536 as max descriptors
    + ulimit -n 65536
     if [ -n "$dostep" ]; then
     echo -n $"$prog: Synchronizing with time server: "
     /usr/sbin/ntpdate $dropstr -s -b $NTPDATE_OPTIONS $tickers &>/dev/null

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
The ntpd daemon could terminate unexpectedly due to a low memory lock limit. With this update, the memory lock limit has been doubled.

Just a brief note: in case anyone wonders, this bug seems to have already been addressed for RHEL6 in ntp-4.2.4p8-2.el6.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0980.html
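For anyone comparing limits between a daemon started at boot and one restarted by hand, as was done in this thread, a small helper can pull single rows out of /proc/<pid>/limits. This is only a sketch: the function name `limit_row` is hypothetical, and the sample rows copy the layout and values quoted above.

```shell
# Print "soft hard" for one named row of /proc/<pid>/limits text read
# from stdin. (Hypothetical helper; matches the row by its leading name.)
limit_row() {
    awk -v name="$1" 'index($0, name) == 1 {
        rest = substr($0, length(name) + 1)
        split(rest, f, " ")
        print f[1], f[2]
    }'
}

# Two rows in the layout quoted in this report.
sample='Max open files            1024                 1024                 files
Max locked memory         67108864             67108864             bytes'

printf '%s\n' "$sample" | limit_row "Max open files"
printf '%s\n' "$sample" | limit_row "Max locked memory"
```

Against a running daemon the same function could be used as `limit_row "Max locked memory" < /proc/$(pidof ntpd)/limits`, once after boot and once after a manual restart, to see which rows differ.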