Description of problem: Dnscache process gets 100% CPU usage after some minutes running. top - 10:04:50 up 3 days, 19:25, 1 user, load average: 1.01, 1.04, 1.03 Tasks: 74 total, 2 running, 72 sleeping, 0 stopped, 0 zombie Cpu(s): 24.2%us, 43.5%sy, 0.0%ni, 31.9%id, 0.1%wa, 0.0%hi, 0.0%si, 0.2%st Mem: 501240k total, 422808k used, 78432k free, 138028k buffers Swap: 2064376k total, 0k used, 2064376k free, 181440k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1335 daemon 20 0 9908 6308 524 R 99.0 1.3 2466:49 dnscache dns:~# strace -c -p 1335 Process 1335 attached - interrupt to quit ^CProcess 1335 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 63.23 0.092923 0 276703 poll 35.54 0.052223 0 278385 gettimeofday 0.24 0.000360 1 295 write 0.24 0.000354 3 106 sendto 0.21 0.000315 3 106 close 0.19 0.000276 3 106 bind 0.11 0.000167 1 147 recvmsg 0.07 0.000107 1 106 socket 0.04 0.000066 0 146 sendmsg 0.04 0.000063 1 106 connect 0.04 0.000059 0 212 fcntl 0.03 0.000037 0 147 stat 0.00 0.000000 0 106 recvfrom 0.00 0.000000 0 106 setsockopt ------ ----------- ----------- --------- --------- ---------------- 100.00 0.146950 556777 total I'm seeing too many gettimeofday calls Version-Release number of selected component (if applicable): ndjbdns-1.05.9-1.el6.x86_64 CentOS 6.5 How reproducible: Steps to Reproduce: 1. Restart dnscache 2. Wait a few minutes, the process receives 10-25 request/sec 3. dnscache process will set to 100% CPU usage Actual results: DNSCACHE process gets 100% CPU usage Expected results: dnscache process has low CPU usage with that number of requests. Additional info: I have downgraded a server to ndjbdns-1.05.8-1.el6.x86_64, and with that version I'm not seeing 100% CPU usage dns:~# strace -c -p 18682 Process 18682 attached - interrupt to quit ^CProcess 18682 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 86.52 0.033993 28 1216 poll 3.94 0.001549 1 1745 write 3.38 0.001327 0 8532 gettimeofday 0.90 0.000353 0 731 stat 0.86 0.000336 1 454 close 0.76 0.000300 1 454 connect 0.76 0.000299 0 755 sendmsg 0.76 0.000298 1 454 bind 0.54 0.000214 0 731 recvmsg 0.46 0.000179 0 454 socket 0.42 0.000164 0 425 recvfrom 0.28 0.000111 0 454 sendto 0.28 0.000110 0 908 fcntl 0.14 0.000056 0 454 setsockopt 0.00 0.000000 0 1 restart_syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.039289 17768 total
Hello Francisco, Thank you for reporting this issue. Though it is not easily reproducible on my machine. Is there any specific setting/condition that needs to occur for it to hit the 99% cpu usage level? 'gettimeofday' call are made when processing new requests or when request times out. I'll try to reproduce it to debug more. Thank you.
I'm able to reproduce it now. It seems to happen over a slow or saturated(~40% packet loss) connection, with MERGEQUERIES enabled(ie. non-null).
Hi, I don't see any error in the interface, but it's a virtualized server over XenServer, and my clients are over GPRS and 3G networks so they could be slow. UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:16719340 errors:0 dropped:0 overruns:0 frame:0 TX packets:1087759 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:5936484279 (5.5 GiB) TX bytes:142185683 (135.5 MiB) Interrupt:247 This is my dnscache.conf, and yes MERGEQUERIES is enabled: DATALIMIT=8000000 CACHESIZE=5000000 IP=10.2.19.81 IPSEND=0.0.0.0 UID=2 GID=2 ROOT=/etc/ndjbdns HIDETTL= FORWARDONLY= MERGEQUERIES=1 DEBUG_LEVEL=1 I'm going to test disabling MERGEQUERIES... Regards,
Setting MERGEQUERIES=0 doesn't help. After a few minutes dnscache CPU is 100% again. Removing the line MERGEQUERIES helps, I'm not seeing high CPU usage. So, it seems to be a change between 1.05.8 and 1.05.9 related with MERGEQUERIES.
(In reply to Francisco Miguel Biete from comment #4) > Setting MERGEQUERIES=0 doesn't help. After a few minutes dnscache CPU is > 100% again. Setting it to 0 would not disable it, but not setting any value at all would. Ie. it needs to be NULL to disable it. MERGEQUERIES= > Removing the line MERGEQUERIES helps, I'm not seeing high CPU usage. Right. It needs to be NULL to disable it. > So, it seems to be a change between 1.05.8 and 1.05.9 related with > MERGEQUERIES. I think the issue is with the MERGEQUERIES function. I'm debugging it more to see how to fix it best. Thank you so much for the update and for confirming the issue. Thank you.
Seeing this issue here as well with ndjbdns 1.06: after a brief period of time dnscache starts using 100% of a cpu and won't come down. Initially I thought it was from cache eviction, but DEBUG=3 confirmed we aren't using all the cache. Disabling MERGEQUERIES seems to have fixed it.
I have submitted a patch to the ndjbdns project on GitHub that may resolve this issue. It can be found at https://github.com/pjps/ndjbdns/pull/29.
Hello Tim, (In reply to Tim Stewart from comment #7) > I have submitted a patch to the ndjbdns project on GitHub that may resolve > this issue. It can be found at https://github.com/pjps/ndjbdns/pull/29. Great! Thank you so much for the patch. I'll pull and merge it soon. Thank you.
Proposed patch(above) from Tim has been merged upstream -> https://github.com/pjps/ndjbdns/commit/73527fbe28c6e18229b28f9d437be0ab5960c21f
Package has been retired, project is unmaintained since 2017.