Bug 1388759

Summary: Deferred name resolving of NTP servers fails
Product: Red Hat Enterprise Linux 7
Reporter: Frank Büttner <bugzilla>
Component: ntp
Assignee: Miroslav Lichvar <mlichvar>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: qe-baseos-daemons
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.2
CC: avaddara, bugzilla, kvolny, orion, thozza, va
Target Milestone: rc
Flags: mlichvar: needinfo?
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-16 16:43:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1477664
Attachments:
Description          Flags
The stack traced.    none

Description Frank Büttner 2016-10-26 06:35:58 UTC
Description of problem:
ntp starts before the network is available.

Version-Release number of selected component (if applicable):
ntp-4.2.6p5-22

How reproducible:
Every time


Steps to Reproduce:
1. boot the server


Actual results:
The network comes up after ntp starts.

Expected results:
ntp starts after the network is up.

Additional info:
So the journal is full of:
ntpd_intres[3379]: recv() fails: No route to host
Only a systemctl restart ntpd stops the flood and gets ntpd working again.

Temporary fix:
create /etc/systemd/system/ntpd.service.d/network.conf
with the content:
[Unit]
After=network.target syslog.target ntpdate.service sntp.service
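
The temporary fix above can be applied with a short script. This is only a minimal sketch: the function name write_ntpd_dropin and its optional directory argument are hypothetical and exist so the script can be tried against a scratch directory first. The file path and the [Unit] content are the ones given in this report.

```shell
#!/bin/sh
# Hypothetical helper: writes the ordering drop-in from this report.
# $1 is the systemd configuration root (assumed default: /etc/systemd/system).
write_ntpd_dropin() {
    root=${1:-/etc/systemd/system}
    mkdir -p "$root/ntpd.service.d" || return 1
    cat > "$root/ntpd.service.d/network.conf" <<'EOF'
[Unit]
After=network.target syslog.target ntpdate.service sntp.service
EOF
}

# On a real host (as root), systemd then has to re-read its configuration:
#   write_ntpd_dropin && systemctl daemon-reload && systemctl restart ntpd
```

The drop-in only adds ordering; it does not pull any of the listed units into the boot transaction.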

Comment 2 Vipul Agarwal 2016-10-26 19:16:12 UTC
I ran into this problem. The ntpd service has no dependency on the network or NetworkManager service. However, it does depend on ntpdate, which in turn depends on the network service. ntpdate is not enabled by default.

On boot (with only ntpd enabled), the boot order is wrong, with ntpd loading before the network service: http://vagarwal.net/f/systemd-analyze_plot-problem.svg

If one enables the ntpdate service, the boot order is fixed and the problem is resolved:
http://vagarwal.net/f/systemd-analyze_plot-fixed.svg
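
The ordering can also be checked without reading the plots, by looking at the After= property of the unit. A minimal sketch, assuming the usual output of "systemctl show -p After", which is a single After= line with space-separated unit names; the helper name has_after is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if unit $1 is listed in an "After=..." line
# read from stdin (the format printed by "systemctl show -p After UNIT").
has_after() {
    wanted=$1
    while IFS= read -r line; do
        case $line in
            After=*)
                # Unquoted on purpose: split the value on whitespace.
                for unit in ${line#After=}; do
                    if [ "$unit" = "$wanted" ]; then
                        return 0
                    fi
                done
                ;;
        esac
    done
    return 1
}

# On a systemd host:
#   systemctl show -p After ntpd.service | has_after ntpdate.service && echo ordered
```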

Comment 3 Miroslav Lichvar 2016-10-27 11:50:38 UTC
The ntpd service doesn't wait for the network, because ntpd can be useful without a network (e.g. with reference clocks) and also so that restoring the frequency of the system clock from the driftfile is not delayed.

I don't see any recv() errors reported by ntpd in my log, just deferred resolving of hostnames. Can you please post your ntp.conf?

Comment 4 Frank Büttner 2016-10-28 04:42:09 UTC
It contains only these lines:

server foo.server
server bar.server
server xxx.server

Comment 8 Miroslav Lichvar 2017-04-13 14:45:55 UTC
(In reply to Frank Büttner from comment #0)
> So the journal is full of:
> ntpd_intres[3379]: recv() fails: No route to host

This error indicates the resolving process of ntpd is not able to connect to 127.0.0.1. Does the loopback interface have an unusual configuration, e.g. no IPv4 address?

Comment 9 Frank Büttner 2017-04-18 05:29:38 UTC
No, it looks normal:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever

Comment 10 Miroslav Lichvar 2017-04-18 11:18:17 UTC
Interesting. I'm assuming ntpd is configured to listen on all interfaces (which is the default). Does the following command work for you?

# ntpdc -c authinfo 127.0.0.1

Comment 11 Frank Büttner 2017-04-19 07:07:13 UTC
It results in:
ntpdc -c authinfo 127.0.0.1
time since reset:     150
stored keys:          1
free keys:            11
key lookups:          9
keys not found:       1
uncached keys:        2
encryptions:          0
decryptions:          4
expired keys:         0

Comment 12 Miroslav Lichvar 2017-04-19 15:44:21 UTC
Hm, I'm running out of ideas. There is only one report for this error, so it's most likely something specific to the configuration of the machine.

I'm not sure how useful this will really be, but can you please consider running ntpd in strace and posting the logs? If you change the ExecStart line in the ntpd unit file to

/usr/bin/strace -ff -ttt -o /tmp/ntpd.strace /usr/sbin/ntpd -u ntp:ntp

the logs should be saved in /tmp/systemd-private-*-ntpd.service-*/tmp.
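
Once the logs exist, the per-process files can be searched for the failing call. A minimal sketch, assuming that strace -ff writes one file per process named ntpd.strace.PID and that a call failing with "No route to host" is logged with the symbolic error EHOSTUNREACH; the helper name find_unreachable is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the strace output files in directory $1 that
# contain at least one call failing with EHOSTUNREACH (No route to host).
find_unreachable() {
    dir=$1
    for f in "$dir"/ntpd.strace.*; do
        [ -f "$f" ] || continue
        # grep -l prints only the file name when the pattern matches.
        grep -l 'EHOSTUNREACH' "$f" || true
    done
    return 0
}

# With the log location mentioned above:
#   for d in /tmp/systemd-private-*-ntpd.service-*/tmp; do find_unreachable "$d"; done
```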

Comment 13 Frank Büttner 2017-04-20 05:27:19 UTC
Created attachment 1272853 [details]
The stack traced.

I have created the strace logs.
I hope they help.

Comment 14 Miroslav Lichvar 2017-04-20 13:36:37 UTC
Thanks. Except for the recv() error, I don't see anything wrong in the logs. The main ntpd process is bound to 127.0.0.1:123 as expected. The name resolving process is trying to send messages to that port, but for some reason it fails and the main process doesn't get any messages.

Maybe it is an issue with the firewall configuration? If there were a rule using the owner match, it might explain why ntpq/ntpdc can connect to ntpd but ntpd itself cannot. Does it work when the firewall is disabled?
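
A quick way to check for such rules is to filter the saved ruleset for the owner match. A minimal sketch, assuming rules in iptables-save format, where the owner match appears as "-m owner"; the helper name owner_rules is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read rules in iptables-save format on stdin and print
# only those that use the owner match module. Prints nothing if there are none.
owner_rules() {
    grep -e '-m owner' || true
}

# On the affected host (as root):
#   iptables-save | owner_rules
```

An empty result would rule out the owner match as the cause, although other rules affecting the loopback traffic would still need a manual look.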

Comment 15 Frank Büttner 2017-04-21 09:02:03 UTC
firewalld is not used here.
Only iptables, and for the lo interface all traffic is allowed.

Comment 17 Orion Poplawski 2017-06-30 19:47:32 UTC
Even ntpdate can fail because it can start before the network is up:

Jun 26 15:36:12 aspen ntpdate[641]: Can't find host ntp.cora.nwra.com: Name or service not known (-2)
Jun 26 15:36:23 aspen systemd: Reached target Network is Online.

It should have After=network-online.target instead of network.target, see https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
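
Such an ordering could be tried locally with a drop-in. A minimal sketch, not the packaged configuration: the helper name write_ntpdate_dropin, its directory argument and the file name network-online.conf are hypothetical. The Wants= line is included on the assumption, taken from the systemd documentation linked above, that network-online.target is only reached at boot if some unit pulls it in.

```shell
#!/bin/sh
# Hypothetical helper: writes a drop-in that orders ntpdate.service after
# network-online.target and pulls that target into the boot transaction.
# $1 is the systemd configuration root (assumed default: /etc/systemd/system).
write_ntpdate_dropin() {
    root=${1:-/etc/systemd/system}
    mkdir -p "$root/ntpdate.service.d" || return 1
    cat > "$root/ntpdate.service.d/network-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}

# On a real host (as root):
#   write_ntpdate_dropin && systemctl daemon-reload
```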

Comment 18 Orion Poplawski 2017-06-30 20:19:15 UTC
Ah, I see. /usr/libexec/ntpdate-wrapper keeps trying until the network is up. That seems like just a workaround, though. Sorry to hijack this report; I thought it was more pertinent initially.

Comment 19 Tomáš Hozza 2018-11-16 16:43:52 UTC
Feel free to reopen with a reproducer.