Bug 1028176 - ntpd does not update dstadr when the routing table changes
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: ntp
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Miroslav Lichvar
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks: 1048132
Reported: 2013-11-07 15:32 EST by Andrew J. Schorr
Modified: 2014-01-03 18:23 EST

See Also:
Fixed In Version: ntp-4.2.6p5-18.fc20
Doc Type: Bug Fix
Clones: 1048132
Last Closed: 2014-01-03 18:23:23 EST
Type: Bug


External Trackers
Tracker                 ID    Priority  Status  Summary  Last Updated
Network Time Protocol   2506  None      None    None     Never

Description Andrew J. Schorr 2013-11-07 15:32:59 EST
Description of problem: We are running keepalived to implement VRRP on some servers.  When a secondary VRRP address is removed, ntpd gets confused and loses contact with its peers because it sets the wrong interface address for communicating with the peers.


Version-Release number of selected component (if applicable):
ntp-4.2.6p5-11.fc19.x86_64


How reproducible: Start ntpd, then remove a secondary IP address and watch it set the "dstadr" to the wrong value.  This can be seen using the ntpq "pstatus" command.  Here is an excerpt from the journalctl log:

Nov 06 17:06:17 ti101 ntpd[1301]: Deleting interface #6 lan0.1009, 192.168.99.1#123, interface stats: received=0, sent=0, dropped=0, active_time=3 secs
Nov 06 17:06:17 ti101 ntpd[1301]: Deleting interface #4 lan0.1004, 50.74.236.198#123, interface stats: received=0, sent=0, dropped=0, active_time=3 secs
Nov 06 17:06:17 ti101 ntpd[1301]: 192.168.79.25 interface 192.168.99.101 -> (none)
Nov 06 17:06:17 ti101 ntpd[1301]: 192.168.79.27 interface 192.168.99.101 -> (none)
Nov 06 17:06:17 ti101 ntpd[1301]: 192.168.59.29 interface 192.168.99.101 -> (none)
Nov 06 17:06:17 ti101 ntpd[1301]: 192.168.59.30 interface 192.168.99.101 -> (none)
Nov 06 17:06:17 ti101 ntpd[1301]: peers refreshed
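
To spot affected peers quickly in a longer journal, the transition lines in the excerpt above can be filtered mechanically. The following is only a rough sketch: the regular expression is modeled on the log lines shown in this report, and the function name peers_without_interface is made up for illustration. It is not part of ntpd or ntpq.

```python
import re

# Modeled on the peer/interface transition lines in the excerpt above, e.g.
# "... ntpd[1301]: 192.168.79.25 interface 192.168.99.101 -> (none)"
TRANSITION = re.compile(
    r"ntpd\[\d+\]: (?P<peer>\S+) interface (?P<old>\S+) -> (?P<new>\S+)\s*$"
)


def peers_without_interface(log_lines):
    """Return the peers whose local interface address was reset to (none)."""
    lost = []
    for line in log_lines:
        match = TRANSITION.search(line)
        if match and match.group("new") == "(none)":
            lost.append(match.group("peer"))
    return lost


sample = [
    "Nov 06 17:06:17 ti101 ntpd[1301]: 192.168.79.25 interface 192.168.99.101 -> (none)",
    "Nov 06 17:06:17 ti101 ntpd[1301]: peers refreshed",
]
print(peers_without_interface(sample))
```

Feeding it the output of journalctl for the ntpd unit would list every peer that was left without a local address after an interface was deleted.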

In ntpd/ntp_io.c, the update_interfaces() function calls remove_interface and then set_peerdstadr() for all affected peers.  The dstadr address is then sometimes set to (none) and sometimes to the address of another interface that is incorrect for communicating with the peer.  The association then goes bad, and we see this output from ntpq:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 ti25            .XFAC.          16 u    - 1024    0    0.000    0.000   0.000
 ti27            .XFAC.          16 u    - 1024    0    0.000    0.000   0.000
 ti29            .XFAC.          16 u    - 1024    0    0.000    0.000   0.000
 ti30            .XFAC.          16 u    - 1024    0    0.000    0.000   0.000



Steps to Reproduce:
1. Use keepalived to run VRRP, or add a secondary IP address manually.
2. Start ntpd
3. Remove secondary IP address either manually or by causing this host to become the VRRP backup.

Actual results: Peer associations are lost.


Expected results: Peer associations should not be lost.


Additional info:
Comment 1 Andrew J. Schorr 2013-11-12 10:42:57 EST
It is no longer clear to me that this bug is related to the presence of secondary IP addresses on a given interface.  I think there may be a more general bug where ntpd chooses the wrong dstadr address whenever a system has more than one IP address.  I run ntpd on several routers with multiple network interfaces, and it often seems to choose the wrong dstadr value (as shown by the "lopeers" command).  In other words, it picks the wrong interface.  Restarting ntpd seems to fix the problem.  I guess that maybe it is choosing the dstadr before the ospf routing software has populated the routing table.  I suspect that the logic in ntpd chooses the dstadr value by examining the routing table, I guess by doing the equivalent of an "ip route get".

So I think the bug is that ntpd does not fix the dstadr when the routing table changes.
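
For comparison with the dstadr value reported by ntpq, the kernel can be asked directly which local address it would currently use to reach a peer. The sketch below uses the common trick of connecting a UDP socket and reading back its bound address, which gives roughly the same answer as "ip route get". Whether ntpd selects dstadr this way is an assumption here, not something confirmed in this report, and the function name local_address_for is hypothetical.

```python
import socket


def local_address_for(peer, port=123):
    """Ask the kernel which local IPv4 address it would use to reach peer.

    Connecting a UDP socket sends no packets; it only makes the kernel
    resolve a route and pick a source address. Returns None when the
    kernel has no route to the peer.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.connect((peer, port))
        except OSError:
            return None
        return sock.getsockname()[0]


# Pass a numeric peer address; the loopback address is used here only as a
# placeholder that exists on most hosts.
print(local_address_for("127.0.0.1"))
```

Running this before and after a routing change, and comparing the result with the dstadr shown by ntpq, would show whether ntpd is holding on to a stale choice.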
Comment 2 Miroslav Lichvar 2013-11-12 11:27:02 EST
It looks like it could be related to ntp-4.2.6p4-rtnetlink.patch from the ntp package. Can you please try this test package, which disables that patch, and see if you can still reproduce the problem?

http://koji.fedoraproject.org/koji/taskinfo?taskID=6170678

If that doesn't help, could you try it with the latest upstream development snapshot from http://www.ntp.org/downloads.html?
Comment 3 Andrew J. Schorr 2013-11-12 14:14:29 EST
The test package did not help.  It still chose the wrong interface.
I will see if I can package upstream.
Comment 4 Andrew J. Schorr 2013-11-12 15:45:21 EST
I downloaded ntp-dev-4.2.7p395.tar.gz.  First, I tried to
build it using the spec file from ntp-4.2.6p5-17.fc19.1.src.rpm.  I removed
all the patches that did not apply.  But this build failed here:

gcc -DHAVE_CONFIG_H -I. -I..  -I../include -I../lib/isc/include -I../lib/isc/pthreads/include -I../lib/isc/unix/include    -ffunction-sections -fdata-sections -Wall -Wcast-align -Wcast-qual -Wmissing-prototypes -Wpointer-arith -Wshadow -Winit-self -Wstrict-overflow   -Wno-strict-prototypes -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -fPIE -fno-strict-aliasing -fno-strict-overflow -c refclock_acts.c
ntp_signd.c: In function 'write_all':
ntp_signd.c:69:13: warning: cast discards '__attribute__((const))' qualifier from pointer target type [-Wcast-qual]
   buf = n + (char *)buf;
             ^
In file included from cmd_args.c:13:0:
ntpd-opts.h:59:3: error: #error option template version mismatches autoopts/options.h header
 # error option template version mismatches autoopts/options.h header
   ^
ntpd-opts.h:60:3: error: unknown type name 'Choke'
   Choke Me.
   ^
ntpd-opts.h:60:11: error: expected '=', ',', ';', 'asm' or '__attribute__' before '.' token
   Choke Me.


Then I tried to build it without using rpmbuild.
I copied the "configure" command generated by rpmbuild, like so:

bash$ sed -i 's|$CFLAGS -Wstrict-overflow|$CFLAGS|' configure sntp/configure
bash$ ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --sysconfdir=/etc/ntp/crypto --with-openssl-libdir=/usr/lib64 --without-ntpsnmpd --enable-all-clocks --enable-parse-clocks --enable-ntp-signd=/var/run/ntp_signd --disable-local-libopts

But the make fails here:

gcc -DHAVE_CONFIG_H -I. -I..  -I../include -I../lib/isc/include -I../lib/isc/pthreads/include -I../lib/isc/unix/include    -ffunction-sections -fdata-sections -Wall -Wcast-align -Wcast-qual -Wmissing-prototypes -Wpointer-arith -Wshadow -Winit-self -Wstrict-overflow   -Wno-strict-prototypes -g -O2 -c cmd_args.c
In file included from cmd_args.c:13:0:
ntpd-opts.h:59:3: error: #error option template version mismatches autoopts/options.h header
 # error option template version mismatches autoopts/options.h header
   ^
ntpd-opts.h:60:3: error: unknown type name 'Choke'
   Choke Me.
   ^
ntpd-opts.h:60:11: error: expected '=', ',', ';', 'asm' or '__attribute__' before '.' token
   Choke Me.
           ^
So I then tried a vanilla "./configure && make".  That works.
I copied the ntpd/ntpd binary to /usr/sbin/ntpd and rebooted.  But bad news: the dev snapshot has the same bug.

I'm not sure where that leaves us.  I guess there's a bug in the source.
Comment 5 Miroslav Lichvar 2013-11-13 03:39:35 EST
Ok, thanks for the information.

As the bug is present also in the upstream sources, it should be reported in the upstream bugzilla (https://bugs.ntp.org/). It would be better if you reported it so you can provide feedback there, but if you want I can copy your report from this bug.
Comment 6 Andrew J. Schorr 2013-11-13 10:43:02 EST
Upstream bug opened here:

http://bugs.ntp.org/show_bug.cgi?id=2504
Comment 7 Andrew J. Schorr 2013-11-17 20:16:00 EST
The upstream bug is now # 2506 found here:

http://bugs.ntp.org/show_bug.cgi?id=2506

Miroslav -- can you please grab those 2 patches for the Fedora rpm?  That should fix the problem.  It seems to be working for me.

But I guess there may really need to be more discussion of ntp bug # 992 to come to a conclusion on this.  It is certainly necessary to listen for IPv4 routing updates.  I don't know enough about IPv6 to have an opinion on that aspect...

Thanks,
Andy
Comment 8 Andrew J. Schorr 2013-11-18 16:34:37 EST
Just to be clear, I think the Fedora patch ntp-4.2.6p4-rtnetlink.patch is a problem.  That tells ntpd to ignore changes in the routing table.  Can you please consider removing that patch or modifying it to allow ntpd to monitor at least RTMGRP_IPV4_ROUTE?

It is also confusingly labeled.  It says in the spec file:

# ntpbz #992
Patch8: ntp-4.2.6p4-rtnetlink.patch

But ntp bug #992 was really about adding netlink support, not removing monitoring of routing updates.  If you think this belongs upstream, shouldn't a new bug be opened?

Thanks,
Andy
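
For readers unfamiliar with RTMGRP_IPV4_ROUTE, the following is a minimal sketch of what subscribing to IPv4 routing updates over rtnetlink looks like on Linux. It is not ntpd's code and is not taken from either patch. The numeric constants are copied from <linux/rtnetlink.h>, and the function names open_route_monitor and route_events are made up for illustration.

```python
import socket
import struct

# Values from <linux/rtnetlink.h>.
RTMGRP_IPV4_ROUTE = 0x40  # multicast group carrying IPv4 route changes
RTM_NEWROUTE = 24
RTM_DELROUTE = 25

# struct nlmsghdr: length, type, flags, sequence number, sender port id
NLMSG_HDR = struct.Struct("=IHHII")


def open_route_monitor():
    """Open a netlink socket subscribed to IPv4 route changes (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_IPV4_ROUTE))  # (port id 0 = kernel assigns, group mask)
    return sock


def route_events(data):
    """Yield "add" or "del" for each route message in one netlink datagram."""
    offset = 0
    while offset + NLMSG_HDR.size <= len(data):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, offset)
        if length < NLMSG_HDR.size:
            break  # malformed header, stop parsing
        if msg_type == RTM_NEWROUTE:
            yield "add"
        elif msg_type == RTM_DELROUTE:
            yield "del"
        offset += (length + 3) & ~3  # messages are aligned to 4 bytes


# Typical use on a Linux host (not run here):
#   sock = open_route_monitor()
#   for event in route_events(sock.recv(65535)):
#       ...  # re-evaluate the local address used for each peer
```

A daemon that does not join this group never learns that a route was added or removed, which would match the behavior described in this report.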
Comment 9 Miroslav Lichvar 2013-11-19 08:18:39 EST
I'll remove the rtnetlink patch and include the new upstream patch once it's in ntp-dev. I kept that part of the rtnetlink patch to avoid what I thought were unnecessary updates. Adding back only IPv4 route updates seems wrong to me, so I'll just drop it completely.

Thanks.
Comment 10 Fedora Update System 2013-12-09 11:41:56 EST
ntp-4.2.6p5-18.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/ntp-4.2.6p5-18.fc20
Comment 11 Fedora Update System 2013-12-09 15:26:28 EST
Package ntp-4.2.6p5-18.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing ntp-4.2.6p5-18.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-23048/ntp-4.2.6p5-18.fc20
then log in and leave karma (feedback).
Comment 12 Fedora Update System 2014-01-03 18:23:23 EST
ntp-4.2.6p5-18.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.
