Bug 1462081 - Executing /etc/NetworkManager/dispatcher.d/20-chrony forces the source offline, which is not expected
Status: ON_QA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: chrony
Version: 7.1
Hardware: Unspecified Unspecified
Priority: unspecified Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Miroslav Lichvar
QA Contact: qe-baseos-daemons
Reported: 2017-06-16 02:54 EDT by mezhang
Modified: 2017-09-19 10:50 EDT
Fixed In Version: chrony-3.2-1.el7
Type: Bug

Description mezhang 2017-06-16 02:54:22 EDT
[issue]

When I bring one of the NICs down (the other NIC can still reach the chrony source),
/etc/NetworkManager/dispatcher.d/20-chrony is executed, and as a result the source is forced offline.

[version]

RHEL7.1
chrony-2.4-4.fc24.x86_64

[Env]

chrony server:

Two NICs on different networks.

~~~
   inet 192.168.122.131/24 brd 192.168.122.255 scope global eth0
   inet 192.168.100.189/24 brd 192.168.100.255 scope global bond2

# ip r
192.168.100.0/24 dev bond2  proto kernel  scope link  src 192.168.100.189  metric 300 
192.168.122.0/24 dev eth0  proto kernel  scope link  src 192.168.122.131  metric 100
~~~

chrony source's IP: 192.168.122.111.

eth0 is on the same network as the source, so a default route is unnecessary.

How reproducible:

Steps to Reproduce:

1. check the status
~~~
[root@localhost ~]# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* r511                         11   6    77    41  -8410ms[ -37.8s] +/-   13ms
~~~

2. bring bond2 down
# ifdown bond2

3. check the status

~~~
[root@localhost ~]# chronyc activity
200 OK
0 sources online
1 sources offline  <---  the source went offline after bond2 was brought down.
0 sources doing burst (return to online)
0 sources doing burst (return to offline)
0 sources with unknown address
[root@localhost ~]# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* r511                         11   6   177   446   +221ms[-8410ms] +/-   13ms
~~~

At this point the result still suggests that the sync is normal.

4. change the time on the source, so we can check whether the source is really offline or not. I got this result:

~~~
[root@localhost ~]# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* r511                         11   6   177  1018   +221ms[-8410ms] +/-   13ms
~~~

The "Reach" value is not changing and "LastRx" keeps getting bigger. The time is not being synced.
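
For reference, the chronyc documentation describes Reach as an 8-bit reachability register printed in octal, where 377 means a valid reply was received for each of the last eight transmissions. A small sketch for decoding the values seen in this report (the function name `reach_successes` is made up for illustration):

```shell
#!/bin/sh
# Sketch only: count how many of the last eight polls got a valid reply,
# given the octal Reach value printed by "chronyc sources".
# reach_successes is a hypothetical helper name, not part of chrony.
reach_successes() {
    # A leading 0 makes printf read the argument as an octal constant.
    value=$(printf '%d' "0$1")
    count=0
    while [ "$value" -gt 0 ]; do
        count=$((count + (value & 1)))
        value=$((value >> 1))
    done
    echo "$count"
}

reach_successes 77    # value from step 1
reach_successes 177   # value from the later steps
```

Here 77 decodes to six set bits and 177 to seven. A register that stays at 177 while LastRx keeps growing is consistent with the source no longer being polled.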

Actual results:

Bringing an unrelated NIC down causes the source to go offline.
 
Expected results:

I still have a NIC that can connect to the source when I set the unrelated NIC to down.
The source shouldn't go offline.

I don't really understand why this script should be executed.

Can anyone give some advice?

Thanks 

mengyi
Comment 2 Miroslav Lichvar 2017-06-16 03:25:36 EDT
The chrony dispatcher script (incorrectly) assumes that a default route is needed to reach all NTP sources. That works in most cases, but may fail with local NTP servers.
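
To illustrate the assumption, here is a simplified sketch of a default-route heuristic. It is not the shipped 20-chrony script; the function name `decide_from_routes` is hypothetical, and it reads `ip route list` style output from stdin so it can be tried without touching the network:

```shell
#!/bin/sh
# Sketch only, not the shipped dispatcher script.
# Reads "ip route list" style output on stdin and prints which chronyc
# command ("online" or "offline") a default-route heuristic would choose.
decide_from_routes() {
    if grep -q '^default'; then
        echo online
    else
        echo offline
    fi
}

# Routing table from this report after bond2 goes down: there is no default
# route, although the source 192.168.122.111 is still on eth0's network.
decide_from_routes <<'EOF'
192.168.122.0/24 dev eth0  proto kernel  scope link  src 192.168.122.131  metric 100
EOF
```

With the reporter's routing table this prints `offline`, even though the source is directly reachable on the eth0 network.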

The simple solution would be to keep all sources in the online state. This would slow down resynchronization of sources that are not reachable when the network is down.

A better solution would probably be to check each source separately if it isn't in a local network. I'm not sure how feasible that would be in a shell script.
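
A rough sketch of what such a per-source check could look like in shell, assuming that `ip route get` failing for an address means the source is unreachable. The names `has_route`, `plan_source_states` and `ROUTE_CHECK` are hypothetical, the list of source addresses is passed in by the caller, and this is not necessarily how the eventual fix works:

```shell
#!/bin/sh
# Sketch only: decide per source whether it should be online or offline.
# has_route, plan_source_states and ROUTE_CHECK are hypothetical names.

# Returns 0 if the kernel currently has a route to the given address.
has_route() {
    ip -o route get "$1" > /dev/null 2>&1
}

# Prints "online <addr>" or "offline <addr>" for each address argument.
# ROUTE_CHECK can be set to another function or command, for example to
# try the logic without querying the real routing table.
plan_source_states() {
    for addr in "$@"; do
        if "${ROUTE_CHECK:-has_route}" "$addr"; then
            echo "online $addr"
        else
            echo "offline $addr"
        fi
    done
}

# Possible use from a dispatcher script (assumption, left commented out):
# plan_source_states 192.168.122.111 | while read -r cmd addr; do
#     chronyc "$cmd" "$addr" > /dev/null 2>&1
# done
```

The decision is kept separate from the `chronyc online` / `chronyc offline` calls so the routing logic can be checked on its own. Sources configured by hostname would still need to be resolved to addresses first, which this sketch leaves out.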
Comment 4 Miroslav Lichvar 2017-08-09 04:47:03 EDT
A fix for this issue was included in upstream git:

https://git.tuxfamily.org/chrony/chrony.git/commit/?id=ae82bbbacecec56e1f893fd038ca10323f8760f0
