Description of problem:
If you start frr with a certain set of daemons enabled in /etc/frr/daemons, and then disable one or more of them by changing its entry from yes to no, running "systemctl reload frr" does not stop the daemons that were disabled.

Version-Release number of selected component (if applicable):
frr-7.5.1-3.fc34.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Edit /etc/frr/daemons to set bgpd=yes and run "systemctl start frr"
2. Edit /etc/frr/daemons to change bgpd=no and run "systemctl reload frr"

Actual results:
bgpd is still running

Expected results:
The reload should stop bgpd.

Additional info:
After the reload, one sees:

# systemctl status frr
● frr.service - FRRouting
     Loaded: loaded (/usr/lib/systemd/system/frr.service; disabled; vendor preset: disabled)
     Active: active (running) since Thu 2021-10-07 10:49:08 EDT; 32s ago
       Docs: https://frrouting.readthedocs.io/en/latest/setup.html
    Process: 17125 ExecStart=/usr/libexec/frr/frrinit.sh start (code=exited, status=0/SUCCESS)
    Process: 17204 ExecReload=/usr/libexec/frr/frrinit.sh reload (code=exited, status=0/SUCCESS)
     Status: "FRR Operational"
      Tasks: 13 (limit: 19004)
     Memory: 16.0M
        CPU: 1.895s
     CGroup: /system.slice/frr.service
             ├─17142 /usr/libexec/frr/zebra -d -F traditional -A 127.0.0.1 -s 90000000
             ├─17149 /usr/libexec/frr/staticd -d -F traditional -A 127.0.0.1
             ├─17180 /usr/libexec/frr/bgpd -d -F traditional -A 127.0.0.1
             └─17216 /usr/libexec/frr/watchfrr -d -F traditional zebra staticd

Oct 07 10:49:38 asny-nuc watchfrr[17172]: Terminating on signal
Oct 07 10:49:38 asny-nuc frrinit.sh[17204]: Stopped watchfrr
Oct 07 10:49:38 asny-nuc watchfrr[17216]: watchfrr 7.5.1 starting: vty@0
Oct 07 10:49:38 asny-nuc watchfrr[17216]: zebra state -> up : connect succeeded
Oct 07 10:49:38 asny-nuc watchfrr[17216]: staticd state -> up : connect succeeded
Oct 07 10:49:38 asny-nuc watchfrr[17216]: all daemons up, doing startup-complete notify
Oct 07 10:49:38 asny-nuc frrinit.sh[17204]: Started watchfrr
Oct 07 10:49:38 asny-nuc frrinit.sh[17219]: /usr/libexec/frr/frr-reload.py:805: SyntaxWarning: "is not" with a literal. D>
Oct 07 10:49:38 asny-nuc frrinit.sh[17219]:   if line is not "exit-vrf":
Oct 07 10:49:39 asny-nuc systemd[1]: Reloaded FRRouting.

This problem is also present in CentOS 8.

It seems that watchfrr cares only about the daemons specified on its command line, so it is blissfully unaware that bgpd is still running. The logic in the wrapper script should probably be updated to make sure that daemons set to "no" are stopped.
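For anyone who wants to check whether a system is in this state, here is a rough sketch of a diagnostic. It is not part of FRR; the function names are made up for illustration. It assumes daemon entries in the daemons file look like "bgpd=no" (option lines such as "vtysh_enable=yes" contain an underscore and are skipped) and that each daemon writes a pid file named <daemon>.pid, as with /run/frr/bgpd.pid.

```shell
#!/bin/sh
# Hypothetical helpers, not part of FRR.

# Print the names of daemons set to "no" in the daemons file given as $1.
# Assumption: daemon names contain only lowercase letters and digits.
disabled_daemons() {
    sed -n 's/^\([a-z0-9][a-z0-9]*\)=no[[:space:]]*$/\1/p' "$1"
}

# Print each disabled daemon that still has a live process behind its pid file.
# $1 = daemons file, $2 = pid directory (assumed to be /run/frr).
stray_daemons() {
    for d in $(disabled_daemons "$1"); do
        pidfile="$2/$d.pid"
        # Skip daemons without a non-empty pid file.
        [ -s "$pidfile" ] || continue
        # kill -0 sends no signal; it only tests that the process exists.
        if kill -0 "$(cat "$pidfile")" 2>/dev/null; then
            echo "$d"
        fi
    done
}
```

Running "stray_daemons /etc/frr/daemons /run/frr" as root after the reload in step 2 would be expected to print bgpd if the daemon is still alive.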
Hi Andrew,

I think you misunderstand the reload script in FRR. As is stated in the documentation:

"Reloading applies the differential between on-disk configuration and the current effective configuration of running FRR processes. This includes starting daemons that were previously stopped and any changes made to individual or unified daemon configuration files."

and

"Currently there is no way to stop or restart an individual daemon. This is because FRR’s monitoring program cannot currently distinguish between a crashed / killed daemon versus one that has been intentionally stopped or restarted. The closest that can be achieved is to remove all configuration for the daemon, and set its line in /etc/frr/daemons to =no. Once this is done, the daemon will be stopped the next time FRR is restarted."

To stop bgpd the way you describe, you can set it to =no in the daemons file and then you need to use restart, not reload.

Hope this helps.

Michal
Hi Michal,

Thanks for digging into this. But I still think it's a bug. If I start frr, then change bgpd=yes in the daemons file and run "reload", it starts bgpd. But if I then set bgpd=no and run "reload", it does not stop it. I believe this can be fixed with sufficient script wizardry. And calling "restart" is not a substitute -- that would kill and restart all of my routing daemons, which would be really disruptive to my routing tables. As it is, I am forced to kill bgpd manually.

To be clear, here's my current logic, which is related to keepalived state transitions:

1. become master and start bgpd:

    sed -i -e 's/^bgpd=.*/bgpd=yes/' /etc/frr/daemons
    systemctl reload frr

2. go into backup mode, stopping bgpd:

    sed -i -e 's/^bgpd=.*/bgpd=no/' /etc/frr/daemons
    systemctl reload frr
    # ugh: reload does not notice that bgpd has been disabled, so
    # it keeps running unless we kill it explicitly
    # https://bugzilla.redhat.com/show_bug.cgi?id=2011868
    bgpid=/run/frr/bgpd.pid
    [ -s "$bgpid" ] && kill -s INT "$(cat "$bgpid")"

I feel it ought to be symmetrical. Calling "restart" is not an option, since that would kill and restart other routing daemons such as ospfd that should not be disturbed.

The right fix is to check which daemons watchfrr is managing before the reload operation, then see which of those are now disabled in the daemons file, and then kill them. This should be done in the /usr/lib/frr/frrinit.sh reload function. Or watchfrr could be enhanced to take a list of daemons that should not be running in addition to the list of those that should be running. Maybe I should hack it in, since I wrote this program in the first place some 17 years ago. :-)

I'm not even going to get into the other issue that the reload operation fails when not using integrated config files, saying "Unable to read new configuration file /etc/frr/frr.conf". It should fail more gently when not using an integrated config, I think. But it doesn't really affect things.

Regards,
Andy
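The proposed check (daemons watchfrr was managing before the reload, minus those now disabled in the daemons file) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the actual frrinit.sh code or the eventual upstream patch: the function names are hypothetical, the list of previously watched daemons is assumed to be captured by the caller before the reload, and pid files are assumed to live at <piddir>/<daemon>.pid as in the workaround above.

```shell
#!/bin/sh
# Hypothetical sketch; none of these functions exist in FRR.

# Succeed if daemon $2 is explicitly set to "no" in daemons file $1.
daemon_disabled() {
    grep -q "^$2=no[[:space:]]*\$" "$1"
}

# $1 = daemons file
# $2 = space-separated daemons watchfrr was managing before the reload
# Prints the previously watched daemons that are now disabled.
daemons_to_stop() {
    for d in $2; do
        if daemon_disabled "$1" "$d"; then
            echo "$d"
        fi
    done
}

# Send SIGINT to each named daemon via its pid file, mirroring the
# manual workaround. $1 = pid directory, remaining args = daemon names.
stop_daemons() {
    piddir=$1
    shift
    for d in "$@"; do
        pidfile="$piddir/$d.pid"
        if [ -s "$pidfile" ]; then
            kill -s INT "$(cat "$pidfile")"
        fi
    done
}

# Example use inside a reload step (the watched list is an assumption):
#   stop_daemons /run/frr $(daemons_to_stop /etc/frr/daemons "zebra staticd bgpd")
```

Only daemons explicitly set to "no" are selected, so a daemon that has no line in the daemons file is left alone.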
Hi Andrew,

you are absolutely right that it should be symmetrical as you describe, and I agree with that logic. I also agree that reload does not work for non-integrated config files; I've hit this problem before as well. I don't think upstream is looking into any of this at this point, because, as quoted in comment #1, "Currently there is no way to stop or restart an individual daemon." It might be good to bring this up with upstream. Currently I don't have the space to hack the script, so if you're willing to try, that would be awesome.

Regards,
Michal
Thanks Michal. I may raise it upstream. Maybe they'll take me seriously since I wrote watchquagga (now watchfrr) in the first place. I'm starting to think that the correct fix is to hack watchfrr to take a list of daemons that should not be running, but I'll leave it to them.
I opened it upstream here: https://github.com/FRRouting/frr/issues/9775
FYI, I submitted a patch upstream and created a pull request here: https://github.com/FRRouting/frr/pull/9805
FEDORA-2022-715ffbee02 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-715ffbee02
FEDORA-2022-fbbd0d22ad has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2022-fbbd0d22ad
FEDORA-2022-fbbd0d22ad has been pushed to the Fedora 35 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-fbbd0d22ad` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-fbbd0d22ad See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2022-715ffbee02 has been pushed to the Fedora 36 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-715ffbee02` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-715ffbee02 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2022-dd7466613b has been pushed to the Fedora 35 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-dd7466613b` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-dd7466613b See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2022-dd7466613b has been pushed to the Fedora 35 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2022-715ffbee02 has been pushed to the Fedora 36 stable repository. If problem still persists, please make note of it in this bug report.