Bug 2011868 - systemctl frr reload does not stop daemons that are not enabled in /etc/frr/daemons
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: frr
Version: 34
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Michal Ruprich
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-07 14:58 UTC by Andrew Schorr
Modified: 2022-03-26 15:17 UTC (History)

Fixed In Version: frr-8.2.2-1.fc35 frr-8.2-1.fc36
Clone Of:
Environment:
Last Closed: 2022-03-24 16:15:07 UTC
Type: Bug
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | FRRouting frr issues 9775 | 0 | None | closed | systemctl frr reload does not stop daemons that are disabled in /etc/frr/daemons | 2022-03-08 13:43:44 UTC
Github | FRRouting frr pull 9805 | 0 | None | closed | patch reload script logic to stop daemons no longer enabled in /etc/frr/daemons | 2022-03-08 13:43:48 UTC

Description Andrew Schorr 2021-10-07 14:58:56 UTC
Description of problem:
If you start frr with a certain set of daemons enabled in /etc/frr/daemons and then disable one or more of them by changing yes to no, running
"systemctl reload frr" does not stop the daemons that were disabled.

Version-Release number of selected component (if applicable):
frr-7.5.1-3.fc34.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Edit /etc/frr/daemons to set bgpd=yes and run "systemctl start frr"
2. Edit /etc/frr/daemons to change bgpd=no and run "systemctl reload frr"

Actual results:
bgpd is still running

Expected results:
The reload should stop bgpd.

Additional info:
After the reload, one sees:

# systemctl status frr
● frr.service - FRRouting
     Loaded: loaded (/usr/lib/systemd/system/frr.service; disabled; vendor preset: disabled)
     Active: active (running) since Thu 2021-10-07 10:49:08 EDT; 32s ago
       Docs: https://frrouting.readthedocs.io/en/latest/setup.html
    Process: 17125 ExecStart=/usr/libexec/frr/frrinit.sh start (code=exited, status=0/SUCCESS)
    Process: 17204 ExecReload=/usr/libexec/frr/frrinit.sh reload (code=exited, status=0/SUCCESS)
     Status: "FRR Operational"
      Tasks: 13 (limit: 19004)
     Memory: 16.0M
        CPU: 1.895s
     CGroup: /system.slice/frr.service
             ├─17142 /usr/libexec/frr/zebra -d -F traditional -A 127.0.0.1 -s 90000000
             ├─17149 /usr/libexec/frr/staticd -d -F traditional -A 127.0.0.1
             ├─17180 /usr/libexec/frr/bgpd -d -F traditional -A 127.0.0.1
             └─17216 /usr/libexec/frr/watchfrr -d -F traditional zebra staticd

Oct 07 10:49:38 asny-nuc watchfrr[17172]: Terminating on signal
Oct 07 10:49:38 asny-nuc frrinit.sh[17204]: Stopped watchfrr
Oct 07 10:49:38 asny-nuc watchfrr[17216]: watchfrr 7.5.1 starting: vty@0
Oct 07 10:49:38 asny-nuc watchfrr[17216]: zebra state -> up : connect succeeded
Oct 07 10:49:38 asny-nuc watchfrr[17216]: staticd state -> up : connect succeeded
Oct 07 10:49:38 asny-nuc watchfrr[17216]: all daemons up, doing startup-complete notify
Oct 07 10:49:38 asny-nuc frrinit.sh[17204]: Started watchfrr
Oct 07 10:49:38 asny-nuc frrinit.sh[17219]: /usr/libexec/frr/frr-reload.py:805: SyntaxWarning: "is not" with a literal. D>
Oct 07 10:49:38 asny-nuc frrinit.sh[17219]:   if line is not "exit-vrf":
Oct 07 10:49:39 asny-nuc systemd[1]: Reloaded FRRouting.


This problem is also present in CentOS 8. It seems that watchfrr cares only
about the daemons specified on its command-line, so it is blissfully unaware
that bgpd is still running. The logic in the wrapper script should probably
be updated to make sure that daemons set to "no" are stopped.
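
The mismatch is visible in the status output above: bgpd is in the service's cgroup but is not named on watchfrr's command line. A small check along these lines can list such daemons. It is a sketch only: the function name unwatched is invented for illustration, and it simply looks for each daemon name as a whole word in the watchfrr argument string.

```shell
#!/bin/sh
# Hypothetical helper (not part of FRR): print every daemon name given after
# the first argument that does not appear as a whole word in watchfrr's
# command line, i.e. daemons that watchfrr is not monitoring.
# usage: unwatched WATCHFRR_ARGS NAME...
unwatched() {
    watch_args=$1
    shift
    for name in "$@"; do
        case " $watch_args " in
            *" $name "*) ;;               # watchfrr was started with this daemon
            *) printf '%s\n' "$name" ;;   # running, but unknown to watchfrr
        esac
    done
}

# Possible use on a live system (assumes procps "ps" is available):
#   unwatched "$(ps -o args= -C watchfrr)" zebra staticd bgpd
```

With the watchfrr command line from the status output above and the three daemons shown in the cgroup, this prints bgpd.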

Comment 1 Michal Ruprich 2021-10-07 18:37:57 UTC
Hi Andrew,

I think you misunderstand the reload script in FRR. As is stated in the documentation:

"Reloading applies the differential between on-disk configuration and the current effective configuration of running FRR processes. This includes starting daemons that were previously stopped and any changes made to individual or unified daemon configuration files."

and 

"Currently there is no way to stop or restart an individual daemon. This is because FRR’s monitoring program cannot currently distinguish between a crashed / killed daemon versus one that has been intentionally stopped or restarted. The closest that can be achieved is to remove all configuration for the daemon, and set its line in /etc/frr/daemons to =no. Once this is done, the daemon will be stopped the next time FRR is restarted."

To stop bgpd the way you describe, you can set it to =no in the daemons file and then you need to use restart, not reload.

Hope this helps.

Michal

Comment 2 Andrew Schorr 2021-10-07 18:53:15 UTC
Hi Michal,

Thanks for digging into this. But I still think it's a bug. If I start frr, then set bgpd=yes in the daemons file
and run "reload", it starts bgpd. But if I then set bgpd=no and run "reload", it does not stop it. I believe
this can be fixed with sufficient script wizardry. And calling "restart" is not a substitute -- that would kill
and restart all of my routing daemons, which would be really disruptive to my routing tables. As it is, I am
forced to kill bgpd manually.

To be clear, here's my current logic, which is related to keepalived state transitions:

1. become master and start bgpd:

            sed -i -e 's/^bgpd=.*/bgpd=yes/' /etc/frr/daemons
            systemctl reload frr

2. go into backup mode, stopping bgpd:

            sed -i -e 's/^bgpd=.*/bgpd=no/' /etc/frr/daemons
            systemctl reload frr
            # ugh: reload does not notice that bgpd has been disabled, so
            # it keeps running unless we kill it explicitly
            # https://bugzilla.redhat.com/show_bug.cgi?id=2011868
            bgpid=/run/frr/bgpd.pid
            [ -s "$bgpid" ] && kill -s INT "$(cat "$bgpid")"


I feel it ought to be symmetrical. Calling "restart" is not an option, since that would kill and
restart other routing daemons such as ospfd that should not be disturbed. The right fix
is to check which daemons watchfrr is managing before the reload operation, then see which
of those are now disabled in the daemons file, and then kill them. This should be done in the
/usr/libexec/frr/frrinit.sh reload function. Or watchfrr could be enhanced to take a list
of daemons that should not be running in addition to the list of those that should
be running. Maybe I should hack it in, since I wrote this program in the first place
some 17 years ago. :-)
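
The first approach described above (compare the daemons watchfrr was managing before the reload with what the daemons file now enables, and stop the difference) could be sketched roughly as follows. This is an illustration only, not the patch later submitted upstream: the function name stale_daemons and the ALWAYS_ON variable are made up here, treating zebra and staticd as always enabled is an assumption, and only a literal name=yes line counts as enabled.

```shell
#!/bin/sh
# Hypothetical sketch, not the real frrinit.sh logic.
# Assumption: these daemons run regardless of the daemons file.
ALWAYS_ON=${ALWAYS_ON:-"zebra staticd"}

# Print each previously managed daemon that is no longer enabled.
# usage: stale_daemons DAEMONS_FILE MANAGED_NAME...
stale_daemons() {
    daemons_file=$1
    shift
    for name in "$@"; do
        case " $ALWAYS_ON " in
            *" $name "*) continue ;;   # never reported as stale
        esac
        # Simplification: a daemon is enabled only if the file has "name=yes".
        if ! grep -q "^${name}=yes[[:space:]]*\$" "$daemons_file"; then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, with bgpd=no in the file, `stale_daemons /etc/frr/daemons zebra staticd bgpd` prints bgpd. Each printed name could then be stopped the same way as in the workaround above, by sending SIGINT to the pid recorded in its file under /run/frr.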

I'm not even going to get into the other issue that the reload operation fails when not
using integrated config files, saying "Unable to read new configuration file /etc/frr/frr.conf".
It should fail more gently when not using an integrated config, I think. But it doesn't
really affect things.

Regards,
Andy

Comment 3 Michal Ruprich 2021-10-07 18:59:25 UTC
Hi Andrew,

You are absolutely right that it should be symmetrical, as you describe, and I agree with that logic. I also agree that reload does not work for non-integrated config files; I've hit this problem before as well. I don't think upstream is looking into any of this at this point because, as the documentation quoted in comment #1 says, "Currently there is no way to stop or restart an individual daemon." It might be good to bring this up with upstream. Currently I don't have the capacity to hack on the script, so if you're willing to try, that would be awesome.

Regards,
Michal

Comment 4 Andrew Schorr 2021-10-07 19:03:14 UTC
Thanks Michal. I may raise it upstream. Maybe they'll take me seriously since I wrote watchquagga (now watchfrr)
in the first place. I'm starting to think that the correct fix is to hack watchfrr to take a list of daemons
that should not be running, but I'll leave it to them.

Comment 5 Andrew Schorr 2021-10-07 19:46:30 UTC
I opened it upstream here: https://github.com/FRRouting/frr/issues/9775

Comment 6 Andrew Schorr 2021-10-12 15:03:07 UTC
FYI, I submitted a patch upstream and created a pull request here:

https://github.com/FRRouting/frr/pull/9805

Comment 7 Fedora Update System 2022-03-10 12:40:47 UTC
FEDORA-2022-715ffbee02 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-715ffbee02

Comment 8 Fedora Update System 2022-03-10 12:40:48 UTC
FEDORA-2022-fbbd0d22ad has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2022-fbbd0d22ad

Comment 9 Fedora Update System 2022-03-11 15:45:06 UTC
FEDORA-2022-fbbd0d22ad has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-fbbd0d22ad`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-fbbd0d22ad

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 10 Fedora Update System 2022-03-11 19:24:36 UTC
FEDORA-2022-715ffbee02 has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-715ffbee02`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-715ffbee02

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 11 Fedora Update System 2022-03-16 16:25:32 UTC
FEDORA-2022-dd7466613b has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-dd7466613b`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-dd7466613b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 12 Fedora Update System 2022-03-24 16:15:07 UTC
FEDORA-2022-dd7466613b has been pushed to the Fedora 35 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 13 Fedora Update System 2022-03-26 15:17:06 UTC
FEDORA-2022-715ffbee02 has been pushed to the Fedora 36 stable repository.
If the problem still persists, please make note of it in this bug report.

