Description of problem:
When Neutron running ML2/OVN is stopped with SIGTERM (for example via systemctl), the neutron workers do not exit. They are eventually killed with SIGKILL once the graceful timeout is reached (often around 1 minute).

This is caused by the way the SIGTERM signal handlers are installed. There are multiple issues:
1) oslo_service, the ml2/ovn mech_driver, and ml2/ovo_rpc.py all call signal.signal(signal.SIGTERM, ...), overwriting each other's signal handlers.
2) SIGTERM is handled in the main thread, and running blocking code there causes AssertionErrors in eventlet, which also prevent the process from exiting.
3) The ml2/ovn cleanup code doesn't cause the process to end, so it interrupts the killing of the process.

oslo_service has a singleton SignalHandler class that solves all of these issues.

Version-Release number of selected component (if applicable):
17.1.3

How reproducible:
95%-ish

Steps to Reproduce:
1. Start neutron_api
2. Stop neutron_api

Actual results:
Neutron only exits after ~42 seconds

Expected results:
Neutron exits after a few seconds

Additional info:
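Issue 1 can be illustrated with the Python standard library alone. The sketch below is not Neutron or oslo_service code: service_handler, driver_handler, show_overwrite and HandlerRegistry are hypothetical names used only for illustration. It shows that a second signal.signal() call silently replaces the first handler, and how a single shared registry (the general idea behind a singleton signal handler) lets every component register a callback without clobbering the others.

```python
import signal

calls = []  # records which callbacks ran, in order


def service_handler(signum, frame):
    # Stand-in for the handler a service framework would install.
    calls.append("service")


def driver_handler(signum, frame):
    # Stand-in for the handler a mechanism driver would install.
    calls.append("driver")


def show_overwrite():
    """Install two SIGTERM handlers in a row and report what survives."""
    original = signal.getsignal(signal.SIGTERM)
    if original is None:
        # Handler was not installed from Python; fall back to the default.
        original = signal.SIG_DFL
    try:
        signal.signal(signal.SIGTERM, service_handler)
        # signal.signal() returns the handler it just replaced.
        replaced = signal.signal(signal.SIGTERM, driver_handler)
        active = signal.getsignal(signal.SIGTERM)
        return replaced is service_handler, active is driver_handler
    finally:
        # Put the original handler back so the demo has no side effects.
        signal.signal(signal.SIGTERM, original)


class HandlerRegistry:
    """Hypothetical registry: one real handler per signal, many callbacks."""

    def __init__(self):
        self._callbacks = {}

    def add_handler(self, signum, callback):
        if signum not in self._callbacks:
            self._callbacks[signum] = []
            # Only the registry itself ever calls signal.signal().
            signal.signal(signum, self._dispatch)
        self._callbacks[signum].append(callback)

    def _dispatch(self, signum, frame):
        # Fan the signal out to every registered callback, in order.
        for callback in list(self._callbacks[signum]):
            callback(signum, frame)
```

With show_overwrite(), only driver_handler remains installed after the second call, which matches the behaviour described in issue 1. With HandlerRegistry, both callbacks run when the signal is dispatched. Issue 2 is not addressed by this sketch; callbacks registered this way should still avoid blocking work in the main thread.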
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHOSP 17.1.4 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:9974