Created attachment 1104182 [details]
strace from systemd while in broken event loop state

Description of problem:

The currently available version of systemd-219 for Fedora-22 is subject to a malfunction in the sd-event pending_prioq_compare function, which can swap a disabled event source with an enabled one. This breaks the event loop, after which systemctl invocations hang and the system eventually requires a reboot.

The issue has been identified and fixed upstream:
http://lists.freedesktop.org/archives/systemd-devel/2015-September/034356.html
https://github.com/systemd/systemd/pull/1366

Version-Release number of selected component (if applicable):

systemd-219-25.fc22.x86_64

How reproducible:

6/10

With syscall tracing we were able to observe (under production load) the epoll_wait POLLOUT looping after our monitoring system noticed 'systemctl' processes piling up. The systems in question run between 5000 and 7000 units. I've attached the strace.

We were _not_ able to reproduce the problem by following the systemd mailing list post (which described a later version, 227, rather than 219).

Steps to Reproduce:
1. Attach gdb to PID 1 and set a breakpoint: b pending_prioq_compare
2. Break the sd-event queue
3. Inspect the x and y locals and look for a disabled event source; if there is none, continue (a gdb script can help)
4. strace PID 1 to observe the POLLOUT infinite loop / broken sd-event loop

Actual results:
- systemctl processes piling up non-deterministically
- heavy epoll_wait activity by PID 1 with an infinitely increasing POLLOUT list

Expected results:
- systemctl processes not piling up
- normal (paired) epoll_wait behavior from PID 1, with no disabled event sources swapped with enabled ones

Additional info:

We have deployed a custom build with this patch and have so far not been able to observe the infinite loop/sd-event malfunction under load.
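The description above says the comparison function can swap a disabled event source with an enabled one. As a rough illustration of why that matters for a priority queue, here is a minimal Python model. It is not the upstream patch and does not reproduce systemd's data structures: the `Source` class, its fields, and both comparator names are hypothetical, chosen only to show how a comparator that ignores the enabled state can put a disabled source at the head of the queue.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for an event source (not systemd's struct)."""
    name: str
    enabled: bool
    priority: int  # lower value is served first


def compare_priority_only(x, y):
    """Comparator that ignores the enabled flag (the failure mode modelled here)."""
    return (x.priority > y.priority) - (x.priority < y.priority)


def compare_enabled_first(x, y):
    """Comparator that always ranks enabled sources ahead of disabled ones."""
    if x.enabled != y.enabled:
        return -1 if x.enabled else 1
    return compare_priority_only(x, y)


def head(sources, compare):
    """Return the source that a queue ordered by `compare` would serve first."""
    return min(sources, key=cmp_to_key(compare))


# Assumed sample data: a disabled source with a more urgent priority value.
SOURCES = [
    Source("disabled-io", enabled=False, priority=-10),
    Source("enabled-timer", enabled=True, priority=0),
]

if __name__ == "__main__":
    print(head(SOURCES, compare_priority_only).name)   # disabled-io
    print(head(SOURCES, compare_enabled_first).name)   # enabled-timer
```

With the priority-only comparator the disabled source wins the head position, so a loop that only dispatches enabled sources would never make progress on it. Checking the enabled flag first keeps disabled sources behind every enabled one.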
Would it be possible to get the patch from the GitHub pull request linked above back-ported to Fedora 22's systemd-219 RPM?
Created attachment 1104183 [details]
Patch from systemd/systemd#1366 in .patch format

Downloaded the .patch from https://github.com/pocek/systemd/commit/8046c4576a68977a1089d2585866bfab8152661b.patch and uploaded it here.
https://github.com/msekletar/systemd-fedora/commit/63ff6add2fa596572e24fc4181a47042e37144c1
systemd-219-27.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-7365dd5df4
systemd-219-27.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-7365dd5df4
systemd-219-27.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.