Created attachment 1297786 [details] Kickstart file Description of problem: When booting a fresh Fedora, sometimes starting auditd.service would time out after about 3 minutes, delaying the startup by this much. The OS is Fedora 26 with only @core and @base package groups installed, running in x86_64 VirtualBox instance (BIOS). The kickstart file I used is attached. I found no reliable way to reproduce this bug, but it pops up after several reboots of the OS. For instance, this log was taken after 0 reboots - when first booting the installed Fedora. Thank you! Select log entries (full log attached): systemd[1]: Starting Security Auditing Service... auditd[331]: Started dispatcher: /sbin/audispd pid: 333 audispd[333]: priority_boost_parser called with: 4 audispd[333]: max_restarts_parser called with: 10 audispd[333]: No plugins found, exiting ... (3 minutes later) systemd[1]: Received SIGRTMIN+20 from PID 205 (plymouthd). systemd[1]: Received SIGRTMIN+20 from PID 205 (plymouthd). systemd[1]: Received SIGRTMIN+20 from PID 205 (plymouthd). systemd[1]: Received SIGRTMIN+20 from PID 205 (plymouthd). systemd[1]: auditd.service: Start operation timed out. Terminating. systemd[1]: auditd.service: State 'stop-sigterm' timed out. Killing. systemd[1]: auditd.service: Killing process 329 (auditd) with signal SIGKILL. systemd[1]: auditd.service: Killing process 331 (auditd) with signal SIGKILL. systemd[1]: auditd.service: Control process exited, code=killed status=9 systemd[1]: Failed to start Security Auditing Service. systemd[1]: auditd.service: Unit entered failed state.
Created attachment 1297792 [details] Complete log
I am wondering if the issue is that it needs sssd to resolve uid/gid mappings? Maybe try setting log_format = raw in /etc/audit/auditd.conf. This is strange because auditd is pretty noisy when it has problems and this seems like we have a syscall that is hung. Also, when this happens do you have anything in the audit logs? There should be a DAEMON_START event.
I have finally managed to snapshot the VM state when this bug occurred. 1. log_format is RAW by default in /etc/audit/auditd.conf 2. /var/log/audit/audit.log is empty. There are audit messages in syslog (from the kernel, i imagine). 3. sssd.service unit is started only after auditd.service times out (as you can see in the attached log). When the OS boots properly, sssd starts after auditd as well. 4. Starting auditd.service manually afterwards succeeds.
I have no idea where it could be hanging. If something failed, auditd is pretty good at logging the reason. But hanging? About the only suggestion I can think of is to put a bunch of syslog statements in the initialization sequence. Would you be willing to build and run a debug version of auditd? I can attach a patch to log something during various steps after the dispatcher was started.
Sure, I could do that. Building C projects never goes smoothly for me, but I will try my best. Also, I did not mention that this issue happens about once every 30 boots. Before each boot I reverted back to the snapshot I took before the first boot. So the initial state of the VM was always identical. This is probably a result of some sort of race condition.
OK. Then what I did is pushed a scratch build into koji. Its still building but will be here: https://koji.fedoraproject.org/koji/buildinfo?buildID=918802 All that you need is the audit package. There are no changes in the libs. This should let me know where its hanging. Thanks for testing!
Created attachment 1298559 [details] Syslog (auditd with debug) Thank you, your build made it much easier! :) Attaching the complete syslog with debug messages from auditd. I have also used dnf-upgrade'ed fedora here, but it made no difference.
OK. Its hanging on setting audit enabled. That's a big help. I'll have to look into this more. There is a kernel communication bug that I think is fixed in the newest 4.12 kernel. but I'm not sure that is it.
Looks like it called set enabled, there was an audit event saying audit is enabled. Then it seems like auditd never got the ack saying it was completed.
This bug is probably related to the netlink communication bug fixed in the 4.12 kernel. It doesn't look like Fedora 26 isn't ready to move to that kernel just yet. When it becomes available, you should probably retest.
Out of curiosity, is this problem still happening?
Thanks for reporting this bug. I think it was fixed in the 4.12 kernel. If the problem recurs, feel free to re-open this bug report.
Created attachment 1335830 [details] Complete journal log from boot until auditd service timeout Hi, I'm seeing this in approximately 50% of boots on a fresh minimal fc26 installation (kernel 4.13.4-200, audit 2.7.8-1). I couldn't reproduce it with the 2.7.7-2 audit build from koji after 10 reboots, so this may be a different issue.