Bug 1470810 - Starting auditd.service times out on a fresh system
Starting auditd.service times out on a fresh system
Status: NEW
Product: Fedora
Classification: Fedora
Component: audit (Show other bugs)
26
x86_64 Linux
unspecified Severity medium
: ---
: ---
Assigned To: Steve Grubb
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2017-07-13 13:50 EDT by Fillarios
Modified: 2017-07-18 12:30 EDT (History)
2 users (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Kickstart file (332 bytes, text/plain)
2017-07-13 13:50 EDT, Fillarios
no flags Details
Complete log (138.77 KB, text/x-vhdl)
2017-07-13 13:52 EDT, Fillarios
no flags Details
Syslog (auditd with debug) (140.21 KB, text/x-vhdl)
2017-07-14 14:02 EDT, Fillarios
no flags Details

  None (edit)
Description Fillarios 2017-07-13 13:50:16 EDT
Created attachment 1297786 [details]
Kickstart file

Description of problem:

When booting a fresh Fedora, sometimes starting auditd.service would time out after about 3 minutes, delaying the startup by this much.

The OS is Fedora 26 with only @core and @base package groups installed, running in x86_64 VirtualBox instance (BIOS). The kickstart file I used is attached.

I found no reliable way to reproduce this bug, but it pops up after several reboots of the OS. For instance, this log was taken after 0 reboots - when first booting the installed Fedora.

Thank you!

Select log entries (full log attached):
systemd[1]: Starting Security Auditing Service...
auditd[331]: Started dispatcher: /sbin/audispd pid: 333
audispd[333]: priority_boost_parser called with: 4
audispd[333]: max_restarts_parser called with: 10
audispd[333]: No plugins found, exiting
... (3 minutes later)
systemd[1]: Received SIGRTMIN+20 from PID 205 (plymouthd).
systemd[1]: Received SIGRTMIN+20 from PID 205 (plymouthd).
systemd[1]: Received SIGRTMIN+20 from PID 205 (plymouthd).
systemd[1]: Received SIGRTMIN+20 from PID 205 (plymouthd).
systemd[1]: auditd.service: Start operation timed out. Terminating.
systemd[1]: auditd.service: State 'stop-sigterm' timed out. Killing.
systemd[1]: auditd.service: Killing process 329 (auditd) with signal SIGKILL.
systemd[1]: auditd.service: Killing process 331 (auditd) with signal SIGKILL.
systemd[1]: auditd.service: Control process exited, code=killed status=9
systemd[1]: Failed to start Security Auditing Service.
systemd[1]: auditd.service: Unit entered failed state.
Comment 1 Fillarios 2017-07-13 13:52 EDT
Created attachment 1297792 [details]
Complete log
Comment 2 Steve Grubb 2017-07-13 16:18:10 EDT
I am wondering if the issue is that it needs sssd to resolve uid/gid mappings? Maybe try setting log_format = raw in /etc/audit/auditd.conf. This is strange because auditd is pretty noisy when it has problems and this seems like we have a syscall that is hung.

Also, when this happens do you have anything in the audit logs? There should be a DAEMON_START event.
Comment 3 Fillarios 2017-07-14 10:03:04 EDT
I have finally managed to snapshot the VM state when this bug occurred.

1. log_format is RAW by default in /etc/audit/auditd.conf
2. /var/log/audit/audit.log is empty. There are audit messages in syslog (from the kernel, i imagine).
3. sssd.service unit is started only after auditd.service times out (as you can see in the attached log). When the OS boots properly, sssd starts after auditd as well.
4. Starting auditd.service manually afterwards succeeds.
Comment 4 Steve Grubb 2017-07-14 10:24:03 EDT
I have no idea where it could be hanging. If something failed, auditd is pretty good at logging the reason. But hanging? About the only suggestion I can think of is to put a bunch of syslog statements in the initialization sequence. Would you be willing to build and run a debug version of auditd? I can attach a patch to log something during various steps after the dispatcher was started.
Comment 5 Fillarios 2017-07-14 10:39:50 EDT
Sure, I could do that. Building C projects never goes smoothly for me, but I will try my best.

Also, I did not mention that this issue happens about once every 30 boots. Before each boot I reverted back to the snapshot I took before the first boot. So the initial state of the VM was always identical.
This is probably a result of some sort of race condition.
Comment 6 Steve Grubb 2017-07-14 11:15:30 EDT
OK. Then what I did is pushed a scratch build into koji. Its still building but will be here:

https://koji.fedoraproject.org/koji/buildinfo?buildID=918802

All that you need is the audit package. There are no changes in the libs. This should let me know where its hanging. Thanks for testing!
Comment 7 Fillarios 2017-07-14 14:02 EDT
Created attachment 1298559 [details]
Syslog (auditd with debug)

Thank you, your build made it much easier! :)

Attaching the complete syslog with debug messages from auditd.
I have also used dnf-upgrade'ed fedora here, but it made no difference.
Comment 8 Steve Grubb 2017-07-14 15:24:52 EDT
OK. Its hanging on setting audit enabled. That's a big help. I'll have to look into this more. There is a kernel communication bug that I think is fixed in the newest 4.12 kernel. but I'm not sure that is it.
Comment 9 Steve Grubb 2017-07-14 15:26:36 EDT
Looks like it called set enabled, there was an audit event saying audit is enabled. Then it seems like auditd never got the ack saying it was completed.
Comment 10 Steve Grubb 2017-07-18 12:30:37 EDT
This bug is probably related to the netlink communication bug fixed in the 4.12 kernel. It doesn't look like Fedora 26 isn't ready to move to that kernel just yet. When it becomes available, you should probably retest.

Note You need to log in before you can comment on or make changes to this bug.