Bug 1470810 - Starting auditd.service times out on a fresh system
Summary: Starting auditd.service times out on a fresh system
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: audit
Version: 26
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Steve Grubb
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-07-13 17:50 UTC by Eveline Raine
Modified: 2017-12-30 19:00 UTC (History)
4 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-09-29 14:49:21 UTC
Type: Bug
Embargoed:
eveline: needinfo-


Attachments (Terms of Use)
Kickstart file (332 bytes, text/plain)
2017-07-13 17:50 UTC, Eveline Raine
no flags Details
Complete log (138.77 KB, text/x-vhdl)
2017-07-13 17:52 UTC, Eveline Raine
no flags Details
Syslog (auditd with debug) (140.21 KB, text/x-vhdl)
2017-07-14 18:02 UTC, Eveline Raine
no flags Details
Complete journal log from boot until auditd service timeout (138.67 KB, text/plain)
2017-10-08 11:42 UTC, Lucas Maneos
no flags Details

Description Eveline Raine 2017-07-13 17:50:16 UTC
Created attachment 1297786 [details]
Kickstart file

Description of problem:

When booting a fresh Fedora, sometimes starting auditd.service would time out after about 3 minutes, delaying the startup by this much.

The OS is Fedora 26 with only @core and @base package groups installed, running in x86_64 VirtualBox instance (BIOS). The kickstart file I used is attached.

I found no reliable way to reproduce this bug, but it pops up after several reboots of the OS. For instance, this log was taken after 0 reboots - when first booting the installed Fedora.

Thank you!

Select log entries (full log attached):
systemd[1]: Starting Security Auditing Service...
auditd[331]: Started dispatcher: /sbin/audispd pid: 333
audispd[333]: priority_boost_parser called with: 4
audispd[333]: max_restarts_parser called with: 10
audispd[333]: No plugins found, exiting
... (3 minutes later)
systemd[1]: Received SIGRTMIN+20 from PID 205 (plymouthd).
systemd[1]: Received SIGRTMIN+20 from PID 205 (plymouthd).
systemd[1]: Received SIGRTMIN+20 from PID 205 (plymouthd).
systemd[1]: Received SIGRTMIN+20 from PID 205 (plymouthd).
systemd[1]: auditd.service: Start operation timed out. Terminating.
systemd[1]: auditd.service: State 'stop-sigterm' timed out. Killing.
systemd[1]: auditd.service: Killing process 329 (auditd) with signal SIGKILL.
systemd[1]: auditd.service: Killing process 331 (auditd) with signal SIGKILL.
systemd[1]: auditd.service: Control process exited, code=killed status=9
systemd[1]: Failed to start Security Auditing Service.
systemd[1]: auditd.service: Unit entered failed state.

Comment 1 Eveline Raine 2017-07-13 17:52:11 UTC
Created attachment 1297792 [details]
Complete log

Comment 2 Steve Grubb 2017-07-13 20:18:10 UTC
I am wondering if the issue is that it needs sssd to resolve uid/gid mappings? Maybe try setting log_format = raw in /etc/audit/auditd.conf. This is strange because auditd is pretty noisy when it has problems and this seems like we have a syscall that is hung.

Also, when this happens do you have anything in the audit logs? There should be a DAEMON_START event.

Comment 3 Eveline Raine 2017-07-14 14:03:04 UTC
I have finally managed to snapshot the VM state when this bug occurred.

1. log_format is RAW by default in /etc/audit/auditd.conf
2. /var/log/audit/audit.log is empty. There are audit messages in syslog (from the kernel, i imagine).
3. sssd.service unit is started only after auditd.service times out (as you can see in the attached log). When the OS boots properly, sssd starts after auditd as well.
4. Starting auditd.service manually afterwards succeeds.

Comment 4 Steve Grubb 2017-07-14 14:24:03 UTC
I have no idea where it could be hanging. If something failed, auditd is pretty good at logging the reason. But hanging? About the only suggestion I can think of is to put a bunch of syslog statements in the initialization sequence. Would you be willing to build and run a debug version of auditd? I can attach a patch to log something during various steps after the dispatcher was started.

Comment 5 Eveline Raine 2017-07-14 14:39:50 UTC
Sure, I could do that. Building C projects never goes smoothly for me, but I will try my best.

Also, I did not mention that this issue happens about once every 30 boots. Before each boot I reverted back to the snapshot I took before the first boot. So the initial state of the VM was always identical.
This is probably a result of some sort of race condition.

Comment 6 Steve Grubb 2017-07-14 15:15:30 UTC
OK. Then what I did is pushed a scratch build into koji. Its still building but will be here:

https://koji.fedoraproject.org/koji/buildinfo?buildID=918802

All that you need is the audit package. There are no changes in the libs. This should let me know where its hanging. Thanks for testing!

Comment 7 Eveline Raine 2017-07-14 18:02:00 UTC
Created attachment 1298559 [details]
Syslog (auditd with debug)

Thank you, your build made it much easier! :)

Attaching the complete syslog with debug messages from auditd.
I have also used dnf-upgrade'ed fedora here, but it made no difference.

Comment 8 Steve Grubb 2017-07-14 19:24:52 UTC
OK. Its hanging on setting audit enabled. That's a big help. I'll have to look into this more. There is a kernel communication bug that I think is fixed in the newest 4.12 kernel. but I'm not sure that is it.

Comment 9 Steve Grubb 2017-07-14 19:26:36 UTC
Looks like it called set enabled, there was an audit event saying audit is enabled. Then it seems like auditd never got the ack saying it was completed.

Comment 10 Steve Grubb 2017-07-18 16:30:37 UTC
This bug is probably related to the netlink communication bug fixed in the 4.12 kernel. It doesn't look like Fedora 26 isn't ready to move to that kernel just yet. When it becomes available, you should probably retest.

Comment 11 Steve Grubb 2017-09-28 02:29:48 UTC
Out of curiosity, is this problem still happening?

Comment 12 Steve Grubb 2017-09-29 14:49:21 UTC
Thanks for reporting this bug. I think it was fixed in the 4.12 kernel. If the problem recurs, feel free to re-open this bug report.

Comment 13 Lucas Maneos 2017-10-08 11:42:44 UTC
Created attachment 1335830 [details]
Complete journal log from boot until auditd service timeout

Hi,

I'm seeing this in approximately 50% of boots on a fresh minimal fc26 installation (kernel 4.13.4-200, audit 2.7.8-1).   I couldn't reproduce it with the 2.7.7-2 audit build from koji after 10 reboots, so this may be a different issue.


Note You need to log in before you can comment on or make changes to this bug.