Bug 1727058 - audit RPM can't be removed
Summary: audit RPM can't be removed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: audit
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Steve Grubb
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: IoT 1592330
TreeView+ depends on / blocked
 
Reported: 2019-07-04 10:40 UTC by Lubomir Rintel
Modified: 2020-04-24 08:21 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-05 17:08:12 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Lubomir Rintel 2019-07-04 10:40:10 UTC
A fix for bug #1698130 broke some more things:

Let's try removing the rpm:

  [root@30-1 lkundrak]# rpm -e audit
  error: %preun(audit-3.0-0.9.20190507gitf58ec40.fc30.x86_64) scriptlet failed, exit status 127
  error: audit-3.0-0.9.20190507gitf58ec40.fc30.x86_64: erase failed
  [root@30-1 lkundrak]#

That is because it tried to stop auditd with /sbin/service, but it does not exist:

  [root@30-1 lkundrak]# /sbin/service auditd stop
  bash: /sbin/service: No such file or directory
  [root@30-1 lkundrak]#

Install it.

  [root@30-1 lkundrak]# dnf -y install /sbin/service
  ...
  Installed:
    initscripts-10.01-2.fc30.x86_64

  Complete!
  [root@30-1 lkundrak]#

Cool. Try again:

  [root@30-1 lkundrak]# rpm -e audit
  error: %preun(audit-3.0-0.9.20190507gitf58ec40.fc30.x86_64) scriptlet failed, exit status 1
  error: audit-3.0-0.9.20190507gitf58ec40.fc30.x86_64: erase failed
  [root@30-1 lkundrak]#

Still won't work.

  [root@30-1 lkundrak]# /sbin/service auditd stop
  /sbin/service: line 3: /etc/init.d/functions: No such file or directory
  /usr/libexec/initscripts/legacy-actions/auditd/stop: line 10: /etc/init.d/functions: No such file or directory
  [root@30-1 lkundrak]#

Oh, both initscripts and the audit stop helper are sourcing the file from a wrong location when there's no /etc/init.d symlink.

  [root@30-1 lkundrak]# dnf -y install chkconfig
  ...
  Installed:
    chkconfig-1.11-4.fc30.x86_64

  Complete!
  [root@30-1 lkundrak]#

Works now.

  [root@30-1 lkundrak]# rpm -e audit
  [root@30-1 lkundrak]#

Comment 1 Lubomir Rintel 2019-07-04 10:45:27 UTC
(In reply to Lubomir Rintel from comment #0)
...
>   [root@30-1 lkundrak]# /sbin/service auditd stop
>   /sbin/service: line 3: /etc/init.d/functions: No such file or directory
>   /usr/libexec/initscripts/legacy-actions/auditd/stop: line 10:
> /etc/init.d/functions: No such file or directory
>   [root@30-1 lkundrak]#

This is https://github.com/fedora-sysv/initscripts/pull/263

A similar change would be needed for audit-userspace. Nevertheless, that would still not fix the whole mess.

It seems to me that the proper way to fix this would be to remove RefuseManualStop=yes and rely on %systemd_preun auditd.service and  %systemd_postun_with_restart auditd.service. Otherwise the package would have to depend on initscripts (and chkconfig, until initscripts get fixed), which is not appropriate for a package a base installation.

Comment 2 Steve Grubb 2019-07-05 14:58:53 UTC
"rpm -q --requires audit"  shows that it requires chkconfig. I looks like initscripts isn't being picked up, perhaps that can be improved.

Comment 3 Steve Grubb 2019-07-05 16:40:18 UTC
OK, I pushed a build into rawhide that has initscripts added. This is an absolute requirement of audit. It cannot use systemctl because it cannot audit who sent the shutdown command. As for chkconfig, I'll see what I can do upstream to fix that so that it doesn't need the symlink. Thanks for pointing this out.

Comment 4 Steve Grubb 2019-07-05 16:59:14 UTC
upstream commit d1c80e0 should fix the chkconfig dependency.

Comment 5 Peter Robinson 2019-07-08 07:08:13 UTC
(In reply to Steve Grubb from comment #3)
> OK, I pushed a build into rawhide that has initscripts added. This is an
> absolute requirement of audit. It cannot use systemctl because it cannot
> audit who sent the shutdown command. As for chkconfig, I'll see what I can
> do upstream to fix that so that it doesn't need the symlink. Thanks for
> pointing this out.

Is there an upstream systemctl bug report for this? SysV is legacy so the solution should be to fix the current tools not force legacy ones.

Comment 6 Steve Grubb 2019-07-09 22:12:58 UTC
(In reply to Peter Robinson from comment #5)
> (In reply to Steve Grubb from comment #3)
> > OK, I pushed a build into rawhide that has initscripts added. This is an
> > absolute requirement of audit. It cannot use systemctl because it cannot
> > audit who sent the shutdown command. As for chkconfig, I'll see what I can
> > do upstream to fix that so that it doesn't need the symlink. Thanks for
> > pointing this out.
> 
> Is there an upstream systemctl bug report for this? SysV is legacy so the
> solution should be to fix the current tools not force legacy ones.

There is a mandatory requirement that we audit any attempt to start or stop the audit system:
https://www.niap-ccevs.org/MMO/PP/-442-/#fau

So, why not systemctl? The problem is dbus activation. It loses the audit loginuid on the backend and we cannot tell who did it. It looks like root sent the signal - which is true, but who's acting as root? One solution to this was creating the kernel dbus subsystem where the loginuid could be transferred between end points and security policy could verify that there is no forgery/credential kidnapping attempt. That was shot down. So, here we are. If k-dbus ever got upstream, that would change everything.

Comment 7 Ben Cotton 2019-08-13 16:51:16 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 8 Ben Cotton 2019-08-13 19:04:35 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to 31.

Comment 9 Peter Robinson 2019-08-28 13:02:06 UTC
Zbigniew what is the solution to this in systemd without having to rely on Sys-V initscripts? 

> There is a mandatory requirement that we audit any attempt to start or stop
> the audit system:
> https://www.niap-ccevs.org/MMO/PP/-442-/#fau
> 
> So, why not systemctl? The problem is dbus activation. It loses the audit
> loginuid on the backend and we cannot tell who did it. It looks like root
> sent the signal - which is true, but who's acting as root? One solution to
> this was creating the kernel dbus subsystem where the loginuid could be
> transferred between end points and security policy could verify that there
> is no forgery/credential kidnapping attempt. That was shot down. So, here we
> are. If k-dbus ever got upstream, that would change everything.

Comment 10 Zbigniew Jędrzejewski-Szmek 2019-08-28 14:45:33 UTC
I don't think the thing with calling /sbin/service has any effect.
/sbin/service is a simple wrapper that sets some variables and then calls
exec /bin/systemctl ${ACTION} ${OPTIONS} ${SERVICE_MANGLED}.
If the service script can do this, the scriptlet in %pre can do the
exact same thing. So please just kill all the references to chkconfig, initscripts,
/sbin/service with fire.

Also, dbus activation doesn't seem relevant here. If auditd.service is running,
then systemd is also running (because smth must have started the service).
So /sbin/service a.k.a. systemctl simply makes a dbus call to PID1.

Comment 11 Zbigniew Jędrzejewski-Szmek 2019-09-30 07:56:10 UTC
Can we please get this resolved? Audit is installed pretty much everywhere, and it is rather
sad that it is pulling in initscripts, a 1MB package with various dependencies.

Comment 12 Steve Grubb 2019-09-30 16:45:02 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #10)
> I don't think the thing with calling /sbin/service has any effect.
> /sbin/service is a simple wrapper that sets some variables and then calls
> exec /bin/systemctl ${ACTION} ${OPTIONS} ${SERVICE_MANGLED}.

Only in some cases. In others it sends the signal in the user's login context.


> If the service script can do this, the scriptlet in %pre can do the
> exact same thing. So please just kill all the references to chkconfig,
> initscripts,
> /sbin/service with fire.

chkconfig dependencies should be eliminated. /usr/sbin/service, we really can't. It has to be done this way.


> Also, dbus activation doesn't seem relevant here. If auditd.service is
> running,
> then systemd is also running (because smth must have started the service).
> So /sbin/service a.k.a. systemctl simply makes a dbus call to PID1.

Only in some cases. If there are legacy initscripts provided, then /sbin/service sends signals like it did during SysVinit days. 

The best path forward was the kernel-dbus work as that would have fixed auditing for the whole desktop. Without that, we have large parts of OS functionality that is broken by the desktop's design. For example, setting time by the gui does not show who set the time. It shows root did it.

Comment 13 Zbigniew Jędrzejewski-Szmek 2019-09-30 20:20:54 UTC
Let's not talk hypotheticals, but simply what is now feasible in Fedora.
/sbin/service is a simple script that has no suid bit, no file capabilities, and
no xattrs. So in all cases it can be replaced by a different script and/or
simply a direct call.

The %preun scriptlet looks like this:
if [ $1 -eq 0 ] && [ -x /usr/bin/systemctl ] ; then 
        # Package removal, not upgrade 
        /usr/bin/systemctl --no-reload disable --now auditd.service || : 
fi 

if [ $1 -eq 0 ]; then
   /sbin/service auditd stop > /dev/null 2>&1
fi

That second part is completely redundant with the first one. 
If /sbin/service is installed, all it does is call systemctl stop auditd.
If /sbin/service is not installed, it causes an error.
Why can't you just remove it?

Comment 14 Steve Grubb 2019-09-30 20:51:57 UTC
What you are saying is true whenever there are no legacy actions. The audit package provides a number of legacy actions. That means that the test in service

  elif [ -n "${ACTION}" ] && [ -x "${ACTIONDIR}/${SERVICE}/${ACTION}" ]; then

gets taken because audit provides a stop action. If that is not true, then it falls into the next elif which redirects to systemctl for the actions that systemctl supports. Again, the reason why this is done is because the audit system is used to meet common criteria requirements which mandate that we get the loginuid of anyone that attempts to manipulate the audit trail. The kernel saves this information which the audit daemon retrieves to record who is doing this. When systemctl is used, the information is lost.

Comment 15 Zbigniew Jędrzejewski-Szmek 2019-09-30 22:01:42 UTC
Sure, OK. So /sbin/service calls /usr/libexec/initscripts/legacy-actions/auditd/stop, which
is another shell script, that apparently tries to send SIGTERM to the daemon.
But since /usr/bin/systemctl --no-reload disable --now auditd.service was already called,
and the daemon is already stopped, then this script is not really doing anything useful, no?
So my suggestion from comment #13 still stands.

(/usr/libexec/initscripts/legacy-actions/auditd/stop does __pids_pidof auditd, which of course
is vastly less reliable than the service tracking that systemd does. We really don't need
another reimplementation of service tracking in a shell script.)

> audit system is used to meet common criteria requirements which mandate that we get the
> loginuid of anyone that attempts to manipulate the audit trail

This only works if someone calls /sbin/service? I'd expect most people nowadays try
systemctl first, which means that this tracking doesn't work anyway, right? My suggestion
would be to remove the dependency on initscripts from audit.rpm. A 'Suggests: initscripts'
tag would be appropriate. Then if somebody wishes to make calls to "/sbin/service auditd ..."
they can install the initscripts package and do that.

Comment 16 Steve Grubb 2019-09-30 22:26:52 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #15)
> Sure, OK. So /sbin/service calls
> /usr/libexec/initscripts/legacy-actions/auditd/stop, which
> is another shell script, that apparently tries to send SIGTERM to the daemon.
> But since /usr/bin/systemctl --no-reload disable --now auditd.service was
> already called,
> and the daemon is already stopped, then this script is not really doing
> anything useful, no?

Yes, it is working fine.

# ausearch --start today -m daemon_end --format text
At 16:36:51 09/30/2019 sgrubb successfully shutdown-audit 
At 16:37:37 09/30/2019 sgrubb successfully shutdown-audit 
At 16:38:10 09/30/2019 sgrubb successfully shutdown-audit 
At 16:38:42 09/30/2019 sgrubb successfully shutdown-audit 
At 16:39:57 09/30/2019 sgrubb successfully shutdown-audit 

You can try it on your system. I don't think what you describe above is actually happening. If auditd was terminated, it would not be writing the daemon_end event. Let's see what is really happening:

strace -f /sbin/service auditd stop 2>&1 | grep execve | grep -v resumed | tr '\(\"' ' ' | awk '{ print $4 }'
/bin/tty
/bin/grep
/bin/cat
/bin/basename
/bin/basename
/bin/env
/usr/libexec/initscripts/legacy-actions/auditd/stop
/bin/id
/bin/tty
/bin/grep
/bin/cat
/sbin/pidof
/usr/bin/dirname
/bin/timeout
/sbin/tail
/usr/sbin/tail
/bin/tail

I don't see any systemd related commands being called.


> So my suggestion from comment #13 still stands.
> 
> (/usr/libexec/initscripts/legacy-actions/auditd/stop does __pids_pidof
> auditd, which of course
> is vastly less reliable than the service tracking that systemd does. We
> really don't need
> another reimplementation of service tracking in a shell script.)
> 
> > audit system is used to meet common criteria requirements which mandate that we get the
> > loginuid of anyone that attempts to manipulate the audit trail
> 
> This only works if someone calls /sbin/service? I'd expect most people
> nowadays try systemctl first, which means that this tracking doesn't work anyway, right?

We intentionally block that in the service file so that they have to use the service command.
RefuseManualStop=yes


> My suggestion would be to remove the dependency on initscripts from audit.rpm. A
> 'Suggests: initscripts'
> tag would be appropriate. Then if somebody wishes to make calls to
> "/sbin/service auditd ..."
> they can install the initscripts package and do that.

I'm afraid it doesn't work that way. We have to meet requirements. This means blocking systemctl since it loses the information we need. Then we provide the legacy scripts that do what is required by certification. We have to set this up so that users have a functioning audit system if they need one.

I'm not sure what the real intent is here. Auditd has to work. It can only work if the signal is sent from the user's context. If systemctl would do what legacy actions does, then we could drop initscripts dependency. I don't care what command sends the signal, it just needs to be done from the user's context so that we see who is tampering with the audit trail. HTH...

Comment 17 Colin Walters 2019-10-24 14:44:48 UTC
So, this bug caused initscripts to be dragged back into RHEL CoreOS, and we definitely don't want anything to do with initscripts there.

Here's one approach - just don't stop auditd in the `%preun`.

Here's another: inline whatever code it is you want into this package.  It sounds like what you want boils down to sending SIGTERM to the pid directly or something?

And finally...I think nowadays `systemctl` doesn't go over the system DBus daemon, it talks to a private socket in `/run`.  So surely we could change systemd to capture the loginuid of that request, if it isn't doing so today even.

(Just strace -f systemctl stop and you'll see, while it uses the DBus *protocol* it doesn't talk to /run/dbus/system_bus_socket)

Comment 19 Steve Grubb 2019-10-30 20:08:05 UTC
(In reply to Colin Walters from comment #17)
> So, this bug caused initscripts to be dragged back into RHEL CoreOS, and we
> definitely don't want anything to do with initscripts there.
> 
> Here's one approach - just don't stop auditd in the `%preun`.

That is the fix for bug 1698130.


> Here's another: inline whatever code it is you want into this package.  It
> sounds like what you want boils down to sending SIGTERM to the pid directly
> or something?

Yes. Exactly. Why can't systemctl do that? I have no loyalty to the "service" command. However, it's the only thing that works right. If there was another standard way of interacting with daemons that did the right thing, I'd be happy to recommend it.


> And finally...I think nowadays `systemctl` doesn't go over the system DBus
> daemon, it talks to a private socket in `/run`.  So surely we could change
> systemd to capture the loginuid of that request, if it isn't doing so today
> even.

We need to cover a variety of ways that someone could tamper with the audit trail. The common denominator is the SIGTERM signal. The kernel caches information (more than auid) about who sent it so that a proper audit event can be formulated. If systemd sent it, everything about the event would be wrong because the kernel supplements what user space sends. Take a look at the service_stop events. So, searching/reporting would not work. Additionally, there would be a race between systemd sending the signal and it sending the sigterm.This then gets even messier when you consider automated scripts that stop and start immediately when the command finishes. See bug 1643567.

Comment 20 Colin Walters 2019-11-01 13:25:15 UTC
> Yes. Exactly. Why can't systemctl do that?

Because...systemd gets a lot of its power from centralizing control over the services it manages in pid 1 itself.  Take for example `systemctl status`.  That just calls out via DBus to ask pid 1 about the canonical state it knows about the service.  So it's correct.
Compare with classic initscripts, particularly pre-cgroups which would have the command itself go out and load a PID file and send a test signal.  That's very racy if the service is stopping/restarting and the PID file is rewritten, etc.

You can see a really great example of this problem in audit'd condrestart script:

/usr/libexec/initscripts/legacy-actions/auditd/stop
sleep 1
echo "Redirecting start to /bin/systemctl start auditd.service"
/bin/systemctl start auditd.service

That "sleep 1" screams "race condition", and it is.  systemd restarting a service gets this fully correct.

>  If there was another standard way of interacting with daemons that did the right thing, I'd be happy to recommend it.

No other software has this requirement.

Comment 21 Steve Grubb 2019-11-01 14:04:49 UTC
> That "sleep 1" screams "race condition", and it is.  systemd restarting a service gets this fully correct.

That was a failed attempt at working around a systemd problem. During restart, it was starting auditd without noticing that it was still alive. The sleep was replaced with a tail command and it should work better now.

> > If there was another standard way of interacting with daemons that did the right thing, I'd be happy to recommend it.
> No other software has this requirement.

Right. It's like the full set of requirements were not gathered when systemd was designed. This requirement has existed way before systemd existed. It's mandated by Common Criteria and PCI-DSS. Its worked properly since 2004. Additionally, systemd does not follow the system state requirements that have been published https://github.com/linux-audit/audit-documentation/wiki/SPEC-System-Lifecycle-Events since 2014. It is supposed to be sending AUDIT_SYSTEM_SHUTDOWN whenever its told to shutdown the system, not when its completed it and auditd is dead. This causes all audit reports of system uptime to show that the system crashed when it was a graceful shutdown. 

Going back to solving the problem, one way this could work is systemctl ask systemd to terminate auditd, it could reply back saying you do it with this signal at this pid. Then systemctl sends the specified signal to the specified pid. Systemd observes that it received a sigchld and it takes auditd off the books.

Comment 22 Colin Walters 2019-11-01 17:25:56 UTC
> Going back to solving the problem, one way this could work is systemctl ask systemd to terminate auditd, it could reply back saying you do it with this signal at this pid

It already has an API to give you the PID:

[root@coreos ~]# systemctl start rpm-ostreed
[root@coreos ~]# systemctl show -p MainPID rpm-ostreed
MainPID=2038
[root@coreos ~]# systemctl is-active rpm-ostreed
active
[root@coreos ~]# kill -TERM 2038
[root@coreos ~]# systemctl is-active rpm-ostreed
inactive
[root@coreos ~]# systemctl show -p MainPID rpm-ostreed
MainPID=0
[root@coreos ~]#

Comment 23 Colin Walters 2019-11-01 17:40:45 UTC
Do note of course that doing this introduces a race condition where the process could stop and the PID could be reused by something else.

These types of very classic Unix race conditions again are exactly why systemd is designed the way it is.

(That said in relatively modern kernels there's finally https://lwn.net/Articles/801319/ but that just pushes the race to acquiring the pidfd unless you're very careful)

Comment 24 Steve Grubb 2019-11-05 17:04:56 UTC
Colin, thanks for pointing that out. I suspect the race would be very small (3 syscalls: write, read, kill). If its a unix socket for communication, the fd can be passed to systemctl which should make the race go away.

Comment 25 Steve Grubb 2019-11-05 17:08:12 UTC
I'm going to close this issue as the original problem with chkconfig is solved with the release of audit-3.0-0.15.20191104git1c2f876. If anyone wants to follow the initscripts dependency, please join bug 1768815.


Note You need to log in before you can comment on or make changes to this bug.