821254 – systemd kill bash before this is able save it's history file

Bug 821254 - systemd kill bash before this is able save it's history file

Summary: systemd kill bash before this is able save it's history file

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	systemd
Sub Component:
Version:	16
Hardware:	i686
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	systemd-maint
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	811046 844399 (view as bug list)
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2012-05-13 15:32 UTC by Frantisek Hanzlik
Modified:	2012-10-12 11:50 UTC (History)
CC List:	8 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-08-01 16:35:08 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
strace (48.72 KB, text/plain) 2012-07-31 14:09 UTC, udo	no flags	Details
View All

Description Frantisek Hanzlik 2012-05-13 15:32:00 UTC

Description of problem:
In some cases systemd perhaps kill bash process, so that isn't able save it's
history file. This may occur randomly perhaps when machine is rebooted too (but
i'm not able simulate it now, it's only speculation).
But below described sequence give me 100% results.

Version-Release number of selected component (if applicable):
systemd-37-19.fc16.i686
bash-4.2.24-1.fc16.i686
screen-4.1.0-0.7.20110328git8cf5ef.fc16.i686
kernel-3.3.4-3.fc16.i686.PAE

Steps to Reproduce:
1. from console shell run "screen" program. The screen start bash for user.
2. in this subshell type several commands
3. then with C-a k leave "screen"
4. history file (~/.bash_history) should contain entered commands appended, but
these not there.

Additional info:
what is interesting, when before point 3) above I tell systemd do
debugging (by "/bin/kill -s SIGRTMIN+22 1" command), then bash suffice to save
its history - although in /var/log/messages are no systemd debug messages (with the exception of "systemd[1]: Setting log level to debug." line).
And when I then will send "/bin/kill -s SIGRTMIN+23 1" command), and then
repeate steps 1-4 above, then in messages is only "systemd[1]: Setting log
level to info." line, and in history file are omitted subshell commands.

In F14 with upstart init this is of course working fine and history is always safe saved.

This problem is quite equivalent to bug 811046 (https://bugzilla.redhat.com/show_bug.cgi?id=811046), which I reported recently.

Comment 1 Michal Schmidt 2012-06-15 09:53:23 UTC

I can't reproduce this. The history gets saved for me.

(In reply to comment #0)
> Steps to Reproduce:
> 1. from console shell run "screen" program. The screen start bash for user.
> 2. in this subshell type several commands

At this point try stracing the bash process from another terminal:
strace -o mytrace.txt -p $PID_OF_THE_BASH_UNDER_SCREEN

> 3. then with C-a k leave "screen"
> 4. history file (~/.bash_history) should contain entered commands appended,
> but these not there.

If the problem is reproduced, attach mytrace.txt.

Comment 2 udo 2012-07-31 13:07:54 UTC

See https://bugzilla.redhat.com/show_bug.cgi?id=844399 
Similar issue?

Comment 3 Michal Schmidt 2012-07-31 13:29:27 UTC

Could be related.

Still waiting for the information requested in comment #1.

Comment 4 udo 2012-07-31 14:08:48 UTC

I did:

# strace -o t.t su - mythtv 
Support your local Search and Rescue unit -- get lost.
[mythtv@recorder ~]$ logout
 ...killed.
[root@recorder mythtv]# cat t.t

t.t is attached

Comment 5 udo 2012-07-31 14:09:56 UTC

Created attachment 601535 [details]
strace

Comment 6 udo 2012-07-31 14:11:51 UTC

(strace...)
setregid(4294967295, 0)                 = 0
close(5)                                = 0
getuid()                                = 0
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 4
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=27360, si_uid=0} ---
rt_sigreturn()                          = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=2231, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe49878c000
read(4, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2231
close(4)                                = 0
munmap(0x7fe49878c000, 4096)            = 0
getuid()                                = 0
sendto(3, "<86>Jul 31 16:07:33 su: pam_unix"..., 78, MSG_NOSIGNAL, NULL, 0) = 78
socket(PF_NETLINK, SOCK_RAW, 9)         = -1 EPROTONOSUPPORT (Protocol not supported)
socket(PF_NETLINK, SOCK_RAW, 9)         = -1 EPROTONOSUPPORT (Protocol not supported)
socket(PF_NETLINK, SOCK_RAW, 9)         = -1 EPROTONOSUPPORT (Protocol not supported)
munmap(0x7fe490cb7000, 2101296)         = 0
munmap(0x7fe490870000, 2109632)         = 0
munmap(0x7fe490657000, 2196352)         = 0
munmap(0x7fe49043e000, 2198200)         = 0
munmap(0x7fe49023c000, 2101264)         = 0
munmap(0x7fe490038000, 2109616)         = 0
munmap(0x7fe48fe35000, 2105496)         = 0
munmap(0x7fe48fc0d000, 2118208)         = 0
munmap(0x7fe48fa00000, 2149504)         = 0
munmap(0x7fe48f7fd000, 2105480)         = 0
munmap(0x7fe48f5f8000, 2113840)         = 0
munmap(0x7fe48f3ee000, 2136776)         = 0
munmap(0x7fe48f1ea000, 2109632)         = 0
munmap(0x7fe48efe6000, 2111936)         = 0
munmap(0x7fe48eda2000, 2374032)         = 0
munmap(0x7fe48e779000, 2113872)         = 0
munmap(0x7fe48e574000, 2114032)         = 0
munmap(0x7fe490a74000, 2230304)         = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[KILL ALRM TERM STOP RTMIN RT_1], 8) = 0
nanosleep({2, 0}, 0x7fff4a174120)       = 0
kill(15079, SIGKILL)                    = -1 ESRCH (No such process)
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=2512, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe49878c000
read(4, "# Locale name alias data base.\n#"..., 4096) = 2512
read(4, "", 4096)                       = 0
close(4)                                = 0
munmap(0x7fe49878c000, 4096)            = 0
open("/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, " ...killed.\n", 12)           = 12
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++
[root@recorder mythtv]# ps -ef|grep 27360
root     15154  7583  0 16:11 pts/0    00:00:00 grep --color=auto 27360
root     27360     1  0 Jul05 ?        00:00:01 /usr/lib/systemd/systemd-logind

Comment 7 udo 2012-07-31 15:38:39 UTC

So we can see that the su process is sent a SIGTERM by systemd-logind.
Judging from the output this is a bit too early as su should be allowed to exit normally.
Same stuff could happen in a bash-related situation like this bug mentions.

Can someone explain why and under what conditions a SIGTERM is sent so that su cannot complete the exit?

What exactly is the function of systemd-logind? I do not recall a similar process in the pre-systemd days.

Comment 8 Michal Schmidt 2012-07-31 16:41:41 UTC

Please check the output of this command before running su in the root shell and then the same command from the shell of the mythtv user:
  grep "" /proc/self/{sessionid,loginuid}

systemd-logind sends SIGTERM to the leader process of a user session when that session is stopping. However, 'su' is not expected to spawn user sessions.

"man systemd-logind" has some basic explanation about what it is and provides a few links for more information. In pre-systemd days we had ConsoleKit.

Comment 9 udo 2012-07-31 16:45:16 UTC

[root@recorder xf86-video-ati]# grep "" /proc/self/{sessionid,loginuid}
grep: /proc/self/sessionid: No such file or directory
grep: /proc/self/loginuid: No such file or directory
[root@recorder xf86-video-ati]# su - mythtv
Three may keep a secret, if two of them are dead.
		-- Benjamin Franklin
[mythtv@recorder ~]$ grep "" /proc/self/{sessionid,loginuid}
grep: /proc/self/sessionid: No such file or directory
grep: /proc/self/loginuid: No such file or directory
[mythtv@recorder ~]$

Comment 10 udo 2012-07-31 16:47:05 UTC

See http://0pointer.de/blog/projects/ids.html
A great way to have certain newish tools lean on needed functionality that is not advertised, not really needed, etc.

Comment 11 udo 2012-07-31 16:48:38 UTC

Other box:

[udo@epia ~]$ grep "" /proc/self/{sessionid,loginuid}
/proc/self/sessionid:12856
/proc/self/loginuid:500
[udo@epia ~]$ su -
Password: 
[root@epia ~]# grep "" /proc/self/{sessionid,loginuid}
/proc/self/sessionid:12856
/proc/self/loginuid:500
[root@epia ~]# logout
 ...killed.
[udo@epia ~]$ 

So even without these the functionality is broken.
Questions:
Why do we need this?

Comment 12 Michal Schmidt 2012-08-01 09:18:02 UTC

Making the system work without audit support in the kernel (as on the 'recorder' machine) is of low priority to me. So I would like to focus the debugging effort on the 'epia' machine, where audit seems to be working.

After using su, I'd like to see the output of:
  loginctl
Then for each listed session, let's have a look at:
  loginctl session-status $SESSION

Comment 13 udo 2012-08-01 13:04:31 UTC

[root@epia linux]# loginctl
   SESSION        UID USER             SEAT            
         4        500 udo                              
       c34        500 udo                              
     12856        500 udo                              

3 sessions listed.
[root@epia linux]# loginctl session-status 4
4 - udo (500)
	   Since: Thu, 05 Jul 2012 16:42:09 +0200; 3 weeks and 5 days ago
	    Type: unspecified
; class user	   State: closing
	  CGroup: name=systemd:/user/udo/4
		  └ 5651 ldattach 18 /dev/ttyS0
[root@epia linux]# loginctl session-status c34
c34 - udo (500)
	   Since: Mon, 30 Jul 2012 15:15:41 +0200; 1 day and 23h ago
	  Leader: 32098 (sshd)
	     TTY: ssh
	  Remote: surflank2
	 Service: sshd; type tty; class user
	   State: active
	  CGroup: name=systemd:/user/udo/c34
		  ├ 32098 sshd: udo [priv]
		  ├ 32121 sshd: udo@pts/2
		  └ 32134 -bash
[root@epia linux]# loginctl session-status 12856
12856 - udo (500)
	   Since: Tue, 31 Jul 2012 18:54:09 +0200; 20h ago
	  Leader: 15857 (su)
	     TTY: pts/2
	  Remote: user udo
	 Service: su-l; type tty; class user
	   State: active
	  CGroup: name=systemd:/user/udo/12856
		  ├ 15857 su -
		  ├ 15866 -bash
		  └ 20971 loginctl session-status 12856

Comment 14 Michal Schmidt 2012-08-01 14:06:53 UTC

(In reply to comment #13)
> [root@epia linux]# loginctl session-status c34
> c34 - udo (500)
> 	   Since: Mon, 30 Jul 2012 15:15:41 +0200; 1 day and 23h ago
> 	  Leader: 32098 (sshd)
> 	     TTY: ssh
> 	  Remote: surflank2
> 	 Service: sshd; type tty; class user

This session is interesting, because its ID "c34" indicates that it was initialized while the sessionid was not set for the process.

Please check /etc/pam.d/sshd. Does it call pam_loginuid.so for the session before pam_systemd.so (which is usually included via password-auth)?

Comment 15 udo 2012-08-01 14:21:01 UTC

[root@epia ~]# cat /etc/pam.d/sshd 
#%PAM-1.0
auth       include      system-auth
account    required     pam_nologin.so
account    required     pam_access.so # udo
account    include      system-auth
password   include      system-auth
# pam_selinux.so close should be the first session rule
session    required     pam_selinux.so close
session    include      system-auth
session    required     pam_loginuid.so
# pam_selinux.so open should only be followed by sessions to be executed in the user context
session    required     pam_selinux.so open env_params
session    optional     pam_keyinit.so force revoke
[root@epia ~]# cat /etc/pam.d/sshd.rpmnew 
#%PAM-1.0
auth	   required	pam_sepermit.so
auth       substack     password-auth
auth       include      postlogin
account    required     pam_nologin.so
account    include      password-auth
password   include      password-auth
# pam_selinux.so close should be the first session rule
session    required     pam_selinux.so close
session    required     pam_loginuid.so
# pam_selinux.so open should only be followed by sessions to be executed in the user context
session    required     pam_selinux.so open env_params
session    optional     pam_keyinit.so force revoke
session    include      password-auth
session    include      postlogin

Yes, a seemingly newer rpmnew file lingering around.
So I moved /etc/pam.d/sshd.rpmnew over /etc/pam.d/sshd.

Comment 16 Michal Schmidt 2012-08-01 14:43:06 UTC

(In reply to comment #15)
> So I moved /etc/pam.d/sshd.rpmnew over /etc/pam.d/sshd.

Well, did it achieve the desired result?

Comment 17 udo 2012-08-01 14:51:53 UTC

Only for new sessions.
So I log out of the box, make a new connextion.
`su -` to root.
When I do a control-D then, then the su process is not killed and exits gracefully.

So we must have a certain pam config and AUDIT_whatever support in the kernel to have this basic functionality, right?
Or did we miss certain requirements?

Comment 18 Michal Schmidt 2012-08-01 16:35:08 UTC

(In reply to comment #17)
> Only for new sessions.

OK, that is expected. Thanks for testing.

> So we must have a certain pam config and AUDIT_whatever support in the kernel

As far as I am concerned, yes. I am not willing to put much effort into making the thing work perfectly without audit or loginuid. If anyone else wants to work on it, I'm happy to accept their patches.


We haven't heard from František, but I suspect his problem with screen may have had a similar cause. I'm closing this bug as WORKSFORME, to express that the bug is not expected to happen with proper kernel and PAM configuration. Reopen if it happens anyway.

Comment 19 Michal Schmidt 2012-08-01 16:38:15 UTC

*** Bug 844399 has been marked as a duplicate of this bug. ***

Comment 20 udo 2012-08-02 14:26:20 UTC

So Michal, wo do you think could make the thing work perfectly without audit or loginuid?

And why should systemd-logind even kill processes?
(I never ever explicitly configured anything to do so)

Comment 21 Jindrich Novy 2012-09-11 16:20:41 UTC

*** Bug 811046 has been marked as a duplicate of this bug. ***

Comment 22 udo 2012-09-11 16:26:11 UTC

Why should systemd-logind even kill processes?
No one ever explained that one.

Comment 23 Nate Clark 2012-10-11 18:11:45 UTC

I am seeing a very similar issue on Fedora 17 with the default pam configuration from the rpms and fedora kernel.

I still see su being killed when the child process exits. It doesn't happen when I run su by hand but when it is run by some monitoring scripts being executed by a daemon.

I also see that sshd is being killed when its child exits. This has left hundreds to thousands of ssh agent directories abandoned in /tmp.

The root cause of these kills are coming from systemd login/logind-session.c:session_terminate_cgroup this is called when stop session is called.

(gdb) bt
#0  session_terminate_cgroup (s=0x1ab8350) at src/login/logind-session.c:631
#1  session_stop (s=s@entry=0x1ab8350) at src/login/logind-session.c:693
#2  0x000000000040771c in manager_pipe_notify_eof (fd=11, m=0x1aa6010) at src/login/logind.c:989
#3  manager_run (m=m@entry=0x1aa6010) at src/login/logind.c:1376
#4  0x0000000000404d6d in main (argc=<optimized out>, argv=<optimized out>) at src/login/logind.c:1447

The only way have found to prevent these premature kills is by disabling the pam_systemd module.

These are the versions of the rpms:
systemd-44-17.fc17.x86_64
kernel-3.5.5-2.fc17.x86_64
pam-1.1.5-7.fc17.x86_64
openssh-5.9p1-26.fc17.x86_64

Comment 24 udo 2012-10-12 11:50:16 UTC

Even with or without 'proper' implementation of the configuration aspects I consider it a bug that processes that are exiting are not given the time to exit at all.
In my case, shells that I was logging out of were killed.
This means that a normal unix action of a process, exiting itself, was interrupted for no good reason.
Where does what process signal what control group?
What happens when we do not do this for the stuff we mentioned?
It feels like unneeded feature creep at this point that just causes issues.
Same case as the swap issue and systemd. (still not fixed)

Note You need to log in before you can comment on or make changes to this bug.