Description of problem: In some cases systemd perhaps kill bash process, so that isn't able save it's history file. This may occur randomly perhaps when machine is rebooted too (but i'm not able simulate it now, it's only speculation). But below described sequence give me 100% results. Version-Release number of selected component (if applicable): systemd-37-19.fc16.i686 bash-4.2.24-1.fc16.i686 screen-4.1.0-0.7.20110328git8cf5ef.fc16.i686 kernel-3.3.4-3.fc16.i686.PAE Steps to Reproduce: 1. from console shell run "screen" program. The screen start bash for user. 2. in this subshell type several commands 3. then with C-a k leave "screen" 4. history file (~/.bash_history) should contain entered commands appended, but these not there. Additional info: what is interesting, when before point 3) above I tell systemd do debugging (by "/bin/kill -s SIGRTMIN+22 1" command), then bash suffice to save its history - although in /var/log/messages are no systemd debug messages (with the exception of "systemd[1]: Setting log level to debug." line). And when I then will send "/bin/kill -s SIGRTMIN+23 1" command), and then repeate steps 1-4 above, then in messages is only "systemd[1]: Setting log level to info." line, and in history file are omitted subshell commands. In F14 with upstart init this is of course working fine and history is always safe saved. This problem is quite equivalent to bug 811046 (https://bugzilla.redhat.com/show_bug.cgi?id=811046), which I reported recently.
I can't reproduce this. The history gets saved for me. (In reply to comment #0) > Steps to Reproduce: > 1. from console shell run "screen" program. The screen start bash for user. > 2. in this subshell type several commands At this point try stracing the bash process from another terminal: strace -o mytrace.txt -p $PID_OF_THE_BASH_UNDER_SCREEN > 3. then with C-a k leave "screen" > 4. history file (~/.bash_history) should contain entered commands appended, > but these not there. If the problem is reproduced, attach mytrace.txt.
See https://bugzilla.redhat.com/show_bug.cgi?id=844399 Similar issue?
Could be related. Still waiting for the information requested in comment #1.
I did: # strace -o t.t su - mythtv Support your local Search and Rescue unit -- get lost. [mythtv@recorder ~]$ logout ...killed. [root@recorder mythtv]# cat t.t t.t is attached
Created attachment 601535 [details] strace
(strace...) setregid(4294967295, 0) = 0 close(5) = 0 getuid() = 0 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 4 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=27360, si_uid=0} --- rt_sigreturn() = 4 fstat(4, {st_mode=S_IFREG|0644, st_size=2231, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe49878c000 read(4, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2231 close(4) = 0 munmap(0x7fe49878c000, 4096) = 0 getuid() = 0 sendto(3, "<86>Jul 31 16:07:33 su: pam_unix"..., 78, MSG_NOSIGNAL, NULL, 0) = 78 socket(PF_NETLINK, SOCK_RAW, 9) = -1 EPROTONOSUPPORT (Protocol not supported) socket(PF_NETLINK, SOCK_RAW, 9) = -1 EPROTONOSUPPORT (Protocol not supported) socket(PF_NETLINK, SOCK_RAW, 9) = -1 EPROTONOSUPPORT (Protocol not supported) munmap(0x7fe490cb7000, 2101296) = 0 munmap(0x7fe490870000, 2109632) = 0 munmap(0x7fe490657000, 2196352) = 0 munmap(0x7fe49043e000, 2198200) = 0 munmap(0x7fe49023c000, 2101264) = 0 munmap(0x7fe490038000, 2109616) = 0 munmap(0x7fe48fe35000, 2105496) = 0 munmap(0x7fe48fc0d000, 2118208) = 0 munmap(0x7fe48fa00000, 2149504) = 0 munmap(0x7fe48f7fd000, 2105480) = 0 munmap(0x7fe48f5f8000, 2113840) = 0 munmap(0x7fe48f3ee000, 2136776) = 0 munmap(0x7fe48f1ea000, 2109632) = 0 munmap(0x7fe48efe6000, 2111936) = 0 munmap(0x7fe48eda2000, 2374032) = 0 munmap(0x7fe48e779000, 2113872) = 0 munmap(0x7fe48e574000, 2114032) = 0 munmap(0x7fe490a74000, 2230304) = 0 rt_sigprocmask(SIG_BLOCK, [CHLD], ~[KILL ALRM TERM STOP RTMIN RT_1], 8) = 0 nanosleep({2, 0}, 0x7fff4a174120) = 0 kill(15079, SIGKILL) = -1 ESRCH (No such process) open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 4 fstat(4, {st_mode=S_IFREG|0644, st_size=2512, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe49878c000 read(4, "# Locale name alias data base.\n#"..., 4096) = 2512 read(4, "", 4096) = 0 close(4) = 0 munmap(0x7fe49878c000, 4096) = 0 open("/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/share/locale/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) write(2, " ...killed.\n", 12) = 12 close(1) = 0 close(2) = 0 exit_group(0) = ? +++ exited with 0 +++ [root@recorder mythtv]# ps -ef|grep 27360 root 15154 7583 0 16:11 pts/0 00:00:00 grep --color=auto 27360 root 27360 1 0 Jul05 ? 00:00:01 /usr/lib/systemd/systemd-logind
So we can see that the su process is sent a SIGTERM by systemd-logind. Judging from the output this is a bit too early as su should be allowed to exit normally. Same stuff could happen in a bash-related situation like this bug mentions. Can someone explain why and under what conditions a SIGTERM is sent so that su cannot complete the exit? What exactly is the function of systemd-logind? I do not recall a similar process in the pre-systemd days.
Please check the output of this command before running su in the root shell and then the same command from the shell of the mythtv user: grep "" /proc/self/{sessionid,loginuid} systemd-logind sends SIGTERM to the leader process of a user session when that session is stopping. However, 'su' is not expected to spawn user sessions. "man systemd-logind" has some basic explanation about what it is and provides a few links for more information. In pre-systemd days we had ConsoleKit.
[root@recorder xf86-video-ati]# grep "" /proc/self/{sessionid,loginuid} grep: /proc/self/sessionid: No such file or directory grep: /proc/self/loginuid: No such file or directory [root@recorder xf86-video-ati]# su - mythtv Three may keep a secret, if two of them are dead. -- Benjamin Franklin [mythtv@recorder ~]$ grep "" /proc/self/{sessionid,loginuid} grep: /proc/self/sessionid: No such file or directory grep: /proc/self/loginuid: No such file or directory [mythtv@recorder ~]$
See http://0pointer.de/blog/projects/ids.html A great way to have certain newish tools lean on needed functionality that is not advertised, not really needed, etc.
Other box: [udo@epia ~]$ grep "" /proc/self/{sessionid,loginuid} /proc/self/sessionid:12856 /proc/self/loginuid:500 [udo@epia ~]$ su - Password: [root@epia ~]# grep "" /proc/self/{sessionid,loginuid} /proc/self/sessionid:12856 /proc/self/loginuid:500 [root@epia ~]# logout ...killed. [udo@epia ~]$ So even without these the functionality is broken. Questions: Why do we need this?
Making the system work without audit support in the kernel (as on the 'recorder' machine) is of low priority to me. So I would like to focus the debugging effort on the 'epia' machine, where audit seems to be working. After using su, I'd like to see the output of: loginctl Then for each listed session, let's have a look at: loginctl session-status $SESSION
[root@epia linux]# loginctl SESSION UID USER SEAT 4 500 udo c34 500 udo 12856 500 udo 3 sessions listed. [root@epia linux]# loginctl session-status 4 4 - udo (500) Since: Thu, 05 Jul 2012 16:42:09 +0200; 3 weeks and 5 days ago Type: unspecified ; class user State: closing CGroup: name=systemd:/user/udo/4 └ 5651 ldattach 18 /dev/ttyS0 [root@epia linux]# loginctl session-status c34 c34 - udo (500) Since: Mon, 30 Jul 2012 15:15:41 +0200; 1 day and 23h ago Leader: 32098 (sshd) TTY: ssh Remote: surflank2 Service: sshd; type tty; class user State: active CGroup: name=systemd:/user/udo/c34 ├ 32098 sshd: udo [priv] ├ 32121 sshd: udo@pts/2 └ 32134 -bash [root@epia linux]# loginctl session-status 12856 12856 - udo (500) Since: Tue, 31 Jul 2012 18:54:09 +0200; 20h ago Leader: 15857 (su) TTY: pts/2 Remote: user udo Service: su-l; type tty; class user State: active CGroup: name=systemd:/user/udo/12856 ├ 15857 su - ├ 15866 -bash └ 20971 loginctl session-status 12856
(In reply to comment #13) > [root@epia linux]# loginctl session-status c34 > c34 - udo (500) > Since: Mon, 30 Jul 2012 15:15:41 +0200; 1 day and 23h ago > Leader: 32098 (sshd) > TTY: ssh > Remote: surflank2 > Service: sshd; type tty; class user This session is interesting, because its ID "c34" indicates that it was initialized while the sessionid was not set for the process. Please check /etc/pam.d/sshd. Does it call pam_loginuid.so for the session before pam_systemd.so (which is usually included via password-auth)?
[root@epia ~]# cat /etc/pam.d/sshd #%PAM-1.0 auth include system-auth account required pam_nologin.so account required pam_access.so # udo account include system-auth password include system-auth # pam_selinux.so close should be the first session rule session required pam_selinux.so close session include system-auth session required pam_loginuid.so # pam_selinux.so open should only be followed by sessions to be executed in the user context session required pam_selinux.so open env_params session optional pam_keyinit.so force revoke [root@epia ~]# cat /etc/pam.d/sshd.rpmnew #%PAM-1.0 auth required pam_sepermit.so auth substack password-auth auth include postlogin account required pam_nologin.so account include password-auth password include password-auth # pam_selinux.so close should be the first session rule session required pam_selinux.so close session required pam_loginuid.so # pam_selinux.so open should only be followed by sessions to be executed in the user context session required pam_selinux.so open env_params session optional pam_keyinit.so force revoke session include password-auth session include postlogin Yes, a seemingly newer rpmnew file lingering around. So I moved /etc/pam.d/sshd.rpmnew over /etc/pam.d/sshd.
(In reply to comment #15) > So I moved /etc/pam.d/sshd.rpmnew over /etc/pam.d/sshd. Well, did it achieve the desired result?
Only for new sessions. So I log out of the box, make a new connextion. `su -` to root. When I do a control-D then, then the su process is not killed and exits gracefully. So we must have a certain pam config and AUDIT_whatever support in the kernel to have this basic functionality, right? Or did we miss certain requirements?
(In reply to comment #17) > Only for new sessions. OK, that is expected. Thanks for testing. > So we must have a certain pam config and AUDIT_whatever support in the kernel As far as I am concerned, yes. I am not willing to put much effort into making the thing work perfectly without audit or loginuid. If anyone else wants to work on it, I'm happy to accept their patches. We haven't heard from František, but I suspect his problem with screen may have had a similar cause. I'm closing this bug as WORKSFORME, to express that the bug is not expected to happen with proper kernel and PAM configuration. Reopen if it happens anyway.
*** Bug 844399 has been marked as a duplicate of this bug. ***
So Michal, wo do you think could make the thing work perfectly without audit or loginuid? And why should systemd-logind even kill processes? (I never ever explicitly configured anything to do so)
*** Bug 811046 has been marked as a duplicate of this bug. ***
Why should systemd-logind even kill processes? No one ever explained that one.
I am seeing a very similar issue on Fedora 17 with the default pam configuration from the rpms and fedora kernel. I still see su being killed when the child process exits. It doesn't happen when I run su by hand but when it is run by some monitoring scripts being executed by a daemon. I also see that sshd is being killed when its child exits. This has left hundreds to thousands of ssh agent directories abandoned in /tmp. The root cause of these kills are coming from systemd login/logind-session.c:session_terminate_cgroup this is called when stop session is called. (gdb) bt #0 session_terminate_cgroup (s=0x1ab8350) at src/login/logind-session.c:631 #1 session_stop (s=s@entry=0x1ab8350) at src/login/logind-session.c:693 #2 0x000000000040771c in manager_pipe_notify_eof (fd=11, m=0x1aa6010) at src/login/logind.c:989 #3 manager_run (m=m@entry=0x1aa6010) at src/login/logind.c:1376 #4 0x0000000000404d6d in main (argc=<optimized out>, argv=<optimized out>) at src/login/logind.c:1447 The only way have found to prevent these premature kills is by disabling the pam_systemd module. These are the versions of the rpms: systemd-44-17.fc17.x86_64 kernel-3.5.5-2.fc17.x86_64 pam-1.1.5-7.fc17.x86_64 openssh-5.9p1-26.fc17.x86_64
Even with or without 'proper' implementation of the configuration aspects I consider it a bug that processes that are exiting are not given the time to exit at all. In my case, shells that I was logging out of were killed. This means that a normal unix action of a process, exiting itself, was interrupted for no good reason. Where does what process signal what control group? What happens when we do not do this for the stuff we mentioned? It feels like unneeded feature creep at this point that just causes issues. Same case as the swap issue and systemd. (still not fixed)