Bug 488147
Summary: | no screen lock (unix_chkpwd seg fault) | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Marius Ileana <mariusid>
Component: | pam | Assignee: | Tomas Mraz <tmraz>
Status: | CLOSED NEXTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | medium | Priority: | low
Version: | 10 | CC: | jakub, mariusid, security-response-team, tmraz, vdanen
Hardware: | i686 | OS: | Linux
Keywords: | Security | Doc Type: | Bug Fix
Fixed In Version: | 1.0.4-4.fc9 | Last Closed: | 2009-04-14 15:53:03 UTC
Description
Marius Ileana
2009-03-02 21:21:17 UTC
You can make it public. It is a security related bug, but not a "Security Sensitive Bug". 10x

Pasting your /etc/pam.d/system-auth file would be helpful. Something is segfaulting, so it would be good to know your authentication sources (i.e. if you're using LDAP or NIS, etc.). FWIW, I don't get this behaviour with F10/x86_64 using local authentication, and I didn't see it with a F10/x86 box using LDAP authentication. As for a time "estimate", would you say it happens at least daily, or every few hours? More or less often?

Also please attach the /etc/nsswitch.conf. It would really help if you could get the backtrace from the crash, although I am afraid that this will not be an easy task.
I see 2 problems here:
1. the crash - I suppose this crash happens in some nss module (what is your glibc package version-release)?
2. pam_unix module doesn't guard against crash in unix_chkpwd - it should probably rather disallow access than allow it in case of the crash.
Also, when this problem with the screensaver happens to you, what happens when you try to log in on the text console?

Hi Vincent, hi Tomas,
Thanks for the quick feedback. Here's the data requested:
$ cat /etc/pam.d/system-auth
#%PAM-1.0
auth required pam_shells.so
auth required pam_unix.so likeauth nullok try_first_pass
auth optional pam_ecryptfs.so unwrap
auth required pam_nologin.so
#
account required pam_unix.so
#
password optional pam_ecryptfs.so
password required pam_passwdqc.so min=disabled,8,8,8,8 passphrase=0 enforce=users
password sufficient pam_unix.so remember=8 nullok use_authtok md5 shadow
password required pam_deny.so
#
session required pam_limits.so
session required pam_unix.so
password optional pam_ecryptfs.so unwrap
session optional pam_console.so
[snap@thoth ~]$
I'm using local authentication. This problem happens daily. Later I'll try to find in which case it starts happening. When the problem starts I think only a reboot solves it (I'll check later if a logoff/logon is a workaround or not).
Glibc version is 2.9-3:
$ rpm -q glibc
glibc-2.9-3.i686
$
I don't think that will affect the text console (I'll get back to you with other details if available). Please tell me what else I can do to help you in this case. Again, thanks a lot for the feedback.

Oh, and the content of nsswitch.conf file also:
$ cat /etc/nsswitch.conf
#
# /etc/nsswitch.conf
#
# An example Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# The entry '[NOTFOUND=return]' means that the search for an
# entry should stop if the search in the previous entry turned
# up nothing. Note that if the search failed due to some other reason
# (like no NIS server responding) then the search continues with the
# next entry.
#
# Legal entries are:
#
# nisplus or nis+ Use NIS+ (NIS version 3)
# nis or yp Use NIS (NIS version 2), also called YP
# dns Use DNS (Domain Name Service)
# files Use the local files
# db Use the local database (.db) files
# compat Use NIS on compat mode
# hesiod Use Hesiod for user lookups
# [NOTFOUND=return] Stop searching if not found so far
#
# To use db, put the "db" in front of "files" for entries you want to be
# looked up first in the databases
#
# Example:
#passwd: db files nisplus nis
#shadow: db files nisplus nis
#group: db files nisplus nis
passwd: files
shadow: files
group: files
#hosts: db files nisplus nis dns
hosts: files mdns4_minimal [NOTFOUND=return] dns
# Example - obey only what nisplus tells us...
#services: nisplus [NOTFOUND=return] files
#networks: nisplus [NOTFOUND=return] files
#protocols: nisplus [NOTFOUND=return] files
#rpc: nisplus [NOTFOUND=return] files
#ethers: nisplus [NOTFOUND=return] files
#netmasks: nisplus [NOTFOUND=return] files
bootparams: nisplus [NOTFOUND=return] files
ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files
netgroup: nisplus
publickey: nisplus
automount: files nisplus
aliases: files nisplus
$

Do you have nscd enabled and running or not?

No, Tomas, the service is stopped.
# service nscd status
nscd is stopped
#
I'm having this problem almost all the time. Another interesting thing is that things run fine only after a reboot (but not after a logoff/logon): when exiting the screensaver it prompts me for a password, and I can manually lock the screen. But after a while I discovered that even when it is working (the screensaver exits with the password dialog), if I let it run, the next screensaver exit places me back on the desktop without any password dialog. This means that unix_chkpwd starts to crash in both cases: when I manually lock the screen (which starts the screensaver, but pressing a key returns directly to the desktop) or when exiting the screensaver (pressing any key while it is running).
In case of a trace or something else, please provide me with the steps to follow and I'll try to provide you further details. Thank you.

At the time unix_chkpwd starts crashing in the screensaver, can you try to run it manually:
echo -ne '<password>\0000' | /sbin/unix_chkpwd <user> nullok ; echo $?
Replace the <password> with the password of your account and the <user> with the user name. Does it crash too? If it does, can you find any related entries in the audit log - use ausearch -x unix_chkpwd.
Does it still crash if you replace nullok with some bogus word? It should return 4 as exit value and add an audit entry and of course not crash.
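For anyone repeating the manual test above, the value printed by `echo $?` can be read with a small helper. This is only a sketch: `describe_status` is a hypothetical name, and it assumes a shell such as bash that reports a process killed by signal N as status 128+N (so a segfault, signal 11 on Linux, shows up as 139).

```shell
#!/bin/sh
# describe_status STATUS: hypothetical helper, not part of pam.
# Assumption: the shell reports "killed by signal N" as status 128+N.
describe_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "exited normally (0)"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with error code $status"
    fi
}

# Example: a segfault (SIGSEGV, signal 11) is reported as status 139.
describe_status 139
```

In the pipeline above, `$?` is the status of the last command (unix_chkpwd), so it could be passed straight to the helper.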
Well, it doesn't crash in this case:
[snap@thoth ~]$ echo -ne '******\0000' | /sbin/unix_chkpwd snap nullok ; echo $?
0
[snap@thoth ~]$
Generally, when it crashes I also get the two aforementioned messages in /var/log/messages. Of course, it didn't crash in this case, since no such messages were reported in the messages log file. And the audit log shows only old records about it, no current one:
# ausearch -i | grep unix_chkpwd | tail -n 5
type=ANOM_ABEND msg=audit(03/03/2009 21:14:30.477:92) : auid=snap uid=snap gid=snap ses=1 pid=12474 comm=unix_chkpwd sig=Segmentation fault
type=ANOM_ABEND msg=audit(03/03/2009 21:14:30.483:93) : auid=snap uid=snap gid=snap ses=1 pid=12475 comm=unix_chkpwd sig=Segmentation fault
type=ANOM_ABEND msg=audit(03/03/2009 21:23:44.502:95) : auid=snap uid=snap gid=snap ses=1 pid=12805 comm=unix_chkpwd sig=Segmentation fault
type=USER_AUTH msg=audit(03/03/2009 21:24:35.600:96) : user pid=12819 uid=snap auid=snap ses=1 msg='op=PAM:unix_chkpwd acct=snap exe=/sbin/unix_chkpwd (hostname=?, addr=?, terminal=pts/0 res=success)'
type=ANOM_ABEND msg=audit(03/03/2009 21:23:44.499:94) : auid=snap uid=snap gid=snap ses=1 pid=12804 comm=unix_chkpwd sig=Segmentation fault
#
When replacing nullok with some bogus word:
[snap@thoth ~]$ echo -ne 'id5n4pIT\0000' | /sbin/unix_chkpwd snap fakeword ; echo $?
bash: echo: write error: Broken pipe
4
[snap@thoth ~]$
indeed it returns 4 and no crash is reported: the messages log file is clean and the audit log reports:
type=ANOM_EXEC msg=audit(03/03/2009 21:30:21.705:97) : user pid=12965 uid=snap auid=snap ses=1 msg='op=PAM:unix_chkpwd acct=snap exe=/sbin/unix_chkpwd (hostname=?, addr=?, terminal=pts/0 res=failed)'
That's it for now. Thanks again Tomas.

Please download and install testing packages pam-1.0.2-2.1debug.fc10 from http://people.redhat.com/tmraz/testing
The unix_chkpwd will write debug messages to /var/log/secure when it runs.
What messages from unix_chkpwd do you see in /var/log/secure when the unix_chkpwd starts crashing?

Currently I have pam-1.0.2-2.fc10.i386 installed. But I can't find the debug version of it in your repo. I'll look in some other places too.

The URL mentioned above does not point to a YUM repository. You have to download and install the pam-1.0.2-2.1debug.fc10 packages manually with rpm.

Yes, but I need the i386 version. I see only the debug version for x86_64. I'm using i386 so I need that version. Am I missing something?
$ uname -a
Linux thoth.<domain> 2.6.27.19-170.2.35.fc10.i686.PAE #1 SMP Mon Feb 23 13:09:26 EST 2009 i686 i686 i386 GNU/Linux
$

Tomas, I do need to install the pam-1.0.2-2.1debug.fc10.i386.rpm package, right? However, I can't find it for now... Can you please upload this version too? 10x

You can download them now from http://people.redhat.com/tmraz/testing/
I had some problems building the i386 packages in mock.

Ok, so the debug version of pam is installed:
# rpm -ivh pam-1.0.2-2.1debug.fc10.i386.rpm --force
Preparing... ########################################### [100%]
1:pam ########################################### [100%]
Updating /etc/pam.d/system-auth...
#
(the other package, pam-debuginfo-1.0.2-2.1debug.fc10.i386.rpm, was not installed).
First case:
With the system resumed from hibernate (which means the problem should still be there), System > Lock Screen blanks the screen but the screensaver doesn't start (I've noticed that in the past also, but that's not a problem anyway). After leaving it a couple of seconds and then pressing a key, here are the related messages in /var/log/secure:
Mar 5 20:59:50 thoth gnome-screensaver-dialog: PAM unable to dlopen(/lib/security/pam_ecryptfs.so): /lib/security/pam_ecryptfs.so: cannot open shared object file: No such file or directory
Mar 5 20:59:50 thoth gnome-screensaver-dialog: PAM adding faulty module: /lib/security/pam_ecryptfs.so
Mar 5 20:59:50 thoth gnome-screensaver-dialog: pam_unix(gnome-screensaver:account): read unix_chkpwd output error 0: Success
Indeed, there is no /lib/security/pam_ecryptfs.so file. But as you can see, no unix_chkpwd crash or debug data yet.

Second case:
I rebooted the system because I was curious why it works after startup. It simply works as it should. Trying again to lock the screen works, of course (since, again, after a normal startup or reboot it works for a while), and these are the debug messages generated:
Mar 5 21:21:12 thoth gnome-screensaver-dialog: PAM unable to dlopen(/lib/security/pam_ecryptfs.so): /lib/security/pam_ecryptfs.so: cannot open shared object file: No such file or directory
Mar 5 21:21:12 thoth gnome-screensaver-dialog: PAM adding faulty module: /lib/security/pam_ecryptfs.so
Mar 5 21:21:12 thoth unix_chkpwd[4763]: Started.
Mar 5 21:21:12 thoth unix_chkpwd[4763]: Signals set up.
Mar 5 21:21:12 thoth unix_chkpwd[4763]: Not at tty.
Mar 5 21:21:12 thoth unix_chkpwd[4763]: User from getuidname(): snap
Mar 5 21:21:12 thoth unix_chkpwd[4763]: Passwords read: 1
Mar 5 21:21:12 thoth unix_chkpwd[4763]: Hash obtained: 0
Mar 5 21:21:12 thoth unix_chkpwd[4763]: Hash verified: 7
Mar 5 21:21:12 thoth unix_chkpwd[4763]: Password verification result: 7
Mar 5 21:21:19 thoth unix_chkpwd[4764]: Started.
Mar 5 21:21:19 thoth unix_chkpwd[4764]: Signals set up.
Mar 5 21:21:19 thoth unix_chkpwd[4764]: Not at tty.
Mar 5 21:21:19 thoth unix_chkpwd[4764]: User from getuidname(): snap
Mar 5 21:21:19 thoth unix_chkpwd[4764]: Passwords read: 1
Mar 5 21:21:19 thoth unix_chkpwd[4764]: Hash obtained: 0
Mar 5 21:21:19 thoth unix_chkpwd[4764]: Hash verified: 0
Mar 5 21:21:19 thoth unix_chkpwd[4764]: Password verification result: 0
Mar 5 21:21:19 thoth unix_chkpwd[4765]: Started.
Mar 5 21:21:19 thoth unix_chkpwd[4765]: Signals set up.
Mar 5 21:21:19 thoth unix_chkpwd[4765]: Not at tty.
Mar 5 21:21:19 thoth unix_chkpwd[4765]: User from getuidname(): snap
I didn't have the patience to wait for the issue to appear after reboot, so I installed ecryptfs-utils-61-0.fc10.i386.rpm (and trousers-0.3.1-9.fc10.i386.rpm as required) in order to have the /lib/security/pam_ecryptfs.so file. Indeed, there are no complaints about this file anymore. Unfortunately, for now there is still no unix_chkpwd crash. I'll get back to you asap in case of the next crash. If I don't get the issue again, I'll reinstall the standard pam version. I don't think that the missing .so file influenced the crash (I'll check that later, after reinstalling the standard pam version, if the crash doesn't happen).
Should I do something else? Again, many thanks. I really appreciate your support!

The issue started to happen again (alright, way later than it happened in the past, but it still happens).
Messages related to it are the following:
- in /var/log/messages:
Mar 6 12:36:47 thoth kernel: unix_chkpwd[29542] general protection ip:468978 sp:bfea2c74 error:0 in ld-2.9.so[450000+20000]
Mar 6 12:36:47 thoth kernel: unix_chkpwd[29544] general protection ip:c40900 sp:bf9b0784 error:0 in ld-2.9.so[c28000+20000]
- in /var/log/secure:
Mar 6 12:36:47 thoth gnome-screensaver-dialog: pam_unix(gnome-screensaver:account): read unix_chkpwd output error 0: Success
- in the audit log:
type=ANOM_ABEND msg=audit(03/06/2009 12:36:47.036:178) : auid=snap uid=snap gid=snap ses=1 pid=29542 comm=unix_chkpwd sig=Segmentation fault
type=ANOM_ABEND msg=audit(03/06/2009 12:36:47.039:179) : auid=snap uid=snap gid=snap ses=1 pid=29544 comm=unix_chkpwd sig=Segmentation fault
I also installed the pam-debuginfo package:
# rpm -ivh pam-debuginfo-1.0.2-2.1debug.fc10.i386.rpm
Preparing... ########################################### [100%]
1:pam-debuginfo ########################################### [100%]
#
but I think I should reference the debug versions in /etc/pam.d/system-auth, right? Should I make other changes as well to get further details? Please advise. Thank you.

Unfortunately the logs, or rather the absence of any message from unix_chkpwd in /var/log/secure, mean that the crash happens during the load of the shared libraries. I'm reassigning the bug to glibc.

Hello,
Can somebody continue on this issue please? As I said, I am willing to help on this problem since I really need to have this working properly. Thank you

The crash is likely in __libc_check_standard_fds. If a setuid/setgid/other AT_SECURE process is started without the standard file descriptors open (0, 1, 2), glibc during startup will try to open /dev/full on fd 0 and /dev/null on fd 1 and 2 (whichever is closed during exec), to avoid various exploits. The hlt insn is executed if it failed to open /dev/full resp. /dev/null, or if they aren't character devices, or if they don't have the expected major/minor number.
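The conditions described above can be checked by hand from a shell. The following is only a rough sketch of an equivalent check, not the glibc code itself: `check_node` is a hypothetical helper, it assumes GNU `stat` (whose `%t` and `%T` formats print the major and minor device numbers in hex), and it assumes the usual Linux device numbers of 1,3 for /dev/null and 1,7 for /dev/full.

```shell
#!/bin/sh
# check_node PATH MAJOR MINOR: hypothetical helper, not part of glibc or pam.
# Prints "ok" if PATH is a character device with the given device numbers,
# otherwise prints what was found instead.
check_node() {
    path=$1
    want_major=$2
    want_minor=$3
    if [ ! -e "$path" ]; then
        echo "missing"
        return 1
    fi
    if [ ! -c "$path" ]; then
        echo "not a character device"
        return 1
    fi
    # Assumes GNU stat: %t and %T are the major and minor numbers in hex.
    major=$(( 0x$(stat -c '%t' "$path") ))
    minor=$(( 0x$(stat -c '%T' "$path") ))
    if [ "$major" -ne "$want_major" ] || [ "$minor" -ne "$want_minor" ]; then
        echo "wrong device numbers $major,$minor"
        return 1
    fi
    echo "ok"
}

# Assumed Linux device numbers: /dev/null is 1,3 and /dev/full is 1,7.
echo "/dev/null: $(check_node /dev/null 1 3)"
echo "/dev/full: $(check_node /dev/full 1 7)"
```

Anything other than "ok" for either node would match one of the failure conditions listed above.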
So most likely after a while something screws up your /dev/null device, e.g. replaces it with a normal file or removes it etc. The bug is in whatever corrupts /dev/null resp. /dev/full and partially in whatever starts suid/sgid binaries knowingly with one or more of the standard file descriptors closed.

Hi Jakub, very good info, thanks a lot for the feedback.
You are right, one of the apps (I need to find out which one) corrupts the /dev/null device. After recreating it as it should be, screen locking works again. :-)
# ls -l /dev/null
-rw-r--r-- 1 root root 0 2009-03-13 00:20 /dev/null
#
# rm -f /dev/null; mknod -m 666 /dev/null c 1 3
#
# ls -l /dev/null
crw-rw-rw- 1 root root 1, 3 2009-03-13 00:21 /dev/null
#
And of course, since I still have pam-debug installed, this action is captured in /var/log/secure with the following info:
Mar 13 00:22:43 thoth unix_chkpwd[22656]: Started.
Mar 13 00:22:43 thoth unix_chkpwd[22656]: Signals set up.
Mar 13 00:22:43 thoth unix_chkpwd[22656]: Not at tty.
Mar 13 00:22:43 thoth unix_chkpwd[22656]: User from getuidname(): snap
Mar 13 00:22:43 thoth unix_chkpwd[22656]: Passwords read: 1
Mar 13 00:22:43 thoth unix_chkpwd[22656]: Hash obtained: 0
Mar 13 00:22:43 thoth unix_chkpwd[22656]: Hash verified: 7
Mar 13 00:22:43 thoth unix_chkpwd[22656]: Password verification result: 7
Mar 13 00:22:51 thoth unix_chkpwd[22662]: Started.
Mar 13 00:22:51 thoth unix_chkpwd[22662]: Signals set up.
Mar 13 00:22:51 thoth unix_chkpwd[22662]: Not at tty.
Mar 13 00:22:51 thoth unix_chkpwd[22662]: User from getuidname(): snap
Mar 13 00:22:51 thoth unix_chkpwd[22662]: Passwords read: 1
Mar 13 00:22:51 thoth unix_chkpwd[22662]: Hash obtained: 0
Mar 13 00:22:51 thoth unix_chkpwd[22662]: Hash verified: 0
Mar 13 00:22:51 thoth unix_chkpwd[22662]: Password verification result: 0
Mar 13 00:22:51 thoth unix_chkpwd[22663]: Started.
Mar 13 00:22:51 thoth unix_chkpwd[22663]: Signals set up.
Mar 13 00:22:51 thoth unix_chkpwd[22663]: Not at tty.
Mar 13 00:22:51 thoth unix_chkpwd[22663]: User from getuidname(): snap
In conclusion, I think I should run a small 'monitor' (maybe a simple C prog) to collect the list of processes from time to time, catch the change of the /dev/null character device and compare the results to discover the culprit.
From your point of view, should I still leave the pam-debug package on my system or not? I'm still willing to help in this case as long as I don't break the system: it's my main production laptop. ;-)
Thanks again. W4F.

You can leave the debug packages on the system if you don't mind the excessive logging to /var/log/secure. I will make an update package in Fedora soon anyway, but of course you will still need to find out what application messes up /dev/null.

pam-1.0.4-2.fc9 has been submitted as an update for Fedora 9. http://admin.fedoraproject.org/updates/pam-1.0.4-2.fc9

pam-1.0.4-2.fc10 has been submitted as an update for Fedora 10. http://admin.fedoraproject.org/updates/pam-1.0.4-2.fc10

pam-1.0.4-2.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update pam'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-2783

pam-1.0.4-2.fc9 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing-newkey update pam'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2009-2849

The package was updated on my system (together with this update, I removed the pam-debug version). I've noticed that /dev/null didn't get corrupted anymore, although I didn't discover which app was doing it.
Now my question is: in case /dev/null becomes corrupted again (a normal file instead of a char device), will I be able to log in to my system? Or do I have to reboot it or something to make it work?
Thanks for the feedback!

I've tried to allow this but you should test this.

pam-1.0.4-3.fc9 has been submitted as an update for Fedora 9. http://admin.fedoraproject.org/updates/pam-1.0.4-3.fc9

pam-1.0.4-3.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update pam'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-2783

pam-1.0.4-3.fc9 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing-newkey update pam'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2009-3061

pam-1.0.4-4.fc9 has been submitted as an update for Fedora 9. http://admin.fedoraproject.org/updates/pam-1.0.4-4.fc9

pam-1.0.4-4.fc10 has been submitted as an update for Fedora 10. http://admin.fedoraproject.org/updates/pam-1.0.4-4.fc10

pam-1.0.4-4.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update pam'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-3204

pam-1.0.4-4.fc9 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing-newkey update pam'.
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2009-3231

pam-1.0.4-4.fc10 has been pushed to the Fedora 10 stable repository. If problems still persist, please make note of it in this bug report.

pam-1.0.4-4.fc9 has been pushed to the Fedora 9 stable repository. If problems still persist, please make note of it in this bug report.

The app that corrupted /dev/null is the Vodafone Mobile Connect client, installed with the package vodafone-mobile-connect-1.99.17-8.noarch. I stopped using it, and the issue disappeared.
I have pam updated to the latest version: pam-1.0.4-4.fc10.i386.
Now, testing again by corrupting the /dev/null char device:
# ls -l /dev/null
crw-rw-rw- 1 root root 1, 3 2009-04-18 14:59 /dev/null
#
# rm -f /dev/null; touch /dev/null; ls -l /dev/null
-rw-r--r-- 1 root root 0 2009-04-20 15:04 /dev/null
#
"Lock Screen" works fine! No related messages are generated in /var/log/messages or /var/log/secure.
I then recreated the device as designed:
# ls -l /dev/null
-rw-r--r-- 1 root root 0 2009-04-20 15:04 /dev/null
#
# rm -f /dev/null; mknod -m 666 /dev/null c 1 3
#
# ls -l /dev/null
crw-rw-rw- 1 root root 1, 3 2009-04-20 15:12 /dev/null
#
I consider this problem solved! Again, thank you very much for the support and the quick feedback. It was a pleasure to help the community improve the experience! Keep up the good work!
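For anyone who hits the same symptom and needs to find the offending application, the 'monitor' idea raised earlier in this thread could look roughly like the sketch below. It is an illustration only: `watch_node` is a hypothetical helper, it uses a shell loop rather than the C program suggested in the thread, and polling can only narrow down when the change happened and list candidate processes, not prove which one did it.

```shell
#!/bin/sh
# watch_node PATH INTERVAL MAX_CHECKS: hypothetical polling monitor.
# Checks PATH every INTERVAL seconds, at most MAX_CHECKS times, and stops
# at the first check where PATH is not a character device.
watch_node() {
    path=$1
    interval=$2
    max_checks=$3
    count=0
    while [ "$count" -lt "$max_checks" ]; do
        if [ ! -c "$path" ]; then
            echo "ALERT $(date '+%Y-%m-%d %H:%M:%S'): $path is not a character device"
            ls -l "$path"
            # Snapshot of running processes as candidates for the culprit.
            ps -e -o pid,user,args || echo "(ps not available)"
            return 1
        fi
        count=$((count + 1))
        sleep "$interval"
    done
    echo "no change seen in $max_checks checks of $path"
}

# Example: poll /dev/null once a minute for up to a day.
# watch_node /dev/null 60 1440
```

Redirecting the output to a file outside /dev keeps the snapshot around for later comparison.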