Bug 464899
| Summary: | unix_chkpwd fails with VMWare Server 2.0 on x86_64 | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Michal Piotrowski <bastian_knight> |
| Component: | pam | Assignee: | Tomas Mraz <tmraz> |
| Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 12 | CC: | bloch, charles.tryon, clarke.barry, crashradtke, dtumpic, emisca, eparis, gczarcinski, iny, jason.whipp, john.hanauer, josh.kayse, misek, paul, ptiseo, rkhadgar, tmraz, travsaf, valdis.kletnieks |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-11-26 16:09:45 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Michal Piotrowski
2008-10-01 09:05:23 UTC
I would need a strace from the login process. Do not use a valuable password when producing the strace. Use 'strace -o trace -f -s 200 -p <pid of the login process>' to produce the strace.

Created attachment 319133 [details]
strace of VMWare authentication daemon while checking password
Hi

Also here on rawhide with kernel 2.6.27-0.391.rc8.git7.fc10.x86_64 - there's some assorted speculation about this at http://communities.vmware.com/message/1068432. I have selinux enforcing and have made the change that 'tigeli' posited:

/etc/pam.d/vmware-authd.bad
#%PAM-1.0
auth required pam_unix.so shadow nullok
account required pam_unix.so

/etc/pam.d/vmware-authd.good
#%PAM-1.0
auth required pam_permit.so shadow nullok
account required pam_permit.so

This allows the VI webapp to authenticate me and apparently work as expected. If I understand correctly, pam_permit just says 'yes' to anything, so maybe that isn't as helpful as it could be...

Barry

You remove all authentication with such a PAM configuration. It will allow any user/password. Unfortunately I do not see the child process (which execs the unix_chkpwd) in the strace you have posted at all. What I see is that the password the login process is trying to send to the child process is empty. Is that right? Could you try again to produce the strace - perhaps wait a bit longer - and ensure that you see the exec of unix_chkpwd in the strace.

Tomas

This is slightly new stuff to me - I've used VMWare server products before but 2.0 uses a different management interface; previously this had been some blob that provided a GUI, but the interface is now a web-based console. The console uses TomCat with a single-sign-on implementation - I've only figured some of this stuff out today by poking around. So this could just as easily be a problem with TomCat, I don't know. When I've finished posting this and uploaded the strace log, I'll have a look to see if the error we see is a common config issue with tomcat. It does explain what vmware is doing running chkpwd though - there's a config script you need to run which asks you to allocate a user as administrator, and this is the username you need to enter in the web interface.
As far as getting an strace output, that's a little tricky as I'm not sure which process we should be tacking onto. Executing /sbin/service vmware start starts 5 processes (the init script is about 35kb and a little cryptic for me - I'm a windows admin). When the original pam.d config is in place, the vmhostd process seems to stall and this seems to be the parent of the deceased unix_chkpwd call, but the strace output for that process seems to contain not much more than 'POLLING' entries. I'm just restoring my original config and I'll try to set up a capture against the tomcat process, maybe that will help.

OK, done; I tried a login at 21:10 with password 'Tuesday_7th!' and login failure took ~60 seconds to appear in the web interface.

Incidentals: I just got a selinux denial on executing 'passwd'... I know I'm running rawhide & all but that seems a little overcautious. Is this likely to be related do you think? I'm using firefox but get this also using konqueror. I could use IE7 but that would require a running VM guest...

Created attachment 319692 [details]
strace of tomcat process whilst failing auth
captured with
strace -o /tmp/tomcat.trace -f -ff -s 200 -p 4847
where pid 4847=tomcat
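
With -ff combined with -o, strace writes one file per traced process, named after the -o path with the PID appended. The following is a minimal Python sketch for checking such a capture for the exec of unix_chkpwd. The helper names (find_execs, scan_trace_files) are hypothetical, and it assumes the exec shows up as an execve("…/unix_chkpwd", …) line, which the attachments in this bug do not confirm.

```python
import glob
import re


def exec_pattern(program):
    # Match execve("<any dir>/<program>", ...) or execve("<program>", ...).
    return re.compile(r'execve\("(?:[^"]*/)?' + re.escape(program) + r'"')


def find_execs(lines, program="unix_chkpwd"):
    """Return the strace lines that exec the given program."""
    pattern = exec_pattern(program)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_trace_files(prefix, program="unix_chkpwd"):
    """Scan every file whose name starts with prefix (e.g. the -o path used
    with strace -ff) and map file name -> matching execve lines."""
    hits = {}
    for path in sorted(glob.glob(prefix + "*")):
        with open(path, errors="replace") as handle:
            found = find_execs(handle, program)
        if found:
            hits[path] = found
    return hits


if __name__ == "__main__":
    # /tmp/tomcat.trace is the -o path from the command above.
    for path, lines in scan_trace_files("/tmp/tomcat.trace").items():
        for line in lines:
            print(path + ": " + line)
```

If the scan prints nothing, the traced PID is probably not an ancestor of the process that runs unix_chkpwd, or the capture was stopped too early.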
I tried to strace the VMWare authentication daemon again, but I cannot find any "unix_chkpwd" strings in the strace log (or I don't know what I should look for). I guess that I straced the right process. Here are the relevant lines from ps:

root 5828 5735 0 23:14 ? 00:00:00 [unix_chkpwd] <defunct>
root 5735 1 0 23:13 ? 00:00:01 /usr/lib/vmware/bin/vmware-hostd -a -d -u /etc/vmware/hostd/config.xml

I noticed in the trace log that there is a clone() operation and shortly after that the cloned process is killed. It is also visible in the attachment from comment #2. I don't know how I can debug this problem further.

I just wanted to add myself to the list and offer any help needed. I am running a fresh install of VM Server 2 on a fresh, fully updated install of F9 x86_64 (2.6.26.5-45). With SELinux enabled, I can't fire up the web-based UI under root. Logon is denied. Same zombie process (unix_chkpwd) can be found. Disable SELinux, and I can get in. Put it in Permissive mode, and it fails. Currently in Disabled mode.

The attachment from comment #6 is actually not a strace. Can someone produce strace with SELinux disabled and then with SELinux enabled? I'm really curious why the exec() doesn't appear in the traces when the unix_chkpwd is definitely getting run.

Comment on attachment 319692 [details]
strace of tomcat process whilst failing auth
Broken file...
Created attachment 319783 [details]
Traces with SELinux enabled and disabled
These are traces of VMWare authd with and without SELinux enabled.
Although the traces do not contain the unix_chkpwd trace it seems to me that this process is executed fine and exits with 0 but the SIGCHLD is delivered to a different process (thread?). Unfortunately I don't think anything can be done in the pam_unix code about this. The use of PAM in multithreaded daemons was always a gray area with potentially unsupported things.

(In reply to comment #3)
> /etc/pam.d/vmware-authd
> #%PAM-1.0
> auth required pam_permit.so shadow nullok
> account required pam_permit.so

Thanks, this worked for me too.

Linux Kernel 2.6.26.6-79.fc9.x86_64
VMware Infrastructure Web Access Version 2.0.0 Build 122589
VMware Server Version 2.0.0 Build 122956

(In reply to comment #13)
> (In reply to comment #3)
> > /etc/pam.d/vmware-authd
> > #%PAM-1.0
> > auth required pam_permit.so shadow nullok
> > account required pam_permit.so
>
> Thanks, this worked for me too.
>
> Linux Kernel 2.6.26.6-79.fc9.x86_64
> VMware Infrastructure Web Access Version 2.0.0 Build 122589
> VMware Server Version 2.0.0 Build 122956

It is a really BAD idea to use this configuration - it means you just disable authentication completely. Everyone is allowed with any password.

(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #3)
...
> It is a really BAD idea to use this configuration - it means you just disable
> authentication completely. Everyone is allowed with any password.

Clarification: On a single user machine with dual firewalls up (router + local host), thus prohibiting connections both from WAN and LAN to the machine, it makes more sense to have SELinux enabled than to have pam authentication running with vmware. This is just a minor inconvenience. If the above is NOT true then yeah it would be a foolish approach.

ok, let's find out what is exec'ing unix_chkpwd and how selinux is getting in the way. Can we get a copy of the following with selinux enforcing?

auditctl -w /sbin/unix_chkpwd
[reproduce problem]
service auditd restart
ausearch -ts recent

I don't think the problem is with SELinux. The problem is with a multithreaded application calling PAM in such a way that the SIGCHLD signal is lost somehow.

Comment #8 is what I'm wondering about...

With SELinux enabled, I can't fire up the web-based UI under root. Logon is denied. Same zombie process (unix_chkpwd) can be found. Disable SELinux, and I can get in. Put it in Permissive mode, and it fails.

I really want to know what operation selinux could possibly be denying in permissive. (I don't want to say I don't believe you, but I find it highly unlikely)

pam_unix doesn't exec unix_chkpwd when SELinux is completely disabled. But it will exec it both with permissive and enforcing modes.

I think this must be fixed in the VMWare server so the SIGCHLD is properly handled (SIG_DFL) and so the waitpid() call in the pam_unix module will not fail.

(In reply to comment #16)
> ok, let's find out what is exec'ing unix_chkpwd and how selinux is getting in
> the way. Can we get a copy of the following with selinux enforcing?
>
> auditctl -w /sbin/unix_chkpwd
> [reproduce problem]
> service auditd restart
>
> ausearch -ts recent

time->Mon Dec 22 20:08:00 2008
type=CONFIG_CHANGE msg=audit(1229976480.687:15093): auid=500 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op=add rule key=(null) list=4 res=1
----
time->Mon Dec 22 20:08:30 2008
type=PATH msg=audit(1229976510.198:15094): item=1 name=(null) inode=98452 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ld_so_t:s0
type=PATH msg=audit(1229976510.198:15094): item=0 name="/sbin/unix_chkpwd" inode=229624 dev=fd:00 mode=0104755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:chkpwd_exec_t:s0
type=CWD msg=audit(1229976510.198:15094): cwd="/var/log/vmware"
type=EXECVE msg=audit(1229976510.198:15094): argc=3 a0="/sbin/unix_chkpwd" a1="paul" a2="nullok"
type=SYSCALL msg=audit(1229976510.198:15094): arch=c000003e syscall=59 success=yes exit=0 a0=46406d8 a1=41112840 a2=4846748 a3=3b69967a70 items=2 ppid=3719 pid=29979 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="unix_chkpwd" exe="/sbin/unix_chkpwd" subj=system_u:system_r:initrc_t:s0 key=(null)
----
time->Mon Dec 22 20:09:30 2008
type=DAEMON_END msg=audit(1229976570.964:7129): auditd normal halt, sending auid=500 pid=30010 subj=unconfined_u:system_r:initrc_t:s0 res=success
----
time->Mon Dec 22 20:09:32 2008
type=DAEMON_START msg=audit(1229976572.168:5785): auditd start, ver=1.7.5 format=raw kernel=2.6.27.5-37.fc9.x86_64 auid=500 pid=30028 res=success

(In reply to comment #19)
> pam_unix doesn't exec unix_chkpwd when SELinux is completely disabled. But it
> will exec it both with permissive and enforcing modes.
>
> I think this must be fixed in the VMWare server so the SIGCHLD is properly
> handled (SIG_DFL) and so the waitpid() call in the pam_unix module will not
> fail.

Any thoughts on why this only seems to affect x86_64 and not i386 users?
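
To illustrate why the SIGCHLD disposition matters to a waitpid() caller such as pam_unix, here is a minimal Python sketch. It is only an illustration under stated assumptions: the thread does not establish what VMware actually does with SIGCHLD, and the helper name wait_for_child is hypothetical. It shows one well-known Linux behaviour: when SIGCHLD is ignored, exited children are reaped automatically and waitpid() fails with ECHILD instead of returning the exit status. This does not reproduce the defunct unix_chkpwd seen in this bug.

```python
import os
import signal


def wait_for_child(disposition):
    """Fork a child that exits with code 7, then try to collect its status
    while SIGCHLD has the given disposition.

    Returns the child's exit code, or the string "ECHILD" when waitpid()
    reports that there is no child left to wait for.
    """
    previous = signal.signal(signal.SIGCHLD, disposition)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(7)  # child: exit immediately, skipping cleanup handlers
        try:
            _, status = os.waitpid(pid, 0)
        except ChildProcessError:  # errno ECHILD
            return "ECHILD"
        return os.WEXITSTATUS(status)
    finally:
        # Restore the earlier disposition; signal.signal() may have returned
        # None if the handler was not installed from Python.
        restore = previous if previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGCHLD, restore)


if __name__ == "__main__":
    print("SIG_DFL:", wait_for_child(signal.SIG_DFL))
    print("SIG_IGN:", wait_for_child(signal.SIG_IGN))
```

A library that forks a helper and then calls waitpid() has no control over the host application's signal setup, which is why the fix is suggested on the application side.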
Just tested on the new VMWare build (VMware-server-2.0.1-156745.x86_64) and it has the same problem.

I am seeing this problem too on the latest F10 Kernel:

2.6.27.21-170.2.56.fc10.x86_64
VMware-server-2.0.1-156745.x86_64

I have SELinux set to "Permissive" mode. I see the same thing with the zombied unix_chkpwd process. I've also seen instances where the shutdown process hangs because the rc.d script can't get to the administrative process to shut down the "running" VMs. I have not reproduced this behavior yet, but I'm working on trying to narrow it down.

With VMware Server 2.0.1 (156745.x86_64) installed and configured in the init scripts to start at system boot time, you cannot (1) do a clean shutdown, because the shutdown script attempts to contact the admin process to shut down VMs, or (2) uninstall VMware Server, because the uninstall script apparently tries to do the same thing. This is true regardless of the SELinux settings (enforcing, permissive or disabled). The workaround is to disable the vmware script for your runlevel, and THEN uninstall.

POSSIBLE WORKAROUND:

Hummm... One of the options in the VMware server configuration script is the administrative user. This defaults to the root user.

If you change this to an ordinary user, then you do NOT see the problem with the system hanging up when you try to log in. There are some other problems with permissions when you try to CREATE a new VM, but this sounds like permissions problems in the VMware process.

(In reply to comment #24)
> POSSIBLE WORKAROUND:
>
> Hummm... One of the options in the VMware server configuration script is the
> administrative user. This defaults to the root user.
>
> If you change this to an ordinary user, then you do NOT see the problem with
> the system hanging up when you try to log in. There are some other problems
> with permissions when you try to CREATE a new VM, but this sounds like
> permissions problems in the VMware process.

I've always used an ordinary user, still can't log in. In the process list you see: unix_chkpwd <defunct>. I've actually never tried leaving that option blank, I've always set this to a particular person and then changed it as needed. My understanding is that root is always an admin (in terms of VMware) whether you select an ordinary user or not.

This was tested today with the following:

Fedora 10
2.6.27.21-170.2.56.fc10.x86_64
VMware-server-2.0.1-156745.x86_64
SE - Permissive

-Jase

I read through this entire thread and it's very interesting stuff.

I tried changing the /etc/pam.d/vmware-authd (didn't work)
tried re-installing (didn't work)

So... even though I didn't believe this would help... I changed my SELINUX config to disabled and now I am able to connect.

Here is my config.

VMware Infrastructure Web Access Version 2.0.0 Build 128374
VMware Server Version 2.0.1 Build 156745

[root@seraph ~]# uname -a
Linux seraph.matrix.xxx.com 2.6.27.24-170.2.68.fc10.x86_64 #1 SMP Wed May 20 22:47:23 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

... thanks much for everyone's diligence and suggestions... I was literally one reboot away from throwing Suse back on it to proceed. I had spent a while mis-diagnosing and doing sloppy/horrible troubleshooting to try and find my way through this problem.

(In reply to comment #26)
> I read through this entire thread and it's very interesting stuff.
>
> I tried
> changing the /etc/pam.d/vmware-authd (didn't work)

Using pam_permit.so works reliably for me, but...

> tried re-installing (didn't work)

you have to remember to kill the existing vmware processes and restart them after making that change, and you have to remember to edit /etc/pam.d/vmware-authd after every time you run vmware-config.pl because doing that reinstalls the original /etc/pam.d/vmware-authd.

> So... even though I didn't believe this would help... I changed my SELINUX
> config to disabled and now I am able to connect.
Disabling SELinux is not an option for me, which is why I'm prepared to live with the pam_permit.so workaround until I can move from vmware to a kvm-based virt solution.

This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Any update? I have this type of issue on Fedora 11 x86_64 with vmware server 2.0.1.

# rpm -qa|grep VMw
VMware-server-2.0.1-156745.x86_64
[root@gcalisse ~]# rpm -qa|grep pam
fprintd-pam-0.1-9.git04fd09cfa.fc11.x86_64
pam_passwdqc-1.0.5-2.x86_64
pam_smb-1.1.7-10.fc11.x86_64
pam-1.0.91-6.fc11.x86_64
pam_pkcs11-0.5.3-28.x86_64
pam_ccreds-7-4.fc11.x86_64
spambayes-1.0.4-7.fc11.noarch
pam_krb5-2.3.4-1.fc11.x86_64
[root@gcalisse ~]# ps aux|grep uni
root 7919 0.0 0.0 0 0 ? Z 16:35 0:00 [unix_chkpwd] <defunct>
root 7946 0.0 0.0 91076 848 pts/0 S+ 16:41 0:00 grep uni
# ps aux|grep vm
root 6065 0.3 1.2 226184 48528 ? Ssl 16:33 0:01 /usr/lib/vmware/bin/vmware-hostd -a -d -u /etc/vmware/hostd/config.xml
root 7473 0.0 0.0 87072 524 ? Ss 16:35 0:00 /usr/bin/vmnet-bridge -d /var/run/vmnet-bridge-0.pid -n 0 -i eth0
root 7484 0.0 0.0 93868 472 ? Ss 16:35 0:00 /usr/bin/vmnet-dhcpd -cf /etc/vmware/vmnet1/dhcpd/dhcpd.conf -lf /etc/vmware/vmnet1/dhcpd/dhcpd.leases -pf /var/run/vmnet-dhcpd-vmnet1.pid vmnet1
root 7748 0.0 0.0 93956 776 ? Ss 16:35 0:00 /usr/sbin/vmware-authdlauncher
root 7758 0.0 0.0 91840 1296 pts/0 S 16:35 0:00 /bin/sh /usr/bin/vmware-watchdog -s webAccess -u 30 -q 5 /usr/lib/vmware/webAccess/java/jre1.5.0_15/bin/webAccess -client -Xmx64m -XX:MinHeapFreeRatio=30 -XX:MaxHeapFreeRatio=30 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/common/endorsed -classpath /usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/bin/bootstrap.jar:/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/bin/commons-logging-api.jar -Dcatalina.base=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16 -Dcatalina.home=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16 -Djava.io.tmpdir=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/temp org.apache.catalina.startup.Bootstrap start
root 7770 3.6 2.4 1384636 96784 ? Ssl 16:35 0:12 /usr/lib/vmware/webAccess/java/jre1.5.0_15/bin/webAccess -client -Xmx64m -XX:MinHeapFreeRatio=30 -XX:MaxHeapFreeRatio=30 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/common/endorsed -classpath /usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/bin/bootstrap.jar:/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/bin/commons-logging-api.jar -Dcatalina.base=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16 -Dcatalina.home=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16 -Djava.io.tmpdir=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/temp org.apache.catalina.startup.Bootstrap start
root 7890 0.0 0.0 87048 308 ? Ss 16:35 0:00 /usr/bin/vmnet-netifup -d /var/run/vmnet-netifup-vmnet1.pid /dev/vmnet1 vmnet1
root 7948 0.0 0.0 91076 840 pts/0 S+ 16:41 0:00 grep vm

(In reply to comment #27)
> Disabling SELinux is not an option for me, which is why I'm prepared to live
> with the pam_permit.so workaround until I can move from vmware to a kvm-based
> virt solution.

I was not willing to disable SELinux either. Below is the solution I came up with:

1. Download VMware-server 1 tarball
2. tar -xvf VMware-server-1.0.7-108231.tar.gz
3. cp /home/tclark/Download/vmware/vmware-server-distrib/lib/lib/libpam.so.0/security/pam_unix.so /lib/security/pam_unix_vm.so
4. vi /etc/pam.d/vmware-authd
#%PAM-1.0
auth required pam_unix_vm.so shadow nullok
account required pam_unix_vm.so
5. On a remote Linux machine with a less secure hash "grep tclark /etc/shadow" resulted in:
tclark:1234567890123456789012345678901234:14377:0:99999:7:::
6. On the PC I was trying to install VMware on, "grep tclark /etc/shadow" resulted in:
tclark:1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456:14377:0:99999:7:::
7. vi /etc/shadow
Changed from:
tclark:1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456:14377:0:99999:7:::
to:
tclark:1234567890123456789012345678901234:14377:0:99999:7:::
8. service vmware restart

I was then able to successfully login.

I can confirm that setting SELinux to disabled allows login to web GUI. I was experiencing the same [unix_chkpwd] <defunct> as others. The only modification I made was to change selinux, reboot and all works. This seemed to also trigger the file not_configured to appear in /etc/vmware. This forces the user to rerun vmware-config.pl ad nauseam.

uname -a
Linux localhost.localdoman 2.6.27.30-170.2.82.fc10.x86_64 #1 SMP Mon Aug 17 08:18:34 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

rpm -qa |grep VMware
VMware-server-2.0.1-156745.x86_64

rpm -qa |grep selinux
libselinux-python-2.0.78-1.fc10.x86_64
libselinux-2.0.78-1.fc10.i386
selinux-policy-targeted-3.5.13-70.fc10.noarch
libselinux-utils-2.0.78-1.fc10.x86_64
selinux-policy-3.5.13-70.fc10.noarch
libselinux-devel-2.0.78-1.fc10.x86_64
libselinux-2.0.78-1.fc10.x86_64

I would be happy to do any testing if this is still an active bug.

It's the same on F11 as well (all updates current as of a couple days ago). The pam_permit hack didn't work for me, but rebooting with selinux disabled did. I only needed to use it once before converting to KVM, so not an ongoing problem for me, but definitely still there. I didn't see any AVC denial pop-ups.

This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle. Changing version to '12'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Still a problem with F12, same hack to make it work. Side note: VMware tomcat is getting increasingly unstable with each kernel too.

This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

No longer an issue.