Bug 434842

Summary:	local logins fail when network connection lost
Product:	Red Hat Enterprise Linux 4
Component:	nss_ldap
Version:	4.6
Status:	CLOSED WONTFIX
Severity:	medium
Priority:	low
Reporter:	Corporate UNIX <corporateunix>
Assignee:	Nalin Dahyabhai <nalin>
CC:	dossy, jplans, jsafrane
Keywords:	Reopened
Target Milestone:	rc
Hardware:	i386
OS:	Linux
Doc Type:	Bug Fix
Last Closed:	2012-06-20 13:28:00 UTC

Description Corporate UNIX 2008-02-25 19:46:21 UTC

Description of problem:
During a recent network failure, local logins were not possible. The LDAP server
was unreachable, and attempting to log in on the console as root produced the
message "Login timed out after 60 seconds".

nsswitch.conf is configured correctly, as are system-auth and ldap.conf.

During testing, I found that turning off the LDAP server will result in local
logins working as expected, as the connection is refused instead of being
dropped into a black hole.
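
The refused-versus-dropped distinction described above can be checked from the
client before testing logins. A minimal Python sketch, assuming plain LDAP on
TCP port 389; the probe() helper and the 5 second timeout are illustrative
choices and not part of nss_ldap:

```python
import socket

def probe(host, port=389, timeout=5.0):
    """Classify how a TCP connection attempt to host:port ends."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable but nothing listening: the attempt fails at once.
        return "refused"
    except socket.timeout:
        # No reply at all (for example, packets dropped): the caller
        # has to wait out the full timeout.
        return "timeout"
    except OSError as exc:
        # Anything else, such as a DNS failure or no route to host.
        return "error: %s" % exc
```

Calling probe("our.internal.ldap.server.com") with the server stopped would be
expected to return "refused" almost immediately, while the same call behind the
iptables DROP rule would be expected to return "timeout" only after the full
timeout has elapsed.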

Commenting out the LDAP related entries in system-auth did NOT fix this problem.
During the outage, I had to reboot to single user mode, remove ldap from
nsswitch.conf, and then init 3 in order to login as root. That is not expected
behavior for nsswitch.conf, AFAIK.

Version-Release number of selected component (if applicable):
openldap-2.2.13-8
glibc-2.3.4-2.39

How reproducible:
Configure an LDAP client, then simulate the network unreachable failure using
iptables.

Steps to Reproduce:
1. Configure a RHES 4 client to use an LDAP server. We use fedora-ds.
2. Verify LDAP logins and local logins via console are working as expected.
3. Use iptables to "-j DROP" all outgoing packets to the LDAP server.
4. Attempt to login as root on the console.
  
Actual results:
"Login timed out after 60 seconds"

Expected results:
root login should succeed.

Additional info:
/etc/ldap.conf (hard linked to /etc/openldap/ldap.conf)

uri ldap://our.internal.ldap.server.com
base dc=our,dc=ldap,dc=server,dc=com
TLS_CACERT /etc/openldap/cacerts/cacert.asc
TLS_REQCERT allow
bind_policy soft
ssl start_tls
pam_password md5
nss_reconnect_tries 2
nss_reconnect_sleeptime 1
nss_reconnect_maxsleeptime 10
nss_reconnect_maxconntries 1

/etc/pam.d/system-auth:

auth        required      /lib/security/$ISA/pam_env.so
auth        sufficient    /lib/security/$ISA/pam_localuser
auth        sufficient    /lib/security/$ISA/pam_unix.so likeauth nullok
auth        sufficient    /lib/security/$ISA/pam_ldap.so
auth        required      /lib/security/$ISA/pam_deny.so

account     sufficient    /lib/security/$ISA/pam_unix.so
account     sufficient    /lib/security/$ISA/pam_local_user.so
account     sufficient    /lib/security/$ISA/pam_succeed_if.so uid < 100 quiet
account     sufficient    /lib/security/$ISA/pam_ldap.so
account     required      /lib/security/$ISA/pam_permit.so

password    requisite     /lib/security/$ISA/pam_cracklib.so retry=3
password    sufficient    /lib/security/$ISA/pam_unix.so nullok use_authtok md5 shadow
password    sufficient    /lib/security/$ISA/pam_ldap.so
password    required      /lib/security/$ISA/pam_deny.so

session     required      /lib/security/$ISA/pam_limits.so
session     required      /lib/security/$ISA/pam_unix.so


PLEASE NOTE: As stated above, commenting out all the relevant pam_ldap entries
in system-auth does NOT fix the problem, which is counter-intuitive. Changing
the order of pam_ldap and pam_unix does not fix the problem either. Adding
pam_localuser was the last thing I tried, and that also did not fix the problem.


/etc/nsswitch.conf:

passwd:     files ldap
shadow:     files ldap
group:      files ldap
automount:  files ldap

hosts:      files dns

bootparams: files
ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files
netgroup:   files
publickey:  files
aliases:    files

I've also tried using these settings in nsswitch.conf, which did not work. (And
they should be the default behavior, from the docs I've read.)

passwd:     files [success=return notfound=continue unavail=continue tryagain=continue] ldap
shadow:     files [success=return notfound=continue unavail=continue tryagain=continue] ldap


The only way I've been able to login as root on the console during a simulated
network failure has been to remove ldap from the nsswitch.conf settings.

Comment 1 Corporate UNIX 2008-02-25 20:15:08 UTC

I forgot to mention that I noticed some other odd behavior during my simulated
network failure testing.

I use iptables to block only the outgoing connections to the LDAP server, so I
decided to see what would happen if I attempted to login as root from another
location on the network. I enabled root logins in sshd_config, restarted sshd,
and tried logging in.

After typing the ssh command, it pauses for about 60 seconds, then prompts for
the password. After entering root's password, I am immediately greeted with the
"Last login" information, after which it pauses for another 60 seconds. Then I
am finally presented the root shell.

/etc/pam.d/sshd is configured as default:

auth       required     pam_stack.so service=system-auth
auth       required     pam_nologin.so
account    required     pam_stack.so service=system-auth
password   required     pam_stack.so service=system-auth
session    required     pam_stack.so service=system-auth
session    required     pam_loginuid.so

Comment 2 Jan Safranek 2008-02-27 09:27:42 UTC

This does not seem to be an openldap problem; glibc should not try to contact
the LDAP server if it can find all of root's account information locally.

As a workaround, you can tweak the LDAP timeouts in /etc/ldap.conf (the
timelimit and bind_timelimit options).

Comment 3 Jakub Jelinek 2008-02-27 09:43:35 UTC

Except that glibc doesn't try to contact the LDAP server at all; it is the
nss_ldap plugin that does that.

Comment 4 Jakub Jelinek 2008-02-27 15:28:09 UTC

During login etc. initgroups or getgrouplist are called.  And these functions
really have to look through all groups to see what groups the user (in your case
root) belongs to.
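
The group lookup described here can be exercised outside of login. A minimal
Python sketch, assuming Python's os.getgrouplist(), which wraps the C library's
getgrouplist(3); the timed_group_lookup() helper is hypothetical, and this
times only the group lookup, not the full PAM stack:

```python
import os
import pwd
import time

def timed_group_lookup(user="root"):
    """Return (group IDs, elapsed seconds) for one getgrouplist() call."""
    entry = pwd.getpwnam(user)  # passwd lookup, resolved via nsswitch.conf
    start = time.monotonic()
    # Resolved through the group sources configured in nsswitch.conf,
    # so a slow or unreachable source shows up as elapsed time here.
    gids = os.getgrouplist(user, entry.pw_gid)
    return gids, time.monotonic() - start

gids, elapsed = timed_group_lookup("root")
print("root is in groups %s (lookup took %.2fs)" % (gids, elapsed))
```

Running this with the LDAP server reachable and again with it blocked should
show whether the group lookup alone accounts for the delay.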

Comment 5 Corporate UNIX 2008-02-27 17:17:50 UTC

Wait, so not being able to log in on the console as root during a network
outage is not a bug?

How can that be considered expected behavior? If nsswitch.conf is configured to
go to files then ldap, why is it attempting to look at ldap for groups? The
default behavior is supposed to be success=return - Are you suggesting that it
is expecting to find that local users are also part of ldap groups?

Regardless, this issue should remain open and be considered a bug. Default,
expected behavior should NOT lock you completely out of the system during an
LDAP or network failure. That's akin to programming my car to refuse to unlock
when it's hailing outside - it won't happen very often, but it will happen.

Also take into consideration the serious ramifications if a malicious person
were to deliberately target someone's LDAP servers, knowing the default behavior
will lock them out of ALL of their LDAP connected servers.

Comment 6 Corporate UNIX 2008-02-27 21:52:40 UTC

Adding the following to ldap.conf did indeed fix this problem:

bind_timelimit 15
timelimit 15

It actually took about 30 seconds to time out because of the nss_reconnect
values I stated earlier.
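
The 30 second figure is consistent with a simple back-of-the-envelope model in
which each connection attempt waits out the full bind_timelimit. A rough Python
sketch of that arithmetic; the formula is an assumption made for illustration,
not nss_ldap's actual retry logic:

```python
def estimated_wait(bind_timelimit, tries):
    """Worst-case seconds if every attempt waits out bind_timelimit.

    Assumed model only: it ignores sleeps between attempts and any
    additional lookups made during a single login.
    """
    return bind_timelimit * tries

LOGIN_TIMEOUT = 60  # from the "Login timed out after 60 seconds" message

# bind_timelimit 15 and nss_reconnect_tries 2, as configured in this report.
wait = estimated_wait(bind_timelimit=15, tries=2)
print("estimated wait: %ds, under login timeout: %s" % (wait, wait < LOGIN_TIMEOUT))
```

Under this model, the login only succeeds when the estimated wait stays below
the console login timeout.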

Perhaps the default login timeout should be increased, or the default values for
bind_timelimit, timelimit and nss_reconnect should be changed to prevent a
console login from timing out before the LDAP query?

Granted, using nscd also fixes this problem (short term), but I shouldn't have
to rely on it.

I stand by my assertion that it's ludicrous to have a default design that locks
root completely out of the system because of a little network issue. A
determined hacker could use this little bug to their advantage. They could have
their way with your most critical server while you were busy troubleshooting
LDAP issues. Or, a malicious user could DDOS your LDAP servers, locking out
everyone on every LDAP connected server in your network.

Comment 7 Jan Safranek 2008-03-06 14:47:37 UTC

Another hint: try adding the following to your /etc/ldap.conf:
nss_initgroups_ignoreusers root

As a result, nss_ldap will not ask the LDAP server for the list of root's groups.

Comment 8 Jose Plans 2008-03-06 15:37:52 UTC

That would be the valid solution for this. One concern, however, is Bug 429101,
where a lock is not cleared if the option is used, causing dbus to lock up
during the boot sequence. This should be fixed in 4.7's nss_ldap package.
     Jose

Comment 9 Chris Hunter 2008-03-19 22:26:10 UTC

I thought the nss_reconnect options were only implemented in nss_ldap v2.41 and
newer. Were they backported to RHEL 4?
http://www.liquidx.net/blog/2006/04/03/nss_ldap-undocumented-nss_reconnect_tries/

Comment 10 Jiri Pallich 2012-06-20 13:28:00 UTC

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested a review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.