Bug 156671

Summary: Changing a crontab file in /etc/cron.d causes crond to crash when using LDAP users
Product: Red Hat Enterprise Linux 3
Reporter: David Jericho <david.jericho>
Component: nss_ldap
Assignee: Jason Vas Dias <jvdias>
Status: CLOSED DUPLICATE
QA Contact: Brock Organ <borgan>
Severity: high
Priority: medium
Version: 3.0
CC: mtonn
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-02-21 19:08:19 UTC
Attachments:
  LDAP DB listing (flags: none)

Description David Jericho 2005-05-03 04:18:17 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050223 Firefox/1.0.1

Description of problem:
We have crontab files located in /etc/cron.d, often symlinks, all copied directly from a centralised management system. We are not running nscd.

In order to baseline the systems, we use LDAP (OpenLDAP 2.0.27 on RHEL 3) for user accounts, with some cron jobs running as non-root LDAP users.

Once crond has run at least one cycle of a crontab in /etc/cron.d that contains jobs for non-root LDAP users, any subsequent change to that file (adding or removing a job for a non-root LDAP user) makes crond crash. No crond processes remain.

Version-Release number of selected component (if applicable):
vixie-cron-3.0.1-75.1

How reproducible:
Always

Steps to Reproduce:
1. Create a user in an LDAP tree, not present in /etc/passwd.
2. Create a crontab file in /etc/cron.d. The task is unimportant, but it must run as an LDAP user not present in /etc/passwd.
3. Ensure no nscd is running.
4. Restart crond, and let it run one cycle. 
5. Edit the crontab file, adding another job as a non-root LDAP user.
6. Let crond run the cycle. Observe crash.
  

Actual Results:  Crond crashes; the last messages it logs to /var/log/cron are:

May  3 03:59:00 hostfoo crond[18059]: (*system*) RELOAD (/etc/cron.d/test)
May  3 03:59:00 hostfoo crond[18059]: nss_ldap: reconnecting to LDAP server...
May  3 03:59:00 hostfoo crond[18059]: nss_ldap: reconnected to LDAP server after 1 attempt(s)


Expected Results:  Crond continues running, with or without nscd.

Additional info:

o This behaviour is repeatable across every host we have.
o It does not matter which LDAP replica/server is being queried.
o It happens to every LDAP user we have.
o Using the nscd service prevents the crashes.

Comment 1 Michael Tonn 2005-05-03 13:06:54 UTC
I have observed exactly the same scenario on Red Hat Enterprise Linux 3 Update 4.

Comment 2 Jason Vas Dias 2005-05-03 23:10:48 UTC
I was unable to reproduce this problem, using a RHEL-3 i386 host with:
  vixie-cron-3.0.1-75.1
  nss_ldap-207-11

I had a remote openldap-servers-2.1.29-1 slapd instance, with
users defined as in the attached users.ldif file.

I ensured nscd was not running on the crond machine.

My nsswitch.conf said:
   hosts:  files dns
   passwd: files ldap nis
   shadow: files ldap nis
   group:  files ldap nis

I tried:
  1. Modifying the /etc/cron.d/test file, creating a new job
     to be run for ldap_user1
     (crond ran new job and continued OK).

  2. Creating a new ldap_user2, and creating a new job to be
     run for ldap_user2 by the same crond instance
     (crond ran new job and continued OK).

  3. Pulling the network cable from the ldap server, then 
     modifying the cron.d/test file
     (crond timed out for a long time, but eventually recovered
      and continued OK).

  4. Stopping the ldap server and modifying the /etc/cron.d/test
     file, then restarting the server:

       May  3 18:35:00 jvdspc crond[6940]: (*system*) RELOAD (/etc/cron.d/test)
       May  3 18:35:00 jvdspc crond[6940]: nss_ldap: reconnecting to LDAP server...
       May  3 18:35:00 jvdspc crond[6940]: nss_ldap: reconnecting to LDAP server...
       May  3 18:35:00 jvdspc crond[6940]: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)...
       May  3 18:35:04 jvdspc crond[6940]: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
       May  3 18:35:12 jvdspc crond[6940]: nss_ldap: reconnected to LDAP server after 4 attempt(s)
       May  3 18:35:12 jvdspc CROND[7114]: (root) CMD (/usr/bin/mrtg /etc/mrtg/mrtg.cfg)

     crond continues OK.

  5. Stopping the ldap server until nss_ldap times out:

       $ more /tmp/ld.out
       May  3 18:37:00 jvdspc crond[6940]: nss_ldap: reconnecting to LDAP server...
       May  3 18:37:00 jvdspc crond[6940]: nss_ldap: reconnecting to LDAP server...
       May  3 18:37:00 jvdspc crond[6940]: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)...
       May  3 18:37:04 jvdspc crond[6940]: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
       May  3 18:37:12 jvdspc crond[6940]: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)...
       May  3 18:37:28 jvdspc crond[6940]: nss_ldap: reconnecting to LDAP server (sleeping 32 seconds)...
       May  3 18:38:00 jvdspc crond[6940]: nss_ldap: could not hard reconnect to LDAP server - Can't contact LDAP server
   crond then did not run the LDAP users' jobs, but did reload and
   run them once the LDAP server was reachable again and the crontab
   files had been modified.

   This bug may be a duplicate of bug 124882, for which an nss_ldap
   fix was submitted for RHEL-3-U5.

   If this problem is reproducible for you, please let me know the
   differences between your configuration and the one I describe
   above. Please let me know the version of the nss_ldap RPM you
   have installed, and the version of the LDAP server you are using.
   Is the LDAP server remote or local to the crond machine?
   Do you have an IP address or a host name as the 'host' entry in
   /etc/ldap.conf? Are there significant differences between the
   structure of your LDAP user and group objects and mine?

Comment 3 Jason Vas Dias 2005-05-03 23:13:35 UTC
Created attachment 113990
LDAP DB listing

Comment 4 David Jericho 2005-05-04 00:11:58 UTC
The requested version numbers are:
nss_ldap-207-11
openldap-servers-2.0.27-17

All other packages that are installed are up to date with the latest stable
release available on RHN via up2date.

The LDAP server location does not matter. It can be on localhost or
overseas; the behaviour is the same.

/etc/hosts does contain the IPv4 address mappings for the LDAP servers. DNS and
reverse DNS match up, both IPv4 and IPv6. The behaviour does not change if I
disable the IPv6 interfaces and reboot.

/etc/ldap.conf contains the hostnames. The host parameter in each ldap.conf
lists the master and at least 3 replicas.

My nsswitch.conf settings, without the NIS option, are otherwise identical.

As for the LDAP objects, we have an additional group type:

dn: cn=ldapgroup,ou=Groups,dc=aarnet,dc=edu,dc=au
objectClass: top
objectClass: groupOfUniqueNames
cn: ldapgroup
description: Example Group Of Unique Names
uniqueMember: uid=ldapgroup,ou=People,dc=aarnet,dc=edu,dc=au

For user accounts, we have both full user accounts with extended attributes, and
stripped down accounts. Full user accounts have the following object classes:

objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
objectClass: eduPerson
objectClass: account
objectClass: posixAccount
objectClass: kerberosSecurityObject
objectClass: shadowAccount

Stripped-down, uid-only accounts (i.e. not for human users), which show the same
behaviour, have the following object classes:

objectClass: top
objectClass: account
objectClass: posixAccount
objectClass: shadowAccount

These accounts have no password fields. Otherwise your LDAP schema is identical to ours.

I looked over bug IT_56879 and it appears similar, but restarting the LDAP server
makes no difference: crond crashes either way. My strace shows rt_sigaction
being called with SIGPIPE, and GDB attached to the running process reports
"Program received signal SIGPIPE, Broken pipe." Either my GDB-fu isn't up to
scratch, or I need the debuginfo RPMs to get a backtrace.

I could be reading your test cases wrong, but they don't appear to match
what I was doing, in particular case 1.

The procedure I used was:

1) Create the crontab file /etc/cron.d/test, containing the line:
     * * * * * ldapuser touch /tmp/foo
2) Let cron run the job.
3) Append a copy of the first entry to the crontab, changing the file it
   touches, e.g.:
     * * * * * ldapuser touch /tmp/foo2
4) Crond crashes.


Comment 5 Jason Vas Dias 2005-05-04 14:44:18 UTC
This bug appears to be a duplicate of bug 124882:
switching between master and replica servers, as well as
restarting the LDAP server, can apparently cause the SIGPIPE
in nss_ldap.

So installing the RHEL-3-U5 nss_ldap-207-15 should fix this problem.

U5 is still in beta testing, but meanwhile you could download
nss_ldap-207-15 from:
  http://people.redhat.com/~jvdias/nss_ldap/RHEL-3/207-15

This bug is not a CRON problem, but an nss_ldap problem.

There are also newer cron versions available, for RHEL-3-U5:
  http://people.redhat.com/~jvdias/cron/RHEL-3/3.0.1-76_EL3
and for RHEL-3-U6:
  http://people.redhat.com/~jvdias/cron/RHEL-3/4.1-6_EL3
vixie-cron-4.1 adds an improved scheduling algorithm and
support for PAM access control for cron.



*** This bug has been marked as a duplicate of 124882 ***

Comment 6 Red Hat Bugzilla 2006-02-21 19:08:19 UTC
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.