Description of problem:
I ran a script that creates around 30 000 users with smbldap_useradd. Afterwards I stopped ldap, and when I started it again it reported the database to be corrupt.

Version-Release number of selected component (if applicable):
openldap-2.4.10-1.fc9.i386

How reproducible:

Steps to Reproduce:
1. Run the ldap server.
2. Create more than 30 000 users using openldap.
3. Restart the ldap server.

Actual results:
The database is reported to be corrupt.

Expected results:
The database remains consistent.

Additional info:
I can't reproduce it (using simple ldapadd). I'll leave smbldap_useradd running overnight in case it does something different. Can you still reproduce the bug? It may be related to #458683; does it work if you allow slapd more memory?
It is still there. Maybe I should rebuild the database after removing the memory limit. When I restarted ldap (/etc/init.d/ldap restart), the shutdown failed and then the database was corrupt. I should mention that I put this into /etc/sysconfig/ldap:

function recoverdatabase () {
    /usr/sbin/slapd_db_recover -h /var/lib/ldap
    /usr/sbin/slaptest
    chown ldap:ldap /var/lib/ldap/*
    return 0
}
if [ "x$1" = "xstart" ]; then
    recoverdatabase
fi

Maybe ldap will start OK if I remove it, but in that case ldap will not start after an unclean shutdown.
Your code should do no harm. But I wonder what could have made ldap shut down incorrectly... Is there anything interesting in the log with loglevel 2? Do you use an unusually slow storage device (like a network share) for the bdb database or the bdb logs? And please post an strace of the shutdown: attach strace to the running ldap with "strace -o trace.log -r -p `pidof slapd`" and stop the ldap service in another terminal.
I rebuilt the database and have not seen the error since. There could have been something wrong in the database, or, as I suspect, a write operation was still pending during shutdown and the database recovery was run on a database that had not yet been closed cleanly. Maybe I cannot hit the right moment to trigger it again. I tried /etc/init.d/ldap stop and /etc/init.d/ldap start and everything worked correctly. Then I tried /etc/init.d/ldap restart and everything worked as well. Probably it is not safe to do /etc/init.d/ldap restart. I will try to identify the situations in which it occurs.
(In reply to comment #4)
> Probably it is not safe to do /etc/init.d/ldap restart.

There is no difference between calling stop && start and a simple restart. The problem could be slapd reacting slowly to the signal: "service ldap stop" sends SIGTERM to the slapd process and waits for 3 seconds. The slapd process starts to shut down, flushes all buffers, closes the database etc., and if it can't finish within these 3 seconds, the init script kills slapd with SIGKILL in the middle of the operation. This can result in a broken database. But, according to long-term experience, 3 seconds is a lot more than slapd normally needs to shut itself down... That's why I ask whether you have any unusually slow storage. Anyway, please report back if you get the log and/or strace from a failed shutdown.
I am still not able to reproduce the problem after the database rebuild. I do not have any unusual storage. The database itself and the bdb logs are on a local disk. The database is now larger than 800 MB.
Maybe 3 seconds for shutdown is not enough when a write operation is in progress. My setup can have several concurrent smbldap-useradd accesses. And maybe I just cannot hit the right moment to run /etc/init.d/ldap stop again.
Created attachment 317349 [details] Debug of shutting down the ldap server after which the database corruption occurs
To get the database corrupted after shutdown, I ran an smbldap-usermod script that changed the home directory of each of the more than 32000 users. When I ran /etc/init.d/ldap restart, the shutdown failed and the startup reported the database to be corrupted. After a subsequent /etc/init.d/ldap restart the shutdown worked properly, but the startup always reports a corrupted database. After running /etc/init.d/ldap stop and /etc/init.d/ldap start everything works OK. Remember the code in comment 2, which cleans the database on /etc/init.d/ldap start.
I got rid of the problem by setting

checkpoint 1024 5

after the suffix line in slapd.conf. This setting makes slapd checkpoint the database after every 1024 KB written or every 5 minutes, whichever comes first. Without a checkpoint setting it does so only on slapd shutdown. When slapd has been running for a long time and many changes have been made to the bdb database, the checkpoint procedure takes longer than /etc/init.d/ldap stop is willing to wait for it to finish. With this setting slapd is probably able to finish the checkpoint procedure (covering at most the last 5 minutes of changes) within 3 seconds. But I would advise using a longer delay than 3 seconds before killing slapd in /etc/init.d/ldap anyway. I would also advise adding the checkpoint setting to the default slapd.conf in the openldap-servers package.
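For anyone who wants to check whether their own configuration already has such a directive, a small sketch follows. The function name has_checkpoint is hypothetical, and the check is deliberately simple: it looks for any uncommented "checkpoint <kbyte> <min>" line and does not verify which database section the line belongs to.

```shell
# Hypothetical helper: succeed if the given slapd.conf contains an
# uncommented "checkpoint <kbyte> <min>" directive, e.g.
#
#     database    bdb
#     suffix      "dc=example,dc=com"
#     checkpoint  1024 5
#
# (the suffix value above is only a placeholder).
has_checkpoint() {
    grep -Eq '^[[:space:]]*checkpoint[[:space:]]+[0-9]+[[:space:]]+[0-9]+' "$1"
}
```

It could be called as has_checkpoint /etc/openldap/slapd.conf || echo "no checkpoint directive found", assuming the configuration lives at that path.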
> But I would advice to use longer time for
> killing slapd in /etc/init.d/ldap than 3 seconds anyway.

3 seconds is fine for most users; you are the first one to complain... I'll add a new option to /etc/sysconfig/ldap where you can tune it.

> I would also advice to appear the checkpoint setting in default slapd.conf in
> the openldap-servers package.

Good idea, I'll add it there.
openldap-2.4.10-2.fc9 has been submitted as an update for Fedora 9. http://admin.fedoraproject.org/updates/openldap-2.4.10-2.fc9
openldap-2.4.10-2.fc9 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update openldap'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-8834
openldap-2.4.10-2.fc9 has been pushed to the Fedora 9 stable repository. If problems still persist, please make note of it in this bug report.