When adding large numbers of users to a system the useradd command slows

Description of problem:
When using the following script, the useradd command slows down as users are added. At around 11000 users the useradd command takes about 5 seconds; at around 23000 users it slows to about 30 seconds:

    #!/bin/bash
    BEGIN=1
    END=30000
    ID=$BEGIN
    while [ $ID -ge $BEGIN ] && [ $ID -le $END ]; do
        # Zero-pad the login to 4 digits up to 9999, 5 digits above that
        if [ $ID -le 9999 ]; then
            LOGIN=$(printf "%04d" "$ID")
        elif [ $ID -le 99999 ]; then
            LOGIN=$(printf "%05d" "$ID")
        fi
        /usr/sbin/useradd "$LOGIN"
        # I added the time command:
        # time /usr/sbin/useradd "$LOGIN"
        let ID=$ID+1
    done

When the useradd command is run under strace, it pauses at these two points:

    lseek(9, 7052384, SEEK_SET) = 7052384
    write(9, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 292) = 292
    close(9)
    . . .
    sendto(3, "D\0\0\0T\4\5\0\1\0\0\0\0\0\0\0useradd: op=addi"..., 68, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 68
    select(4, [3], NULL, NULL, {0, 100000}) = 1 (in [3], left {0, 100000})
    recvfrom(3, "$\0\0\0\2\0\0\0\1\0\0\0E\20\0\0\0\0\0\0D\0\0\0T\4\5\0\1"..., 8476, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
    recvfrom(3, "$\0\0\0\2\0\0\0\1\0\0\0E\20\0\0\0\0\0\0D\0\0\0T\4\5\0\1"..., 8476, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36

Version-Release number of selected component (if applicable):
RHEL 4 AS (both i386 and x86_64) running the 2.6.9-11, 2.6.9-22.0.1 and 2.6.9-34 smp kernels; shadow-utils-4.0.3-60.RHEL4. The system is a 4-way Xeon EM64T 3.66GHz machine with 6GB of RAM.

How reproducible:
Consistent results over several test runs.

Steps to Reproduce:
1. Run the above script (to see times, uncomment the time line).

Actual results:
The time to run the useradd command increases as more users are added.

Expected results:
The time to run the useradd command should stay roughly constant.

Additional info:
Bug report from a customer who is adding 100,000 users for a TFTP server that serves images to VoIP phones.
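To see how the per-call time grows with the number of existing users, the loop could log one timing sample every thousand accounts instead of timing every call by hand. The following is only a sketch: time_ms and login_for_id are hypothetical helper names, not part of the reporter's script, and the timing relies on GNU date supporting %N.

```shell
#!/bin/bash
# Sketch only: time_ms and login_for_id are illustrative helper names.

# Print the wall-clock time of one command in milliseconds.
# Assumes GNU date, which supports %N (nanoseconds).
time_ms() {
    local start end
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Build the login name the same way as the script in the report:
# zero-padded to four digits up to 9999, five digits above that.
login_for_id() {
    if [ "$1" -le 9999 ]; then
        printf "%04d" "$1"
    else
        printf "%05d" "$1"
    fi
}

# Example use (as root), printing "user_count elapsed_ms" every 1000 users:
#   for id in $(seq 1 30000); do
#       ms=$(time_ms /usr/sbin/useradd "$(login_for_id "$id")")
#       if [ $((id % 1000)) -eq 0 ]; then echo "$id $ms"; fi
#   done
```

Sampling every thousandth call keeps the output short enough to read while still showing whether the growth is gradual or has a sharp knee.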
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
My guess, looking at the script, is that they are flooding the audit system with audit events. If they do not need the audit events from adding that many users, they might try running auditctl -e 0 and then rerunning the test. At least that would tell us whether the audit system is overflowing, which puts the syscalls on a wait queue. It might also be interesting to print out the audit backlog: auditctl -s | awk '{ print $8 }'.
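The awk '{ print $8 }' above depends on the backlog value being the eighth field of the auditctl -s output. A slightly more defensive way to pull out the backlog is to look for the field by name. This is a sketch under an assumption about the output layout: either a single line of key=value pairs, or one "key value" pair per line. audit_backlog is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: audit_backlog is an illustrative helper name.
# Reads `auditctl -s` output on stdin and prints the backlog value.
# Assumed layouts: "... backlog_limit=256 lost=0 backlog=0" on one line,
# or one "key value" pair per line, such as "backlog 0".
audit_backlog() {
    awk '{
        for (i = 1; i <= NF; i++) {
            # key=value form; "backlog_limit=" does not match this pattern
            if ($i ~ /^backlog=/) { sub(/^backlog=/, "", $i); print $i; exit }
            # "key value" form; require an exact match on the key
            if ($i == "backlog" && i < NF) { print $(i + 1); exit }
        }
    }'
}

# Example use (as root):
#   auditctl -s | audit_backlog    # backlog before the test
#   auditctl -e 0                  # disable auditing for the test run
#   ... run the useradd loop ...
#   auditctl -e 1                  # re-enable auditing afterwards
```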
We need to know if the audit system is causing the impact. If it is, that is an expected condition. Thanks.
It's not caused by the audit system, but I do see some impact from nscd:

shadow-utils-4.0.3-60.RHEL4
---------------------------
    # cat /etc/passwd | wc -l
    30582
    # service nscd status
    nscd (pid 9423) is running...
    # time useradd foo
    real 0m34.011s
    user 0m6.634s
    sys 0m0.068s
    # service nscd stop
    Stopping nscd: [ OK ]
    # time useradd foo1
    real 0m8.030s
    user 0m6.162s
    sys 0m0.099s
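To put a number on the nscd impact, the two "real" values above can be converted to seconds and subtracted. The user times are nearly the same in both runs (6.634s and 6.162s), so the difference is mostly time spent waiting rather than CPU time in useradd itself. A small sketch of that arithmetic follows; to_seconds is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: to_seconds is an illustrative helper name.
# Convert a bash `time` value such as "0m34.011s" to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

with_nscd=$(to_seconds 0m34.011s)     # real time with nscd running
without_nscd=$(to_seconds 0m8.030s)   # real time with nscd stopped

# Extra wall-clock seconds per useradd call while nscd is running
awk -v a="$with_nscd" -v b="$without_nscd" 'BEGIN { printf "%.3f\n", a - b }'
# prints 25.981
```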
This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested a review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/ If you would like Red Hat to reconsider your feature request for an active release, please re-open the request via the appropriate support channels and provide additional supporting details about the importance of this issue.