Bug 1673906

Summary: sudo 1.8.23 is significantly slower than 1.8.19p2
Product: Red Hat Enterprise Linux 7 Reporter: Renaud Métrich <rmetrich>
Component: pamAssignee: Tomas Mraz <tmraz>
Status: CLOSED WONTFIX QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: medium Docs Contact:
Priority: low    
Version: 7.6CC: dapospis, dmitrijs.samsonovs, mgrepl, redhat-bugzilla, robert.scheck, tmraz, tonster, ysoni
Target Milestone: rcKeywords: Performance, Reopened, Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-22 10:29:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Reproducer none

Description Renaud Métrich 2019-02-08 12:29:25 UTC
Created attachment 1528052 [details]
Reproducer

Description of problem:

Comparing the time required to execute sudo between RHEL7.5 (1.8.19p2-14) and RHEL7.6 (1.8.23-3), it appears that sudo and "sudo -i" are significantly slower on RHEL7.6:

RHEL7.5:

$ ./sudoloop 
'sudo pwd' for 1000 iterations:

real	0m8.242s
user	0m3.197s
sys	0m4.651s

'sudo -i pwd' for 1000 iterations:

real	0m22.686s
user	0m11.812s
sys	0m10.437s


RHEL7.6:

$ ./sudoloop 
'sudo pwd' for 1000 iterations:

real	0m15.117s
user	0m5.029s
sys	0m7.871s

'sudo -i pwd' for 1000 iterations:

real	0m27.075s
user	0m13.154s
sys	0m12.031s


Version-Release number of selected component (if applicable):

sudo-1.8.23-3.el7


How reproducible:

Always


Steps to Reproduce:

1. Create a user account

  # useradd -G wheel user
  # echo "%wheel	ALL=(ALL)	NOPASSWD: ALL" > /etc/sudoers.d/wheel

2. As user 'user', execute the reproducer (sudoloop, in attachment)

  $ ./sudoloop

3. Downgrade sudo

  # yum -y downgrade sudo

4. As user 'user', Execute the reproducer again

  $ ./sudoloop


Actual results:

See Description above.


Additional info:

This is problematic for sudo intensive applications, such as the start of 3rd party application 'Zimbra' which spawns sudo > 200 times when executing "/opt/zimbra/libexec/zmconfigd".

Comment 4 Miroslav Grepl 2019-02-19 11:42:19 UTC
This issue was not selected to be included either in Red Hat Enterprise Linux 7.7 because it is seen either as low or moderate impact to a small amount of use-cases. The next release will be in Maintenance Support 1 Phase, which means that qualified Critical and Important Security errata advisories (RHSAs) and Urgent Priority Bug Fix errata advisories (RHBAs) may be released as they become available. We will now close this issue, but if you believe that it qualifies for the Maintenance Support 1 Phase, please re-open; otherwise we recommend moving the request to Red Hat Enterprise Linux 8 if applicable.

Comment 5 Robert Scheck 2019-02-20 11:23:07 UTC
I am sorry, but WONTFIX is not really the expected answer. This RHBZ was opened to clarify if the decreased sudo performance is a bug or not (independent whether it will be addressed with a fix for RHEL 7 or not). Thus I am cloning this for RHEL 8 now...

Comment 6 Robert Scheck 2019-02-20 11:26:56 UTC
Bug #1679115 is the clone for RHEL 8 (as mentioned in comment #5)

Comment 7 Renaud Métrich 2019-02-21 15:17:14 UTC
I'm sorry but I cannot accept this answer. There is a real bug here.
See below my new reproducer:

sudo-1.8.23-3.el7.x86_64 (newer):
--------------------------------

# ulimit -H -n 65536

# ./sudoloop
'sudo pwd' for 1000 iterations:

real	1m7.926s
user	0m47.327s
sys	0m19.539s

'sudo -i pwd' for 1000 iterations:

real	1m23.989s
user	0m56.926s
sys	0m26.506s


sudo-1.8.19p2-14.el7_5 (older):
------------------------------

# ulimit -H -n 65536

# ./sudoloop
'sudo pwd' for 1000 iterations:

real	0m46.112s
user	0m40.882s
sys	0m5.220s

'sudo -i pwd' for 1000 iterations:

real	1m1.666s
user	0m50.270s
sys	0m11.920s


DIFFERENCE:
----------

'sudo pwd': +47%
'sudo -i pwd': +37%


Stracing shows that the time is spent mostly in closing invalid descriptors before executing /usr/sbin/unix_chkpwd.
The more the hard limit on number of files is, the slower it will be.

strace excerpt:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
11581 15:58:51.068182 getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=64*1024}) = 0 <0.000016>
11581 15:58:51.068235 close(3</dev/urandom>) = 0 <0.000007>
11581 15:58:51.068263 close(4</etc/sudoers>) = 0 <0.000006>
11581 15:58:51.068301 close(5<pipe:[132649]>) = 0 <0.000006>
11581 15:58:51.068326 close(6<pipe:[132649]>) = 0 <0.000007>
11581 15:58:51.068351 close(7<pipe:[63019]>) = 0 <0.000007>
11581 15:58:51.068375 close(8<pipe:[63019]>) = 0 <0.000007>
11581 15:58:51.068399 close(9)          = -1 EBADF (Bad file descriptor) <0.000007>
...
11581 15:58:52.637810 close(65535)      = -1 EBADF (Bad file descriptor) <0.000006>
11581 15:58:52.637835 geteuid()         = 0 <0.000006>
11581 15:58:52.637855 setuid(0)         = 0 <0.000007>
11581 15:58:52.637892 execve("/usr/sbin/unix_chkpwd", ["/usr/sbin/unix_chkpwd", "root", "chkexpiry"], []) = 0 <0.000284>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

--> 1.6 seconds lost closing unused file descriptors.

Comment 8 Renaud Métrich 2019-02-21 16:36:37 UTC
The culprit is not sudo but PAM unix which is now called by sudo:

pam_unix_acct.c:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
117     /* XXX - should really tidy up PAM here too */
118 
119     if (getrlimit(RLIMIT_NOFILE,&rlim)==0) {
120       if (rlim.rlim_max >= MAX_FD_NO)
121         rlim.rlim_max = MAX_FD_NO;
122       for (i = STDERR_FILENO + 1; i < (int)rlim.rlim_max; i++) {
123         close(i);
124       }
125     }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Comment 9 Tomas Mraz 2019-02-22 10:26:57 UTC
This is not a bug. Basically sudo started calling pam which does this call to ensure that no open file descriptors leak when spawning other processes.

Of course there could be some "heuristical optimization" done for example by stopping the loop after more than ten successive EBADF errors are encountered from the close(). Or we could look up the opened fds in /proc/self/fd. However there is no plan to change/optimize this for RHEL-7.

Comment 15 Robert Scheck 2019-03-22 22:32:21 UTC
In our specific case of Zimbra's zmconfigd starting up only slowly since the sudo update, the following change causes a massive zmconfigd startup speedup. Thanks to Renaud for the proposal!

--- snipp ---
--- /etc/pam.d/sudo   2018-09-25 08:02:19.000000000 +0200
+++ /etc/pam.d/sudo   2019-03-22 23:30:27.404306013 +0100
@@ -1,5 +1,6 @@
 #%PAM-1.0
 auth       include      system-auth
+account    sufficient   pam_succeed_if.so user = zimbra quiet
 account    include      system-auth
 password   include      system-auth
 session    optional     pam_keyinit.so revoke
--- snapp ---

Comment 16 Dmitry Samsonov 2019-07-31 10:48:11 UTC
(In reply to Robert Scheck from comment #15)

Hope to save someone's time. In our case correct workaround should be like:
account    sufficient   pam_succeed_if.so user = zimbra quiet use_uid

Note additional option use_uid in the end. Otherwise it applies to target username, not source username.