Bug 467508

Summary: ls slower, due to capabilities
Product: [Fedora] Fedora
Reporter: James Antill <james.antill>
Component: coreutils
Assignee: Kamil Dudka <kdudka>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: 9
CC: kdudka, ovasik, twaugh
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version: coreutils-6.12-16.fc10
Doc Type: Bug Fix
Last Closed: 2008-10-24 06:22:30 EDT

Description James Antill 2008-10-17 16:29:51 EDT
Description of problem:
 I'm not even 100% sure I should log this, but I recently got a coreutils update, and I've seen ls be "slow" a couple of times on large-ish directories. The configuration for ls specifically is:

/bin/ls --color=auto --sort=version -F -T 0 -ABFbhs

...now doing an strace shows lots of these calls between the write()s:

capget(0x20071026, 0, NULL)             = -1 EFAULT (Bad address)
getxattr("20020513h.gif", "security.capability"..., 0x7fff4e8701e0, 20) = -1 ENODATA (No data available)

...generally being 4 sets of calls (matching the 4 rows of output for each line), and this can be so slow that you can watch the terminal scroll each line upwards ... of course that's for the "first" attempt, if you then try it immediately afterwards it's fast (presumably the capabilities for each inode are cached). Also, even from a cold cache POV, it's not always slow.

 As I said, I'm not 100% sure this is an ls bug, and not a kernel bug or an ext3 "capabilities are slow" feature ... it's just that I've only just started seeing it since the coreutils update, so I thought I'd mention it.

Version-Release number of selected component (if applicable):
coreutils.x86_64                      6.10-33.fc9                      installed
Comment 1 Kamil Dudka 2008-10-20 09:20:57 EDT
(In reply to comment #0)
Thank you for the report.

> The configuration for ls specifically is:
> 
> /bin/ls --color=auto --sort=version -F -T 0 -ABFbhs
Why so many parameters? (-F is even there twice.) A minimal example would be nice. Can you give me the output of this (1st run, 2nd run, old ls, new ls - 4 cases together)?
time ls -U1 --color > /dev/null # without sort, without output to terminal

Though I could not reproduce this bug, this is a really good point. It should be possible to turn off capability checking by unsetting the ca attribute in $LS_COLORS (in the same way as symlink validity checking). I will try to propose this one-line patch upstream.
Comment 2 James Antill 2008-10-20 09:44:01 EDT
It's actually a couple of aliases, so I just type "l" (ell) ... so I followed the aliases and pasted together what ls will see.
 I hadn't realized -F was in my aliases twice :)

Doing the timing is somewhat hard because I have to wait until the directory is out of cache again, to get the "slow" version. However first thing in the morning is a good time for at least one cold run, so (both new ls):

/bin/ls --color=auto --sort=version -F -T 0 -ABFbhs  0.01s user 0.15s system 3% cpu 4.345 total

/bin/ls --color=auto --sort=version -F -T 0 -ABFbhs  0.01s user 0.07s system 45% cpu 0.181 total

...doing the same with old-ls will be particularly hard, as I don't have that installed anymore.
Comment 3 Kamil Dudka 2008-10-20 11:39:11 EDT
(In reply to comment #2)
You can try the patch proposed to coreutils upstream:
http://lists.gnu.org/archive/html/bug-coreutils/2008-10/msg00242.html

It should behave as "old ls" if capabilities checking is disabled by $LS_COLORS.
Comment 4 James Antill 2008-10-20 12:24:14 EDT
 By disabled you mean that:

dircolors -b ~/.dir_colors | perl -nle 'print for split ":"'

...doesn't have anything starting with "ca=", yeh?

 I can try that, although a test rpm would be easier?:)
Comment 5 Kamil Dudka 2008-10-20 12:42:27 EDT
(In reply to comment #4)
>  By disabled you mean that:
> 
> dircolors -b ~/.dir_colors | perl -nle 'print for split ":"'
I am not familiar with perl, this works for me:
$ eval `dircolors -b | sed 's/ca=[^:]*:/ca=:/'`

> ...doesn't have anything starting with "ca=", yeh?
It is not set in /etc/DIR_COLORS on F-9 because of tcsh Bug 457342, but you can get this attribute from the dircolors utility.

>  I can try that, although a test rpm would be easier?:)
Test rpm will be available soon, we are still waiting for upstream reaction...
Comment 6 James Antill 2008-10-20 13:13:17 EDT
If I run just dircolors on Fedora 9 I get a "ca=" line. I have my own ~/.dir_colors file though, which doesn't have that.

Feel free to post the test rpm, when it's ready and I'll try that. Thanks.
Comment 7 James Antill 2008-10-21 17:32:57 EDT
fwiw, maybe more useful for some kernel guys, stracing some cold dirs. I can see that it's the getxattr("blah", "security.capability", ...) calls that are taking all the time.
 Roughly one call in every 5 takes about 0.015 of a second, with the rest taking around 0.00018, so if you have 2,000 files in a directory you quickly get a lot of latency ((2000 / 5) * 0.015).
 This implies, to me, that the kernel isn't doing the correct amount of readahead on the getxattr data (ie. not growing the number of xattrs it tries to read at once).

Predictably, after the first cold run, all the getxattr calls are in the 0.00005 - 0.00015 range.
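
As a rough sanity check on the estimate above, the arithmetic can be sketched in shell. The function name `estimate_cold_latency` is hypothetical, and the model is an assumption: it counts exactly one slow getxattr call per five files and ignores the fast calls entirely.

```shell
# Hypothetical helper: estimate the extra cold-cache latency in seconds,
# assuming one slow getxattr call for every `every` files.
estimate_cold_latency() {
    files=$1   # number of files in the directory
    every=$2   # one slow call per this many files
    slow=$3    # seconds taken by each slow call
    awk -v n="$files" -v k="$every" -v t="$slow" \
        'BEGIN { printf "%.1f\n", (n / k) * t }'
}

estimate_cold_latency 2000 5 0.015   # prints 6.0
```

So a directory of 2,000 files would cost about 6 seconds under this model, which is the same order of magnitude as the 4.345 s cold run in comment #2.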
Comment 8 Kamil Dudka 2008-10-22 06:45:30 EDT
The patch was accepted by upstream:
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=f3f1ccfd871ee395e7fafc051c1b7dedb39fdfc9

You can test the rpm from this scratch build:
http://koji.fedoraproject.org/koji/taskinfo?taskID=895188

Feel free to open/clone the bug against libcap/kernel if you think there is a performance problem. This patch is only a way to stop capabilities checking, not to make it run faster.
Comment 9 Ondrej Vasik 2008-10-22 07:53:29 EDT
Since it has been accepted upstream, it is built in Rawhide as coreutils-6.12-16.fc10, with the possibility of disabling the capability-related performance impact on ls, so that F-10 has it. As Kamil said, this is only a workaround in coreutils; feel free to open a bug against libcap/kernel about this strange, unpredictable performance issue.
Comment 10 James Antill 2008-10-22 09:44:39 EDT
Installed Packages
coreutils.x86_64                     6.10-33.1.fc9                     installed

...doesn't work for me, strace still shows it doing getxattr calls.

% echo $COLORS                            
/home/james/.dir_colors
% dircolors -b ~/.dir_colors | perl -nle 'print for split ":"' | fgrep ca
%
Comment 11 Kamil Dudka 2008-10-23 05:49:16 EDT
Could you please attach the full content of $LS_COLORS?
Comment 12 James Antill 2008-10-23 09:33:39 EDT
no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.pl=01;32:*.py=01;32:*.csh=01;32:*.conf=32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:
Comment 13 Kamil Dudka 2008-10-23 10:57:05 EDT
Thanks. Could you try to add "ca=:" to the list and check with strace again?
Comment 14 James Antill 2008-10-23 13:42:39 EDT
yeh, adding that manually fixed it.
Comment 15 Kamil Dudka 2008-10-24 06:22:30 EDT
Fixed coreutils package is available for F-10, closing NEXTRELEASE.
Comment 16 James Antill 2008-10-24 09:21:06 EDT
So I just upgraded coreutils and util-linux-ng from rawhide, and it still has the same problem ... if you don't configure ca as off it still does the capabilities check?
 Is the default, when ca= isn't there at all, that it does/means something?
Comment 17 Ondrej Vasik 2008-10-24 10:55:58 EDT
Default behaviour is to have colored capability, and this causes the performance impact. The workaround was only about making it possible to disable the colorized capability... I guess you still have ~/.dir_colors or a changed /etc/DIR_COLORS (a %config(noreplace) file) without a defined capability color, and therefore the default from dircolors is used. Add the line "CAPABILITY 00" there and it should be ok...
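
A minimal sketch of that edit, for illustration only. `set_capability_off` is a hypothetical helper name, and it assumes a dircolors version that understands the CAPABILITY keyword.

```shell
# Hypothetical helper: ensure a dircolors config file ends up with exactly
# one "CAPABILITY 00" line, replacing any existing CAPABILITY setting.
set_capability_off() {
    file=$1
    touch "$file" || return 1
    tmp=$(mktemp) || return 1
    # Copy everything except existing CAPABILITY lines, then append ours.
    sed '/^[[:space:]]*CAPABILITY[[:space:]]/d' "$file" > "$tmp"
    printf 'CAPABILITY 00\n' >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Possible usage in an interactive shell (not run here):
#   set_capability_off ~/.dir_colors
#   eval "$(dircolors -b ~/.dir_colors)"
```

After reloading, `echo "$LS_COLORS"` should contain a `ca=00` entry.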
Comment 18 James Antill 2008-10-24 11:08:30 EDT
Maybe we want to change the default then?

Yeh, I have a ~/.dir_colors ... my worry is that other people will have these too, also not see any colour differences (almost no files have capabilities) and see the perf. impact.
I've added the CAPABILITY 00 in there now, so it wfm at least :)

Also, random idea ... capabilities only make sense on exec, yeh? ... so how about we only check them for files with (mode & 0111)?
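
The mode test in that idea (a bitwise AND of the file mode with 0111, i.e. any execute bit set) can be sketched from the shell side. This only illustrates the check and is not how ls implements anything. `has_exec_bit` and `caps_of_executables` are hypothetical helper names, and the sketch assumes GNU `stat`, a shell whose arithmetic treats a leading 0 as octal (bash, for example), and `getcap` from libcap.

```shell
# Hypothetical helper: succeed if the file has any execute bit set.
has_exec_bit() {
    mode=$(stat -c '%a' "$1") || return 2
    # The leading 0 makes the shell read the mode as an octal number.
    [ $(( 0$mode & 0111 )) -ne 0 ]
}

# Hypothetical helper: query capabilities only for executable entries.
caps_of_executables() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue
        if has_exec_bit "$f"; then
            getcap "$f"
        fi
    done
}
```

In a directory of images or other data files, this would skip the capability lookup for nearly every entry.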
Comment 19 Ondrej Vasik 2008-10-24 12:10:04 EDT
As adding capability displaying to ls was requested internally by Red Hat security tool guys, I would like to keep it default. You are right that checking only on executables does make sense and I'm sure it will reduce performance impact even when not cached. What do you think, Kamil?
Comment 20 Kamil Dudka 2008-10-27 11:46:06 EDT
I think it would increase the performance, but at the cost of usability. Non-executable files with capabilities are an unusual case which should be visible at first glance.

I did some investigation about the "performance impact", here are my results (coreutils-6.12-16.fc10):

# umount /home && mount /home
# mount | fgrep home
/dev/mapper/VolGroup00-LogVol02 on /home type ext3 (rw,usrquota,grpquota)

$ time ~/cvs/coreutils/devel/coreutils-6.12/src/ls -U1 --color huge_dir >/dev/null

               | 1st run | 2nd run | 3rd run |
---------------|---------|---------|---------|
default colors |  24.4s  |   0.8s  |   0.8s  |
         ca=00 |  23.0s  |   0.4s  |   0.4s  |

James, can you give me such results from your system?
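
To collect comparable numbers, something along these lines could be used. This is a sketch, not the procedure used for the table above: `time_ls_runs` is a hypothetical helper name, it assumes GNU `date` (for `%N`) and GNU `ls`, and the first run is only a cold-cache run if the filesystem was remounted beforehand as shown.

```shell
# Hypothetical helper: run `ls -U1 --color` on a directory several times
# and print the wall-clock time of each run in milliseconds.
time_ls_runs() {
    dir=$1
    runs=${2:-3}
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s%N)
        ls -U1 --color "$dir" > /dev/null
        end=$(date +%s%N)
        echo "run $i: $(( (end - start) / 1000000 )) ms"
        i=$(( i + 1 ))
    done
}

# Possible usage (not run here), once with default colors and once with
# capability coloring disabled:
#   time_ls_runs huge_dir 3
#   LS_COLORS="${LS_COLORS}:ca=00" time_ls_runs huge_dir 3
```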
Comment 21 James Antill 2008-10-27 12:15:44 EDT
> Non-executable files with capabilities are unusual case which should be visible at first glance.

I'm not disagreeing that's unusual ... but it is painful given the amount of work required, and that the unusual case is a noop. And there are other unusual cases which aren't checked/flagged.

As for timings, see the time results in comment #2 and the further explanation in comment #7 ... this is very consistent for me, doing capability checking adds roughly 0.015 of a second for every 5 files in a directory. A couple of directories I ls in have > 1,000 files in them ... at which point you can "see" that the ls output is slow (instead of being "instant" it looks like a fast printer).
Comment 22 Kamil Dudka 2008-10-27 12:31:08 EDT
So you have completely different results. If we consider only the 1st run (not cached) it is only about 5% slower with checking of capabilities on my box. I forgot to note that huge_dir contains 100,000 files.

Maybe you can ping me on #fedora-devel, the communication would be faster ;-)