Bug 1891594

Summary: only one url is displayed using "htcacheclean -p . -A" when there are >= 2 files in one subdirectory
Product: Red Hat Enterprise Linux 8 Reporter: Olimp Bockowski <obockows>
Component: httpdAssignee: aegorenk
Status: CLOSED ERRATA QA Contact: Branislav NĂ¡ter <bnater>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 8.2CC: aegorenk, bnater, jorton, luhliari
Target Milestone: rcKeywords: AutoVerified
Target Release: 8.4   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-05-18 15:41:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Olimp Bockowski 2020-10-26 18:54:05 UTC
Description of problem:
If one has CacheDirLevels=1  that there is a quite high chance we will have 2 cached files in one subdirectory. Then htcacheclean shows just one of them and totally ignores the others)


Version-Release number of selected component (if applicable):
I believe all httpd for 7.x and 8.x

How reproducible
Always.

Steps to Reproduce:
Set CacheRoot for example to use /var/cache/httpd/proxy and CacheDirLevels 1. Then put some static content to /var/www/html/ with many small images and access a content with the webbrowser. The directory will be populated. Using tree one can check that sometimes we have more than one header file in a subdirectory, like:

$ ll Xl/
total 20
-rw-------. 1 obscuredUser obscuredGroup Oct  9 10:13 AXqo1qrLdD8d_Rgip
-rw-------. 1 obscuredUser obscuredGroup Oct  9 10:13 AXqo1qrLdD8d_Rgip
-rw-------. 1 obscuredUser obscuredGroup Oct  9 10:14 MDJi0Upkcetk3Ffr5CkA.data
-rw-------. 1 obscuredUser obscuredGroup Oct  9 10:14 MDJi0Upkcetk3Ffr5CkA.header

then using "htcacheclean -p . -A" doesn't show all cached files

Actual results:
one cached file showed

Expected results:
all cached files showed

Additional info:

I was thinking to use for example CacheDirLevels=5 but there is still concern:
- more subdirectories = some impact on filesystem
- still there is a risk we will end up with more than 1 header in the subdirectory with a lot of cache. The question what is the chance, I understand we have some hashing algorith here

Info about the investigation:

so, in strace we have just read one file:

37540 15:41:59.376804 getdents64(4</home/ObscuredUser/WEB/var/cache/Xl>, [{d_ino=2099779, d_off=10, d_reclen=24, d_type=DT_DIR, d_name="."}, {d_ino=178, d_off=12, d_reclen=24, d_type=DT_DIR, d_name=".."}, {d_ino=4397, d_off=17, d_reclen=48, d_type=DT_REG, d_name="AXqo1qrLdD8d_Rgip"}, {d_ino=4414, d_off=22, d_reclen=48, d_type=DT_REG, d_name="AXqo1qrLdD8d_Rgip"}, {d_ino=4361, d_off=27, d_reclen=48, d_type=DT_REG, d_name="MDJi0Upkcetk3Ffr5CkA.header"}, {d_ino=2445, d_off=512, d_reclen=48, d_type=DT_REG, d_name="MDJi0Upkcetk3Ffr5CkA.data"}], 32768) = 240 <0.000011>

37540 15:41:59.376847 openat(AT_FDCWD, "/home/ObscuredUser/WEB/var/cache/Xl/AXqo1qrLdD8d_Rgip", O_RDONLY|O_CLOEXEC) = 5</home/ObscuredUser/WEB/var/cache/Xl/AXqo1qrLdD8d_Rgip> <0.000010>

37540 15:41:59.376885 read(5</home/ObscuredUser/WEB/var/cache/Xl/AXqo1qrLdD8d_Rgip>, "\6\0\0\0", 4) = 4 <0.000006>

37540 15:41:59.376915 lseek(5</home/ObscuredUser/WEB/var/cache/Xl/AXqo1qrLdD8d_Rgip>, 0, SEEK_SET) = 0 <0.000005>

37540 15:41:59.376941 read(5</home/ObscuredUser/WEB/var/cache/Xl/AXqo1qrLdD8d_Rgip>, "\6\0\0\0\310\0\0\0:\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\300\252\207\246\214\261\5\0\300N\33}\215\261\5\0\301#\212\246\214\261
\5\0|U\212\246\214\261\5\0>\21\0\0\0\0\0\0\3\375\0\0\0\0\0\0\1\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377", 120) = 120 <0.000005>

37540 15:41:59.376970 read(5</home/ObscuredUser/WEB/var/cache/Xl/AXqo1qrLdD8d_Rgip>, "http://ObscuredUrl.ico?", 58) = 58 <0.000005>

37540 15:41:59.376997 fstat(5</home/ObscuredUser/WEB/var/cache/Xl/AXqo1qrLdD8d_Rgip>, {st_dev=makedev(253, 3), st_ino=4397, st_mode=S_IFREG|0600, st_nlink=1, st_uid=10001, st_gid=10001, st_blksize=4096, st_blocks=8, st_size=826, st_atime=1602592547 /* 2020-10-13T14:35:47.808084050+0200 */, st_atime_nsec=808084050, st_mtime=1602592531 /* 2020-10-13T14:35:31.174069190+0200 */, st_mtime_nsec=174069190, st_ctime=1602592531 /* 2020-10-13T14:35:31.174069190+0200 */, st_ctime_nsec=174069190}) = 0 <0.000005>

37540 15:41:59.377037 stat("/home/ObscuredUser/WEB/var/cache/Xl/AXqo1qrLdD8d_Rgip", {st_dev=makedev(253, 3), st_ino=4414, st_mode=S_IFREG|0600, st_nlink=1, st_uid=10001, st_gid=10001, st_blksize=4096, st_blocks=16, st_size=5222, st_atime=1602576971 /* 2020-10-13T10:16:11.863343828+0200 */, st_atime_nsec=863343828, st_mtime=1602487370 /* 2020-10-12T09:22:50.329685197+0200 */, st_mtime_nsec=329685197, st_ctime=1602487370 /* 2020-10-12T09:22:50.330685197+0200 */, st_ctime_nsec=330685197}) = 0 <0.000008>

37540 15:41:59.377079 write(1</dev/pts/1<char 136:1>>, "http://ObscuredUrl.ico? 826 5222 200 0 1602592531000000 1602596131000000 1602592531162049 1602592531174780 1 0\n", 146) = 146 <0.000066>
37540 15:41:59.377212 close(4</home/ObscuredUser/WEB/var/cache/Xl>) = 0 <0.000007>
37540 15:41:59.377251 close(5</home/ObscuredUser/WEB/var/cache/Xl/AXqo1qrLdD8d_Rgip>) = 0 <0.000006>

and done, it goes to /ObscuredPath/WEB/var/cache/H6

Odd, however it looks like expected in the code:

 406 static int list_urls(char *path, apr_pool_t *pool, apr_off_t round)

list_urls has a while loop:

 424     while (apr_dir_read(&info, APR_FINFO_TYPE, dir) == APR_SUCCESS && !interrupted) {


that checks for a file and the extension. It recognize . and .., then itterate again and hits a regular file:

 436         else if (info.filetype == APR_REG) {

eventually we hit:

 456                             if (apr_file_read_full(fd, &disk_info, len, &len)
 457                                     == APR_SUCCESS) {

entering:
 465                                     if (listextended) {
hits:
 498                                         else {
 499 
 500                                             apr_file_printf(


Using /root/rpmbuild/BUILD/httpd-2.4.37 with my limited coding knowledge I was trying to leverage gdb, but I can't understand why gdb doesn't show me break that I believe should hit. The last thing it access (before breaking the loop) is:

 498                                         else {
 499 
 500                                             apr_file_printf(

and in gdb I can't see it reaches:

 561                                 break;

however it ends up the loop without processing next header files and executing:

 573     apr_dir_close(dir);

However, the most important why it takes care of just one file and ignores others

Comment 1 Olimp Bockowski 2020-10-29 09:14:54 UTC
Ok, the issue with gdb resolved - Renaud Metrich found out it was due to optimization during compilation. 
To sum up: it just break indeed so we can focus on that problem.
That break was added by Graham Leggett <minfrin> ~ 10 years ago :]

Comment 2 aegorenk 2021-01-19 15:47:14 UTC
Patch submitted to upstream:
https://bz.apache.org/bugzilla/show_bug.cgi?id=65091

Comment 3 aegorenk 2021-01-25 13:23:07 UTC
Patch merged to Fedora:
https://src.fedoraproject.org/rpms/httpd/pull-request/11

Comment 16 errata-xmlrpc 2021-05-18 15:41:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: httpd:2.4 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1809