Description of problem: Account server and container server processes make excessive and redundant stat() calls while serving GET requests Version-Release number of selected component (if applicable): RHGS 3.1.3 swiftonfile-2.3.0-3.el7rhgs How reproducible: Always. Do a GET on a container and watch syscalls made by the container server process. Steps to Reproduce: 1. Create a container 2. Create some objects in the container 3. Do GET on the container while strace is attached to container server process root# ps aux | grep container-server root 1396 0.0 1.2 272164 25396 ? Ss 18:10 0:00 /usr/bin/python /usr/bin/swift-container-server /etc/swift/container-server.conf root 1429 0.0 1.0 272564 20976 ? S 18:10 0:00 /usr/bin/python /usr/bin/swift-container-server /etc/swift/container-server.conf root# strace -ff -p 1396,1429 Process 1396 attached Process 1429 attached [pid 1429] stat("/mnt/gluster-object/test/c1", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 [pid 1429] getxattr("/mnt/gluster-object/test/c1", "user.swift.metadata", 0x0, 0) = 148 [pid 1429] getxattr("/mnt/gluster-object/test/c1", "user.swift.metadata", "{\"X-Object-Count\":[0,0],\"X-Timestamp\":[\"1467290429.21123\",0],\"X-Type\":[\"container\",0],\"X-PUT-Timestamp\":[\"1467290429.34439\",0],\"X-Bytes-Used\":[0,0]}", 148) = 148 [pid 1429] stat("/mnt/gluster-object/test/c1", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 [pid 1429] stat("/mnt/gluster-object/test/c1", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 [pid 1429] openat(AT_FDCWD, "/mnt/gluster-object/test/c1", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 8 [pid 1429] getdents(8, /* 8 entries */, 32768) = 200 [pid 1429] getdents(8, /* 0 entries */, 32768) = 0 [pid 1429] close(8) = 0 [pid 1429] stat("/mnt/gluster-object/test/c1/o1", {st_mode=S_IFREG|0755, st_size=1, ...}) = 0 [pid 1429] stat("/mnt/gluster-object/test/c1/o2", {st_mode=S_IFREG|0755, st_size=1, ...}) = 0 [pid 1429] stat("/mnt/gluster-object/test/c1/o3", {st_mode=S_IFREG|0755, st_size=1, ...}) = 0 [pid 1429] stat("/mnt/gluster-object/test/c1/o4", {st_mode=S_IFREG|0755, st_size=1, ...}) = 0 [pid 1429] stat("/mnt/gluster-object/test/c1/dirobject", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 [pid 1429] stat("/mnt/gluster-object/test/c1/o5", {st_mode=S_IFREG|0755, st_size=1, ...}) = 0 [pid 1429] getxattr("/mnt/gluster-object/test/c1/dirobject", "user.swift.metadata", 0x0, 0) = 174 [pid 1429] getxattr("/mnt/gluster-object/test/c1/dirobject", "user.swift.metadata", "{\"Content-Length\":0,\"ETag\":\"d41d8cd98f00b204e9800998ecf8427e\",\"X-Object-Type\":\"dir\",\"X-Timestamp\":\"1467291058.57983\",\"X-Type\":\"Object\",\"Content-Type\":\"application/directory\"}", 174) = 174 [pid 1429] poll([{fd=4, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout) [pid 1429] lstat("/mnt/gluster-object/test/c1/dirobject", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 [pid 1429] openat(AT_FDCWD, "/mnt/gluster-object/test/c1/dirobject", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 8 [pid 1429] getdents(8, /* 2 entries */, 32768) = 48 [pid 1429] getdents(8, /* 0 entries */, 32768) = 0 [pid 1429] close(8) = 0 Actual results: Lot of stat() calls observed while serving the container listing. Expected results: Container server should not do additional stat() to figure out if an entry is a file or directory. Container server should use d_type information returned by readdir call.
Upstream patches: http://review.gluster.org/#/c/14543/ http://review.gluster.org/#/c/14460/
Downstream patches merged: https://code.engineering.redhat.com/gerrit/#/q/topic:bug-1351640
Verified that the number of stat calls are reduced on a GET operation: Created 10000 objects in a container and executed GET on the container while strace is attached to the container process: curl -i http://localhost:8080/v1/AUTH_test/dir1/ -X GET strace -ff -p 11858,11888 pid 11888] restart_syscall(<... resuming interrupted call ...>) = 1 [pid 11888] accept(4, {sa_family=AF_INET, sin_port=htons(54862), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5 [pid 11888] fcntl(5, F_GETFL) = 0x2 (flags O_RDWR) [pid 11888] fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0 [pid 11888] fcntl(5, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK) [pid 11888] fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0 [pid 11888] accept(4, 0x7fffa157a040, [16]) = -1 EAGAIN (Resource temporarily unavailable) [pid 11888] recvfrom(5, "GET /test/0/AUTH_test/dir1/ HTTP"..., 8192, 0, NULL, NULL) = 287 [pid 11888] getsockname(5, {sa_family=AF_INET, sin_port=htons(6011), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 [pid 11888] lstat("/mnt/gluster-object/test", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 [pid 11888] lstat("/mnt/gluster-object/test/..", {st_mode=S_IFDIR|0755, st_size=63, ...}) = 0 [pid 11888] stat("/mnt/gluster-object/test/dir1", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 [pid 11888] getxattr("/mnt/gluster-object/test/dir1", "user.swift.metadata", 0x0, 0) = 150 [pid 11888] getxattr("/mnt/gluster-object/test/dir1", "user.swift.metadata", "{"X-Object-Count":[500,0],"X-Timestamp":["1481622561.19635",0],"X-Type":["container",0],"X-PUT-Timestamp":["1481622561.11308",0],"X-Bytes-Used":[0,0]}", 150) = 150 [pid 11888] openat(AT_FDCWD, "/mnt/gluster-object/test/dir1", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 7 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 23 entries */, 32768) = 552 [pid 11888] getdents(7, /* 19 entries */, 32768) = 456 [pid 11888] getdents(7, /* 0 entries */, 32768) = 0 [pid 11888] close(7) = 0 [pid 11888] poll([{fd=4, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout) [pid 11888] sendto(5, "HTTP/1.1 200 OK\r\nContent-Length:"..., 2864, 0, NULL, 0) = 2864 [pid 11888] shutdown(5, SHUT_RDWR) = 0 [pid 11888] close(5) = 0 [pid 11888] poll([{fd=4, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 60000 Detailed performance testing is not been done , only verification of reduced stat calls is performed. Marking the BZ verified. swiftonfile-2.3.0-5.el7rhgs.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2017-0488.html