Bug 1760294
Summary: | kernel: seccomp: wrong return value for blocked syscalls on s390x | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Jan Staněk <jstanek> |
Component: | kernel | Assignee: | Vladis Dronov <vdronov> |
kernel sub component: | Memory Management | QA Contact: | Ping Fang <pifang> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | unspecified | CC: | arozansk, brueckner, bugproxy, cye, hannsj_uhl, lmiksik, longman, mm-maint, omosnace, prudo, qcai, vdronov, vondruch |
Version: | 7.7 | Keywords: | Patch |
Target Milestone: | rc | ||
Target Release: | 7.8 | ||
Hardware: | s390x | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | kernel-3.10.0-1113.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-03-31 19:33:42 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1689150, 1713152 |
Description
Jan Staněk
2019-10-10 10:37:16 UTC
Additional info from bug#1759152:

> It may not be a kernel issue but instead a Docker problem. It has been addressed in this ticket:
> https://github.com/docker/for-linux/issues/208

So if it is a docker/podman issue, feel free to reassign, although I find it strange that it would manifest only on s390x if that were the case.

In case it helps, here is a simple reproducer by joransiu.com, adapted from the stat() example in the man pages:

> You can actually reproduce the issue with a simple C testcase running on an Ubuntu docker image, querying a symbolic link... after a few runs, you should observe inconsistent results being printed out. This was the testcase I have in my notes, but I haven't had a chance to test it again... about to board a flight in 5 minutes! Hope this helps!

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <fcntl.h> /* Definition of AT_* constants */
    #include <time.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    /* Local copy of the kernel's struct statx layout, since RHEL 7
       headers do not provide one. */
    struct uv__statx_timestamp {
        int64_t tv_sec;
        uint32_t tv_nsec;
        int32_t unused0;
    };

    struct uv__statx {
        uint32_t stx_mask;
        uint32_t stx_blksize;
        uint64_t stx_attributes;
        uint32_t stx_nlink;
        uint32_t stx_uid;
        uint32_t stx_gid;
        uint16_t stx_mode;
        uint16_t unused0;
        uint64_t stx_ino;
        uint64_t stx_size;
        uint64_t stx_blocks;
        uint64_t stx_attributes_mask;
        struct uv__statx_timestamp stx_atime;
        struct uv__statx_timestamp stx_btime;
        struct uv__statx_timestamp stx_ctime;
        struct uv__statx_timestamp stx_mtime;
        uint32_t stx_rdev_major;
        uint32_t stx_rdev_minor;
        uint32_t stx_dev_major;
        uint32_t stx_dev_minor;
        uint64_t unused1[14];
    };

    int main(int argc, char *argv[]) {
        int dirfd = AT_FDCWD;
        int flags = AT_SYMLINK_NOFOLLOW;
        unsigned int mask = 0xFFF; /* STATX_ALL */
        struct uv__statx sb;

        if (argc != 2) {
            fprintf(stderr, "Usage: %s <pathname>\n", argv[0]);
            exit(EXIT_FAILURE);
        }

        printf("Path: %s\n", argv[1]);

        /* No statx() wrapper in RHEL 7's glibc, so issue the raw syscall.
           379 is the statx number on s390x; other architectures use
           different numbers (x86_64 uses 332). */
        if (syscall(379, dirfd, argv[1], flags, mask, &sb) == -1) {
            perror("statx");
            exit(EXIT_FAILURE);
        }

        printf("File type: ");
        switch (sb.stx_mode & S_IFMT) {
        case S_IFBLK:  printf("block device\n");     break;
        case S_IFCHR:  printf("character device\n"); break;
        case S_IFDIR:  printf("directory\n");        break;
        case S_IFIFO:  printf("FIFO/pipe\n");        break;
        case S_IFLNK:  printf("symlink\n");          break;
        case S_IFREG:  printf("regular file\n");     break;
        case S_IFSOCK: printf("socket\n");           break;
        default:       printf("unknown?\n");         break;
        }

        return EXIT_SUCCESS;
    }

As mentioned in "man 2 statx" (VERSIONS): "statx() was added to Linux in kernel 4.11." So RHEL-7 does not implement the statx() syscall on any architecture, including s390x. Double-check:

    [src/rhel7]$ git grep statx | grep -v -e ^tools/ -e ^redhat/
    arch/s390/include/uapi/asm/unistd.h:/* Number 379 is reserved for sys_statx */

On a bare-metal system, the reproducer above says, as expected:

    # uname -r
    3.10.0-1062.el7.s390x
    # ./statx statx.c
    Path: statx.c
    statx: Function not implemented
    ret -1 errno 38

In an s390x RHEL7-based podman container with "--security-opt=seccomp=unconfined", it also says, as expected:

    (app-root)./statx statx.c
    Path: statx.c
    statx: Function not implemented
    ret -1 errno 38

The expected strace is:

    syscall_379(0xffffffffffffff9c, 0x3ffffb9f8e2, 0x100, 0xfff, 0x3ffffb9efd8, 0x3ffffb9f318) = -1 (errno 38)

In an s390x RHEL7-based podman container with the default seccomp config (/usr/share/containers/seccomp.json), it says:

    (app-root)./statx statx.c
    Path: statx.c
    ret 1 errno 0
    File type: mode: 0 mask: 3ff unknown?

with the strace:

    13:31:26 exit(-100) = ?
    13:31:26 <... exit resumed> strace: _exit returned!
    ) = ?

Normally, a syscall blocked by seccomp should force syscall() to return -1 with errno 1 (EPERM).
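The contract described above (a syscall blocked with an ERRNO action must make syscall() return -1 with the configured errno) can be checked with a minimal C sketch. It installs a classic-BPF seccomp filter that returns SECCOMP_RET_ERRNO | err for one syscall number and reports what the caller observed. The helper name errno_under_filter() is hypothetical, and getppid is only a harmless stand-in for the blocked syscall. The filter is installed in a forked child because a seccomp filter cannot be removed once installed. This is an illustration under those assumptions, not the filter that podman or libseccomp actually generates.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical helper: in a forked child, install a seccomp filter that
 * fails syscall `nr` with SECCOMP_RET_ERRNO | err, invoke it once, and
 * report the errno the child observed (0 if the call did not fail).
 * Returns -1 if the filter could not be set up or the child died. */
static int errno_under_filter(long nr, unsigned int err)
{
    struct sock_filter insns[] = {
        /* A = seccomp_data.nr (the syscall number). */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* If A == nr, fall through to the ERRNO return, else skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (err & SECCOMP_RET_DATA)),
        /* Every other syscall is allowed. A real filter should also
         * check seccomp_data.arch; that check is omitted for brevity. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof insns / sizeof insns[0]),
        .filter = insns,
    };

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Unprivileged processes must set no_new_privs before filtering. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(255);
        errno = 0;
        long ret = syscall(nr);
        /* Encode the observed errno in the exit status (fits in 8 bits). */
        _exit(ret == -1 ? (errno & 0xff) : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 255 ? -1 : WEXITSTATUS(status);
}
```

On a kernel where SECCOMP_RET_ERRNO works, errno_under_filter(SYS_getppid, EPERM) should return EPERM (1). In the default-seccomp s390x trace above, the unfixed RHEL 7 kernel instead let the blocked call (syscall 379) come back as ret 1 with errno 0.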
Compare with RHEL7 x86_64, podman with "--security-opt=seccomp=unconfined":

    (app-root)./statx statx.c
    Path: statx.c
    statx: Function not implemented
    ret -1 errno 38
    syscall_379(0xffffff9c, 0x7fff29c6f8e2, 0x100, 0xfff, 0x7fff29c6e970, 0x7fff29c6eb78) = -1 (errno 38)

Compare with RHEL7 x86_64, podman with the default seccomp config:

    (app-root)./statx statx.c
    Path: statx.c
    statx: Operation not permitted
    ret -1 errno 1
    syscall_379(0xffffff9c, 0x7ffdd0fc68e2, 0x100, 0xfff, 0x7ffdd0fc4460, 0x7ffdd0fc4668) = -1 (errno 38)

statfs() blocked by seccomp:

    (app-root)./statx statx.c
    Path: statx.c
    statx: Operation not permitted
    ret -1 errno 1
    statfs(0xffffff9c, 0x7fff1addd8e2) = -1 ENOSYS (Function not implemented)

Also, an s390x RHEL8-based podman container with statx disabled by seccomp says:

    (app-root)./statx statx.c
    Path: statx.c
    statx: Operation not permitted
    ret -1 errno 1
    statx(AT_FDCWD, "statx.c", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, 0x3ffe09fe708) = -1 EPERM (Operation not permitted)

Need to look at the seccomp code that handles SCMP_ACT_ERRNO (the default action in seccomp.json) in RHEL-7 s390x.

Test with 3.10.0-1104.el7.scmpfx.s390x: OK

    (app-root)./statx statx.c
    Path: statx.c
    statx: Operation not permitted
    ret -1 errno 1
    syscall_379(0xffffffffffffff9c, 0x3ffff8068e2, 0x100, 0xfff, 0x3ffff805d28, 0x3ffff806068) = -1 (errno 1)

Test with statfs() disabled by seccomp: OK

    (app-root)./statx statx.c
    Path: statx.c
    statx: Operation not permitted
    ret -1 errno 1
    statfs(0xffffffffffffff9c, 0x3ffff8658e2) = -1 EPERM (Operation not permitted)

So do I correctly understand that this really is a kernel issue on s390x in the implementation of seccomp?

(In reply to Vít Ondruch from comment #9)
> So do I correctly understand that this really is a kernel issue on s390x in the implementation of seccomp?

In short, yes. Still, the statx() syscall is not present on RHEL-7, so libuv shouldn't call it anyway.
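Because statx() is absent on RHEL-7 and may also be blocked by seccomp, a caller that probes for it has to treat several outcomes as "statx unavailable". The C sketch below shows one possible policy; it is not libuv's actual code. It falls back to lstat() on ENOSYS, on EPERM, and on any return value other than 0 or -1 (such as the ret 1 / errno 0 seen on the unfixed s390x kernel). The function names are hypothetical. The statx path is compiled only when the C library exposes struct statx and SYS_statx (assumed here to mean glibc 2.28 or later with _GNU_SOURCE); otherwise only the lstat() path is built.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy: decide whether a raw statx() result means
 * "statx is unusable here, fall back to the older stat family". */
static int statx_should_fall_back(long rc, int err)
{
    if (rc == 0)
        return 0;                          /* statx worked */
    if (rc == -1)
        return err == ENOSYS ||            /* kernel has no statx (RHEL 7) */
               err == EPERM;               /* blocked by a seccomp filter */
    return 1;                              /* nonsense result, e.g. rc 1, errno 0 */
}

/* Store the S_IFMT bits of `path` (symlinks not followed) in *type_out.
 * Returns 0 on success, -1 with errno set on a genuine error. */
static int file_type(const char *path, mode_t *type_out)
{
#if defined(STATX_TYPE) && defined(SYS_statx)
    struct statx sx;
    errno = 0;
    long rc = syscall(SYS_statx, AT_FDCWD, path, AT_SYMLINK_NOFOLLOW,
                      STATX_TYPE, &sx);
    if (rc == 0) {
        *type_out = sx.stx_mode & S_IFMT;
        return 0;
    }
    if (!statx_should_fall_back(rc, errno))
        return -1;                         /* e.g. ENOENT: a real error */
#endif
    /* Fallback path: the classic lstat() works on every kernel. */
    struct stat st;
    if (lstat(path, &st) != 0)
        return -1;
    *type_out = st.st_mode & S_IFMT;
    return 0;
}
```

For example, file_type("/", &t) should store S_IFDIR in t whether statx() is available, missing, or blocked, while a nonexistent path still fails with ENOENT instead of being masked by the fallback.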
In a regular setup, a statx() call would return -38 (-ENOSYS), but the current implementation of libseccomp (I guess) on RHEL-7 makes the kernel return -1 (-EPERM) for blocked calls, so userspace cannot tell whether a syscall was blocked or simply returned an ordinary error.

See BZ1762578#c12: the seccomp.json in podman/docker whitelists the statx() syscall, but it is still blocked because libseccomp lacks the string-to-number mapping for it (and thus drops it from the whitelist). Once this is fixed in libseccomp, statx() should return ENOSYS in containers as expected. (I assume libuv has some fallback when it gets an ENOSYS error.)

All 'seccomp after ptrace' upstream patches (except for unsupported archs like um or tile):

    $ git l --oneline | grep 'seccomp after ptrace'
    + 1addc57e111b powerpc/ptrace: run seccomp after ptrace
    + 0208b9445bc0 s390/ptrace: run seccomp after ptrace
    - a5cd110cb836 arm64/ptrace: run seccomp after ptrace (arch is not supported)
    - 0f3912fd934c arm/ptrace: run seccomp after ptrace (arch is not supported)
    * 93e35efb8de4 x86/ptrace: run seccomp after ptrace

So indeed, we need to add 93e35efb8de4 ("x86/ptrace: run seccomp after ptrace").

strace test: OK

    08:26:12 syscall_0x1df(0xffffff9c, 0x7ffdbb643457, 0x100, 0xfff, 0x7ffdbb642f70, 0x7ffdbb643178) = -1 ENOSYS (Function not implemented)
    08:26:12 read(0, "3\n", 1024) = 2
    08:26:13 dup(2) = 3
    08:26:13 fcntl(3, F_GETFL) = 0x8402 (flags O_RDWR|O_APPEND|O_LARGEFILE)
    08:26:13 brk(NULL) = 0x15a9000
    08:26:13 brk(0x15ca000) = 0x15ca000
    08:26:13 brk(NULL) = 0x15ca000

strace+seccomp test: FAIL

    08:24:42 syscall_0x1df(0xffffff9c, 0x7ffe7842d8d5, 0x100, 0xfff, 0x7ffe7842ba40, 0x7ffe7842bc48) = -1 ENOSYS (Function not implemented)
    08:24:45 read(0, 0x7f8fa5044000, 1024) = -1 ENOSYS (Function not implemented)
    08:24:45 dup(2) = -1 ENOSYS (Function not implemented)
    08:24:45 fcntl(3, F_GETFL) = -1 ENOSYS (Function not implemented)
    08:24:45 brk(NULL) = -1 ENOSYS (Function not implemented)
    08:24:45 brk(0xe2f000) = -1 ENOSYS (Function not implemented)
    08:24:45 brk(NULL) = -1 ENOSYS (Function not implemented)
    08:24:45 fstat(3, 0x7ffe7842ad90) = -1 ENOSYS (Function not implemented)

strace+seccomp+93e35efb8de4 test: OK

    03:05:55 syscall_0x1df(0xffffff9c, 0x7ffe4ac578d5, 0x100, 0xfff, 0x7ffe4ac56c10, 0x7ffe4ac56e18) = -1 EPERM (Operation not permitted)
    03:05:55 read(0, "3\n", 1024) = 2
    03:06:04 dup(2) = 3
    03:06:04 fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
    03:06:04 brk(NULL) = 0x1d17000
    03:06:04 brk(0x1d38000) = 0x1d38000
    03:06:04 brk(NULL) = 0x1d38000

*** Bug 1772147 has been marked as a duplicate of this bug. ***

Patch(es) committed on kernel-3.10.0-1113.el7

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1016