Unable to run a pod with a very high nofile limit. This is critical for some workloads. One example is a KinD [1] cluster, which needs more than a million open files to deploy something like Istio [2] on top of it.

[1] https://kind.sigs.k8s.io/
[2] https://istio.io/latest/docs/setup/getting-started/

Reproducible: Always

Steps to Reproduce:
1. Edit /etc/security/limits.conf, then reboot:

   # KinD Cluster Tuning
   * soft nofile 4194304
   * hard nofile 4194304

2. Verify that the limits were applied:

   $ prlimit
   RESOURCE   DESCRIPTION                         SOFT       HARD       UNITS
   AS         address space limit                 unlimited  unlimited  bytes
   CORE       max core file size                  unlimited  unlimited  bytes
   CPU        CPU time                            unlimited  unlimited  seconds
   DATA       max data size                       unlimited  unlimited  bytes
   FSIZE      max file size                       unlimited  unlimited  bytes
   LOCKS      max number of file locks held       unlimited  unlimited  locks
   MEMLOCK    max locked-in-memory address space  8388608    8388608    bytes
   MSGQUEUE   max bytes in POSIX mqueues          819200     819200     bytes
   NICE       max nice prio allowed to raise      0          0
   NOFILE     max number of open files            1024       4194304    files
   NPROC      max number of processes             256382     256382     processes
   RSS        max resident set size               unlimited  unlimited  bytes
   RTPRIO     max real-time priority              0          0
   RTTIME     timeout for real-time tasks         unlimited  unlimited  microsecs
   SIGPENDING max number of pending signals       256382     256382     signals
   STACK      max stack size                      8388608    unlimited  bytes

3. Try to run a rootless container with a nofile limit greater than 1048576.

Actual Results:
$ podman run -it --rm --ulimit=nofile=1048577:1048577 fedora:rawhide ulimit -n
Error: crun: setrlimit `RLIMIT_NOFILE`: Operation not permitted: OCI permission denied

Expected Results:
$ podman run -it --rm --ulimit=nofile=1048577:1048577 fedora:rawhide ulimit -n
1048577

https://github.com/containers/podman/blob/main/libpod/define/config.go#L89 seems to be applied.
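The check in step 2 can also be done from inside a process, which helps when comparing what the login shell sees with what a container sees. The following is a minimal sketch using Python's standard `resource` module; the helper names `nofile_limits` and `describe` are made up for illustration and are not part of Podman or crun.

```python
import resource


def describe(value):
    """Render an rlimit value the way prlimit prints it."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE pair of the calling process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # Mirrors the NOFILE row of the prlimit output.
    print(f"NOFILE soft={describe(soft)} hard={describe(hard)}")
```

Run on the host, this should report the same soft and hard values as the NOFILE row of `prlimit`.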
This is the kernel blocking the change, not Podman. Could you grab an strace of podman to see which values are being passed to crun when it sets the limit? I know that newer versions of Podman changed how these limits are handled, so it would help to know which version of Podman you are using. Please attach the output of `podman info`.
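For reference, the kernel rule behind an EPERM from setrlimit(2) can be sketched as follows: a process without CAP_SYS_RESOURCE may lower its hard limit, but may not raise it above the hard limit it currently has. This is a simplified model, not Podman or crun code. The helper name `check_nofile_request` is hypothetical, and the hard limit of 1048576 used in the usage note is an assumption for illustration; which hard limit crun actually inherited in this report is what the strace would show. Other kernel checks, such as the fs.nr_open ceiling, are not modelled.

```python
import resource

INF = resource.RLIM_INFINITY


def _exceeds(value, ceiling):
    """True if value is strictly above ceiling, treating INF as the largest value."""
    if ceiling == INF:
        return False
    if value == INF:
        return True
    return value > ceiling


def check_nofile_request(soft, hard, current_hard=None):
    """Predict how setrlimit(RLIMIT_NOFILE) fails for an unprivileged process.

    Returns None if the request looks acceptable, otherwise a short reason
    prefixed with the errno name the kernel would report.
    """
    if current_hard is None:
        # Fall back to the hard limit of the calling process.
        current_hard = resource.getrlimit(resource.RLIMIT_NOFILE)[1]
    if _exceeds(soft, hard):
        return "EINVAL: soft limit exceeds requested hard limit"
    if _exceeds(hard, current_hard):
        return "EPERM: raising the hard limit requires CAP_SYS_RESOURCE"
    return None
```

Under the assumed hard limit, `check_nofile_request(1048577, 1048577, current_hard=1048576)` returns the EPERM reason, while the same request against the host's hard limit of 4194304 returns None.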
Opened a PR: https://github.com/containers/podman/pull/24228
$ podman info
host:
  arch: amd64
  buildahVersion: 1.37.4
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-2.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 96.19
    systemPercent: 0.78
    userPercent: 3.03
  cpus: 24
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: workstation
    version: "40"
  eventLogger: journald
  freeLocks: 2048
  hostname: XXXX
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.10.13-200.fc40.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 49458327552
  memTotal: 67291123712
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.2-2.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.2
    package: netavark-1.12.2-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.17-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.17
      commit: 000fa0d4eeed8938301f3bcf8206405315bc1017
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240906.g6b38f07-1.fc40.x86_64
    version: |
      pasta 0^20240906.g6b38f07-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1000/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 0h 23m 27.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /home/jon/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/jon/.local/share/containers/storage
  graphRootAllocated: 998483427328
  graphRootUsed: 81131991040
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 2
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/jon/.local/share/containers/storage/volumes
version:
  APIVersion: 5.2.4
  Built: 1728259200
  BuiltTime: Sun Oct 6 18:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.7
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.4
Created attachment 2051406: podman run resulting in "Error: crun: setrlimit `RLIMIT_NOFILE`: Operation not permitted: OCI permission denied"
Closing, since a Podman release containing Giuseppe's PR has shipped. Please reopen if the issue persists with Podman v5.4.