Description of problem:
I am debugging a weird cockpit build system bug on my development system, which is essentially Fedora 37 OSTree on the outside, with a Fedora 37 toolbox running on it. It essentially tries to copy a file from the toolbox's /usr/share to the bind-mounted $HOME in the container, which fails with a weird error.

Version-Release number of selected component (if applicable):
On the host:
    podman-4.3.1-1.fc37.x86_64
    glibc-2.36-8.fc37.x86_64
    kernel-core-6.0.9-300.fc37.x86_64

In the F37 toolbox:
    coreutils-9.1-6.fc37.x86_64
    glibc-2.36-1.fc37.x86_64

How reproducible:
Always

Steps to Reproduce:
The most straightforward way is with toolbox:

    mkdir ~/test-tmp
    toolbox create -r 37
    toolbox run -r 37 -- cp -p /usr/bin/cat ~/test-tmp/x

The `cp` command fails with exit code 1 and

    cp: failed to preserve ownership for '/home/martin/test-tmp/x': Value too large for defined data type

strace shows that it's trying to do this:

    fchown(4, 0, 0) = -1 EOVERFLOW (Value too large for defined data type)

EOVERFLOW is not a documented error code for fchown(). It *should* fail with EPERM. So this bug is somewhere between podman (triggering the situation), coreutils (cp should ignore that error, as `-p` is documented to only *try* to preserve properties), and glibc/kernel for producing that weird error code.

This is definitely related to the bind-mounted home directory. It works for a non-mounted target dir:

    toolbox run -r 37 -- cp -p /usr/bin/cat /var/tmp/x

This can be reproduced relatively accurately with podman directly, to strip off the toolbox layer:

    podman run -it -v $HOME/test-tmp:/h:z --rm registry.fedoraproject.org/fedora cp -p /usr/bin/cat /h/x

which fails with

    cp: cannot create regular file '/h/x': Value too large for defined data type

This isn't exactly fchown(), but open(), but the same "should be EPERM, but is EOVERFLOW" applies. Again, this works when either dropping the `-u` and running as root inside the container, or copying to /var/tmp.
Additional info: [1] https://github.com/cockpit-project/cockpit/pull/17919
Sorry, the podman reproducer was wrong: it lacked the -u option that is needed to actually reproduce the bug. The right one is this:

    podman run -it -v $HOME/test-tmp:/h:z -u $(id -u) --rm registry.fedoraproject.org/fedora cp -p /usr/bin/cat /h/x

I tested this in a Fedora 36 container as well. My real dev env is neither a straight registry.fedoraproject.org/fedora nor a standard Fedora toolbox container, but https://quay.io/repository/cockpit/tasks (also essentially Fedora 37). However, the workload inside the container does not seem to matter much. It even fails with busybox, which has a vastly different cp implementation from glibc/coreutils:

    podman run -it -v $HOME/test-tmp:/h:z -u $(id -u) --rm docker.io/busybox cp -p /bin/cat /h/x
    # cp: can't create '/h/x': Value too large for defined data type
Interestingly, when I run this on a standard Fedora 37 cloud VM instead of my system, I get a correct error code:

    cp: can't create '/h/x': Permission denied

as uid 1000 inside the container is a completely different user than my uid 1000 outside.

My $HOME is pretty complicated (systemd-homed user, i.e. LUKS encrypted partition mounted on /home/martin). I tried this with a "regular" Linux user, and it works. Also, when I copy to a bind mount which isn't LUKS, it works as well (/run/host/home/martin-cache/x in my case).

So I tried to reproduce this with a LUKS partition:

    sudo dd if=/dev/zero of=/var/tmp/img bs=1M count=500
    sudo cryptsetup luksFormat /var/tmp/img
    sudo cryptsetup luksOpen /var/tmp/img test1
    sudo mkfs.ext4 /dev/mapper/test1
    sudo chown -R $(id -u) /mnt
    podman run -it -v /mnt:/h:z -u $(id -u) --rm docker.io/busybox cp -p /bin/cat /h/x

But this "works", i.e. cp fails with "Permission denied" instead of "Value too large for defined data type". So this is a bit more obscure. Any other idea what I could try here?
I don't see this as a Podman error. This is either coreutils or the kernel.
Allison pointed out that this is most likely related to some UID/GID namespace mapping between the host and the podman container. In particular, this does not work on my system:

    touch ~/somefile; podman unshare chown 1:1 ~/somefile
    chown: changing ownership of '/home/martin/somefile': Value too large for defined data type

Mappings on my system:

    id
    # uid=1000(martin) gid=1000(martin) groups=1000(martin),10(wheel) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

    cat /etc/subuid
    # martin:100000:65536

/etc/subgid looks exactly the same.

But this works in a fresh Fedora 37 with a standard Linux user:

    touch ~/somefile; podman unshare chown 1:1 ~/somefile
    ls -l somefile
    # -rw-r--r--. 1 100000 100000 0 Nov 22 09:12 somefile

    id
    # uid=1000(admin) gid=1000(admin) groups=1000(admin),10(wheel),100(users) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

    cat /etc/subuid
    # admin:100000:65536
    # fedora:165536:65536

/etc/subgid looks identical.

So I don't see any difference between my system and the cloud image so far, but at least this feels a bit closer to the root cause?
Could you please provide strace output for the failing chown command?
Kamil: It's already in the description, but it looks the same for podman unshare:

    touch ~/somefile; podman unshare strace -fvvs1024 chown 1:1 ~/somefile

    newfstatat(AT_FDCWD, "/home/martin/somefile", {st_dev=makedev(0xfd, 0), st_ino=524453, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=0, st_atime=1669111623 /* 2022-11-22T11:07:03.664013917+0100 */, st_atime_nsec=664013917, st_mtime=1669111623 /* 2022-11-22T11:07:03.664013917+0100 */, st_mtime_nsec=664013917, st_ctime=1669111623 /* 2022-11-22T11:07:03.664013917+0100 */, st_ctime_nsec=664013917}, AT_SYMLINK_NOFOLLOW) = 0
    fchownat(AT_FDCWD, "/home/martin/somefile", 1, 1, 0) = -1 EOVERFLOW (Value too large for defined data type)
Oops, sorry for overlooking the strace output in comment #0. I cannot see how this would be coreutils' fault. chown(1) does the syscall as requested and properly handles the error returned out of it.
I think the only blame one could put on coreutils is that "cp -p" fails on EOVERFLOW instead of ignoring it (as `-p` is opportunistic). But that's more of a distracting side issue; the root cause is indeed somewhere else.
Good point. `cp -p` seems to tolerate EPERM and EINVAL but not EOVERFLOW: https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/copy.c;h=b15d919900e178c0667c1ced1c276e87ba6a8f2d;hb=HEAD#l3158
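To illustrate the "opportunistic" behaviour discussed here, the following is a minimal Python sketch, not the coreutils code. The function name `try_preserve_ownership` and the `TOLERATED` set are made up for this illustration. EPERM and EINVAL are the errors this comment says `cp -p` already tolerates; treating EOVERFLOW the same way is only the hypothetical change being discussed.

```python
import errno
import os

# Errors after which ownership preservation is silently given up.
# EPERM/EINVAL: reported above as already tolerated by `cp -p`.
# EOVERFLOW: hypothetical addition, not something coreutils is known to do.
TOLERATED = {errno.EPERM, errno.EINVAL, errno.EOVERFLOW}


def try_preserve_ownership(path, uid, gid, chown=os.chown):
    """Try to set ownership; return True on success, False if a
    tolerated error occurred. Any other OSError is re-raised."""
    try:
        chown(path, uid, gid)
        return True
    except OSError as exc:
        if exc.errno in TOLERATED:
            return False
        raise
```

The `chown` parameter exists only so the sketch can be exercised without touching a real file system.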
Ah, this was a systemd change in version 251: https://github.com/systemd/systemd/blob/1d679b208d982bd5b8ba893981774cac5959b4b4/NEWS#L789

    * Starting with v250 systemd-homed uses UID/GID mapping on the mounts of activated home directories it manages (if the kernel and selected file systems support it). So far it mapped three UID ranges: the range from 0…60000, the user's own UID, and the range 60514…65534, leaving everything else unmapped (in other words, the 16bit UID range is mapped almost fully, with the exception of the UID subrange used for systemd-homed users, with one exception: the user's own UID). Unmapped UIDs may not be used for file ownership in the home directory - any chown() attempts with them will fail. With this release a fourth range is added to these mappings: 524288…1879048191. This range is the UID range intended for container uses

.. and some more. So that at least explains what happened to break this.

So it seems podman needs to adjust to this, or systemd needs to change the provided mappings?
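The ranges quoted from the NEWS entry can be turned into a small check. This is a sketch under assumptions: the bounds are read as inclusive, and the names `HOMED_MAPPED_RANGES`, `is_mapped` and `range_fully_mapped` are invented for illustration and are not part of systemd.

```python
# Ranges as quoted from the systemd NEWS entry, assumed to be inclusive.
HOMED_MAPPED_RANGES = [
    (0, 60000),
    (60514, 65534),
    (524288, 1879048191),
]


def is_mapped(uid, own_uid):
    """True if `uid` may own files in the homed-managed home directory
    of the user `own_uid`, according to the quoted ranges."""
    if uid == own_uid:
        return True
    return any(lo <= uid <= hi for lo, hi in HOMED_MAPPED_RANGES)


def range_fully_mapped(start, count, own_uid):
    """True if every UID in start .. start+count-1 is mapped."""
    return all(is_mapped(uid, own_uid) for uid in range(start, start + count))
```

With the subordinate range from this report, `range_fully_mapped(100000, 65536, 1000)` is false, because 100000…165535 lies between the second and the fourth quoted range.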
Could you show the output of this?

    podman unshare cat /proc/self/uid_map /proc/self/gid_map
❱❱❱ podman unshare cat /proc/self/uid_map /proc/self/gid_map
         0       1000          1
         1     100000      65536
         0       1000          1
         1     100000      65536
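Each uid_map line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The Python sketch below (helper names `parse_id_map` and `to_host_id` are made up for illustration) translates a namespace ID to the host ID using the map shown above.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside, outside, length) tuples.
    Lines that do not have exactly three fields are skipped."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        inside, outside, length = (int(f) for f in fields)
        entries.append((inside, outside, length))
    return entries


def to_host_id(ns_id, entries):
    """Translate an ID inside the namespace to the host ID,
    or return None if the ID is not covered by any entry."""
    for inside, outside, length in entries:
        if inside <= ns_id < inside + length:
            return outside + (ns_id - inside)
    return None


# The uid_map from this comment.
UID_MAP = """\
0 1000 1
1 100000 65536
"""
```

With this map, `to_host_id(1, parse_id_map(UID_MAP))` gives 100000, so the `chown 1:1` from the earlier comment asks for host UID 100000, which is outside the ranges that systemd-homed maps according to the NEWS entry quoted above.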
It's working for me.

    $ rm -f somefile; touch somefile; podman unshare chown 1:1 somefile
    $ podman unshare cat /proc/self/uid_map
             0       3267          1
             1     100000      65536
In accordance with the above systemd change, I changed my /etc/sub[ug]id from

    martin:100000:65536

to

    martin:524288:65536

then trashed and rebuilt my toolboxes, and things work again. So I am reassigning this to shadow-utils to consider changing SUB_UID_MIN in /etc/login.defs to 524288.
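A quick way to spot entries that would need the same adjustment is to scan the name:start:count lines. This Python sketch assumes that simple three-field format and uses made-up helper names (`parse_subid`, `entries_below`). It only reports entries and does not modify any file.

```python
# Start of the container UID range added to the systemd-homed mappings.
CONTAINER_RANGE_START = 524288


def parse_subid(text):
    """Parse /etc/subuid or /etc/subgid style text into
    (name, start, count) tuples. Lines that do not have exactly
    three colon-separated fields are skipped."""
    entries = []
    for line in text.splitlines():
        fields = line.strip().split(":")
        if len(fields) != 3:
            continue
        name, start, count = fields
        entries.append((name, int(start), int(count)))
    return entries


def entries_below(entries, minimum=CONTAINER_RANGE_START):
    """Return the entries whose range starts below `minimum`."""
    return [entry for entry in entries if entry[1] < minimum]
```

For the original entry, `entries_below(parse_subid("martin:100000:65536"))` returns that entry, while the adjusted `martin:524288:65536` yields an empty list.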
Does this also affect Fedora 36? I'm currently using that Fedora version, and its systemd version is 250. On top of that, should I also change SUB_GID_MIN, or just SUB_UID_MIN? The documentation doesn't specify anything for subordinate GIDs, but as I'm already changing one value I could change the other.
Iker, indeed F36 currently has systemd 250, which is not affected. That change landed in 251, see comment #10. Judging by https://bodhi.fedoraproject.org/updates/?packages=systemd, the systemd team does not seem to regularly push new upstream releases into non-current stable releases, so I figure F36 is okay for the time being.

> On top of that, should I also change SUB_GID_MIN?

Yes, sorry for forgetting that. The two should stay in sync.
FEDORA-2022-4e499aeffd has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2022-4e499aeffd
FEDORA-2022-4e499aeffd has been pushed to the Fedora 37 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-4e499aeffd` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-4e499aeffd See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2022-4e499aeffd has been pushed to the Fedora 37 stable repository. If problem still persists, please make note of it in this bug report.