Bug 1895363
Summary: | Dnf transaction calculated in --forcearch mode thinks there's no space in / partition | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Pavel Raiskup <praiskup> |
Component: | qemu | Assignee: | Fedora Virtualization Maintainers <virt-maint> |
Status: | CLOSED WORKSFORME | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 34 | CC: | bberg, berrange, cfergeau, dmach, dominik, igor.raits, itamar, jkadlcik, jmracek, jrohel, kdudka, mblaha, mdomonko, mhatina, mjw, ngompa13, ondrejj, packaging-team-maint, pbonzini, philmd, pkratoch, pmatilai, pmoravco, pstodulk, rjones, rpm-software-management, virt-maint, vmukhame |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-06-15 06:36:05 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Pavel Raiskup
2020-11-06 13:32:17 UTC
FWIW, I'm not able to reproduce with the above instructions, in mock or otherwise (after correcting /var/tmp/main-root-rawhide/ vs /var/tmp/main-root/ inconsistencies in the instructions). So there seems to be some missing ingredient.
But with the non-mock instructions I get this, which might be related...
bash-5.0# df
/usr/bin/df: cannot read table of mounted file systems: No such file or directory
> This can be easily reproduced on Fedora 33 x86_64 using mock:
> $ mock -r fedora-rawhide-armhfp --shell
> ...
Is that "mock -r fedora-rawhide-armhfp --shell" alone supposed to reproduce it, or does "..." mean some "business as usual" commands that might be relevant after all?
> after correcting /var/tmp/main-root-rawhide/ vs /var/tmp/main-root/

Yes, sorry. I tested this again now on my F33, so I corrected the instructions:

rpm -q rpm dnf
rpm-4.16.0-1.fc33.x86_64
dnf-4.4.0-3.fc33.noarch

Steps to reproduce:

sudo su -
dnf -y --installroot /var/tmp/main-root --releasever 34 --forcearch armv7hl install dnf --setopt=tsflags=nocontexts --disablerepo='*' --enablerepo fedora --enablerepo updates
# the resolv.conf symlink is not created anymore by systemd, so no need to
# remove it now
cp /etc/resolv.conf /var/tmp/main-root/etc/resolv.conf
touch /var/tmp/main-root/dev/urandom
mount --bind /dev/urandom /var/tmp/main-root/dev/urandom
chroot /var/tmp/main-root/
bash-5.0# /usr/bin/dnf -y --installroot /sub-root --releasever 34 --forcearch armv7hl install filesystem --setopt=tsflags=nocontexts --disablerepo='*' --enablerepo fedora --enablerepo updates
...
Error: Transaction test error:
installing package fedora-release-identity-basic-34-0.9.noarch needs 44KB more space on the / filesystem
...
Error Summary
-------------
Disk Requirements:
At least 48MB more space needed on the / filesystem.

> Is that "mock -r fedora-rawhide-armhfp --shell" alone supposed to reproduce it

Yes, but I guess I should suggest 'mock -r fedora-rawhide-armhfp --scrub=all' first, and turn on (if explicitly disabled) the bootstrap. The mock process fails when it tries to install the build chroot from the (emulated) bootstrap chroot.

This seems rather unstable at best. I can now fairly reliably reproduce the first failure (outside mock), but what happens after that varies a lot. Sometimes the second attempt at the last command succeeds, sometimes it doesn't, and so on.

When it fails, rpm on debug logging shows:

D: computing file dispositions
D: 0x0000fd00 4096 0 4062891 rotational:-1 /

Which says 4096 blocksize, 0 available blocks and 4062891 available inodes on the / filesystem.
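To make the reading of that debug line concrete, here is a minimal sketch in C that pulls the numeric fields out of such a line and converts them to bytes. The names `space_line`, `parse_space_line` and `avail_bytes` are hypothetical and not part of rpm. The field order (block size, available blocks, available inodes) follows the description above; treating the first hex field as a device number is an assumption.

```c
#include <stdio.h>

/* Hypothetical container for the numeric fields of the rpm debug line. */
struct space_line {
    unsigned long dev;    /* first hex field; assumed to be the device number */
    long long bsize;      /* block size in bytes */
    long long bavail;     /* available blocks */
    long long iavail;     /* available inodes */
};

/* Parse a line such as "D: 0x0000fd00 4096 0 4062891 rotational:-1 /".
 * Returns 0 when all four numeric fields were read, -1 otherwise. */
int parse_space_line(const char *line, struct space_line *out)
{
    int n = sscanf(line, "D: %lx %lld %lld %lld",
                   &out->dev, &out->bsize, &out->bavail, &out->iavail);
    return n == 4 ? 0 : -1;
}

/* Space in bytes that the line reports as available. */
long long avail_bytes(const struct space_line *s)
{
    return s->bsize * s->bavail;
}
```

For the failing line this gives 4096 * 0 = 0 bytes, so any package size at all exceeds the reported space, which matches the "needs 44KB more space" error.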
Zero available blocks could come from a read-only filesystem as indicated by statvfs(), but other than that it's just whatever statvfs() returns, and rpm is right to stop when there's no space indicated (or the media is read-only, which rpm maps to 0 available blocks).

What does this forcearch-thing really do? That's where I would look at first.

Figured I can strace the thing from the outside. This is what that activity looks like when run from the host (with rpm -Uvv --ignorearch):

stat("/var/lib/rpm", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
statfs("/", {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=17931038, f_bfree=4107696, f_bavail=3186096, f_files=4587520, f_ffree=4078319, f_fsid={val=[3681299638, 2236116973]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
stat("/", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
stat("/", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
write(2, "D: ", 3) = 3
write(2, "0x0000fd00 4096 3186096"..., 62) = 62

What happens in the qemu-static process is something quite different:

statx(AT_FDCWD, "/var/lib/rpm", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=4096, ...}) = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1de2bc6000
futex(0x7e7dc8, FUTEX_WAKE, 2147483647) = 1
access("/usr/qemu-arm/usr/lib/", F_OK) = -1 ENOENT (No such file or directory)
statfs("/usr/lib/", {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=17931038, f_bfree=3938041, f_bavail=3016441, f_files=4587520, f_ffree=4062853, f_fsid={val=[3681299638, 2236116973]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
statx(AT_FDCWD, "/usr/lib/", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=4096, ...}) = 0
statx(AT_FDCWD, "/usr", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW|AT_NO_AUTOMOUNT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=4096, ...}) = 0
statx(AT_FDCWD, "/usr/lib", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW|AT_NO_AUTOMOUNT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=4096, ...}) = 0
statx(AT_FDCWD, "/usr", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=4096, ...}) = 0
statx(AT_FDCWD, "/", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=4096, ...}) = 0
write(2, "D: ", 3) = 3
write(2, "0x0000fd00 4096 0"..., 62) = 62

The statfs() is on a strange path, and then I dunno, some emulation going on which gets it wrong, mayhap?

So. When the same thing works on bare metal but fails when emulated, I'm inclined to blame the emulator. Short summary: it seems that statvfs() in this situation (inside chroot and all) returns with ST_RDONLY set in f_flag or 0 in f_bavail; either would seem incorrect.

Panu, I don't suppose it would be possible to distil this down to a minimal reproducer? My best attempt was below but I wasn't able to reproduce the problem with qemu-user-5.0.0-5.fc33.x86_64. What version of qemu-user is supposed to cause the problem?

---
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/vfs.h>

int
main (void)
{
  struct statfs buffs;
  struct stat buf;

  if (statfs ("/", &buffs) == -1) {
    perror ("statfs");
    exit (1);
  }
  /* f_bavail and st_size can be wider than int, so cast before printing */
  printf ("f_bavail: %llu\n", (unsigned long long) buffs.f_bavail);
  if (stat ("/", &buf) == -1) {
    perror ("stat");
    exit (1);
  }
  printf ("st_size: %lld\n", (long long) buf.st_size);
  exit (0);
}

qemu-user-static-5.1.0-5.fc33.x86_64 is the version I used to reproduce. The issue occurs inside a chroot (in fact a nested one), which is likely relevant for reproducing.
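The earlier remark that a read-only filesystem is mapped to 0 available blocks can be illustrated with a short sketch. This is not rpm's actual code: `usable_blocks` and `query_usable_blocks` are hypothetical names, and the only behaviour taken from this thread is "ST_RDONLY in f_flag means 0 available, otherwise use f_bavail".

```c
#include <sys/statvfs.h>

/* Hypothetical helper, not taken from rpm: number of blocks an installer
 * would treat as usable, given a statvfs() result. */
unsigned long long usable_blocks(const struct statvfs *sv)
{
    if (sv->f_flag & ST_RDONLY)
        return 0;   /* read-only media: nothing can be written */
    return (unsigned long long) sv->f_bavail;
}

/* Query `path` with statvfs() and report usable blocks and block size.
 * Returns 0 on success, -1 if statvfs() fails (errno is left set). */
int query_usable_blocks(const char *path,
                        unsigned long long *blocks, unsigned long *bsize)
{
    struct statvfs sv;

    if (statvfs(path, &sv) == -1)
        return -1;
    *blocks = usable_blocks(&sv);
    *bsize = sv.f_bsize;
    return 0;
}
```

Under this mapping the two suspected emulation faults named above, a spurious ST_RDONLY flag or a zero f_bavail, are indistinguishable in the debug output: both print 0 available blocks. Printing f_flag alongside f_bavail in a reproducer would tell them apart.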
I didn't try this, but:

# dnf -y --installroot /var/tmp/main-root --releasever 34 --forcearch armv7hl install dnf --setopt=tsflags=nocontexts --disablerepo='*' --enablerepo fedora --enablerepo updates
# chroot /var/tmp/main-root/
# run your test program

If that doesn't reproduce it, add the nested chroot to the mix, i.e. in the test program before stat'ing:

mkdir('/sub-root')
chroot('/sub-root')

The nested chroot is very likely the key to this.

In the initial fedora-devel thread, it's mentioned that disabling bootstrap mode works around the issue:
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/NNP6TZNG2ZRQ3XCKCJWFTL3XIABOBZJW/#NNP6TZNG2ZRQ3XCKCJWFTL3XIABOBZJW

> In the initial fedora-devel thread, it's mentioned that disabling bootstrap
> mode works around the issue:
When you disable bootstrap, the build chroot is installed by the
system-default DNF/RPM stack, so the emulation isn't in action at all.
That said, I'm not sure the second (nested) chroot is needed. It should
be enough to just use one chroot to actually run statfs() through the
emulation layer.
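The nested-chroot variant suggested earlier in this thread (mkdir, then chroot, then stat) could be sketched in C roughly as follows. The helper names are hypothetical, `/sub-root` is simply the directory used in the reproduction steps, and it has not been verified that this triggers the problem.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes chroot() on glibc */
#endif
#include <errno.h>
#include <sys/stat.h>
#include <sys/vfs.h>
#include <unistd.h>

/* Hypothetical helper: create `dir` if it is missing, chroot() into it and
 * move the working directory to the new root. Needs root privileges.
 * Returns 0 on success, -1 on failure with errno set. */
int enter_nested_root(const char *dir)
{
    if (mkdir(dir, 0755) == -1 && errno != EEXIST)
        return -1;
    if (chroot(dir) == -1)
        return -1;
    return chdir("/");
}

/* statfs() whatever "/" currently is and store the result in `out`.
 * Returns 0 on success, -1 on failure with errno set. */
int probe_root(struct statfs *out)
{
    return statfs("/", out);
}
```

A test program would call `probe_root()` once, then `enter_nested_root("/sub-root")`, then `probe_root()` again, and print `f_bavail` both times. Built for armv7hl and run inside the first chroot, both calls go through qemu-user-static, so a drop to 0 after the second chroot would point at the nesting.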
Ah, it of course depends on what RPM is doing ... if it calls chroot() internally with `--root DIRECTORY`, another chroot level might really be needed.

Yes, rpm does its own chroot() when --root / --installroot is used.

This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle. Changing version to 34.

FTR, after moving to F34 I fail to reproduce the original issue.

Ok, as I'm the submitter of this bug, I think it is fair to close this now (we moved Copr builders to F34). But feel free to reopen in case you want to fix F33.