Bug 1872141 - upgrading texlive fails
Summary: upgrading texlive fails
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: rpm
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Packaging Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-25 07:44 UTC by Daniel Mach
Modified: 2021-01-06 08:12 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-30 08:52:19 UTC
Type: Bug
Embargoed:



Description Daniel Mach 2020-08-25 07:44:43 UTC
Description of problem:
Upgrading texlive fails.
The next transaction succeeds and even the previously failed package gets upgraded correctly.


Version-Release number of selected component (if applicable):
$ rpm -q rpm libdnf dnf
rpm-4.16.90-0.git15116.1.fc33.x86_64
libdnf-0.51.0-0.15gdd15584e.fc33.x86_64
dnf-4.2.23-17gce90f283.fc33.noarch




Reproducer:
$ dnf --installroot=/tmp/foo --releasever=31 --repoid=fedora install texlive glibc-minimal-langpack
$ rm /tmp/foo/etc/yum.repos.d -rfv
$ dnf --installroot=/tmp/foo --releasever=33 --repoid=rawhide update --nogpgcheck
^^^ running the command for the 2nd time passes


Actual results:
Failed:
  texlive-base-7:20190410-2.fc31.x86_64                                                        texlive-base-7:20200327-15.fc34.x86_64

Error: Transaction failed


Expected results:
Transaction succeeds

Comment 1 Panu Matilainen 2020-08-25 12:42:40 UTC
FWIW, the above reproducer didn't work for me as such, I needed to do this instead (on F32, installroot difference aside):

# rm -rf /srv/test
# dnf -y --installroot=/srv/test --releasever=31 --repoid=fedora install texlive fedora-repos-rawhide
# dnf -y --installroot=/srv/test/ --releasever=33 --repoid=rawhide update --nogpgcheck

Here's the failure:

>  Running scriptlet: texlive-base-7:20200327-15.fc34.x86_64             34/1299 
>  Upgrading        : texlive-base-7:20200327-15.fc34.x86_64             34/1299 
> Error unpacking rpm package texlive-base-7:20200327-15.fc34.x86_64
>  Upgrading        : zlib-1.2.11-22.fc33.x86_64                         35/1299 
> error: unpacking of archive failed on file /usr/share/texlive/texmf-var: cpio: chown
> error: texlive-base-7:20200327-15.fc34.x86_64: install failed

texlive-base has this:
> preinstall scriptlet (using /bin/sh):
> rm -rf /usr/share/texlive/texmf-var
> rm -rf /var/lib/texmf/*

...and in the f31 package, texmf-var is a symlink:
# ls -l /srv/test/usr/share/texlive/texmf-var
lrwxrwxrwx. 1 root root 14 Aug  2  2019 /srv/test/usr/share/texlive/texmf-var -> /var/lib/texmf


...but in the f34 package, it's a directory:
[root@lumikko ~]# ls -ld /srv/test/usr/share/texlive/texmf-var
drwxr-xr-x. 3 root root 4096 Aug 25 15:22 /srv/test/usr/share/texlive/texmf-var

So it seems to be a case of the "good ole" directory <-> symlink replacement case, which you can't do in %pre, it needs to be done in %pretrans: https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/

Comment 2 Tom "spot" Callaway 2020-08-25 15:35:20 UTC
Panu, this is much more insidious. The intent is for /usr/share/texlive/texmf-var to always be a symlink to /var/lib/texmf. When I do a clean install of texlive-base in the same manner that you did the upgrade above, I end up with a symlink. (The logic for this has not changed at all between F31 and F33/F34.) I added coreutils to the install list to get rm, though it makes no difference in the end result (I tested without coreutils, and it just results in those rm invocations in %pre not running; and yes, I should have Requires(pre): coreutils):

[spot@localhost texlive-base]$ sudo rm -rf /srv/test/*
[spot@localhost texlive-base]$ sudo dnf --installroot=/srv/test --release=33 --nogpgcheck install coreutils texlive-base  
[spot@localhost texlive-base]$ ls -l /srv/test/usr/share/texlive/texmf-var
lrwxrwxrwx. 1 root root 14 Aug 13 14:29 /srv/test/usr/share/texlive/texmf-var -> /var/lib/texmf

So, we know it is a symlink in a fresh install of F31. We know it is a symlink in a fresh install of F33.

I tried to reproduce your failure... and I could not.

Starting with F31 (coreutils, texlive-base, and dependencies) in /srv/test, I did:

sudo dnf --installroot=/srv/test/ --release=33 --nogpgcheck update

The transaction did not fail, there were no errors, and /srv/test/usr/share/texlive/texmf-var is still a symlink:

[spot@localhost texlive-base]$ ls -l /srv/test/usr/share/texlive/texmf-var
lrwxrwxrwx. 1 root root 14 Aug 13 14:29 /srv/test/usr/share/texlive/texmf-var -> /var/lib/texmf


*****

So... I can't reproduce this failure, and I can't see any obvious reason why it would go from a symlink to a directory (it should never be a directory). I thought perhaps something in the existing %pretrans lua code was somehow converting it from a symlink to a directory, but the lua code does this:

path = "/usr/share/texmf"
st = posix.stat(path)
if st and st.type == "directory" then
...

The rpm Lua posix.stat code should return st.type == "link", so that conditional should never be invoked for /usr/share/texlive/texmf-var.
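
To illustrate the distinction that check relies on, here is a minimal shell sketch. It is not the rpm Lua code: path_type is a hypothetical helper, and the sketch assumes the stat call in question behaves like lstat(), i.e. it does not follow a final symlink, as the paragraph above expects.

```shell
# Hypothetical helper: report a path's type without following a final
# symlink, roughly what an lstat()-style call would tell you.
path_type() {
    if [ -L "$1" ]; then
        echo link          # must be tested first: -d follows symlinks
    elif [ -d "$1" ]; then
        echo directory
    elif [ -e "$1" ]; then
        echo other
    else
        echo missing
    fi
}

# Scratch layout standing in for texmf-var -> /var/lib/texmf.
demo=$(mktemp -d)
mkdir "$demo/texmf"
ln -s "$demo/texmf" "$demo/texmf-var"

path_type "$demo/texmf-var"     # prints: link
path_type "$demo/texmf"         # prints: directory
[ -d "$demo/texmf-var" ] && echo "-d alone follows the symlink"
rm -rf "$demo"
```

A check that only asks "is this a directory?" after following the link would wrongly match the symlink, which is why the type reported for the link itself matters here.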

Oh, and the %pretrans bits? They're taken exactly from https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/. /usr/share/texlive/texmf-var used to be a directory (in the pre F30 timeframe, iirc). 

I _could_ also add the lua scriptlet to replace a symlink to a directory with a directory, but... /usr/share/texlive/texmf-var should _ALWAYS_ be a symlink, and we never want that outcome.

We need to figure out a reproducible scenario, then figure out how in the world this symlink is becoming a directory.

Comment 3 Panu Matilainen 2020-08-26 07:52:18 UTC
Oh... yeah, this is funny. 

It's totally reproducible for me, and also for Dan (the reporter), so I assumed it's always reproducible. Thanks for testing + background! Now that I got the reproducer down to rpm-cli level to get meaningful debug logs, it's actually pretty obvious:

> D: %prein(texlive-base-7:20200327-15.fc34.x86_64): scriptlet start
> fdio:       2 writes,       68 total bytes in 0.000011 secs
> D: %prein(texlive-base-7:20200327-15.fc34.x86_64): execv(/bin/sh) pid 1113727
> + rm -rf /usr/share/texlive/texmf-var
> + rm -rf '/var/lib/texmf/*'
> + :
> D: %prein(texlive-base-7:20200327-15.fc34.x86_64): waitpid(1113727) rc 1113727 status 0
> [...]
> D: create     040755  1 (   0,   0)     0 /usr/share/texlive/texmf-local/texmf-compat
> D:  fsmChown (/usr/share/texlive/texmf-local/texmf-compat, 0, 0) 
> D:  fsmChmod (/usr/share/texlive/texmf-local/texmf-compat, 00755) 
> D:  fsmUtime (/usr/share/texlive/texmf-local/texmf-compat, 0x5f357bca) 
> D: touch      120777  1 (   0,   0)    14 /usr/share/texlive/texmf-var
>    ^^^^^
> D:  fsmChown (/usr/share/texlive/texmf-var, 0, 0) No such file or directory
> fdio:     546 reads, 16606678 total bytes in 0.018707 secs
> error: unpacking of archive failed on file /usr/share/texlive/texmf-var: cpio: chown failed - No data available
> D: exiting chroot /srv/test/
> ufdio:       6 reads,    21173 total bytes in 0.000010 secs
> error: texlive-base-7:20200327-15.fc34.x86_64: install failed

The "touch" is the smoking gun: this only happens with %_minimize_writes enabled.

In that "ssd preservation mode", rpm skips file creation if the operation would not change the contents, and only updates file metadata (aka "touches"). The calculation of which files to touch is done during fingerprinting (i.e. after %pretrans but before installs actually start), so it can be thrown off by packages doing funny things in their scriptlets. Such as this: the failure wouldn't happen if /usr/share/texlive/texmf-var was nuked in %pretrans instead of %pre.
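
The ordering problem can be shown with a toy model in plain shell. This is only a sketch of the sequence described above, not rpm code: the scratch root, the plan and result variables, and the use of chown -h as the "touch" step are all assumptions made for illustration.

```shell
# Toy model only; none of this is rpm code.
root=$(mktemp -d)
mkdir -p "$root/var/lib/texmf" "$root/usr/share/texlive"
ln -s /var/lib/texmf "$root/usr/share/texlive/texmf-var"
link="$root/usr/share/texlive/texmf-var"
result=

# 1. "Fingerprinting": the on-disk symlink already has the target the new
#    package ships, so the plan is to touch it rather than recreate it.
if [ "$(readlink "$link")" = "/var/lib/texmf" ]; then
    plan=touch
else
    plan=create
fi

# 2. The %pre scriptlet runs later and removes the path.
rm -rf "$link"

# 3. The install step still follows the stale plan and only updates metadata.
if [ "$plan" = touch ]; then
    if chown -h "$(id -u):$(id -g)" "$link" 2>/dev/null; then
        result=ok
    else
        result=failed    # nothing left to chown
    fi
fi
echo "plan=$plan result=$result"    # prints: plan=touch result=failed
rm -rf "$root"
```

Moving the removal ahead of step 1 (the %pretrans case) makes the plan come out as create, so the metadata-only path is never taken.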

So in a sense, the package is shooting itself in the foot here, but the behavior difference from rpm POV doesn't seem acceptable. I'll see what I can do.

Comment 4 Tom "spot" Callaway 2020-08-26 13:03:26 UTC
texlive is very good at shooting itself. The current %pre is something that got blindly inherited forward from before my time (I think):

%pre
rm -rf %{_texdir}/texmf-var
rm -rf %{_texmf_var}/*
:

It's deleting the file/directory/link/whatever at /usr/share/texlive/texmf-var _AND_ deleting anything in /var/lib/texmf/*

Now, the %pretrans lua magic should handle /usr/share/texlive/texmf-var, so that seems safe to remove, but is it fine to leave %pre as:

%pre
rm -rf %{_texmf_var}/*
:

Or should I include that action in the %pretrans?

[[Also, if you want me to leave texlive-base as it is for a bit so you can test this case, I can, but I'd like to take a few bullets away from texlive.]]

Comment 5 Panu Matilainen 2020-09-02 07:16:34 UTC
Sorry, I've no idea what's right for texlive. 

This bug does nicely demonstrate how brittle and dangerous the %_minimize_writes feature currently is though (we knew it of course, but seeing is believing :) Until we can ensure identical end result whether _minimize_writes is enabled or not, that thing needs to remain experimental and off by default (the plan was to automatically enable on SSD in 4.16)

So the bottom line is, if you just want to shoot the other leg from texlive that's fine :) but it's rpm that needs fixing: rpm needs to be bulletproof against this kind of stuff, and _minimize_writes must not alter outcome of installation.

Comment 6 Panu Matilainen 2020-09-30 08:52:19 UTC
Upstream has a minimal fix for this now: https://github.com/rpm-software-management/rpm/pull/1347

However, that's not what went into 4.16.0: in that release %_minimize_writes is back to being always disabled by default, because the above fix is only papering over the more fundamental issue. Nevertheless, with 4.16.0 final we can consider this fixed in rawhide and f33 for the general case.

Comment 7 Panu Matilainen 2020-09-30 08:53:06 UTC
Eh, should've been:

with 4.16.0 final we can consider this fixed in rawhide and f33 for the default case.

Comment 8 Miroslav Suchý 2020-12-22 15:38:38 UTC
@pmatilai Can this be resolved by calling checkInstalledFiles() again from rpmteProcess() if the package has either a %pretrans or a %pre scriptlet and %_minimize_writes is set? This is a performance penalty, but way smaller than actually copying the file.
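
The general idea of re-checking after scriptlets have run can be sketched in shell as follows. This does not model checkInstalledFiles() or rpmteProcess(); decide_action is a hypothetical helper and the scratch paths are made up for the example.

```shell
# Hypothetical helper: decide whether an existing symlink can be kept
# ("touch") or has to be written again ("create").
decide_action() {   # usage: decide_action PATH EXPECTED_TARGET
    if [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]; then
        echo touch
    else
        echo create
    fi
}

root=$(mktemp -d)
link="$root/texmf-var"
ln -s /var/lib/texmf "$link"

plan=$(decide_action "$link" /var/lib/texmf)    # early decision: touch
rm -rf "$link"                                  # stand-in for a %pre scriptlet
plan=$(decide_action "$link" /var/lib/texmf)    # re-check: now create

# Act on the refreshed plan instead of the stale one.
if [ "$plan" = create ]; then
    ln -s /var/lib/texmf "$link"
fi
final=$(readlink "$link")
echo "plan=$plan final=$final"    # prints: plan=create final=/var/lib/texmf
rm -rf "$root"
```

The second decide_action call costs one extra lstat/readlink per file, which is the kind of small penalty the question above has in mind.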

