Bug 1931904

Summary: reposync re-downloads packages with multiple hardlinks
Product: Red Hat Enterprise Linux 8
Reporter: Josef Kubin <jkubin>
Component: librepo
Assignee: Marek Blaha <mblaha>
Status: CLOSED ERRATA
QA Contact: Jan Blazek <jblazek>
Severity: unspecified
Docs Contact:
Priority: medium
Version: 8.3
CC: amatej, mblaha, naresh.sukhija_ext, pkratoch
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: librepo-1.14.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-11-09 19:45:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1951407
Bug Blocks:

Description Josef Kubin 2021-02-23 14:28:09 UTC
Description of problem:

`dnf reposync` re-downloads package files that have hard links outside the repository directory.

For example, the identical RPM file perf-debuginfo-2.6.32-754.36.1.el6.x86_64.rpm exists in both rhel-6-server-els-optional-debug-rpms and rhel-6-server-els-debug-rpms.
If the two copies are de-duplicated with a utility such as rdfind, cp, or hardlink, both repositories then see a single file with multiple hard links.
If the repo metadata then expects different attributes for that file (modification date, sha256sum, etc.) and, due to a bug in dnf reposync, the existing file is judged not to match one of the criteria, the file is re-downloaded on every run.
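
The shared-inode situation described above can be sketched as follows. This only illustrates how hard links behave and does not reproduce the librepo bug itself. The names repo-a, repo-b and example.rpm are made up for the illustration, and GNU coreutils (`stat -c`, `touch -d @...`) is assumed.

```shell
#!/usr/bin/env bash
# Illustration only: two "repositories" sharing one file through a hard link.
# repo-a, repo-b and example.rpm are hypothetical names, not taken from the report.
work=$(mktemp -d)
mkdir -p "$work/repo-a/Packages/p" "$work/repo-b/Packages/p"
file_a="$work/repo-a/Packages/p/example.rpm"
file_b="$work/repo-b/Packages/p/example.rpm"

printf 'dummy payload\n' > "$file_a"
ln "$file_a" "$file_b"            # de-duplicate: both paths now name one inode

links=$(stat -c '%h' "$file_a")   # hard link count of the shared inode
inode_a=$(stat -c '%i' "$file_a")
inode_b=$(stat -c '%i' "$file_b")

# Set the modification time through one path and read it back through the other.
touch -d @1610935587 "$file_a"
mtime_b=$(stat -c '%Y' "$file_b")

echo "link count: $links"
echo "inodes:     $inode_a $inode_b"
echo "mtime seen via repo-b: $mtime_b"

rm -rf "$work"
```

Attributes such as the modification time belong to the inode, not to the path, so a change made through one repository's path is also visible through the other.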

This bug did not occur with yum reposync on EL7 and earlier.

List of extended file attributes of the mentioned package that is being downloaded again:
~~~
# getfattr --dump Packages/p/perf-debuginfo-2.6.32-754.36.1.el6.x86_64.rpm
# file: Packages/p/perf-debuginfo-2.6.32-754.36.1.el6.x86_64.rpm
user.Zif.MdChecksum[1610935587]="769f719235d607f4406aa33ced7f27c76f5db4c0"
user.Zif.MdChecksum[1613801719]="cf7ebf419e892ae6f0df7c8fcb3fe538844d9553f0a1b523d8addd8faa5c0111"
user.Zif.MdChecksum[1613985221]="cf7ebf419e892ae6f0df7c8fcb3fe538844d9553f0a1b523d8addd8faa5c0111"
user.Zif.MdChecksum[1613990826]="cf7ebf419e892ae6f0df7c8fcb3fe538844d9553f0a1b523d8addd8faa5c0111"
user.Zif.MdChecksum[1613990960]="cf7ebf419e892ae6f0df7c8fcb3fe538844d9553f0a1b523d8addd8faa5c0111"
user.Zif.MdChecksum[1613992052]="cf7ebf419e892ae6f0df7c8fcb3fe538844d9553f0a1b523d8addd8faa5c0111"
~~~
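
To make a dump like the one above easier to read, the entries can be summarised with a small filter. This is a minimal sketch, not part of librepo: the function name `summarize_md_checksums` is hypothetical, and reading the bracketed number as a Unix timestamp is an assumption that the report does not confirm. The digest length is printed because a 40-character hex digest is consistent with SHA-1 and a 64-character one with SHA-256.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `getfattr --dump` output on stdin and prints,
# for each user.Zif.MdChecksum[...] entry:
#   <bracketed number> <digest length in hex characters> <digest>
summarize_md_checksums() {
    sed -n 's/^user\.Zif\.MdChecksum\[\([0-9][0-9]*\)]="\([0-9a-f]*\)"$/\1 \2/p' |
    while read -r stamp digest; do
        printf '%s %s %s\n' "$stamp" "${#digest}" "$digest"
    done
}

# Sample input: the first two entries copied from the dump in this report.
summarize_md_checksums <<'EOF'
# file: Packages/p/perf-debuginfo-2.6.32-754.36.1.el6.x86_64.rpm
user.Zif.MdChecksum[1610935587]="769f719235d607f4406aa33ced7f27c76f5db4c0"
user.Zif.MdChecksum[1613801719]="cf7ebf419e892ae6f0df7c8fcb3fe538844d9553f0a1b523d8addd8faa5c0111"
EOF
```

On a real file the filter would be fed directly, for example `getfattr --dump Packages/p/perf-debuginfo-2.6.32-754.36.1.el6.x86_64.rpm | summarize_md_checksums`.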

Mount attributes:
~~~
# mount
/dev/mapper/vg_data-lv_var_mrepo on /var/mrepo type ext4 (rw,nodev,relatime,seclabel)
~~~

Version-Release number of selected component (if applicable):
librepo-1.12.0-2.el8.x86_64
libdnf-0.48.0-5.el8.x86_64
dnf-4.2.23-4.el8.noarch
dnf-plugins-core-4.0.17-5.el8.noarch

How reproducible:
In the customer's environment.

Actual results:

Each time `reposync` is run on the same repository, it downloads the hard-linked packages again.

Expected results:

Packages are downloaded only once, regardless of whether they are hard-linked.

Comment 1 Marek Blaha 2021-03-01 10:18:57 UTC
I've created a patch that fixes the problem: https://github.com/rpm-software-management/librepo/pull/232

Comment 2 Marek Blaha 2021-04-12 11:17:25 UTC
PR with tests: https://github.com/rpm-software-management/ci-dnf-stack/pull/972

Comment 12 errata-xmlrpc 2021-11-09 19:45:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (librepo bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4429

Comment 13 Jaroslav Mracek 2022-03-15 11:34:35 UTC
*** Bug 1929274 has been marked as a duplicate of this bug. ***