Bug 1696808 - createrepo_c generates corrupted sqlite files
Summary: createrepo_c generates corrupted sqlite files
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: createrepo_c
Version: 7.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: amatej
QA Contact: Eva Mrakova
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-04-05 16:54 UTC by Douglas Furlong
Modified: 2019-11-26 11:27 UTC
CC List: 10 users

Fixed In Version: createrepo_c-0.10.0-20.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-26 11:27:28 UTC
Target Upstream Version:
Embargoed:


Links
  Github rpm-software-management/createrepo_c pull 151 (closed): Prevent race condition when --update & we have duplicate pkgs (RhBug:1696808); last updated 2020-07-01 07:52:24 UTC
  Red Hat Knowledge Base (Solution) 4079101 (Supportability): createrepo_c creates corrupted sqlite files when same RPM is duplicated in repository; last updated 2019-04-22 16:34:39 UTC
  Red Hat Product Errata RHBA-2019:3986; last updated 2019-11-26 11:27:30 UTC

Description Douglas Furlong 2019-04-05 16:54:05 UTC
Description of problem: 
createrepo_c sometimes creates corrupted primary.sqlite.bz2 files.

Version-Release number of selected component (if applicable):
Version     : 0.10.0
Release     : 18.el7
Architecture: x86_64

How reproducible:
Frequent

Steps to Reproduce:
This is how I've been able to reproduce it: run createrepo_c with the switches below against a mirror of the Fedora Everything repository.

createrepo_c -d --update --keep-all-metadata --local-sqlite -s sha256 --skip-stat 

I've managed to get it to generate a corrupted sqlite file within 3 runs.

Output from the three runs of createrepo_c:
[root@cfm7803-pulp doug]# createrepo_c -d --update --keep-all-metadata --local-sqlite -s sha256 --skip-stat /var/lib/pulp/doug/
Directory walk started
Directory walk done - 116414 packages
Loaded information about 0 packages
Temporary output repo path: /var/lib/pulp/doug/.repodata/
Preparing sqlite DBs
Pool started (with 5 workers)
Pool finished
[root@cfm7803-pulp doug]# createrepo_c -d --update --keep-all-metadata --local-sqlite -s sha256 --skip-stat /var/lib/pulp/doug/
Directory walk started
Directory walk done - 116414 packages
Loaded information about 58207 packages
Temporary output repo path: /var/lib/pulp/doug/.repodata/
Preparing sqlite DBs
Pool started (with 5 workers)
Pool finished
[root@cfm7803-pulp doug]# createrepo_c -d --update --keep-all-metadata --local-sqlite -s sha256 --skip-stat /var/lib/pulp/doug/
Directory walk started
Directory walk done - 116414 packages
Loaded information about 58207 packages
Temporary output repo path: /var/lib/pulp/doug/.repodata/
Preparing sqlite DBs
Pool started (with 5 workers)
Pool finished
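Because the corruption only shows up on some runs, rerunning by hand gets tedious. Below is a minimal Python sketch of a retry harness, assuming createrepo_c is on PATH. `build_command`, `run_until_corrupt` and the `is_corrupt` callback are hypothetical names; the corruption check itself is left to the caller (for example the bunzip2/sqlite3/grep check used in a later comment). The flags are the ones from the reproducer above.

```python
import subprocess
from typing import Callable, List, Optional

# Flags copied from the reproducer in this report; the repo path goes last.
REPRO_FLAGS = ["-d", "--update", "--keep-all-metadata", "--local-sqlite",
               "-s", "sha256", "--skip-stat"]


def build_command(repo_dir: str, workers: Optional[int] = None) -> List[str]:
    """Return the createrepo_c argv used to reproduce the corruption."""
    cmd = ["createrepo_c", *REPRO_FLAGS]
    if workers is not None:
        cmd.append(f"--workers={workers}")
    cmd.append(repo_dir)
    return cmd


def run_until_corrupt(repo_dir: str,
                      is_corrupt: Callable[[str], bool],
                      max_runs: int = 10,
                      workers: Optional[int] = None,
                      runner: Callable = subprocess.run) -> Optional[int]:
    """Rerun createrepo_c until is_corrupt(repo_dir) reports corruption.

    Returns the 1-based attempt that produced corrupt metadata, or None
    if all max_runs attempts looked clean.
    """
    for attempt in range(1, max_runs + 1):
        # check=True makes subprocess.run raise if createrepo_c itself fails.
        runner(build_command(repo_dir, workers), check=True)
        if is_corrupt(repo_dir):
            return attempt
    return None
```

Passing workers=1 is a quick way to test the later observation in this report that the bug does not reproduce with a single worker.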


Actual results:
After running createrepo_c I run repoview against the repo, which fails with the following error.

Traceback (most recent call last):  
  File "/usr/bin/repoview", line 940, in <module>    
    main()  
  File "/usr/bin/repoview", line 937, in main    
    Repoview(opts)  
  File "/usr/bin/repoview", line 191, in __init__    
    packages = self.do_packages(repo_data, group_data, pkgnames)  
  File "/usr/bin/repoview", line 544, in do_packages    
    pkg_data = self.get_package_data(pkgname)  
  File "/usr/bin/repoview", line 442, in get_package_data    
    rows = pcursor.fetchall()
sqlite3.OperationalError: Could not decode to UTF-8 column \'location_href\' with text \'\xc0\x9f\'\n', 7760, 7761, 'ordinal not in range(128)')

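Reviewing the repoview Python code, location_href is the column that holds each package's file location within the repo.

After dumping the contents of the sqlite DB, I can see entries like the ones below, where the second has garbage in location_href instead of the Packages/... path.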

grep artwiz-aleczapka-fonts-1.3 /tmp/dumped_data_pulp 
This is a metapackage, which pulls in all the separated fonts in this family.','http://artwizaleczapka.sourceforge.net/',1531766861,1531427792,'GPLv2','Fedora Project','User Interface/X','buildhw-06.phx2.fedoraproject.org','artwiz-aleczapka-fonts-1.3-23.fc29.src.rpm',4504,7828,'Fedora Project',7944,0,124,'Packages/artwiz-aleczapka-fonts-1.3-23.fc29.noarch.rpm',NULL,'sha256');
This is a metapackage, which pulls in all the separated fonts in this family.','http://artwizaleczapka.sourceforge.net/',1531766861,1531427792,'GPLv2','Fedora Project','User Interface/X','buildhw-06.phx2.fedoraproject.org','artwiz-aleczapka-fonts-1.3-23.fc29.src.rpm',4504,7828,'Fedora Project',7944,0,124,'�)(�',NULL,'sha256');
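The repoview traceback comes from Python's sqlite3 module, which decodes TEXT columns as UTF-8 by default and raises OperationalError when that fails. The sketch below stores the two bytes reported in the traceback (\xc0\x9f) as TEXT and shows both the failure and how setting text_factory = bytes lets you read the raw value instead. The in-memory table is a simplified stand-in for the real packages table, and `demo_decode_failure` is a hypothetical name.

```python
import sqlite3
from typing import Optional, Tuple


def demo_decode_failure(raw_href: bytes) -> Tuple[Optional[str], bytes]:
    """Store raw_href as TEXT, then read it back with and without decoding."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE packages (name TEXT, location_href TEXT)")
        # CAST(blob AS TEXT) keeps the bytes unchanged and unvalidated,
        # standing in for a corrupted row in primary.sqlite.
        conn.execute("INSERT INTO packages VALUES (?, CAST(? AS TEXT))",
                     ("artwiz-aleczapka-fonts", raw_href))
        error = None
        try:
            # Default text_factory is str, so sqlite3 decodes as UTF-8 here.
            conn.execute("SELECT location_href FROM packages").fetchall()
        except sqlite3.OperationalError as exc:
            error = str(exc)
        # With text_factory = bytes the raw column value comes back untouched.
        conn.text_factory = bytes
        raw = conn.execute("SELECT location_href FROM packages").fetchone()[0]
        return error, raw
    finally:
        conn.close()
```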


Expected results:
A clean sqlite file is generated.

Additional info:
This was uncovered while using Pulp (2.17.1-1.el7) to mirror the Fedora Everything repository, when I noticed that no repoview could be produced for that particular repo. My initial thought was a bad package, but further digging narrowed it down to the above.

I'm happy to upload the generated sqlite files, but they are large (55 MB), so I'll wait until they're requested.

Comment 2 James Hartsock 2019-04-11 22:34:53 UTC
In a local copy of the rhel-7-server-rpms repository, copy the RPMs that now live under Packages/[0-9a-z]/ into the Packages/ directory itself. This mimics what reposync left lying around when Red Hat recently added these subdirectories.

  # cd rhel-7-server-rpms/

  # ls -1 Packages/*.rpm
  ls: cannot access Packages/*.rpm: No such file or directory

  # cp -arp Packages/?/*.rpm Packages/.
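Since the trigger is the same RPM present at two paths (here Packages/blah.rpm and Packages/b/blah.rpm), it can help to check a repo tree for duplicates before running createrepo_c. A minimal Python sketch, assuming duplicates share the same file name (an RPM copied under a different name would not be caught); `find_duplicate_rpms` is a hypothetical helper:

```python
import os
from collections import defaultdict
from typing import Dict, List


def find_duplicate_rpms(repo_dir: str) -> Dict[str, List[str]]:
    """Map each RPM file name found more than once to its relative paths."""
    seen: Dict[str, List[str]] = defaultdict(list)
    for root, _dirs, files in os.walk(repo_dir):
        for fname in files:
            if fname.endswith(".rpm"):
                rel = os.path.relpath(os.path.join(root, fname), repo_dir)
                seen[fname].append(rel)
    # Keep only names seen at two or more paths; sort paths for stable output.
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}
```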


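Now there are duplicate RPMs (one copy in Packages/blah.rpm and one in Packages/b/blah.rpm).
Note that RC=0 indicates failure here, because grep found a match: with -axv '.*', grep prints lines that '.*' cannot match in full, which in a UTF-8 locale means lines containing invalid UTF-8. The failure rate seems to be ~33% for me, so you may have to try multiple times.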

  # createrepo_c -d --update --keep-all-metadata --local-sqlite -s sha256 --skip-stat rhel-7-server-rpms/
  # bunzip2 -k -c $(ls -1 rhel-7-server-rpms/repodata/*primary.sqlite.bz2) > /tmp/file.db
  # sqlite3 /tmp/file.db '.dump' > /tmp/file.dump
  # grep -axv '.*' /tmp/file.dump > /dev/null
  # echo $?
  0
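The bunzip2 + sqlite3 .dump + grep pipeline above can also be done directly in Python by reading location_href as raw bytes and attempting a UTF-8 decode. A sketch under these assumptions: the DB contains the packages table with the pkgKey, name and location_href columns seen in the dumps in this report, and `find_bad_location_hrefs` is a hypothetical name. Unlike the grep check, it only inspects location_href and does not depend on the shell's locale.

```python
import bz2
import os
import shutil
import sqlite3
import tempfile
from typing import List, Tuple


def find_bad_location_hrefs(primary_bz2: str) -> List[Tuple[int, bytes, bytes]]:
    """Return (pkgKey, name, raw location_href) for rows whose href is not UTF-8."""
    fd, db_path = tempfile.mkstemp(suffix=".sqlite")
    try:
        # Equivalent of: bunzip2 -k -c repodata/*primary.sqlite.bz2 > /tmp/file.db
        with os.fdopen(fd, "wb") as out, bz2.open(primary_bz2, "rb") as src:
            shutil.copyfileobj(src, out)
        conn = sqlite3.connect(db_path)
        try:
            # Read TEXT columns as raw bytes so corrupt rows do not raise
            # OperationalError the way they do in repoview.
            conn.text_factory = bytes
            rows = conn.execute(
                "SELECT pkgKey, name, location_href FROM packages").fetchall()
        finally:
            conn.close()
    finally:
        os.unlink(db_path)

    bad = []
    for pkg_key, name, href in rows:
        if href is None:
            continue
        try:
            href.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((pkg_key, name, href))
    return bad
```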

And repoview fails as originally reported:
  # repoview rhel-7-server-rpms/
  Examining repository...done
  Opening primary database...done
  Opening changelogs database...done
  Parsing comps.xml...done
  Examining state db...done
  Collecting letters...done
  Writing package ElectricFence.html
  ...
  Writing package compat-libical1.html
  Traceback (most recent call last):
    File "/bin/repoview", line 940, in <module>
      main()
    File "/bin/repoview", line 937, in main
      Repoview(opts)
    File "/bin/repoview", line 191, in __init__
      packages = self.do_packages(repo_data, group_data, pkgnames)
    File "/bin/repoview", line 544, in do_packages
      pkg_data = self.get_package_data(pkgname)
    File "/bin/repoview", line 442, in get_package_data
      rows = pcursor.fetchall()
  sqlite3.OperationalError: Could not decode to UTF-8 column 'location_href' with text '�



Here is the createrepo_c RPM version I am using:
  # rpm -q createrepo_c
  createrepo_c-0.10.0-18.el7.x86_64

Comment 4 James Hartsock 2019-04-16 13:28:32 UTC
This issue appears to be present even with Fedora's createrepo_c-0.12.2.
In the run below, you can see invalid characters near the end of the affected rows (the fields awk prints as $(NF-4) and $(NF-2)).


Manually duplicate the RPMs so that both Packages/blah.rpm and Packages/b/blah.rpm exist, then run createrepo_c.
NOTE: Multiple workers are required; I was unable to replicate this with --workers=1.
  # rpm -q createrepo_c
  createrepo_c-0.12.2-1.fc29.x86_64

  # cd rhel-7-server-rpms
  # cp -ap Packages/?/*.rpm Packages/.

  # createrepo_c -d --update --keep-all-metadata --local-sqlite -s sha256 --skip-stat .
  08:06:16: Version: 0.12.2 (Features: DeltaRPM LegacyWeakdeps )
  08:06:16: Signal handler setup
  08:06:16: Thread pool ready
  08:06:16: Dir to scan: ./.repodata
  08:06:16: Dir to scan: ./Packages
  08:06:16: Dir to scan: ./repodata
  <snip>
  08:06:26: Memory cleanup
  08:06:26: All done


Now check manually for lines that are not valid UTF-8, which we find:
  # bunzip2 -k -c $(ls -1 repodata/*primary.sqlite.bz2) > /tmp/file.db
  # sqlite3 /tmp/file.db '.dump' > /tmp/file.dump
  # grep -naxv '.*' /tmp/file.dump > /tmp/OUTPUT
  # echo $?
  0 <------------ rc=0 means lines with invalid UTF-8 were found


Here are the three packages found, and the fields with the issue:
  # awk -F, '{print $1,$3," ... ", $(NF-4), $(NF-2)}' /tmp/OUTPUT 
  107:INSERT INTO packages VALUES(102 'PackageKit-glib'  ...  491028 'P\��'
  5137:INSERT INTO packages VALUES(5132 'libkexiv2-devel'  ...  replace(' $\n��' char(10))
  6953:INSERT INTO packages VALUES(6948 'openldap'  ...  1037392 '(��'


Now run repoview under strace to capture the garbage characters:
  # strace -ttTvf -s 4096 -o /tmp/strace repoview .
  <snip>
  sqlite3.OperationalError: Could not decode to UTF-8 column 'location_href' with text ' $
  ��'
  # echo $?
  1

  # grep 'Could not decode' /tmp/strace 
  12393 08:15:24.869960 write(2, "Could not decode to UTF-8 column 'location_href' with text ' $\n\220\317\177'", 67) = 67 <0.000004>


Using the characters from strace, set var to them with printf,
then grep /tmp/OUTPUT to show that they match one of the lines found above (libkexiv2-devel in this case):
  # var=$(printf "%s\n" $'\n\220\317\177')
  # grep -a $var /tmp/OUTPUT  | awk -F, '{print $1,$3," ... ", $(NF-4), $(NF-2)}'
  5137:INSERT INTO packages VALUES(5132 'libkexiv2-devel'  ...  replace(' $\n��' char(10))
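To confirm what the strace output actually contains, its octal escapes can be turned back into bytes. A small Python sketch; `strace_unescape` and `is_valid_utf8` are hypothetical helpers that only handle octal escapes and a few single-character escapes, which is enough for the line captured above:

```python
import re

# strace escapes: octal \ooo (up to \377) or a backslash plus one character.
_ESCAPE = re.compile(rb"\\([0-3][0-7]{2}|[0-7]{1,2}|[nrtv\\\"'])")
_SIMPLE = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"v": b"\v",
           b"\\": b"\\", b"\"": b"\"", b"'": b"'"}


def strace_unescape(text: str) -> bytes:
    """Turn the body of a strace-quoted string back into raw bytes."""
    def repl(match: "re.Match[bytes]") -> bytes:
        token = match.group(1)
        if token[:1].isdigit():
            return bytes([int(token, 8)])
        return _SIMPLE[token]
    return _ESCAPE.sub(repl, text.encode("ascii"))


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For the value above, strace_unescape(r"' $\n\220\317\177'") yields b"' $\n\x90\xcf\x7f'". The byte 0x90 (\220) is a UTF-8 continuation byte with no lead byte before it, so the value cannot be valid UTF-8, which is what repoview trips over.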

Comment 7 amatej 2019-04-23 09:03:18 UTC
I created a PR for it here: https://github.com/rpm-software-management/createrepo_c/pull/151

Comment 20 errata-xmlrpc 2019-11-26 11:27:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3986

