
Bug 1696808

Summary: createrepo_c generates corrupted sqlite files
Product: Red Hat Enterprise Linux 7
Reporter: Douglas Furlong <bugzilla_rhn>
Component: createrepo_c
Assignee: amatej
Status: CLOSED ERRATA
QA Contact: Eva Mrakova <emrakova>
Severity: urgent
Priority: medium
Version: 7.6
CC: amatej, dmach, hartsjc, jcastran, mdomonko, mpoole, pkratoch, rcadova, tbowling, toneata
Target Milestone: rc
Keywords: Extras, Triaged
Hardware: x86_64
OS: Linux
Fixed In Version: createrepo_c-0.10.0-20.el7
Last Closed: 2019-11-26 11:27:28 UTC
Type: Bug

Description Douglas Furlong 2019-04-05 16:54:05 UTC
Description of problem: 
createrepo_c sometimes creates corrupted primary.sqlite.bz2 files.

Version-Release number of selected component (if applicable):
Version     : 0.10.0
Release     : 18.el7
Architecture: x86_64

How reproducible:
Frequent

Steps to Reproduce:
This is how I have been able to reproduce it: run createrepo_c with the switches below against a mirror of the Fedora Everything repository.

createrepo_c -d --update --keep-all-metadata --local-sqlite -s sha256 --skip-stat 

I have managed to get it to generate a corrupted sqlite file within three runs.

Output from the three runs of createrepo_c:
[root@cfm7803-pulp doug]# createrepo_c -d --update --keep-all-metadata --local-sqlite -s sha256 --skip-stat /var/lib/pulp/doug/
Directory walk started
Directory walk done - 116414 packages
Loaded information about 0 packages
Temporary output repo path: /var/lib/pulp/doug/.repodata/
Preparing sqlite DBs
Pool started (with 5 workers)
Pool finished
[root@cfm7803-pulp doug]# createrepo_c -d --update --keep-all-metadata --local-sqlite -s sha256 --skip-stat /var/lib/pulp/doug/
Directory walk started
Directory walk done - 116414 packages
Loaded information about 58207 packages
Temporary output repo path: /var/lib/pulp/doug/.repodata/
Preparing sqlite DBs
Pool started (with 5 workers)
Pool finished
[root@cfm7803-pulp doug]# createrepo_c -d --update --keep-all-metadata --local-sqlite -s sha256 --skip-stat /var/lib/pulp/doug/
Directory walk started
Directory walk done - 116414 packages
Loaded information about 58207 packages
Temporary output repo path: /var/lib/pulp/doug/.repodata/
Preparing sqlite DBs
Pool started (with 5 workers)
Pool finished


Actual results:
After running createrepo_c, I run repoview against the repo, which fails with the following error:

Traceback (most recent call last):  
  File "/usr/bin/repoview", line 940, in <module>    
    main()  
  File "/usr/bin/repoview", line 937, in main    
    Repoview(opts)  
  File "/usr/bin/repoview", line 191, in __init__    
    packages = self.do_packages(repo_data, group_data, pkgnames)  
  File "/usr/bin/repoview", line 544, in do_packages    
    pkg_data = self.get_package_data(pkgname)  
  File "/usr/bin/repoview", line 442, in get_package_data    
    rows = pcursor.fetchall()
sqlite3.OperationalError: Could not decode to UTF-8 column \'location_href\' with text \'\xc0\x9f\'\n', 7760, 7761, 'ordinal not in range(128)')

Reviewing the repoview Python code shows that location_href is the column holding each package's file location within the repo.
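The traceback shows why the corruption surfaces in repoview rather than in createrepo_c itself: Python's sqlite3 module decodes TEXT columns as UTF-8 when rows are fetched, so a single bad location_href value aborts the whole fetchall(). The minimal sketch below reproduces the same exception against an in-memory database. The two-column packages table is a simplified stand-in (an assumption, not the real primary.sqlite schema), and the bad bytes are the '\xc0\x9f' from the error message above.

```python
import sqlite3


def fetch_hrefs(conn):
    """Fetch every location_href the way a default-configured client does."""
    return conn.execute("SELECT location_href FROM packages").fetchall()


conn = sqlite3.connect(":memory:")
# Simplified two-column stand-in for the packages table (assumption).
conn.execute("CREATE TABLE packages (name TEXT, location_href TEXT)")
conn.execute("INSERT INTO packages VALUES ('ok', 'Packages/ok-1.0-1.noarch.rpm')")
# CAST(blob AS TEXT) stores the raw bytes without validating them, which
# simulates the corrupted value '\xc0\x9f' from the traceback.
conn.execute("INSERT INTO packages VALUES ('bad', CAST(X'C09F' AS TEXT))")

try:
    fetch_hrefs(conn)
except sqlite3.OperationalError as exc:
    # Message starts with "Could not decode to UTF-8 column 'location_href'"
    print("fetch failed:", exc)
```

Setting `conn.text_factory = bytes` before querying avoids the exception and returns the raw bytes instead, which is useful when inspecting a damaged file.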

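After dumping the contents of the sqlite DB, I can see entries like the ones below for the same package: the first has the correct location_href, while in the second it has been replaced by garbage bytes.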

grep artwiz-aleczapka-fonts-1.3 /tmp/dumped_data_pulp 
This is a metapackage, which pulls in all the separated fonts in this family.','http://artwizaleczapka.sourceforge.net/',1531766861,1531427792,'GPLv2','Fedora Project','User Interface/X','buildhw-06.phx2.fedoraproject.org','artwiz-aleczapka-fonts-1.3-23.fc29.src.rpm',4504,7828,'Fedora Project',7944,0,124,'Packages/artwiz-aleczapka-fonts-1.3-23.fc29.noarch.rpm',NULL,'sha256');
This is a metapackage, which pulls in all the separated fonts in this family.','http://artwizaleczapka.sourceforge.net/',1531766861,1531427792,'GPLv2','Fedora Project','User Interface/X','buildhw-06.phx2.fedoraproject.org','artwiz-aleczapka-fonts-1.3-23.fc29.src.rpm',4504,7828,'Fedora Project',7944,0,124,'�)(�',NULL,'sha256');


Expected results:
For a clean sqlite file to be generated.

Additional info:
This was uncovered when using Pulp (2.17.1-1.el7) to mirror the Fedora Everything repository: no repoview could be produced for that particular repo. My initial thought was that it was a bad package, but after further digging I narrowed it down to the above.

Happy to upload the generated sqlite files; however, they are large (55 MB), so I'll wait until they are requested.

Comment 2 James Hartsock 2019-04-11 22:34:53 UTC
Reproducer: in a local copy of the rhel-7-server-rpms repository, copy the RPMs now under Packages/[0-9a-z]/*.rpm into the Packages/ directory. This mimics what reposync left lying around when Red Hat recently added these subdirectories.

  # cd rhel-7-server-rpms/

  # ls -1 Packages/*.rpm
  ls: cannot access Packages/*.rpm: No such file or directory

  # cp -arp Packages/?/*.rpm Packages/.
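The duplicate state this reproducer sets up (the same RPM file name present both in Packages/ and in a Packages/<letter>/ subdirectory) can be confirmed before running createrepo_c with a short standard-library Python sketch. The script and function names are hypothetical, and it compares only file names, not RPM contents.

```python
import os
import sys
from collections import defaultdict


def find_duplicate_rpms(packages_dir):
    """Return {basename: [paths]} for RPM file names found more than once."""
    seen = defaultdict(list)
    for root, _dirs, files in os.walk(packages_dir):
        for fname in files:
            if fname.endswith(".rpm"):
                seen[fname].append(os.path.join(root, fname))
    # Names within one directory are unique, so >1 path means >1 directory.
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_dups.py rhel-7-server-rpms/Packages
    dups = find_duplicate_rpms(sys.argv[1])
    for name, paths in sorted(dups.items()):
        print(name, *paths)
    print(f"{len(dups)} duplicated RPM file names")
```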


This leaves duplicate RPMs (one copy in Packages/blah.rpm and one in Packages/b/blah.rpm).
Note that RC=0 indicates failure here, because grep found a match. The failure rate seems to be about 33% for me, so you may have to try multiple times.

  # createrepo_c -d --update --keep-all-metadata --local-sqlite -s sha256 --skip-stat rhel-7-server-rpms/
  # bunzip2 -k -c $(ls -1 rhel-7-server-rpms/repodata/*primary.sqlite.bz2) > /tmp/file.db
  # sqlite3 /tmp/file.db '.dump' > /tmp/file.dump
  # grep -axv '.*' /tmp/file.dump > /dev/null
  # echo $?
  0
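The `grep -axv '.*'` check above relies on GNU grep's behavior in a UTF-8 locale, where `.` does not match bytes that are not valid UTF-8. A rough standard-library Python equivalent over the same dump file is sketched below under that reading; like the grep, it is line-oriented and flags only invalid UTF-8, so legitimate non-ASCII text (for example accented names in descriptions) is not reported. The function and script names are hypothetical.

```python
import sys


def invalid_utf8_lines(path):
    """Yield (line_number, raw_line) for every line that is not valid UTF-8."""
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                line.decode("utf-8")
            except UnicodeDecodeError:
                yield lineno, line.rstrip(b"\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_dump.py /tmp/file.dump
    found = False
    for lineno, line in invalid_utf8_lines(sys.argv[1]):
        found = True
        print(f"{lineno}: {line[:80]!r}")
    # Same convention as the grep check: exit 0 when a bad line was found.
    sys.exit(0 if found else 1)
```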

repoview also fails, as originally reported:
  # repoview rhel-7-server-rpms/
  Examining repository...done
  Opening primary database...done
  Opening changelogs database...done
  Parsing comps.xml...done
  Examining state db...done
  Collecting letters...done
  Writing package ElectricFence.html
  ...
  Writing package compat-libical1.html
  Traceback (most recent call last):
    File "/bin/repoview", line 940, in <module>
      main()
    File "/bin/repoview", line 937, in main
      Repoview(opts)
    File "/bin/repoview", line 191, in __init__
      packages = self.do_packages(repo_data, group_data, pkgnames)
    File "/bin/repoview", line 544, in do_packages
      pkg_data = self.get_package_data(pkgname)
    File "/bin/repoview", line 442, in get_package_data
      rows = pcursor.fetchall()
  sqlite3.OperationalError: Could not decode to UTF-8 column 'location_href' with text '�



Here is the createrepo_c RPM version I am using:
  # rpm -q createrepo_c
  createrepo_c-0.10.0-18.el7.x86_64

Comment 4 James Hartsock 2019-04-16 13:28:32 UTC
The issue appears to be present even with Fedora's createrepo_c-0.12.2.
In the run below, invalid characters can be seen in fields near the end of the affected rows.


Manually duplicate the RPMs so that both Packages/blah.rpm and Packages/b/blah.rpm exist, then run createrepo_c.
NOTE: multiple workers are required; I was unable to replicate this with --workers=1.
  # rpm -q createrepo_c
  createrepo_c-0.12.2-1.fc29.x86_64

  # cd rhel-7-server-rpms
  # cp -ap Packages/?/*.rpm Packages/.

  # createrepo_c -d --update --keep-all-metadata --local-sqlite -s sha256 --skip-stat .
  08:06:16: Version: 0.12.2 (Features: DeltaRPM LegacyWeakdeps )
  08:06:16: Signal handler setup
  08:06:16: Thread pool ready
  08:06:16: Dir to scan: ./.repodata
  08:06:16: Dir to scan: ./Packages
  08:06:16: Dir to scan: ./repodata
  <snip>
  08:06:26: Memory cleanup
  08:06:26: All done


Now manually check the dump for lines that are not valid UTF-8, which we find:
  # bunzip2 -k -c $(ls -1 repodata/*primary.sqlite.bz2) > /tmp/file.db
  # sqlite3 /tmp/file.db '.dump' > /tmp/file.dump
  # grep -naxv '.*' /tmp/file.dump > /tmp/OUTPUT
  # echo $?
  0 <------------ rc=0 means lines with invalid UTF-8 were found


Here are the three packages found, and the fields with the issue:
  # awk -F, '{print $1,$3," ... ", $(NF-4), $(NF-2)}' /tmp/OUTPUT 
  107:INSERT INTO packages VALUES(102 'PackageKit-glib'  ...  491028 'P\��'
  5137:INSERT INTO packages VALUES(5132 'libkexiv2-devel'  ...  replace(' $\n��' char(10))
  6953:INSERT INTO packages VALUES(6948 'openldap'  ...  1037392 '(��'
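Splitting the dump on commas with awk can shift fields when the dump wraps a value in replace(..., char(10)), as in the libkexiv2-devel row above. As an alternative, the affected packages and their raw location_href bytes can be read straight from primary.sqlite.bz2. The sketch below is a minimal standard-library version: it sets the sqlite3 connection's text_factory to bytes so invalid values come back as raw bytes instead of raising OperationalError. The packages table and its name and location_href columns are taken from this report; the script and function names are hypothetical.

```python
import bz2
import os
import shutil
import sqlite3
import sys
import tempfile


def find_bad_location_hrefs(db_path):
    """Return (name, raw_bytes) for every package whose location_href is not valid UTF-8."""
    conn = sqlite3.connect(db_path)
    # Hand TEXT values back as raw bytes; the default str factory raises
    # sqlite3.OperationalError ("Could not decode to UTF-8 ...") on bad data.
    conn.text_factory = bytes
    bad = []
    try:
        for name, href in conn.execute("SELECT name, location_href FROM packages"):
            if href is None:
                continue
            try:
                href.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(((name or b"").decode("utf-8", "replace"), href))
    finally:
        conn.close()
    return bad


def check_primary_sqlite_bz2(bz2_path):
    """Decompress a primary.sqlite.bz2 into a temporary directory and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "primary.sqlite")
        with bz2.open(bz2_path, "rb") as src, open(db_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return find_bad_location_hrefs(db_path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_hrefs.py repodata/<checksum>-primary.sqlite.bz2
    problems = check_primary_sqlite_bz2(sys.argv[1])
    for name, href in problems:
        print(f"{name}: {href!r}")
    # Conventional checker exit status: non-zero when corrupted rows exist.
    sys.exit(1 if problems else 0)
```

Note that its exit status is the opposite of the grep checks in this report, which return 0 when a bad line is found.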


Now run repoview under strace to capture the garbage characters:
  # strace -ttTvf -s 4096 -o /tmp/strace repoview .
  <snip>
  sqlite3.OperationalError: Could not decode to UTF-8 column 'location_href' with text ' $
  ��'
  # echo $?
  1

  # grep 'Could not decode' /tmp/strace 
  12393 08:15:24.869960 write(2, "Could not decode to UTF-8 column 'location_href' with text ' $\n\220\317\177'", 67) = 67 <0.000004>


Use printf to set var to the characters captured by strace, then grep /tmp/OUTPUT to show that they match one of the lines found above (libkexiv2-devel in this case).
  # var=$(printf "%s\n" $'\n\220\317\177')
  # grep -a $var /tmp/OUTPUT  | awk -F, '{print $1,$3," ... ", $(NF-4), $(NF-2)}'
  5137:INSERT INTO packages VALUES(5132 'libkexiv2-devel'  ...  replace(' $\n��' char(10))
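The printf/grep step above can also be sketched in Python: convert the escaped string that strace printed back into raw bytes, then search the saved grep output for them. This assumes strace's usual quoting (octal escapes such as \220 plus a few C-style escapes); the function names are hypothetical. A line-oriented search cannot match across a newline, so the sketch searches only for the bytes after the leading \n. That is effectively what the unquoted $var in the grep command does too, since the shell's word splitting drops the leading newline.

```python
import re

# Octal escapes up to \377 plus the C-style escapes assumed here (\n \t \r \" \\).
_ESCAPE = re.compile(rb'\\(?:([0-3]?[0-7]{1,2})|([ntr"\\]))')
_SIMPLE = {b"n": b"\n", b"t": b"\t", b"r": b"\r", b'"': b'"', b"\\": b"\\"}


def strace_unescape(text: bytes) -> bytes:
    """Turn a strace-quoted string body back into the raw bytes it represents."""
    def repl(m):
        if m.group(1) is not None:
            return bytes([int(m.group(1), 8)])
        return _SIMPLE[m.group(2)]
    return _ESCAPE.sub(repl, text)


def lines_containing(path, needle: bytes):
    """Yield 1-based numbers of lines in path whose raw bytes contain needle."""
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            if needle in line:
                yield lineno
```

For example, `list(lines_containing("/tmp/OUTPUT", strace_unescape(rb"\n\220\317\177").lstrip(b"\n")))` would be expected to point at the line of /tmp/OUTPUT holding the libkexiv2-devel row.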

Comment 7 amatej 2019-04-23 09:03:18 UTC
I created a PR for it here: https://github.com/rpm-software-management/createrepo_c/pull/151

Comment 20 errata-xmlrpc 2019-11-26 11:27:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3986