Bug 658296

Summary: invalid byte sequence for encoding "UTF8"
Product: [Community] Spacewalk
Reporter: Luc de Louw <luc>
Component: Server
Assignee: Michael Mráka <mmraka>
Status: CLOSED CURRENTRELEASE
QA Contact: Red Hat Satellite QA List <satqe-list>
Severity: low
Docs Contact:
Priority: low
Version: 1.3
CC: jpazdziora, pierre.casenove
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-22 16:47:40 UTC
Type: ---
Bug Blocks: 723481    

Description Luc de Louw 2010-11-29 21:59:20 UTC
Description of problem:
When importing packages from an external yum repo, some packages fail to import.


Version-Release number of selected component (if applicable):
Spacewalk 1.2 and 1.3 nightly

How reproducible:
Always


Steps to Reproduce:
1. spacewalk-repo-sync --channel centos5-x86_64 -u http://mirror.switch.ch/ftp/mirror/centos/5.5/os/x86_64/ (or any other mirror)
  
Actual results:
[root@spacewalk-nightly-f14 ~]# spacewalk-repo-sync --channel centos5-x86_64 -u http://mirror.switch.ch/ftp/mirror/centos/5.5/os/x86_64/
Repo http://mirror.switch.ch/ftp/mirror/centos/5.5/os/x86_64/ has 3434 packages.
1/2 : aspell-is-0.51.1-2.2.2-50.x86_64
invalid byte sequence for encoding "UTF8": 0xed736c
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

2/2 : man-pages-da-0.1.1-12.1.1-0.noarch
invalid byte sequence for encoding "UTF8": 0xe6736d
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

Sync complete
[root@spacewalk-nightly-f14 ~]# 

Expected results:
No errors

Additional info:

Comment 1 Luc de Louw 2010-11-29 22:00:20 UTC
Seems to be specific to PostgreSQL, since Spacewalk 1.2 with Oracle XE works.

Comment 2 Jan Pazdziora 2011-03-23 09:17:41 UTC
The file path in the rpm is not valid UTF-8:

# rpm -qlp /tmp/aspell-is-0.51.1-2.2.2.x86_64.rpm | grep slenska.alias | od -c
warning: /tmp/aspell-is-0.51.1-2.2.2.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 37017186
0000000   /   u   s   r   /   l   i   b   6   4   /   a   s   p   e   l
0000020   l   -   0   .   6   0   / 355   s   l   e   n   s   k   a   .
0000040   a   l   i   a   s  \n
0000046

# rpm -qlp aspell-is-0.51.1-2.2.2.x86_64.rpm | iconv -f iso-8859-1 -t utf8
warning: aspell-is-0.51.1-2.2.2.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID e8562897
/usr/lib64/aspell-0.60/icelandic.alias
/usr/lib64/aspell-0.60/is.dat
/usr/lib64/aspell-0.60/is.multi
/usr/lib64/aspell-0.60/is.rws
/usr/lib64/aspell-0.60/is_phonet.dat
/usr/lib64/aspell-0.60/íslenska.alias
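
The failure can be reproduced without the database. The sketch below is Python 3 (the Spacewalk code in this report ran on Python 2.4). It takes the path bytes from the od output above, where octal 355 is 0xed, and shows that they are rejected as UTF-8 but decode cleanly as ISO-8859-1. The variable names are illustrative only.

```python
# Path bytes as stored in the rpm header; octal 355 in the od dump is 0xed.
raw_path = b"/usr/lib64/aspell-0.60/\xedslenska.alias"

# 0xed announces a three-byte UTF-8 sequence, so the next two bytes would
# have to be continuation bytes (0x80-0xbf). Here they are "s" (0x73) and
# "l" (0x6c), so strict UTF-8 decoding fails.
try:
    raw_path.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError as err:
    valid_utf8 = False
    # The offending lead byte plus the two bytes that follow it.
    bad_bytes = raw_path[err.start:err.start + 3]

# Every byte value is defined in ISO-8859-1, so this decode cannot fail.
as_latin1 = raw_path.decode("iso-8859-1")
```

Here `bad_bytes.hex()` is `ed736c`, the same three bytes PostgreSQL quotes in its error message, and `as_latin1` ends in `íslenska.alias`.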

Comment 3 Jan Pazdziora 2011-03-23 09:19:10 UTC
But createrepo seems to handle this gracefully, by treating the string as ISO-8859-1 and converting it to UTF-8:

# createrepo . ; zcat repodata/filelists.xml.gz  | xmllint -format -
1/1 - aspell-is-0.51.1-2.2.2.x86_64.rpm                                         
Saving Primary metadata
Saving file lists metadata
Saving other metadata
<?xml version="1.0" encoding="UTF-8"?>
<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="1">
  <package pkgid="e2952ace8a45a4c5c33219c7d13d5dcc205aa7be" name="aspell-is" arch="x86_64">
    <version epoch="50" ver="0.51.1" rel="2.2.2"/>
    <file>/usr/lib64/aspell-0.60/icelandic.alias</file>
    <file>/usr/lib64/aspell-0.60/is.dat</file>
    <file>/usr/lib64/aspell-0.60/is.multi</file>
    <file>/usr/lib64/aspell-0.60/is.rws</file>
    <file>/usr/lib64/aspell-0.60/is_phonet.dat</file>
    <file>/usr/lib64/aspell-0.60/íslenska.alias</file>
  </package>
</filelists>

Comment 4 Jan Pazdziora 2011-03-23 09:21:42 UTC
The same problem occurs (as expected) when the rpm is merely pushed with rhnpush:

# rhnpush --server localhost -c bz658296 aspell-is-0.51.1-2.2.2.x86_64.rpm

The traceback is then

Exception Handler Information
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/spacewalk/server/apacheUploadServer.py", line 97, in _wrapper
    ret = function(req)
  File "/usr/share/rhn/upload_server/handlers/package_push/package_push.py", line 134, in handler
    relative_path=self.rel_package_path, org_id=self.org_id)
  File "/usr/lib/python2.4/site-packages/spacewalk/server/rhnPackageUpload.py", line 164, in push_package
    importer.run()
  File "/usr/lib/python2.4/site-packages/spacewalk/server/importlib/importLib.py", line 619, in run
    self.fix()
  File "/usr/lib/python2.4/site-packages/spacewalk/server/importlib/packageImport.py", line 269, in fix
    self.backend.processCapabilities(self.capabilities)
  File "/usr/lib/python2.4/site-packages/spacewalk/server/importlib/backend.py", line 100, in processCapabilities
    nullStatement.execute(name=name)
  File "/usr/lib/python2.4/site-packages/spacewalk/server/rhnSQL/sql_base.py", line 163, in execute
    return apply(self._execute_wrapper, (self._execute, ) + p, kw)
  File "/usr/lib/python2.4/site-packages/spacewalk/server/rhnSQL/driver_postgresql.py", line 263, in _execute_wrapper
    retval = apply(function, p, kw)
  File "/usr/lib/python2.4/site-packages/spacewalk/server/rhnSQL/sql_base.py", line 217, in _execute
    return self._execute_(args, kwargs)
  File "/usr/lib/python2.4/site-packages/spacewalk/server/rhnSQL/driver_postgresql.py", line 276, in _execute_
    self._real_cursor.execute(self.sql, params)
DataError: invalid byte sequence for encoding "UTF8": 0xed736c
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

Comment 5 Jan Pazdziora 2011-03-23 09:22:31 UTC
We might consider doing the "check if it's UTF-8, convert from ISO-8859-1 to UTF-8 if it's not" in our backend code as well.

I think we already do it for changelogs ...
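
A minimal sketch of that check, written in Python 3 rather than the Python 2.4 the backend ran on at the time. The function name `to_utf8` is hypothetical and is not taken from the Spacewalk source. ISO-8859-1 as the fallback encoding is the same assumption createrepo appears to make.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw unchanged if it is valid UTF-8; otherwise treat it as
    ISO-8859-1 (an assumed source encoding) and transcode it to UTF-8."""
    try:
        raw.decode("utf-8")  # validation only, the result is discarded
        return raw
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1").encode("utf-8")


# The file name from aspell-is, as stored in the rpm header.
fixed = to_utf8(b"\xedslenska.alias")
```

The fallback is a guess: ISO-8859-1 decoding never fails, so a string in some other legacy encoding would be stored without error but with the wrong characters.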

Comment 6 Miroslav Suchý 2011-04-11 07:32:12 UTC
We did not have time for this one during the Spacewalk 1.4 time frame. Mass moving to Spacewalk 1.5.

Comment 7 Miroslav Suchý 2011-04-11 07:36:41 UTC
We did not have time for this one during the Spacewalk 1.4 time frame. Mass moving to Spacewalk 1.5.

Comment 8 Jan Pazdziora 2011-07-20 11:49:58 UTC
Aligning under space16.

Comment 9 pierre.casenove 2011-07-22 12:39:31 UTC
This is partially fixed in Spacewalk 1.5:
If the file path is incorrectly encoded, it is now corrected (for example, for aspell-is).
But the problem remains for the copyright field, which is not corrected.
In file /usr/lib/python2.4/site-packages/spacewalk/server/importlib/packageImport.py, line 218, I've made this modification:

    218         # Change copyright to license
    219         # XXX
    220         license_fixed = self._fix_encoding(package['license'])
    221         package['copyright'] = license_fixed
    222         #package['copyright'] = package['license']
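
The effect of this change can be sketched in isolation (Python 3). The real `_fix_encoding` method is not shown in this report, so `fix_encoding` below is a hypothetical stand-in that assumes the UTF-8 check with ISO-8859-1 fallback discussed in comment 5. The sample license string is invented for illustration.

```python
def fix_encoding(value):
    # Hypothetical stand-in for the importer's _fix_encoding method:
    # try UTF-8 first, fall back to ISO-8859-1 (an assumption).
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            return value.decode("iso-8859-1")
    return value


# Invented sample header; only the 'license' -> 'copyright' mapping
# comes from the modification above.
package = {"name": "example-package", "license": b"Caf\xe9 License"}
package["copyright"] = fix_encoding(package["license"])
```

With the commented-out assignment, the raw bytes (including 0xe9) would be passed on unchanged; with the fix, `package["copyright"]` holds a clean text value.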

Comment 10 Jan Pazdziora 2011-07-25 18:15:47 UTC
Thank you, I have applied the copyright encoding fix to Spacewalk master as 388fe1a11160090fa5c39080dfd09bbd826f7a81.

Together with previous commits, this should address the issue for any fields where we've seen non-US-ASCII characters.

Comment 11 Milan Zázrivec 2011-12-22 16:47:40 UTC
Spacewalk 1.6 has been released.