Bug 1451724 - glusterfind pre crashes with "UnicodeDecodeError: 'utf8' codec can't decode" error when the `--no-encode` option is used
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterfind
Version: mainline
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Aravinda VK
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On: 1448334
Blocks:
 
Reported: 2017-05-17 11:26 UTC by Aravinda VK
Modified: 2017-09-05 17:30 UTC (History)

Fixed In Version: glusterfs-3.12.0
Clone Of: 1448334
Environment:
Last Closed: 2017-09-05 17:30:44 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Aravinda VK 2017-05-17 11:26:20 UTC
+++ This bug was initially created as a clone of Bug #1448334 +++

Description of problem:

The `glusterfind pre` command crashes with the backtrace below when it is run with the `--no-encode` option and the volume contains files with Russian names.

~~~~~~~~
utf_8.py:16:decode:UnicodeDecodeError: 'utf8' codec can't decode byte 0xf0 in position 19: invalid continuation byte

Traceback (most recent call last):
  File "/usr/libexec/glusterfs/glusterfind/changelog.py", line 402, in <module>
    actual_end = changelog_crawl(args.brick, start, end, args)
  File "/usr/libexec/glusterfs/glusterfind/changelog.py", line 345, in changelog_crawl
    return get_changes(brick, working_dir, log_file, start, end, args)
  File "/usr/libexec/glusterfs/glusterfind/changelog.py", line 296, in get_changes
    parse_changelog_to_db(changelog_data, change, args)
  File "/usr/libexec/glusterfs/glusterfind/changelog.py", line 223, in parse_changelog_to_db
    changelog_data.when_create_mknod_mkdir(changelogfile, data)
  File "/usr/libexec/glusterfs/glusterfind/changelogdata.py", line 333, in when_create_mknod_mkdir
    bn1 = bn1.decode("utf-8").strip()
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf0 in position 19: invalid continuation byte

Local variables in innermost frame:
input: '17217523_\xf0\xe5\xe7.csv'
errors: 'strict'

~~~~~~~~
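
The failure in the innermost frame can be reproduced outside glusterfind. The sketch below is illustrative only: it is written for Python 3 (the traceback comes from Python 2.7), the helper names `strict_decode`, `tolerant_decode` and `tolerant_encode` are hypothetical, and the `surrogateescape` handling is shown as one possible way to carry non-UTF-8 bytes through text processing, not as the change that was merged for this bug.

```python
# Basename bytes copied from the "Local variables" section of the traceback.
raw_name = b"17217523_\xf0\xe5\xe7.csv"


def strict_decode(data):
    """Decode as strict UTF-8, as the failing line does; None on failure."""
    try:
        return data.decode("utf-8").strip()
    except UnicodeDecodeError:
        # 0xf0 announces a 4-byte sequence, but 0xe5 is not a continuation
        # byte, so the strict decoder rejects the name.
        return None


def tolerant_decode(data):
    """Decode without raising: undecodable bytes become lone surrogates."""
    return data.decode("utf-8", errors="surrogateescape")


def tolerant_encode(text):
    """Inverse of tolerant_decode: restores the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")


print(strict_decode(raw_name))                               # prints None
print(tolerant_encode(tolerant_decode(raw_name)) == raw_name)  # prints True
```

The round trip matters because a file name on a brick is an arbitrary byte string; any tool that reports it has to hand back the same bytes it read.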

Comment 1 Worker Ant 2017-05-17 11:32:28 UTC
REVIEW: https://review.gluster.org/17317 (tools/glusterfind: Python 2 to Python 3) posted (#1) for review on master by Aravinda VK (avishwan)

Comment 2 Worker Ant 2017-07-03 09:35:36 UTC
REVIEW: https://review.gluster.org/17674 (features/changelog: Fix encoding to encode only space and newline) posted (#1) for review on master by Aravinda VK (avishwan)

Comment 3 Worker Ant 2017-07-03 15:03:18 UTC
REVIEW: https://review.gluster.org/17674 (features/changelog: Fix encoding to encode only space and newline) posted (#2) for review on master by Aravinda VK (avishwan)

Comment 4 Worker Ant 2017-07-17 11:23:14 UTC
REVIEW: https://review.gluster.org/17674 (features/libgfchangelog: Fix encoding to encode only space and newline) posted (#3) for review on master by Aravinda VK (avishwan)

Comment 5 Worker Ant 2017-07-17 11:26:50 UTC
REVIEW: https://review.gluster.org/17787 (geo-rep: Fix changelog encoding to encode only space and newline) posted (#1) for review on master by Aravinda VK (avishwan)

Comment 6 Worker Ant 2017-07-17 11:30:01 UTC
REVIEW: https://review.gluster.org/17788 (tools/glusterfind: Fix encoding to encode only space and newline) posted (#1) for review on master by Aravinda VK (avishwan)

Comment 7 Worker Ant 2017-07-19 06:41:30 UTC
REVIEW: https://review.gluster.org/17674 (features/libgfchangelog: Fix encoding to encode only space and newline) posted (#4) for review on master by Aravinda VK (avishwan)

Comment 8 Worker Ant 2017-07-19 06:47:39 UTC
REVIEW: https://review.gluster.org/17787 (geo-rep: Fix changelog encoding to encode only space and newline) posted (#2) for review on master by Aravinda VK (avishwan)

Comment 9 Worker Ant 2017-07-19 06:53:29 UTC
REVIEW: https://review.gluster.org/17788 (tools/glusterfind: Fix encoding to encode only space and newline) posted (#2) for review on master by Aravinda VK (avishwan)

Comment 10 Worker Ant 2017-07-21 04:44:04 UTC
COMMIT: https://review.gluster.org/17674 committed in master by Aravinda VK (avishwan) 
------
commit 5353389faf77bb2edb54e785c3d8aca323188dad
Author: Aravinda VK <avishwan>
Date:   Mon Jul 3 14:51:21 2017 +0530

    features/libgfchangelog: Fix encoding to encode only space and newline
    
    libgfchangelog was encoding paths per RFC 3986, but encoding is only
    required for SPACE and NEWLINE chars, since NEWLINE is used as the
    record separator and SPACE as the field separator in the parsed
    changelog output.
    
    Changed the encoding function to encode only SPACE and NEWLINE.
    
    BUG: 1451724
    Change-Id: I4305459aab9e710517dd3eb065f0024503064b77
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: https://review.gluster.org/17674
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Prashanth Pai <ppai>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Kotresh HR <khiremat>
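
To illustrate what RFC 3986 style encoding does to the file name from this report, the sketch below uses Python 3's `urllib.parse.quote` as a stand-in. This is an assumption for illustration: libgfchangelog is a C library and its own encoder is not shown in this report.

```python
from urllib.parse import quote, unquote_to_bytes

# Basename bytes from the traceback in the bug description.
raw_name = b"17217523_\xf0\xe5\xe7.csv"

# RFC 3986 style percent-encoding escapes every byte outside the
# unreserved set, including all non-ASCII bytes.
rfc3986_encoded = quote(raw_name)
print(rfc3986_encoded)  # prints 17217523_%F0%E5%E7.csv

# Decoding restores the raw bytes, which are still not valid UTF-8, so a
# consumer that then calls .decode("utf-8") on them fails.
restored = unquote_to_bytes(rfc3986_encoded)
print(restored == raw_name)  # prints True

# The separators the commit message cares about are escaped too.
print(quote(b"a b\nc"))  # prints a%20b%0Ac
```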

Comment 11 Worker Ant 2017-07-21 04:44:34 UTC
COMMIT: https://review.gluster.org/17787 committed in master by Aravinda VK (avishwan) 
------
commit e01783d871fbbf5a598d3bbf984ea98bafa5c10f
Author: Aravinda VK <avishwan>
Date:   Mon Jul 3 14:51:21 2017 +0530

    geo-rep: Fix changelog encoding to encode only space and newline
    
    libgfchangelog was encoding paths per RFC 3986, but encoding is only
    required for SPACE and NEWLINE chars, since NEWLINE is used as the
    record separator and SPACE as the field separator in the parsed
    changelog output.
    
    Changed the encoding function to encode only SPACE and NEWLINE.
    
    BUG: 1451724
    Change-Id: I1936efad31788a9e636f912c832ed7d7efea4fe2
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: https://review.gluster.org/17787
    Reviewed-by: Prashanth Pai <ppai>
    Reviewed-by: Kotresh HR <khiremat>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>

Comment 12 Worker Ant 2017-07-21 05:02:37 UTC
REVIEW: https://review.gluster.org/17788 (tools/glusterfind: Fix encoding to encode only space,newline and percent chars) posted (#3) for review on master by Aravinda VK (avishwan)

Comment 13 Worker Ant 2017-07-21 08:41:17 UTC
COMMIT: https://review.gluster.org/17788 committed in master by Aravinda VK (avishwan) 
------
commit df85ed48e5e94449cdcc77de3b86e10ccea49f1e
Author: Aravinda VK <avishwan>
Date:   Mon Jul 3 14:51:21 2017 +0530

    tools/glusterfind: Fix encoding to encode only space,newline and percent chars
    
    libgfchangelog was encoding paths per RFC 3986, but encoding is only
    required for SPACE, NEWLINE and PERCENT chars, since NEWLINE is
    used as the record separator and SPACE as the field separator in the
    parsed changelog output.
    
    Changed the encoding function to encode only SPACE, NEWLINE and PERCENT chars
    
    BUG: 1451724
    Change-Id: Ic1dea824d23493dedcf3db45f353f90572f4e046
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: https://review.gluster.org/17788
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Milind Changire <mchangir>
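
A minimal sketch of the encoding described in this commit message, assuming the conventional percent escapes `%20`, `%0A` and `%25`. The function names `encode_path` and `decode_path` are hypothetical and are not taken from the glusterfind source. PERCENT has to be escaped along with the two separators because it is the escape character itself; otherwise a literal `%20` in a file name could not be told apart from an encoded space.

```python
def encode_path(path: bytes) -> bytes:
    """Escape only PERCENT, SPACE and NEWLINE; leave all other bytes as-is."""
    # Escape '%' first so the escapes added below are not escaped again.
    path = path.replace(b"%", b"%25")
    path = path.replace(b" ", b"%20")
    path = path.replace(b"\n", b"%0A")
    return path


def decode_path(encoded: bytes) -> bytes:
    """Reverse encode_path."""
    # Restore '%' last so a literal "%20" in a name is not read as a space.
    encoded = encoded.replace(b"%20", b" ")
    encoded = encoded.replace(b"%0A", b"\n")
    encoded = encoded.replace(b"%25", b"%")
    return encoded


# Non-UTF-8 bytes pass through untouched, so no decoding step is needed.
name = b"17217523_\xf0\xe5\xe7.csv"
print(encode_path(name) == name)            # prints True

tricky = b"my report%20final\n.csv"
print(encode_path(tricky))                  # prints b'my%20report%2520final%0A.csv'
print(decode_path(encode_path(tricky)) == tricky)  # prints True
```

Working on bytes rather than text keeps the output free of both separators without ever asking which character set a file name was written in.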

Comment 14 Shyamsundar 2017-09-05 17:30:44 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/

