Bug 1191681 - watchman jboss plugin fails parsing invalid byte sequence in UTF-8
Summary: watchman jboss plugin fails parsing invalid byte sequence in UTF-8
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 1.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 2.x
Assignee: Dan Mace
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks: 1202513
Reported: 2015-02-11 17:59 UTC by Andy Grimm
Modified: 2016-11-08 03:48 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1202513
Environment:
Last Closed: 2015-03-05 19:57:07 UTC
Target Upstream Version:
Embargoed:


Description Andy Grimm 2015-02-11 17:59:29 UTC
Description of problem:

Watchman's jboss plugin fails with "invalid byte sequence in UTF-8" if a jboss log contains ISO-8859-1 bytes that are not valid UTF-8 (such as \xe9).

Version-Release number of selected component (if applicable):
openshift-origin-node-util-1.33.4-1.el6oso.noarch

How reproducible:
Always

Steps to Reproduce:
1. create a jbossews app
2. echo -e '\xe9' >> ~/app-root/logs/jbossews.log

Actual results:
watchman will eventually print something like this to /var/log/messages:
Feb  9 17:41:38 ex-std-nodeXXX watchman[217144]: Unhandled exception (invalid byte sequence in UTF-8) from Watchman plugin #<JbossPlugin:0x00000002c277a0>: invalid byte sequence in UTF-8

Expected results:
watchman should ignore invalid characters in the log file.

Additional info:

This has been dealt with in two different ways in the past.  It was first fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1023576 by treating the file as binary (opening it with 'rb').

When the code was refactored and moved into origin-server, the 'b' was lost, and the bug came up again as https://bugzilla.redhat.com/show_bug.cgi?id=1059804

This time, it was fixed by opening the file with "r:utf-8", but that only works as long as all of the byte sequences are valid utf-8, which we cannot control.  I'd suggest either going back to "rb", or doing something like:

File.open(log, 'r:utf-8').each_line do |event|
    next unless event.valid_encoding? && event =~ / java.lang.OutOfMemoryError/
    ...

In my tests, this is about 15-20% more expensive than grep, but it works.
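
For illustration, the two options suggested above could be sketched roughly as follows. This is a minimal sketch, not the actual Watchman plugin code: the method names count_oom_events_utf8 and count_oom_events_binary and the OOM_PATTERN constant are hypothetical, and counting matches stands in for whatever the plugin really does with each event.

```ruby
# Hypothetical names throughout; only the open modes, valid_encoding?
# and the pattern itself come from the report above.
OOM_PATTERN = / java.lang.OutOfMemoryError/

# Option 1: keep 'r:utf-8' but skip any line that is not valid UTF-8,
# so the regex is never applied to a broken string.
def count_oom_events_utf8(log_path)
  count = 0
  File.open(log_path, 'r:utf-8') do |file|
    file.each_line do |event|
      next unless event.valid_encoding? && event =~ OOM_PATTERN
      count += 1
    end
  end
  count
end

# Option 2: go back to 'rb'. Lines are read as binary (ASCII-8BIT)
# strings, which are never considered invalid, and an ASCII-only
# pattern can still be matched against them.
def count_oom_events_binary(log_path)
  count = 0
  File.open(log_path, 'rb') do |file|
    file.each_line do |event|
      count += 1 if event =~ OOM_PATTERN
    end
  end
  count
end
```

The two variants differ on a line that contains both an invalid byte and the pattern: the first skips it, the second still counts it.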

Comment 2 Meng Bo 2015-02-13 07:45:18 UTC
Checked on devenv_5430. After inserting the invalid ISO-8859-1 byte "\xe9" into the jboss log, watchman no longer reports an unhandled exception, and the existing features still work.

Moving bug to VERIFIED.

