Bug 1191681

Summary: watchman jboss plugin fails parsing invalid byte sequence in UTF-8
Product: OpenShift Online Reporter: Andy Grimm <agrimm>
Component: Containers Assignee: Dan Mace <dmace>
Status: CLOSED CURRENTRELEASE QA Contact: libra bugs <libra-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 1.x CC: bmeng, jgoulding, jokerman, mmccomas
Target Milestone: ---   
Target Release: 2.x   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1202513 (view as bug list) Environment:
Last Closed: 2015-03-05 19:57:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1202513    

Description Andy Grimm 2015-02-11 17:59:29 UTC
Description of problem:

Watchman's jboss plugin fails with "invalid byte sequence in UTF-8" if a jboss log contains ISO-8859-1 bytes that are not valid UTF-8 (such as \xe9).

Version-Release number of selected component (if applicable):
openshift-origin-node-util-1.33.4-1.el6oso.noarch

How reproducible:
Always

Steps to Reproduce:
1. create a jbossews app
2. echo -e '\xe9' >> ~/app-root/logs/jbossews.log

Actual results:
watchman will print something like this to /var/log/messages:
Feb  9 17:41:38 ex-std-nodeXXX watchman[217144]: Unhandled exception (invalid byte sequence in UTF-8) from Watchman plugin #<JbossPlugin:0x00000002c277a0>: invalid byte sequence in UTF-8
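
The failure can also be reproduced outside Watchman. The following is a minimal sketch, not the plugin's actual code: `scan_error` is a hypothetical helper that scans a file opened with 'r:utf-8' using the same pattern quoted later in this report, and the temp file stands in for jbossews.log. On the Ruby versions in use at the time, matching a regexp against a line with invalid bytes raises ArgumentError; other Ruby versions may behave differently, so the sketch reports the outcome instead of assuming it.

```ruby
require 'tempfile'

# Hypothetical helper (not Watchman code): scans a log with a regexp and
# returns the error message if the scan raises ArgumentError, else nil.
def scan_error(path)
  File.open(path, 'r:utf-8') do |f|
    f.each_line { |line| line =~ / java.lang.OutOfMemoryError/ }
  end
  nil
rescue ArgumentError => e
  e.message
end

Tempfile.create('jbossews') do |f|
  f.binmode
  f.write("\xE9\n".b) # same byte as: echo -e '\xe9'
  f.flush

  first_line = File.open(f.path, 'r:utf-8') { |io| io.gets }
  # A lone 0xE9 byte is not a complete UTF-8 sequence.
  puts "valid_encoding? #{first_line.valid_encoding?}"
  puts "scan error: #{scan_error(f.path).inspect}"
end
```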

Expected results:
watchman should ignore invalid characters in the log file.

Additional info:

This has been dealt with in two different ways in the past.  It was first fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1023576 by treating the file as binary (opening it with 'rb').

When the code was refactored and moved into origin-server, the 'b' was lost, and the bug came up again as https://bugzilla.redhat.com/show_bug.cgi?id=1059804

This time, it was fixed by opening the file with "r:utf-8", but that only works as long as all of the byte sequences are valid UTF-8, which we cannot control.  I'd suggest either going back to "rb", or doing something like:

File.open(log, 'r:utf-8').each_line do |event|
    next unless event.valid_encoding? && event =~ / java.lang.OutOfMemoryError/
    ...

In my tests, this is about 15-20% more expensive than grep, but it works.
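
To make the two suggestions easier to compare, here is a self-contained sketch of both. The method names `count_oom_binary` and `count_oom_valid_only`, the `OOM_PATTERN` constant, and the sample log lines are made up for illustration and are not taken from the Watchman source; only the open modes and the pattern come from this report.

```ruby
require 'tempfile'

# Pattern as quoted in this report.
OOM_PATTERN = / java.lang.OutOfMemoryError/

# Option A: read raw bytes ('rb'), so lines are never validated as UTF-8.
def count_oom_binary(path)
  count = 0
  File.open(path, 'rb') do |f|
    f.each_line { |line| count += 1 if line =~ OOM_PATTERN }
  end
  count
end

# Option B: read as UTF-8, but skip any line that is not valid UTF-8.
def count_oom_valid_only(path)
  count = 0
  File.open(path, 'r:utf-8') do |f|
    f.each_line do |line|
      next unless line.valid_encoding?
      count += 1 if line =~ OOM_PATTERN
    end
  end
  count
end

Tempfile.create('jbossews') do |f|
  f.binmode
  f.write("ERROR java.lang.OutOfMemoryError: Java heap space\n")
  f.write("caf\xE9\n".b)                         # invalid UTF-8, no match
  f.write("\xE9 java.lang.OutOfMemoryError\n".b) # invalid UTF-8, would match
  f.flush
  puts "binary:     #{count_oom_binary(f.path)}"
  puts "valid only: #{count_oom_valid_only(f.path)}"
end
```

One difference worth noting: the valid_encoding? variant silently skips an OutOfMemoryError line if that same line also contains an invalid byte, while the 'rb' variant still counts it.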

Comment 2 Meng Bo 2015-02-13 07:45:18 UTC
Checked on devenv_5430: after inserting the invalid ISO-8859-1 string "\xe9" into the jboss log, watchman no longer reports an unhandled exception and its existing features still work.

Moving bug to verified.