Description of problem:
Watchman on a few of our production nodes is struggling to stay alive. We have automation that restarts the process, but this is happening regularly. Upon further investigation I was able to determine the error:

invalid byte sequence in UTF-8 (ArgumentError)

This happens on line 109 in rhc-watchman when it runs the following command:

File.open(@message_file).grep(/ killed as a result of limit of /).each {|msg|

The .grep call itself throws the exception, so the script reaches its maximum number of exceptions rather quickly.

Written to syslog:
Oct 25 15:25:49 ex-std-node264 rhc-watchman[6498]: watchman caught #<ArgumentError: invalid byte sequence in UTF-8>: invalid byte sequence in UTF-8. Retries left: 0
Oct 25 15:28:09 ex-std-node264 rhc-watchman[21731]: Starting rhc-watchman => delay: 20s, exception threshold: 10
Oct 25 15:28:09 ex-std-node264 rhc-watchman[21736]: Starting throttler => throttle at: 30.00%, restore at: 70.00%, period: 120, check_interval: 5.00

Version-Release number of selected component (if applicable):
Current release.

How reproducible:
Always. Placing an invalid UTF-8 byte sequence inside /var/log/messages will cause watchman to die.

Steps to Reproduce:
1. Write an invalid UTF-8 byte sequence to syslog.
2. Run rhc-watchman and watch syslog for the ArgumentError noted above.

Actual results:
rhc-watchman dies after 10 consecutive failed attempts to read the file.

Expected results:
rhc-watchman should skip over the bad lines in the file.

Additional info:
Will provide rmillner with the corresponding logs.
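The failure can be reproduced outside watchman with a few lines of Ruby. This is only a minimal sketch, not the rhc-watchman code: the temporary file stands in for /var/log/messages, and the names `grep_as_utf8`, `repro_outcome` and `SAMPLE_LOG` are hypothetical. It assumes the file is read as UTF-8, which is what a UTF-8 locale would select by default.

```ruby
require 'tempfile'

PATTERN = / killed as a result of limit of /

# Hypothetical helper: grep the file in text mode, with the external
# encoding pinned to UTF-8 so the sketch does not depend on the locale.
def grep_as_utf8(path, pattern)
  File.open(path, "r:UTF-8") { |f| f.grep(pattern) }
end

# Returns :no_error if the grep completes, otherwise the class of the
# ArgumentError that was raised while matching.
def repro_outcome(path, pattern)
  grep_as_utf8(path, pattern)
  :no_error
rescue ArgumentError => e
  e.class
end

# Stand-in for /var/log/messages: one clean line, then a line that
# contains bytes (0xFF 0xFE) that are never valid in UTF-8.
SAMPLE_LOG = Tempfile.new("messages")
File.binwrite(SAMPLE_LOG.path,
              "a clean line\n\xFF\xFE killed as a result of limit of x\n")

puts repro_outcome(SAMPLE_LOG.path, PATTERN)
```

The clean first line is read without trouble; the exception is raised only when the regexp is applied to the line holding the invalid bytes, which matches the observation that the .grep call is what fails.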
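Ruby 1.9 raises an ArgumentError when a regexp is matched against a string that was read from a file as UTF-8 but contains invalid byte sequences. Switching the file IO to binary mode makes the problem go away, because the lines are then read as raw bytes and are not validated as UTF-8.

Pull request: https://github.com/openshift/li/pull/2047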
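A rough sketch of the binary-mode approach follows. It is not the code from the pull request: the helper name `grep_binary` and the temporary file are assumptions made for illustration, and the temporary file stands in for /var/log/messages.

```ruby
require 'tempfile'

OOM_PATTERN = / killed as a result of limit of /

# Hypothetical helper: open the file in binary mode ("rb") so each line
# is an ASCII-8BIT string. An ASCII-only regexp can then be matched
# against it even when the bytes are not valid UTF-8.
def grep_binary(path, pattern)
  File.open(path, "rb") { |f| f.grep(pattern) }
end

# Stand-in for /var/log/messages with one line holding invalid UTF-8.
BINARY_SAMPLE = Tempfile.new("messages")
File.binwrite(BINARY_SAMPLE.path,
              "a clean line\n\xFF\xFE killed as a result of limit of x\n")

grep_binary(BINARY_SAMPLE.path, OOM_PATTERN).each do |msg|
  # The matched lines keep their raw bytes; inspect shows them escaped.
  puts msg.inspect
end
```

One consequence of this design is that the matched lines come back tagged as binary rather than UTF-8, so any later code that needs text would have to re-tag or clean them itself. With this approach the line containing the bad bytes is still matched rather than skipped.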
Commit pushed to master at https://github.com/openshift/li

https://github.com/openshift/li/commit/242d8453e244838294622d04e394bbbd84d7bb80
Bug 1023576 - ruby 1.9 has trouble dealing with unicode strings comparing file input to a regexp.
Tested on devenv_3960. After echoing an invalid string into /var/log/messages, rhc-watchman is still running. Moving bug to VERIFIED.