+++ This bug was initially created as a clone of Bug #730952 +++

The bug occurs when the current locale defines a collating symbol made of the same character repeated (e.g. "aa" in the nb_NO locales). Attachment 537445 is a zip file with reproducers from Terje Braten.

In this case, you have something like this:

    0         1
    012345678901234^

with cur_idx pointing to the "a" at &mctx->input.mbs[15], which is also the last valid character (valid_len = 16). The bytes after this "a" are leftovers from previous matching attempts. Because "aa" is a multicharacter collating element in the Bokmål locale, re_string_elem_size_at returns 2, and check_node_accept_bytes matches 2 bytes even though only one byte of input remains. clean_state_log_if_needed then accesses one item past the end of the allocated memory.

I haven't tested the reproducer on RHEL5/6, but the out-of-bounds access is clear and the code has been mostly unchanged for years. Attachment 537575 should apply, more or less, to all glibc versions, including older ones.
No automated test went in upstream that I'm aware of. In general, it looks like Andreas was very lax about submitting regression tests upstream.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: No Documentation Needed
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0763.html