Description of problem:

$ /usr/bin/pdftk <source> output <destination> encrypt_128bit owner_pw <password> compress allow Printing DegradedPrinting
Unhandled Java Exception:
java.lang.ArrayIndexOutOfBoundsException: 5
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.11)
   at java.text.SimpleDateFormat.format(libgcj.so.11)
   at java.text.DateFormat.format(libgcj.so.11)
   at com.lowagie.text.Document.addCreationDate(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfDocument.<init>(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfStamperImp.<init>(itext-2.1.7.jar.so)

$ locale
LANG=de_AT.UTF-8
LC_CTYPE="de_AT.UTF-8"
LC_NUMERIC="de_AT.UTF-8"
LC_TIME="de_AT.UTF-8"
LC_COLLATE="de_AT.UTF-8"
LC_MONETARY="de_AT.UTF-8"
LC_MESSAGES="de_AT.UTF-8"
LC_PAPER="de_AT.UTF-8"
LC_NAME="de_AT.UTF-8"
LC_ADDRESS="de_AT.UTF-8"
LC_TELEPHONE="de_AT.UTF-8"
LC_MEASUREMENT="de_AT.UTF-8"
LC_IDENTIFICATION="de_AT.UTF-8"
LC_ALL=

Version-Release number of selected component (if applicable):
libgcj-4.5.1-4.fc14.i686
itext-2.1.7-6.fc13.i686
pdftk-1.41-27.fc14.i686

How reproducible:
Always reproducible

Steps to Reproduce:
1. /usr/bin/pdftk <source> output <destination> encrypt_128bit owner_pw <password> compress allow Printing DegradedPrinting

Actual results:
Unhandled Java Exception:
java.lang.ArrayIndexOutOfBoundsException: 5
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.11)
   at java.text.SimpleDateFormat.format(libgcj.so.11)
   at java.text.DateFormat.format(libgcj.so.11)
   at com.lowagie.text.Document.addCreationDate(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfDocument.<init>(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfStamperImp.<init>(itext-2.1.7.jar.so)

Expected results:
It should work.

Additional info:
Bug already resolved in other distributions:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=560594
https://bugs.launchpad.net/ubuntu/+source/pdftk/+bug/487922
Also affects Fedora 15

libgcj-4.6.0-9.fc15.i686
itext-2.1.7-7.fc15.i686
pdftk-1.44-3.fc15.i686

Unhandled Java Exception:
java.lang.ArrayIndexOutOfBoundsException: 5
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.12)
   at java.text.SimpleDateFormat.format(libgcj.so.12)
   at java.text.DateFormat.format(libgcj.so.12)
   at com.lowagie.text.Document.addCreationDate(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfDocument.<init>(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfStamperImp.<init>(itext-2.1.7.jar.so)
It's a libgcj -> gcc bug. pdftk just exposes it, and the patch is only a workaround that lets pdftk do its work.
It would be helpful if you could provide a self-contained reproducer; it might very well be an itext bug too.
I think you could use this:
https://bugs.launchpad.net/ubuntu/+source/pdftk/+bug/779908/+attachment/2146700/+files/TestDateFormat.java

$ gcj -C TestDateFormat.java
$ LANG=it_IT gij TestDateFormat
gio giu 09 16:37:28 CEST 2011
$ LANG=de_DE gij TestDateFormat
Do. 09 16:36:14 GMT+02:00 2011
$ LANG=de_AT gij TestDateFormat
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.12)
   at java.text.SimpleDateFormat.format(libgcj.so.12)
   at java.text.DateFormat.format(libgcj.so.12)
   at TestDateFormat.main(TestDateFormat.java:12)
I'm surprised that this happens on F15. pdftk-1.44 contains the workaround suggested by Debian in the report this bug refers to.
pdftk-1.41-28.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/pdftk-1.41-28.fc14
It would be nice if you could try out pdftk-1.41-28 from updates-testing. This release should contain the workaround created by Debian.
(In reply to comment #6)
> pdftk-1.41-28.fc14 has been submitted as an update for Fedora 14.
> https://admin.fedoraproject.org/updates/pdftk-1.41-28.fc14

Tested, not working.

Unhandled Java Exception:
java.lang.ArrayIndexOutOfBoundsException: 5
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.11)
   at java.text.SimpleDateFormat.format(libgcj.so.11)
   at java.text.DateFormat.format(libgcj.so.11)
   at com.lowagie.text.Document.addCreationDate(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfDocument.<init>(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfStamperImp.<init>(itext-2.1.7.jar.so)
Andrew, please see this TestDateFormat.java test case, which fails the same way:

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class TestDateFormat {
    public static void main(String[] args) throws IOException {
        SimpleDateFormat sdf = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
        System.out.println(sdf.format(new Date()));
    }
}

[jakub@xxx tmp]$ gcj -fmain=TestDateFormat -o TestDateFormat{,.java}
[jakub@xxx tmp]$ LC_ALL=C ./TestDateFormat
Mon Jun 13 11:18:27 GMT+02:00 2011
[jakub@xxx tmp]$ LC_ALL=de_DE.UTF-8 ./TestDateFormat
Mo. 13 11:18:32 GMT+02:00 2011
[jakub@xxx tmp]$ LC_ALL=de_AT.UTF-8 ./TestDateFormat
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.12)
   at java.text.SimpleDateFormat.format(libgcj.so.12)
   at java.text.DateFormat.format(libgcj.so.12)
   at TestDateFormat.main(TestDateFormat)
[jakub@xxx tmp]$ LC_ALL=de_DE.UTF-8 locale -k LC_TIME | grep abmon
abmon="Jan;Feb;Mär;Apr;Mai;Jun;Jul;Aug;Sep;Okt;Nov;Dez"
[jakub@xxx tmp]$ LC_ALL=de_AT.UTF-8 locale -k LC_TIME | grep abmon
abmon="Jän;Feb;Mär;Apr;Mai;Jun;Jul;Aug;Sep;Okt;Nov;Dez"
[jakub@xxx tmp]$ LC_ALL=C locale -k LC_TIME | grep abmon
abmon="Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec"

For the de_DE.UTF-8 locale it surprisingly doesn't print any month at all; for de_AT.UTF-8 it crashes; and for e.g. cs_CZ.UTF-8 it prints "6." instead of the month name (also surprising, because abmon is "čen", i.e. 3 letters). But both de_DE and de_AT use "Jun", i.e. exactly 3 letters, the same as C or en_US.UTF-8 (unless Java has different locale data from libc).
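One way to check the closing question above (whether the Java runtime carries its own locale data, separate from libc) is to dump what DateFormatSymbols returns for each locale. The following is only a minimal diagnostic sketch; the class name ShortMonthDump and the helper shortMonths are made up for illustration, and the output will differ between runtimes, so none is shown here.

```java
import java.text.DateFormatSymbols;
import java.util.Arrays;
import java.util.Locale;

public class ShortMonthDump {
    /** Returns the short month names the Java runtime itself holds for a locale. */
    public static String[] shortMonths(Locale locale) {
        return new DateFormatSymbols(locale).getShortMonths();
    }

    public static void main(String[] args) {
        Locale[] locales = {
            new Locale("en", "US"),
            new Locale("de", "DE"),
            new Locale("de", "AT"),
            new Locale("cs", "CZ")
        };
        for (Locale locale : locales) {
            String[] months = shortMonths(locale);
            // Print the array length too: a truncated array is the
            // interesting symptom, not just the names themselves.
            System.out.println(locale + " (" + months.length + " entries): "
                + Arrays.toString(months));
        }
    }
}
```

If the array printed for de_AT is shorter than the one for en_US, the problem lies in the runtime's own locale data rather than in the libc abmon values shown above.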
pdftk-1.41-28.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.
(In reply to comment #10)
> pdftk-1.41-28.fc14 has been pushed to the Fedora 14 stable repository. If
> problems still persist, please make note of it in this bug report.

As said in comment #8, it is not working, but now with a different error code (maybe because the month changed):

Unhandled Java Exception:
java.lang.ArrayIndexOutOfBoundsException: 6
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.11)
   at java.text.SimpleDateFormat.format(libgcj.so.11)
   at java.text.DateFormat.format(libgcj.so.11)
   at com.lowagie.text.Document.addCreationDate(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfDocument.<init>(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfStamperImp.<init>(itext-2.1.7.jar.so)
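The change from 5 to 6 fits the guess that the month changed, if one assumes the formatter indexes the short month array by the zero-based Calendar.MONTH value (June is 5, July is 6). A later comment in this report states that the de_AT array ended up with only three elements. The sketch below illustrates that reading with a hypothetical three-element array; MonthIndexDemo and lookup are illustrative names and not libgcj code.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

public class MonthIndexDemo {
    /** Looks up a month name by the zero-based Calendar.MONTH value. */
    public static String lookup(String[] shortMonths, Calendar when) {
        return shortMonths[when.get(Calendar.MONTH)];
    }

    public static void main(String[] args) {
        // Hypothetical truncated array: slots exist for January..March only.
        // \u00e4 is a-umlaut.
        String[] truncated = {"J\u00e4n", "", "M\u00e4r"};
        Calendar june = new GregorianCalendar(2011, Calendar.JUNE, 13);
        Calendar july = new GregorianCalendar(2011, Calendar.JULY, 1);
        for (Calendar when : new Calendar[] {june, july}) {
            try {
                System.out.println(lookup(truncated, when));
            } catch (ArrayIndexOutOfBoundsException e) {
                // The failing index is the month number minus one.
                System.out.println("month index " + when.get(Calendar.MONTH)
                    + " is out of bounds");
            }
        }
    }
}
```

Under this assumption the command would have worked in January through March and failed in every later month, with the reported index rising by one each month.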
Created attachment 516911
LocaleInformation_de_AT.properties
The core problem is that the locale properties file is corrupt. We need to regenerate the LocaleInformation_de_AT.properties file without the corruption.
gcj issue.
Taking.
It's not that the locale file is corrupt. It accurately represents the CLDR data for that locale which only specifies short months for January and March. The problem is that the code for DateFormatSymbols doesn't account for the possibility of empty data. I think an easier fix would be to make DateFormatSymbols search up the hierarchy rather than trying to re-write the locale data generator extensively and introduce lots of duplicate data in every properties file. At the moment, it works on the basis of single CLDR files and would need to be changed to understand the locale hierarchy, something which is already built into the Locale object used by DFS.
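For reference, the lookup order that "searching up the hierarchy" implies for de_AT can be obtained from the standard library. This is a small sketch using ResourceBundle.Control from Java 6 and later; whether the DateFormatSymbols fix uses this particular call is not stated in this report and is an assumption. LocaleChain and chain are illustrative names.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class LocaleChain {
    /** Returns the fallback chain for a locale, most specific first. */
    public static List<Locale> chain(Locale locale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        // The base name must be non-null; an empty string is used as a placeholder.
        return control.getCandidateLocales("", locale);
    }

    public static void main(String[] args) {
        // For de_AT the chain is de_AT, then de, then the root locale.
        for (Locale candidate : chain(new Locale("de", "AT"))) {
            System.out.println(candidate.equals(Locale.ROOT) ? "(root)" : candidate.toString());
        }
    }
}
```

A value missing at one level (for example the de_AT short name for June) would then be taken from the next level that defines it.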
I've added a sanity check for this data to Mauve:

http://sourceware.org/ml/mauve-patches/2012/msg00000.html
http://sourceware.org/cgi-bin/cvsweb.cgi/mauve/gnu/testlet/java/text/DateFormatSymbols/SanityCheck.java?cvsroot=mauve

This checks that the arrays are the right size and contain non-null non-empty strings (except where empty strings are allowed: weekday[0] and month[12]).

With gcj 4.6.2, we have 783 failures out of 29216 tests. With the patch I'm about to post to GNU Classpath applied, this reduces to 235 out of 30456. The test increase is due to some arrays now being their correct larger size. For example, the de_AT short month array now correctly has 13 elements. For reference, OpenJDK has 0 failures out of 14288 tests (they have much less locale data).

I should be able to get our failure rate down further by fixing further bugs in the locale data. The generator is currently picking up the wrong incomplete set of month & week names for locales like be and cy.

I'll attach the initial fix here once committed. It makes the following changes:

1. The data arrays are pre-populated with empty strings to the right size and then filled from the locale data.
2. The locale data is parsed, such that it doesn't throw away trailing empty fields (this was more necessary before I made change 1).
3. We now search locales further up the hierarchy. de_AT only provides values for January and March. This is why we end up with a three element array. This is intentional in the CLDR data, where locales should inherit data from further up. So the de_AT data should be composed of de_AT -> de -> ROOT, not just de_AT.
4. Similarly, the CLDR spec. (http://www.unicode.org/reports/tr35/tr35-10.html#Date_Elements) specifies "sideways" inheritance for the month and day names i.e. if there is no short name, the long name should be used. We now also do this.

That's sufficient to avoid this bug but further patches will be needed to make the data correct.
For example, de_AT now has a complete set of month names but some of them are incorrect as the locale generator includes data from the "stand-alone" context rather than the "format" context (again, see http://www.unicode.org/reports/tr35/tr35-10.html#Date_Elements). The former is for headings, while the latter is the one for date formatting that we need. I imagine this is because it uses the last one parsed rather than specifically choosing format.

Anyway, this can be fixed, but it means regenerating the data, which I'll do in a separate patch. The data also needs updating so that it occurs sorted (allowing for readable diffs) and omits the trailing separator (currently handled as a hack in the DateFormatSymbols class).

I'll post all patches here, and leave it up to the packager as to which ones to include in the RPM.
Created attachment 552456
Part 1 of fix (minimum required)

With this patch, we have:

de_AT: Short months=[Jän, Februar, Mär, April, Mai, Juni, Jul, Aug, Sep, Okt, Nov, Dez, ]

Note that 2, 4 and 6 are wrong (should be Feb, Apr and Jun). This is because the incomplete stand-alone short name data is used, leaving the empty fields to be filled by the long names. This will be fixed with updated locale data.
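The way the array above comes about can be sketched as follows: pre-fill to the full size, parse without dropping trailing empty fields, take the first non-empty short name along the locale chain, and fall back "sideways" to the long name. This is not the Classpath patch itself. The class ShortMonthMerge, its methods, the semicolon-separated format and the input strings are all hypothetical, chosen only so that the result matches the array quoted above.

```java
import java.util.Arrays;
import java.util.List;

public class ShortMonthMerge {
    static final int SLOTS = 13; // 12 months plus the empty 13th slot

    /** Splits on ';' and keeps trailing empty fields (limit -1). */
    public static String[] parse(String data) {
        return data.split(";", -1);
    }

    /** Merges short names along the chain, then falls back to long names. */
    public static String[] merge(List<String[]> shortChain, String[] longNames) {
        String[] result = new String[SLOTS];
        Arrays.fill(result, ""); // pre-populate so no index is ever missing
        for (int i = 0; i < SLOTS - 1; i++) {
            for (String[] level : shortChain) {
                if (i < level.length && !level[i].isEmpty()) {
                    result[i] = level[i]; // first non-empty value wins
                    break;
                }
            }
            if (result[i].isEmpty() && i < longNames.length) {
                result[i] = longNames[i]; // "sideways" inheritance
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical inputs, not copied from real CLDR or Classpath files.
        // \u00e4 is a-umlaut.
        String[] deAtShort = parse("J\u00e4n;;M\u00e4r;;;;;;;;;;");
        String[] deShort = parse(";;;;;;Jul;Aug;Sep;Okt;Nov;Dez;");
        String[] deLong = parse("Januar;Februar;M\u00e4rz;April;Mai;Juni;Juli;August;"
            + "September;Oktober;November;Dezember;");
        System.out.println(Arrays.toString(
            merge(Arrays.asList(deAtShort, deShort), deLong)));
    }
}
```

Note that a plain split(";") would drop the trailing empty fields and leave deAtShort with three elements, which is the truncated array behind the original exception. With these inputs, slots 2, 4, 5 and 6 come from the long names because neither level supplies a short name for them.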
I've now patched everything (I hope!).

http://developer.classpath.org/pipermail/classpath-patches/2012-January/006635.html has most of the patches, except the final one, which I just need to commit. I'll post them all here later today.

What's the route for getting this into Fedora?
Any news?
The issue is fixed in full in GNU Classpath now. The changes need to be merged over to gcj. I was hoping we'd be able to get a Classpath release out and use that as a point to do a merge with gcj, but problems arose (doc. generation failures) and it didn't happen. Jakub said he would just backport from gcj for Fedora. I can commit just these patches to a specific gcj branch if that would be quicker. I just need pointers as to which one to get the ball rolling.
Created attachment 568457
Part 2 of 5; sort locale resource files for later updates
Created attachment 568458
Part 3 of 5; Use the main approved value for properties.
Created attachment 568459
Part 4 of 5; Use the 'format' context type for months and days.
Created attachment 568460
Part 5 of 5; update locale data without trailing separator
The attached patches complete the set. At least 1-4 should be applied to get a full fix. 5 is optional; it gets rid of the trailing separator in the locale data rather than removing it using a substring call in DateFormatSymbols.
This patch is part of the merge posted for review here:
http://gcc.gnu.org/ml/java-patches/2012-q1/msg00063.html

It's pretty huge though; you won't want to backport the whole thing. The patches posted here are enough to fix the bug.
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=186487 Should be in gcc-4.7.0-2, untested.
gcc-4.7.0-2.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/gcc-4.7.0-2.fc17
Package gcc-4.7.0-2.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing gcc-4.7.0-2.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-5975/gcc-4.7.0-2.fc17
then log in and leave karma (feedback).
gcc-4.7.0-2.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
pdftk-1.44-9.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/pdftk-1.44-9.fc17
pdftk-1.44-9.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.