Bug 712013 - pdftk crashes with java.lang.ArrayIndexOutOfBoundsException
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: gcc
Version: 17
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Assigned To: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
Reported: 2011-06-09 05:39 EDT by Sandro Bonazzola
Modified: 2012-04-28 21:02 EDT (History)

See Also:
Fixed In Version: pdftk-1.44-9.fc17
Doc Type: Bug Fix
Last Closed: 2012-04-18 19:04:25 EDT


Attachments
LocaleInformation_de_AT.properties (579 bytes, text/plain), 2011-08-05 11:18 EDT, Andrew Haley
Part 1 of fix (minimum required) (4.78 KB, patch), 2012-01-12 13:03 EST, Andrew John Hughes
Part 2 of 5; sort locale resource files for later updates (900.30 KB, application/x-gzip), 2012-03-07 20:26 EST, Andrew John Hughes
Part 3 of 5; Use the main approved value for properties. (135.73 KB, application/x-gzip), 2012-03-07 20:28 EST, Andrew John Hughes
Part 4 of 5; Use the 'format' context type for months and days. (89.95 KB, application/x-gzip), 2012-03-07 20:30 EST, Andrew John Hughes
Part 5 of 5; update locale data without trailing separator (40.55 KB, application/x-gzip), 2012-03-07 20:30 EST, Andrew John Hughes


External Trackers
Launchpad 487922
Launchpad 779908
Debian BTS 560594

Description Sandro Bonazzola 2011-06-09 05:39:06 EDT
Description of problem:

$ /usr/bin/pdftk <source> output <destination> encrypt_128bit owner_pw <password> compress allow Printing DegradedPrinting
Unhandled Java Exception:
java.lang.ArrayIndexOutOfBoundsException: 5
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.11)
   at java.text.SimpleDateFormat.format(libgcj.so.11)
   at java.text.DateFormat.format(libgcj.so.11)
   at com.lowagie.text.Document.addCreationDate(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfDocument.<init>(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfStamperImp.<init>(itext-2.1.7.jar.so)

$ locale
LANG=de_AT.UTF-8
LC_CTYPE="de_AT.UTF-8"
LC_NUMERIC="de_AT.UTF-8"
LC_TIME="de_AT.UTF-8"
LC_COLLATE="de_AT.UTF-8"
LC_MONETARY="de_AT.UTF-8"
LC_MESSAGES="de_AT.UTF-8"
LC_PAPER="de_AT.UTF-8"
LC_NAME="de_AT.UTF-8"
LC_ADDRESS="de_AT.UTF-8"
LC_TELEPHONE="de_AT.UTF-8"
LC_MEASUREMENT="de_AT.UTF-8"
LC_IDENTIFICATION="de_AT.UTF-8"
LC_ALL=

Version-Release number of selected component (if applicable):
libgcj-4.5.1-4.fc14.i686
itext-2.1.7-6.fc13.i686
pdftk-1.41-27.fc14.i686

How reproducible:
Always reproducible

Steps to Reproduce:
1. /usr/bin/pdftk <source> output <destination> encrypt_128bit owner_pw <password> compress allow Printing DegradedPrinting

  
Actual results:
Unhandled Java Exception:
java.lang.ArrayIndexOutOfBoundsException: 5
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.11)
   at java.text.SimpleDateFormat.format(libgcj.so.11)
   at java.text.DateFormat.format(libgcj.so.11)
   at com.lowagie.text.Document.addCreationDate(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfDocument.<init>(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfStamperImp.<init>(itext-2.1.7.jar.so)

Expected results:
The command should complete and write the encrypted PDF without throwing an exception.

Additional info:

Bug already resolved in other distributions:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=560594
https://bugs.launchpad.net/ubuntu/+source/pdftk/+bug/487922
Comment 1 Sandro Bonazzola 2011-06-09 05:43:57 EDT
Also affects Fedora 15
libgcj-4.6.0-9.fc15.i686
itext-2.1.7-7.fc15.i686
pdftk-1.44-3.fc15.i686

Unhandled Java Exception:
java.lang.ArrayIndexOutOfBoundsException: 5
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.12)
   at java.text.SimpleDateFormat.format(libgcj.so.12)
   at java.text.DateFormat.format(libgcj.so.12)
   at com.lowagie.text.Document.addCreationDate(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfDocument.<init>(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfStamperImp.<init>(itext-2.1.7.jar.so)
Comment 2 Sandro Bonazzola 2011-06-09 07:48:51 EDT
It's a libgcj -> gcc bug.
pdftk just exposes it, and the patch is only a workaround that lets pdftk do its work.
Comment 3 Jakub Jelinek 2011-06-09 08:14:13 EDT
It would be helpful if you could provide a self-contained reproducer; it might very well be an itext bug too.
Comment 4 Sandro Bonazzola 2011-06-09 10:37:21 EDT
I think you could use this:

https://bugs.launchpad.net/ubuntu/+source/pdftk/+bug/779908/+attachment/2146700/+files/TestDateFormat.java

$ gcj -C TestDateFormat.java
$ LANG=it_IT  gij TestDateFormat
gio giu 09 16:37:28 CEST 2011
$ LANG=de_DE gij TestDateFormat
Do.  09 16:36:14 GMT+02:00 2011
$ LANG=de_AT gij TestDateFormat
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.12)
   at java.text.SimpleDateFormat.format(libgcj.so.12)
   at java.text.DateFormat.format(libgcj.so.12)
   at TestDateFormat.main(TestDateFormat.java:12)
Comment 5 Jochen Schmitt 2011-06-12 15:48:32 EDT
I'm surprised that this happens on F15. pdftk-1.44 contains the workaround suggested by Debian, which is referenced in this bug report.
Comment 6 Fedora Update System 2011-06-12 16:09:53 EDT
pdftk-1.41-28.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/pdftk-1.41-28.fc14
Comment 7 Jochen Schmitt 2011-06-12 16:11:55 EDT
It would be nice if you could try out pdftk-1.41-28 from updates-testing. This release should contain the workaround created by Debian.
Comment 8 Sandro Bonazzola 2011-06-13 03:18:13 EDT
(In reply to comment #6)
> pdftk-1.41-28.fc14 has been submitted as an update for Fedora 14.
> https://admin.fedoraproject.org/updates/pdftk-1.41-28.fc14

Tested, not working.
Unhandled Java Exception:
java.lang.ArrayIndexOutOfBoundsException: 5
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.11)
   at java.text.SimpleDateFormat.format(libgcj.so.11)
   at java.text.DateFormat.format(libgcj.so.11)
   at com.lowagie.text.Document.addCreationDate(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfDocument.<init>(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfStamperImp.<init>(itext-2.1.7.jar.so)
Comment 9 Jakub Jelinek 2011-06-13 05:26:18 EDT
Andrew, please see the TestDateFormat.java testcase, which fails the same way:

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
public class TestDateFormat {
  public static void main(String[] args) throws IOException
  {
    SimpleDateFormat sdf = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
    System.out.println(sdf.format(new Date()));
  }
}

[jakub@xxx tmp]$ gcj -fmain=TestDateFormat -o TestDateFormat{,.java}
[jakub@xxx tmp]$ LC_ALL=C ./TestDateFormat
Mon Jun 13 11:18:27 GMT+02:00 2011
[jakub@xxx tmp]$ LC_ALL=de_DE.UTF-8 ./TestDateFormat
Mo.  13 11:18:32 GMT+02:00 2011
[jakub@xxx tmp]$ LC_ALL=de_AT.UTF-8 ./TestDateFormat
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.12)
   at java.text.SimpleDateFormat.format(libgcj.so.12)
   at java.text.DateFormat.format(libgcj.so.12)
   at TestDateFormat.main(TestDateFormat)
[jakub@xxx tmp]$ LC_ALL=de_DE.UTF-8 locale -k LC_TIME | grep abmon
abmon="Jan;Feb;Mär;Apr;Mai;Jun;Jul;Aug;Sep;Okt;Nov;Dez"
[jakub@xxx tmp]$ LC_ALL=de_AT.UTF-8 locale -k LC_TIME | grep abmon
abmon="Jän;Feb;Mär;Apr;Mai;Jun;Jul;Aug;Sep;Okt;Nov;Dez"
[jakub@xxx tmp]$ LC_ALL=C locale -k LC_TIME | grep abmon
abmon="Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec"

For the de_DE.UTF-8 locale it surprisingly doesn't print any month at all, for de_AT.UTF-8 it crashes, and for e.g. cs_CZ.UTF-8 it prints "6." instead of the month name (that is also surprising, because abmon is "čen", i.e. 3 letters). But both de_DE and de_AT use "Jun", i.e. 3 letters exactly as in C or en_US.UTF-8 (unless Java has different locale data from libc).
Comment 10 Fedora Update System 2011-07-12 17:56:59 EDT
pdftk-1.41-28.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 11 Sandro Bonazzola 2011-07-14 10:05:54 EDT
(In reply to comment #10)
> pdftk-1.41-28.fc14 has been pushed to the Fedora 14 stable repository.  If
> problems still persist, please make note of it in this bug report.

As said in comment #8, it is not working, but the exception now reports a different index (maybe because the month changed):

Unhandled Java Exception:
java.lang.ArrayIndexOutOfBoundsException: 6
   at java.text.SimpleDateFormat.formatWithAttribute(libgcj.so.11)
   at java.text.SimpleDateFormat.format(libgcj.so.11)
   at java.text.DateFormat.format(libgcj.so.11)
   at com.lowagie.text.Document.addCreationDate(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfDocument.<init>(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfStamperImp.<init>(itext-2.1.7.jar.so)
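
The change from 5 to 6 is consistent with a month lookup: Java's Calendar.MONTH is zero-based, so June is 5 and July is 6. The following is a minimal sketch, assuming the short-month array is shorter than expected (comment 17 later reports a three-element array for de_AT). The class name MonthIndexDemo and the array contents are hypothetical and only illustrate the indexing, not the libgcj code itself.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

final class MonthIndexDemo {
    // Picks a short month name by the zero-based Calendar.MONTH value,
    // which is how a date formatter indexes its month array.
    static String shortMonth(String[] shortMonths, Calendar when) {
        return shortMonths[when.get(Calendar.MONTH)];
    }

    public static void main(String[] args) {
        // Hypothetical truncated array: only January and March are present.
        String[] truncated = {"J\u00e4n", "", "M\u00e4r"};
        Calendar[] dates = {
            new GregorianCalendar(2011, Calendar.JUNE, 9),   // MONTH == 5
            new GregorianCalendar(2011, Calendar.JULY, 14),  // MONTH == 6
        };
        for (Calendar date : dates) {
            int month = date.get(Calendar.MONTH);
            try {
                System.out.println(month + " -> " + shortMonth(truncated, date));
            } catch (ArrayIndexOutOfBoundsException e) {
                // Any month index past the end of the truncated array ends up here.
                System.out.println(month + " -> index out of bounds");
            }
        }
    }
}
```

Under this assumption the crash would occur in every month after March, with the reported index following the calendar.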
Comment 12 Andrew Haley 2011-08-05 11:18:52 EDT
Created attachment 516911 [details]
LocaleInformation_de_AT.properties
Comment 13 Andrew Haley 2011-08-05 11:20:40 EDT
The core problem is that the locale properties file is corrupt.  We need to regenerate the LocaleInformation_de_AT.properties file without the corruption.
Comment 14 Andrew John Hughes 2011-09-21 11:51:45 EDT
gcj issue.
Comment 15 Andrew John Hughes 2011-09-21 11:52:09 EDT
Taking.
Comment 16 Andrew John Hughes 2012-01-03 06:36:40 EST
It's not that the locale file is corrupt.  It accurately represents the CLDR data for that locale which only specifies short months for January and March.  The problem is that the code for DateFormatSymbols doesn't account for the possibility of empty data.

I think an easier fix would be to make DateFormatSymbols search up the hierarchy rather than trying to re-write the locale data generator extensively and introduce lots of duplicate data in every properties file.  At the moment, it works on the basis of single CLDR files and would need to be changed to understand the locale hierarchy, something which is already built into the Locale object used by DFS.
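
As a rough illustration of "searching up the hierarchy", the sketch below builds the fallback chain for a locale, from most specific to the root locale. It is not the GNU Classpath code; the class name LocaleChain and the method chain are hypothetical, and only standard java.util.Locale calls are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class LocaleChain {
    // Returns the given locale followed by its less specific parents,
    // ending with the root locale (empty language, country and variant).
    static List<Locale> chain(Locale locale) {
        List<Locale> result = new ArrayList<Locale>();
        String language = locale.getLanguage();
        String country = locale.getCountry();
        String variant = locale.getVariant();
        if (variant.length() > 0) {
            result.add(new Locale(language, country, variant));
        }
        if (country.length() > 0) {
            result.add(new Locale(language, country));
        }
        if (language.length() > 0) {
            result.add(new Locale(language));
        }
        result.add(Locale.ROOT);
        return result;
    }

    public static void main(String[] args) {
        // For de_AT this walks de_AT, then de, then the root locale.
        for (Locale step : chain(new Locale("de", "AT"))) {
            System.out.println("[" + step + "]");
        }
    }
}
```

A lookup would then try each locale in the chain until it finds a non-empty value. On Java 6 and later, ResourceBundle.Control.getCandidateLocales produces a comparable list.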
Comment 17 Andrew John Hughes 2012-01-12 12:46:26 EST
I've added a sanity check for this data to Mauve:

http://sourceware.org/ml/mauve-patches/2012/msg00000.html
http://sourceware.org/cgi-bin/cvsweb.cgi/mauve/gnu/testlet/java/text/DateFormatSymbols/SanityCheck.java?cvsroot=mauve

This checks that the arrays are the right size and contain non-null non-empty strings (except where empty strings are allowed; weekday[0] and month[12]).
With gcj 4.6.2, we have 783 failures out of 29216 tests.  With the patch
I'm about to post to GNU Classpath applied, this reduces to 235 out of
30456.  The test increase is due to some arrays now being their correct
larger size.  For example, the de_AT short month array now correctly has 13 elements.
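
For readers without Mauve at hand, a check along these lines can be sketched as below. This is not the Mauve SanityCheck testlet; the class DateSymbolsSanity and its methods are hypothetical, and the expected sizes (13 month slots with index 12 allowed empty, 8 weekday slots with index 0 allowed empty) are taken from the description above.

```java
import java.text.DateFormatSymbols;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class DateSymbolsSanity {
    // Records a problem if the array has the wrong length or holds a null
    // or empty string anywhere other than the one index allowed to be empty.
    static void check(String name, String[] values, int expectedLength,
                      int allowedEmpty, List<String> problems) {
        if (values.length != expectedLength) {
            problems.add(name + ": length " + values.length
                + ", expected " + expectedLength);
        }
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                problems.add(name + "[" + i + "] is null");
            } else if (values[i].length() == 0 && i != allowedEmpty) {
                problems.add(name + "[" + i + "] is empty");
            }
        }
    }

    // Runs the check over the month and weekday arrays of one locale.
    static List<String> problems(Locale locale) {
        DateFormatSymbols symbols = new DateFormatSymbols(locale);
        List<String> problems = new ArrayList<String>();
        check("months", symbols.getMonths(), 13, 12, problems);
        check("shortMonths", symbols.getShortMonths(), 13, 12, problems);
        check("weekdays", symbols.getWeekdays(), 8, 0, problems);
        check("shortWeekdays", symbols.getShortWeekdays(), 8, 0, problems);
        return problems;
    }

    public static void main(String[] args) {
        Locale[] locales = {Locale.US, Locale.GERMANY, new Locale("de", "AT")};
        for (Locale locale : locales) {
            List<String> found = problems(locale);
            System.out.println(locale + ": " + (found.isEmpty() ? "ok" : found.toString()));
        }
    }
}
```

Running it against an affected runtime should flag the short-month array for de_AT; what it reports elsewhere depends on the locale data shipped with the runtime.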

For reference, OpenJDK has 0 failures out of 14288 tests (they have
much less locale data).  I should be able to get our failure rate down
further by fixing further bugs in the locale data.  The generator is
currently picking up the wrong incomplete set of month & week names
for locales like be and cy.

I'll attach the initial fix here once committed.  It makes the following changes:

1.  The data arrays are pre-populated with empty strings to the right size and then filled from the locale data.
2.  The locale data is parsed, such that it doesn't throw away trailing empty fields (this was more necessary before I made change 1).
3.  We now search locales further up the hierarchy.  de_AT only provides values for January and March.  This is why we end up with a three element array.  This is intentional in the CLDR data, where locales should inherit data from further up.  So the de_AT data should be composed of de_AT -> de -> ROOT, not just de_AT.
4. Similarly, the CLDR spec. (http://www.unicode.org/reports/tr35/tr35-10.html#Date_Elements) specifies "sideways" inheritance for the month and day names i.e. if there is no short name, the long name should be used.  We now also do this.
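
The four changes above can be sketched roughly as follows. This is not the attached patch: the class MonthMerge, its methods, the order in which the steps are applied, and the sample data are all assumptions made for illustration, with ";" as the field separator.

```java
import java.util.Arrays;

final class MonthMerge {
    // 12 months plus the extra slot used for Calendar.UNDECIMBER.
    static final int MONTH_SLOTS = 13;

    // Change 2: a negative limit keeps trailing empty fields,
    // which a plain split(";") would silently drop.
    static String[] parse(String raw) {
        return raw.split(";", -1);
    }

    // Changes 1 and 3: start from a full-size array of empty strings, then
    // fill each still-empty slot from the raw data of every level, ordered
    // from the most specific locale to the least specific one.
    static String[] merge(String[] levels) {
        String[] result = new String[MONTH_SLOTS];
        Arrays.fill(result, "");
        for (String raw : levels) {
            String[] fields = parse(raw);
            for (int i = 0; i < fields.length && i < result.length; i++) {
                if (result[i].length() == 0) {
                    result[i] = fields[i];
                }
            }
        }
        return result;
    }

    // Change 4: "sideways" inheritance, using the long name wherever
    // the short name is still missing.
    static String[] fillFromLong(String[] shortNames, String[] longNames) {
        String[] result = shortNames.clone();
        for (int i = 0; i < result.length && i < longNames.length; i++) {
            if (result[i].length() == 0) {
                result[i] = longNames[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative data only: de_AT supplies January and March, de the rest.
        String deAT = "J\u00e4n;;M\u00e4r";
        String de = "Jan;Feb;M\u00e4r;Apr;Mai;Jun;Jul;Aug;Sep;Okt;Nov;Dez";
        System.out.println(Arrays.toString(merge(new String[] {deAT, de})));
    }
}
```

With the sample data, the merged array keeps the de_AT values for January and March and takes the remaining months from de, leaving the thirteenth slot empty.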

That's sufficient to avoid this bug but further patches will be needed to make the data correct.  For example, de_AT now has a complete set of month names but some of them are incorrect as the locale generator includes data from the "stand-alone" context rather than the "format" context (again, see http://www.unicode.org/reports/tr35/tr35-10.html#Date_Elements).  The former is for headings, while the latter is the one for date formatting that we need.  I imagine this is because it uses the last one parsed rather than specifically choosing stand-alone.

Anyway, this can be fixed, but it means regenerating the data which I'll do in a separate patch.  The data also needs updating so that it occurs sorted (allowing for readable diffs) and omits the trailing separator (currently handled as a hack in the DateFormatSymbols class).  I'll post all patches here, and leave it up to the packager as to which ones to include in the RPM.
Comment 18 Andrew John Hughes 2012-01-12 13:03:28 EST
Created attachment 552456 [details]
Part 1 of fix (minimum required)

With this patch, we have:

de_AT: Short months=[Jän, Februar, Mär, April, Mai, Juni, Jul, Aug, Sep, Okt, Nov, Dez, ]

Note that 2, 4 and 6 are wrong (should be Feb, Apr and Jun).  This is because
the incomplete stand-alone short name data is used, leaving the empty fields to be filled by the long names.  This will be fixed with updated locale data.
Comment 19 Andrew John Hughes 2012-02-01 11:17:05 EST
I've now patched everything (I hope!)

http://developer.classpath.org/pipermail/classpath-patches/2012-January/006635.html

has most of the patches; the final one I just need to commit.
I'll post them all here later today.

What's the route for getting this into Fedora?
Comment 20 Sandro Bonazzola 2012-03-01 05:10:43 EST
Any news?
Comment 21 Andrew John Hughes 2012-03-01 06:01:47 EST
The issue is fixed in full in GNU Classpath now. The changes need to be merged over to gcj.  I was hoping we'd be able to get a Classpath release out and use that as a point to do a merge with gcj, but problems arose (doc. generation failures) and it didn't happen.

Jakub said he would just backport from gcj for Fedora.  I can commit just these patches to a specific gcj branch if that would be quicker.  I just need pointers as to which one to get the ball rolling.
Comment 22 Andrew John Hughes 2012-03-07 20:26:30 EST
Created attachment 568457 [details]
Part 2 of 5; sort locale resource files for later updates
Comment 23 Andrew John Hughes 2012-03-07 20:28:55 EST
Created attachment 568458 [details]
Part 3 of 5; Use the main approved value for properties.
Comment 24 Andrew John Hughes 2012-03-07 20:30:06 EST
Created attachment 568459 [details]
Part 4 of 5; Use the 'format' context type for months and days.
Comment 25 Andrew John Hughes 2012-03-07 20:30:56 EST
Created attachment 568460 [details]
Part 5 of 5; update locale data without trailing separator
Comment 26 Andrew John Hughes 2012-03-07 20:32:43 EST
The attached patches complete the set.  At least 1-4 should be completed to get a full fix.  5 is optional, getting rid of the trailing separator in the locale data rather than removing it using a substring call in DateFormatSymbols.
Comment 27 Andrew John Hughes 2012-03-21 11:09:06 EDT
This patch is part of the merge posted for review here:

http://gcc.gnu.org/ml/java-patches/2012-q1/msg00063.html

It's pretty huge though, you won't want to backport the whole thing.  The patches posted here are enough to fix the bug.
Comment 28 Jakub Jelinek 2012-04-16 07:02:24 EDT
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=186487
Should be in gcc-4.7.0-2, untested.
Comment 29 Fedora Update System 2012-04-16 10:08:59 EDT
gcc-4.7.0-2.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/gcc-4.7.0-2.fc17
Comment 30 Fedora Update System 2012-04-16 17:55:24 EDT
Package gcc-4.7.0-2.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing gcc-4.7.0-2.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-5975/gcc-4.7.0-2.fc17
then log in and leave karma (feedback).
Comment 31 Fedora Update System 2012-04-18 19:04:25 EDT
gcc-4.7.0-2.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 32 Fedora Update System 2012-04-19 11:42:20 EDT
pdftk-1.44-9.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/pdftk-1.44-9.fc17
Comment 33 Fedora Update System 2012-04-28 21:02:52 EDT
pdftk-1.44-9.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.
