Bug 1095590

Summary: Asciidoc import: alt text for figures is incomplete
Product: [Community] PressGang CCMS
Reporter: mmurray
Component: ImportTool
Assignee: Matthew Casperson <mcaspers>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 1.6
CC: cbredesen, lnewson, mcaspers, nboldt
Target Milestone: ---
Target Release: 1.6
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-05-28 21:56:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1091570, 1092170
Attachments:
* zip of maven project to build docbook and html from asciidoc sources (flags: none)
* zip of maven project to build docbook and html from asciidoc sources - with commas in the alt tag (flags: none)
* zip of maven project to build docbook and html from asciidoc sources - with commas in the alt tag, FIXED by adding quotes in the tag (flags: none)

Description mmurray 2014-05-08 05:29:20 UTC
Asciidoc:
.My great figure
image::images/4116.png[Web Page Open on Simulated Device.]

Resulting docbook:
	<figure>
		<title>My great figure</title>
		<mediaobject>
			<imageobject>
				<imagedata contentdepth="pen o" contentwidth="age O" fileref="images/4116.png"/>
			</imageobject>
			<textobject> 
				<phrase>Web P</phrase>
			</textobject>
		</mediaobject>
	</figure>

Issues:
* contentwidth and contentdepth are wrongly being pulled out of alt text
* <phrase> only contains part of alt text

I understand that commas in the [] of an asciidoc image are used to specify width, links, etc. But I don't have any commas here, so I don't see why the text is being broken up as it is.

Comment 1 Nick Boldt 2014-05-12 14:54:41 UTC
Created attachment 894761
zip of maven project to build docbook and html from asciidoc sources

Seems to be treating the alt tag as 3 blocks of 5 chars each, and truncating the rest:

[Web Page Open on Simulated Device.] ==> Web P,age O,pen o

-----

If I create a sample .adoc file like this:

<<test.zip!/test/src/main/asciidoc/test.adoc>>

And I render it using this pom.xml and the maven asciidoc plugin

<<test.zip!/test/pom.xml>>

Then the resulting docbook looks like this:

<<test.zip!/test/target/generated-docs/test.xml>>

In that case, the docbook XML looks fine, as does the resulting HTML.

So, clearly there's a problem w/ the way asciidoctor is being used here, as the maven wrapper for asciidoctor works fine on your sample .adoc.

Comment 2 Nick Boldt 2014-05-12 15:09:31 UTC
Created attachment 894763
zip of maven project to build docbook and html from asciidoc sources - with commas in the alt tag

This version of the project (test-commas.zip), with commas in the alt tag, results in broken docbook (but does NOT truncate the content):

.My great figure
image::images/4116.png[This description, containing commas, is an alt tag.]

becomes:

<figure>
<title>My great figure</title>
<mediaobject>
<imageobject>
<imagedata fileref="./images/4116.png" contentwidth="containing commas" contentdepth="is an alt tag."/>
</imageobject>
<textobject><phrase>This description</phrase></textobject>
</mediaobject>
</figure>

Comment 3 Nick Boldt 2014-05-12 15:12:01 UTC
Created attachment 894765
zip of maven project to build docbook and html from asciidoc sources - with commas in the alt tag, FIXED by adding quotes in the tag

If we add quotes into the alt tag, the resulting docbook is once again clean:

.My great figure
image::images/4116.png["This description, containing commas, is an alt tag."]

becomes

<figure>
<title>My great figure</title>
<mediaobject>
<imageobject>
<imagedata fileref="./images/4116.png"/>
</imageobject>
<textobject><phrase>This description, containing commas, is an alt tag.</phrase></textobject>
</mediaobject>
</figure>
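
The difference between the unquoted and quoted results follows from how the text inside [] is split into positional attributes: for a block image, Asciidoctor treats the first position as the alt text, the second as the width and the third as the height, and the DocBook output writes the latter two as contentwidth and contentdepth. The following is a deliberately simplified sketch of that splitting, not Asciidoctor's actual parser; splitImageAttributes is a hypothetical name, and escaped quotes and named attributes are not handled.

```javascript
// Simplified illustration only (not Asciidoctor's parser): split an image
// attribute list on commas, except where a comma sits inside double quotes.
function splitImageAttributes(attrList) {
  const values = [];
  let current = "";
  let inQuotes = false;
  for (const ch of attrList) {
    if (ch === '"') {
      inQuotes = !inQuotes; // toggle quoted mode and drop the quote itself
    } else if (ch === "," && !inQuotes) {
      values.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  values.push(current.trim());
  // Positional order for block images: alt, width, height.
  const [alt, width, height] = values;
  return { alt, width, height };
}

const unquoted = splitImageAttributes(
  "This description, containing commas, is an alt tag."
);
const quoted = splitImageAttributes(
  '"This description, containing commas, is an alt tag."'
);
console.log(unquoted);
console.log(quoted);
```

In this sketch the unquoted input ends up with "containing commas" in the width slot and "is an alt tag." in the height slot, mirroring the contentwidth and contentdepth values in Comment 2, while the quoted input keeps the whole sentence as the alt text.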

Comment 4 Matthew Casperson 2014-05-18 22:21:01 UTC
I have confirmed that the Asciidoctor application (i.e. the Ruby one) will convert the asciidoc from https://bugzilla.redhat.com/show_bug.cgi?id=1095590#c1 correctly, so this must be an issue with Asciidoctor.js.

Comment 5 Matthew Casperson 2014-05-18 22:53:31 UTC
Fixed in Build 1.6-SNAPSHOT 201405190849

This was caused by renaming the $ identifier in Asciidoctor.js to opal$ so that it would play well with AngularJS. I missed two cases where this find and replace modified a regex in which the $ was used as the end-of-string anchor:

opal$opal.cdecl(opal$scope, 'BoundaryRxs', opal$hash2(["\"", "'", ","], {"\"": /.*?[^\\](?=")/, "'": /.*?[^\\](?=')/, ",": /.*?(?=[ \t]*(,|opal$))/}));

opal$opal.cdecl(opal$scope, 'SkipRxs', opal$hash2(["blank", ","], {"blank": opal$scope.BlankRx, ",": /[ \t]*(,|opal$)/}));

Fixing these fixed the image importing.
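
The effect of the stray replacement can be reproduced in isolation. The sketch below is not the PressGang or Asciidoctor.js code; it only compares the comma boundary pattern from the snippet above in two forms: one with $ as an end-of-input anchor (assumed to be the form before the find and replace) and one with the literal text opal$. The helper name scanToBoundary is hypothetical.

```javascript
// Boundary pattern as it presumably looked before the find and replace:
// lazily consume text up to a comma or the end of the input.
const originalBoundaryRx = /.*?(?=[ \t]*(,|$))/;

// Pattern after `$` was rewritten to `opal$`: the second alternative now
// only matches when the input literally ends with the text "opal".
const brokenBoundaryRx = /.*?(?=[ \t]*(,|opal$))/;

// Hypothetical helper: return the text up to the boundary, or null when
// the pattern does not match anywhere in the input.
function scanToBoundary(rx, input) {
  const match = rx.exec(input);
  return match === null ? null : match[0];
}

const altText = "Web Page Open on Simulated Device.";
const intact = scanToBoundary(originalBoundaryRx, altText);
const broken = scanToBoundary(brokenBoundaryRx, altText);

console.log(intact);
console.log(broken);
```

With the intact pattern the whole alt text is returned, because the end of the input counts as a boundary. With the modified pattern, comma-free text has no boundary at all and the helper returns null, whereas text that contains a comma still matches up to that comma.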

Comment 6 Matthew Casperson 2014-05-18 22:55:01 UTC
The supplied test case now imports as:

<section>
	<title>Test Page</title>
	<figure>
		<title>My great figure</title>
		<mediaobject>
			<imageobject>
				<imagedata fileref="images/4116.png" />
			</imageobject>
			<textobject> 
				<phrase>Web Page Open on Simulated Device.</phrase>
			</textobject>
		</mediaobject>
	</figure>
</section>

Comment 7 Lee Newson 2014-05-21 00:15:23 UTC
Verified that the alt text is now populated correctly, assuming the correct syntax is used. In the example Nick gave in Comment #2, the content was still imported with incorrect contentdepth and contentwidth values; however, this happened with the maven build as well, so marking this as VERIFIED.