Bug 1198433 - Replace Seam Text with CommonMark Markdown
Summary: Replace Seam Text with CommonMark Markdown
Alias: None
Product: Zanata
Classification: Retired
Component: Component-UI, DatabaseChange
Version: development
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.7
Assignee: Sean Flanigan
QA Contact: Ding-Yi Chen
Depends On:
Blocks: 1027033 1056296 1056301 1065234 1094534
Reported: 2015-03-04 06:40 UTC by Sean Flanigan
Modified: 2015-07-22 02:19 UTC

Fixed In Version: 3.7.0-SNAPSHOT (git-jenkins-zanata-server-github-pull-requests-3566)
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2015-07-22 02:19:51 UTC

Description Sean Flanigan 2015-03-04 06:40:41 UTC
To get rid of Seam (bug 1094534) we need to get rid of Seam Text.  Seam Text and the RichFaces editor for it have been a source of bugs, plus it is not a standard format users are familiar with.

We plan to replace our current support for editing and rendering Seam Text with support for CommonMark (standardised Markdown).

Automatically migrating the existing Seam Text to CommonMark is out of scope due to complexity (eg see bug 1065234), but we will track whether each piece of text has been saved since CommonMark support was added.  This will let us notify the user, and could potentially be used to enable an automatic migration in future, should someone implement it.
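The tracking idea above could look roughly like the sketch below. This is only an illustration in Python, not Zanata's actual schema: the `RichText` class, the `saved_as_commonmark` flag and `needs_migration_notice` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class RichText:
    """Hypothetical stand-in for one stored piece of user-editable text."""
    content: str
    # Assumed flag: stays False for text last saved before CommonMark support.
    saved_as_commonmark: bool = False

    def save(self, new_content):
        # Any save made through the new editor marks the text as CommonMark.
        self.content = new_content
        self.saved_as_commonmark = True


def needs_migration_notice(text):
    # Only non-empty text that has never been re-saved needs a notice.
    return bool(text.content.strip()) and not text.saved_as_commonmark
```

A flag like this is enough to drive a notification, and it would also tell a later automatic migration which rows still hold Seam Text.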

Comment 1 Sean Flanigan 2015-03-04 06:52:18 UTC
Technical considerations:

* We want to be able to add a preview feature, for which a client-side JavaScript CommonMark renderer would be very helpful.  Plus there don't seem to be any complete CommonMark renderers for Java yet.
* We need to sanitise user input on the About Project page, to protect against XSS.  This is probably not necessary for the Home page, which is only editable by admin users.  Unfortunately, the JavaScript sanitisers seem to be pretty limited in their options, so it would be better to use the OWASP sanitiser we use for email.
* In order to achieve both of the above, I plan to run a JavaScript renderer on the server side, using Rhino (and perhaps Nashorn in future), and then run the generated HTML through OWASP's sanitiser.
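The render-then-sanitise order described in the last point can be sketched as follows. This is a minimal Python illustration, not the Zanata implementation: `render_commonmark` is a hypothetical callable standing in for commonmark.js running under Rhino, and the small allowlist sanitiser is a much-simplified substitute for the OWASP sanitiser. The tag, attribute and URL allowlists are assumptions made for the example.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlists for illustration; a real policy would be more complete.
ALLOWED_TAGS = {"p", "em", "strong", "a", "ul", "ol", "li", "code", "pre",
                "h1", "h2", "h3", "blockquote"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")


class AllowlistSanitiser(HTMLParser):
    """Re-emits only allowlisted tags and attributes; drops everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops event handlers such as onclick
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # drops javascript: and other unexpected schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitise(html):
    parser = AllowlistSanitiser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def render_and_sanitise(source, render_commonmark):
    # Sanitise the renderer's HTML output, never the raw user input,
    # because CommonMark passes inline HTML through to the result.
    return sanitise(render_commonmark(source))
```

Sanitising after rendering matters because the dangerous markup only exists once the renderer has produced HTML. Sanitising the CommonMark source alone would miss it.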

See also the following Seam Text bugs, which should be eliminated by this feature:

* bug 1065234
* bug 1056296
* bug 1056301

Comment 2 Sean Flanigan 2015-03-18 01:16:35 UTC
Pull request: https://github.com/zanata/zanata-server/pull/732

Comment 3 Sean Flanigan 2015-05-08 05:59:37 UTC
A note on the testing and on the conversion approach:

I have a test (not checked in) which I used to run the conversion method against every piece of Seam Text in every production database I could find, comparing the rendered output of the original Seam Text with the rendered output of the converted CommonMark.  I fixed some bugs in my conversion, massaged some of the data (in production), and told it to skip over some other entries which looked like plain old HTML (not Seam Text).
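A comparison of this kind might be structured like the sketch below. It is a hedged Python illustration only, since the actual test was not checked in. The three callables (`render_seam`, `convert`, `render_commonmark`) are hypothetical stand-ins for the Seam Text renderer, the conversion method and the CommonMark renderer, and the whitespace normalisation is an assumption about what counts as an insignificant difference.

```python
import re


def normalise(html):
    # Collapse whitespace runs, then drop whitespace between adjacent tags,
    # so that purely cosmetic differences do not count as mismatches.
    html = re.sub(r"\s+", " ", html.strip())
    return re.sub(r">\s+<", "><", html)


def compare_conversion(samples, render_seam, convert, render_commonmark):
    """Return (sample, before, after) for each sample whose rendering changed."""
    mismatches = []
    for sample in samples:
        before = normalise(render_seam(sample))
        after = normalise(render_commonmark(convert(sample)))
        if before != after:
            mismatches.append((sample, before, after))
    return mismatches
```

Each mismatch then needs a manual look to decide whether it is a conversion bug, bad data, or an acceptable difference.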

I have also run Zanata server (and thus the Liquibase migration) against copies of all the production databases.  As expected from the previous test, there were a number of warnings for invalid Seam Text, all of which probably cause rendering failures in production, like bug 1065234.

In the cases I looked at, the result looks better now that it is rendered as CommonMark/HTML, since most of these cases were actually HTML.  There was one case which was mostly rendered as HTML source in monospace, but it previously didn't render at all, so I still consider it an improvement.

Note that some valid Seam Text (ie with Seam Text markup inside HTML tags like <p>) may be transliterated to similar CommonMark, but because CommonMark doesn't process CommonMark inside HTML tags, it will be rendered a little differently.  

A common case is underscores in URLs inside HTML, which were always badly handled in Seam Text, especially if there was an odd number of underscores.  In cases like this, the converter keeps the underscores as underscores, rather than converting to <u></u>.  This will probably fix a few URLs which were broken by Seam!

A slightly worse case is emphasised text (using * markup or similar) inside HTML, which will be replaced by the CommonMark equivalent markup, but rendered as ASCII (and still quite readable) unless someone removes the surrounding HTML in the editor.

Comment 4 Ding-Yi Chen 2015-05-18 08:17:55 UTC
VERIFIED with Zanata 3.7.0-SNAPSHOT (git-jenkins-zanata-server-github-pull-requests-3566)

Comment 5 Ding-Yi Chen 2015-06-05 03:36:42 UTC
FYI: Upgrade commonmark.js to 0.19.0