Red Hat Bugzilla – Bug 974640
RFE: Permalinks and human-readable URLs
Last modified: 2014-04-08 21:55:27 EDT
Description of problem:
It seems that we have, and will continue to have, many broken links, not just in the Customer Portal but across the internet.
While I do love the human-readability of the current metadata-based structure (and so do search engines), it doesn't provide any permanence for documentation. This lack of a permalink causes issues in the Customer Portal when referencing particular 'books'. It also affects many websites outside of Red Hat, e.g. Stack Overflow, where links that users post to our own documentation end up returning 404 errors.
I imagine that this metadata-based structure also poses problems for retiring documentation.
I believe this is why many Content Management Systems and Blog Systems provide both URL structures. Pages can have human-readable URLs *and* permalinks that are based on IDs. When the human-readable URL changes, the ID never does, which allows permanent redirects (on the server) to point to moved or retired documents. Both WordPress and Drupal do this.
It would be worth finding a way to run both structures simultaneously, relying on the IDs for permanence and on the prettier metadata URLs for SEO, robots, and humans.
This will most likely require a server-side approach rather than statically generated HTML.
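A minimal Python sketch of the dual scheme described above, assuming a hypothetical `/docs/id/<ID>` permalink prefix and hypothetical in-memory tables (`CURRENT_PATH`, `PATH_HISTORY`, `RETIRED`). Every path a document has ever had maps to its permanent ID, and the ID maps to the current human-readable path, so both old pretty URLs and ID permalinks end in a 301 redirect to the current page. It illustrates the idea only; it is not the WordPress or Drupal implementation.

```python
from typing import Optional, Tuple

# Hypothetical data: permanent document ID -> current human-readable path.
CURRENT_PATH = {
    "1001": "/docs/Product/2.0/Install_Guide/",
}
# Hypothetical data: every path a document has ever had -> its permanent ID.
PATH_HISTORY = {
    "/docs/Product/1.0/Install_Guide/": "1001",
    "/docs/Product/2.0/Install_Guide/": "1001",
}
# Hypothetical data: IDs of documents that were retired on purpose.
RETIRED = {"0999"}

PERMALINK_PREFIX = "/docs/id/"  # assumed permalink URL scheme


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header or None) for a request path."""
    if path.startswith(PERMALINK_PREFIX):
        # ID-based permalink, e.g. /docs/id/1001
        doc_id = path[len(PERMALINK_PREFIX):].strip("/")
    else:
        # Human-readable URL, current or historical.
        doc_id = PATH_HISTORY.get(path)
        if doc_id is None:
            return 404, None
    if doc_id in RETIRED:
        return 410, None  # Gone: removed deliberately
    current = CURRENT_PATH.get(doc_id)
    if current is None:
        return 404, None
    if current == path:
        return 200, None  # already at the canonical human-readable URL
    return 301, current  # permanent redirect to the current pretty URL
```

Returning 410 Gone for retired IDs, rather than 404, is one way to tell crawlers a document was removed deliberately.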
It could be done with static HTML like this:
publican --product-rename <new_name>
This would generate two versions of the book: one with the new product name, and one with the old product name whose HTML output is a set of pages that redirect to their equivalents in the new book.
You publish both, and the redirect book handles the redirects for you.
In that case you'd hide the redirect book in the website TOC.
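The redirect book can be sketched in Python as follows. `publican --product-rename` is the option proposed in this comment; the sketch does not use Publican at all and assumes a hypothetical `page_map` from each old page's file name to the URL of its equivalent page in the renamed book. Each generated page uses an HTML `meta` refresh, so no server-side configuration is needed.

```python
import html
import os


def redirect_page(new_url: str) -> str:
    """Return a minimal static HTML page that sends the browser to new_url."""
    target = html.escape(new_url, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        '<meta charset="utf-8">\n'
        # Immediate client-side redirect; works on any static web server.
        f'<meta http-equiv="refresh" content="0; url={target}">\n'
        # Hint to search engines which URL is the real one.
        f'<link rel="canonical" href="{target}">\n'
        "<title>Moved</title>\n"
        "</head><body>\n"
        f'<p>This page has moved to <a href="{target}">{target}</a>.</p>\n'
        "</body></html>\n"
    )


def write_redirect_book(out_dir: str, page_map: dict) -> None:
    """Write one redirect page per old page.

    page_map (hypothetical) maps each old page's file name, relative to
    out_dir, to the URL of its equivalent page in the renamed book.
    """
    for old_page, new_url in page_map.items():
        path = os.path.join(out_dir, old_page)
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(redirect_page(new_url))
```

Publishing the output directory under the old book's path, and leaving it out of the website TOC, gives the behaviour described above.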
It can all be done from one place in that case. Splitting it into a server-side operation and an authoring-side operation would spread its execution across two groups, which will introduce latency and friction.
The 'static website' constraint is imposed on publican's design. You will need to discuss that with ECS management directly if you want that changed. Such a discussion is best held outside of Bugzilla.
"Faking" a web service is neither scalable nor maintainable; we are definitely not going to try that.
Rudi has asked for this to be reopened and more information supplied.
Hi, I've sent Rudi a long email with a bunch of stuff that is IMO best left out of Bugzilla. He will forward that on as needed in other discussions.
A few of the options depend on how the Customer Portal exposes redirection. Does the CP have an API for this? If not, are there suggested methods for doing this in the CP?
The Customer Portal, as you may know, is not just one thing. Individual apps (case mgt, kbase, subscriptions, docs, etc.) should keep their own documents in order, and that includes handling redirects to renamed/retired/moved content. There is no central API for this, mainly because it should be handled as close to the source content as possible. What would such an API do?
Josh's proposal in comment 1 is spot on IMO.
(In reply to Chris Bredesen from comment #5)
> The Customer Portal, as you may know, is not just one thing. Individual apps
> (case mgt, kbase, subscriptions, docs, etc) should keep their own documents
> in order and that includes handling redirects to renamed/retired/moved
> content. There is no central API for this mainly because it should be
> handled as close to the source content as it can be. What would such an API
The Publican website is not an app; it is static content. The discussion here is about how we bridge the gap. Is there a way to update the Apache redirect rules? Is it using Apache at all, or is there some other system being used at that level?
The issue isn't whether we can update the rules. We live behind a global (*.redhat.com) unified proxy layer provided by an F5 appliance. That level in the stack is not an appropriate place to handle detailed application-level concerns like moved book content, which happens rather frequently. This will need to be done elsewhere. Josh's suggestion is the best one IMO, with .htaccess files in the Publican Apache instance also viable.
(In reply to Chris Bredesen from comment #8)
> The issue isn't whether we can update the rules. We live behind a global
> (*.redhat.com) unified proxy layer provided by an F5 appliance. That level
> in the stack is not an appropriate place to handle detailed
> application-level concerns like moved book content that happens rather
> frequently. This will need to be done elsewhere. Josh's suggestion is the
> best one IMO with .htaccess files in the Publican Apache instance also
"the Publican Apache instance": a knowledge gap is filled.
Josh did not mention .htaccess, to me he seemed to be talking about making static HTML files that have a redirect meta tag. There is no need for a separate package/payload with .htaccess files.
.htaccess files would be a reasonable approach for the subset of URL changes that can be managed at the directory/page level. I'm not sure it can handle the subset of URL changes caused by changes in chunking, renaming of internal links, or restructuring of content.
http://www.example.com/test.html#section5 becomes http://www.example.com/section5.html or vice versa
http://www.example.com/test.html#section5 becomes http://www.example.com/index.html#section5_The_Life_Of_Brian
http://www.example.com/test.html#section5 becomes http://www.example.com/index.html#section6
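These cases are hard for .htaccess because browsers never send the fragment (#section5) to the server: a request for http://www.example.com/test.html#section5 arrives as /test.html. A fragment-aware mapping therefore has to run in the browser, for example in a script on the old page that reads the fragment. The lookup table such a script would need is sketched here in Python for illustration only; `FRAGMENT_MAP` and its entries are hypothetical.

```python
from urllib.parse import urldefrag

# Hypothetical lookup table: (old page, old fragment) -> new URL.
# An empty fragment means "the page itself, with no anchor".
FRAGMENT_MAP = {
    ("test.html", "section5"): "section5.html",
    ("test.html", ""): "index.html",
}


def new_location(old_url: str) -> str:
    """Map an old page#fragment URL to its new location.

    Falls back to the page-level entry when the fragment is unknown,
    and returns the URL unchanged when the page itself is unknown.
    """
    page, fragment = urldefrag(old_url)
    if (page, fragment) in FRAGMENT_MAP:
        return FRAGMENT_MAP[(page, fragment)]
    if (page, "") in FRAGMENT_MAP:
        return FRAGMENT_MAP[(page, "")]
    return old_url
```

The page-level fallback is the part a server-side rule could also do; only the fragment-level entries require client-side code.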
To the googleatron!
By default, redirecting to an HTML anchor doesn't work, because mod_rewrite escapes the # character, turning it into %23. This, in turn, breaks the redirection.
Use the [NE] flag on the RewriteRule. NE stands for No Escape.
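A Python sketch of generating such rules, assuming the Publican Apache instance allows mod_rewrite directives in .htaccess files. The hypothetical `rewrite_rule` helper emits one `RewriteRule` per moved page with `R=301` (permanent redirect), `NE` (do not escape `#`), and `L` (stop processing further rules). Because the incoming fragment never reaches the server, this covers the page-to-anchor direction, e.g. section5.html becoming test.html#section5, but not anchor-to-page.

```python
import re


def rewrite_rule(old_page: str, new_target: str) -> str:
    """Return one .htaccess RewriteRule line redirecting old_page to new_target.

    new_target may contain a '#fragment'; the NE flag keeps mod_rewrite
    from turning '#' into '%23'.
    """
    # In per-directory (.htaccess) context the pattern is matched against the
    # path relative to that directory, so it has no leading slash.
    pattern = "^" + re.escape(old_page.lstrip("/")) + "$"
    return f"RewriteRule {pattern} {new_target} [R=301,NE,L]"


def htaccess(redirects: dict) -> str:
    """Build .htaccess content from a hypothetical {old_page: new_target} map."""
    lines = ["RewriteEngine On"]
    lines += [rewrite_rule(old, new) for old, new in redirects.items()]
    return "\n".join(lines) + "\n"
```

For example, `htaccess({"section5.html": "/test.html#section5"})` produces `RewriteEngine On` followed by `RewriteRule ^section5\.html$ /test.html#section5 [R=301,NE,L]`.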