Bug 1039789

Summary: RFE: Provide more robust external link checking
Product: [Community] PressGang CCMS
Reporter: Matthew Casperson <mcaspers>
Component: REST-API
Assignee: pressgang-ccms-dev
Status: NEW
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 1.3
CC: cbredesen
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug

Description Matthew Casperson 2013-12-10 02:01:46 UTC
Link checking is currently limited to verifying that the page a link points to returns an appropriate HTTP status code. It could be made more robust in at least two ways.

1. We could capture the ETag and Cache-Control headers and the page title of each external link when a topic is saved. These could be compared against the live values when the topic is opened in the UI or in DocBuilder, and the user would be notified of any changes.

2. We could create MinHash signatures of each external web page when a topic is saved, and notify the user when a page changes by more than some threshold (e.g. 20%).
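
As a rough illustration of option 1, the saved and live metadata could be held in a small record and compared field by field. This is only a sketch: `LinkSnapshot`, `take_snapshot` and `changed_fields` are hypothetical names, not part of PressGang, and the title is pulled out with a simple regular expression rather than a real HTML parser.

```python
import re
from dataclasses import dataclass, fields
from typing import List, Mapping, Optional

# Naive <title> extraction; a real implementation would likely use an HTML parser.
_TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


@dataclass(frozen=True)
class LinkSnapshot:
    """Hypothetical record stored alongside a topic for each external link."""
    etag: Optional[str]
    cache_control: Optional[str]
    title: Optional[str]


def take_snapshot(headers: Mapping[str, str], html: str) -> LinkSnapshot:
    """Build a snapshot from response headers and the page body."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    match = _TITLE_RE.search(html)
    # Collapse whitespace so cosmetic reformatting of the title is ignored.
    title = " ".join(match.group(1).split()) if match else None
    return LinkSnapshot(lowered.get("etag"), lowered.get("cache-control"), title)


def changed_fields(saved: LinkSnapshot, current: LinkSnapshot) -> List[str]:
    """Return the names of the fields that differ between two snapshots."""
    return [f.name for f in fields(saved)
            if getattr(saved, f.name) != getattr(current, f.name)]
```

A non-empty result from `changed_fields` would be the trigger for notifying the user. Note that a changed ETag only says the response changed, not how much, which is part of why option 2 may be more informative.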

I think the second approach would be the more reliable of the two, but it also has a larger overhead in terms of storage and processing.
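
To give a feel for the storage and processing cost, here is a minimal MinHash sketch. Everything in it is an assumption made for illustration: the 3-word shingles, the 128 hash functions, the function names, and reading "changes by more than 20%" as an estimated Jaccard distance above 0.20. A stored signature in this form is 128 integers per link.

```python
import hashlib
import random
from typing import List, Set

_PRIME = (1 << 61) - 1   # modulus for the universal hash family
NUM_HASHES = 128         # signature length (assumed, tune as needed)

# Fixed seed so signatures computed at save time and at check time are comparable.
_rng = random.Random(42)
_COEFFS = [(_rng.randrange(1, _PRIME), _rng.randrange(0, _PRIME))
           for _ in range(NUM_HASHES)]


def shingles(text: str, k: int = 3) -> Set[str]:
    """Split text into overlapping k-word shingles."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _base_hash(shingle: str) -> int:
    # hashlib is used instead of hash() because hash() is salted per process.
    digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % _PRIME


def minhash_signature(text: str) -> List[int]:
    """Return the minimum of each hash function over the text's shingles."""
    hashed = [_base_hash(s) for s in shingles(text)]
    if not hashed:
        return [_PRIME] * NUM_HASHES  # sentinel signature for empty pages
    return [min((a * h + b) % _PRIME for h in hashed) for a, b in _COEFFS]


def estimated_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def has_drifted(saved: List[int], current: List[int],
                max_change: float = 0.20) -> bool:
    """True when the estimated change exceeds the threshold."""
    return 1.0 - estimated_similarity(saved, current) > max_change
```

The signature would be computed from the page text when the topic is saved, and `has_drifted` would be called with a fresh signature when the topic is opened or built.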

With this information it would be possible to detect changes to external content much more reliably and to check that outgoing links still point to what the author originally intended. This may fix a lot of the issues with deep linking.

Comment 1 Lee Newson 2013-12-10 02:05:07 UTC
Just adding that whichever of the above options is chosen, the checks should be performed in threads. The time taken to access external content varies with latency, page size, etc., which could result in timeouts on our endpoints (a good example of this was when we had to fix source URLs).
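
One way the threaded checking could look is sketched below, using a thread pool with a per-request timeout. The names `fetch_status` and `check_links`, the pool size and the 10-second timeout are all assumptions for illustration; the fetch function is passed in so that a different check (for example a header snapshot or MinHash comparison) could be substituted.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a URL, giving up after `timeout` seconds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are still a result, not a failure of the check itself.
        return err.code


def check_links(urls: Iterable[str],
                fetch: Callable[[str], object] = fetch_status,
                max_workers: int = 8) -> Dict[str, object]:
    """Check all URLs concurrently; map each URL to its result or its exception."""
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # timeouts, DNS failures, refused connections
                results[url] = exc
    return results
```

`check_links` itself still blocks until every URL has been checked, so to avoid endpoint timeouts it would presumably be run from a background job rather than inside the request that saves or opens a topic.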