Bug 1039789 - RFE: Provide more robust external link checking
Keywords:
Status: NEW
Alias: None
Product: PressGang CCMS
Classification: Community
Component: REST-API
Version: 1.3
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: pressgang-ccms-dev
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-12-10 02:01 UTC by Matthew Casperson
Modified: 2018-09-21 23:09 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:



Description Matthew Casperson 2013-12-10 02:01:46 UTC
Link checking is currently limited to confirming that the page a link points to returns an appropriate HTTP status code. This could be made more robust in a couple of ways.

1. We could capture the ETag, Cache-Control header and page title of each external link when a topic is saved. These could be compared when the topic is opened in the UI or in DocBuilder, and the user would be notified of any changes.

2. We could create MinHash signatures of the external web pages when a topic is saved, and notify the user when a page changes by more than some threshold (e.g. 20%).
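
As a rough illustration of option 1, the sketch below records the ETag, Cache-Control header and page title for a link and reports which of them differ from the saved values. It is not existing PressGang code: `LinkSnapshot`, `snapshot` and `changed_fields` are hypothetical names, and the headers and HTML are assumed to have been fetched already.

```python
from dataclasses import dataclass, fields
from html.parser import HTMLParser
from typing import Mapping, Optional


@dataclass(frozen=True)
class LinkSnapshot:
    """Hypothetical record stored with a topic for each external link."""
    etag: Optional[str]
    cache_control: Optional[str]
    title: Optional[str]


class _TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def snapshot(headers: Mapping[str, str], html: str) -> LinkSnapshot:
    """Build a snapshot from response headers and the page body."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    title = parser.title.strip() if parser.title is not None else None
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return LinkSnapshot(lowered.get("etag"), lowered.get("cache-control"), title)


def changed_fields(saved: LinkSnapshot, current: LinkSnapshot) -> list:
    """Return the names of the fields that differ between two snapshots."""
    return [f.name for f in fields(LinkSnapshot)
            if getattr(saved, f.name) != getattr(current, f.name)]
```

A non-empty result from `changed_fields` would be the trigger for notifying the user. Not every server sends an ETag, so a missing value is stored as `None` rather than treated as a change.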

I think the second approach would be the more reliable of the two, but it also has a larger overhead in terms of storage and processing.
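
A minimal sketch of the MinHash idea follows, assuming word shingles of the page text and 64 salted hashes. The names (`minhash_signature`, `has_drifted`) and parameters are illustrative, and reading the 20% figure as "estimated Jaccard distance above 0.20" is an assumption rather than something specified here.

```python
import hashlib

NUM_HASHES = 64     # signature length; more hashes give a finer estimate
SHINGLE_SIZE = 3    # number of consecutive words per shingle


def shingles(text, size=SHINGLE_SIZE):
    """Split text into a set of overlapping word shingles."""
    words = text.lower().split()
    if len(words) <= size:
        return {" ".join(words)}
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def _hash(shingle, seed):
    """64-bit hash of a shingle, salted so each seed acts as a separate hash function."""
    digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, num_hashes=NUM_HASHES):
    """For each hash function, keep the minimum hash over all shingles."""
    items = shingles(text)
    return [min(_hash(s, seed) for s in items) for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, which estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def has_drifted(saved_sig, current_sig, threshold=0.20):
    """True when the estimated change exceeds the threshold (assumed 20%)."""
    return 1.0 - estimated_similarity(saved_sig, current_sig) > threshold
```

Only the signature needs to be stored per link (64 integers in this sketch), which gives a feel for the storage overhead mentioned above. The page text would also need markup and boilerplate stripped before hashing, which is not shown.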

With this information it should be possible to detect changes to external content more reliably and to ensure that outgoing links still point to what the author originally intended. This may fix a lot of the issues with deep linking.

Comment 1 Lee Newson 2013-12-10 02:05:07 UTC
Just adding that any of the above options should be performed in threads, as accessing external content takes a varying amount of time depending on ping, page size, etc., which could result in timeouts on our endpoints (a good example of this was us having to fix source URLs).
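
One way the threaded checking could look, as a sketch only: each link is fetched on a worker thread with its own timeout, so a slow or dead page fails on its own instead of stalling the rest. `check_links` and `fetch_status` are hypothetical names, and the 10-second timeout and pool size are arbitrary assumptions.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_status(url, timeout=10.0):
    """Fetch a URL and return its HTTP status; raises on timeout or error."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def check_links(urls, fetch=fetch_status, max_workers=8):
    """Run fetch(url) for every URL on a thread pool.

    Returns a dict mapping each URL to ("ok", result) or ("error", message).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = ("ok", future.result())
            except Exception as exc:  # timeouts, DNS failures, HTTP errors
                results[url] = ("error", repr(exc))
    return results
```

This version still waits for every check to finish before returning. To keep a REST endpoint from timing out, the work would presumably be submitted in the background and the results collected later; that part is left out here.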

