Bug 1367588 - Improve the redirection for specific URL for RTD coming from old website
Summary: Improve the redirection for specific URL for RTD coming from old website
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: project-infrastructure
Version: mainline
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL: http://thread.gmane.org/gmane.comp.fi...
Whiteboard:
Depends On: 1359062
Blocks:
Reported: 2016-08-16 21:14 UTC by M. Scherer
Modified: 2018-08-13 04:19 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1359062
Environment:
Last Closed: 2018-08-13 04:19:17 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description M. Scherer 2016-08-16 21:14:46 UTC
+++ This bug was initially created as a clone of Bug #1359062 +++

On Thu, Jul 21, 2016 at 01:56:30PM -0400, Kaleb KEITHLEY wrote:
> <top post>
> I have archived the old community documentation (in
> /root/old-community-documenation.tgz) and set up a redirect to our
> documentation on readthedocs.io.
> </top post>

Thanks! It is good to see some progress here.

However we should improve the redirection in such a way that any URL
starting with gluster.org/community/documentation/ gets pointed to the
new site.  There was a plan to map all pages to their right new
location, but that seems rather painful to do. A generic redirect would
still allow existing search results, bookmarks or links on other pages
to reach the new docs.

Maybe Misc could put that in the webserver config? Or someone could
write an index.php (that was in all URLs anyway, right?) that does the
redirection.

Thanks,
Niels

--- Additional comment from Michael Scherer on 2016-07-25 06:00:59 EDT ---

Yeah, I am going to do it in the web server config, and move that config from Salt to Ansible while I am at it.

--- Additional comment from Michael Scherer on 2016-07-25 11:06:18 EDT ---

So we have:

Redirect permanent /rdo.php     http://www.gluster.org/community/documentation/index.php/OpenStack
Redirect permanent /presos.php  http://www.gluster.org/community/documentation/index.php/Presentation
Redirect permanent /docs/       http://www.gluster.org/community/documentation/index.php/Main_Page

Can we consider these to be obsolete now?

And we have:

RedirectMatch ^/documentation/(.*) http://www.gluster.org/docs-redirect/

Should I extend that to /community/documentation as well?
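
One way to sanity-check an extended pattern before touching the web server config is to run it against a few sample paths. This is a minimal sketch, assuming a hypothetical pattern `^/(community/)?documentation/` that is not taken from the live config. It uses `grep -E`, which for a pattern this simple should behave the same way as the regex engine behind RedirectMatch. The function name `check_paths` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical candidate pattern for an extended RedirectMatch
# (an assumption for illustration, not the deployed configuration).
pattern='^/(community/)?documentation/'

# check_paths: read one URL path per line on stdin and report whether
# the candidate pattern would catch it ("redirect") or not ("pass").
check_paths() {
    while IFS= read -r path; do
        if printf '%s\n' "$path" | grep -Eq "$pattern"; then
            printf 'redirect %s\n' "$path"
        else
            printf 'pass %s\n' "$path"
        fi
    done
}

# Sample paths: the old prefix, the wiki prefix, and an unrelated page.
printf '%s\n' \
    /documentation/Main_Page \
    /community/documentation/index.php/QuickStart \
    /community/ \
    | check_paths
```

The first two sample paths are reported as "redirect" and the last one as "pass", so other pages under /community/ would stay untouched by such a rule.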

--- Additional comment from Nigel Babu on 2016-08-01 23:01:30 EDT ---

Looks like misc has fixed this. Closing as resolved.

--- Additional comment from Amye Scavarda on 2016-08-15 14:56:14 EDT ---

We should revisit this because we're discovering that there are things that people did need that are getting redirected into places where they don't exist. 

The review of all /community and Mediawiki work and moving into RTD was dependent on being able to resolve the search issues with RTD  - which is still outstanding.

However, with a focus on moving off RTD and into another system, we may be able to move what was in Mediawiki into the canonical documentation as part of this.

--- Additional comment from M. Scherer on 2016-08-16 04:03:47 EDT ---

Could "there are things that people did need that are getting redirected into places where they don't exist. " be described in a bit more detail?

--- Additional comment from Amye Scavarda on 2016-08-16 13:41:45 EDT ---

I have gotten multiple questions about where pieces from the mediawiki have gotten moved. 

Please can we revisit this?

--- Additional comment from Niels de Vos on 2016-08-16 14:03:54 EDT ---

(In reply to Amye Scavarda from comment #6)
> I have gotten multiple questions about where pieces from the mediawiki have
> gotten moved.

Could you expand on the "pieces from the mediawiki"? Depending on the "piece", it can be redirected to the correct RTD page, or elsewhere...

--- Additional comment from Amye Scavarda on 2016-08-16 14:13:55 EDT ---

At the very least,
http://www.gluster.org/community/documentation/index.php/Features/Opversion, which is flagged in https://bugzilla.redhat.com/show_bug.cgi?id=1365706.

What would also help me get more data is tracking on the documentation site, as right now the redirect is not reporting back on what our highest hit pages are.

Comment 1 M. Scherer 2016-08-16 21:17:27 UTC
So the top 10 URLs (after crude bot filtering):

[root@supercolony httpd]# grep /community/documentation/index.php  www.gluster.org-access_log |grep -v bing | grep -v 'Yahoo!' |grep -v Googlebot | awk '{print $7}'  |sort |uniq -c  | sort -rn |head -n 10

     48 /docs-redirect/
     35 /community/documentation/index.php/Gluster_3.1:_Manually_Mounting_Volumes
     26 /community/documentation/index.php?title=Special:RecentChanges&feed=atom
     16 /community/documentation/index.php/QuickStart
     14 /community/documentation/index.php?title=Main_Page&feed=atom&action=history
     14 /community/documentation/index.php/Gluster_3.2:_Starting_Gluster_Geo-replication
     13 /community/documentation/index.php/Gluster_3.2:_Configuring_Distributed_Striped_Volumes
     12 /community/documentation/index.php/Getting_started_overview
     11 /community/documentation/index.php/Gluster_3.2:_Setting_Volume_Options
     10 /community/documentation/index.php/Main_Page

So the most popular URLs are either from the 3.2 era or generic pages.
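
The chain of `grep -v` calls above could be folded into a single extended regex, which makes the bot list easier to extend. This is a minimal sketch, assuming log lines in Apache's combined format (request path in field 7) read from stdin. The function name `top_urls` and the bot list are illustrative, and unlike the original command this version matches bot names case-insensitively throughout.

```shell
#!/bin/sh
# Bot markers in one extended regex; illustrative, not an exhaustive list.
bots='bingbot|Googlebot|Yahoo! Slurp|Baiduspider|AhrefsBot|MJ12bot'

# top_urls N: read access-log lines on stdin, drop lines whose text
# matches a known bot marker, and print the N most requested paths
# with their hit counts, most requested first.
top_urls() {
    grep -Eiv "$bots" | awk '{print $7}' | sort | uniq -c | sort -rn | head -n "$1"
}

# Possible usage on the server (file name as used in this bug):
#   grep /community/documentation/index.php www.gluster.org-access_log | top_urls 10
```

Reading from stdin keeps the function independent of where the logs live, so the same filter can be reused on rotated log files.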

A more thorough check and examination will unfortunately have to wait for me to sleep.

Comment 2 M. Scherer 2016-08-16 21:53:56 UTC
So, looking in more detail at the URL Gluster_3.2:_Configuring_Distributed_Striped_Volumes, there are more bots that I didn't filter out, and the same IP downloaded the page 10 times.

The same goes for Gluster_3.1:_Manually_Mounting_Volumes: 27 hits from the same IP in Iceland, plus bots, another IP from the same country (2 hits), and 2 hits from India.

I suspect that we would need more data to see what should be mapped, and/or make an editorial choice based on existing content.


Alternatively, someone can decide to revert the complete change and redirection for the time being, but that would be trading one set of issues for another.

Comment 3 M. Scherer 2016-08-17 12:10:13 UTC
So I did a quick check on the whole set of logs, and since 26 July we have had around 22,000 hits.

# grep /community/documentation/index.php www.gluster.org-access_log* |wc -l
22708

Around 90% of the traffic is bots:
# grep /community/documentation/index.php www.gluster.org-access_log*  |grep -v g2reader-bot/ | grep -v Slurp\; |grep -vi bingbot |grep -vi googlebot |grep -v Baiduspider/ |grep -v AhrefsBot/  |grep -v MJ12bot/ | grep -v 'Sogou web' |grep -v SeznamBot/ |grep -v electricmonk/ | grep -v 'HaosouSpider;' |grep -v archive.org_bot  |grep -v Feedly/1.0 |grep -v SputnikBot/ | grep -v yoozBot |wc -l
2598
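
The "around 90%" figure follows from the two counts above. A small sketch of the arithmetic, with the helper name `bot_share` made up for illustration:

```shell
#!/bin/sh
total=22708     # all hits on /community/documentation/index.php
non_bot=2598    # hits left after filtering the listed crawlers

# bot_share TOTAL NON_BOT: percentage of hits attributed to bots.
bot_share() {
    LC_ALL=C awk -v t="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (t - n) / t * 100 }'
}

bot_share "$total" "$non_bot"   # prints 88.6
```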

On top of that, I suspect there are lots of refreshes and duplicate IPs:

# grep /community/documentation/index.php www.gluster.org-access_log*  |grep -v g2reader-bot/ | grep -v Slurp\; |grep -vi bingbot |grep -vi googlebot |grep -v Baiduspider/ |grep -v AhrefsBot/  |grep -v MJ12bot/ | grep -v 'Sogou web' |grep -v SeznamBot/ |grep -v electricmonk/ | grep -v 'HaosouSpider;' |grep -v archive.org_bot  |grep -v Feedly/1.0 |grep -v SputnikBot/ | grep -v yoozBot  |awk '{print $1}' |awk -F: '{print $2}'  |sort -u |wc -l
649

Grouping by network then shows only around 600 hits. That's roughly 2 to 3 visitors per day on the wiki.
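
The command used for the network grouping is not shown here. One possible way to do it is sketched below, assuming that "network" means an IPv4 /24 (the first three octets) and that the client address is the first field of each log line read from stdin. The function name `unique_networks` is hypothetical, and non-IPv4 addresses are simply skipped.

```shell
#!/bin/sh
# unique_networks: read access-log lines on stdin and print how many
# distinct IPv4 /24 networks appear among the client addresses.
# Assumption: grouping by /24; the original grouping method is unknown.
unique_networks() {
    awk '{print $1}' \
        | awk -F. 'NF == 4 {print $1 "." $2 "." $3}' \
        | sort -u \
        | awk 'END {print NR}'
}

# Possible usage on a single log file (no "file:" prefix on the lines):
#   grep /community/documentation/index.php www.gluster.org-access_log | unique_networks
```

When grepping several files at once, the file name prefix has to be stripped first, as the `awk -F:` step in the command above does.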

After removing the various hacking attempts (aimed at Joomla), the hits on the redirect page itself, the login attempts for spam, and the favicon requests, we are down to about 1,500 hits (without deduplication):

# grep /community/documentation/index.php www.gluster.org-access_log*  |grep -v g2reader-bot/ | grep -v Slurp\; |grep -vi bingbot |grep -vi googlebot |grep -v Baiduspider/ |grep -v AhrefsBot/  |grep -v MJ12bot/ | grep -v 'Sogou web' |grep -v SeznamBot/ |grep -v electricmonk/ | grep -v 'HaosouSpider;' |grep -v archive.org_bot  |grep -v Feedly/1.0 |grep -v SputnikBot/ | grep -v yoozBot |grep -v docs-redirect   |awk '{print $7}' |grep -v 'Special:UserLogin' |grep -v '&action=history'  |grep -v '%22%20h=/' |grep -v /favicon.ico |wc -l
1524


Then the 30 most popular URLs are:

[root@supercolony httpd]# grep /community/documentation/index.php www.gluster.org-access_log*  |grep -v g2reader-bot/ | grep -v Slurp\; |grep -vi bingbot |grep -vi googlebot |grep -v Baiduspider/ |grep -v AhrefsBot/  |grep -v MJ12bot/ | grep -v 'Sogou web' |grep -v SeznamBot/ |grep -v electricmonk/ | grep -v 'HaosouSpider;' |grep -v archive.org_bot  |grep -v Feedly/1.0 |grep -v SputnikBot/ | grep -v yoozBot |grep -v docs-redirect   |awk '{print $7}' |grep -v 'Special:UserLogin' |grep -v '&action=history'  |grep -v '%22%20h=/' |grep -v /favicon.ico |sort |uniq -c  |sort -rn | head -n 30
    206 /community/documentation/index.php/Gluster_3.1:_Manually_Mounting_Volumes
    143 /community/documentation/index.php/Gluster_3.2:_Setting_Volume_Options
     87 /community/documentation/index.php/QuickStart
     69 /community/documentation/index.php/Gluster_3.2:_Starting_Gluster_Geo-replication
     52 /community/documentation/index.php/Gluster_3.2:_gluster_Command
     43 /community/documentation/index.php/Main_Page
     37 /community/documentation/index.php/Translators/storage/bdb
     37 /community/documentation/index.php/Gluster_3.2:_Monitoring_your_GlusterFS_Workload
     36 /community/documentation/index.php/Gluster_3.2:_Terminology
     35 /community/documentation/index.php/Gluster_3.2:_Displaying_Volume_Information
     29 /community/documentation/index.php/Gluster_3.2:_Expanding_Volumes
     24 /community/documentation/index.php/Gluster_3.2:_Manually_Mounting_Volumes
     22 /community/documentation/index.php/GlusterFS_Concepts
     21 /community/documentation/index.php/Gluster_3.2:_Configuring_Distributed_Striped_Volumes
     16 /community/documentation/index.php/User_Guide
     16 /community/documentation/index.php/Gluster_3.2:_Tuning_Volume_Options
     16 /community/documentation/index.php/Getting_started_overview
     15 /community/documentation/index.php/Gluster_3.4:_Brick_Restoration_-_Replace_Crashed_Server
     14 /community/documentation/index.php/Gluster_3.1:_Understanding_the_GlusterFS_License
     12 /community/documentation/index.php/Translators/performance
     12 /community/documentation/index.php/Gluster_Translators
     12 /community/documentation/index.php/GlusterHPC_FAQ
     12 /community/documentation/index.php/Gluster_3.2:_Manually_Mounting_Volumes_Using_NFS
     12 /community/documentation/index.php/Getting_started_test_it_out
     10 /community/documentation/index.php/About_GlusterFS_3.3
      9 /community/documentation/index.php/Gluster_3.2:_Installing_GlusterFS_on_Red_Hat_Package_Manager_(RPM)_Distributions
      9 /community/documentation/index.php/Gluster_3.2:_GlusterFS_Geo-replication_Deployment_Overview
      9 /community/documentation/index.php/Documenting_the_undocumented
      8 /community/documentation/index.php/MediaWiki:Userlogin
      8 /community/documentation/index.php/Gluster_3.2:_Updating_Memory_Cache_Size

Comment 4 Nigel Babu 2018-08-13 04:19:17 UTC
I'd like to close this bug as WONTFIX.

We should identify gaps in our current docs and file issues to fix them against glusterdocs.

