Bug 1367588
| Summary: | Improve the redirection for specific URL for RTD coming from old website | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | M. Scherer <mscherer> |
| Component: | project-infrastructure | Assignee: | bugs <bugs> |
| Status: | CLOSED WONTFIX | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | mainline | CC: | amye, bugs, gluster-infra, misc, ndevos, nigelb |
| Target Milestone: | --- | Keywords: | Reopened, Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| URL: | http://thread.gmane.org/gmane.comp.file-systems.gluster.infra/1534 | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1359062 | Environment: | |
| Last Closed: | 2018-08-13 04:19:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1359062 | ||
| Bug Blocks: | |||
Description
M. Scherer
2016-08-16 21:14:46 UTC
So, the top 10 URLs (after crude bot filtering):
[root@supercolony httpd]# grep /community/documentation/index.php www.gluster.org-access_log |grep -v bing | grep -v 'Yahoo!' |grep -v Googlebot | awk '{print $7}' |sort |uniq -c | sort -rn |head -n 10
48 /docs-redirect/
35 /community/documentation/index.php/Gluster_3.1:_Manually_Mounting_Volumes
26 /community/documentation/index.php?title=Special:RecentChanges&feed=atom
16 /community/documentation/index.php/QuickStart
14 /community/documentation/index.php?title=Main_Page&feed=atom&action=history
14 /community/documentation/index.php/Gluster_3.2:_Starting_Gluster_Geo-replication
13 /community/documentation/index.php/Gluster_3.2:_Configuring_Distributed_Striped_Volumes
12 /community/documentation/index.php/Getting_started_overview
11 /community/documentation/index.php/Gluster_3.2:_Setting_Volume_Options
10 /community/documentation/index.php/Main_Page
So the most popular URLs are either from the 3.2 era or generic pages.
A more thorough examination will unfortunately have to wait until after I have slept.
Looking in more detail at the URL Gluster_3.2:_Configuring_Distributed_Striped_Volumes, there are more bots that I didn't filter out, and one IP downloaded the page 10 times. The same goes for Gluster_3.1:_Manually_Mounting_Volumes: 27 hits from a single IP in Iceland, plus bots, another IP from the same country (2 hits), and 2 hits from India.
I suspect that we need more data to see what should be mapped, and/or to make an editorial choice based on the existing content. Alternatively, someone can decide to revert the complete change and redirection for the time being, but that is trading one set of issues for another.
So I did a quick check on the whole set of logs: since 26 July, we have had around 22,000 hits.
# grep /community/documentation/index.php www.gluster.org-access_log* |wc -l
22708
Around 90% of the traffic is bots:
# grep /community/documentation/index.php www.gluster.org-access_log* |grep -v g2reader-bot/ | grep -v Slurp\; |grep -vi bingbot |grep -vi googlebot |grep -v Baiduspider/ |grep -v AhrefsBot/ |grep -v MJ12bot/ | grep -v 'Sogou web' |grep -v SeznamBot/ |grep -v electricmonk/ | grep -v 'HaosouSpider;' |grep -v archive.org_bot |grep -v Feedly/1.0 |grep -v SputnikBot/ | grep -v yoozBot |wc -l
2598
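The chain of `grep -v` calls above can be hard to maintain as new crawlers show up. A minimal sketch of the same filter driven by a single signature list follows. The function names `bot_patterns` and `filter_bots` are hypothetical, and the sketch ignores case for every signature, whereas the pipeline above only does so for bingbot and googlebot, so the counts could differ slightly.

```shell
#!/bin/sh
# Sketch only: the signatures are copied from the pipeline above,
# the function names are made up for illustration.

# Print one crawler signature per line.
bot_patterns() {
    cat <<'EOF'
g2reader-bot/
Slurp;
bingbot
googlebot
Baiduspider/
AhrefsBot/
MJ12bot/
Sogou web
SeznamBot/
electricmonk/
HaosouSpider;
archive.org_bot
Feedly/1.0
SputnikBot/
yoozBot
EOF
}

# Read access-log lines on stdin and print those matching no signature.
# -F treats each signature as a fixed string, -f reads them from a file,
# -i ignores case (assumption: applied to all signatures here).
filter_bots() {
    patterns=$(mktemp)
    bot_patterns > "$patterns"
    grep -v -i -F -f "$patterns" || true   # "no line left" is not an error
    rm -f "$patterns"
}
```

Assuming the same log files as above, it could be used as `grep -h /community/documentation/index.php www.gluster.org-access_log* | filter_bots | wc -l`, where `-h` keeps grep from prefixing each line with the file name.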
On top of that, I suspect there are lots of refreshes and duplicate IPs:
# grep /community/documentation/index.php www.gluster.org-access_log* |grep -v g2reader-bot/ | grep -v Slurp\; |grep -vi bingbot |grep -vi googlebot |grep -v Baiduspider/ |grep -v AhrefsBot/ |grep -v MJ12bot/ | grep -v 'Sogou web' |grep -v SeznamBot/ |grep -v electricmonk/ | grep -v 'HaosouSpider;' |grep -v archive.org_bot |grep -v Feedly/1.0 |grep -v SputnikBot/ | grep -v yoozBot |awk '{print $1}' |awk -F: '{print $2}' |sort -u |wc -l
649
Then trying to group by network shows only around 600 hits. That's roughly 2 to 3 visitors per day on the wiki.
After removing the various hacking attempts (aimed at Joomla), the hits on the redirect page itself, the login attempts made for spam, and the favicon requests, we are down to about 1500 hits (without deduplication):
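For reference, the deduplication by address and the grouping by network can be sketched as below. How the grouping by network was actually done is not stated here, so the /24 prefix is purely an assumption, as are the function names `unique_ips` and `unique_networks`. Both expect log lines on stdin without a file name prefix (for example from `grep -h`), which avoids the second `awk -F:` step and its trouble with IPv6 addresses.

```shell
#!/bin/sh
# Sketch only: function names and the /24 grouping are assumptions.

# Count distinct client addresses (field 1 of each access-log line).
unique_ips() {
    awk '{print $1}' | sort -u | wc -l | tr -d ' '
}

# Count distinct /24 networks. IPv4 only: other addresses are skipped.
unique_networks() {
    awk '{print $1}' \
        | grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' \
        | cut -d. -f1-3 \
        | sort -u | wc -l | tr -d ' '
}
```

A /24 is only a rough stand-in for "same network"; grouping by the announced prefix or by provider would need external data.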
# grep /community/documentation/index.php www.gluster.org-access_log* |grep -v g2reader-bot/ | grep -v Slurp\; |grep -vi bingbot |grep -vi googlebot |grep -v Baiduspider/ |grep -v AhrefsBot/ |grep -v MJ12bot/ | grep -v 'Sogou web' |grep -v SeznamBot/ |grep -v electricmonk/ | grep -v 'HaosouSpider;' |grep -v archive.org_bot |grep -v Feedly/1.0 |grep -v SputnikBot/ | grep -v yoozBot |grep -v docs-redirect |awk '{print $7}' |grep -v 'Special:UserLogin' |grep -v '&action=history' |grep -v '%22%20h=/' |grep -v /favicon.ico |wc -l
1524
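The noise filtering on the requested path can also be written as a single `grep` with several `-e` patterns, which makes it easier to reuse for the ranking. This is a sketch under assumptions: `requested_paths` and `top_paths` are hypothetical names, the log is assumed to be in the usual Apache format with the request path in field 7, and `docs-redirect` is matched against the path only, while the pipeline above matches it against the whole log line.

```shell
#!/bin/sh
# Sketch only: function names are made up, patterns come from the pipeline above.

# Read access-log lines on stdin, print the request path of each line
# that is not login spam, history view, probe, favicon or the redirect page.
requested_paths() {
    awk '{print $7}' | grep -v -F \
        -e 'docs-redirect' \
        -e 'Special:UserLogin' \
        -e '&action=history' \
        -e '%22%20h=/' \
        -e '/favicon.ico' || true   # "no line left" is not an error
}

# Print the N most requested paths with their hit counts (default: 30).
top_paths() {
    n=${1:-30}
    requested_paths | sort | uniq -c | sort -rn | head -n "$n"
}
```

Fed with bot-filtered log lines, `top_paths 30` would produce a ranking in the same `count path` layout as the listings in this bug.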
Then the 30 most popular URLs are:
[root@supercolony httpd]# grep /community/documentation/index.php www.gluster.org-access_log* |grep -v g2reader-bot/ | grep -v Slurp\; |grep -vi bingbot |grep -vi googlebot |grep -v Baiduspider/ |grep -v AhrefsBot/ |grep -v MJ12bot/ | grep -v 'Sogou web' |grep -v SeznamBot/ |grep -v electricmonk/ | grep -v 'HaosouSpider;' |grep -v archive.org_bot |grep -v Feedly/1.0 |grep -v SputnikBot/ | grep -v yoozBot |grep -v docs-redirect |awk '{print $7}' |grep -v 'Special:UserLogin' |grep -v '&action=history' |grep -v '%22%20h=/' |grep -v /favicon.ico |sort |uniq -c |sort -rn | head -n 30
206 /community/documentation/index.php/Gluster_3.1:_Manually_Mounting_Volumes
143 /community/documentation/index.php/Gluster_3.2:_Setting_Volume_Options
87 /community/documentation/index.php/QuickStart
69 /community/documentation/index.php/Gluster_3.2:_Starting_Gluster_Geo-replication
52 /community/documentation/index.php/Gluster_3.2:_gluster_Command
43 /community/documentation/index.php/Main_Page
37 /community/documentation/index.php/Translators/storage/bdb
37 /community/documentation/index.php/Gluster_3.2:_Monitoring_your_GlusterFS_Workload
36 /community/documentation/index.php/Gluster_3.2:_Terminology
35 /community/documentation/index.php/Gluster_3.2:_Displaying_Volume_Information
29 /community/documentation/index.php/Gluster_3.2:_Expanding_Volumes
24 /community/documentation/index.php/Gluster_3.2:_Manually_Mounting_Volumes
22 /community/documentation/index.php/GlusterFS_Concepts
21 /community/documentation/index.php/Gluster_3.2:_Configuring_Distributed_Striped_Volumes
16 /community/documentation/index.php/User_Guide
16 /community/documentation/index.php/Gluster_3.2:_Tuning_Volume_Options
16 /community/documentation/index.php/Getting_started_overview
15 /community/documentation/index.php/Gluster_3.4:_Brick_Restoration_-_Replace_Crashed_Server
14 /community/documentation/index.php/Gluster_3.1:_Understanding_the_GlusterFS_License
12 /community/documentation/index.php/Translators/performance
12 /community/documentation/index.php/Gluster_Translators
12 /community/documentation/index.php/GlusterHPC_FAQ
12 /community/documentation/index.php/Gluster_3.2:_Manually_Mounting_Volumes_Using_NFS
12 /community/documentation/index.php/Getting_started_test_it_out
10 /community/documentation/index.php/About_GlusterFS_3.3
9 /community/documentation/index.php/Gluster_3.2:_Installing_GlusterFS_on_Red_Hat_Package_Manager_(RPM)_Distributions
9 /community/documentation/index.php/Gluster_3.2:_GlusterFS_Geo-replication_Deployment_Overview
9 /community/documentation/index.php/Documenting_the_undocumented
8 /community/documentation/index.php/MediaWiki:Userlogin
8 /community/documentation/index.php/Gluster_3.2:_Updating_Memory_Cache_Size
I'd like to close this bug as WONT FIX. We should identify gaps in our current docs and file issues to fix them against glusterdocs.