Bug 1490994 - Missing redirect on the new wordpress blog for older links
Summary: Missing redirect on the new wordpress blog for older links
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: website
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-09-12 17:05 UTC by Amye Scavarda
Modified: 2017-10-04 16:31 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-04 16:31:24 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Screenshot of WordPress Redirect Options (21.61 KB, image/png)
2017-09-12 17:05 UTC, Amye Scavarda

Description Amye Scavarda 2017-09-12 17:05:09 UTC
Created attachment 1324978
Screenshot of WordPress Redirect Options

Description of problem:
Blog posts from blog.gluster.org don't have a redirect set up, and old links are returning 404s.

Attaching a screenshot of our options for redirecting this within the WordPress install; we could also do this some other way that I haven't thought of.

Comment 1 M. Scherer 2017-09-12 23:46:54 UTC
We would need a bit more information regarding the "old links". The redirections we have on the current server were done at the httpd level:


 ^/blog/(.*)     =>  http://blog.gluster.org/$1
 ^/(2\d{3})/(.*) =>  http://blog.gluster.org/$1/$2
 ^/tag/(.*)      =>  http://blog.gluster.org/tag/$1
 ^/category/(.*) =>  http://blog.gluster.org/category/$1
 ^/author/(.*)   =>  http://blog.gluster.org/author/$1
 ^/feed/(.*)     =>  http://blog.gluster.org/feed/$1

But that was mostly so we could have a blog separate from the main website.
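
As a rough illustration of how an ordered rule list like the one above behaves, here is a minimal Python sketch. It is not the httpd configuration itself: the function name `redirect_target`, the first-match-wins evaluation, and the use of Python's `re` module are assumptions made for the example. Only the patterns and targets come from the list above.

```python
import re

# Patterns and targets transcribed from the list above, in the same order.
# The "$1" style backreferences are written as "\1" for Python's re module.
RULES = [
    (r"^/blog/(.*)", r"http://blog.gluster.org/\1"),
    (r"^/(2\d{3})/(.*)", r"http://blog.gluster.org/\1/\2"),
    (r"^/tag/(.*)", r"http://blog.gluster.org/tag/\1"),
    (r"^/category/(.*)", r"http://blog.gluster.org/category/\1"),
    (r"^/author/(.*)", r"http://blog.gluster.org/author/\1"),
    (r"^/feed/(.*)", r"http://blog.gluster.org/feed/\1"),
]


def redirect_target(path):
    """Return the redirect URL for path, or None if no rule matches.

    Assumption: rules are tried top to bottom and the first match wins.
    """
    for pattern, replacement in RULES:
        match = re.match(pattern, path)
        if match:
            return match.expand(replacement)
    return None
```

Calling `redirect_target` with a dated path such as `/2013/07/feedback-requested-governance-of-glusterfs-project/` keeps the year and month in the target, which is why these rules alone cannot help once the new site drops the date from its URLs.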

Comment 2 Amye Scavarda 2017-09-12 23:58:44 UTC
Oh, I see.
That gets complicated but not impossible.
Here are some recent examples of 404s:
/2009/12/gluster-storage-platform-featured-press/feed/ 

/2013/07/feedback-requested-governance-of-glusterfs-project/ 

/2013/09/git-reincarnating-remote-master-branch-so-you-can-easily-pr-to-upstream/ 

What might help is looking at the blog.gluster.org WordPress install to see what the URL pattern was and decide what the best next steps are.

Comment 3 Amye Scavarda 2017-09-19 16:58:45 UTC
Pinging on this, what else do we need to do to make progress here?

Comment 4 M. Scherer 2017-09-19 17:10:31 UTC
We need to have the source, and the destination.

For example, what is the destination we want for /2009/12/gluster-storage-platform-featured-press/feed/  ?

What would be the one for /2013/09/git-reincarnating-remote-master-branch-so-you-can-easily-pr-to-upstream/ 

We can't have a redirection without all the sources and all the destinations.

Comment 5 Amye Scavarda 2017-09-19 17:36:11 UTC
http://www.gluster.org/git-reincarnating-remote-master-branch-so-you-can-easily-pr-to-upstream/ is the answer to the direct one. 
Unfortunately, it looks like the import done by the web developers has stripped out all of the previous URL date information.

What's the best way forward here?

Comment 6 Karsten Wade 2017-09-19 21:14:03 UTC
(In reply to Amye Scavarda from comment #5)
> http://www.gluster.org/git-reincarnating-remote-master-branch-so-you-can-
> easily-pr-to-upstream/ is the answer to the direct one. 
> Unfortunately, it looks like the import done by the web developers has
> stripped out all of the previous URL date information.
> 
> What's the best way forward here?

Couple of thoughts come to mind, which are a bit brute-force.

I think then we'd have to have an X:Y mapping of every single source and destination as explicit full paths mapped to full paths.

How many posts are there? We'd need a rewrite rule for each one.

Can the HTTP server handle N 100s of these rewrites? It would seem so, yes?

(It occurs to me that if you did want the full dates in the path again, we could re-import the data. Then have a set of rewrite rules from the source stripped-of-dates-path to the new path -- rewrite rules coming AND going. Idea is to gracefully transition from the old and an interim-new to a permanent new.)

If we did all the above with 301 codes, could we get the search indexing resolved too?
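
To make the explicit X:Y idea concrete, here is a minimal Python sketch of a full-path lookup table that answers with a 301. The names `REDIRECTS` and `resolve` are hypothetical, and the table holds only the one source/destination pair confirmed in comment 5; a real table would need one entry per imported post.

```python
# Explicit old-path -> new-path table (hypothetical name).
# The single entry is the pair given in comment 5; nothing else is known here.
REDIRECTS = {
    "/2013/09/git-reincarnating-remote-master-branch-so-you-can-easily-pr-to-upstream/":
        "/git-reincarnating-remote-master-branch-so-you-can-easily-pr-to-upstream/",
}


def resolve(path):
    """Return (301, new_path) if path has an explicit mapping, else None.

    None means "no redirect configured"; the request would be handled
    normally by the site.
    """
    target = REDIRECTS.get(path)
    if target is None:
        return None
    return 301, target
```

A dictionary lookup like this stays cheap however many posts are listed, so the number of entries is more of a maintenance concern than a performance one in this sketch. How a given HTTP server handles hundreds of separate rewrite rules is a different question that this example does not answer.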

Comment 7 M. Scherer 2017-09-19 21:46:00 UTC
I suspect a regexp like:

^/\d{4}/\d{2}/(.*)

to:
 
/$1

should suffice for the blog part.
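
A quick way to sanity-check that regexp against the example paths from comment 2 is a small Python sketch like the following. The function name `strip_date_prefix` is hypothetical, and Python's `re` module stands in for whatever engine the web server actually uses, so treat it as an approximation rather than the production rule.

```python
import re

# Same pattern as above: /YYYY/MM/<rest> where <rest> is captured as group 1.
DATE_PREFIX = re.compile(r"^/\d{4}/\d{2}/(.*)")


def strip_date_prefix(path):
    """Return path without a leading /YYYY/MM/ segment pair.

    Paths that do not start with a four-digit and a two-digit segment
    are returned unchanged.
    """
    match = DATE_PREFIX.match(path)
    if match is None:
        return path
    return "/" + match.group(1)


for sample in (
    "/2009/12/gluster-storage-platform-featured-press/feed/",
    "/2013/07/feedback-requested-governance-of-glusterfs-project/",
    "/2013/09/git-reincarnating-remote-master-branch-so-you-can-easily-pr-to-upstream/",
):
    print(sample, "->", strip_date_prefix(sample))
```

Note that the pattern only checks for digits, not for a valid year or month, and that a bare `/2013/09/` archive path would be rewritten to `/`.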

Ideally, the import should have also used the settings of the old blog. 

A quick check on the internet shows that it's in Settings->Permalink, cf the doc: https://codex.wordpress.org/Using_Permalinks

I have set up a vhost oldblog.gluster.org if people want to take a look at the existing settings there.

Comment 8 Nigel Babu 2017-09-20 07:08:14 UTC
As far as I can see, we have more than a redirection problem. When I visit http://blog.gluster.org/2013/09/git-reincarnating-remote-master-branch-so-you-can-easily-pr-to-upstream/ I'm greeted with a WPEngine error. This is most likely because our domain points to glsuter.wpengine.com, but isn't added to wpengine itself as a domain that it should accept.

Is this intended? Do we plan to have blog.gluster.org added as a domain on the wpengine install and then go to gluster.org/2013/09/git-reincarnating-remote-master-branch-so-you-can-easily-pr-to-upstream/ ?

If this is not intended, should blog.gluster.org/$foo redirect to gluster.org/$foo?

Comment 9 Amye Scavarda 2017-09-25 17:59:27 UTC
Not intentional at all, I've now added blog.gluster.org to the domains that it should accept.

Comment 10 Karsten Wade 2017-09-27 17:17:28 UTC
@Amye -- do you have a test server setup at WPEngine?

It looks as if we should try Misc's regexp from above as a redirection rule, but I only have access to the live instance for www. I think it's a low risk of breaking the site if tested live, but more than zero risk.

Comment 11 Amye Scavarda 2017-09-28 16:25:15 UTC
How would we be able to test redirects like this on the staging server? 
I'm more than happy to make that available, I'm just not sure how it could work.

Comment 12 Karsten Wade 2017-10-03 18:28:44 UTC
I don't know how we can test the redirect on staging, not having set it all up; it just seemed like a better idea than testing on production.

That said, we can do as we did previously -- test the regexp on production, and revert if it doesn't work, then return to Michael for more help if there is a problem with his regexp.

Do you want me to run this test? We don't have 'rules of engagement' around the WordPress instance, so I'm not doing anything without coordination first.

Comment 13 Amye Scavarda 2017-10-03 19:14:24 UTC
At this point, it's the biggest issue we have on the website. Karsten, go ahead and test it in the WP site and we'll see if that cuts down the level of 404s we're getting. 
Let's give it 4 hours and review at that time, 4pm Pacific time.

Comment 14 Karsten Wade 2017-10-03 19:46:06 UTC
OK, I put in the regexp and it seems to be working. The test case from above:

https://www.gluster.org/2013/09/git-reincarnating-remote-master-branch-so-you-can-easily-pr-to-upstream/

redirects to:

http://www.gluster.org/git-reincarnating-remote-master-branch-so-you-can-easily-pr-to-upstream/

No 404 error.
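
For anyone who wants to repeat this kind of check without touching the live site, here is a self-contained Python sketch. It starts a throwaway local server that applies the same regexp and then requests a dated path without following the redirect. Everything here is an assumption for illustration: the stand-in server, the names `RedirectHandler`, `check` and `run_demo`, and the use of a 301 status (the bug does not say which status code the live rule returns).

```python
import http.client
import re
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

DATE_PREFIX = re.compile(r"^/\d{4}/\d{2}/(.*)")


class RedirectHandler(BaseHTTPRequestHandler):
    """Stand-in server: 301 for dated paths, 200 for everything else."""

    def do_GET(self):
        match = DATE_PREFIX.match(self.path)
        if match:
            self.send_response(301)
            self.send_header("Location", "/" + match.group(1))
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def check(host, port, path):
    """Request path once, without following redirects.

    Returns (status, Location header or None).
    """
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def run_demo():
    """Start the stand-in server, check the old path, then the new one."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        old = "/2013/09/git-reincarnating-remote-master-branch-so-you-can-easily-pr-to-upstream/"
        first = check("127.0.0.1", port, old)
        second = check("127.0.0.1", port, first[1])
        return first, second
    finally:
        server.shutdown()
        server.server_close()
```

`run_demo()` returns the result of the first request (the redirect) and of the follow-up request to the new path. The `check` function could in principle be pointed at a real host, but that is outside what this sketch has been tried against.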

Also, nothing else seems to be broken more than it was before. :) I.e. www.gluster.org and the blog articles still work.

*crosses fingers*

Comment 15 Amye Scavarda 2017-10-04 16:31:24 UTC
We seem to have cleaned up our 404s! 
Adding this into the WP nginx configuration since this tests out fine.  
Marking as closed, current release.

