Validation with rewritemap

ISAPI_Rewrite is an Apache mod_rewrite compatible URL rewriter for Microsoft IIS
Posts: 92
Joined: 01 Dec 2012, 14:22

03 Jun 2013, 16:29

Hi,

I have a site where there are a number of dynamic pages that will be served based on a "magic" number.

There are two types of these: category numbers, and individual numbers.

This works fine: the problem I have is when there is an invalid "magic" number specified.

If the number is not valid, we return a page that's similar to a 404, but it isn't actually a 404.

E.g. for category pages:

http://www.marqueehireguide.com/marquee-hire-c21.html - this is a real, static file.
Whereas http://www.marqueehireguide.com/marquee-hire-in-bedfordshire-c36.html is a dynamically-created page.
And http://www.marqueehireguide.com/marquee-hire-in-narnia-c9999.html is an invalid page.

For various reasons, there are some old links to pages that don't exist any longer.
Google is punishing the site because there are several pages that have very similar content.

What we'd like to happen is:

1) If there is an actual file that matches the URL, just serve it.
2) If not, check to see if the "magic" number is a valid one (using a rewritemap, I'm guessing).
3) If neither of these work, allow it to drop through to a 404 page.

What I'm having trouble understanding is how to achieve the middle one- only rewrite the page if it's one of the allowed values.

Is it possible to just have a file that has e.g.

c34 valid
c36 valid
c38 valid
c41 valid

... and so on.

Also, how to have a rule that says: "If the file exists, just serve it and don't check further"?

Posts: 1264
Joined: 07 Mar 2012, 10:16

04 Jun 2013, 14:39

Hello,

Thanks for such a descriptive and clear explanation. It's a rare thing these days.
Since the principle "if the file exists, serve it" is the default behaviour, let's focus on the other one: "if the file does not exist AND is NOT listed in a mapfile".

Let's try:
Code:
# define mapfile:
RewriteMap mapfile txt:mapfile.txt [NC]

# not a real file
RewriteCond %{REQUEST_FILENAME} !-f
# and not a real directory
RewriteCond %{REQUEST_FILENAME} !-d
# if the result of the mapfile lookup is NOT_FOUND
RewriteCond ${mapfile:$1|NOT_FOUND} NOT_FOUND
# show 410 GONE status (can be changed to F - Forbidden 403)
RewriteRule ^.+-([a-z]\d+)\.html$ - [G]


The pattern is created to find a magic number that consists of 1 letter and some digits afterwards. This magic number is always located before ".html".
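
A rough sketch of how this rule pairs with the map format proposed in the first post. The rule's capture group picks up the letter-plus-digits key (for example c36) as $1, so the first column of mapfile.txt only needs those keys; the second column (valid) is a placeholder. The anchored ^NOT_FOUND$ and the sample URLs in the comments are illustrative assumptions, not something tested on a live ISAPI_Rewrite install.

```apache
# Assumed contents of mapfile.txt (key, whitespace, placeholder value):
#   c34 valid
#   c36 valid
#
# /marquee-hire-in-bedfordshire-c36.html : $1 = "c36", found in the map,
#     so the last condition fails and the request is left for later rules.
# /marquee-hire-in-narnia-c9999.html     : $1 = "c9999", the lookup falls
#     back to NOT_FOUND, so the rule answers with 410 Gone.
RewriteMap mapfile txt:mapfile.txt [NC]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond ${mapfile:$1|NOT_FOUND} ^NOT_FOUND$
RewriteRule ^.+-([a-z]\d+)\.html$ - [G]
```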

Regards
Andrew

Posts: 92
Joined: 01 Dec 2012, 14:22

05 Jun 2013, 11:45

Thank you for your kind comments. :) I do try to ask *useful* questions- my bible in this regard is Eric Raymond's article on the subject: http://www.catb.org/esr/faqs/smart-questions.html


What I *think* we want to do is to rewrite the URL to "not-currently-listed.html", with a 301, but that's detail. :)

OK- most of the info about RewriteMaps I found in the Apache docs- where it said that RewriteMaps have to be defined globally and not in a .htaccess file; does this restriction not apply to Isapi_Rewrite?

When you say "if the file exists, serve it" is defined by default- that's assuming no other rule touches it, of course? So there's no need for an explicit rule, as long as there is no over-enthusiastic catch-all?

With the -c{number} forms, there are a few historic files which we want to rewrite to newer forms, or to the root; at the moment, I have a raft of individual rules for this.
Would I be better with a rewrite-map for this purpose, too? I.e. is a rewrite map with {oldurl} {newurl} more efficient than say a couple of dozen specific rules?

Thanks for your help on this.

Posts: 1264
Joined: 07 Mar 2012, 10:16

05 Jun 2013, 15:01

OK- most of the info about RewriteMaps I found in the Apache docs- where it said that RewriteMaps have to be defined globally and not in a .htaccess file; does this restriction not apply to Isapi_Rewrite?

ISAPI_Rewrite doesn't have these limitations. You can define the map in .htaccess; the main thing is to place the mapfile in the same folder as the config file in which you define it.

When you say "if the file exists, serve it" is defined by default- that's assuming no other rule touches it, of course? So there's no need for an explicit rule, as long as there is no over-enthusiastic catch-all?

That is correct.

With the -c{number} forms, there are a few historic files which we want to rewrite to newer forms, or to the root; at the moment, I have a raft of individual rules for this.
Would I be better with a rewrite-map for this purpose, too? I.e. is a rewrite map with {oldurl} {newurl} more efficient than say a couple of dozen specific rules?

That's up to you. Either way the mapfile has 2 columns; with the rule I provided, the 2 columns can be identical. However, you could edit the rule and, instead of [F] or [G], use a 301 redirect and send each URL to a specific location.
OR
you can create several mapfiles. One redirects, let's say, 20 URLs to the root. If a URL doesn't match this mapfile, it goes on to an identical rule, but with a different mapfile, where you redirect a certain portion to a different location.
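
One way the several-mapfiles idea could look, as a sketch only. The map names to-root and moved, their file names, and the c-type pattern are hypothetical; the first map's second column is a placeholder, while the second map's second column holds the new URL.

```apache
RewriteMap to-root txt:to-root.txt [NC]
RewriteMap moved   txt:moved.txt [NC]

# First map: any key listed here is redirected to the site root.
RewriteCond ${to-root:$1|NOT_FOUND} !NOT_FOUND
RewriteRule ^.+-(c\d+)\.html$ / [R=301,L]

# Second map: remaining keys go to the URL stored in column two,
# e.g. a line "c12 /marquee-hire-in-dorset-c34.html" in moved.txt.
RewriteCond ${moved:$1|NOT_FOUND} !NOT_FOUND
RewriteRule ^.+-(c\d+)\.html$ ${moved:$1} [R=301,L]
```

URLs whose key appears in neither map fall through to whatever rules follow.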

Regards
Andrew

Posts: 92
Joined: 01 Dec 2012, 14:22

05 Jun 2013, 22:09

If I understand your code, then the mapfile needs to contain two copies of the full "filename"?

Can I turn this on its head, and have:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond ${mapfile:$1|NOT_FOUND} !NOT_FOUND
RewriteRule .*-c([0-9]+)\.html$ mqh/default.asp?category=mar-runs&service=make-page&magic=$1&oqs=$& [L,NC,QSA,NS]

... and let anything that *doesn't* match just drop through to 404?

Secondly, is it possible to "map" just the "magic" value and not the whole URL? Just for ease of maintenance.

Finally - does Isapi_Rewrite automatically notice that a mapfile has changed, or do we have to do something to reload the map?

Posts: 1264
Joined: 07 Mar 2012, 10:16

06 Jun 2013, 01:05

Hey,

Your code looks okay and makes sense.

Secondly, is it possible to "map" just the "magic" value and not the whole URL? Just for ease of maintenance.

That's what we did. The pattern ".*-c([0-9]+)\.html" captures everything after 'c' and before '.html'.

Finally - does Isapi_Rewrite automatically notice that a mapfile has changed, or do we have to do something to reload the map?

Yes, it does. However, we had cases where the mapfile was generated automatically from an external source and the load on the live server was high. Customers would request URLs before the new list had been generated, which resulted in 404s. The workaround was to write a Java script that doesn't replace the old mapfile until the new one is complete.

Regards
Andrew

Posts: 92
Joined: 01 Dec 2012, 14:22

06 Jun 2013, 05:51

OK, I'm missing something here. That's the rule in the RewriteRule, but the RewriteCond is showing the whole URL (REQUEST_URL) as the entity to be tested for mapping. I'm asking can we extract and check just the magic number for the test, or does it have to be entire URLs?

Or am I totally off the point, here?

Posts: 92
Joined: 01 Dec 2012, 14:22

06 Jun 2013, 05:58

HeliconAndrew wrote:
Yes, it does. However, we had cases where the mapfile was generated automatically from an external source and the load on the live server was high. Customers would request URLs before the new list had been generated, which resulted in 404s. The workaround was to write a Java script that doesn't replace the old mapfile until the new one is complete.

OK- though I don't think that will be an issue- it's just that when a new advertiser is created, we'd want to add their info to the advertiser (i-type URLs) map. But that's a handful a day, at best, and there won't be existing links around to create 404s. The problem is more when one is removed, but even then you'd be very unlucky to have one hit in the short time before the new map "takes". The "soft 404s" are basically caused because someone bookmarks an advertiser, or Google caches a link, and then they don't renew, or change their name or something.

The advertiser links are where we'd want to fire any that don't match to a 301 rewrite to "not-currently-listed.html" or something similar.

The category map (c-type URLs) would change very infrequently.

Posts: 1264
Joined: 07 Mar 2012, 10:16

06 Jun 2013, 09:51

OK, I'm missing something here. That's the rule in the RewriteRule, but the RewriteCond is showing the whole URL (REQUEST_URL) as the entity to be tested for mapping. I'm asking can we extract and check just the magic number for the test, or does it have to be entire URLs?


I don't see RewriteCond with {REQUEST_URL} being tested for mapping.

Thanks for the clarification, but I think the rule should work. The only thing I'd suggest now is adding another rule, a 'match-all' rule that would serve "not-currently-listed.html" in case nothing was found.
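
A sketch of what that extra rule might look like for the advertiser (i-type) URLs, placed after the validating rule. The target /advertiser.asp and the map keyed on the i-number are hypothetical stand-ins; only not-currently-listed.html comes from the discussion above.

```apache
RewriteMap mapfile txt:mapfile.txt [NC]

# 1) Listed advertiser: rewrite internally and stop processing ([L]).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond ${mapfile:$1|NOT_FOUND} !NOT_FOUND
RewriteRule ^.+-(i[0-9]+)\.html$ /advertiser.asp?magic=$1 [L,NC,QSA,NS]

# 2) Match-all: any other i-type URL that is not a real file gets a
#    301 redirect to the "not currently listed" page.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.+-i[0-9]+\.html$ /not-currently-listed.html [R=301,L,NC]
```

Because the target page does not itself match the i-type pattern, the second rule should not redirect in a loop.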

Regards
Andrew

Posts: 92
Joined: 01 Dec 2012, 14:22

10 Jun 2013, 14:39

The RewriteCond before the RewriteRule references the entire REQUEST_URL; I was wondering if it's possible to validate using just the "magic" part of the URL (i.e. the C12345 bit). If not, I can make it write out the whole URL.

Also, if you're validating against the URL, but then using a RewriteRule that doesn't reference the RewriteMap directly, does it matter what's in the second column?

Or can it just consist of a list like:

category-name-c12345.html OK
another-category-c23456.html OK

and so on?

Posts: 1264
Joined: 07 Mar 2012, 10:16

11 Jun 2013, 20:03

The RewriteCond before the RewriteRule references the entire REQUEST_URL; I was wondering if it's possible to validate using just the "magic" part of the URL (i.e. the C12345 bit). If not, I can make it write out the whole URL.


If you simply want to use the pattern in a condition rather than in the rule, there's no difference. If you are concerned about performance and want to catch the magic number one line of code earlier, then I should tell you that the rule is matched first, and the conditions are evaluated ONLY IF the rule matches.
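
To make the ordering concrete, here is the earlier rule again with the evaluation steps written out as comments. It is a sketch assuming ISAPI_Rewrite follows the usual mod_rewrite order described above; the shortened rewrite target is only for readability.

```apache
RewriteMap mapfile txt:mapfile.txt [NC]

# Step 2: checked only after the RewriteRule pattern below has matched.
RewriteCond %{REQUEST_FILENAME} !-f
# $1 was set by the rule pattern in step 1, so the key looked up here is
# just the digits (e.g. "36"), not the whole URL.
RewriteCond ${mapfile:$1|NOT_FOUND} !NOT_FOUND
# Step 1: the pattern is tested first; URLs that do not end in
# -c<digits>.html never reach the conditions or the mapfile.
# Step 3: if both conditions pass, the substitution is applied.
RewriteRule .*-c([0-9]+)\.html$ mqh/default.asp?magic=$1 [L,NC,QSA,NS]
```

With this rule the mapfile keys would be the bare numbers (36 valid), since the letter c sits outside the capture group.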

Also, if you're validating against the URL, but then using a RewriteRule that doesn't reference the RewriteMap directly, does it matter what's in the second column?
Or can it just consist of a list like:
Code:
category-name-c12345.html OK
another-category-c23456.html OK


Correct. It doesn't matter what is there, but there should be something. ISAPI_Rewrite parses the string and there should be 2 elements in a row.

Regards
Andrew

Posts: 92
Joined: 01 Dec 2012, 14:22

18 Jun 2013, 12:17

Thanks Andrew, you've been very helpful.

I've got the level-1 validation working.

If an advertiser changes their name, they keep the same "magic" number but the URL is different. The way that the active page mapping works, without the validation, the old version will still work (e.g. if BigMarquees changes its name to HugeMarquees, then both big-marquees-i123456.html and huge-marquees-i123456.html will provide the same information).

With the validation, it will cause a 404.

So, I want to add in a second mapfile of renames.

I'm assuming that what I want to do is:

RewriteMap i-renames txt:i-renames.txt [NC]

Where the mapfile contains:

old-url new-url

e.g.

big-marquees-i123456.html huge-marquees-i123456.html


# the old URL is not a real file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond ${i-renames:$1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/?([^/]+\.html)$ /${i-renames:$1} [L,NC,QSA,NS,R=301]

- is that all I need to have a rewrite map for any pair of URLs?

Posts: 92
Joined: 01 Dec 2012, 14:22

18 Jun 2013, 14:36

Arghh!

I tried this:

RewriteMap magic-i txt:magic-i.txt [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond ${magic-i:$1|NOT_FOUND} !NOT_FOUND
RewriteRule (.*)-(i[0-9]+)\.html$ /mqt/default.asp?category=mar-runs&service=MakeAdPage&magic=$2&oqs=$1 [L,NC,QSA,NS]

The magic-i.txt file contains essentially the URL and then either "active" or "trial"

Top section:

buchannan-marquees-i610301.html Active
buster-marquees-i610864.html Active
devon-marquee-co-i608934.html Active

and so on.

I'm *sure* I had this working, but now it's stopped- it blocks *all* such URLs, instead of just invalid ones.

Similarly, for the c-type URLs, I had:

magic-c.txt

Marquee-hire-in-hampshire-c32.html 32
Marquee-hire-in-dorset-c34.html 34
Marquee-hire-in-bedfordshire-c36.html 36

etc.

And..

# magic category rules..
RewriteMap magic-c txt:magic-c.txt [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .*-c([0-9]+)\.html$ mqh/default.asp?category=mar-runs&service=make-page&magic=$1&oqs=$& [L,NC,QSA,NS]

- which also blocks *all* URLs, not just invalid ones.

What am I missing, here?

You can see the test site: http://mqh.mapper.info
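
One possible cause for the i-type rule, offered as a guess rather than a confirmed diagnosis: in the pattern (.*)-(i[0-9]+)\.html$, $1 is only the part before the magic number (for example buchannan-marquees), whereas the keys in magic-i.txt are whole file names, so the lookup would always fall back to NOT_FOUND. The sketch below wraps the whole file name in an outer group so that $1 matches the mapfile keys; the optional leading slash is an assumption about how the URL is presented to the rule.

```apache
RewriteMap magic-i txt:magic-i.txt [NC]

RewriteCond %{REQUEST_FILENAME} !-f
# $1 = whole file name, e.g. "buchannan-marquees-i610301.html"
RewriteCond ${magic-i:$1|NOT_FOUND} !NOT_FOUND
# $2 = name part, $3 = magic number (these were $1 and $2 before)
RewriteRule ^/?((.*)-(i[0-9]+)\.html)$ /mqt/default.asp?category=mar-runs&service=MakeAdPage&magic=$3&oqs=$2 [L,NC,QSA,NS]
```

Alternatively the mapfile could be keyed on the magic number alone (i610301 Active) and looked up with ${magic-i:$2|NOT_FOUND} in the original rule, which keeps the mapfile shorter. This does not explain why the c-type rule, which has no map condition, also fails.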

Posts: 1264
Joined: 07 Mar 2012, 10:16

18 Jun 2013, 22:13

Hmmm... is it currently working? I see it is.
Even if I add some '1111' in the middle of the URL.

Regards
Andrew
Attachment: Screen Shot 2013-06-18 at 8.13.10 PM.png

Posts: 92
Joined: 01 Dec 2012, 14:22

19 Jun 2013, 06:33

It's working because I have the rewrite map references commented out- otherwise, all such links fail!

I can't see why- I don't want to post the entire .htaccess file on here, but I could send it to you. Or you could grab it as helicon.zip, along with the mapfiles.

The code looks right, but the rewriterule is never activated.

Posts: 92
Joined: 01 Dec 2012, 14:22

19 Jun 2013, 06:37

HeliconAndrew wrote:Hmmm... is it currently working? I see it is.
Even if I add some '1111' in the middle of the URL.

Regards
Andrew


That's exactly what we want to stop!

We want to pin down the rewrite to only "approved" URLs, and treat any variants as a 404.
We also want to be able to add a line to a file when a rename is done, so that a changed URL will be 301 rewritten- I thought this would also be best done with a mapfile, though I'm not sure what form the rules should take, given a mapfile of [old-url] [new-url] (I've added a sample i-changes.txt to the zipfile).

Posts: 92
Joined: 01 Dec 2012, 14:22

19 Jun 2013, 07:26

Your primer on rewritemaps shows:

Code:
# Set a variable ("map") to access map.txt from config
RewriteMap map txt:map.txt

# Use tolower function to convert string to lowercase
RewriteMap lower int:tolower

# Get requested file name
RewriteCond %{REQUEST_URI} ^/([^/.]+)\.html$ [NC]

# Seek file name in map-file
RewriteCond ${map:${lower:%1}|NOT_FOUND} !NOT_FOUND

# Perform rewriting if the record was found in map-file
RewriteRule .? /index.php?q=${map:${lower:%1}} [NC,L]


I'm not sure what that first condition is testing - mine is testing {REQUEST_FILE} to ensure it's not a real file, if it is, it should be served.
I'm assuming that the two "columns" in the mapfile are simply whitespace-separated?

Posts: 1264
Joined: 07 Mar 2012, 10:16

19 Jun 2013, 08:50

I suggest we continue working on this via email or helpdesk. That way we'd be able to share logging and rules without any problems.

I'm not sure what that first condition is testing - mine is testing {REQUEST_FILE} to ensure it's not a real file, if it is, it should be served.

If the pattern in a rule is loosely defined (e.g. (.*) or similar), then a condition like that simply specifies that we're working only on URLs that don't physically exist. I'd usually refer to those as SEO-friendly URLs. For example, "/products/our-best-products-ordered-by-range" doesn't exist as a real file. You can easily omit the %{REQUEST_FILENAME} condition, but in that case ALL URLs will go through the rule and hit the mapfile.
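
For the primer's first condition specifically, under standard mod_rewrite semantics it does two jobs: it limits the rule to single-segment .html URLs, and it captures the file name so that later lines can use it as %1 (condition captures use %N, rule captures use $N). Below is a sketch of the primer with the real-file check added in front, assuming ISAPI_Rewrite handles %1 the way the primer implies.

```apache
RewriteMap map txt:map.txt
RewriteMap lower int:tolower

# Skip real files so they are served untouched
RewriteCond %{REQUEST_FILENAME} !-f
# Capture the file name without extension; available below as %1
RewriteCond %{REQUEST_URI} ^/([^/.]+)\.html$ [NC]
# Look the lower-cased name up; the rule is skipped if there is no entry
RewriteCond ${map:${lower:%1}|NOT_FOUND} !NOT_FOUND
# Rewrite to the value stored in the second column of map.txt
RewriteRule .? /index.php?q=${map:${lower:%1}} [NC,L]
```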

I'm assuming that the two "columns" in the mapfile are simply whitespace-separated?

Correct. It can be 2, 3 or 6 spaces, if needed for formatting purposes.
