mod_cache – Helicon Tech Blog

Make your websites work faster

ruslan — Thu, 27 Jan 2011 11:25:44 +0000

In April 2010 Google announced that site speed would be taken into account in web search ranking. This means that site owners and webmasters need to optimize their websites. What is the best way to do that? What if the web developers are out of reach? What if changing the code is not an option? We can offer several simple and fast ways to improve your website’s performance using Helicon Ape. To start, download Helicon Ape and install it on your Windows server. The Helicon Ape Manager that comes with it lets you change your web server settings through .htaccess text files. Here is some simple general advice:

1. Cache statics

Caching static content (images, CSS files, JavaScript files) on the client side (in the browser) means that once the browser has received a static file, it saves the file in its cache and does not request it from the server the next time the HTML document is requested. The file is taken from the cache instead. Both sides win: the client sends fewer requests, the website works faster and the server processes fewer requests. For instance, an ordinary WordPress post page has over a dozen links to static files (CSS files, images, scripts). Downloading these files takes longer than downloading the post itself. With caching enabled, the static content is downloaded only once. When the visitor moves to the next page, only the page itself is downloaded; all static files are taken from the cache. To make the browser cache static content, the HTTP response must contain specific headers: Expires and Cache-Control. These headers are set by the mod_expires and mod_headers modules. To enable caching, create an .htaccess file with the following content inside the static folder:

ExpiresActive On
Header set Cache-Control public
ExpiresByType image/.+  "access 15 days"
ExpiresByType text/css  "access 5 days"
ExpiresByType application/x-javascript "access 5 days"
ExpiresByType application/javascript "access 5 days"

If there is no dedicated directory for static content and the files are spread across the folders of the website, creating the following .htaccess in the site root will cache all static content on the website by file extension:


ExpiresActive On
Header set Cache-Control public
ExpiresByType image/.+  "access 15 days"
ExpiresByType text/css  "access 5 days"
ExpiresByType application/x-javascript "access 5 days"
ExpiresByType application/javascript "access 5 days"

This configuration makes the server tell clients in its HTTP responses that images are to be cached for 15 days and scripts and CSS files for 5 days.
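To see what these directives amount to, here is a minimal Python sketch (not Helicon Ape code) that builds the same two headers for a response. The caching_headers function, the LIFETIMES table and the simple prefix matching are our own illustrative assumptions; the real modules do this work inside the server.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Lifetimes taken from the configuration above, keyed by content-type prefix.
# Prefix matching is a simplification made for this sketch.
LIFETIMES = {
    "image/": timedelta(days=15),
    "text/css": timedelta(days=5),
    "application/x-javascript": timedelta(days=5),
    "application/javascript": timedelta(days=5),
}


def caching_headers(content_type, accessed_at):
    """Return the caching headers for a response served at accessed_at (UTC)."""
    # "Header set Cache-Control public" applies to every response.
    headers = {"Cache-Control": "public"}
    for prefix, lifetime in LIFETIMES.items():
        if content_type.startswith(prefix):
            # "access N days" means: N days counted from the moment of access.
            headers["Expires"] = format_datetime(accessed_at + lifetime, usegmt=True)
            break
    return headers


now = datetime(2011, 1, 27, 11, 25, 44, tzinfo=timezone.utc)
print(caching_headers("image/png", now))  # Expires is 15 days after "now"
print(caching_headers("text/css", now))   # Expires is 5 days after "now"
print(caching_headers("text/html", now))  # no Expires header: no rule matches
```

A browser that receives such an Expires value can reuse its stored copy until that date without asking the server again.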

2. Compress responses on the run

To save time on loading the content, you can compress it. All modern browsers can receive gzip-compressed traffic. Text files (HTML files, CSS files, scripts, JSON data) compress easily, which lets you save 20-90% of traffic. Music and video files, on the other hand, can hardly be compressed any further because they have already been compressed with special codecs. Here’s an example of enabling gzip compression. Add the following line to .htaccess in the root of the website:

SetEnvIf (mime text/.*) or (mime application/x-javascript) gzip=9

As you can see, this configuration is quite simple. It is enough to have all text documents (HTML, CSS files) and JavaScript files compressed before they are sent to the client. Note that the server compresses responses only for browsers that support compression. The browser tells the server what it supports through the headers of the HTTP request (Accept-Encoding).
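The difference between text and already-compressed data is easy to check for yourself. The Python sketch below is only an illustration: accepts_gzip is a deliberately rough check of the Accept-Encoding header (it ignores q-values), the sample HTML is made up, and random bytes stand in for media that a codec has already compressed.

```python
import gzip
import os


def accepts_gzip(accept_encoding):
    """Rough check: does the Accept-Encoding request header list gzip?"""
    codings = [item.split(";")[0].strip().lower()
               for item in accept_encoding.split(",")]
    return "gzip" in codings


def saved_percent(payload):
    """Percentage of bytes saved by gzip at the highest compression level."""
    compressed = gzip.compress(payload, compresslevel=9)
    return 100.0 * (1 - len(compressed) / len(payload))


# Repetitive markup, typical of generated HTML (sample content is made up).
html = b"<li class='post'><a href='/blog/post'>Post title</a></li>\n" * 200
# Random bytes stand in for audio/video that is already compressed.
noise = os.urandom(len(html))

print(accepts_gzip("gzip, deflate"))          # browser supports gzip
print(accepts_gzip("identity"))               # browser does not
print(f"text saved:  {saved_percent(html):.0f}%")
print(f"noise saved: {saved_percent(noise):.0f}%")
```

The text payload shrinks dramatically, while the random payload does not shrink at all, which is why compressing media files is not worth the CPU time.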

3. Cache dynamic responses at server side

A large number of requests to the database server often hinders website performance. For example, a blog’s main page shows recent entries, recent comments, a navigation menu, a category list and tags. Each of these takes a complicated database query. If that information does not change often, or its freshness is not vital, the HTML responses should be cached without hesitation. You might cache the blog’s main page for just 5-10 minutes, and even that is enough to improve how fast the main page loads in the browser. Ideally, the application developer decides which pages need to be cached and for how long, and ships a caching mechanism that works “out of the box”. Unfortunately, that doesn’t happen most of the time. Luckily, mod_cache in Helicon Ape lets you enable server-side caching simply and easily. mod_cache supports two types of cache: disk cache and memory cache. The first stores cached data on disk, the second in memory. Memory caching is preferable. If your server doesn’t have enough RAM, use the disk cache. For example, to cache the site’s homepage, add the following lines to .htaccess in the root:

Header set Cache-Control public,max-age=600
SetEnvIf request_uri ^/$ cache-enable=mem

This configuration caches responses to requests for the site’s homepage for 10 minutes (600 seconds). Responses are cached in memory. Be careful when enabling caching! For example, pages that require authentication must not be cached, as they contain private data and show different information to different users. In any case, caching must take the application logic into account.

We’ve reviewed three simple steps for increasing the speed of your website. Besides a tangible speed boost, which you will notice at once, the acceleration should also improve your position in search engine results. You can see the performance graph of www.helicontech.com made using Google Webmaster Tools after a simple optimization. So equip your site with these tricks and enjoy the dual benefit!

Automatic cache cleaning in Helicon Ape

ruslan — Wed, 12 May 2010 10:30:00 +0000

Before version 3.0.0.39, the Helicon Ape cache could only be cleaned manually. The memory-based cache (mod_mem_cache) required recycling the corresponding application, and cleaning the disk-based cache (mod_disk_cache) meant deleting files from the file system by hand. That was slightly inconvenient, to put it mildly.

Helicon Ape 3.0.0.39 (and newer) supports cache cleaning on request.
Now, to clean the cache, just set the cache-clear environment variable. This cleans the cache that corresponds to the current request (context).

Setting the cache-clear variable with the SetEnvIf directive gives you flexible control over the conditions that trigger cache cleaning.

For example, to trigger cache cleaning by requesting a specific URL:

SetEnvIf request_uri ^/system/cache/clear/$ cache-clear=1

To trigger cache cleaning by appending a specific query string value:

SetEnvIf Query_String clear_cache_request cache-clear=1

For security purposes it may be necessary to allow cache cleaning only from a specific IP address:

SetEnvIf (Query_String clear_cache_request) and (Remote_Addr 11\.22\.33\.44) cache-clear=1
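
One detail worth checking in a rule like this is anchoring. Assuming the pattern is applied as an unanchored regular expression search (that is how Apache’s SetEnvIf behaves; we have not verified it for every Helicon Ape version), 11\.22\.33\.44 would also match longer addresses that merely contain it. The Python sketch below illustrates the difference; the sample addresses are made up.

```python
import re

# The pattern from the rule above, and an anchored variant of it.
loose = re.compile(r"11\.22\.33\.44")
strict = re.compile(r"^11\.22\.33\.44$")

for addr in ("11.22.33.44", "211.22.33.44", "11.22.33.445"):
    # search() finds the pattern anywhere in the string unless it is anchored.
    print(addr, bool(loose.search(addr)), bool(strict.search(addr)))
```

Only the first address passes the anchored pattern, so if the directive does search without anchors, adding ^ and $ is the safer choice.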

Cache cleaning may take some time, so be prepared to wait.

When cache cleaning is over, the number of deleted records is written to error.log (at info or higher verbosity level).

Best regards,
Ruslan, Anton – Helicon Tech Team

Example of mod_cache application

ruslan — Wed, 21 Jan 2009 15:23:00 +0000

In the previous articles we told you what a cache is and how it works in Helicon Ape. Now it’s time to put that knowledge into practice. Today we are going to apply caching to a PHP application called qdig, which helps organize a web image gallery. Read how to register PHP on IIS7 in our article about WordPress.

Creating online photo album

Let’s create a photos folder in the site root and fill it with our photos. Next we download qdig. To keep things simple we’ll extract only the index.php file and put it into the same directory.

The gallery is already working: http://localhost/photos/index.php

Measuring performance

To measure request rate we’ll use ab.exe application:

ab.exe -n 200 -c 2 "http://localhost/photos/index.php?Qwd=.&Qif=DSC00410.JPG&Qiv=name&Qis=M"

The result is a bit more than 16 requests per second.

Switching on mod_cache and mod_expires

To enable necessary modules, let’s uncomment the following lines in Helicon Ape httpd.conf file:

LoadModule expires_module    modules/mod_expires.so
LoadModule cache_module      modules/mod_cache.so

Analyzing cached request

To make mod_cache cache only unique requests rather than all of them, let’s figure out what the qdig request parameters mean and how request uniqueness depends on them:

Qwd – folder where image files reside – AFFECTS request uniqueness;
Qif – file name – AFFECTS request uniqueness;
Qiv – mode of file names representation – AFFECTS request uniqueness;
Qis – image size – DOESN’T AFFECT request uniqueness;
Qtmp – representation mode – DOESN’T AFFECT request uniqueness;

Thus, the cache key will use only the Qwd, Qif and Qiv parameters.
The piece of config for mod_cache will look like this:


  CacheEnable mem
  CacheVaryByParams Qwd Qif Qiv

Expiration time

The index.php script does not set the Cache-Control and Expires headers but, as we already know, they are really important for successful caching. So we’ll set these headers ourselves, using mod_expires:

ExpiresActive On
ExpiresByType text/html "access 1 hour"

The directives above set the expiration time to 1 hour.
The resulting .htaccess combines them with the mod_cache directives shown earlier.

Measuring performance once again

ab.exe -n 200 -c 2 "http://localhost/photos/index.php?Qwd=.&Qif=DSC00410.JPG&Qiv=name&Qis=M"

And now the result is about 94 requests per second!

That’s all you need to do to achieve nearly sixfold performance growth (from about 16 to about 94 requests per second).
This example clearly demonstrates the ease and efficiency of Helicon Ape caching feature.

How mod_cache works?

ruslan — Mon, 12 Jan 2009 16:12:00 +0000

The upcoming Helicon Ape release (coming very, very soon) will contain the mod_cache module. As promised in our previous article, we are now giving you a more thorough description of how mod_cache operates.

mod_cache starts working

After the authentication/authorization events but prior to request handler execution, mod_cache comes onto the scene. At this stage the module performs the following:

checks whether it’s possible to use cached response for the current request
if yes, generates a key and searches cached response using this key
if the response is found in cache, the module gives it back to the client and request processing is over — request handler is not invoked.

Cacheable or not cacheable: request check

Response may be cached if request meets the following requirements:

request method is GET
request does not contain Authorization header
Cache-Control request header must not be no-cache. This condition is ignored if CacheIgnoreCacheControl On is used
Pragma request header must not be no-cache. This condition is ignored if CacheIgnoreCacheControl On is used

mod_cache attempts to save response

When request handler has completed its job and all defined filters have been applied to response, mod_cache starts to operate. At this stage the module performs the following:

checks whether the response can be cached
checks if CacheEnable is set for this request
generates cache key
defines the period of time to store response in cache (absolute expiration time)
saves response in cache according to the key

Cacheable or not cacheable: response check

The following conditions are considered when deciding whether a response is cacheable (all must be met at once):

request method is GET
response status is 200 (200, 203, 300, 301 or 410 in Apache)
Expires response header contains valid “future” date
response contains an expiration time (i.e. Expires or Cache-Control: max-age=XX headers), an Etag header or a Last-Modified header. This condition is ignored if CacheIgnoreNoLastMod is used
- if request has a QueryString, only those responses containing expiration time are cached (i.e. Expires or Cache-Control: max-age=XX headers). This condition is ignored if CacheIgnoreQueryString On is used
Cache-Control header must not contain no-store. This condition is ignored if CacheStoreNoStore On is used
Cache-Control response header must not contain private. This condition is ignored if CacheStorePrivate On is used
request does not contain an Authorization header (in Apache, such a response may still be cached if Cache-Control contains s-maxage, must-revalidate or public)
Vary response header does not contain “*”.

Cache key generation

Response is saved in cache according to the key. This key includes:

normalized (canonical) request URI without QueryString or, in case of proxy request, normalized proxy request URL;
all QueryString parameters and their values in alphabetical order (default behavior)
- CacheIgnoreQueryString On directive cancels addition of request parameters to the cache key
- CacheVaryByParams param1 param2 ... directive defines parameters to be included into cache key
all request headers specified in CacheVaryByHeaders header1 header2 ... directive. Headers are not included to the cache key by default.
If response contains Vary header, all request headers specified in it are included into cache key.
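
To make the rules above more tangible, here is a rough Python sketch of how such a key could be assembled. It is not the module’s actual algorithm: the cache_key function, the "|" separator and the parameter names are assumptions made for the illustration, URI normalization is left out, and vary_by_headers is meant to hold both the CacheVaryByHeaders names and any names from the response’s Vary header.

```python
from urllib.parse import parse_qsl, urlsplit


def cache_key(url, request_headers=None, vary_by_params=None,
              ignore_query_string=False, vary_by_headers=()):
    """Build an illustrative cache key for a request.

    vary_by_params      -- stands in for CacheVaryByParams (None = all parameters)
    ignore_query_string -- stands in for CacheIgnoreQueryString On
    vary_by_headers     -- header names from CacheVaryByHeaders and Vary
    """
    headers = {k.lower(): v for k, v in (request_headers or {}).items()}
    parts = urlsplit(url)
    key = [parts.path]  # URI normalization is omitted in this sketch

    if not ignore_query_string:
        params = parse_qsl(parts.query, keep_blank_values=True)
        if vary_by_params is not None:
            params = [(n, v) for n, v in params if n in vary_by_params]
        # Sorting makes the key independent of parameter order in the URL.
        key += [f"{n}={v}" for n, v in sorted(params)]

    for name in vary_by_headers:
        key.append(f"{name.lower()}:{headers.get(name.lower(), '')}")
    return "|".join(key)


a = "http://localhost/photos/index.php?Qwd=.&Qif=DSC00410.JPG&Qiv=name&Qis=M"
b = "http://localhost/photos/index.php?Qis=L&Qiv=name&Qif=DSC00410.JPG&Qwd=."
print(cache_key(a))                                       # all parameters
print(cache_key(a, vary_by_params={"Qwd", "Qif", "Qiv"}))
print(cache_key(b, vary_by_params={"Qwd", "Qif", "Qiv"}))  # same key as above
```

With the parameter list restricted, the two URLs share one cache entry even though their parameter order and Qis value differ.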

When cached response dies

HTTP response is stored in cache for a specific period of time that is computed in the following way:

If response contains Expires header and its value is valid and does not refer to the past, cached response will be stored till the time specified in it.
If response contains Cache-Control header with either max-age=X or s-maxage=X, cached response will be stored in cache for X seconds.
If response contains Last-Modified header, cached response will be stored in cache until:
expiry date = date + min((date – lastmod) * factor, maxexpire),
where
date – current date,
lastmod – value of Last-Modified header,
factor – float value set via CacheLastModifiedFactor directive (default value = 0.1),
maxexpire – value set via CacheMaxExpire directive (default value = 86400 seconds = 1 day).
If mod_cache was unable to calculate the expiration date using one of the aforementioned methods (this is possible if the response doesn’t have Expires, Cache-Control or Last-Modified headers BUT has an Etag header), the storage period falls back to the default value of 1 hour, which may be changed using the CacheDefaultExpire directive.
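
The whole calculation fits into a short Python sketch. Treat it as an illustration under stated assumptions rather than the module’s code: the function names are ours, the rules are tried in the order they are listed above (the text does not say which one wins when several headers are present), and the factor, max_expire and default_expire arguments mirror the defaults of CacheLastModifiedFactor, CacheMaxExpire and CacheDefaultExpire.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Return an aware datetime, or None if the header is missing or invalid."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    # Dates without a usable time zone come back naive; treat them as invalid.
    return parsed if parsed.tzinfo is not None else None


def expiry_time(now, headers, factor=0.1, max_expire=86400, default_expire=3600):
    """Compute the absolute expiration time of a cached response.

    headers maps lower-cased response header names to their values.
    """
    # 1. A valid Expires header that does not refer to the past.
    expires = parse_http_date(headers.get("expires"))
    if expires is not None and expires >= now:
        return expires

    # 2. Cache-Control: max-age=X or s-maxage=X.
    for directive in headers.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() in ("max-age", "s-maxage") and value.isdigit():
            return now + timedelta(seconds=int(value))

    # 3. Last-Modified: date + min((date - lastmod) * factor, maxexpire).
    lastmod = parse_http_date(headers.get("last-modified"))
    if lastmod is not None:
        age = (now - lastmod).total_seconds()
        return now + timedelta(seconds=min(age * factor, max_expire))

    # 4. Nothing usable (e.g. only an Etag): fall back to the default.
    return now + timedelta(seconds=default_expire)


now = datetime(2009, 1, 12, 16, 12, 0, tzinfo=timezone.utc)
print(expiry_time(now, {"cache-control": "public, max-age=600"}))
print(expiry_time(now, {"last-modified": "Mon, 12 Jan 2009 06:12:00 GMT"}))
print(expiry_time(now, {"etag": '"abc"'}))
```

In the second call the response was modified 10 hours before the request, so with the default factor of 0.1 it stays in cache for one hour; a file untouched for a month would hit the one-day maxexpire cap instead.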

This load of text might look a little unclear at first glance, but in reality it is a well-composed and highly efficient scheme. Our upcoming article will convince you of this.

Web Caching: what is it?

ruslan — Fri, 09 Jan 2009 14:20:00 +0000

What is that and what’s it for?

A web cache is a vital instrument for building lightning-fast web apps. It stores HTTP responses that may be provided to the user without making a request to the server, i.e. no ASP/PHP script execution or database queries are necessary. And that’s cool!
Web caching substantially reduces response time (the time the server needs to give the response), as reading from cache is much faster than processing the request with a PHP handler.
Web caching minimizes traffic: if intermediate caches (gateway or proxy caches) are used, the request does not reach the origin server and the response is returned by the intermediate caching server.

Cache breeds

Server cache

This cache works on the origin server. Applications and server itself use it to store parts of responses (e.g. web pages) or complete responses. Server cache may be used on application (e.g. memcached + php or HttpRuntime.Cache + ASP.NET) or HTTP server level (e.g. mod_cache in Apache, OutputCache in IIS7).

Proxy cache

It lives between clients and origin servers and may only store public representations that do not require authorization (unlike private representations). Proxy cache is widely used by providers to reduce traffic.

Browser cache

It lives in the browser and is capable of storing private data. The browser cache is used, for example, when you press the Back button.

How does Server Cache work?

Cacheless configuration

A cacheless configuration forces the server to process each incoming request and generate a new response even if the same resource is requested several times in a row. This is a senseless waste of time and resources that puts excessive load on the server.

First request to cache-enabled server

When a specific resource is requested from the server for the first time, the caching system checks if it’s possible to cache the response, then looks for the response in cache and fails to find it. The request moves further along the server pipeline, triggering the necessary handlers and filters. When the response is ready, the caching system saves it to cache before sending it to the client.

Subsequent requests to cache-enabled server

Upon further requests to this resource the caching system checks if it’s possible to cache the response, then looks for the response in cache and this time finds it! The response is then retrieved from cache and sent to the client. And that’s it! No server handlers and filters are executed.
Responses are stored in cache for a certain period of time. When this time elapses, the cached response is labeled as not valid and is removed from cache. The next request to that same resource is processed as if it were requested from the server for the first time (see “First request to cache-enabled server”).
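
The first-request and subsequent-request flows can be condensed into a tiny Python sketch. It is a toy model, not how any particular server implements its cache: ResponseCache, its handle method and the injectable clock are names we made up so that the expiry step can be shown without actually waiting.

```python
import time


class ResponseCache:
    """Toy server cache: stores whole responses for a fixed period of time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expires_at, response)

    def handle(self, key, handler):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # hit: the handler is not invoked
        self.entries.pop(key, None)      # missing or expired: drop stale entry
        response = handler()             # run the normal server pipeline
        self.entries[key] = (now + self.ttl, response)
        return response


calls = []


def render_page():
    calls.append(1)                      # count how often the handler runs
    return "<html>page</html>"


fake_time = [0.0]
cache = ResponseCache(ttl_seconds=600, clock=lambda: fake_time[0])

cache.handle("/", render_page)           # first request: handler runs
cache.handle("/", render_page)           # second request: served from cache
print(len(calls))                        # prints 1

fake_time[0] = 601.0                     # the cached response has expired
cache.handle("/", render_page)           # handled like a first request again
print(len(calls))                        # prints 2
```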

Conclusion

As you can see, a server cache lowers server load and speeds up response time. In the next article on caching we’ll give a more thorough explanation of this process and illustrate it with examples.