caching – Helicon Tech Blog

Make your websites work faster

ruslan — Thu, 27 Jan 2011 11:25:44 +0000

In April 2010 Google announced that the speed of a web site would be taken into account in web search ranking. This means that web site creators and webmasters need to optimize their websites. What is the best way to do that? What if web developers are out of reach? What if changing the code is not an option? We can offer several simple and fast solutions for improving your web site performance using Helicon Ape. To start with, download Helicon Ape and install it on your Windows server. The native Helicon Ape Manager allows you to change your web server settings using .htaccess text files. Here is some simple general advice:

1. Cache static content

Caching static content (pictures, CSS files, JavaScript files) on the client side (in the browser) means that, having received a static file once, the browser saves it in its cache and doesn't request it from the server the next time the HTML document is requested. The file is taken from the cache. Both sides win: the client sends fewer requests, the web site works faster, and the server processes fewer requests. For instance, an ordinary WordPress post page has over a dozen links to static files (CSS files, pictures, scripts). The time spent downloading these files exceeds the time spent downloading the post itself. With caching enabled, the static content is downloaded only once. When the visitor moves to the next page, the only thing downloaded is the page itself; all static files are taken from the cache.

To make the browser cache static content, the HTTP response must contain specific headers: Expires and Cache-Control. Those headers are set by the mod_expires and mod_headers modules. To enable caching, create a .htaccess file with the following content inside the folder that holds the static files:

ExpiresActive On
Header set Cache-Control public
ExpiresByType image/.+  "access 15 days"
ExpiresByType text/css  "access 5 days"
ExpiresByType application/x-javascript "access 5 days"
ExpiresByType application/javascript "access 5 days"

If there is no dedicated directory for static content and the files are spread across the folders of the web site, create the following .htaccess in the root of the site. It will cache all static content on the web site by file extension:


ExpiresActive On
Header set Cache-Control public
ExpiresByType image/.+  "access 15 days"
ExpiresByType text/css  "access 5 days"
ExpiresByType application/x-javascript "access 5 days"
ExpiresByType application/javascript "access 5 days"

This configuration makes the server tell clients, via the HTTP response headers, that pictures are to be cached for 15 days and scripts and CSS files for 5 days.
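To make the effect of the "access N days" rules more tangible, here is a minimal Python sketch of the headers a browser might receive. It is an illustration only, not how Helicon Ape is implemented: the `LIFETIMES` table and the `caching_headers` function are hypothetical names, `image/png` stands in for the `image/.+` pattern, and it assumes Expires is simply the access time plus the configured period.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Lifetimes taken from the .htaccess example, keyed by MIME type.
# "image/png" is one concrete type standing in for the image/.+ pattern.
LIFETIMES = {
    "image/png": timedelta(days=15),
    "text/css": timedelta(days=5),
    "application/javascript": timedelta(days=5),
}


def caching_headers(content_type, accessed_at):
    """Return caching headers for a response, or {} if the type is not listed."""
    lifetime = LIFETIMES.get(content_type)
    if lifetime is None:
        return {}
    return {
        "Cache-Control": "public",
        # "access 15 days" is read here as: access time plus 15 days.
        "Expires": format_datetime(accessed_at + lifetime, usegmt=True),
    }


now = datetime(2011, 1, 27, 11, 25, 44, tzinfo=timezone.utc)
print(caching_headers("image/png", now))
```

A response for a type that is not listed gets no caching headers at all, so the browser falls back to its own defaults.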

2. Compress responses on the fly

To save some time on loading the content, you can compress it. All modern browsers are able to receive gzip-compressed traffic. Text files (HTML files, CSS files, scripts, JSON data) compress easily and let you save 20-90% of traffic. At the same time, music and video files can hardly be compressed further, as they have already been compressed by their codecs. Here's an example of enabling gzip compression. Add the following line to .htaccess in the root of the web site:

SetEnvIf (mime text/.*) or (mime application/x-javascript) gzip=9

As you can see, this configuration is quite simple. It is enough to have all text documents (HTML, CSS files) and JavaScript files compressed before they are sent to the client. Note that the server compresses responses only for browsers that support compression. The browser informs the server about its capabilities through the headers of the HTTP request.
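The difference between text and already-compressed data is easy to check for yourself. The following Python sketch uses the standard `gzip` module; the sample markup is made up for the demonstration, random bytes serve as a stand-in for media that a codec has already compressed, and level 9 mirrors the `gzip=9` value above on the assumption that it denotes a compression level.

```python
import gzip
import os


def compression_ratio(data, level=9):
    """Return compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(data, compresslevel=level)) / len(data)


# Made-up, repetitive markup similar to what a blog page contains.
html = "".join(
    f'<li class="post"><a href="/post/{i}">Post number {i}</a></li>\n'
    for i in range(500)
).encode("utf-8")

# Random bytes behave like data that has already been compressed.
noise = os.urandom(len(html))

print(f"HTML-like text: {compression_ratio(html):.0%} of original size")
print(f"random bytes:   {compression_ratio(noise):.0%} of original size")
```

The exact percentages depend on the content, which is why the savings quoted for real sites span such a wide range.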

3. Cache dynamic responses at server side

Often a large number of requests to the database server hinders web site performance. For example, a blog's main page shows recent entries, recent comments, a navigation menu, a category list and tags. Those are several complicated database queries. If that information does not change often, or its freshness is not vital, HTML responses should be cached without hesitation. You can choose to cache the blog's main page for 5-10 minutes, and that is enough to noticeably speed it up. In practice, the application developer must decide which pages need to be cached and for how long, and should provide a caching mechanism “out of the box”. Unfortunately, that doesn't happen most of the time. Luckily, mod_cache in Helicon Ape lets you enable caching at the server side simply and easily. mod_cache supports two types of cache: disk cache and memory cache. The first type saves cached data on the drive, the second keeps it in memory. Memory caching is preferable. If your server doesn't have enough RAM, use the disk cache. For example, to cache the site's homepage, add the following lines to .htaccess in the root:

Header set Cache-Control public,max-age=600
SetEnvIf request_uri ^/$ cache-enable=mem

This configuration enforces caching of the site's homepage for 10 min (600 sec). Responses are cached in memory. Be careful when enabling caching! For example, pages that require authentication mustn't be cached, as they contain private data and need to provide different information for different users. In any case, caching must take the application logic into account. We've reviewed three simple steps for increasing the speed of your web site. Besides the tangible speed boost, which you will notice at once, the acceleration should also improve your position in search engine results. You can see the performance graph of www.helicontech.com made using Google Webmaster Tools after a simple optimization. So equip your site with these tricks and enjoy the dual benefit!

Disk-based caching for IIS7

ruslan — Fri, 03 Jul 2009 14:02:00 +0000

We have already shed some light on web caching and its benefits in our «Web Caching: what is it?» article. We've also explained how web caching is implemented in Helicon Ape in the «How mod_cache works?» article. Back then mod_cache could only store requests in memory, but now, after the Helicon Ape 1.2.0.21 release, there's a mod_disk_cache module that enables caching to the hard drive.

Enabling mod_disk_cache

To get mod_disk_cache working, perform the following simple steps:

Create disk cache folder. Say, c:\inetpub\cache.
Grant Read & Write permissions for that folder to the users running Application Pools on your IIS. By default it's the IIS_IUSRS group.
Point to that folder in httpd.conf using CacheRoot directive:
```
CacheRoot c:\inetpub\cache
```
Enable caching for requests to /app/ for example. To do that, specify in httpd.conf:
```
CacheEnable disk /app/
```
or in .htaccess inside /app/ folder:
```
CacheEnable disk
```

That's the minimum configuration needed to have something cached. Now all requests containing an expiration time (e.g. a Cache-Control header) will be cached on disk.

mod_disk_cache saves cached requests into a hierarchical folder structure inside CacheRoot. The name length and the number of levels of these folders are defined by the CacheDirLength and CacheDirLevels directives. Caching is even more effective when used together with the mod_gzip module, which compresses the response before it is cached and sent to the client.

Pros and cons of disk caching

The tests we’ve conducted showed that the speed of mem-based and disk-based cache is roughly equal.

The main advantage of the disk cache is that cached data is stored on disk and survives application recycling, IIS resets and hardware resets, in contrast to the memory cache, which lives only until the first application recycle or IIS reset.

The shortcoming of the disk cache is the absence of an internal recycling mechanism for expired records that are no longer used. But that's not that critical :) A workaround is to configure scheduled recycling of the whole cache once a day, i.e. remove all subfolders (or aged records only) and their content from CacheRoot.
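The "aged records only" variant of this workaround could be scripted along the following lines and run from a scheduled task. This is a sketch under assumptions rather than a tool shipped with Helicon Ape: `purge_old_cache_files` is a hypothetical helper, and it assumes cached records are ordinary files whose modification time reflects when they were stored.

```python
import os
import time


def purge_old_cache_files(cache_root, max_age_seconds, now=None):
    """Delete files under cache_root modified more than max_age_seconds ago.

    Returns the number of files removed. Empty folders are left in place.
    """
    now = time.time() if now is None else now
    removed = 0
    for folder, _subfolders, files in os.walk(cache_root):
        for name in files:
            path = os.path.join(folder, name)
            # Age is measured from the file's last modification time.
            if now - os.path.getmtime(path) > max_age_seconds:
                os.remove(path)
                removed += 1
    return removed
```

For the folder used earlier, a daily run would look like `purge_old_cache_files(r"c:\inetpub\cache", 24 * 60 * 60)`.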

New caching functionality in Helicon Ape 1.2.0.21

ruslan — Tue, 23 Jun 2009 13:59:00 +0000

Helicon Ape is in the limelight again. This time it's boasting two new incredibly important and useful modules, mod_disk_cache and mod_mem_cache, taking its caching capabilities to the next level.

These modules allow you to cache dynamically generated content on the hard disk or in RAM respectively.
The article covering all aspects of new caching modules is coming soon.

We are really grateful to our clients who help us discover non-obvious bugs, which we try to fix as quickly as possible.

P.S. The next module to be introduced is mod_linkfreeze implementing LinkFreeze features for Windows Server 2008 bypassing ISAPI limitations.

Stay with us!

HeliconTech Team.

Introduction to mod_gzip

ruslan — Thu, 22 Jan 2009 15:12:00 +0000

Every day millions of web servers around the world receive billions of bytes of network traffic. Each year the speed of Internet connections increases. Hosting providers offer attractive plans. It seems that mankind is about to forget the problem of saving traffic and consign it to oblivion. But even as hard drives grow, users still haven't forgotten about archivers. The same thought applies to web traffic. You can say: “I have a 10 Mbit unlimited connection, so saving traffic isn't my problem”. Yes, 10 Mbit is very good. But what will you say when you learn that it is possible to save more than 60% of the traffic? First of all, lots of users have a connection slower than 10 Mbit. Moreover, the growing popularity of mobile devices gives traffic saving a leading role. A lot of PDA, cellphone and smartphone users would say ‘thank you’ if your web server saved their traffic and money. To sum up, traffic saving is a timely and important concern in modern web server technologies.

Until recently HeliconTech had one specialized solution for content compression: HeliconJet. Given its importance, we have decided to include its functionality in our new product, Helicon Ape. Since Ape stands for APache Emulation, it's very important not to invent new syntax and directives but to use existing Apache assets.

There are 2 popular compression modules: the conventional mod_deflate and mod_gzip. The latter is written by a third-party developer and is not supplied with Apache. We have decided to implement both modules because users rely on them in equal measure. At the moment only basic mod_gzip functionality is implemented, but we are planning to extend it in the near future. Technically, Ape will have one compression module that supports both mod_gzip and mod_deflate syntaxes. Our primary goal is to let you use an existing Apache configuration without any changes.

Let’s have a look at basic content compression principles and mod_gzip operation. This module applies GZIP format which uses Deflate compression algorithm. The module is based on .NET version of the popular library ZLib. Please note, Helicon Ape is written in managed code only!

The web client (browser) exchanges technical information (so-called HTTP headers) with the web server. These headers contain important information that helps the client and the server understand each other. The client can state which data types it accepts and which content it needs. Taking the client's abilities into account, the server prepares and sends the content. After that, the technical information helps the client understand what to do with the server response.

But we are not going to dive deep into the subtleties of the HTTP protocol, as there is plenty of information on this topic on the Internet. Let's return to mod_gzip. The general scheme of its operation is given below:

As you can see, the server is not the only party that decides whether to compress content. This is easy to understand: if the browser isn't capable of decompressing GZIP, all of mod_gzip's work is pointless and the user gets garbage. The web client must send an Accept-Encoding header with a gzip, x-gzip or deflate value to let mod_gzip know that the client supports compression.

In turn, if the module decides to compress the content, it sets the Content-Encoding: gzip header to inform the client that GZIP decompression must be used. So each link in the scheme above plays an important role.

But to better understand mod_gzip logic, please have a look at this flowchart:

This sequence is used by mod_gzip to make the compress/don't compress decision. We'll now give a brief explanation of each stage:

When a request comes to the server, mod_gzip (if it's ON) can start its “dirty” work.
First, the module determines whether the content is already compressed. If it is, mod_gzip leaves things as they are.
If it's not, the module analyses the request headers sent by the client. mod_gzip can move on only if there's an Accept-Encoding header with a gzip, x-gzip or deflate value.
Next, the module performs the checks set by specific directives inside the configuration files. Based on the results of these checks, the decision about content compression is made.
If it's necessary to use GZIP, the module sets the Content-Encoding: gzip header, because otherwise the client may fail to process the server response correctly.
Besides, there's a special Vary header in which mod_gzip specifies what its actions depend on (Vary: Accept-Encoding). This header is used for caching, so its detailed description will appear in upcoming articles.
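The stages listed above can be sketched in a few lines of Python. This is only an illustration of the decision sequence, not Helicon Ape's actual code: the function names are hypothetical, the configuration checks are reduced to a simple content-type prefix list, headers are plain dictionaries with exact-case names, and quality values such as `gzip;q=0` are not interpreted.

```python
SUPPORTED_ENCODINGS = {"gzip", "x-gzip", "deflate"}


def should_compress(request_headers, response_headers,
                    compressible_types=("text/", "application/x-javascript")):
    """Walk through the stages described above and return True or False."""
    # Stage 1: content that is already compressed is left alone.
    if "Content-Encoding" in response_headers:
        return False
    # Stage 2: the client must advertise gzip, x-gzip or deflate.
    accepted = {
        token.split(";")[0].strip().lower()
        for token in request_headers.get("Accept-Encoding", "").split(",")
    }
    if not accepted & SUPPORTED_ENCODINGS:
        return False
    # Stage 3: configuration checks, simplified here to a content-type test.
    content_type = response_headers.get("Content-Type", "").lower()
    return content_type.startswith(tuple(compressible_types))


def apply_gzip_headers(response_headers):
    """Return a copy of the headers marked as gzip-compressed."""
    headers = dict(response_headers)
    headers["Content-Encoding"] = "gzip"   # tells the client to decompress
    headers["Vary"] = "Accept-Encoding"    # tells caches what the body depends on
    return headers
```

For example, a request carrying `Accept-Encoding: gzip, deflate` for a `text/html` response passes all three stages, while the same request for `image/png` stops at the last one.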

It's possible that the next versions will have slightly different logic, but we'll surely inform you about that.

Summary

This article is just a brief introduction to Helicon Ape mod_gzip module.
We are thinking of writing much more material on that and other topics to help you use our little agile monkey (Ape) easily and efficiently.

Best wishes,
HeliconTech Team

Example of mod_cache application

ruslan — Wed, 21 Jan 2009 15:23:00 +0000

In the previous articles we told you what a cache is and how it works in Helicon Ape. Now it's time to put that knowledge into practice. Today we are going to apply caching to a PHP application called qdig, which helps organize a web gallery of images. Read how to register PHP on IIS7 in our article about WordPress.

Creating online photo album

Let's create a photos folder in the site root and fill it with our photos. Now we download qdig. To keep things simple, we'll extract only the index.php file and put it into the same directory.

The gallery is already working: http://localhost/photos/index.php

Measuring performance

To measure request rate we’ll use ab.exe application:

ab.exe -n 200 -c 2 "http://localhost/photos/index.php?Qwd=.&Qif=DSC00410.JPG&Qiv=name&Qis=M"

The result is a bit more than 16 requests per second.

Switching on mod_cache and mod_expires

To enable necessary modules, let’s uncomment the following lines in Helicon Ape httpd.conf file:

LoadModule expires_module    modules/mod_expires.so
LoadModule cache_module      modules/mod_cache.so

Analyzing cached request

To make mod_cache cache not all requests but only unique ones, let’s figure out what qdig request parameters mean and how request uniqueness depends on them:

Qwd – folder where image files reside – AFFECTS request uniqueness;
Qif – file name – AFFECTS request uniqueness;
Qiv – mode of file names representation – AFFECTS request uniqueness;
Qis – image size – DOESN’T AFFECT request uniqueness;
Qtmp – representation mode – DOESN’T AFFECT request uniqueness;

Thus, cache key will use only Qwd, Qif and Qiv parameters.
The piece of config for mod_cache will look like:


  CacheEnable mem
  CacheVaryByParams Qwd Qif Qiv

Expiration time

index.php script does not set Cache-Control and Expires headers, but, as we already know, they are really important for successful caching. So we’ll set these headers by ourselves. And for that purpose we’ll use mod_expires functionality:

ExpiresActive On
ExpiresByType text/html "access 1 hour"

These directives set the expiration time to 1 hour.
The resulting .htaccess simply combines the mod_cache and mod_expires fragments shown above.

Measuring performance once again

ab.exe -n 200 -c 2 "http://localhost/photos/index.php?Qwd=.&Qif=DSC00410.JPG&Qiv=name&Qis=M"

And now the result is about 94 requests per second!

That’s all you need to do to achieve sixfold performance growth.
This example clearly demonstrates the ease and efficiency of Helicon Ape caching feature.

How mod_cache works?

ruslan — Mon, 12 Jan 2009 16:12:00 +0000

Helicon Ape release (coming very-very soon) will contain mod_cache module. And as we promised in our previous article we are now giving you more thorough description of mod_cache operation.

mod_cache starts working

After authentication/authorization events but prior to request handler execution, mod_cache comes onto the scene. At this stage the module performs the following:

checks whether it’s possible to use cached response for the current request
if yes, generates a key and searches cached response using this key
if the response is found in cache, the module gives it back to the client and request processing is over — request handler is not invoked.

Cacheable or not cacheable: request check

Response may be cached if request meets the following requirements:

request method is GET
request does not contain Authorization header
Cache-Control request header must not be no-cache. This condition is ignored if CacheIgnoreCacheControl On is used
Pragma request header must not be no-cache. This condition is ignored if CacheIgnoreCacheControl On is used
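Expressed as code, the request check might look like the following Python sketch. It is a simplified illustration, not the module's source: the function name is hypothetical, the `ignore_cache_control` flag stands in for CacheIgnoreCacheControl On, and headers are a plain dictionary with exact-case names.

```python
def request_allows_cached_response(method, headers, ignore_cache_control=False):
    """Return True if a cached response may be used for this request."""
    # Only GET requests are eligible.
    if method != "GET":
        return False
    # Requests carrying credentials are never served from cache.
    if "Authorization" in headers:
        return False
    # The client may ask for a fresh copy, unless the server is told to ignore that.
    if not ignore_cache_control:
        cache_control = headers.get("Cache-Control", "").lower()
        pragma = headers.get("Pragma", "").lower()
        if "no-cache" in cache_control or "no-cache" in pragma:
            return False
    return True
```

With `ignore_cache_control=True`, a browser's forced refresh (which sends `Cache-Control: no-cache`) would still be answered from cache.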

mod_cache attempts to save response

When request handler has completed its job and all defined filters have been applied to response, mod_cache starts to operate. At this stage the module performs the following:

estimates the capability of response caching
checks if CacheEnable is set for this request
generates cache key
defines the period of time to store response in cache (absolute expiration time)
saves response in cache according to the key

Cacheable or not cacheable: response check

The following conditions are considered when deciding whether response is cacheable (all must be met at the same time):

request method is GET
response status is 200 (200, 203, 300, 301 or 410 in Apache)
Expires response header contains valid “future” date
response contains an expiration time (i.e. Expires or Cache-Control: max-age=XX headers), an Etag header or a Last-Modified header. This condition is ignored if CacheIgnoreNoLastMod is used
- if request has a QueryString, only those responses containing expiration time are cached (i.e. Expires or Cache-Control: max-age=XX headers). This condition is ignored if CacheIgnoreQueryString On is used
Cache-Control header must not be no-store. This condition is ignored if CacheStoreNoStore On is used
Cache-Control response header must not be private. This condition is ignored if CacheStorePrivate On is used
request does not contain Authorization header (Apache makes an exception if Cache-Control contains s-maxage, must-revalidate or public)
Vary response header does not contain “*”.

Cache key generation

Response is saved in cache according to the key. This key includes:

normalized (canonical) request URI without QueryString or, in case of proxy request, normalized proxy request URL;
all QueryString parameters and their values in alphabetical order (default behavior)
- CacheIgnoreQueryString On directive cancels addition of request parameters to the cache key
- CacheVaryByParams param1 param2 ... directive defines parameters to be included into cache key
all request headers specified in CacheVaryByHeaders header1 header2 ... directive. Headers are not included in the cache key by default.
If response contains Vary header, all request headers specified in it are included into cache key.
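To see how these rules interact, here is a minimal Python sketch of key construction. It is an assumption-laden illustration, not the real key format: `build_cache_key` and its parameters are hypothetical names, the URL path is used as-is in place of real URI normalization, parts are joined with `|` for readability, and request headers are looked up by exact name.

```python
from urllib.parse import parse_qsl, urlsplit


def build_cache_key(url, request_headers=None, vary_by_params=None,
                    ignore_query_string=False, vary_by_headers=(),
                    response_vary=()):
    """Build a cache key from the URI, selected parameters and selected headers."""
    request_headers = request_headers or {}
    parts = urlsplit(url)
    key = [parts.path]  # stand-in for the normalized URI without QueryString

    if not ignore_query_string:
        params = parse_qsl(parts.query, keep_blank_values=True)
        if vary_by_params is not None:
            # CacheVaryByParams: keep only the listed parameters.
            params = [(name, value) for name, value in params
                      if name in vary_by_params]
        # Alphabetical order makes the key independent of parameter order.
        key += [f"{name}={value}" for name, value in sorted(params)]

    # Headers from CacheVaryByHeaders plus those named in the response's Vary.
    for name in list(vary_by_headers) + list(response_vary):
        key.append(f"{name.lower()}:{request_headers.get(name, '')}")
    return "|".join(key)


print(build_cache_key(
    "/photos/index.php?Qwd=.&Qif=DSC00410.JPG&Qiv=name&Qis=M",
    vary_by_params={"Qwd", "Qif", "Qiv"},
))
```

With `vary_by_params` limited to Qwd, Qif and Qiv, two qdig requests that differ only in Qis map to the same key and therefore share one cached response.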

When cached response dies

HTTP response is stored in cache for a specific period of time that is computed in the following way:

If response contains Expires header and its value is valid and does not refer to the past, cached response will be stored till the time specified in it.
If response contains Cache-Control header with either max-age=X or s-maxage=X, cached response will be stored in cache for X seconds.
If response contains Last-Modified header, cached response will be stored in cache until expiry date = date + min((date - lastmod) * factor, maxexpire), where date is the current date, lastmod is the value of the Last-Modified header, factor is a float value set via the CacheLastModifiedFactor directive (default value = 0.1), and maxexpire is the value set via the CacheMaxExpire directive (default value = 86400 seconds = 1 day).
If mod_cache was unable to calculate the expiration date using one of the aforementioned methods (this is possible if the response doesn't have Expires, Cache-Control or Last-Modified headers BUT has an Etag header), the lifetime is set to the default value of 1 hour, which may be changed using the CacheDefaultExpire directive.
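The rules above translate into a short calculation. The Python sketch below is an illustration under assumptions, not the module's code: `cache_expiry` is a hypothetical function, the keyword defaults mirror the documented defaults of CacheLastModifiedFactor, CacheMaxExpire and CacheDefaultExpire, and the rules are tried in the order they are listed, which is assumed rather than confirmed.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def _parse_http_date(value):
    """Parse an HTTP date header, returning None if it is missing or invalid."""
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None


def cache_expiry(headers, now, factor=0.1,
                 max_expire=timedelta(seconds=86400),
                 default_expire=timedelta(hours=1)):
    """Return the moment until which a response may stay in cache."""
    # Rule 1: a valid Expires header that lies in the future.
    expires = _parse_http_date(headers.get("Expires"))
    if expires is not None and expires > now:
        return expires
    # Rule 2: Cache-Control with max-age=X or s-maxage=X.
    match = re.search(r"\b(?:s-maxage|max-age)=(\d+)",
                      headers.get("Cache-Control", "").lower())
    if match:
        return now + timedelta(seconds=int(match.group(1)))
    # Rule 3: heuristic based on Last-Modified, capped by max_expire.
    last_modified = _parse_http_date(headers.get("Last-Modified"))
    if last_modified is not None:
        return now + min((now - last_modified) * factor, max_expire)
    # Rule 4: nothing usable (e.g. only an Etag), fall back to the default.
    return now + default_expire
```

For instance, a page last modified 10 hours ago with no other caching headers would be kept for 10 hours * 0.1 = 1 hour, while one modified a month ago would hit the one-day cap.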

This load of text might look a little unclear at first glance, but in reality it is a well-composed and highly efficient scheme. Our upcoming article will convince you of this.

Web Caching: what is it?

ruslan — Fri, 09 Jan 2009 14:20:00 +0000

What is that and what’s it for?

Web cache is a vital instrument for building lightning-fast web apps. A web cache stores HTTP responses that may be provided to the user without making a request to the server, i.e. no ASP/PHP script execution or database queries are necessary. And that's cool!
Web caching substantially reduces response time (the time the server needs to give the response), as reading from cache is much faster than processing the request with a PHP handler.
Web caching minimizes traffic: if one uses intermediate caches (gateway or proxy cache), the request won't reach the origin server, and the response will be given back by an intermediate caching server.

Cache breeds

Server cache

This cache works on the origin server. Applications and server itself use it to store parts of responses (e.g. web pages) or complete responses. Server cache may be used on application (e.g. memcached + php or HttpRuntime.Cache + ASP.NET) or HTTP server level (e.g. mod_cache in Apache, OutputCache in IIS7).

Proxy cache

It lives between clients and origin servers and may only store public representations that do not require authorization (unlike private representations). Proxy cache is widely used by providers to reduce traffic.

Browser cache

It lives in browser and is capable of storing private data. Browser cache is used for example for Back button operation.

How does Server Cache work?

Cacheless configuration

A cacheless configuration forces the server to process each incoming request and generate a new response even if the same resource is requested several times in a row. That is a pointless, time- and resource-consuming operation that puts excessive load on the server.

First request to cache-enabled server

When the specific resource is requested from the server for the first time caching system checks if it’s possible to cache the response, then it looks for response in cache and fails to find it. Request moves further along the server pipeline triggering necessary handlers and filters. When the response is ready caching system saves it to cache before sending to the client.

Subsequent requests to cache-enabled server

Upon further requests to this resource caching system checks if it’s possible to cache the response, then it looks for response in cache and this time finds it! Then the response is retrieved from cache and sent to the client. And that’s it! No server handlers and filters are executed.
Responses are stored in cache for a certain period of time. When this time elapses, the cached response is labeled as not valid and is removed from cache. The next request to that same resource is processed as if it were requested from the server for the first time (see “First request to cache-enabled server”).
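The whole life cycle (first request, subsequent requests, expiration) fits into a tiny Python sketch. It is a toy model for illustration only: `ResponseCache` is a hypothetical class, the handler is any callable that produces a response, and a real server cache also has to decide what is cacheable at all.

```python
import time


class ResponseCache:
    """Toy server cache: stores responses by key for a fixed time to live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, response)

    def handle(self, key, handler):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None:
            expires_at, response = entry
            if now < expires_at:
                # Subsequent request: served from cache, handler not invoked.
                return response
            # Expired: drop the record and treat this as a first request.
            del self._entries[key]
        # First request: run the real handler and remember its response.
        response = handler()
        self._entries[key] = (now + self.ttl, response)
        return response
```

Calling `cache.handle("/", render_homepage)` twice within the time to live runs the hypothetical `render_homepage` handler only once; the second call returns the stored response.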

Conclusion

As you could see, Server Cache favors lower server load and faster response time. In the next article concerning cache we’ll give more thorough explanation of this process and illustrate it with examples.