Caching static content (images, CSS files, JavaScript files) on the client side, in the browser, means that once the browser has received a static file it saves the file in its cache and does not request it from the server the next time the HTML document is requested. The file is taken from the cache instead. Both sides win: the client sends fewer requests, the web site works faster and the server processes fewer requests.

For instance, an ordinary WordPress post page has over a dozen links to static files (CSS files, images, scripts). The time spent downloading these files exceeds the time spent downloading the post itself. With caching enabled, the static content is downloaded only once. When the visitor moves to the next page, the only thing downloaded is the page itself; all static files are taken from the cache.

To make the browser cache static content, the HTTP response must contain specific headers: Expires and Cache-Control. These headers are set by the mod_expires and mod_headers modules. To enable caching, create an .htaccess file with the following content inside the static folder:
ExpiresActive On
Header set Cache-Control public
ExpiresByType image/.+ "access 15 days"
ExpiresByType text/css "access 5 days"
ExpiresByType application/x-javascript "access 5 days"
ExpiresByType application/javascript "access 5 days"
If there is no dedicated directory for static content and the files are spread across the site’s folders, create the following .htaccess in the root of the site instead. It caches all static content on the site by file extension:
<Files ~ "\.(gif|png|jpg|css|js)$">
ExpiresActive On
Header set Cache-Control public
ExpiresByType image/.+ "access 15 days"
ExpiresByType text/css "access 5 days"
ExpiresByType application/x-javascript "access 5 days"
ExpiresByType application/javascript "access 5 days"
</Files>
This configuration makes the server tell clients, in its HTTP responses, that images are to be cached for 15 days and scripts and CSS files for 5 days.
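To make the effect of those lifetimes concrete, here is a minimal Python sketch of the header values a server could emit for each MIME type. It is only an illustration: the `LIFETIME_DAYS` table and the `caching_headers` function are hypothetical names, the prefix match for `image/` is a simplification, and the real headers are produced by mod_expires and mod_headers, not by this code.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Lifetimes mirroring the ExpiresByType lines above (days after access).
# This table is an assumption made for the example.
LIFETIME_DAYS = {
    "image/": 15,                   # any image/* type
    "text/css": 5,
    "application/x-javascript": 5,
    "application/javascript": 5,
}

def caching_headers(mime_type, now):
    """Return caching headers for a MIME type, or {} if the type is not listed."""
    for prefix, days in LIFETIME_DAYS.items():
        if mime_type.startswith(prefix):
            lifetime = timedelta(days=days)
            return {
                # HTTP dates are always expressed in GMT.
                "Expires": format_datetime(now + lifetime, usegmt=True),
                "Cache-Control": f"max-age={int(lifetime.total_seconds())}",
            }
    return {}

now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(caching_headers("image/png", now))
print(caching_headers("text/css", now))
print(caching_headers("text/html", now))  # not listed, so no caching headers
```

A browser that receives such headers can reuse its stored copy until the Expires date passes, which is what removes the repeated requests for images, styles and scripts.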
To save some time on loading content, you can compress it. All modern browsers can receive gzip-compressed traffic. Text files (HTML files, CSS files, scripts, JSON data) compress easily and let you save 20-90% of traffic. Music and video files, by contrast, can hardly be compressed further because their codecs have already compressed them. Here’s an example of enabling gzip compression. Add the following line to .htaccess in the root of the web site:
SetEnvIf (mime text/.*) or (mime application/x-javascript) gzip=9
As you can see, this configuration is quite simple. It is enough to have all text documents (HTML, CSS files) and JavaScript files compressed before they are sent to the client. Note that the server compresses responses only for browsers that support compression. The browser informs the server about its capabilities through the headers of the HTTP request.
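If you want to see the difference between text and already-compressed data for yourself, a short Python script is enough. This is a rough sketch, not a benchmark: `saving_percent` is a hypothetical helper, the repetitive markup only stands in for a real page, and random bytes stand in for media that a codec has already compressed.

```python
import gzip
import os

def saving_percent(data):
    """Share of bytes saved by gzip at the highest compression level (9)."""
    compressed = gzip.compress(data, compresslevel=9)
    return 100.0 * (1 - len(compressed) / len(data))

# Repetitive markup stands in for a typical HTML or CSS file.
text_like = b"<li class='entry'><a href='/post'>Recent post</a></li>\n" * 200
# Random bytes stand in for media that a codec has already compressed.
media_like = os.urandom(len(text_like))

print(f"text-like:  {saving_percent(text_like):.0f}% saved")
print(f"media-like: {saving_percent(media_like):.0f}% saved")
```

Real pages are less repetitive than this sample, so expect savings somewhere inside the range quoted above rather than at its top end.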
Often a large number of requests to the database server hinders web site performance. For example, a blog’s main page shows recent entries, recent comments, a navigation menu, a category list and tags. Those are several complicated database queries. If that information does not change often, or its freshness is not vital, the HTML responses should be cached without hesitation. You might regenerate the blog’s main page only once every 5-10 minutes, and that would be enough to improve how quickly the main page loads in the browser.

In practice, the application developer must decide which pages need to be cached and for how long, and should also provide a caching mechanism “out of the box”. Unfortunately, that doesn’t happen most of the time. Luckily, mod_cache in Helicon Ape lets you enable server-side caching simply and easily.

mod_cache supports two types of cache: disk cache and memory cache. The first saves cached data on the drive, the second in memory. Memory caching is preferable; if your server doesn’t have enough RAM, use the disk cache. For example, to cache the site’s homepage, add the following lines to .htaccess in the site root:
Header set Cache-Control public,max-age=600
SetEnvIf request_uri ^/$ cache-enable=mem
This configuration caches responses to requests for the site’s homepage for 10 minutes (600 seconds). Responses are cached in memory.

Be careful when you enable caching. For example, pages that require authentication mustn’t be cached, as they contain private data and show different information to different users. In any case, caching must take application logic into account.

We’ve reviewed three simple steps for increasing the speed of your web site. Besides the tangible speed boost, which you will notice at once, the acceleration should also improve your ranking in search engine results. You can see the performance graph of www.helicontech.com made using Google Webmaster Tools after a simple optimization. So equip your site with these tricks and enjoy the dual benefit!
To get mod_disk_cache working, perform the following simple steps:
1. Create a folder for the cache, for example c:\inetpub\cache.
2. Specify this folder in the CacheRoot directive:
CacheRoot c:\inetpub\cache
3. Enable disk caching for the location you need, /app/ for example. To do that, specify in httpd.conf:
CacheEnable disk /app/
or in .htaccess inside the /app/ folder:
CacheEnable disk
That’s the minimum configuration needed to have something cached. Now all requests whose responses contain an expiration time (e.g. a Cache-Control header) will be cached on disk.
mod_disk_cache saves cached requests into a hierarchical folder structure inside CacheRoot. The length of the folder names and the number of levels are defined by the CacheDirLength and CacheDirLevels directives. Caching is even more effective when used together with the mod_gzip module, which compresses the response before it is cached and sent to the client.
The tests we’ve conducted showed that the speed of mem-based and disk-based cache is roughly equal.
The main advantage of the disk cache is that cached data is stored on disk and survives application recycling, IIS resets and hardware resets, in contrast to the memory cache, which is kept only until the first application recycling or IIS reset.
The shortcoming of the disk cache is the absence of an internal recycling mechanism for expired records that are no longer used. But that’s not that critical :) A workaround is to configure scheduled recycling of the whole cache once a day, i.e. remove all subfolders (or only the aged records) and their content from CacheRoot.
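The scheduled cleanup could be done with a small script like the Python sketch below. It rests on assumptions of ours: the function name `purge_expired` is hypothetical, a file's modification time is used as a stand-in for the age of a cached record, and nothing here relies on the internal layout of the mod_disk_cache files.

```python
import os
import time

def purge_expired(cache_root, max_age_seconds, now=None):
    """Delete files older than max_age_seconds under cache_root, then prune
    sub-folders left empty. Returns the number of files removed."""
    now = time.time() if now is None else now
    removed = 0
    # Walk bottom-up so that emptied sub-folders can be removed as well.
    for folder, _subfolders, files in os.walk(cache_root, topdown=False):
        for name in files:
            path = os.path.join(folder, name)
            if now - os.path.getmtime(path) > max_age_seconds:
                os.remove(path)
                removed += 1
        # Never remove the cache root itself, only what is inside it.
        if folder != cache_root and not os.listdir(folder):
            os.rmdir(folder)
    return removed

# Hypothetical call for a task scheduled once a day:
# purge_expired(r"c:\inetpub\cache", max_age_seconds=24 * 60 * 60)
```

Passing `max_age_seconds=0` removes every record, which corresponds to recycling the whole cache.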
These modules allow dynamically generated content to be cached on the hard disk or in RAM respectively.
The article covering all aspects of new caching modules is coming soon.
We are really grateful to our clients who help us discover non-obvious bugs, which we try to fix as quickly as possible.
P.S. The next module to be introduced is mod_linkfreeze implementing LinkFreeze features for Windows Server 2008 bypassing ISAPI limitations.
Stay with us!
HeliconTech Team.
Until recently HeliconTech had one specialized solution for content compression: HeliconJet. Given its importance, we have decided to include its functionality in our new product, Helicon Ape. Since Ape stands for APache Emulation, it’s very important not to invent new syntax and directives but to use existing Apache assets.
There are two popular compression modules: the conventional mod_deflate and mod_gzip. The latter is written by a third-party developer and is not supplied with Apache. We have decided to implement both modules because they are used to an equal degree. At the moment only basic mod_gzip functionality is implemented, but we are planning to extend it in the near future. Technically Ape will have one compression module that supports both mod_gzip and mod_deflate syntaxes. Our primary goal is to let you use an existing Apache configuration without any changes.
Let’s have a look at basic content compression principles and mod_gzip operation. This module applies the GZIP format, which uses the Deflate compression algorithm. The module is based on a .NET version of the popular ZLib library. Please note, Helicon Ape is written in managed code only!
A web client (browser) exchanges technical information, so-called HTTP headers, with the web server. These headers contain important information that helps the client and the server understand each other. The client can state which data types it accepts and which content it needs. Taking the client’s abilities into account, the server prepares and sends the content. After that, the technical information helps the client understand what to do with the server response.
But we are not going to dive deep into HTTP protocol subtleties, as there is plenty of information on this topic on the Internet. Let’s return to mod_gzip. The general scheme of its operation is given below:
As you can see, the server is not the only party involved in deciding whether to compress content. That is easy to understand: if the browser isn’t capable of uncompressing GZIP, all of mod_gzip’s work is pointless and the user gets rubbish. The web client must send an Accept-Encoding header with a gzip, x-gzip or deflate value to let mod_gzip know that the client supports compression.
In turn, if the module decides to compress the content, it sets the Content-Encoding: gzip header to inform the client that GZIP decompression must be used. So each link in the scheme above plays an important role.
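The header exchange described above can be sketched in a few lines of Python. Treat it as an illustration under stated assumptions rather than mod_gzip's actual logic: `client_accepts_gzip` and `respond` are hypothetical names, only the gzip and x-gzip tokens are treated as proof of GZIP support (a simplification of ours), and the Accept-Encoding parsing is deliberately minimal.

```python
import gzip

GZIP_TOKENS = {"gzip", "x-gzip"}

def client_accepts_gzip(accept_encoding):
    """True if the Accept-Encoding value lists gzip or x-gzip with a non-zero q."""
    for item in accept_encoding.split(","):
        token, _, params = item.partition(";")
        if token.strip().lower() not in GZIP_TOKENS:
            continue
        name, _, value = params.partition("=")
        if name.strip().lower() == "q":
            # "gzip;q=0" means the client explicitly refuses gzip.
            try:
                return float(value) > 0
            except ValueError:
                return False
        return True
    return False

def respond(body, request_headers):
    """Return (headers, payload), compressing only when the client allows it."""
    # The response depends on Accept-Encoding, so caches are told about it.
    headers = {"Vary": "Accept-Encoding"}
    if client_accepts_gzip(request_headers.get("Accept-Encoding", "")):
        headers["Content-Encoding"] = "gzip"
        body = gzip.compress(body)
    headers["Content-Length"] = str(len(body))
    return headers, body

page = b"<html><body>" + b"<p>Hello, Ape!</p>" * 100 + b"</body></html>"
for accept in ("gzip, deflate", ""):
    headers, payload = respond(page, {"Accept-Encoding": accept})
    print(repr(accept), headers.get("Content-Encoding"), len(payload))
```

A client that sends no Accept-Encoding header gets the original bytes back, so it never receives content it cannot decode.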
But to better understand mod_gzip logic, please have a look at this flowchart:
This sequence is used by mod_gzip to make the compress/don’t compress decision. We’ll now give a brief explanation of each stage:
- The client must send an Accept-Encoding header with a gzip, x-gzip or deflate value.
- … Content-Encoding: gzip header, because …
- … Vary header in which mod_gzip specifies what its actions depend on (… Vary: Accept-Encoding). This header is used for caching, so its detailed description will appear in the …
It’s possible that the next versions will have slightly different logic, but we’ll surely inform you about that.
This article is just a brief introduction to Helicon Ape mod_gzip module.
We are thinking of writing much more material on that and other topics to help you use our little agile monkey (Ape) easily and efficiently.
Best wishes,
HeliconTech Team
Let’s create a photos folder in the site root and fill it with our photos. Next we download qdig. To keep it simple we’ll extract only one file, index.php, and put it into the same directory.
The gallery is already working: http://localhost/photos/index.php
To measure request rate we’ll use ab.exe application:
ab.exe -n 200 -c 2 "http://localhost/photos/index.php?Qwd=.&Qif=DSC00410.JPG&Qiv=name&Qis=M"
The result is a bit more than 16 requests per second.
To enable the necessary modules, let’s uncomment the following lines in the Helicon Ape httpd.conf file:
LoadModule expires_module modules/mod_expires.so
LoadModule cache_module modules/mod_cache.so
To make mod_cache cache only unique requests rather than all of them, let’s figure out what the qdig request parameters mean and how request uniqueness depends on them:
Thus, the cache key will use only the Qwd, Qif and Qiv parameters.
The piece of config for mod_cache will look like this:
<Files index.php>
CacheEnable mem
CacheVaryByParams Qwd Qif Qiv
</Files>
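To see why limiting the key to these three parameters matters, here is a small Python sketch of a key built the same way. It is only an illustration: `cache_key` is a hypothetical function, sorting the parameters is our own choice, and the key format mod_cache really uses is not shown here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

def cache_key(url, vary_by_params):
    """Build a cache key from the path plus only the listed query parameters."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name in vary_by_params]
    # Sorting makes the key independent of the parameter order in the URL.
    return parts.path + "?" + urlencode(sorted(kept))

params = ("Qwd", "Qif", "Qiv")
a = cache_key("/photos/index.php?Qwd=.&Qif=DSC00410.JPG&Qiv=name&Qis=M", params)
b = cache_key("/photos/index.php?Qis=L&Qiv=name&Qif=DSC00410.JPG&Qwd=.", params)
print(a)
print(a == b)
```

The two URLs differ only in Qis and in parameter order, so they map to one cache entry, while a request for a different Qif value gets its own.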
The index.php script does not set Cache-Control and Expires headers but, as we already know, they are really important for successful caching. So we’ll set these headers ourselves, using mod_expires functionality:
ExpiresActive On
ExpiresByType text/html "access 1 hour"
The directives above set the expiration time to 1 hour.
The resulting .htaccess is as follows:
ab.exe -n 200 -c 2 "http://localhost/photos/index.php?Qwd=.&Qif=DSC00410.JPG&Qiv=name&Qis=M"
And now the result is about 94 requests per second!
That’s all you need to do to achieve a nearly sixfold performance gain.
This example clearly demonstrates the ease and efficiency of Helicon Ape caching feature.
mod_cache operation.
After the authentication/authorization events but prior to request handler execution, mod_cache comes on the scene. At this stage the module performs the following:
Response may be cached if request meets the following requirements:
- Cache-Control request header must not be no-cache. This condition is ignored if CacheIgnoreCacheControl On is used
- Pragma request header must not be no-cache. This condition is ignored if CacheIgnoreCacheControl On is used
When the request handler has completed its job and all defined filters have been applied to the response, mod_cache starts to operate. At this stage the module performs the following:
- … CacheEnable is set for this request
The following conditions are considered when deciding whether the response is cacheable (all must be met at the same time):
- Expires response header contains a valid “future” date
- … (Expires or Cache-Control: max-age=XX headers), Etag header or Last-Modified header. This condition is ignored if CacheIgnoreNoLastMod is used
- … (Expires or Cache-Control: max-age=XX headers). This condition is ignored if CacheIgnoreQueryString On is used
- Cache-Control request header must not be no-cache. This condition is ignored if CacheStoreNoStore On is used
- Cache-Control request header must not be private. This condition is ignored if CacheStorePrivate On is used
- … Authorization header (for Apache: if Cache-Control contains s-maxage, must-revalidate or public)
- Vary response header does not contain “*”.
Response is saved in cache according to the key. This key includes:
- CacheIgnoreQueryString On directive cancels addition of request parameters to the cache key
- CacheVaryByParams param1 param2 ... directive defines parameters to be included into the cache key
- … CacheVaryByHeaders header1 header2 ... directive. Headers are not included in the cache key by default.
- If the response has a Vary header, all request headers specified in it are included into the cache key.
HTTP response is stored in cache for a specific period of time that is computed in the following way:
- If the response has an Expires header and its value is valid and does not refer to the past, the cached response is stored until the time specified in it.
- If the response has a Cache-Control header with either max-age=X or s-maxage=X, the cached response is stored in cache for X seconds.
- If the response has a Last-Modified header, the cached response is stored in cache until:
expiry date = date + min((date - lastmod) * factor, maxexpire),
where
date - current date,
lastmod - value of the Last-Modified header,
factor - float value set via the CacheLastModifiedFactor directive (default value = 0.1),
maxexpire - value set via the CacheMaxExpire directive (default value = 86400 seconds = 1 day).
- If mod_cache was unable to calculate the expiration date using one of the aforementioned methods (this is possible if the response doesn’t have Expires, Cache-Control or Last-Modified headers BUT has an Etag header), the storage period is set to the default value of 1 hour, which may be changed using the CacheDefaultExpire directive.
This load of text might look a little unclear at first glance, but in reality this is a well-composed and highly efficient scheme. And our upcoming article will convince you of this.
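For those who prefer code to bullet lists, the expiration rules above can be sketched in Python roughly as follows. This is our reading of the list, not the module's source: `expiry_date` is a hypothetical function, the rules are tried in the order in which they are listed, and the defaults simply repeat the directive defaults quoted above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def expiry_date(headers, now, factor=0.1, max_expire=86400, default_expire=3600):
    """Pick the moment a cached response expires, following the rules above."""
    # 1. A valid Expires header that does not refer to the past.
    if "Expires" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            if expires > now:
                return expires
        except (TypeError, ValueError):
            pass  # unparsable date: fall through to the next rule
    # 2. Cache-Control: max-age=X or s-maxage=X keeps the response X seconds.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() in ("max-age", "s-maxage") and value.isdigit():
            return now + timedelta(seconds=int(value))
    # 3. Last-Modified: date + min((date - lastmod) * factor, maxexpire).
    if "Last-Modified" in headers:
        try:
            lastmod = parsedate_to_datetime(headers["Last-Modified"])
            age = (now - lastmod).total_seconds()
            return now + timedelta(seconds=min(age * factor, max_expire))
        except (TypeError, ValueError):
            pass
    # 4. Nothing usable (for example only an Etag): default period.
    return now + timedelta(seconds=default_expire)

now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(expiry_date({"Cache-Control": "public, max-age=600"}, now))
print(expiry_date({"Last-Modified": "Mon, 01 Jan 2024 02:00:00 GMT"}, now))
print(expiry_date({"Etag": '"abc"'}, now))
```

In the second call the response was modified 10 hours before the request, so with the default factor of 0.1 it is kept for 1 hour.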
A web cache is a vital instrument for building lightning-fast web apps. A web cache stores HTTP responses that may be provided to the user without making a request to the server, i.e. no ASP/PHP script execution or database queries are necessary. And that’s cool!
Web caching substantially reduces response time, the time the server needs to give the response, as reading from the cache is much faster than processing the request with a PHP handler.
Web caching minimizes traffic: if intermediate caches (gateway or proxy caches) are used, the request won’t reach the origin server, and the response will be returned by an intermediate caching server.
Server cache works on the origin server. Applications and the server itself use it to store parts of responses (e.g. web pages) or complete responses. Server cache may be used at the application level (e.g. memcached + PHP or HttpRuntime.Cache + ASP.NET) or at the HTTP server level (e.g. mod_cache in Apache, OutputCache in IIS7).
Proxy cache lives between clients and origin servers and may only store public representations that do not require authorization (unlike private representations). Proxy cache is widely used by providers to reduce traffic.
Browser cache lives in the browser and is capable of storing private data. It is used, for example, for Back button operation.
A cacheless configuration forces the server to process each incoming request and generate a new response even if the same resource is requested several times in a row. That is a senseless, time- and resource-consuming operation that puts excessive load on the server.
When a specific resource is requested from the server for the first time, the caching system checks whether it’s possible to cache the response, then looks for the response in the cache and fails to find it. The request moves further along the server pipeline, triggering the necessary handlers and filters. When the response is ready, the caching system saves it to the cache before sending it to the client.
Upon further requests to this resource, the caching system checks whether it’s possible to cache the response, then looks for the response in the cache and this time finds it! The response is retrieved from the cache and sent to the client. And that’s it! No server handlers and filters are executed.
Responses are stored in the cache for a certain period of time. When this time elapses, the cached response is labeled as not valid and is removed from the cache. The next request to that same resource is processed as if it were requested from the server for the first time (see “First request to cache-enabled server”).
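The first request, the repeated request and the expiration described above fit into a very small Python sketch. It is a toy model built on our own assumptions: `ResponseCache` and `fetch` are hypothetical names, the storage is a plain dictionary, and the check of whether a response may be cached at all is left out.

```python
import time

class ResponseCache:
    """A tiny in-memory cache with a fixed lifetime for every entry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so that time can be faked
        self.entries = {}       # key -> (expires_at, response)

    def fetch(self, key, handler):
        """Return (response, was_hit); run handler only on a miss or after expiry."""
        entry = self.entries.get(key)
        if entry is not None:
            expires_at, response = entry
            if self.clock() < expires_at:
                return response, True   # served from cache, handler not run
            del self.entries[key]       # expired: treat as a first request
        response = handler()            # the expensive part: scripts, database
        self.entries[key] = (self.clock() + self.ttl, response)
        return response, False

fake_time = [0.0]
cache = ResponseCache(ttl_seconds=600, clock=lambda: fake_time[0])
render = lambda: "<html>home page</html>"

print(cache.fetch("/", render))   # first request: miss, handler runs
print(cache.fetch("/", render))   # repeated request: hit
fake_time[0] = 601.0
print(cache.fetch("/", render))   # lifetime elapsed: miss again
```

The injectable clock is only there to make the expiration visible without waiting ten minutes.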
As you can see, server cache lowers server load and speeds up response time. In the next article on caching we’ll give a more thorough explanation of this process and illustrate it with examples.