Cache

The cache system is enabled by default. To actually generate cache entries, however, the following management command needs to be executed:

$ python manage.py publish

This will then generate all pages for which a cache entry exists. By default, all CMS-related pages are cached.

Cached files are written to public_html/cache/. By default, the path to the public_html folder is relative to the main installation folder of your application:

../../public_html/

For example, if your application is installed in the folder ~/app/myapp/, then the public_html folder resolves to ~/public_html/ and the cache is generated in ~/public_html/cache/.

The path to the cache folder can be customised via the CACHE_ROOT settings variable.
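To illustrate how the default location is resolved, the following minimal sketch joins the relative path from above onto an installation folder. The helper name default_cache_root is hypothetical and not part of Cubane; it only reproduces the example paths used in this section.

```python
import posixpath

# Relative path from the installation folder to public_html (taken from the text above).
PUBLIC_HTML_RELATIVE = '../../public_html/'


def default_cache_root(app_dir):
    """Return the default cache folder for an application installed in app_dir.

    Hypothetical helper for illustration only; not Cubane's own code.
    """
    public_html = posixpath.normpath(posixpath.join(app_dir, PUBLIC_HTML_RELATIVE))
    return posixpath.join(public_html, 'cache')


print(default_cache_root('~/app/myapp/'))  # prints ~/public_html/cache
```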

Web-server Integration

The idea is this: if a cached file exists in the cache folder for a given page, the web-server should serve that file directly, without invoking WSGI, Python, Django or Cubane. This bypasses the entire application stack and is therefore much faster than rendering the page dynamically.

A cached file should only be served if all of the items below hold true:

  • The request is a GET or HEAD request.
  • The request has an empty query string.
  • A corresponding cached file exists in the cache folder.
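The three conditions above can be sketched in Python as follows. This is only an illustration of the decision logic, not how Cubane or the web-server implements it. The function name cached_file_for is hypothetical, and the sketch assumes that each page is stored as index.html in a folder matching its URL path.

```python
import os
import tempfile


def cached_file_for(method, path, query_string, cache_root):
    """Return the cached file to serve for a request, or None.

    Hypothetical helper that mirrors the three conditions listed above.
    """
    # 1. Only GET and HEAD requests may be answered from the cache.
    if method not in ('GET', 'HEAD'):
        return None
    # 2. The query string must be empty.
    if query_string:
        return None
    # Assumption: paths containing a dot (e.g. file names or '..') are never cached.
    if '.' in path:
        return None
    # Assumption: each page is stored as <cache_root>/<path>/index.html.
    filename = os.path.join(cache_root, path.strip('/'), 'index.html')
    # 3. The cached file must exist.
    return filename if os.path.isfile(filename) else None


with tempfile.TemporaryDirectory() as cache_root:
    os.makedirs(os.path.join(cache_root, 'about'))
    with open(os.path.join(cache_root, 'about', 'index.html'), 'w') as f:
        f.write('<html></html>')
    print(cached_file_for('GET', '/about/', '', cache_root) is not None)  # prints True
    print(cached_file_for('POST', '/about/', '', cache_root))             # prints None
    print(cached_file_for('GET', '/about/', 'page=2', cache_root))        # prints None
```

Any request for which the function returns None would be passed on to the application as usual.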

If your web-server is Apache 2, then the following configuration options may be used to express such logic:

# redirect non-www. to www.
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ https://www.%{HTTP_HOST}$1 [R=301,L]

# GET and HEAD requests without query strings are served from the cache (if file exists)
RewriteCond %{THE_REQUEST} ^(GET|HEAD)
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{REQUEST_URI} ^([^.]+)$
RewriteCond %{DOCUMENT_ROOT}/cache/%1index.html -f
RewriteRule ^[^.]+$ /cache/%1index.html [QSA,L]

# accessing cache directly is forbidden
RewriteCond %{REQUEST_URI} ^/cache/.*$
RewriteRule ^/cache/.*$ - [F]

The first block redirects to https://www.example.com/ if the host name of the request does not start with www., e.g. https://example.com (we assume that HTTPS is used).

The second block is the actual URL rewrite based on the conditions we’ve identified above: If the request is a GET or HEAD request with an empty query string and a corresponding cached file exists, then the URL is rewritten to point to that very same (cached) file.

The last block prevents cached files from being requested directly. For example, https://www.example.com/cache/index.html should not be served unless the request was mapped to it by the rewrite rule.

Invalidation

The cache can be invalidated by simply running the following management script:

$ python manage.py invalidate

The cache is also invalidated whenever a relevant entity is changed through the backend system.

When the cache is invalidated, all cached files are renamed by prefixing their file names with a dot character (.). This ensures that:

  • Cached files no longer match, so every incoming request is dispatched via Python, Django and Cubane.
  • The content of all cached files still exists and can be put back very quickly.
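The renaming step can be sketched as follows. This is a minimal illustration of the idea described above, not Cubane's actual implementation; the function name invalidate is hypothetical.

```python
import os
import tempfile


def invalidate(cache_root):
    """Hide every cached file by prefixing its name with a dot.

    Hypothetical sketch; returns the number of files that were renamed.
    """
    renamed = 0
    for folder, _dirs, files in os.walk(cache_root):
        for name in files:
            if name.startswith('.'):
                continue  # already invalidated
            os.replace(os.path.join(folder, name),
                       os.path.join(folder, '.' + name))
            renamed += 1
    return renamed


with tempfile.TemporaryDirectory() as cache_root:
    open(os.path.join(cache_root, 'index.html'), 'w').close()
    invalidate(cache_root)
    print(sorted(os.listdir(cache_root)))  # prints ['.index.html']
```

Because the files are only renamed, their content and modification time stay intact, so they can be restored later without rendering the page again.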

Detecting Changes

When generating the cache, a page does not need to be rendered again if both of the following conditions hold:

  • The cached file has been invalidated but still exists (prefixed with a dot character).
  • An analysis of last-modification timestamps shows that the content has not changed.

If the content did not change then the previously generated cache file is simply renamed back to its original file name.

Otherwise, if the content did change, then the old file is replaced with new content that is generated for the entire page.

Before rendering a page, a template context is derived which contains information about model instances such as the current page, navigation items, footer elements and other entities.

Cubane makes the following assumptions:

  • It is safe to materialise all database queries that are provided via the template context prior to rendering the page.
  • A template context only contains information that is relevant to rendering the corresponding page.
  • All relevant model instances have been derived from cubane.models.DateTimeBase or have a timestamp property with the name updated_on indicating the date and time when the last modification of the entity has been made.

Based on this context, Cubane scans through all entities and determines the latest timestamp value it can find on any of them. This may be the timestamp of one of the following items:

  • The current page
  • Any related page
  • Any item that is presented in any navigation section
  • The general timestamp of the last code deployment
  • The general timestamp of the last time an item was deleted

Based on this data, Cubane works out whether the latest timestamp derived from the template context is before or after the timestamp of the cached file. If the cached file's timestamp is later, then the cached file is restored without having to render the page again. Otherwise, the page is rendered again, replacing any cached content.
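The comparison described above can be sketched as follows. This is an illustration under stated assumptions, not Cubane's implementation. The names latest_timestamp and restore_or_render are hypothetical, entities are assumed to expose an updated_on attribute holding a timezone-aware datetime, and the invalidated file is assumed to carry the dot prefix described earlier.

```python
import os
from datetime import datetime, timezone


def latest_timestamp(entities, extra_timestamps=()):
    """Return the most recent modification time relevant to a page, or None.

    entities: objects with an updated_on attribute (timezone-aware datetime).
    extra_timestamps: e.g. the time of the last deployment or last deletion.
    """
    stamps = [e.updated_on for e in entities if getattr(e, 'updated_on', None)]
    stamps.extend(extra_timestamps)
    return max(stamps) if stamps else None


def restore_or_render(cache_file, changed_at, render):
    """Restore an invalidated cache file if it is newer than changed_at.

    Hypothetical sketch; returns 'restored' or 'rendered'.
    """
    folder, name = os.path.split(cache_file)
    hidden = os.path.join(folder, '.' + name)
    if changed_at is not None and os.path.isfile(hidden):
        cached_at = datetime.fromtimestamp(os.path.getmtime(hidden), timezone.utc)
        if cached_at > changed_at:
            os.replace(hidden, cache_file)  # content unchanged: rename back
            return 'restored'
    with open(cache_file, 'w') as f:  # content changed or unknown: render again
        f.write(render())
    if os.path.isfile(hidden):
        os.remove(hidden)  # discard the outdated copy
    return 'rendered'
```

If no timestamp can be determined at all, the sketch renders the page again, which is the conservative choice.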

Clear Cache

The entire cache can be removed by running the following management command:

$ python manage.py clearcache

Adding Cached Entries

Additional cached entries can be added by adding the corresponding pages to the sitemap with the cached argument set to True.

By default, custom entries that are added to the sitemap are not cached. If an entry should be cached then the cached argument of the cubane.cms.views.CustomSitemap.add() or cubane.cms.views.CustomSitemap.add_url() method needs to be set to True.

For example:

from cubane.cms.views import CMS

class MyCMS(CMS):
    def on_custom_sitemap(self, sitemap):
        super(MyCMS, self).on_custom_sitemap(sitemap)
        sitemap.add_url('/my-custom-url/', cached=True)

Then the cache system will generate a cached version of the corresponding page with the URL /my-custom-url/ once the publish management command is executed.

Backend Integration

By default, the CMS system will add a Publish button. The publish button is only visible if the cache has been invalidated. Once the button has been pressed the cache system will generate all cache items and the button will disappear again.

The button is not visible if the cache system has been disabled via the CACHE_ENABLED settings variable.

Sometimes you may not want to have such a button at all. For example, you could simply execute the publish command periodically as a cron job, so that the cache system always publishes automatically after a certain period of time.

This is safe to do because the publish command cannot run multiple times simultaneously. Running the management script terminates any instance that might already be running at that moment and then continues to execute.

The button can be removed while the cache system remains enabled by setting the CACHE_PUBLISH_ENABLED settings variable to False.
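Taken together, the settings variables mentioned in this document could appear in a project's settings module roughly as follows. The values are examples only, and the CACHE_ROOT path is a made-up placeholder rather than a recommended location.

```python
# settings.py (fragment); all values are illustrative only

# Keep the cache system enabled (the default according to this document).
CACHE_ENABLED = True

# Hide the Publish button in the backend, e.g. when publishing via a cron job.
CACHE_PUBLISH_ENABLED = False

# Custom location of the cache folder (hypothetical path).
CACHE_ROOT = '/var/www/example/public_html/cache/'
```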