Invalidation

Summary

NOTE: This proposal has been superseded by Limited Invalidation.

Standard HTTP caching is well suited for specifying a retention policy over static data. Dynamic or user-modified data, as found in a typical OpenSocial application, forces a difficult tradeoff between update interval and cache retention. Longer cache lifetimes often appear buggy to the user because of inconsistent data or frozen updates after edits.

Several containers have expressed an interest in caching data from applications to improve page rendering time. There are also new features proposed for the 0.9 spec, such as data pipelining, that enable server-side retrieval and processing to assemble a view. Combining these two technologies will allow containers to rapidly return pre-composed pages instead of waiting on a developer's machine.

Developers implementing the protocol should find that their applications gain greater prominence thanks to precomposition and have lower operating costs. However, the protocol is entirely optional, so a beginning developer doesn't need to implement it on the first day and may rely on HTTP caching as a simpler alternative.

Invalidation is implemented at the HTTP level and can be applied to any resource retrieved by a container.

Discussion Thread

Terminology

  • Resource - the result of resolving a URL, and typically a unit of caching
  • Key - a set identifier for a resource; a resource can have many keys, and a key may apply to many resources
  • Invalidation - An instruction to expire all resources associated with a key
  • Expired - a cached document that is no longer fresh (by HTTP rules or because of a key invalidation)
  • Container - the OpenSocial container that acts as the principal or proxy for requests to the developer machine
  • Cache - proxy and storage for resource retrieval and endpoint for receiving invalidations
  • Endpoint - URL for invalidating key sets
  • Origin - developer owned machine containing the resources

Resource States

  • UNCACHED - a resource is not present in the cache and it will be retrieved the next time it's requested. The cache may decide to drop files at any time.
  • UNEXPIRED - a resource is in cache and known not to be expired; it will be returned from the cache the next time it is requested.
  • EXPIRED - a resource is in cache, but is thought to be expired. On the next request, the cache will revalidate or retrieve from the origin.

Origin States

  • NONE - there's no known origin; there should be no resources from this origin in cache
  • ACTIVE - an identifier for the origin is known and it's still within the ttl
  • EXPIRED - an identifier for the origin is known, but it is outside the ttl. All resources from this origin are considered EXPIRED.
  • TERMINATED - the origin is known to have changed identifiers; all keys associated with this origin are INVALID and all resources are EXPIRED.

Key States

  • VALID - the key is known to be valid.
  • INVALID - the key is known to be invalid and all resources associated with the key are EXPIRED.
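
The three state lists above interact: a resource's state follows from whether it is cached, its HTTP freshness, the state of its origin, and the states of its keys. The following is a minimal Python sketch of that derivation. The function name resource_state and its parameters are illustrative assumptions, not part of the proposal.

```python
from enum import Enum
from typing import Iterable


class ResourceState(Enum):
    UNCACHED = "uncached"
    UNEXPIRED = "unexpired"
    EXPIRED = "expired"


class OriginState(Enum):
    NONE = "none"
    ACTIVE = "active"
    EXPIRED = "expired"
    TERMINATED = "terminated"


class KeyState(Enum):
    VALID = "valid"
    INVALID = "invalid"


def resource_state(in_cache: bool, http_fresh: bool,
                   origin_state: OriginState,
                   key_states: Iterable[KeyState]) -> ResourceState:
    """Derive a resource state (hypothetical helper, not defined by the proposal)."""
    # With no known origin there should be nothing from it in the cache.
    if not in_cache or origin_state is OriginState.NONE:
        return ResourceState.UNCACHED
    # An EXPIRED or TERMINATED origin expires all of its resources.
    if origin_state in (OriginState.EXPIRED, OriginState.TERMINATED):
        return ResourceState.EXPIRED
    # A single INVALID key is enough to expire the resource.
    if any(state is KeyState.INVALID for state in key_states):
        return ResourceState.EXPIRED
    # Otherwise ordinary HTTP freshness decides.
    return ResourceState.UNEXPIRED if http_fresh else ResourceState.EXPIRED
```

For example, a cached, HTTP-fresh resource under an ACTIVE origin with one INVALID key comes out as EXPIRED.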

Scope

  1. Discovering an invalidation endpoint
  2. Establishing subscriptions
  3. Assigning keys to resources
  4. Invalidating keys
  5. Garbage collecting subscriptions

Reference

A cache relationship is negotiated through HTTP headers sent with each request and returned with each response. The origin is then required to notify the cache which keys are invalid and by extension which documents are invalid.

Invalidate-Endpoint request header

The Invalidate-Endpoint header is sent when the cache supports invalidation-based expiration for the requested resource. When this header is not sent, standard HTTP cache negotiation applies.

Invalidate-Endpoint = "Invalidate-Endpoint" ":" absoluteURI

Example:

Invalidate-Endpoint: http://apps.yahooapis.com/invalidate/77f8s8-

Invalidate response header

A resource that can be cached will contain one or more Invalidate response headers.

Invalidate = "Invalidate" ":" 0#(invalidate-directive)

invalidate-directive = 
    "id" "=" <"> token <">
  | "ttl" "=" 1*DIGIT
  | "keys" "=" <"> token *(LWS token) <">

id

Origin-assigned identifier for the cache endpoint. The cache MUST use this identifier to determine if keys stored in its database are valid until invalidated by the origin. If a cache observes a change in this value, the origin is TERMINATED and the cache MUST assume that all keys are invalid. An unspecified id is a unique value; a transition between an unspecified id and a specified id of any value (including an empty string) MUST behave exactly like a change in the id. A change in id MUST NOT validate keys or unexpire resources.
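
The id comparison can be sketched in Python as follows. This is only an illustration: the OriginRecord class and its fields are hypothetical, and None is assumed to stand for an unspecified id so that it stays distinct from an empty string.

```python
from typing import Optional, Set


class OriginRecord:
    """What a cache might remember about one origin (hypothetical structure)."""

    def __init__(self, origin_id: Optional[str]) -> None:
        self.origin_id = origin_id        # None stands for "unspecified"
        self.valid_keys: Set[str] = set()

    def observe_id(self, observed_id: Optional[str]) -> bool:
        """Record the id seen on a response; return True if the origin TERMINATED."""
        if observed_id == self.origin_id:
            return False
        # Any change, including None <-> "", invalidates every stored key.
        # Nothing is ever re-validated here.
        self.valid_keys.clear()
        self.origin_id = observed_id
        return True
```

A caller that sees True would also mark every resource from that origin as EXPIRED.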

ttl

Lifetime for the endpoint relationship in seconds. This value SHOULD correlate to the maximum retry for delivering an invalidation notification. If no key has become VALID or INVALID during the ttl interval, the cache MUST consider all resources with keys from the origin to be EXPIRED. To prevent this expiry, the cache MAY spontaneously request a document to test the connection. At the end of the ttl interval, the origin MAY safely discard all information associated with the endpoint. If unspecified, the default value is "172800" (2 days).

keys

Space-delimited keys assigned to the resource. Invalidating any of these keys MUST expire the associated resource even if other cache control directives would assign a longer lifetime. The cache MUST assign default keys to every resource, and the cache or container MAY apply further keys. There is no mechanism within the invalidation protocol to transmit default assigned keys; an origin MAY infer those keys from other aspects of the request.

The empty header:

Invalidate:

MUST be interpreted as an unspecified (null) id, ttl=172800, keys="<default keys>". When this header is completely absent, the resource will be stored according to ordinary cache control rules.

A typical example issues a complete directive that establishes an identifier for the invalidate endpoint, sets a lifetime of 4 days, and assigns keys to the resource:

Invalidate: id="1", ttl=345600, keys="user1 still+more+keys they%60re+URI+encoded"
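
A cache has to turn the header value into an id, a ttl and a key list. The sketch below shows one way to do that in Python for the grammar given above. The names parse_invalidate and InvalidateDirectives are assumptions, the parser is deliberately strict, and the cache-assigned default keys are not added here.

```python
import re
from typing import NamedTuple, Optional, Tuple

DEFAULT_TTL = 172800  # 2 days, the default stated for ttl


class InvalidateDirectives(NamedTuple):
    id: Optional[str]      # None means the id was unspecified
    ttl: int
    keys: Tuple[str, ...]  # origin-assigned keys only


# One directive: name "=" (quoted string | digits), then a comma or the end.
_DIRECTIVE = re.compile(r'\s*(id|ttl|keys)\s*=\s*(?:"([^"]*)"|(\d+))\s*(?:,|$)')


def parse_invalidate(value: str) -> InvalidateDirectives:
    """Parse the value of an Invalidate response header (illustrative only)."""
    value = value.strip()
    origin_id: Optional[str] = None
    ttl = DEFAULT_TTL
    keys: Tuple[str, ...] = ()
    pos = 0
    while pos < len(value):
        match = _DIRECTIVE.match(value, pos)
        if match is None:
            raise ValueError("malformed directive at offset %d" % pos)
        name, quoted, digits = match.groups()
        if name == "ttl":
            if digits is None:
                raise ValueError("ttl must be an unquoted number")
            ttl = int(digits)
        elif quoted is None:
            raise ValueError("%s must be a quoted string" % name)
        elif name == "id":
            origin_id = quoted
        else:
            keys = tuple(quoted.split())
        pos = match.end()
    return InvalidateDirectives(origin_id, ttl, keys)
```

Applied to the example header above, this yields id "1", a ttl of 345600 seconds and three keys; applied to an empty value it yields the defaults described for the empty header.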

Cache Assigned Keys

The cache always assigns the following keys to a resource:

uriPath - /opensocial/view.php?opensocial_ownerid=myspace.com:55566827

from the request line of the HTTP message

host - stickerapp.com

from the Host HTTP header

invalidate endpoint - http://myspaceapis.com/invalidate/62663

from the Invalidate-Endpoint HTTP header

An OpenSocial request includes the additional keys:

gadgetUrl - http://myapp.com/gadget.xml

Applied to all resources that are not user-specific but are referenced from the gadget file, such as message bundles and template libraries. The expected use of this key is to support publishing resource changes.

opensocial id - google.com:gga8880ah

The OpenSocial id used to make an authenticated request. This supports the simple case of invalidating all views associated with a particular user.
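
As a rough illustration, the cache-assigned keys could be collected as shown below. The function name and parameters are hypothetical, and the values are kept as raw strings because the text does not say how default keys are encoded.

```python
from typing import Dict, Optional


def cache_assigned_keys(uri_path: str, host: str, invalidate_endpoint: str,
                        gadget_url: Optional[str] = None,
                        opensocial_id: Optional[str] = None) -> Dict[str, str]:
    """Collect the keys a cache assigns on its own (hypothetical helper)."""
    # These three are always assigned.
    keys = {
        "uriPath": uri_path,
        "host": host,
        "invalidate endpoint": invalidate_endpoint,
    }
    # These two apply only to OpenSocial requests.
    if gadget_url is not None:
        keys["gadgetUrl"] = gadget_url
    if opensocial_id is not None:
        keys["opensocial id"] = opensocial_id
    return keys
```

An origin that wants to invalidate every view for one user would then only need that user's OpenSocial id, which it can infer from the request.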

Invalidation Request

Keys are invalidated by HTTP POST to the invalidate endpoint. The POST entity MUST have content type text/plain and contain a whitespace-delimited set of keys that are now invalid.

Example:

POST /invalidate/myapp.com HTTP/1.1
Content-Type: text/plain
Content-Length: 37

friend1 top10 eggs
jellybeans peters
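
The entity body is simple enough to build and read with a few lines of Python. The helper names below are assumptions, as is the choice of ASCII, which relies on keys being URI-encoded.

```python
from typing import Iterable, List


def build_invalidation_body(keys: Iterable[str]) -> bytes:
    """Join keys into a text/plain entity body (illustrative helper)."""
    checked = []
    for key in keys:
        # Whitespace is the delimiter, so it cannot appear inside a key.
        if not key or any(ch.isspace() for ch in key):
            raise ValueError("invalid key: %r" % key)
        checked.append(key)
    return " ".join(checked).encode("ascii")


def parse_invalidation_body(body: bytes) -> List[str]:
    """Split a received entity body on any run of whitespace."""
    return body.decode("ascii").split()


# The body from the example above, with a CRLF line break, is 37 bytes long.
example = b"friend1 top10 eggs\r\njellybeans peters"
assert len(example) == 37
assert parse_invalidation_body(example) == [
    "friend1", "top10", "eggs", "jellybeans", "peters"]
```

The Content-Length in the example matches only if the line break is counted as CRLF.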

The response entity is not used. HTTP status codes should be interpreted as normal, with the following guidance:

2xx

The invalidate should be considered successful and no further attempts to deliver will be made.

3xx

Redirect to the new URL and try again. If the code is a permanent redirect, update any databases that support endpoint storage. If the redirect cannot be resolved (for example, an authentication request), the origin SHOULD treat the response as a permanent error.

4xx

A permanent error; forget everything about this endpoint. An origin MAY choose to retry later, but it SHOULD make fewer attempts than if a temporary error code is returned. A 410 Gone SHOULD be interpreted as an immediate termination of subscriptions.

5xx

A temporary error; retry delivery of the invalidation at a later time. Repeated errors should eventually lead to canceling the subscription.
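
The status code guidance can be summarized as a small decision function on the origin side. This Python sketch uses hypothetical names, and treating 308 as a permanent redirect alongside 301 is an assumption that goes beyond the text.

```python
from enum import Enum


class Outcome(Enum):
    DELIVERED = "delivered"              # 2xx: stop, delivery succeeded
    FOLLOW_REDIRECT = "follow redirect"  # 3xx: retry at the new URL
    PERMANENT_ERROR = "permanent error"  # 4xx: forget the endpoint
    TEMPORARY_ERROR = "temporary error"  # 5xx: retry later


# Assumption: 301 and 308 are the permanent redirects worth persisting.
PERMANENT_REDIRECTS = frozenset({301, 308})


def classify_response(status: int) -> Outcome:
    """Map an HTTP status from the invalidate endpoint to a delivery outcome."""
    if 200 <= status < 300:
        return Outcome.DELIVERED
    if 300 <= status < 400:
        return Outcome.FOLLOW_REDIRECT
    if 400 <= status < 500:
        return Outcome.PERMANENT_ERROR
    if 500 <= status < 600:
        return Outcome.TEMPORARY_ERROR
    raise ValueError("unexpected status code: %d" % status)


def should_update_stored_endpoint(status: int) -> bool:
    """True when a redirect is permanent and the stored endpoint should change."""
    return status in PERMANENT_REDIRECTS


def terminates_subscriptions(status: int) -> bool:
    """True for 410 Gone, which ends subscriptions immediately."""
    return status == 410
```

Retry counts and backoff are left out on purpose, since the text only says that permanent errors deserve fewer attempts than temporary ones.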

POSTs to the invalidation endpoint MUST require a two-legged OAuth signature.