| | h2. Overview |
| | |
| | Coherence supports transparent read-write caching of any datasource, including databases, web services, packaged applications and filesystems; however, databases are the most common use case. As shorthand, "database" is used throughout to describe any back-end data source. Effective caches must support both intensive read-only *and* read-write operations, and in the case of read-write operations, the cache and database must be kept fully synchronized. To accomplish this, Coherence supports *Read-Through, Write-Through, Refresh-Ahead and Write-Behind* caching. |
| | |
| | {info:title=For use with Partitioned (Distributed) and Near cache topologies} |
| | Read-through/write-through caching (and variants) are intended for use only with the Partitioned (Distributed) cache topology (and by extension, Near cache). Local caches support a subset of this functionality. Replicated and Optimistic caches should not be used. |
| | {info} |
| | |
| | h2. Pluggable Cache Store {anchor:Pluggable Cache Store} |
| | |
| | A {{[#CacheStore]}} is an application-specific adapter used to connect a cache to an underlying datasource. The {{CacheStore}} implementation accesses the datasource via a data access mechanism (e.g., [Hibernate|http://www.hibernate.org], JDO, JDBC, another application, mainframe, another cache, etc.). The {{CacheStore}} understands how to build a Java object using data retrieved from the datasource, map and write an object to the datasource, and erase an object from the datasource. |
| | |
| | Both the datasource connection strategy and the datasource-to-application-object mapping information are specific to the datasource schema, application class layout, and operating environment. Therefore, this mapping information must be provided by the application developer in the form of a {{CacheStore}} implementation. |
| | |
| | h2. Read-Through Caching {anchor:Read-Through Cache} |
| | |
| | When an application asks the cache for an entry, for example the key {{X}}, and {{X}} is not already in the cache, Coherence will automatically delegate to the {{CacheStore}} and ask it to load {{X}} from the underlying datasource. If {{X}} exists in the datasource, the {{CacheStore}} will load it and return it to Coherence, which then places it in the cache for future use and finally returns {{X}} to the application code that requested it. This is called *Read-Through* caching. [#Refresh-Ahead Cache] functionality may further improve read performance (by reducing perceived latency). |
| | |
| | !read-through.jpg|align=center! |
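The read-through flow described above can be sketched in plain Java. This is a minimal illustration, not the Coherence implementation: {{ReadThroughSketch}} and its {{loader}} function are hypothetical stand-ins, with {{loader}} playing the role of the {{CacheStore}} load call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a cache with a pluggable loader; not a Coherence class.
class ReadThroughSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;            // plays the role of the CacheStore load call
    final AtomicInteger loads = new AtomicInteger(); // counts trips to the datasource

    ReadThroughSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // On a miss, delegate to the loader; a non-null result is cached for future reads.
        // If the loader returns null (key absent in the datasource), nothing is cached.
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }
}
```

With a loader such as {{db::get}} over an in-memory map, the first {{get("X")}} reaches the loader and later calls for the same key are served from the cache.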
| | |
| | h2. Write-Through Caching {anchor:Write-Through Cache} |
| | |
| | Coherence can handle updates to the datasource in two distinct ways, the first being *Write-Through*. In this case, when the application updates a piece of data in the cache (i.e. calls {{put(...)}} to change a cache entry), the operation will not complete (i.e. the put will not return) until Coherence has gone through the {{CacheStore}} and successfully stored the data to the underlying datasource. This does not improve write performance at all, since you are still dealing with the latency of the write to the datasource. Improving write performance is the purpose of the [#Write-Behind Cache] functionality. |
| | |
| | !write-through.jpg|align=center! |
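A minimal sketch of the write-through contract follows. {{WriteThroughSketch}} and its {{store}} callback are hypothetical names, and the ordering shown (datasource first, then cache) is a simplifying assumption rather than a statement about Coherence internals.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical stand-in; not a Coherence class.
class WriteThroughSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final BiConsumer<K, V> store; // plays the role of the CacheStore store call

    WriteThroughSketch(BiConsumer<K, V> store) {
        this.store = store;
    }

    void put(K key, V value) {
        // Synchronous write: put() does not return until the datasource write has finished.
        // If the store throws, the exception propagates and the cache is left unchanged
        // (an assumption of this sketch).
        store.accept(key, value);
        cache.put(key, value);
    }

    V get(K key) {
        return cache.get(key);
    }
}
```

The caller pays the full datasource latency on every {{put}}, which is the cost that write-behind is designed to remove.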
| | |
| | h2. Refresh-Ahead Caching {anchor:Refresh-Ahead Cache} |
| | |
| | In the *Refresh-Ahead* scenario, Coherence allows a developer to configure the cache to automatically and asynchronously reload (refresh) any recently accessed cache entry from the cache loader prior to its expiration. The result is that once a frequently accessed entry has entered the cache, the application will not feel the impact of a read against a potentially slow cache store when the entry is reloaded due to expiration. The refresh-ahead time is configured as a percentage of the entry's expiration time; for instance, if specified as 0.75, an entry with a one minute expiration time that is accessed within fifteen seconds of its expiration will be scheduled for an asynchronous reload from the cache store. |
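The arithmetic in the example above can be made explicit. The following is only a worked sketch: {{RefreshAheadMath}} and the {{Action}} names are hypothetical, and the treatment of the exact boundary values is an assumption.

```java
// Hypothetical helper illustrating the refresh-ahead window; not a Coherence class.
class RefreshAheadMath {
    enum Action { SERVE_FROM_CACHE, SERVE_AND_REFRESH_ASYNC, SYNCHRONOUS_LOAD }

    // Entry age (in ms) from which a read schedules an asynchronous reload.
    static long refreshWindowStartMillis(long expiryMillis, double refreshAheadFactor) {
        return Math.round(expiryMillis * refreshAheadFactor);
    }

    static Action onRead(long entryAgeMillis, long expiryMillis, double refreshAheadFactor) {
        if (entryAgeMillis >= expiryMillis) {
            return Action.SYNCHRONOUS_LOAD;        // already expired: the caller waits for the load
        }
        if (entryAgeMillis >= refreshWindowStartMillis(expiryMillis, refreshAheadFactor)) {
            return Action.SERVE_AND_REFRESH_ASYNC; // inside the window: serve, reload in background
        }
        return Action.SERVE_FROM_CACHE;            // too early to refresh
    }
}
```

For a 60,000 ms expiry and a factor of 0.75, the window starts at 45,000 ms, so a read of a 50-second-old entry triggers an asynchronous reload while a read of a 10-second-old entry does not.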
| | |
| | h2. Write-Behind Caching {anchor:Write-Behind Cache} |
| | |
| | In the *Write-Behind* scenario, modified cache entries are asynchronously written to the datasource after a configurable delay, whether after 10 seconds, 20 minutes, a day or even a week or longer. For *Write-Behind* caching, Coherence maintains a write-behind queue of the data that needs to be updated in the datasource. When the application updates {{X}} in the cache, {{X}} is added to the write-behind queue (if it isn't there already; otherwise, it is replaced), and after the specified write-behind delay Coherence will call the {{CacheStore}} to update the underlying datasource with the latest state of {{X}}. Note that the write-behind delay is relative to the first of a series of modifications -- in other words, the data in the datasource will never lag behind the cache by more than the write-behind delay. |
| | |
| | The result is a "read-once and write at a configurable interval" (i.e. much less often) scenario. There are four main benefits to this type of architecture: |
| | # The application improves in performance, because the user does not have to wait for data to be written to the underlying datasource. (The data is written later, and by a different execution thread.) |
| | # The application experiences drastically reduced database load: Since the number of both read and write operations is reduced, so is the database load. The reads are reduced by caching, as with any other caching approach. The writes - which are typically much more expensive operations - are often reduced because multiple changes to the same object within the write-behind interval are "coalesced" and only written once to the underlying datasource ("write-coalescing"). Additionally, writes to multiple cache entries may be combined into a single database transaction ("write-combining") via the {javadoc:com.tangosol.net.cache.CacheStore#storeAll(java.util.Map)|CacheStore.storeAll()} method. |
| | # The application is somewhat insulated from database failures: the *Write-Behind* feature can be configured in such a way that a write failure will result in the object being re-queued for write. If the data that the application is using is in the Coherence cache, the application can continue operation without the database being up. This is easily attainable when using the Coherence Partitioned Cache, which partitions the entire cache across all participating cluster nodes (with local-storage enabled), thus allowing for enormous caches. |
| | # Linear Scalability: For an application to handle more concurrent users you need only increase the number of nodes in the cluster; the effect on the database in terms of load can be tuned by increasing the write-behind interval. |
| | |
| | !write-behind.jpg|align=center! |
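The queueing, write-coalescing and write-combining behavior described above can be sketched as follows. {{WriteBehindSketch}} is a hypothetical, single-JVM illustration: time is passed in explicitly to keep it deterministic, {{flush}} would be driven by a background thread in practice, and failure handling (re-queueing) is omitted.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a write-behind queue; not a Coherence class.
class WriteBehindSketch<K, V> {
    private static final class Pending<T> {
        T value;                 // latest state to write
        final long dueAtMillis;  // fixed when the key first enters the queue
        Pending(T value, long dueAtMillis) { this.value = value; this.dueAtMillis = dueAtMillis; }
    }

    private final Map<K, V> cache = new LinkedHashMap<>();
    private final Map<K, Pending<V>> queue = new LinkedHashMap<>();
    private final long delayMillis;
    private final Consumer<Map<K, V>> storeAll; // plays the role of CacheStore.storeAll()

    WriteBehindSketch(long delayMillis, Consumer<Map<K, V>> storeAll) {
        this.delayMillis = delayMillis;
        this.storeAll = storeAll;
    }

    synchronized void put(K key, V value, long nowMillis) {
        cache.put(key, value);   // returns without touching the datasource
        Pending<V> pending = queue.get(key);
        if (pending != null) {
            pending.value = value; // write-coalescing: keep the original due time
        } else {
            queue.put(key, new Pending<>(value, nowMillis + delayMillis));
        }
    }

    // Writes every entry whose delay has elapsed in one batch (write-combining).
    synchronized int flush(long nowMillis) {
        Map<K, V> batch = new LinkedHashMap<>();
        Iterator<Map.Entry<K, Pending<V>>> it = queue.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Pending<V>> e = it.next();
            if (e.getValue().dueAtMillis <= nowMillis) {
                batch.put(e.getKey(), e.getValue().value);
                it.remove();
            }
        }
        if (!batch.isEmpty()) {
            storeAll.accept(batch);
        }
        return batch.size();
    }
}
```

Because the due time is fixed by the first modification, two updates to {{X}} at 0 s and 4 s with a 10 s delay produce a single write at 10 s carrying the latest value.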
| | |
| | h2. Write-Behind Requirements |
| | |
| | While enabling write-behind caching is simply a matter of adjusting one configuration setting, ensuring that write-behind works as expected is more involved. Specifically, application design must address several design issues up-front. |
| | |
| | The most direct implication of write-behind caching is that database updates occur outside of the cache transaction; that is, the cache transaction will (in most cases) complete before the database transaction(s) begin. This implies that the database transactions must never fail; if this cannot be guaranteed, then rollbacks must be accommodated. |
| | |
| | As write-behind may re-order database updates, referential integrity constraints must allow out-of-order updates. Conceptually, this means using the database as ISAM-style storage (primary-key based access with a guarantee of no conflicting updates). If other applications share the database, this introduces a new challenge -- there is no way to guarantee that a write-behind transaction will not conflict with an external update. This implies that write-behind conflicts must be handled heuristically or escalated for manual adjustment by a human operator. |
| | |
| | As a rule of thumb, mapping each cache entry update to a logical database transaction is ideal, as this guarantees the simplest database transactions. |
| | |
| | Because write-behind effectively makes the cache the system-of-record (until the write-behind queue has been written to disk), business regulations must allow cluster-durable (rather than disk-durable) storage of data and transactions. |
| | |
| | In Coherence 3.1 and earlier, rebalancing (due to failover/failback) will result in the re-queueing of all cache entries in the affected cache partitions (typically 1/N where N is the number of servers in the cluster). While the nature of write-behind (asynchronous queueing and load-averaging) minimizes the direct impact of this, for some workloads this can be problematic. These applications should make use of {javadoc:com.tangosol.net.cache.VersionedBackingMap}. In Coherence 3.2, backups will be notified when a modified entry has been successfully written to the datasource, avoiding the need for this strategy. |
| | |
| | h2. Selecting a Cache Strategy |
| | |
| | *Read-Through/Write-Through vs cache-aside* |
| | There are two common approaches to the cache-aside pattern in a clustered environment. One involves checking for a cache miss, then querying the database, populating the cache, and continuing application processing. This can result in multiple database hits if different application threads perform this processing at the same time. Alternatively, applications may perform double-checked locking (which works since the check is atomic with respect to the cache entry). This, however, results in a substantial amount of overhead on a cache miss or a database update (a clustered lock, additional read, and clustered unlock -- up to 10 additional network hops, or 6-8ms on a typical gigabit ethernet connection, plus additional processing overhead and an increase in the "lock duration" for a cache entry). |
| | |
| | By using inline caching, the entry is locked only for the 2 network hops (while the data is copied to the backup server for fault-tolerance). The locks are also maintained locally on the partition owner. Furthermore, application code is fully managed on the cache server, meaning that only a controlled subset of nodes will directly access the database (resulting in more predictable load and security). Finally, this decouples cache clients from database logic. |
| | |
| | *Refresh-Ahead vs Read-Through* |
| | Refresh-ahead offers reduced latency compared to read-through, but only if the cache can accurately predict which cache items are likely to be needed in the future. With full accuracy in these predictions, refresh-ahead will offer reduced latency and no added overhead. The higher the rate of misprediction, the greater the impact will be on throughput (as more unnecessary requests will be sent to the database) -- potentially even having a negative impact on latency should the database start to fall behind on request processing. |
| | |
| | *Write-Behind vs Write-Through* |
| | If the requirements for write-behind caching can be satisfied, write-behind caching may deliver considerably higher throughput and reduced latency compared to write-through caching. Additionally, write-behind caching lowers the load on the database (fewer writes) and on the cache server (reduced cache value deserialization). |
| | |
| | h2. Idempotency |
| | |
| | All {{CacheStore}} operations should be designed to be idempotent (that is, repeatable without unwanted side-effects). For write-through and write-behind caches, this allows Coherence to provide low-cost fault-tolerance for partial updates by re-trying the database portion of a cache update during failover processing. For write-behind caching, idempotency also allows Coherence to combine multiple cache updates into a single {{CacheStore}} invocation without affecting data integrity. |
| | |
| | Applications that require write-behind caching but must avoid write-combining (e.g. for auditing reasons) should create a "versioned" cache key (e.g. by combining the natural primary key with a sequence id). |
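One possible shape for such a versioned key is sketched below. The class name {{VersionedKey}} and its fields are hypothetical; the only point illustrated is that two updates to the same natural key become two distinct cache entries, so neither is collapsed into the other.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical composite key: natural primary key plus a sequence id.
final class VersionedKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String naturalKey;
    private final long sequence;

    VersionedKey(String naturalKey, long sequence) {
        this.naturalKey = Objects.requireNonNull(naturalKey);
        this.sequence = sequence;
    }

    String naturalKey() { return naturalKey; }
    long sequence() { return sequence; }

    // Equality covers both parts, so each version is its own cache entry.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof VersionedKey)) return false;
        VersionedKey other = (VersionedKey) o;
        return sequence == other.sequence && naturalKey.equals(other.naturalKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(naturalKey, sequence);
    }

    @Override
    public String toString() {
        return naturalKey + "#" + sequence;
    }
}
```

How the sequence id is generated (for example, per natural key or globally) is left open here and depends on the application.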
| | |
| | h2. Write-Through Limitations |
| | |
| | Coherence does not support two-phase {{CacheStore}} operations across multiple {{CacheStore}} instances. In other words, if two cache entries are updated, triggering calls to {{CacheStore}} modules sitting on separate cache servers, it is possible for one database update to succeed and for the other to fail. In this case, it may be preferable to use a cache-aside architecture (updating the cache and database as two separate components of a single transaction) in conjunction with the application server transaction manager. In many cases it is possible to design the database schema to prevent logical commit failures (but obviously not server failures). Write-behind caching avoids this issue as "puts" are not affected by database behavior (and the underlying issues will have been addressed earlier in the design process). This limitation will be addressed in an upcoming release of Coherence. |
| | |
| | h2. Cache Queries |
| | [Cache queries|Querying the Cache] only operate on data stored in the cache and will not trigger the {{CacheStore}} to load any missing (or potentially missing) data. Therefore, applications that query {{CacheStore}}-backed caches should ensure that all necessary data required for the queries has been pre-loaded. For efficiency, most bulk load operations should be done at application startup by streaming the dataset directly from the database into the cache (batching blocks of data into the cache via {javadoc:com.tangosol.util.ConcurrentMap#putAll(java.util.Map)|NamedCache.putAll()}). The loader process will need to use a "[Controllable CacheStore|http://forums.tangosol.com/thread.jspa?messageID=1788]" pattern to disable circular updates back to the database. The {{CacheStore}} may be controlled via Invocation service (sending agents across the cluster to modify a local flag in each JVM) or by setting the value in a Replicated cache (a different cache service) and reading it in every {{CacheStore}} method invocation (minimal overhead compared to the typical database operation). A custom MBean can also be used, a simple task with Coherence's clustered JMX facilities. |
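The local-flag variant of the controllable pattern might look roughly like the sketch below. {{ControllableStoreSketch}} is a hypothetical class with an in-memory map standing in for the database; how the flag is flipped in each JVM (Invocation service agent, Replicated cache lookup or MBean) is outside the sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical store whose write path can be switched off during a bulk load.
class ControllableStoreSketch {
    private final AtomicBoolean writesEnabled = new AtomicBoolean(true);
    private final Map<Object, Object> database = new ConcurrentHashMap<>(); // stand-in datasource
    private final AtomicInteger writeCount = new AtomicInteger();

    void setWritesEnabled(boolean enabled) {
        writesEnabled.set(enabled);
    }

    void store(Object key, Object value) {
        if (!writesEnabled.get()) {
            return; // bulk load in progress: the data came from the database, do not write it back
        }
        database.put(key, value);
        writeCount.incrementAndGet();
    }

    void erase(Object key) {
        if (writesEnabled.get()) {
            database.remove(key);
        }
    }

    int writeCount() {
        return writeCount.get();
    }
}
```

A loader process would disable writes, stream the dataset into the cache, then re-enable writes before normal application traffic begins.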
| | |
| | h2. Creating a CacheStore Implementation {anchor:CacheStore} |
| | |
| | {{CacheStore}} implementations are pluggable, and depending on the cache's usage of the datasource you will need to implement one of two interfaces: |
| | * {javadoc:com.tangosol.net.cache.CacheLoader|CacheLoader} for read-only caches |
| | * {javadoc:com.tangosol.net.cache.CacheStore|CacheStore} which extends {{CacheLoader}} to support read-write caches |
| | |
| | These interfaces are located in the {{com.tangosol.net.cache}} package. The {{CacheLoader}} interface has two main methods: {javadoc:com.tangosol.net.cache.CacheLoader#load(java.lang.Object)|load(Object key)} and {javadoc:com.tangosol.net.cache.CacheLoader#loadAll(java.util.Collection)|loadAll(Collection keys)}, and the {{CacheStore}} interface adds the methods {javadoc:com.tangosol.net.cache.CacheStore#store(java.lang.Object,%20java.lang.Object)|store(Object key, Object value)}, {javadoc:com.tangosol.net.cache.CacheStore#storeAll(java.util.Map)|storeAll(Map mapEntries)}, {javadoc:com.tangosol.net.cache.CacheStore#erase(java.lang.Object)|erase(Object key)} and {javadoc:com.tangosol.net.cache.CacheStore#eraseAll(java.util.Collection)|eraseAll(Collection colKeys)}. |
| | |
| | See the section titled [sample cache store|Sample CacheStore] for an example implementation. |
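To make the method lists above concrete, the sketch below declares local stand-in interfaces with the same method names and an in-memory implementation. {{LoaderSketch}}, {{StoreSketch}} and {{InMemoryStoreSketch}} are hypothetical; they are not the {{com.tangosol.net.cache}} types, and the generic signatures are an assumption made so the example compiles on its own.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Local stand-ins mirroring the method names listed above; NOT the Coherence interfaces.
interface LoaderSketch {
    Object load(Object key);
    Map<Object, Object> loadAll(Collection<?> keys);
}

interface StoreSketch extends LoaderSketch {
    void store(Object key, Object value);
    void storeAll(Map<?, ?> entries);
    void erase(Object key);
    void eraseAll(Collection<?> keys);
}

// In-memory "table" standing in for a real datasource.
class InMemoryStoreSketch implements StoreSketch {
    private final Map<Object, Object> table = new ConcurrentHashMap<>();

    public Object load(Object key) {
        return table.get(key); // null signals "not in the datasource"
    }

    public Map<Object, Object> loadAll(Collection<?> keys) {
        Map<Object, Object> found = new HashMap<>();
        for (Object key : keys) {
            Object value = load(key);
            if (value != null) {
                found.put(key, value); // keys without data are simply left out
            }
        }
        return found;
    }

    public void store(Object key, Object value) {
        table.put(key, value); // insert-or-update, so repeating the call is harmless
    }

    public void storeAll(Map<?, ?> entries) {
        for (Map.Entry<?, ?> e : entries.entrySet()) {
            store(e.getKey(), e.getValue());
        }
    }

    public void erase(Object key) {
        table.remove(key);
    }

    public void eraseAll(Collection<?> keys) {
        for (Object key : keys) {
            erase(key);
        }
    }
}
```

A real implementation would replace the map operations with datasource access (for example, JDBC statements) while keeping each operation repeatable.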
| | |
| | h2. Plugging in a CacheStore Implementation |
| | |
| | To plug in a {{CacheStore}} module, specify the {{CacheStore}} implementation class name within the {{[distributed-scheme]}}/{{[backing-map-scheme|distributed-scheme#backing-map-scheme]}}/{{[read-write-backing-map-scheme]}}/{{[cachestore-scheme]}} cache configuration element. |
| | |
| | The {{[read-write-backing-map-scheme]}} configures a {javadoc:com.tangosol.net.cache.ReadWriteBackingMap}. This backing map is composed of two key elements: an internal map that actually caches the data (see {{[internal-cache-scheme|read-write-backing-map-scheme#internal-cache-scheme]}}), and a {{CacheStore}} module that interacts with the database (see {{[cachestore-scheme]}}). |
| | |
| | {code:xml} |
| | <?xml version="1.0"?> |
| | |
| | <!DOCTYPE cache-config SYSTEM "cache-config.dtd"> |
| | |
| | <cache-config> |
| | <caching-scheme-mapping> |
| | <cache-mapping> |
| | <cache-name>com.company.dto.*</cache-name> |
| | <scheme-name>distributed-rwbm</scheme-name> |
| | </cache-mapping> |
| | </caching-scheme-mapping> |
| | |
| | |
| | <caching-schemes> |
| | <distributed-scheme> |
| | <scheme-name>distributed-rwbm</scheme-name> |
| | <backing-map-scheme> |
| | <read-write-backing-map-scheme> |
| | |
| | <internal-cache-scheme> |
| | <local-scheme/> |
| | </internal-cache-scheme> |
| | |
| | <cachestore-scheme> |
| | <class-scheme> |
| | <class-name>com.company.MyCacheStore</class-name> |
| | <init-params> |
| | <init-param> |
| | <param-type>java.lang.String</param-type> |
| | <param-value>{cache-name}</param-value> |
| | </init-param> |
| | </init-params> |
| | </class-scheme> |
| | </cachestore-scheme> |
| | </read-write-backing-map-scheme> |
| | </backing-map-scheme> |
| | </distributed-scheme> |
| | </caching-schemes> |
| | </cache-config> |
| | {code} |
| | |
| | {info}The {{[init-params|init-params (cache)]}} element contains an ordered list of parameters that will be passed into the {{CacheStore}} constructor. The {{\{cache-name\}}} configuration macro is used to pass the cache name into the {{CacheStore}} implementation, allowing it to be mapped to a database table. For a complete list of available macros, please see the documentation section titled [Parameter Macros]. |
| | |
| | For more detailed information on configuring write-behind and refresh-ahead, please see the {{read-write-backing-map-scheme}} [element documentation|read-write-backing-map-scheme#configuration-elements], taking note of the {{write-batch-factor}}, {{refresh-ahead-factor}}, {{write-requeue-threshold}} and {{rollback-cachestore-failures}} elements.{info} |
| | |
| | {note:title=Thread Count} |
| | The use of a {{CacheStore}} module will substantially increase the consumption of cache service threads (even the fastest database select is orders of magnitude slower than updating an in-memory structure). As a result, the cache service [thread count| DistributedCache Service Parameters#thread-count] will need to be increased (typically in the range 10-100). The most noticeable symptom of an insufficient thread pool is increased latency for cache requests (without corresponding behavior in the backing database). |
| | {note} |
| | |
| | h2. Implementation Considerations |
| | Please keep the following in mind when implementing a {{CacheStore}}. |
| | |
| | h3. Re-entrant Calls |
| | The {{CacheStore}} implementation must not call back into the hosting cache service. This includes OR/M solutions that may internally reference Coherence cache services. Note that calling into another cache service instance is allowed, though care should be taken to avoid deeply nested calls (as each call will "consume" a cache service thread and could result in deadlock if a cache service threadpool is exhausted). |
| | |
| | h3. Cache Server Classpath |
| | The classes for cache entries (a.k.a. Value Objects, Data Transfer Objects, etc.) must be in the cache server classpath (as the cache server must serialize-deserialize cache entries to interact with the {{CacheStore}} module). |
| | |
| | h3. CacheStore Collection Operations |
| | The {javadoc:com.tangosol.net.cache.CacheStore#storeAll(java.util.Map)|CacheStore.storeAll()} method is most likely to be used if the cache is configured as write-behind and the {{write-batch-factor}} element is configured. The {javadoc:com.tangosol.net.cache.CacheLoader#loadAll(java.util.Collection)|CacheStore.loadAll()} method is not used by Coherence at present. For similar reasons, its first use will likely require refresh-ahead to be enabled. |
| | |
| | h3. Connection Pools |
| | Database connections should be retrieved from the container connection pool (or a 3rd party connection pool) or by using a thread-local lazy-initialization pattern. As dedicated cache servers are often deployed without a managing container, the latter may be the most attractive option (though the cache service thread-pool size should be constrained to avoid excessive simultaneous database connections). |
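The thread-local lazy-initialization pattern mentioned above can be sketched generically. {{PerThreadResource}} and its {{factory}} are hypothetical; in a {{CacheStore}} the factory would open a database connection, and closing connections when threads end is not handled here.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical holder that lazily creates one resource per calling thread.
class PerThreadResource<T> {
    private final ThreadLocal<T> holder;
    private final AtomicInteger created = new AtomicInteger();

    PerThreadResource(Supplier<T> factory) {
        // The factory runs at most once per thread, on that thread's first get().
        this.holder = ThreadLocal.withInitial(() -> {
            created.incrementAndGet();
            return factory.get();
        });
    }

    T get() {
        return holder.get();
    }

    // Number of resources created so far; bounded by the number of threads that called get().
    int createdCount() {
        return created.get();
    }
}
```

Since each cache service thread ends up holding its own resource, the number of simultaneous connections is bounded by the thread-pool size, which is why that size should be constrained.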