This is an example implementation of Push Replication pattern.
The Push Replication Pattern advocates that;
Operations (such as insert, update and delete) occurring on Data as one Site should be pushed using one or more Publishers to associated Destinations.
Where;
A Publisher is responsible for optimistically reproducing the said Operations (typically in the order in which they originally occurred) at the associated Destination.
If a Destination becomes unavailable for some reason, the Operations to be reproduced should be queued so that they may be reproduced (in the original order in which they occurred) at a later point in time.
This implementation of the Push Replication Pattern additionally advocates that;
The Data on which Operations occur are Coherence Cache Entries.
Operations to be reproduced include inserts, updates, and deletions of Cache Entries*. This includes NamedCache put, remove method calls together with side-effects created by invoking EntryProcessors.
The Operations that are enqueued for reproduction are called Entry Operations. They contain the key, old and new values of the cache entries in Binary form.
The Destination at which Entry Operations will be replicated/reproduced is dependent on the implementation of the associated Publisher
Custom Publisher implementations may be implemented and configured to publish Entry Operations to any Destination.
Typical Publishers and thus Destinations may be anyone of the following; a local cluster, a remote cluster, a file system, a database, an i/o stream, a logging system etc.
A Cluster may act as both a sender of Operations and receiver of Operations. That is, multi-way multi-site push replication is permitted.
Rationale
The purpose of this pattern is to provide an extensible, high-performance, highly-available, general purpose scalable framework to support the reproducing Entry Operations occurring in one or more Caches in a Coherence Cluster to one or more possibly globally distributed Coherence Clusters, Caches or other Destinations.
This project (like other Coherence Incubator projects) uses Apache Ivy for dependency specification and management. While a standard ivy.xml definition file ships with the source and documentation distribution, the following diagram visually indicates the current dependencies.
What's New?
This release of the Push Replication implementation encapsulates many significant changes since the release of Push Replication Pattern 2.6.0. While the approach and central concepts used in the implementation remain mostly unchanged, almost all of the interfaces and implementing classes have been refactored to achieve the goals of the release, namely to allow developers to;
a). Completely specify Push Replication configuration declaratively in a standard Coherence Cache Configuration file. For the most part this means that Push Replication may now be adopted without changing implementing application classes or developing initialization applications for the pattern.
b). Replace as needed the internal infrastructure Push Replication uses for "messaging" and managing queues when asynchronously publishing Entry Operations. The goal being to enable organizations to leverage existing messaging infrastructure for Push Replication. With in the implementation this infrastructure is now known as the "Push Replication Provider".
The following changes have been made since the original Push Replication 3.0.0 release;
Resolved issue where ConflictResolvers were required to be serializable. This is no longer the case as we use ClassScheme<ConflictResolver>s instead (these are naturally serializable)
Added support for use of the cache-name macro parameter in local/remote cache names
Resolved issue where the JMX MBeans for PublishingServices where not being removed from local JMX registries when they moved from one node to another in the cluster.
Resolved issue where the JMX MBean "suspend" operation would not work during recovery of a PublishingService
Resolved issue where the first few updates on a cache when a cluster started would not be published with a Publisher.
Corrected misspelling of the get/setRemotePublisherScheme in the RemoteClusterPublisherScheme class
Resolved issue where numerous Remote Cluster/Cache publishers may indefinitely hold a lock on the local cluster configuration, thus making it impossible to start other local services and/or all extend members to connect to the cluster lazily.
Resolved issue where the use of a RemoteCachePublisher may cause infinite cluster publishing loops between clusters.
The following changes were made as part of in the original Push Replication 3.0.0 release;
Removed all uses of static classes. Push Replication now uses the Extensible Environments provided by Coherence Common 1.7.0.
Removed need to hard code individual site names in configuration. Instead site and cluster names are automatically detected based on cluster provided information.
Configuring the Push Replication Pattern
As mentioned above, this release provides developers with the ability to use xml declarations with in Coherence Cache Configuration files to configure how Push Replication operates. Like the processing pattern, this has been achieved through the introduction of the Push Replication Namespace, the reference documentation for which is available here.
The following outlines the xml configuration structure for the Push Replication Pattern and where changes typically occur with in a Coherence Cache Configuration document.
In the "Active Passive" deployment model updates to data made in the active grid are are sent to the passive grid asynchronously and ordered per NamedCache.
"Hub and Spoke" aka "Master and Slaves"
In the "Hub and Spoke" deployment model updates to data made in the active "hub" grid are are sent to any number of passive "spoke" grids asynchronously and ordered per NamedCache.
"Active Active" aka "Hot Hot" aka "Federated"
In the "Active Active" deployment model updates to data made in either of the active grids are are sent to other active grid asynchronously and ordered per NamedCache.
"Centralized Replication"
In the "Centralized Replication" model, a cluster serves as a hub to capture and route Entry Operations to a set of "leaf" clusters. This strategy best when a cluster "owns" a set of entries in a cache (i.e. is exclusively responsible for updating a set of entries between all clusters). Leaf clusters only ever publish to a "hub" cluster, which is responsible for processing all Entry Operations from all leafs, and propagating said Entry Operations to all other leafs clusters.
In this model, one of the clusters is configured as a "hub" by specifying the <replication-role>. Each of the other clusters is designated as "leaf" clusters.
With these properties, Entry Operations occurring in each cluster that are artifacts of another cluster are reproduced locally and are not further published elsewhere. Thus "leaf" clusters behave as end-points for replication activity.
However the "hub" cluster will publisher all Entry Operations (both those that occurred locally and those from leaf clusters) to all other clusters (except the cluster from which the Entry Operation originated)
Each model additionally supports Conflict Resolution at the destination site through the specification of ConflictResolvers. The default conflict resolver (called a BruteForceConflictResolver) will simply overwrite the existing value - that is, last write wins.
We highly recommend that all clusters using Push Replication are uniquely identifiable. To achieve this, each cluster should be configured such that the combination of their site and cluster names are unique. To declare the name of the geographical site in which a cluster is located you can either use the Coherence system override property tangosol.coherence.site or configure the <cluster-config>. Likewise to declare the name of a cluster use either the Coherence system override property tangosol.coherence.cluster or again, configure the the <cluster-config>.
Frequently Asked Questions
What are the ordering characteristics of push replication?
Push Replication publishes information based on mutations, called Entry Operations, of cache entries. It relies on the Push Replication Provider to provide ordering of Entry Operations for publishing. This means that for a given entry in a cache, any updates made to that entry will be published in the same order, unless of course the publishers are using Publishing Transformers that may mutate the Entry Operations prior to publishing. The ordering of publication across different entries in a cache, or between updates to different caches is not enforced.
How can the Push Replication Publishers be monitored?
Like other patterns in the Incubator and Coherence itself, Push Replication supports monitoring and some management via JMX. By simply enabling JMX on the Coherence Cluster, each of the PublishingServices will be presented in the JMX tree, detailing current publishing state and statistics.
How do I enable Push Replication to one or more (remote) Coherence Clusters?
In order to publish to a remote cluster, you must;
Configure and enable one or more proxies on the remote cluster(s).
Ensure that the members of the remote cluster(s) have the Push Replication Pattern (and dependencies) in the class path
Configure and enable one or more Remote Invocation Services in the "hub" with the addresses of the remote cluster proxy members (or if you're using Coherence 3.4+, use an AddressProvider). An example scheme is defined in the coherence-pushreplicationpattern-cache-config.xml file.
Declare an appropriate RemoteClusterPublisherScheme or RemoteCachePublisherScheme for each cache.
How are conflicts resolved when publishing Entry Operations into caches in local or remote clusters?
The LocalCachePublisherSchemes and RemoteCachePublisherSchemes classes support the specification of a ConflictResolvers in their configuration. By specifying a ConflictResolver in your application configuration, Push Replication may then detect and appropriately resolve any underlying conflicts. When not specified, the BruteForceConflictResolver is used as the default.