GemFire Enterprise
Architectural Overview
Release 5.0
1 - What is GemFire?
GemFire Enterprise is a high-performance, distributed operational data management infrastructure that sits
between your clustered application processes and back-end data sources to provide very low-latency, predictable, high-throughput data sharing and event distribution.
GemFire harnesses the memory and disk resources across a network to form a real-time data fabric or grid.
By primarily managing data in memory, GemFire enables extremely high-speed data sharing that turns a network of machines into a single logical data management unit: a data fabric.
GemFire is used for managing operational data. Unlike a traditional centralized, disk-based database management system used for managing very large quantities of data, GemFire is a real-time data sharing facility specifically optimized for the operational data needed by real-time applications: the 'now' data, the fast-moving data shared across many processes and applications. It is a layer of abstraction in the middle tier that collocates frequently-used data with the application and works with back-end databases behind the scenes.
Figure: GemFire provides data access and synchronization between application processes and back-end databases and files
It does this without compromising the availability or consistency of data: a configurable policy dictates the number of redundant memory copies for various data types, stores data synchronously or asynchronously on disk, and uses a variety of failure detection models built into the distribution system to ensure data correctness.
Peer-to-Peer Topology
In a peer-to-peer distributed system, applications can use GemFire 'mirroring' to replicate the data to other
nodes or to partition large volumes of data across multiple nodes.
Peer-to-Peer Scenario: Embedded cache
Figure: An embedded cache shares the application's process space, holds a partial data set, and is backed by the database
In this cache configuration, the cache is maintained locally in each peer application
and shares the cache space with the application memory space. Given the proximity
of the data to the application, this configuration offers the best performance when
the cache hit rate is high. Multiple such embedded caches can be linked together to
form a distributed peer-to-peer network.
When using the embedded cache, applications configure data regions either using an
XML-based cache configuration file or through runtime invocation of the GemFire
API. The behavior of the cache and the patterns of distribution reflect the cache
configuration options chosen by one of these methods. For example, a data region
configured as a 'mirror' will replicate all data and events on the region across the
distributed system when the member joins. This 'mirror' acts as a magnet: all operations on the distributed data region are automatically replicated on it, resulting in a union of all entries in that region across the distributed system.
Figure: A mirrored region aggregates all entries, while other peers hold partial data backed by the database
Figure: Partitioning data across many nodes in 'virtualized' data buckets
In a 'partitioned cache', all of the cache instances have unique regions (except for mirrored regions), and the
peer-to-peer model can scale to over 100 cache instances holding tens to hundreds of gigabytes of data in
main memory. Applications simply operate on the data locally; behind the scenes, GemFire manages the data across the partitions of the distributed system while ensuring that data access is at most a single network hop away. Region configurations control the memory management and redundancy behavior of these partitioned regions. The following declaration creates a region that is part of a larger partitioned region, and it indicates that all data will be dynamically put in 'buckets' determined by GemFire hashing algorithms. These hashed 'buckets' are spread across the available nodes (the default is 113 buckets, but it is configurable). The declaration also indicates that GemFire will replicate each bucket and maintain the copy on a different node.
<cache>
  <vm-root-region name='items'>
    <region-attributes>
      <partition-attributes redundant-copies='1'>
      </partition-attributes>
    </region-attributes>
  </vm-root-region>
</cache>
Example: Declaring a partitioned region
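The same region can be created programmatically. The sketch below assumes the PartitionAttributesFactory API and an already-created Cache named 'cache':
// A sketch, assuming the PartitionAttributesFactory API and an existing cache
AttributesFactory factory = new AttributesFactory();
PartitionAttributesFactory partitionFactory = new PartitionAttributesFactory();
partitionFactory.setRedundantCopies(1); // keep one redundant copy of each bucket
factory.setPartitionAttributes(partitionFactory.create());
Region items = cache.createVMRegion("items", factory.createRegionAttributes());
Example: Creating a partitioned region programmatically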
Configuring the required level of redundancy automatically configures the error handling: when a node
holding a partition fails, the data regions held in it are redistributed to other nodes to ensure that the desired
redundancy levels are maintained (with no duplication of an entry on any given node). Further, node failures will cause client requests to be redirected to a backup node automatically.
New members can be dynamically added (or removed) to increase memory capacity as the data volume managed in the data region increases, without impacting any deployed applications. Thus, GemFire is very useful in applications where the operational data can grow to an unknown size and/or the rate of data change is high.
Scaling a peer-to-peer system
A peer-to-peer system offers low latency, one-hop data distribution, dynamic discovery, and location transparency. In this arrangement, members maintain direct, socket-to-socket connections to one another. As the number of members grows, the number of connections grows quadratically, which limits the scalability of a purely connection-based system to a few score members. To make the system more scalable, GemFire offers reliable multicast (UDP) for scalable distribution of messages between data regions. When regions have this option enabled, the system may contain tens or hundreds of thousands of members without significant performance degradation.
Peer-to-peer distribution is useful for replicating data among servers, guaranteeing that the data is always
available to clients. The next section outlines how a client-server topology uses a peer-to-peer architecture
to replicate data in a server tier and how clients can use GemFire's load-balancing policies to maximize
server performance.
Client-Server Topology
Client-server caching is a deployment model that allows a large number of caches in any number of nodes
to be connected to one another in a client-server relationship. In this configuration, client caches (GemFire caches at the outer edge of the hierarchy) communicate with server caches on back-end servers. Server caches can in turn be clients of other server caches or can replicate their data by enlisting server peers as mirrors. This is a federated approach to caching data for a very large number of consumers and is a highly scalable topology.
Applications use the client-server topology to host the data in one or more GemFire 'cache servers'. These
cache servers are configured via XML files to host the data regions and to replicate one another's data using
the peer-to-peer topology; cache servers can explicitly specify data regions to be 'mirrors' for replicating
server-tier data to each other. As the 'mirroring' attribute is configured at a data region level for each server,
application deployments have the flexibility to manage how and where the data is replicated across servers.
For instance, one data region could be configured to 'mirror' on each cache server, while a second data region could be configured such that only a partial data set is distributed across cache servers and the server can fetch the missing data from a back-end database upon a miss.
Transparent data access
A client can fetch data from its server transparently and on demand: if a client requests data that is not in its
local cache (a miss) the request is delegated to the server cache. A miss on the server will typically result in
the data being fetched from either one of its peers, another server in another server tier, or the back-end
database that it (or one of its peers) is connected to. This configuration is well suited for architectures where
there are a large number of distributed application processes, each caching a facet of the data originating
from one or more server caches. With directed, point-to-point cache communications between the clients
and the servers, the traffic to the client caches is dramatically reduced. The server cache is typically deployed on a bigger machine and, by holding common data in its cache, shields the back-end data source from unnecessary traffic. The servers can be part of a distributed system, connected to one another as peers, mirroring data with one another for high availability and for load balancing of client requests.
Client-server communications
Communication between a server and a client is based on a connection created by the client. This allows a
client to connect to servers over a firewall. Client caches can register interest lists with a server that identify
the subset of data entries that they are interested in and when these data entries change, the server will send
updates to the interested clients. Server-to-client communication can be made asynchronous, if necessary,
via a server queuing mechanism that allows for a configurable maximum limit on the number of entries in
the queue. Events pushed into the queue can also be conflated, so that the client receives only the most recent value for a given region entry. This ensures that the client-side performance does not bottleneck the
cache server and impede its ability to scale to a large number of clients.
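As a sketch, assuming the client-region interest API and a region named 'quotes', a client might register its interest list like this:
// Sketch: a client registers interest in a subset of entries
Region quotes = cache.getRegion("quotes");
quotes.registerInterest("IBM"); // receive updates for this key
quotes.registerInterestRegex("ORCL.*"); // and for keys matching this pattern
Example: Registering a client interest list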
A multi-threaded client will, by default, open a separate socket to the server for each thread that communicates with the server. In return, the server must open a socket for each of the client sockets. As the number of clients grows, server performance can degrade due to the large number of sockets it uses to communicate with the clients. One way to manage this is to configure each client's threads to share a pool of sockets. This frees the server to communicate all of a client's requests over the same, single socket and greatly increases the scalability of the clients.
Figure: When a cache server fails, its clients retry against the remaining cache servers
When a failed server responds to a ping, it is moved back to the healthy server list and the clients are load balanced to include the server. The time period between failed-server pings is also configurable.
Figure: A server on the 'failed server' list recovers and clients reconnect to it
Data on demand
GemFire provides a simple set of plug-in interfaces for application developers to enable connectivity with
remote data sources. To dynamically fetch data from an external data store, application developers plug in a
'CacheLoader' to load data from the external source into a GemFire cache. This loader is automatically invoked when an object lookup in the cache results in a miss. Upon loading data into the caller's local cache,
GemFire distributes the new data to other cache nodes in accordance with the local cache's distribution policy and the remote cache's 'mirroring' policies. To synchronize changes to data in the cache with the external data store, the developer installs a write-through 'CacheWriter' in the cache that has close proximity to
the data store. This writer synchronously writes changes to the data store before applying the change to the
distributed cache.
Both plugins (the CacheLoader and CacheWriter) can be configured either through the GemFire cache
XML configuration or dynamically with the GemFire caching APIs. Each loader or writer is associated with
a single cache region, so the developer may install different loaders and writers in different cache regions.
GemFire also offers flexibility on where cache loaders, writers and listeners are invoked. For instance, in a
widely distributed environment, the data source may not be accessible from all nodes due to security constraints or the network topology. A cache miss on a cache that does not have access to the data source will
automatically fetch the data from a remote data loader (usually in close proximity to the data source). Similarly, writers and listeners can be executed remotely. This loose coupling of applications to data sources
allows new applications to scale across an enterprise without unnecessary costs associated with replicating
data.
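A minimal cache loader might look like the sketch below, where ItemDatabase is a hypothetical stand-in for the external data source:
import com.gemstone.gemfire.cache.CacheLoader;
import com.gemstone.gemfire.cache.CacheLoaderException;
import com.gemstone.gemfire.cache.LoaderHelper;

public class ItemLoader implements CacheLoader {
  private final ItemDatabase database = new ItemDatabase(); // hypothetical external source

  // invoked automatically when a 'get' misses in the region
  public Object load(LoaderHelper helper) throws CacheLoaderException {
    return database.findById(helper.getKey());
  }

  public void close() {} // release any connections to the external source
}
Example: A cache loader that fetches missing entries from an external data source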
WAN Topology
Peer-to-peer clusters are subject to scalability problems due to the inherent tight coupling between the
peers. These scalability problems are magnified if a data center has multiple clusters or if the data center
sites are spread out geographically across a WAN. GemFire offers a novel model to address these topologies, ranging from a single peer-to-peer cluster all the way to reliable communications between data centers
across the WAN. This model allows distributed systems to scale out in an unbounded and loosely-coupled
fashion without loss of performance, reliability and data consistency. At the core of this architecture is a
gateway hub for connecting to distributed systems at remote sites in a loosely-coupled fashion.
Figure: Gateway hub distributing data across a WAN. Each site's hub (New York, Tokyo, London) holds a message queue and dispatcher per remote gateway and receives acknowledgments from the receiving sites.
Each GemFire distributed system can assign a process as its gateway hub, which may contain multiple
gateways that connect to other distributed systems. Updates that are made in one system can be propagated
to another system via a queuing mechanism managed by each gateway. The receiving distributed system
sends acknowledgments after the messages have been successfully processed at the other end. In this fashion, data consistency is ensured across the data centers. Messages in the gateway queue are processed in
batches, which can be size-based or time-based.
GemFire Architecture
The declarations below create a single gateway hub (server) that receives data from two locations through
gateways (clients). It receives data from the two gateways through port '11111' and sends acknowledgments
to each gateway through ports '22222' and '33333'.
<cache>
  <gateway-hub id='US' port='11111'>
    <gateway id='JP'>
      <gateway-endpoint id='JP' host='192.168.1.104' port='22222'/>
    </gateway>
    <gateway id='GB'>
      <gateway-endpoint id='GB' host='192.168.1.105' port='33333'/>
    </gateway>
  </gateway-hub>
</cache>
Example: Declaring a gateway hub and two gateways
GemFire can scale to thousands of nodes by having multiple distributed systems, where each distributed system hosts just a few cache servers and uses the WAN topology to create a mega-cluster of distributed systems.
Error handling
When messages sent via a gateway are not processed correctly at the receiving end, those messages are resent, or appropriate warnings or errors are logged, depending on the scenario in question. This flexible model allows several different topologies to be modeled such that data can be distributed or replicated across multiple data centers while single points of failure and data inconsistency issues are avoided.
Primary and secondary gateway hubs
Backup gateway hubs can be configured to handle automatic fail-over of the primary hub. As gateway hubs
come online, each attempts to get a lock on a globally visible object. The hub that gets the lock becomes the
primary. Any other hubs write a warning that they did not get the lock and idle, waiting on the lock. If the
primary hub fails, the lock is released and the next hub in line gets the lock and becomes the new primary.
The primary gateway hub maintains a persistent messaging queue that the secondary mirrors. This guarantees that when the secondary takes over from the primary, no data or messages will be lost.
Figure: Primary and secondary London gateway hubs. The secondary hub mirrors the primary's queue and takes over distribution to the New York and Tokyo gateways when the primary fails.
Case Studies
Here are some real-world examples of how these different topologies are being used in production by
GemFire customers.
Data Distribution
Each member typically creates one or more cache data 'regions'. Each region is created with configuration
attributes that control where the data is managed (the physical location), the region's data distribution characteristics, and its memory management and data consistency models.
Data distribution between members is intelligent: each member has enough topological information to know which members potentially share the data it is managing. This enables the distribution layer to route data, operations, and messages only to the right nodes.
By default, GemFire provides a peer-to-peer distribution model where each system member is aware of every other member. GemFire offers a choice in the transport layer: TCP/IP or reliable multicast (UDP). At the region level, multicast communication can be turned on or off based on local network policies and other considerations. Cache distribution and consistency policies are configured on a region-by-region basis.
Figure: A sending region distributes data to the receiving regions of the same name in other members
GemFire supports various distribution models, languages and consistency models for applications to choose
from.
For replicating data across many cache instances, GemFire offers the following options:
Replication on demand: In this model, the data resides only in the member region that originally created
it. Data is distributed to other members' regions only when the other members request the data. The data
is lazily pulled from the originator to the requesting member. Once the data arrives, it will automatically
receive updates as long as the member region retains the key or maintains interest in the object.
Key replication: This model can preserve network bandwidth and be used for low bandwidth networks
because it makes the data available to all members by pushing the data entry key to the other members.
Once a member has the key, it can dynamically 'get' the data when it needs it.
Total replication: Both the data and its corresponding key are replicated.
Region Scope
Data is stored in the cache in cache 'regions'. Each region generally contains a particular type of data or a
subset of data that logically belongs together. Each region has two key configuration parameters: its scope
and its mirror-type. Scope controls whether or not the region will distribute its data to other member regions of the same name, and if it does distribute the data, how it guarantees that the message and data arrived on the receiving end. Scope comes in four flavors:
Local - No distribution
Distributed-no-ack - Send the data to the first member and send the data to the next member without
waiting for any acknowledgment from the first
Distributed-ack - Send the data and wait for an acknowledgment from the receiver before sending the
data to the next member
Global - Before distributing any data, get a distributed lock on every entry that will be updated in the other members. Once the locks are held, distribute the data. When all data is securely in place, release all of the locks.
Scope controls the sending of the data.
When a member notices incoming data (it's listening on a socket buffer), it scans the incoming message to
determine how it should respond to the sender.
Synchronous communication without acknowledgment (distributed-no-ack scope)
Applications that do not have very strict consistency requirements but do have very low latency requirements should use the synchronous communication model without acknowledgments to synchronize data across cache nodes. This is the default distribution model and provides the lowest response time and highest throughput. Though the communication is out-of-band, the sending cache instance reduces the probability of data conflicts by attempting to dispatch messages as quickly as possible.
Synchronous communication with acknowledgment (distributed-ack scope)
Regions can also be configured to do synchronous messaging with acknowledgments from other cache
members. With this configuration, the control returns back to the application sender only after a receiving
cache has acknowledged acceptance of the data. Each message is sent synchronously in this way until all
receivers have been sent the data and each has acknowledged storing it.
Figure: With distributed-ack scope, the sender sends the first message, waits for acknowledgment, then sends the second message and waits again
Figure: With global scope, each entry is locked in the receiving regions, the operation is applied, and the entry is unlocked
Transactions - Operations on data in a local region can be wrapped in GemFire cache transactions to ensure the integrity of the data.
Region Mirroring
GemFire distributed regions are groups of data that have their own distribution, persistence, and data management policies. The data in the sending region is serialized into the socket buffer of each receiving member, and each member is sent the data synchronously. If a receiver slows down this synchronous behavior, the sender can treat the slow receiver differently by switching to asynchronous queuing of the messages to this 'slow receiver'.
When the region on the receiving end sees the data arrive in its socket buffer, it checks the header of the data to determine how to respond to the sender. Receiver acknowledgments are based on the sender's scope: the message contains a description of the sending region's scope in its header.
After determining when to acknowledge the message, the receiver processes the newly-arrived data according to its 'data-policy'. It can:
Store the data only if it already has an older version (mirror-type='none')
Store the key to the data and ignore the newly-arrived value (mirror-type='keys')
Store both the key and the value (mirror-type='keys-values')
Each distributed region in the distributed system has its own designated sending and receiving behavior, the result of partitioning the data into groups and assigning each group its own distribution characteristics according to the application's requirements.
Keeping updates to selected data (mirroring off)
If one member's region has been designed to hold a subset of the other members' region data, the region will:
load its data on initialization
be configured with mirror-type='none'
By loading the subset of data that it is designed to hold and by setting mirror-type='none', the region will only accept updates to data that it already has. Any new data sent by other members will be ignored.
Providing access to remote data (mirroring keys only)
Mirroring keys provides a region with access to all data in the other members, but saves memory by fetching the values on demand. When the local member requests data from the region, the region knows that the value exists elsewhere in the cache by virtue of having the key. It then uses the key to perform a 'net search' of the other members, fetches the data from another member, and stores it locally when it receives it. Once the value is stored locally, it will accept updates to the value as they arrive.
This scenario is useful when an application can't predict which data it will need on a given node, but once
data is accessed on the node, it will often be needed again.
Keeping all data locally (mirroring keys and values)
A region with mirror-type set to 'keys-values' is a true mirror. It will duplicate any region that has a full data set, or it will aggregate data from other regions into a full local data set. It is critical for creating redundancy in server tiers and for applications that require immediate access to all data on a node.
Cache Configuration
Data management policies and distribution patterns are typically established by each member's property file and XML cache configuration file. These configurations are remarkably easy to declare. This extract from the system property file indicates that the member will join the distributed system associated with mcast address 'goliath' and mcast port '10333'.
# in gemfire.properties file
mcast-address=goliath
mcast-port=10333
The next extract from the member's cache configuration file causes the member to create two regions. One region ('items') will distribute data and wait for acknowledgment from each member it distributes to (scope='distributed-ack') and will only accept updates to data that it already has (default mirror-type='none'). The other region ('offerings') will not wait for acknowledgment when it distributes data (it uses the default scope='distributed-no-ack') but will act as an aggregator of all of the 'offerings' entries in the distributed system caches (by declaring mirror-type='keys-values'):
<!-- in cache.xml file -->
<cache>
  <vm-root-region name='items'>
    <region-attributes scope='distributed-ack'>
    </region-attributes>
  </vm-root-region>
  <vm-root-region name='offerings'>
    <region-attributes mirror-type='keys-values'>
    </region-attributes>
  </vm-root-region>
</cache>
Example: Creating a cache and two regions declaratively
The same result can be achieved using the GemFire programming API. When the application launches, it can configure the cache and its regions by setting system properties and region attributes programmatically:
// connect to the distributed system
Properties properties = new Properties();
properties.setProperty("mcast-address", "goliath");
properties.setProperty("mcast-port", "10333");
DistributedSystem system = DistributedSystem.connect(properties);
// create the cache (it will be used to create the regions)
Cache cache = CacheFactory.create(system);
// create the distributed-ack 'items' region
AttributesFactory factory = new AttributesFactory();
factory.setScope(Scope.DISTRIBUTED_ACK);
Region region = cache.createVMRegion("items", factory.createRegionAttributes());
// create the mirrored 'offerings' region
factory = new AttributesFactory();
factory.setMirrorType(MirrorType.KEYS_VALUES);
region = cache.createVMRegion("offerings", factory.createRegionAttributes());
Example: Creating a cache and two regions programmatically
The GemFire data operations are equally simple: application developers 'put' data, 'get' data, 'remove' data, and 'listen' for actions and operations on the data.
// demonstrate region operations
Object key = "foo_key";
Object value = "foo_value";
region = cache.getRegion("items");
region.create(key, value);
value = region.get(key);
region.put(key, value);
region.invalidate(key);
region.destroy(key);
Example: GemFire cache operations
The simplicity of the API belies the power of the caching mechanisms. Persistence, transactions, overflow,
event notifications, data on demand, fail-over, wide area distribution, continuous queries ... all are easy to
configure and use because much of the cache behavior is transparent to the developer.
Cache Behavior
Automatic backup of region data
Three attributes ('persist-backup', 'disk-dirs' and 'disk-write-attributes') control how and where region data is backed up to disk and where data that overflows to disk is written. Up to four directories may be specified, with the data 'striped' into files across the specified directories. The 'disk-dirs' attribute defaults to one directory: the user's current directory. (See the 'Persistence and Overflow' section for more details.)
Distributed event notifications
Cache listeners can be used to provide asynchronous event notifications to any number of applications connected to a GemFire distributed system. Events on regions and region entries are automatically propagated to all members subscribing to the region. For instance, region events like adding, updating, deleting or invalidating an entry will be routed to all listeners registered with the region. Data regions can be configured with multiple cache listeners to act upon cache events, and the order of events can be preserved when cache operations are performed within the context of a transaction. Event notifications are also triggered when member regions leave or enter the distributed system, or when new regions are created. This enables application interdependencies to be modeled in SOA-like environments.
Applications can also subscribe to or publish region events without caching the data associated with those events. GemFire can thus be used as a messaging layer that sends and receives events to multiple cacheless clients, with regions being equivalent to message destinations (topics/queues). Unlike an enterprise messaging system, the programming model is very intuitive: applications simply operate on the object model in the cache without having to worry about message format, message headers, payload, etc. The messaging layer in GemFire is designed with efficiency in mind. GemFire keeps enough topology information in each member cache to optimize inter-cache communications, and GemFire Intelligent Messaging transmits messages only to those member caches that can process them.
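As an illustration, a listener might extend the CacheListenerAdapter convenience class (assuming com.gemstone.gemfire.cache.util.CacheListenerAdapter) and override only the callbacks it cares about:
import com.gemstone.gemfire.cache.EntryEvent;
import com.gemstone.gemfire.cache.util.CacheListenerAdapter;

public class AuditListener extends CacheListenerAdapter {
  // invoked in every subscribing member after an entry is created
  public void afterCreate(EntryEvent event) {
    System.out.println("created " + event.getKey());
  }

  // invoked in every subscribing member after an entry is updated
  public void afterUpdate(EntryEvent event) {
    System.out.println("updated " + event.getKey() + " = " + event.getNewValue());
  }
}
Example: A cache listener reacting to entry events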
How the 'get' operation affects the local entry value
If a 'get' operation misses and retrieves an entry value from outside the local cache through a local cache
loader or by fetching the data from another system member, it automatically attempts to 'put' the retrieved
value into the local cache for future reference. This 'put' will invoke any existing cache writer and run the local capacity controller if one is configured. Either of these plugins can, in turn, abort the attempt to store the value in the local cache (the successive 'put' operation). The original 'get' operation always brings the entry value to the calling application, but it is possible for the fetched data to remain unavailable in the local cache because the successive 'put' was aborted.
Dealing with member failures
In a standard peer-to-peer environment, all members are considered equivalent. But applications might organize members in a primary/secondary relationship, such that when the primary fails, one of the secondaries can take over with zero impact to the availability of the application. To facilitate this, GemFire provides a health-monitoring API and a membership-callback API. Applications register a membership event
listener and receive notifications when members join or exit the system. Member identity and an indication
of whether they exited gracefully or ungracefully are supplied to the application through these callbacks.
So, how does the distributed system detect member failures? Failure in the distribution layer is identified at various levels: socket exceptions that indicate link or application failures, timeouts in acknowledgment-based message processing, or timeouts/exceptions in 'heartbeat' processing. (Once a member joins the system, it sends out a 'heartbeat' at regular intervals through its connections to other members to describe its health to the others, and it monitors incoming heartbeats from other members for signs of their health.)
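A sketch of the membership-callback API is shown below; the class and method names follow the com.gemstone.gemfire.admin package and should be treated as assumptions:
import com.gemstone.gemfire.admin.*;

DistributedSystemConfig config = AdminDistributedSystemFactory.defineDistributedSystem();
AdminDistributedSystem adminSystem = AdminDistributedSystemFactory.getDistributedSystem(config);
adminSystem.connect();
adminSystem.addMembershipListener(new SystemMembershipListener() {
  public void memberJoined(SystemMembershipEvent event) {
    System.out.println("joined: " + event.getMemberId());
  }
  public void memberLeft(SystemMembershipEvent event) { // graceful exit
    System.out.println("left: " + event.getMemberId());
  }
  public void memberCrashed(SystemMembershipEvent event) { // ungraceful exit
    System.out.println("crashed: " + event.getMemberId());
  }
});
Example: Receiving membership callbacks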
It is also possible to shorten the time spent instantiating a serialized object; using the GemFire Instantiator
class to instantiate DataSerializable objects is much faster because it avoids the reflection mechanisms that
are used during the process of unmarshaling Serializable objects.
import java.awt.Point;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import com.gemstone.gemfire.DataSerializable;
import com.gemstone.gemfire.Instantiator;

public class Pixel implements DataSerializable {
  private Point point;
  private RGB rgb; // an application value class with public red, green, blue fields

  static {
    // register once, so GemFire can create Pixel instances without reflection
    Instantiator.register(new Instantiator(Pixel.class, (byte) 1) {
      public DataSerializable newInstance() {
        return new Pixel();
      }
    });
  }

  // constructors and accessor methods ...

  // creates an empty pixel whose data is filled in by its fromData() method
  public Pixel() {}

  // DataSerializable methods
  public void toData(DataOutput out) throws IOException {
    out.writeInt(point.x);
    out.writeInt(point.y);
    out.writeInt(rgb.red);
    out.writeInt(rgb.green);
    out.writeInt(rgb.blue);
  }

  public void fromData(DataInput in) throws IOException {
    point = new Point(in.readInt(), in.readInt());
    rgb = new RGB(in.readInt(), in.readInt(), in.readInt());
  }
}
To store and fetch user-defined .NET objects in the Java cache, the developer must provide Java implementations for these classes, use the provided tools to generate proxies based on the Java .class files, compile and link the .NET assembly, and include the assembly in the Visual Studio project. With these proxy libraries referenced in the project, .NET objects can be instantiated, passed as arguments to the operations, and declared as return types in the .NET/Java adaptor API. All marshaling and unmarshaling is done by the adaptor API wrappers.
C++ environments have two options: use the .NET wrapper to build managed C++ clients that use the
GemFire cache, or use GemFire Enterprise - C++ to provide native distributed caching for C++ applications. GemFire Enterprise - C++ caches objects in a native C++ format and eliminates the need for mapping
Java objects to C++.
Scenario: Hooking Microsoft Office clients up to the GemFire cache
Since the GemFire .NET Adaptor provides proxies to the GemFire Java API classes, developing Microsoft
Office applications that use the GemFire java cache is simple: just develop Office plugins in Visual Studio
and invoke the cache operations from the .NET adaptor classes. All communication from .NET to Java is handled transparently by the .NET Adaptor.
Figure: Region keys and values are written to backup files on disk
Persistence is specified by setting the 'persist-backup' region attribute to true.
<cache>
  <vm-root-region name='items'>
    <region-attributes persist-backup='true'>
      <disk-dirs>
        <disk-dir>vol1/items_backup</disk-dir>
        <disk-dir>vol2/items_backup</disk-dir>
        <disk-dir>vol3/items_backup</disk-dir>
        <disk-dir>vol4/items_backup</disk-dir>
      </disk-dirs>
      <disk-write-attributes>
        <asynchronous-writes time-interval='15'/>
      </disk-write-attributes>
    </region-attributes>
  </vm-root-region>
</cache>
Example: Declaring a persistent region and its asynchronous write behavior
Overflow
Overflow is implemented as an eviction option in the LRU capacity controllers and is specified in the cache
configuration file by installing a capacity controller in a region with an eviction action of 'overflow-to-disk'.
Figure: When a region exceeds capacity on a 'put', the least-recently-used values overflow to the overflow files on disk while the keys stay in memory
Configuring a region for overflow requires a declaration among the region's region-attributes:
<cache>
  <vm-root-region name='items'>
    <region-attributes persist-backup='true'>
      <capacity-controller>
        <class-name>
          com.gemstone.gemfire.cache.util.MemLRUCapacityController
        </class-name>
        <parameter name='maximum-megabytes'>
          <string>50</string>
        </parameter>
        <parameter name='eviction-action'>
          <string>overflow-to-disk</string>
        </parameter>
      </capacity-controller>
      <disk-dirs>
        <disk-dir>vol1/items_overflow</disk-dir>
        <disk-dir>vol2/items_overflow</disk-dir>
        <disk-dir>vol3/items_overflow</disk-dir>
        <disk-dir>vol4/items_overflow</disk-dir>
      </disk-dirs>
    </region-attributes>
  </vm-root-region>
</cache>
Example: Declaring a CapacityController that overflows data to disk
Persistence and Overflow In Combination
Used together, persistence and overflow keep all region entries (keys and data values) on disk with only the
most recently-used entries retained in memory. The removal of an entry value from memory due to overflow has no effect on the disk copy as all entries are already present there due to the persistence.
Figure: With persistence and overflow combined, all keys and values are already in the backup files (just like persistence), and least-recently-used values are evicted from memory when the region exceeds capacity
Performance Considerations
While the performance of accessing data on disk has been shown to be quite high, there are a couple of
considerations to take into account when using disk storage.
When an entry is overflowed, its value is written to disk, but the entry that held it and the entry's key remain
in the region. This scheme allows the region to receive updated entry values, but also incurs some overhead
in the VM: in addition to the size of the key data, each entry object itself occupies approximately 40 bytes
of data.
When a mirrored backup region is restored, it is populated with the entry data that was found on disk.
However, some of this data may be out of date with respect to the other members of the distributed system.
To ensure that the mirrored region has the most recent data values, the cache requests the latest value of
each piece of data from the other members of the distributed system. This operation could be quite expensive for a region that contains a lot of data.
Entry retrieval in overflow regions
For data value retrieval in a VM region with an overflow capacity controller installed, memory and disk are
treated as the local cache. When a get is performed in a region with overflow enabled, memory and then
disk are searched for the entry value. If the value is not found, the distributed system runs the standard operations for the retrieval of an entry whose value is missing from the local cache (loaders, remote cache
requests, etc.)
Figure: A 'get' for an entry whose value has overflowed to disk fetches the data from the overflow files and returns it
5 - Transactions
GemFire provides support for two kinds of transactions:
GemFire cache transactions - GemFire cache transactions control operations within the GemFire cache
while the GemFire distributed system handles data distribution in the usual way. Use cache transactions to
group the execution of cache operations and to gain the control offered by transactional commit and rollback.
JTA global transactions - JTA global transactions provide another level of transactional control. JTA is a standard Java interface that can coordinate GemFire transactions and JDBC transactions under one umbrella. The GemFire Enterprise product includes an implementation of JTA, but GemFire can enlist in transactions controlled by any implementation of JTA. When GemFire runs in a J2EE container with its own built-in JTA implementation, GemFire also works with the container's transaction manager.
Figure: Transactional operations copy data from the data region into a transaction view; commit moves the data back
The changes made by GemFire transactions are durable in that once the transaction's changes have been applied, they cannot be undone.
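A sketch of a cache transaction through the CacheTransactionManager follows; the region and keys are illustrative:
// group cache operations in a GemFire cache transaction
CacheTransactionManager txManager = cache.getCacheTransactionManager();
Region items = cache.getRegion("items");
txManager.begin();
try {
  items.put("item-1", "updated value"); // buffered in the transactional view
  items.destroy("item-2");
  txManager.commit(); // distributed according to each region's scope
} catch (CommitConflictException e) {
  // another commit changed the same entries first; this transaction's changes are dropped
}
Example: A GemFire cache transaction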
Interaction Between Transactions and Region Scope
A cache transaction can involve operations on multiple regions, each of which can have different attributes.
While a transaction and its operations are applied to the local VM, the resulting transaction state may be
distributed to other VMs according to the attributes of each participating region.
A region's scope affects how data is distributed during the commit phase. Transactions are supported for these region scopes:
Local - no distribution, handles transactional conflicts locally
Distributed no ACK - handles transactional conflicts locally, less coordination between VMs
Distributed ACK - handles transactional conflicts both locally and between VMs
All distributed regions of the same name in all caches must be configured with the same scope.
Local scope
Transactions on locally scoped regions have no distribution, but they do perform conflict checks in the local VM. Two threads can conflict when their transactions change the same entry. The first transaction to start the commit process wins; the other transaction's commit fails with a conflict, and its changes are dropped.
You can also have a conflict when a non-transactional operation and a transaction modify the same entry. In that case, the transaction's commit fails.
Distributed-no-ack scope
If you need faster transactions with distributed regions, distributed-no-ack scope produces the best performance. This scope is most appropriate for applications with multiple writers that write to different keys, or applications with only one writer. For example, you can batch up groups of changes into transactions and do bulk distribution during large commits, thereby decreasing your overhead per change. A larger commit makes conflicts more likely, unless your application has a single writer, in which case distributed-no-ack scope is safe and gives you maximum speed.
In distributed-no-ack regions, GemFire performs conflict checks in the local VM but not remote conflict
checks. As in the local scope case, GemFire recognizes a conflict between two local threads and the second
transaction that attempts to commit fails. If transactions on two different VMs conflict, however, they can
overwrite each other if they both commit at the same time. The application probably would not know about
the conflict in time to stop a commit, because the commit call can return while the changes are still in transit or in the process of being applied to the remote VMs.
Distributed-ack scope
Distributed-ack scope is designed to protect data consistency. This scope provides a much higher level of
coordination among transactions in different VMs than distributed-no-ack.
When you use distributed-ack scope, when the commit call returns for a transaction that has only distributed-ack regions, you can be sure that the transaction's changes have already been sent and processed. Even after a commit operation is launched, the transaction continues to exist during the data distribution. The commit does not complete until the changes are made in the remote caches and the applications on other JVMs receive the callbacks verifying that those tasks are complete; by then, any cache and transaction callbacks in the remote VM have been invoked, and any remote conflicts are reflected in the local region, causing the commit to fail.
Figure: With distributed-ack scope, the transaction exists while the commit distributes the data, and the commit waits until all receivers acknowledge
The rollback and failed commit operations are local. When a successful commit writes to a distributed region, however, the transaction results are distributed to the remote VM. The transaction listener on the remote VM reflects the changes the transaction makes in the remote VM, not the local VM. Any exceptions
thrown by the transaction listener are caught by GemFire and logged.
JTA Transactions
JTA provides direct coordination between the GemFire cache and another transactional resource, such as a
database. Using JTA, your application controls all transactions in the same standard way, whether the transactions act on the GemFire cache, a JDBC resource, or both together. By the time a JTA global transaction
is done, the GemFire transaction and the database transaction are both complete.
Using GemFire in a JTA Transaction
Using GemFire with JTA transactions requires these general steps.
During configuration:
1. Configure the JDBC data sources in the cache.xml file.
2. Include the jar file for any data sources in your CLASSPATH.
At run-time:
3. Initialize the GemFire cache.
4. Get the JNDI context and look up the data sources.
5. Start a JTA transaction.
6. Execute operations in the cache and database as usual.
7. Commit the transaction.
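A sketch of these steps, assuming a data source bound under the illustrative JNDI name 'java:/XAPooledDataSource' and a region named 'items':
// a JTA global transaction spanning the cache and a database
Cache cache = CacheFactory.create(DistributedSystem.connect(new Properties()));
Context jndiContext = cache.getJNDIContext();
DataSource dataSource = (DataSource) jndiContext.lookup("java:/XAPooledDataSource");
UserTransaction transaction =
    (UserTransaction) jndiContext.lookup("java:/UserTransaction");
transaction.begin(); // step 5: start the JTA transaction
cache.getRegion("items").put("item-1", "value"); // step 6: cache operation
Connection connection = dataSource.getConnection(); // enlisted XA connection
connection.createStatement().executeUpdate(
    "UPDATE items SET val = 'value' WHERE id = 'item-1'"); // step 6: database operation
connection.close();
transaction.commit(); // step 7: commit cache and database together
Example: Executing cache and database operations in one JTA transaction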
The transactional view exists until the transaction completes. If the commit fails or the transaction is rolled
back, the changes are dropped. If the commit succeeds, the changes recorded in the transactional view are
merged into the cache. At any given time, the cache reflects the cumulative effect of all operations on the
cache, including committed transactions and non-transactional operations.
JTA Transaction Limitations
The GemFire Enterprise Distributed implementation of JTA has these limitations:
Only one JDBC database instance per transaction is allowed, although you can have multiple connections to that database.
Multiple threads cannot participate in a transaction.
Transaction recovery after a crash is not supported.
In addition, JTA transactions are subject to the limitations of GemFire cache transactions, which are discussed in the previous section. When a global transaction needs to access the GemFire cache, JTA silently starts a GemFire cache transaction.
GemFire Operation in Global Transactions
JTA syncs up multiple transactions under one umbrella by enabling transactional resources to enlist with a
global transaction. The GemFire cache can register as a JTA resource through JTA synchronizations. This
allows the JTA transaction manager to call methods like commit and rollback on the GemFire resource and
manage the GemFire transactions. You can bind in JDBC resources so you can look them up in the JNDI
context. When bound in, XAPooledDataSource resources will automatically enlist if called within the context of a transaction.
6 - Querying
GemFire Enterprise offers a standards-based querying implementation (OQL) for data held in the cache
regions. Object Query Language, OQL, from ODMG (http://www.odmg.org) looks a lot like SQL when
working with flat objects, but provides additional support for navigating object relationships, invoking object methods as part of query execution, etc. OQL queries can be executed on a single data region, across
regions (inner-joined) or even arbitrary object collections supplied dynamically by the application. Query
execution is highly optimized through the use of concurrent, memory-based data structures for both data
and indexes. Applications that do batch updates can turn OFF synchronous index maintenance and allow
the index to be optimally created in the background while the application proceeds to update the cache at
memory speeds.
This chapter introduces the concepts of cache querying and presents examples of using the Object Query
Language to retrieve cached data.
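As a sketch, a query can be created and executed through the cache's query service; the OQL string anticipates the portfolio example used below:
// create and run an OQL query through the QueryService
QueryService queryService = cache.getQueryService();
Query query = queryService.newQuery(
    "SELECT DISTINCT posnVal.mktValue " +
    "FROM /root/portfolios, positions.values posnVal TYPE Position " +
    "WHERE status = 'active'");
SelectResults results = (SelectResults) query.execute();
Example: Executing an OQL query programmatically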
Database Representation
Position table: secId varchar, mktValue double, qty double, foreign key (id) references portfolio
Object Representation
Portfolio: Integer id (also used as key), String type, String status, Map positions (one-to-many)
Position: String secId, double mktValue, double qty
Cache Data
Figure: The /root/portfolios region holds Portfolio entries (e.g., XYZ-2 and ABC-2, both type 'xyz' and status 'active'), each with a positions Map of Position objects (secId, mktValue, qty)
As you can see, the classic database storage model is flatter, with each table standing on its own. We can access position data independent of the portfolio data, or we can join the two tables when we need to by using multiple expressions in our FROM clause. With the object model, on the other hand, multiple expressions in the FROM clause are required for drilling down into the portfolios region to gain access to the position data. This can be helpful, as the position-to-portfolio relationship is implicit and does not require restating in our queries. It can also be a hindrance when we want to ask general questions about the positions data, independent of the portfolio information. The two queries below illustrate these differences. We query the list of market values for all positions of active portfolios from the two data models with the statements below.
SQL Query
SELECT mktValue
FROM portfolio, position
WHERE status='active'
AND portfolio.id = position.id
Example: SQL query for list of market values for all positions of active portfolios
OQL Query
SELECT DISTINCT posnVal.mktValue
FROM /root/portfolios, positions.values posnVal
TYPE Position
WHERE status = 'active'
Example: OQL query for list of market values for all positions of active portfolios
The difference between the queries reflects the difference in data storage models. The database query must explicitly join the portfolio and position data in the WHERE clause, by matching the position table's foreign key, id, with the portfolio table's primary key. In the object model, this one-to-many relationship was specified when we defined the positions Map as a field of the Portfolio class.
This difference is reflected again when we query the full list of positions' market values.
SQL Query
SELECT mktValue FROM position
Example: SQL query for the full list of market values
OQL Query
SELECT DISTINCT posnVal.mktValue
FROM /root/portfolios, positions.values posnVal
TYPE Position
Example: OQL query for the full list of market values
The position table is independent of the portfolio table, so our database query runs through the single table. The cached position data, however, is only accessible via the portfolio objects. The cache query aggregates data from each portfolio's individual positions Map to create the result set. Note: For the cache query engine, the positions data only becomes visible when the first expression in the FROM clause, /root/portfolios, is evaluated.
The next section explores this drill-down nature of cache querying more fully.
Figure: The FROM clause establishes scope step by step: before evaluation nothing is in scope; evaluating /root/portfolios brings the Portfolio entries into scope; evaluating positions.values brings all Portfolio positions into scope
In cache querying, the FROM expression brings new data into the query scope.
Table: Region scope (local, distributed-no-ack, distributed-ack, global) versus mirror-type (none, keys, keys-values); mirror-types 'keys' and 'keys-values' are not applicable (N/A) to local scope
The following SELECT statement is invalid because the Region object itself does not contain an attribute named positions, and the entry values collection (specified by /root/portfolios) that does contain an attribute named positions is not yet part of the query name space.
/* INCORRECT */
/* positions is not an attribute of Region or Collection*/
SELECT DISTINCT * FROM /root/portfolios.positions
/* INCORRECT */
Example: Attempting to use a Collection as a Portfolio
The following SELECT statement is valid because positions is an element of the entry value collection that is specified by /root/portfolios. The entry value collection is in scope as soon as the specification in the FROM expression is complete (before WHERE or SELECT are evaluated).
Security
GemFire manages data in many nodes where access to the data has to be protected. The security services provided by GemFire have the following characteristics:
On-the-wire protection (data integrity): This mechanism is used to prove that information has not been modified by a third party (some entity other than the source of the information). All communication between member caches can be made tamper-proof by configuring SSL (key signing). The use of SSL in GemFire communications is enabled in an all-or-nothing fashion.
Authentication: This refers to the protocol by which communicating entities prove to one another that they
are acting on behalf of specific identities that are authorized for access. GemFire uses the J2SE JSSE (Java
Secure Sockets Extension) provider for authentication. When SSL with mutual authentication is enabled,
any application cache has to be authenticated by supplying the necessary credentials to the GemFire distributed system before it can join the distributed system. Authentication of connections can be enabled for
each connection in a GemFire system. SSL can be configured for the locator service, the Console, the JMX
agent and the GemFire XML server. The choices of certificate provider, protocols, and cipher suites are all configurable; the default SUN JSSE provider can easily be switched to a different provider.
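As a sketch, SSL is enabled through gemfire.properties entries along these lines (property names follow the GemFire system properties; the values are illustrative):
# in gemfire.properties file: enable SSL for all member communication
ssl-enabled=true
ssl-require-authentication=true
ssl-protocols=any
ssl-ciphers=any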
Figure: GemFire security services: data integrity, authentication, and data privacy
For instance, 'DB Writer' can be defined as a role that describes a member in the GemFire distributed system that writes cache updates to a database. The 'DB Writer' role can then be associated as a required role for another application ('Data Feeder'), whose function is to receive data streams (e.g., price quotes) from multiple sources and pass them on to a database writer. Once the system is configured in such a fashion, the data feeder will check that at least one of the applications with the 'DB Writer' role is online and functional before it propagates any data updates. If, for some reason, none of the DB Writers are available, the data feeder application can be configured to respond as described above.
Figure: One member requires the 'write-behind' role, another member declares that its region plays that role, and the role is lost when that member leaves
This, of course, is done with the expectation that the network partition will be addressed within a short period of time and all required roles will become available again. If this reconnect protocol fails, the member shuts down after logging an appropriate error message. With this self-healing approach, a network segmentation/partition is handled by the distributed system without any human intervention.
The operational reliability and consistency of the system can be managed without resorting to overly pessimistic, all-or-nothing style policies (supported by other distributed caching products in the market) that have no application-specific context. Traditional distributed caching solutions do not give an architect the ability to define critical members in a distributed system and cannot guarantee that critical members are available prior to propagating key data updates, leading to missed messages and data inconsistencies. The GemFire role-based model offers a balance of consistency, reliability and performance, without compromise on any of these dimensions.
Slow or unresponsive receivers/consumers
In most distributed environments, overall system performance and throughput can be adversely impacted if
one of the applications/receivers consumes messages at a slower rate than the other receivers. For instance,
this may be the case when a consumer cannot handle a burst of messages because of CPU-intensive,
message-by-message processing. With GemFire Enterprise, a distribution timeout can be associated with
each consumer, so that if a producer does not receive message acknowledgments from the consumer within
the timeout period, it can switch from the default synchronous communication mode to an asynchronous
mode for that consumer.
When the asynchronous communication mode is used, the producer batches messages to be sent to the
consumer via a queue, whose size is controlled either by a queue timeout policy or by a maximum queue
size parameter. Events sent to this queue can also be conflated if the receiver is interested only in the most
recent value of a data entity.
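As a sketch under assumed property and method names, the producer-side settings described above might be expressed as follows:

// Sketch: distribution timeout and async queue limits (assumed property names)
import java.util.Properties;
import com.gemstone.gemfire.cache.AttributesFactory;
import com.gemstone.gemfire.distributed.DistributedSystem;

Properties props = new Properties();
props.setProperty("async-distribution-timeout", "5");  // ms without an ack before going async
props.setProperty("async-queue-timeout", "60000");     // ms a message may wait in the queue
props.setProperty("async-max-queue-size", "8");        // maximum queue size before action is taken
DistributedSystem system = DistributedSystem.connect(props);

// Conflation: keep only the latest queued value for each key (assumed method name)
AttributesFactory factory = new AttributesFactory();
factory.setEnableAsyncConflation(true);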
Figure: Producer detects a slow receiver and starts asynchronous queueing
Figure: Receiver resolves the problem and the producer resumes synchronous messaging
Figure: Sender queue fills up, sender requests slow receiver to exit and reconnect
If, in an extreme situation, the consumer is not able to receive even the high-priority messages, the producer
logs warning messages, based on which a system administrator can manually fix the offending consumer. If
it is not fixed manually, the GemFire system will eventually remove the consumer from the distributed system
based on the repeated messages logged by the producer. In this fashion, the overall quality of service across
the distributed system is maintained by quarantining an ailing member.
8 - GemFire Administration
GemFire provides a membership API that can be used to detect a split-brain scenario. The solution requires
a minimum of three members. When a member detects the ungraceful exit of another member, it announces
this to the other members, and the other members do the same. If more than one member reports the
unexpected exit of the same member, that member is presumed unreachable. On the other hand, the member
that lost its network link will notice that all of the other members appear to have ungracefully left the
system and can take a different course of action.
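A minimal sketch of reacting to membership changes is shown below. It assumes an admin-level membership listener interface with joined/left/crashed callbacks; the interface and method names are assumptions based on the description above:

// Sketch: observing ungraceful member exits (assumed interface and method names)
import com.gemstone.gemfire.admin.SystemMembershipEvent;
import com.gemstone.gemfire.admin.SystemMembershipListener;

public class CrashDetector implements SystemMembershipListener {
  public void memberJoined(SystemMembershipEvent event) { }
  public void memberLeft(SystemMembershipEvent event) { }
  public void memberCrashed(SystemMembershipEvent event) {
    // An ungraceful exit was detected; if other members report the same
    // member, it is presumed unreachable.
    System.out.println("Member presumed unreachable: " + event.getMemberId());
  }
}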
The Console can be used by administrators and developers to launch remote GemFire systems and to modify runtime configurations on remote system members.
JMX
The Java Management Extensions (JMX) specification defines the architecture, design patterns, APIs and
services for application and network management and monitoring of Java-based systems. GemFire provides
access to all of its cache management and monitoring services through a set of JMX APIs. JMX enables
GemFire to be integrated with enterprise network management systems such as Tivoli, HP OpenView, Unicenter, etc. Use of JMX also enables GemFire to be a managed component within an application server that
hosts the cache member instance.
GemFire exposes all of its administration and monitoring APIs through a set of JMX MBeans. An optional
JMX agent process can be started to manage the entire distributed system from a single entry point. The
agent provides HTTP and RMI access for remote management consoles or applications, and also makes it
possible to plug in third-party JMX protocol adapters, such as SNMP. The JMX health monitoring APIs
allow administrators to configure how frequently the health of the various GemFire components, such as
the distributed system, system manager processes and member caches, is evaluated. For instance, the
distributed system's health is considered poor if distribution operations take too long, the cache hit ratio is
consistently low or the cache event processing queues are too large. The JMX administrative APIs can be
used to start, stop and access the configuration of the distributed system and its member caches, down to
the level of each cache region. The JMX runtime statistics API can be used to monitor statistics gathered by
each member cache. These statistics can be correlated to measure the performance of the cache, the data
distribution in the cache and overall cache scalability.
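Because the agent exposes a standard RMI connector, any JMX client can attach to it. The following sketch uses the standard javax.management.remote API; the host, port and JNDI path are assumptions that depend on how the agent is configured:

// Sketch: attaching a JMX client to the agent's RMI connector (assumed URL)
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

JMXServiceURL url = new JMXServiceURL(
    "service:jmx:rmi:///jndi/rmi://adminhost:1099/jmxconnector"); // assumed location
JMXConnector connector = JMXConnectorFactory.connect(url);        // (exception handling omitted)
MBeanServerConnection server = connector.getMBeanServerConnection();
// From here, the GemFire MBeans can be browsed and invoked like any other MBeans.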
GemFire command line utility
The GemFire command-line utility allows you to start, stop and otherwise manage a GemFire system from
an operating system command prompt. The gemfire utility provides an alternative to the GemFire
Console (gfc) and allows you to perform basic administration tasks from a script. However, all gemfire
administrative operations must be executed on the same machine as the GemFire system and apply only to
a single GemFire system member.
Statistics
GemFire has a very flexible and revealing statistics facility. During execution, a distributed system
can record many types of data regarding the performance and behavior of the system:
Cache operations - the type and number of cache operations and how much time they consume
Messaging - message traffic between the VM and other distributed system members
Virtual machine - the VM's memory usage; these statistics can be used to find Java object leaks
Statistics - how much time is spent collecting statistics
Operating system process - operating system statistics on the VM's process; can be used to determine the
VM's CPU, memory and disk usage
Member machine - operating system statistics on the VM's machine: total CPU, memory and disk usage
on the machine
In addition to the predefined statistics, application developers can define custom, application-specific statistics. Any type of information can be recorded at runtime with these custom statistics and they are treated by
the tools (see VSD below) as first-class statistics that can be displayed and analyzed alongside the other,
predefined statistics.
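A minimal sketch of defining a custom statistic follows. It assumes the distributed system acts as the statistics factory, with type and descriptor creation methods; the statistic and type names here are purely illustrative:

// Sketch: an application-defined statistic (assumed factory method names)
import com.gemstone.gemfire.*;
import com.gemstone.gemfire.distributed.DistributedSystem;

DistributedSystem system = DistributedSystem.connect(new java.util.Properties());
StatisticsFactory f = system;  // the distributed system is assumed to act as the factory
StatisticsType type = f.createType("AppStats", "Application statistics",
    new StatisticDescriptor[] {
      f.createIntCounter("requestsHandled", "Number of requests handled", "operations")
    });
Statistics stats = f.createStatistics(type, "worker-1");
stats.incInt(type.nameToId("requestsHandled"), 1);  // recorded and viewable in VSD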
Visual Statistics Display
The Visual Statistics Display (VSD) is a graphical user interface for viewing the statistics that GemFire
records during application execution. It can be used to chart any GemFire statistic over time, overlay different
statistics onto the same chart, and home in on specific aspects of system performance by filtering and
compressing statistics.
Figure: Visual Statistics Display is used for charting and analyzing statistics
VSD also allows statistics files from different members to be merged into one view, which lets the developer
or administrator analyze dependent behaviors among distributed system members.
GemFire Admin and Health API
The GemFire Administration API for Java provides programmatic access to much of the same functionality
provided by the GemFire Console for a single distributed system. This section gives an
overview of the primary interfaces and classes that are provided by the API package. The administration
API allows you to configure, start, and stop a distributed system and many of its components. The API is
made up of distributed system administration, component administration, and cache administration. In addition to the core components listed here, the administration API provides interfaces to issue and handle system member alerts and to monitor statistics.
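As a sketch of how such an administrative client might be written (the factory and method names below are assumptions based on the description above):

// Sketch: connecting an administrative client to a distributed system
import com.gemstone.gemfire.admin.*;

DistributedSystemConfig config =
    AdminDistributedSystemFactory.defineDistributedSystem();
config.setLocators("adminhost[10334]");   // assumed locator endpoint
AdminDistributedSystem adminSystem =
    AdminDistributedSystemFactory.getDistributedSystem(config);
adminSystem.connect();                    // (exception handling omitted)
// Enumerate the application members currently in the distributed system
SystemMember[] members = adminSystem.getSystemMemberApplications();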
The health monitoring API allows you to configure and monitor system health indicators for GemFire
Enterprise distributed systems and their components. There are three levels of health: good health, indicating
that all GemFire components are behaving reasonably; okay health, indicating that one or more
GemFire components are slightly unhealthy and may need attention; and poor health, indicating that
a GemFire component is unhealthy and needs immediate attention.
Because each GemFire application has its own definition of what it means to be healthy, the metrics
used to determine health are configurable. The API provides methods for configuring the health of the
distributed system, system managers, and members that host Cache instances. Health can be configured on
both a global and a per-machine basis, and the API also allows you to configure how often GemFire's health
is evaluated.
The health administration APIs allow you to configure performance thresholds for each component type in
the distributed system (including the distributed system itself). These threshold settings are compared
against system statistics to obtain a report on each component's health. A component is considered to be in
good health if all of the user-specified criteria for that component are satisfied. The other possible health
settings, okay and poor, are assigned to a component as fewer of the health criteria are met.
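A sketch of configuring thresholds and reading the current health level might look like the following; the health object is assumed to come from the administrative entry point shown earlier, and the threshold setter names are assumptions:

// Sketch: configuring health thresholds and reading the health level
import com.gemstone.gemfire.admin.GemFireHealth;
import com.gemstone.gemfire.admin.GemFireHealthConfig;

GemFireHealth health = adminSystem.getGemFireHealth();  // adminSystem from the sketch above
GemFireHealthConfig hc = health.getDefaultGemFireHealthConfig();
hc.setMinHitRatio(0.75);        // unhealthy if the cache hit ratio drops below 75%
hc.setMaxEventQueueSize(1000);  // ... or event queues grow beyond 1000 entries
health.setDefaultGemFireHealthConfig(hc);

if (health.getHealth() == GemFireHealth.POOR_HEALTH) {
  System.err.println(health.getDiagnosis());  // human-readable explanation
}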
Logging
GemFire's logging facility is based on the Apache Log4j project. It is an extensible logging library for Java.
With Log4j it is possible to enable logging at runtime without modifying the application binary. Log4j lets
the developer control which log statements are output with arbitrary granularity. The Log4j package is designed so that these statements can remain in shipped code without incurring a heavy performance cost.
Logging behavior can be controlled by editing a configuration file, without touching the application binary.
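Since logging behavior is driven by a configuration file, a standard Log4j properties file can raise or lower log levels without touching the binary. The category name below is illustrative:

# Sketch: a standard log4j.properties file (category name is illustrative)
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %-5p [%t] %c - %m%n
# Raise the level for GemFire classes only
log4j.logger.com.gemstone.gemfire=DEBUG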
Java VM performance
There are several JVM parameters that will affect performance. Using VSD to identify bottlenecks and
monitor the runtime behavior may suggest tweaking the JVM's configuration with one of the following
parameters:
VMHeapSize - sets both the initial and maximum memory sizes for a Java application
MaxDirectMemorySize - limits the amount of memory the VM allocates for NIO direct buffers
UseConcMarkSweepGC and UseParNewGC - enable concurrent and parallel garbage collection
DisableExplicitGC - disables explicit garbage collection requested by the application via System.gc()
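A sample invocation using these parameters might look like the following; the sizes and the application class name are illustrative:

java -Xms1024m -Xmx1024m \
     -XX:MaxDirectMemorySize=256m \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -XX:+DisableExplicitGC \
     MyGemFireApp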
Individual member configuration parameters can be set programmatically (using the GemFire API), or can
be declared in the member's cache.xml file.
<!-- in cache.xml file -->
<cache>
  <vm-root-region name='items'>
    <region-attributes scope='distributed-ack'>
    </region-attributes>
  </vm-root-region>
  <vm-root-region name='offerings'>
    <region-attributes mirror-type='keys-values'>
    </region-attributes>
  </vm-root-region>
</cache>
Example: Configuring a member with the cache.xml file
Creating a region
// GemfireWrapper.java: region creation
public boolean createRegion(String name,
    Scope scope, MirrorType mirrorType, boolean persists, File[] dirs,
    boolean statsEnabled, LRUAlgorithm controller)
{
  AttributesFactory factory = new AttributesFactory();
  factory.setScope(scope);            // distribution scope, e.g. distributed-ack
  factory.setMirrorType(mirrorType);  // e.g. keys-values for a full mirror
  if (persists && (dirs != null)) {
    factory.setPersistBackup(true);   // back up region entries to disk
  }
  if (dirs != null) {
    factory.setDiskDirs(dirs);        // directories for the region's disk files
  }
  factory.setStatisticsEnabled(statsEnabled);
  if (controller != null) {
    factory.setCapacityController(controller);  // eviction (e.g. LRU)
  }
  RegionAttributes ra = factory.createRegionAttributes();
  try {
    cache.createVMRegion(name, ra);
    return true;
  }
  catch (RegionExistsException e) { return false; }
  catch (TimeoutException e) { return false; }
}
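A hypothetical call to this wrapper method, creating a mirrored, disk-backed region (argument values are examples only):

// Illustrative usage of createRegion
GemfireWrapper wrapper = GemfireWrapper.getInstance();
boolean created = wrapper.createRegion("items",
    Scope.DISTRIBUTED_ACK,                     // acknowledged distribution
    MirrorType.KEYS_VALUES,                    // mirror all keys and values
    true,                                      // persist entries to disk
    new File[] { new File("/data/gemfire") },  // disk directory
    true,                                      // enable statistics
    null);                                     // no capacity controller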
Installing plugins
// GemfireWrapper.java: Install plugins and configure eviction
public void setListener(String name, CacheListener listener) {
  Region region = cache.getRegion(name);
  AttributesMutator mutator = region.getAttributesMutator();
  mutator.setCacheListener(listener);     // after-the-fact event notifications
}

public void setEntryIdleTimeout(String name, ExpirationAttributes expAttrs) {
  Region region = cache.getRegion(name);
  AttributesMutator mutator = region.getAttributesMutator();
  mutator.setEntryIdleTimeout(expAttrs);  // expire entries that sit idle too long
}

public void setWriter(String name, CacheWriter writer) {
  Region region = cache.getRegion(name);
  AttributesMutator mutator = region.getAttributesMutator();
  mutator.setCacheWriter(writer);         // synchronous write-through callback
}

public void setLoader(String name, CacheLoader loader) {
  Region region = cache.getRegion(name);
  AttributesMutator mutator = region.getAttributesMutator();
  mutator.setCacheLoader(loader);         // loads values on a cache miss
}

public void close() {
  cache.close();
}
}
Using a transaction
// Transaction programming
// ('cache' is the wrapper's underlying Cache instance)
GemfireWrapper wrapper = GemfireWrapper.getInstance();
CacheTransactionManager manager = cache.getCacheTransactionManager();
manager.begin();
Item item = wrapper.getObject("items", "car");
Item txnItem = CopyHelper.copy(item);   // work on a copy, not the cached instance
txnItem.setDescription("limo");
wrapper.put("items", "car", txnItem);   // write the copy back inside the transaction
                                        // (assumes the wrapper exposes a put method)
try {
  manager.commit();
}
catch (CommitConflictException ex) {
  // another transaction changed the same entry first; retry or report
}