Mastering Mesos - Sample Chapter
Mastering Mesos
The ultimate guide to managing, building, and deploying
large-scale clusters with Apache Mesos
Dipa Dubhashi
Akhil Das
Preface
Apache Mesos abstracts CPU, memory, storage, and other compute resources
away from machines (physical or virtual), enabling fault-tolerant and elastic
distributed systems to easily be built and run effectively. It improves resource
utilization, simplifies system administration, and supports a wide variety of
distributed applications that can be effortlessly deployed by leveraging its
pluggable architecture.
This book provides a detailed, step-by-step guide to deploying a Mesos cluster
using standard DevOps tools, shows how to write and port Mesos frameworks
effectively, and, in general, demystifies the concept of Mesos.
The book will first establish the raison d'être of Mesos and explain its
architecture. From there, it will walk the reader through the complex world of
Mesos, moving progressively from simple single-machine setups to highly complex
multi-node cluster setups, with new concepts introduced logically along the way.
At the end of the journey, the reader will be armed with all the resources required
to manage the complexities of today's modern datacenter requirements.
Chapter 3, Getting Started with Mesos, covers how to manually set up and run a Mesos
cluster on the public cloud (AWS, GCE, and Azure) as well as on a private datacenter
(on premise). It also discusses the various debugging methods and explores how to
troubleshoot the Mesos setup in detail.
Chapter 4, Service Scheduling and Management Frameworks, introduces several
Mesos-based scheduling and management frameworks or applications that are
required for the easy deployment, discovery, load balancing, and failure handling
of long-running services.
Chapter 5, Mesos Cluster Deployment, explains how a Mesos cluster can be easily
set up and monitored using the standard deployment and configuration
management tools used by system administrators and DevOps engineers. It also
discusses some of the common problems faced while deploying a Mesos cluster
along with their corresponding resolutions.
Chapter 6, Mesos Frameworks, walks the reader through the concept and features of
Mesos frameworks in detail. It also provides a detailed overview of the Mesos API,
including the new HTTP Scheduler API, and provides a recipe to build custom
frameworks on Mesos.
Chapter 7, Mesos Containerizers, introduces the concepts of containers and talks a bit
about Docker, probably the most popular container technology available today. It
also provides a detailed overview of the different "containerizer" options in Mesos,
besides introducing some other topics such as networking for Mesos-managed
containers and the fetcher cache. Finally, an example of deploying containerized
apps in Mesos is provided for better understanding.
Chapter 8, Mesos Big Data Frameworks, acts as a guide to deploying important big data
processing frameworks such as Hadoop, Spark, Storm, and Samza on top of Mesos.
Chapter 9, Mesos Big Data Frameworks 2, guides the reader through deploying important
big data storage frameworks such as Cassandra, the Elasticsearch-Logstash-Kibana
(ELK) stack, and Kafka on top of Mesos.
Introducing Mesos
Apache Mesos is open source, distributed cluster management software that
came out of AMPLab, UC Berkeley in 2011. It abstracts CPU, memory, storage,
and other compute resources away from machines (physical or virtual), enabling
fault-tolerant and elastic distributed systems to be easily built and run effectively.
It is referred to as a metascheduler (scheduler of schedulers) and a "distributed
systems kernel/distributed datacenter OS".
It improves resource utilization, simplifies system administration, and supports
a wide variety of distributed applications that can be deployed by leveraging its
pluggable architecture. It is scalable and efficient and provides a host of features,
such as resource isolation and high availability, which, along with a strong and
vibrant open source community, make it one of the most exciting projects.
We will cover the following topics in this chapter:
Introduction to frameworks
Mesos in production
Chapter 1
The datacenter OS acts as a software layer that aggregates all servers in a datacenter
into one giant supercomputer to deliver the benefits of multitenancy, isolation, and
resource control across all microservice applications. Another major advantage is the
elimination of human-induced error during the continual assigning and reassigning
of virtual resources.
From a developer's perspective, this will allow them to easily and safely build
distributed applications without restricting them to a bunch of specialized tools, each
catering to a specific set of requirements. For instance, let's consider the case of Data
Science teams who develop analytic applications that are highly resource intensive.
An operating system that can simplify how the resources are accessed, shared, and
distributed successfully alleviates their concern about reallocating hardware every
time the workloads change.
Of key importance is the relevance of the datacenter OS to DevOps, primarily
a software development approach that emphasizes automation, integration,
collaboration, and communication between traditional software developers and other
IT professionals. With a datacenter OS that effectively transforms individual servers
into a pool of resources, DevOps teams can focus on accelerating development and
not continuously worry about infrastructure issues.
In a world where distributed computing becomes the norm, the datacenter OS is
a boon. System engineers are freed from manually configuring and maintaining
individual machines for specific applications, as every application can run
on any available resources from any machine, even if other applications are
already running on it. Using a datacenter OS results in centralized control and
smart utilization of resources, which eliminates hardware and software silos and
ensures greater accessibility and usability, even for non-infrastructure professionals.
Google, for example, administers its hyperscale datacenters via a datacenter OS
in the form of the Borg (and next-generation Omega) systems. The merits of the
datacenter OS are undeniable, with benefits ranging from the scalability of
computing resources and the flexibility to support data sharing across applications
to savings in team effort, time, and money while launching and managing
interoperable cluster applications.
It is this vision of transforming the datacenter into a single supercomputer that
Apache Mesos seeks to achieve. Born out of a Berkeley AMPLab research paper in
2011, it has since come a long way with a number of leading companies, such as
Apple, Twitter, Netflix, and Airbnb among others, using it in production. Mesosphere
is a start-up that is developing a distributed OS product with Mesos at its core.
Mesos is based on the same principles as the Linux kernel and aims to provide a
highly available, scalable, and fault-tolerant base for enabling various frameworks to
share cluster resources effectively and in isolation. Distributed applications are varied
and continuously evolving, a fact that leads Mesos' design philosophy towards a thin
interface that allows efficient resource allocation between different frameworks
and delegates the task of scheduling and job execution to the frameworks themselves.
The two advantages of doing so are:
Mesos' architecture hands over the responsibility of scheduling tasks to the respective
frameworks by employing a resource offer abstraction that packages a set of resources
and makes offers to each framework. The Mesos master node decides the quantity
of resources to offer each framework, while each framework decides which resource
offers to accept and which tasks to execute on these accepted resources. This method
of resource allocation is shown to achieve a good degree of data locality for each
framework sharing the same cluster.
Introduction to frameworks
A Mesos framework sits between Mesos and the application and acts as a layer to
manage task scheduling and execution. As its implementation is application-specific,
the term is often used to refer to the application itself. Earlier, a Mesos framework
could interact with the Mesos API only through the libmesos C++ library, so the
language bindings developed for Java, Scala, Python, and Go, among others,
leveraged libmesos heavily. Since v0.19.0, the changes made to the
HTTP-based protocol have enabled developers to write frameworks in the language
of their choice without having to rely on the C++ code. A framework consists of
two components: a) Scheduler and b) Executor.
The Scheduler is responsible for making decisions on the resource offers made to it and
tracking the current state of the cluster. Communication with the Mesos master is
handled by the SchedulerDriver module, which registers the framework with the
master, launches tasks, and passes messages to other components.
The second component, Executor, is responsible, as its name suggests, for the
execution of tasks on slave nodes. Communication with the slaves is handled by
the ExecutorDriver module, which is also responsible for sending status updates
to the scheduler.
The Mesos API, discussed later in this chapter, allows programmers to develop
their own custom frameworks that can run on top of Mesos. Some other features
of frameworks, such as authentication, authorization, and user management,
will be discussed at length in Chapter 6, Mesos Frameworks.
Long-running services
Singularity: This is a scheduler (the HTTP API and web interface) for
running Mesos tasks, such as long-running processes, one-off tasks,
and scheduled jobs.
Batch scheduling
Data storage
Attributes
Attributes are used to describe certain additional information regarding the
slave node, such as its OS version, whether it has a particular type of hardware, and
so on. They are expressed as key-value pairs, with support for three different value
types (scalar, range, and text), and are sent along with the offers to frameworks.
Take a look at the following grammar:
attributes : attribute ( ";" attribute )*
attribute : text ":" ( scalar | range | text )
Resources
Mesos can manage three different types of resources: scalars, ranges, and sets. These
are used to represent the different resources that a Mesos slave has to offer. For
example, a scalar resource type could be used to represent the amount of CPU
on a slave. Each resource is identified by a key string, as follows:
resources : resource ( ";" resource )*
resource : key ":" ( scalar | range | set )
key : text ( "(" resourceRole ")" )?
resourceRole : text | "*"
cpus
mem
disk
ports
In particular, a slave without the cpus and mem resources will never have its resources
advertised to any frameworks. Also, the master's user interface interprets the scalars
in mem and disk in terms of MB. For example, the value 15000 is displayed as
14.65 GB.
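The arithmetic behind that example is a division by 1024. The following is a minimal sketch; the helper name `mb_to_gb_label` is hypothetical and is not part of Mesos:

```python
def mb_to_gb_label(megabytes: float) -> str:
    """Format a scalar given in MB as GB, taking 1 GB = 1024 MB."""
    return f"{megabytes / 1024:.2f} GB"


# 15000 / 1024 = 14.6484..., which rounds to two decimals as 14.65
print(mb_to_gb_label(15000))
```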
Examples
Here are some examples of configuring the Mesos slaves:
resources='cpus:24;mem:24576;disk:409600;ports:[21000-24000];bugs:{a,b,c}'
attributes='rack:abc;zone:west;os:centos5;level:10;keys:[1000-1500]'
In this case, the resources cover all three types: the scalars cpus, mem, and disk;
the range ports; and the set bugs. Among the attributes, rack, zone, and os hold
text values, level holds a scalar, and keys holds a range.
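To make the grammar concrete, here is a minimal sketch of a parser for such strings. It is an illustration and not the parser Mesos uses: the function names `parse_value` and `parse_pairs` are hypothetical, the ports range is assumed to read 21000-24000, and for brevity the sketch handles only a single range per value and ignores the optional role suffix on resource keys.

```python
import re


def parse_value(raw: str):
    """Classify one value as a range, a set, a scalar, or plain text."""
    raw = raw.strip()
    range_match = re.fullmatch(r"\[(\d+)-(\d+)\]", raw)
    if range_match:
        return ("range", (int(range_match.group(1)), int(range_match.group(2))))
    if raw.startswith("{") and raw.endswith("}"):
        return ("set", {item.strip() for item in raw[1:-1].split(",")})
    try:
        return ("scalar", float(raw))
    except ValueError:
        # Anything that is not numeric is treated as text (attributes only).
        return ("text", raw)


def parse_pairs(spec: str) -> dict:
    """Split 'key:value;key:value' into a dict of typed values."""
    pairs = {}
    for item in spec.split(";"):
        key, _, raw = item.partition(":")
        pairs[key.strip()] = parse_value(raw)
    return pairs


resources = parse_pairs(
    "cpus:24;mem:24576;disk:409600;ports:[21000-24000];bugs:{a,b,c}"
)
attributes = parse_pairs("rack:abc;zone:west;os:centos5;level:10;keys:[1000-1500]")
```

With these inputs, `resources["ports"]` is tagged as a range, `resources["bugs"]` as a set, and `attributes["os"]` as text.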
Two-level scheduling
Mesos has a two-level scheduling mechanism to allocate resources to different
frameworks and launch their tasks. In the first level, the master process that manages
slave processes running on each node in the Mesos cluster determines the free
resources available on each node, groups them, and offers them to different
frameworks based on organizational policies, such as priority or fair sharing.
Organizations have the ability to define their own sharing policies via a custom
allocation module as well.
1: Slave 1 reports to the master that it has four CPUs and 4 GB of memory
free. The master then invokes the allocation module, which tells it that
Framework 1 should be offered all the available resources.
4: The master sends the tasks to the slave, which allocates appropriate
resources to the framework's executor, which in turn launches the two tasks.
As one CPU and 1 GB of RAM are still free, the allocation module may now
offer them to Framework 2. In addition, this resource offer process repeats
when tasks finish and new resources become free.
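The exchange described in these steps can be sketched in a few lines. This is a minimal illustration rather than Mesos code: the `Resources` class and the `launch` function are hypothetical, and the two task sizes are assumptions chosen only so that, as in the text, one CPU and 1 GB remain free afterwards.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resources:
    cpus: float
    mem_gb: float

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.cpus - other.cpus, self.mem_gb - other.mem_gb)

    def fits_in(self, other: "Resources") -> bool:
        return self.cpus <= other.cpus and self.mem_gb <= other.mem_gb


def launch(offer: Resources, tasks: List[Resources]) -> Resources:
    """Second level: the framework places tasks; return the unused remainder."""
    remaining = offer
    for task in tasks:
        if not task.fits_in(remaining):
            raise ValueError("task does not fit in the offered resources")
        remaining = remaining - task
    return remaining


# First level: the slave reports its free resources and the allocation
# module decides to offer all of them to Framework 1.
slave_free = Resources(cpus=4, mem_gb=4)
offer_to_framework_1 = slave_free

# Second level: Framework 1 decides which tasks to run (assumed sizes).
tasks = [Resources(cpus=2, mem_gb=1), Resources(cpus=1, mem_gb=2)]
left_over = launch(offer_to_framework_1, tasks)

# The remainder (1 CPU, 1 GB) can now be offered to Framework 2.
print(left_over)
```

The point of the split is visible in the code: the master side only decides what to offer, while the choice of tasks lives entirely in the framework.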
Mesos also provides frameworks with the ability to reject resource offers. A
framework can reject the offers that do not meet its requirements. This allows
frameworks to support a wide variety of complex resource constraints while
keeping Mesos simple at the same time. A policy called delay scheduling, in which
frameworks wait for a finite time to get access to the nodes storing their input data,
gives a fair level of data locality albeit with a slight latency tradeoff.
If the framework constraints are complex, it is possible that a framework might need
to wait before it receives a suitable resource offer that meets its requirements. To
tackle this, Mesos allows frameworks to set filters specifying the criteria that they
will use to always reject certain resources. A framework can set a filter stating that
it can run only on nodes with at least 32 GB of RAM space free, for example. This
allows it to bypass the rejection process, minimizes communication overheads,
and thus reduces overall latency.
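The idea of a filter can be sketched as a predicate that is applied before an offer is sent, so the framework never has to reject it. This is a conceptual illustration under stated assumptions, not the Mesos `Filters` message: the `Offer` shape and the function names `min_free_memory` and `offers_to_send` are hypothetical.

```python
from typing import Callable, Dict, List

Offer = Dict[str, float]  # for example {"cpus": 8, "mem_gb": 64}
OfferFilter = Callable[[Offer], bool]


def min_free_memory(gb: float) -> OfferFilter:
    """Build a filter that passes only offers with at least `gb` of free RAM."""
    return lambda offer: offer.get("mem_gb", 0) >= gb


def offers_to_send(offers: List[Offer], filters: List[OfferFilter]) -> List[Offer]:
    """Drop offers the framework would always reject, before sending them."""
    return [offer for offer in offers if all(f(offer) for f in filters)]


candidate_offers = [
    {"cpus": 4, "mem_gb": 16},
    {"cpus": 8, "mem_gb": 64},
]
sent = offers_to_send(candidate_offers, [min_free_memory(32)])
```

Here only the 64 GB offer reaches the framework, which saves one offer and one rejection message for the 16 GB node.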
Resource allocation
The resource allocation module contains the policy that the Mesos master uses to
determine the type and quantity of resource offers that need to be made to each
framework. Organizations can customize it to implement their own allocation
policy (for example, fair sharing or priority), which allows for fine-grained
resource sharing. Custom allocation modules can be developed to address specific
needs.
The resource allocation module is responsible for making sure that resources are
shared in a fair manner among competing frameworks. The choice of algorithm
used to determine the sharing policy has a great bearing on the efficiency of a cluster
manager. One of the most popular allocation algorithms, max-min fairness, and its
weighted derivative are described in the following section.
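As a point of reference, max-min fairness for a single resource can be sketched with the textbook progressive-filling formulation. This is not the Mesos allocator's code; the function name `max_min_fair` and the example demands are illustrative assumptions, and the weighted variant is not covered.

```python
from typing import Dict


def max_min_fair(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Share `capacity` so that the smallest allocation is as large as possible.

    Users are served in order of increasing demand. Each one receives either
    its full demand or an equal share of what is left, whichever is smaller,
    so any capacity a small user does not need flows on to the larger users.
    """
    allocation: Dict[str, float] = {}
    remaining = capacity
    pending = sorted(demands.items(), key=lambda item: item[1])
    while pending:
        fair_share = remaining / len(pending)
        name, demand = pending.pop(0)
        granted = min(demand, fair_share)
        allocation[name] = granted
        remaining -= granted
    return allocation


# 12 CPUs shared by three frameworks that want 2, 4, and 10 CPUs.
shares = max_min_fair(12, {"A": 2, "B": 4, "C": 10})
```

In this example, A and B receive their full demands (2 and 4) because both are within their equal share, and C receives the remaining 6 instead of the 10 it asked for.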
Resource isolation
One of the key requirements of a cluster manager is to ensure that the allocation of
resources to a particular framework does not have an impact on any active running
jobs of some other framework. Provision for isolation mechanisms on slaves to
compartmentalize different tasks is thus a key feature of Mesos. Containers are
leveraged for resource isolation with a pluggable architecture. The Mesos slave uses
the Containerizer API to provide an isolated environment to run a framework's
executor and its corresponding tasks. The Containerizer API's objective is to support a
wide range of implementations, which implies that custom containerizers and isolators
can be developed. When a slave process starts, the containerizer to be used to launch
containers and a set of isolators to enforce the resource constraints can be specified.
The Mesos containerizer provides resource isolation of framework executors
using Linux-specific functionality, such as control groups and namespaces. It also
provides basic support for POSIX systems (only resource usage reporting and not
actual isolation). This important topic will be explored at length in subsequent
chapters.
Mesos also provides network isolation at a container level to prevent a single
framework from capturing all the available network bandwidth or ports. This is not
supported by default, however, and additional dependencies need to be installed
and configured in order to activate this feature.
Monitoring in Mesos
In this section, we will take a look at the different metrics that Mesos provides
to monitor the various components.
Network statistics for each active container are published through the
/monitor/statistics.json endpoint on the slave.
Types of metrics
Mesos provides two different kinds of metrics: counters and gauges. These can be
explained as follows:
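In general terms, a counter tracks discrete events and only ever increases, while a gauge is an instantaneous sample that can rise or fall. The following minimal sketch illustrates the difference; the `Counter` and `Gauge` classes and the variable names are hypothetical and do not reflect how Mesos implements its metrics.

```python
class Counter:
    """Counts discrete events; the value only ever goes up."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self, by: int = 1) -> None:
        if by < 0:
            raise ValueError("a counter never decreases")
        self.value += by


class Gauge:
    """Reports an instantaneous sample; the value can rise or fall."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


tasks_finished = Counter()   # grows with every finished task
cpus_in_use = Gauge()        # reflects the current state only

tasks_finished.increment()
tasks_finished.increment()
cpus_in_use.set(3.5)
cpus_in_use.set(1.0)         # a gauge may drop again
```

When monitoring, the rate of change of a counter is usually what matters, whereas a gauge is read directly.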
Messages
Mesos implements an actor-style message-passing programming model to enable
nonblocking communication between different Mesos components and leverages
protocol buffers for the same. For example, a scheduler needs to tell the executor
to utilize a certain number of resources, an executor needs to provide status
updates to the scheduler regarding the tasks that are executed, and so on. Protocol
buffers provide the required flexible message delivery mechanism to enable this
communication by allowing developers to define custom formats and protocols that
can be used across different languages. For more details regarding the messages that
are passed between different Mesos components, refer to
https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto.
API details
A brief description of the different APIs and methods that Mesos provides is
given in the following section:
Executor API
A brief description of the Executor API is given below. For more details, visit
http://mesos.apache.org/api/latest/java/org/apache/mesos/Executor.html.
This code is invoked once the executor driver is able to successfully connect
with Mesos. In particular, a scheduler can pass some data to its executors
through the ExecutorInfo.getData() field.
The following are the parameters:
driver: This is the executor driver that was registered and connected
to the Mesos cluster
executorInfo: This describes information about the executor that
was registered
slaveInfo: This describes the slave that will be used to launch the
This code is invoked when the executor reregisters with a restarted slave.
The following are the parameters:
driver: This is the executor driver that was reregistered with the
Mesos master
slaveInfo: This describes the slave that will be used to launch the
The preceding code is invoked when the executor gets "disconnected" from
the slave (for example, when the slave is restarted due to an upgrade).
The following is the parameter:
Note that this task can be realized with a thread, a process, or some simple
computation; however, no other callbacks will be invoked on this executor
until this callback returns.
The following are the parameters:
This is invoked when a task running within this executor is killed via
SchedulerDriver.killTask(TaskID). Note that no status update will be
sent on behalf of the executor, and the executor is responsible for creating a
new TaskStatus protobuf message (that is, with TASK_KILLED) and invoking
ExecutorDriver.sendStatusUpdate(TaskStatus).
The following are the parameters:
driver: This is the executor driver that owned the task that
was killed
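The responsibility described for this callback, namely that the executor itself must report TASK_KILLED, can be sketched with stand-in classes. Everything here is hypothetical scaffolding written for illustration: `FakeExecutorDriver`, `SketchExecutor`, and the simplified `TaskStatus` are not the Mesos classes, and they only mimic the shape of the calls named in the text.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class TaskStatus:
    task_id: str
    state: str


@dataclass
class FakeExecutorDriver:
    """Stand-in for an executor driver; records updates instead of sending them."""

    sent: List[TaskStatus] = field(default_factory=list)

    def send_status_update(self, status: TaskStatus) -> None:
        self.sent.append(status)


class SketchExecutor:
    def __init__(self) -> None:
        self.running: Set[str] = set()

    def launch_task(self, driver: FakeExecutorDriver, task_id: str) -> None:
        self.running.add(task_id)
        driver.send_status_update(TaskStatus(task_id, "TASK_RUNNING"))

    def kill_task(self, driver: FakeExecutorDriver, task_id: str) -> None:
        # Stop the work first, then report the terminal state explicitly,
        # because nothing sends this update on the executor's behalf.
        self.running.discard(task_id)
        driver.send_status_update(TaskStatus(task_id, "TASK_KILLED"))


driver = FakeExecutorDriver()
executor = SketchExecutor()
executor.launch_task(driver, "task-1")
executor.kill_task(driver, "task-1")
```

After these calls, the last recorded update is the TASK_KILLED status for `task-1`, which is what the scheduler would need to see.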
This is invoked when a framework message arrives for this executor. These
messages are best effort; do not expect a framework message to be
retransmitted in any reliable fashion.
The following are the parameters:
This is invoked when the executor should terminate all of its currently running
tasks. Note that after Mesos determines that an executor has terminated, a
TASK_LOST status update will be created for any tasks for which the executor
did not send terminal status updates (for example, TASK_KILLED,
TASK_FINISHED, TASK_FAILED, and so on).
The following is the parameter:
The previous code is invoked when a fatal error occurs with the executor
and/or executor driver. The driver will be aborted BEFORE invoking
this callback.
The following are the parameters:
driver: This is the executor driver that was aborted due to this error
The preceding code starts the executor driver. This needs to be called before
any other driver calls are made.
The state of the driver after the call is returned.
This aborts the driver so that no more callbacks can be made to the executor.
The semantics of abort and stop are deliberately separated so that the
code can detect an aborted driver (via the return status of join(); refer to
the following section) and instantiate and start another driver if desired
(from within the same process, although this functionality is currently not
supported for executors).
It returns the state of the driver after the call.
This waits for the driver to be stopped or aborted, possibly blocking the
current thread indefinitely. The return status of this function can be used to
determine whether the driver was aborted (take a look at mesos.proto for
a description of status).
It returns the state of the driver after the call.
This starts and immediately joins (that is, blocks) the driver.
It returns the state of the driver after the call.
This sends a message to the framework scheduler. These messages are sent
on a best effort basis and should not be expected to be retransmitted in any
reliable fashion.
The parameters are as follows:
The preceding code is invoked when the scheduler reregisters with a newly
elected Mesos master. This is only called when the scheduler has previously
been registered. MasterInfo, containing the updated information about the elected
master, is provided as an argument.
The parameters are as follows:
master
The preceding code is invoked when resources are offered to this framework.
A single offer will only contain resources from a single slave. Resources
associated with an offer will not be reoffered to this framework until either
(a) this framework rejects these resources (refer to
SchedulerDriver.launchTasks(java.util.Collection<OfferID>,
java.util.Collection<TaskInfo>, Filters)), or (b) these resources are rescinded
(refer to offerRescinded(org.apache.mesos.SchedulerDriver, OfferID)). Note that
resources may be concurrently offered to more than one framework at a time,
depending on the allocator being used. In this case, the first framework to
launch tasks using these resources will be able to use them, while the other
frameworks will have these resources rescinded. (Alternatively, if a framework
has already launched tasks with these resources, these tasks will fail with a
TASK_LOST status and a message saying as much.)
driver: This is the driver that was used to run this scheduler
This is invoked when an offer is no longer valid (for example, the slave is
lost or another framework has used resources in the offer). If, for whatever
reason, an offer is never rescinded (for example, a dropped message, a framework
failing over, and so on), a framework that attempts to launch tasks
using an invalid offer will receive a TASK_LOST status update for these tasks
(take a look at resourceOffers(org.apache.mesos.SchedulerDriver,
java.util.List<Offer>)).
The following are the parameters:
driver: This is the driver that was used to run this scheduler
driver: This is the driver that was used to run this scheduler
and status
This is invoked when the scheduler becomes disconnected from the master
(for example, the master fails and another takes over).
The following is the parameter:
driver: This is the driver that was used to run this scheduler
driver: This is the driver that was used to run this scheduler
driver: This is the driver that was used to run this scheduler
driver: This is the driver that was used to run this scheduler
This starts the scheduler driver. It needs to be called before any other driver
calls are made.
It returns the state of the driver after the call.
This stops the scheduler driver. If the failover flag is set to false, it is
expected that this framework will never reconnect to Mesos. So, Mesos
will unregister the framework and shut down all its tasks and executors.
If failover is true, all executors and tasks will remain running (for some
framework-specific failover timeout), allowing the scheduler to reconnect
(possibly in the same process or from a different process, for example,
on a different machine).
The following is the parameter:
This stops the scheduler driver assuming no failover. This will cause Mesos
to unregister the framework and shut down all its tasks and executors.
This returns the state of the driver after the call.
This aborts the driver so that no more callbacks can be made to the scheduler.
The semantics of abort and stop are deliberately separated so that code
can detect an aborted driver (via the return status of join(); refer to the
following section) and instantiate and start another driver if desired from
within the same process.
This returns the state of the driver after the call.
This waits for the driver to be stopped or aborted, possibly blocking the
current thread indefinitely. The return status of this function can be used to
determine whether the driver was aborted (take a look at mesos.proto for a
description of Status).
This returns the state of the driver after the call.
This starts and immediately joins (that is, blocks) the driver.
It returns the state of the driver after the call.
The preceding code launches the given set of tasks on a set of offers.
Resources from offers are aggregated when more than one is provided. Note
that all the offers must belong to the same slave. Any resources remaining
(that is, not used by the tasks or their executors) will be considered
declined. The specified filters are applied on all unused resources (take a
look at mesos.proto for a description of Filters). Invoking this function
with an empty collection of tasks declines offers in their entirety (refer to
declineOffer(OfferID, Filters)).
The following are the parameters:
This kills the specified task. Note that attempting to kill a task is currently not
reliable. If, for example, a scheduler fails over while it attempts to kill a task,
it will need to retry in the future. Likewise, if unregistered/disconnected, the
request will be dropped (these semantics may be changed in the future).
The following is the parameter:
This declines an offer in its entirety and applies the specified filters on the
resources (take a look at mesos.proto for a description of Filters). Note
that this can be done at any time, and it is not necessary to do this within the
Scheduler.resourceOffers(org.apache.mesos.SchedulerDriver, java.util.List<Offer>)
callback.
The following are the parameters:
filters: These are the filters to be set for any remaining resources
This removes all the filters previously set by the framework (via
launchTasks(java.util.Collection<OfferID>, java.util.Collection<TaskInfo>,
Filters)). This enables the framework
This sends a message from the framework to one of its executors. These
messages are sent on a best effort basis and should not be expected to be
retransmitted in any reliable fashion.
The parameters are:
This allows the framework to query the status for nonterminal tasks. This
causes the master to send back the latest task status for each task in statuses
if possible. Tasks that are no longer known will result in a TASK_LOST update.
If statuses is empty, the master will send the latest status for each task
currently known.
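The reconciliation rules just described can be sketched as a small function. This models only the behavior stated in the text; the function name `reconcile` and the plain-dictionary representation of the master's state are hypothetical and are not part of the Mesos API.

```python
from typing import Dict, List


def reconcile(known: Dict[str, str], task_ids: List[str]) -> Dict[str, str]:
    """Return the status updates a framework would receive for a query.

    `known` maps task IDs to the latest state the master knows about.
    """
    if not task_ids:
        # An empty query asks for every task that is currently known.
        return dict(known)
    # Tasks that are no longer known are reported as TASK_LOST.
    return {task_id: known.get(task_id, "TASK_LOST") for task_id in task_ids}


master_view = {"task-1": "TASK_RUNNING", "task-2": "TASK_STAGING"}
explicit = reconcile(master_view, ["task-1", "task-9"])
implicit = reconcile(master_view, [])
```

Here `explicit` reports `task-1` as running and the unknown `task-9` as TASK_LOST, while `implicit` returns the state of both known tasks.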
The following are the parameters:
Mesos in production
Mesos is in production at several companies, such as Apple, Twitter, and HubSpot,
and is also used by start-ups such as Mattermark and Sigmoid. This broad
appeal is a validation of Mesos' utility. Apple, for example, powers its
consumer-facing, mission-critical Siri application through a large Mesos
cluster (reportedly spanning tens of thousands of nodes). One such case study,
on HubSpot (published on the Mesosphere website), is discussed here.
Benefits
Mesos provides numerous benefits to both the development team and the company.
At HubSpot, developers own the operation of their applications. With Mesos,
developers can deploy services faster and with less maintenance. Here are some
of the other benefits:
Scheduled tasks (cron jobs) are now exposed via a web interface and are not
tied to a single server, which may fail at any time, taking the cron job with it.
Mesos also simplifies the technology stack required to requisition hardware and
manage it from an operations perspective. HubSpot can standardize server footprints
and simplify the base image upon which Mesos slaves are executed.
Lastly, resource utilization is improved, which directly corresponds with reducing
costs. Services that previously ran on overprovisioned hardware now use the
exact amount of resources requested.
Additionally, the QA environment runs at 50% of its previous capacity as the
HubSpot scheduler ensures that services are restarted when they fail. This means
that it is no longer necessary to run multiple copies of services inside QA for high
availability.
Challenges
A core challenge behind adoption is introducing a new deployment technology
to a group of 100 engineers who are responsible for managing their applications
on a daily basis. HubSpot mitigated this challenge by building a UI around Mesos
and utilizing Mesos to make the deployment process as simple and rewarding
as possible.
Looking ahead
HubSpot sees Mesos as a core technology behind future migrations into other
datacenters. As both a virtualization and deployment technology, Mesos has proven
to be a rewarding path forward. Additionally, HubSpot hopes to eventually leverage
Mesos to dynamically scale out processes based on load, shrink and grow the cluster
size relative to demand, and assist developers with resource estimation.
Detailed steps to download the code bundle are mentioned in the Preface
of this book. Please have a look. The code bundle for the book is also
hosted on GitHub at https://github.com/PacktPublishing/Mastering-Mesos.
We also have other code bundles from our rich catalog of books and videos
available at https://github.com/PacktPublishing/. Check them out!
Summary
In this chapter, we introduced Mesos, dived deep into its architecture, and discussed
some important topics, such as frameworks, resource allocation, and resource
isolation. We also discussed the two-level scheduling approach that Mesos employs
and provided a detailed overview of its API. The HubSpot case study at the end
showed how Mesos is used in production and that it is ready for prime time. The
objective was to explain what Mesos is and why it is required and to provide a
high-level overview of how it works.
In the next chapter, we will dive deep into its important features and understand
how they contribute to scaling, efficiency, high availability, and extensibility.