Architectures, Design Methodologies, and Service Composition Techniques For Grid Job and Resource Management
Per-Olov Östberg
p-o@cs.umu.se
Copyright © 2009 by authors
Except Paper I, © Springer-Verlag, 2007
Paper II, © Springer-Verlag, 2008
Paper III, © Crete University Press, 2008
Paper IV, © Springer-Verlag, 2009
ISBN 978-91-7264-861-6
ISSN 0348-0542
UMINF 09.15
The field of Grid computing has in recent years emerged and been established as
an enabling technology for a range of computational eScience applications. The use
of Grid technology allows researchers and industry experts to address problems too
large to efficiently study using conventional computing technology, and enables new
applications and collaboration models. Grid computing has today not only introduced
new technologies, but also influenced new ways to utilize existing technologies.
This work addresses technical aspects of the current methodology of Grid computing:
to leverage highly functional, interconnected, and potentially under-utilized
high-end systems to create virtual systems capable of processing problems too large
to address using individual (supercomputing) systems. In particular, this thesis studies
the job and resource management problem inherent to Grid environments, and aims to
contribute to development of more mature job and resource management systems and
software development processes. A number of aspects related to Grid job and resource
management are here addressed, including software architectures for Grid job man-
agement, design methodologies for Grid software development, service composition
(and refactorization) techniques for Service-Oriented Grid Architectures, Grid infras-
tructure and application integration issues, and middleware-independent and transpar-
ent techniques to leverage Grid resource capabilities.
The software development model used in this work has been derived from the
notion of an ecosystem of Grid components. In this model, a virtual ecosystem is
defined by the set of available Grid infrastructure and application components, and
ecosystem niches are defined by areas of component functionality. In the Grid ecosystem,
applications are constructed through selection and composition of components,
and individual components are subject to evolution through meritocratic natural selection.
Central to the idea of the Grid ecosystem is that mechanisms that promote traits
beneficial to survival in the ecosystem, e.g., scalability, integrability, robustness, also
influence Grid application and infrastructure adaptability and longevity.
As Grid computing has evolved into a highly interdisciplinary field, current Grid
applications are very diverse and utilize computational methodologies from a number
of fields. Due to this, and the scale of the problems studied, Grid applications typically
place great performance requirements on Grid infrastructures, making Grid infrastruc-
ture design and integration challenging tasks. In this work, a model of building on,
and abstracting, Grid middlewares has been developed and is outlined in the papers.
In addition to the contributions of this thesis, a number of software artefacts, e.g., the
Grid Job Management Framework (GJMF), have resulted from this work.
Preface
This thesis consists of a brief introduction to the field, a short discussion of the main
problems studied, and the following papers.
This research was conducted using the resources of the High Performance Com-
puting Center North (HPC2N). Financial support has been provided by The Swedish
Research Council (VR) under contract 621-2005-3667.
Acknowledgements
A number of people have directly or indirectly contributed to the work in this thesis
and deserve acknowledgement. First of all, I would like to thank my advisor Erik
Elmroth for not only the opportunities provided and all the hard work, but also for
the positive environment he creates in our research group. I would also like to thank
my coadvisor Bo Kågström for inspiring discussions and the unique perspective he
brings to them. Among my colleagues in the GIRD group I would like to thank Lars
Larsson and Johan Tordsson for lengthy discussions of all things more or less related
to our work, and Francisco Hernández, Daniel Henriksson, Raphaela Bieber-Bardt,
Arvid Norberg, and Peter Gardfjäll (in no particular order) for all their contributions
to our collective effort. Among our research partners I would like to thank Sverker
Holmgren, Jonas Lindemann, and Salman Toor for interesting collaborations, and the
support staff of HPC2N for their contributions and knowledge of the Grid systems we
use. Finally, on a personal level I would like to thank my family and friends, without
whom none of this would be possible, for all the love and support they provide.
Thank you all.
Contents
1 Introduction
2 Grid Applications
3 Grid Infrastructure
4 Grid Job and Resource Management
4.1 Grid Environments
4.2 Grid Resources and Middlewares
4.3 Resource Management
4.4 Resource Brokering
4.5 Job Control
4.6 Job Management
4.7 (Non-Intrusive) Interoperability
5 An Ecosystem of Grid Components
6 Thesis Contributions
6.1 Paper I
6.2 Paper II
6.3 Paper III
6.4 Paper IV
6.5 Paper V
7 Future Work
Paper I
Paper II
Paper III
Paper IV
Paper V
Chapter 1
Introduction
In the past decade, Grid computing has emerged and been established as an
enabling technology for a range of computational eScience applications. A
number of definitions of Grid computing exist, e.g., [33, 37, 64], and while
the scientific community has reached a certain level of agreement on what a
Grid is [58], best practices for Grid design and construction are still topics
for investigation. The definition used in this thesis details Grid computing to
be a type of distributed computing focused on aggregation of computational
resources for creation of meta-scale virtual supercomputers and systems.
As a paradigm, Grid computing revolves around concepts such as service
availability, performance scalability, virtualization of services and resources,
and resource (access) transparency [40, 58]. The current methodology of the
field is to leverage interconnected high-end systems to create virtual systems
capable of great performance scalability, high availability, and collaborative
resource sharing [37]. The approach taken in this work employs loosely cou-
pled and decentralized resource aggregation models, assumes resources to be
aggregated from multiple ownership domains, and expects all Grid services and
components to be subject to resource contention, i.e. to coexist with competing
mechanisms.
Grid technology and infrastructure have today found application in fields
as diverse as, e.g., life sciences, material sciences, climate studies, astrophysics,
and computational chemistry, making Grid computing an interdisciplinary field.
Current Grid applications occupy all niches of scientific computation, ranging
from embarrassingly parallel high-throughput applications to distributed and
synchronized data collation and collaboration projects.
Actors within, and contributions to, the field of Grid computing can broadly
be segmented into two main categories: application and infrastructure. Grid
applications often stem from independently developed computational method-
ologies more or less suited for use in Grid environments, and are often limited
(in Grid usage scenarios) by how well their methodology lends itself to par-
allelization. Motivations for migration to Grid environments vary, but often
include envisioned performance benefits or synergetic collaboration effects.
Typically, Grids are designed to provide a level of scalability beyond what is
offered by individual supercomputer systems. System requirements vary with
Grid application needs, and usually incorporate advanced demands for stor-
age, computational, or transmission capacity, which places great performance
requirements on underlying Grid infrastructure at both component and system
level. These conditions, combined with typical interdisciplinary requirements of
limited end-user system complexity, automation, and high system availability,
make Grid infrastructure design and resource federation challenging tasks.
The focus of this thesis lies on research questions related to Grid infrastruc-
ture and application design, with emphasis on job and resource management
issues. In particular, abstraction of Grid job management interfaces, and re-
lated application and infrastructure component integration issues have been
studied in the context of federated Grid environments. The methodology of
this work includes investigation of architectural design patterns inspired by the
notion of an ecosystem of Grid infrastructure components [62], and exploration
of Service-Oriented Architecture (SOA)-based [51] implementation techniques.
The concept of an ecosystem of Grid infrastructure components, where ap-
plications are composed through selection of software from an ecosystem of
components, and individual components are subject to meritocratic natural
selection, is further described in Section 5.
Two of the overarching goals of the GIRD [63] project, in which this re-
search has been performed, are to investigate and propose architectures for
abstraction and provisioning of Grid functionality, and to provide proof-of-
concept implementations of proposed architectures. Scientific contributions of
this thesis include investigation of architectural design patterns, development of
Grid infrastructure task algorithms, and contributions to formulation of design
methodologies for scalable Grid infrastructure and application components.
The rest of this thesis is structured as follows. Section 2 provides an in-
troduction to Grid applications and outlines a few of the requirements Grid
applications impose on Grid infrastructures and environments. Section 3 dis-
cusses Grid infrastructure and covers some of the trade-offs involved in Grid
infrastructure design and development. Section 4 provides an overview of the
Grid job and resource management problem, and structures the area into con-
stituent processes while briefly referencing some of the work within the field.
Section 5 sketches the notion of an ecosystem of Grid components, and serves
here as an introduction to the perspective chosen in this work. Section 6 sum-
marizes the contributions of this thesis, and relates the thesis papers to each
other. Finally, Section 7 outlines some future directions for this work, and
references current efforts related to the work of this thesis.
Chapter 2
Grid Applications
paradigms exist, e.g., the more recently formulated Many Task Computing
(MTC) [54] paradigm. MTC applications focus on running large amounts of
tasks over short periods of time, are typically communication-intensive but not
naturally expressed using synchronized communication patterns like MPI, and
measure performance using (application) domain-specific metrics.
Beside obvious computational requirements, Grid applications typically also
impose advanced system performance requirements for, e.g.,
• storage capacity: Grid applications potentially process very large data
sets, and often do so without predictable access patterns.
From a performance perspective, the construction of Grid systems is facili-
tated by improvements in computational and network capacity, and motivated
by general availability of highly functional and well connected end systems.
Increase in network capacity alone has led to changes in computing geometry
and geography [37], and technology advances have today made massive-scale
collaborative resource sharing not only feasible, but approaching ubiquitous.
From an application perspective, Grid computing holds promise of more ef-
ficient models for collaboration when addressing larger and more complex prob-
lems, less steep learning curves (as compared to traditional high-performance
computing), increased system utilization rates, and efficient computation sup-
port for broader ranges of applications. While Grids of today have achieved
much in system utilization, scalability and performance, much work in reducing
system complexity and increasing system usability still remains [58].
Chapter 3
Grid Infrastructure
The name Grid computing originated from an analogy in the initial guiding
vision of the field: to provide access to the capabilities of computational resources
in a way similar to how power grids provide electricity [37], i.e. with
transparency in
• resource selection (i.e. which resource to use).
From a systems perspective, Grid computing revolves around concepts such
as (performance) scalability, virtualization, and transparency [40]. Performance
scalability here refers to the ability of a system to dynamically increase the
computational (or storage, network, etc.) capacity of the system to meet the
requirements of an application on demand. Virtualization here denotes the
process of abstracting computational resources, a practice that can be found
on all levels of a Grid. For example, Grid applications’ use of infrastructure is
often abstracted and hidden from end-users, Grid systems and infrastructure
typically abstract the use of computational resources from the view of appli-
cations, and access to Grid computational resources is typically abstracted by
native resource access layers, e.g., batch systems. The term transparency is
used to describe that, like access to systems and system components, scal-
ability should be automatic and not require manual efforts or knowledge of
underlying systems to realize access to, or increase in, system capacity.
Typically today, performance scalability is achieved in Grid systems through
dynamic provisioning of multiple computational resources over a network, vir-
tualization through interface abstraction mechanisms, and transparency through
automation of core Grid component tasks (such as resource discovery, resource
brokering, file staging, etc.).
To facilitate flexible resource usage models, Grid users and resource allotments
are typically organized in Virtual Organizations (VOs) [38]. VOs are a
key concept in Grid computing that pertains to virtualization of a system’s
user base around a set of resource-sharing rules and conditions. The formulation
of VOs stems from the dynamic nature of resource sharing where resource
availability, sharing conditions, and organizational memberships vary over time.
This mechanism allows Grid resource usage allotments to be administered and
provided by decentralized organizations, to whom individual users and projects
can apply for memberships and resource usage credits. VOs employ scalable resource
allotment mechanisms suitable for cross-ownership domain aggregation
of resources, and provide a way to provision resource usage without pre-existing
trust relationships between resource owners and individual Grid users.
In summary, a Grid computing infrastructure should provide flexible and
secure resource access and utilization through coordinated resource sharing
models to dynamic collections of individuals and organizations. Furthermore,
resources and users should be organized in Virtual Organizations and systems
be devoid of centralized control, scheduling omniscience, and pre-existing trust
relationships.
Chapter 4
Grid Job and Resource Management
A core task set of any Grid infrastructure is job and resource management, a
term here used to collectively reference a set of processes and issues related
to execution of programs on computational resources in Grid environments.
This includes, e.g., management, monitoring, and brokering of computational
resources; description, submission, and monitoring of jobs; fairshare scheduling
[46] and accounting in Virtual Organizations; and various cross-site administrative
and security issues.
Grid job and resource management tasks seem intuitive when viewed indi-
vidually, but quickly become complex when considered as parts of larger sys-
tems. A number of component design trade-offs, requirements, and conditions
are introduced by core Grid requirements for, e.g., system scalability and transparency,
and tend to become contradictory when individual component designs
are kept strictly task oriented. An approach taken in this work is to primarily
regard components as parts of systems, and focus on component interoperabil-
ity to promote system composition flexibility [26]. The primary focus of the
Grid job and resource management contributions here is to abstract system
complexity and heterogeneity, and to allow applications to leverage resource
capabilities without becoming tightly coupled to particular Grids or Grid mid-
dlewares [27].
Figure 1: A naive Grid model. Grids aggregate clusters of computational
resources, which may be part of multiple Grids. Federated Grid environments
are composed from collaborative federation of existing Grids.
4.2 Grid Resources and Middlewares
A typical HPC Grid resource consists of a high-end computer system equipped
with (possibly customized) software such as
• data access and transfer utilities, e.g., GridFTP [18].
• batch systems and scheduling mechanisms, e.g., PBS [10] and Maui [48].
• job and resource monitoring tools, e.g., GridLab Mercury Monitor [8].
• computation frameworks, e.g., BLAST [60].
HTC resources are of a more varied nature. CPU-cycle scavenging schemes such
as Condor [61], for example, typically utilize standard desktop machines, while
volunteer computing efforts such as distributed.net [17] may see use of any
type of computational resource provided by end-users. HTC Grids often deploy
software that can be considered part of Grid middlewares on computational
resources, e.g., Condor and BOINC [2] clients.
Grids are created through aggregation of computational resources, typically
using Grid middlewares to abstract complexity and details of native resource
systems such as schedulers and batch systems. Grid middlewares are (typically
distributed) systems that act on top of local resource systems, abstracting
native system interfaces, and provide interoperability between computational
systems. To applications, Grid middlewares offer virtualized access to resource
capabilities through abstractive job submission and control interfaces, informa-
tion systems, and authentication mechanisms.
A number of different Grid middlewares exist, e.g., ARC [19], Globus [42],
UNICORE [59], LCG/gLite [13], and vary greatly in design and implementa-
tion. In a simplified model, Grid middlewares contain functionality for
• resource discovery, often through specialized information systems.
4.3 Resource Management
Grid resources are typically owned, operated, and maintained by local resource
owners. Local resource sharing policies override Grid resource policies; compu-
tational resources shared in Grid environments according to defined schedules
are possibly not available to Grid users outside scheduled hours. Due to this,
and hardware and software failures, administrative downtime, etc., Grid resources
are generally considered volatile.
In Grid systems, resource volatility is typically abstracted using dynamic
service description and discovery techniques, utilizing loosely coupled models
[65] for client-resource interaction. Local resource owners publish information
about systems and resources in information systems, and Grid clients, e.g.,
resource brokers and submission engines, discover resources on demand and
utilize the best resources currently available during the job submission phase.
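The publish-and-discover interaction described above can be illustrated with a minimal sketch. All names here (`ResourceRecord`, `InformationSystem`, `select_resource`) are hypothetical and chosen for illustration only; the sketch also assumes that "best" means the largest number of free CPUs, which is just one of many possible selection criteria.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ResourceRecord:
    """Hypothetical record a local resource owner publishes about a resource."""
    name: str
    free_cpus: int
    available: bool


class InformationSystem:
    """Hypothetical in-memory stand-in for a Grid information system."""

    def __init__(self) -> None:
        self._records: Dict[str, ResourceRecord] = {}

    def publish(self, record: ResourceRecord) -> None:
        # Republishing under the same name overwrites the older record,
        # which is how volatile resource state is kept reasonably current.
        self._records[record.name] = record

    def discover(self, min_cpus: int) -> List[ResourceRecord]:
        # Clients query on demand instead of holding fixed resource references.
        return [r for r in self._records.values()
                if r.available and r.free_cpus >= min_cpus]


def select_resource(info: InformationSystem, min_cpus: int) -> Optional[ResourceRecord]:
    """Pick the currently available resource with the most free CPUs (assumed criterion)."""
    candidates = info.discover(min_cpus)
    return max(candidates, key=lambda r: r.free_cpus, default=None)
```

Because the client holds no direct reference to any resource, a resource that disappears between two submissions is simply absent from the next discovery result.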
Reliable resource monitoring mechanisms are critical to operation in Grid
environments. While resource characteristics, e.g., hardware specifications and
software installations, can be considered static, factors such as resource avail-
ability, load, and queue status are inherently dynamic. To facilitate Grid
utilization and resource brokering, resource monitoring systems are used to
provide information systems with resource availability and status data.
As resource monitoring systems and information systems in Grid environments
typically exist in different administrative domains, resource status information
needs to be disseminated through well-defined, machine-interpretable
interfaces. The Web Service Resource Framework (WSRF) [35] specification
family addresses Web Service state management issues, and contains interface
definitions and notification mechanisms suitable for this task. In Grid environ-
ments, information systems potentially contain large quantities of information
and can be segmented and hierarchically aggregated to partition resource in-
formation into manageable proportions.
• parameters and environmental settings.
• hardware requirements, e.g., CPU, storage, and memory requirements.
• software requirements, e.g., required libraries and licenses.
• file staging information, e.g., data location and access protocols.
• meta-information, e.g., duration estimates and brokering preferences.
Resource brokering is subject to heuristic constraints and optimality criteria
such as minimization of cost, maximization of resource computational capacity,
minimization of data transfer time, etc., and is typically complicated by factors
such as missing or incomplete brokering information, propagation latencies in
information systems, and existence of competing scheduling mechanisms [30].
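As a rough illustration of brokering against an optimality criterion, the following sketch ranks resources by estimated total time to completion. It is an assumption-laden example, not a broker from the cited work: the types `JobDescription` and `ResourceOffer`, the field names, and the fallback wait time used when queue information is missing are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class JobDescription:
    """Hypothetical subset of the job description fields listed above."""
    executable: str
    min_cpus: int
    min_memory_gb: float
    input_size_gb: float
    estimated_runtime_s: float


@dataclass(frozen=True)
class ResourceOffer:
    """Hypothetical view of a resource as reported by an information system."""
    name: str
    cpus: int
    memory_gb: float
    bandwidth_gb_per_s: float
    estimated_queue_wait_s: Optional[float]  # None models missing brokering information


def estimated_completion_s(job: JobDescription, offer: ResourceOffer,
                           default_wait_s: float = 3600.0) -> float:
    """Estimate queue wait + input transfer + run time, in seconds."""
    wait = offer.estimated_queue_wait_s
    if wait is None:
        wait = default_wait_s  # pessimistic assumption when no data is available
    transfer = job.input_size_gb / offer.bandwidth_gb_per_s
    return wait + transfer + job.estimated_runtime_s


def broker(job: JobDescription, offers: Sequence[ResourceOffer]) -> Optional[ResourceOffer]:
    """Filter on hardware requirements, then minimize estimated completion time."""
    feasible = [o for o in offers
                if o.cpus >= job.min_cpus and o.memory_gb >= job.min_memory_gb]
    return min(feasible, key=lambda o: estimated_completion_s(job, o), default=None)
```

Other criteria named in the text, such as cost, would slot in by replacing or weighting the ranking function; the two-step shape (filter on requirements, then rank) stays the same.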
A common federated Grid environment characteristic designed to promote
scalability is absence of scheduling omniscience. From this, two fundamental
observations can be made. First, no scheduling mechanism can expect to monopolize
job scheduling; all schedulers are forced to collaborate and compete
with other mechanisms. Second, due to factors such as system latencies, information
caching and status polling intervals, all Grid schedulers operate on
information which to some extent is obsolete [31]. In these settings, Grid bro-
kers and schedulers need to adapt to their environments and design emphasis
should be placed on coexistence [27]. In particular, care should be taken to not
reduce total Grid system performance, or performance of competing systems,
through inefficient mechanisms in brokering and scheduling processes.
5. clean up: job data, temporary, and execution files are removed from the
computational resource.
Naturally, ability to prematurely abort and externally monitor job execu-
tions must be provided by job control systems. In general, most systems of
this complexity are built in layers, and Grid middlewares typically provide job
control interfaces that abstract native resource system complexity.
As in any distributed system, a number of failures ranging from submission
and execution failures to security credential validation and file transfer errors
may occur during the job execution process. To facilitate client failure management
and error recovery, failure context information must be provided to clients.
In Grid systems, failure management is complicated by factors such as resource
ownership boundaries and resource volatility issues. Care must also be taken to
isolate job executions, and to ensure that distribution of failure contexts does not
result in information leakage. Typically, Grids make use of advanced security
features that make failure management, administration, and direct access to
resource systems complex.
[43] details interfaces for push model status notifications suitable for Grid job
management architectures.
A common advanced Grid application requirement is to, possibly condi-
tionally, run batches of jobs sequentially or in parallel. One way to organize
these sets is in Grid workflows [50], where job interdependencies and coordina-
tion information are expressed along with job descriptions. In simple versions,
workflows can be seen as job descriptions for sets of jobs. In more advanced
versions, e.g., the Business Process Execution Language (BPEL) [4], workflows
may themselves contain script-like instruction sets for, e.g., conditional execu-
tion, looping, and branching of jobs. When using workflows, Grid applications
rely on workflow engines, e.g., Taverna [52], Pegasus [16], and Grid infrastruc-
tures to automate execution of job sets. An important question here becomes
abstraction of level of detail, and balancing of level of detail against level of
control for advanced job management systems [23].
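For the simple case where a workflow is a set of jobs with interdependencies, the order of execution can be derived from the dependency graph. The sketch below is illustrative only and does not reproduce any of the workflow engines cited; the function name `execution_batches` and the job names in the usage note are made up. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_batches(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into batches; jobs within a batch have no unmet dependencies.

    `dependencies` maps each job name to the set of jobs it depends on.
    Batches must run sequentially, while the jobs inside one batch may run
    in parallel. A cyclic workflow raises graphlib.CycleError.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch as finished
    return batches
```

For a hypothetical workflow `{"align": {"stage_in"}, "filter": {"stage_in"}, "merge": {"align", "filter"}}`, this yields three batches: the staging job, the two independent analysis jobs, and the final merge. Conditional execution and looping, as in BPEL, need a richer model than a static graph.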
Advanced job management systems may also provision functionality for
customization of job execution, control, and management. In this case, job
management components should provide interfaces for customization that do
not require end-users or administrators to replace entire system components,
but rather offer flexible configuration and code injection mechanisms [26, 27].
Chapter 5
An Ecosystem of Grid
Components
Currently, a number of open research questions regarding Grid and Cloud com-
puting software design are being addressed by the scientific community. A com-
mon problem in current efforts is that applications tend to be tightly coupled
to specific middlewares or Grids, and lack ability to be generally applicable
to computational problems [25]. This work addresses Grid software design
methodologies for computational eScience applications that support the ma-
jority of current computational approaches, and places focus on infrastructure
composition and scalability rather than specific problem sets [24].
The methodology of this work builds on the idea of an ecosystem of Grid
infrastructure components [62], which encompasses a view of a software ecosys-
tem where individual components compete and collaborate for survival on an
evolutionary basis. Fundamental to this idea is the notion of software niches,
areas of functionality defined and populated by software components that in-
teract and provision use of Grid resources to applications and end-users. Here,
standardization of interfaces and software components helps define niche boundaries,
and continuous development of Grid infrastructure components and integration
with eScience applications help shape and redefine niches (as well as the
ecosystem at large) through competition, innovation, diversity, and evolution.
In this approach, identification and exploration of component and system
traits likely to promote software survival in the Grid ecosystem are central, and
generally help in identification and formulation of research questions. Software
designed using this methodology focuses on establishment of core functionality,
and adapts to, and integrates with, members of neighboring niches rather than
attempting to replace them.
Currently, advanced eScience applications and computational infrastruc-
tures require software and systems to scale with problem complexity and si-
multaneously abstract heterogeneity issues introduced by this scalability. For
usability, software also requires interoperability and robustness to enable automation
of repetitive tasks in computational environments, and flexibility in
configuration and deployment to be employed in environments with great vari-
ance in usage and deployment requirements. The approach taken in this work
is to build on top of Grid middlewares and create layers of flexible software that
interoperate non-intrusively with components from different niches in the Grid
ecosystem, and allow applications to be decoupled from Grid middlewares.
Chapter 6
Thesis Contributions
Large portions of the work in this thesis focus on Grid job and resource man-
agement issues, and address how these can be approached using middleware-
independent techniques. Two of the papers outline and discuss approaches to
Grid software development, one from a software engineering perspective (II),
and one from a system (re)factorization point of view (III). Two of the papers (I
and V) investigate and outline a generic architecture for Grid job management
capable of adoption in a majority of existing Grid computing environments.
Paper IV studies integration issues related to use of the proposed job manage-
ment architecture, and details an integration architecture building on it.
6.1 Paper I
Paper I [21] investigates software design issues for Grid job management tools.
Building on experiences from previous work [28, 29, 31], an architectural model
for construction of a middleware-independent Grid job management system is
proposed, and the design is detailed from an architectural point of view. In
this work, a layered architecture of composable services that each manage a
separate part of the Grid job management process is outlined, and design and
implementation implications of this architecture are discussed. The architec-
ture separates applications from infrastructure through a customizable set of
services, and provides middleware-independence through use of (possibly third-party)
middleware adaptation plug-ins.
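The general idea of isolating middleware specifics behind plug-ins can be sketched as follows. This is a generic illustration of the plug-in pattern under stated assumptions, not the interface of the architecture in Paper I: `MiddlewareAdapter`, `InMemoryAdapter`, and `JobManager` are hypothetical names, and the in-memory adapter stands in for real middleware clients.

```python
from abc import ABC, abstractmethod
from typing import Dict


class MiddlewareAdapter(ABC):
    """Hypothetical plug-in interface; one implementation per Grid middleware."""

    @abstractmethod
    def submit(self, job_description: dict) -> str:
        """Submit a job and return a middleware-specific job identifier."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Return the current state of a previously submitted job."""


class InMemoryAdapter(MiddlewareAdapter):
    """Stand-in adapter that records jobs in a dict instead of contacting a Grid."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._jobs: Dict[str, str] = {}

    def submit(self, job_description: dict) -> str:
        job_id = f"{self._prefix}-{len(self._jobs) + 1}"
        self._jobs[job_id] = "SUBMITTED"
        return job_id

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]


class JobManager:
    """Middleware-independent entry point that delegates to registered plug-ins."""

    def __init__(self) -> None:
        self._adapters: Dict[str, MiddlewareAdapter] = {}

    def register(self, name: str, adapter: MiddlewareAdapter) -> None:
        self._adapters[name] = adapter

    def submit(self, middleware: str, job_description: dict) -> str:
        return self._adapters[middleware].submit(job_description)

    def status(self, middleware: str, job_id: str) -> str:
        return self._adapters[middleware].status(job_id)
```

Applications written against `JobManager` never import middleware-specific code, so supporting another middleware means registering one more adapter rather than changing the application.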
A Globus Toolkit 4-based [34] prototype implementation of some of the
services in the architecture is presented, and the services are integrated with
the ARC [19] and Globus [42] middlewares. To demonstrate the feasibility of
the approach, preliminary results from prototype testing are presented along
with an evaluation of system performance and system use cases.
6.2 Paper II
Paper II [24] analyzes Grid software development practices from a software
engineering perspective. An approach to software development for high-level
Grid resource management tools is presented, and the approach is illustrated
by a discussion of software engineering attributes such as design heuristics,
design patterns, and quality attributes for Grid software development.
The notion of an ecosystem of Grid infrastructure components is expanded
upon, and Grid component coexistence, composability, adoptability, adaptabil-
ity, and interoperability are discussed in this context. The approach is illus-
trated by five case studies from recent software development efforts within the
GIRD project: the Job Submission Service (JSS) [31], the Grid Job Management
Framework (GJMF) [27], the Grid Workflow Execution Engine (GWEE)
[22], the SweGrid Accounting System (SGAS) [41], and the Grid-Wide Fair-
share Scheduling System (FSGrid) [20].
6.4 Paper IV
Paper IV [25] addresses Grid software integration issues and discusses problems
inherent to Grid applications being tightly coupled to Grid middlewares.
The paper proposes an architecture for system integration focused on seamless
integration of applications and Grid middlewares through a mediating layer
handling resource brokering and notification delivery. The proposed architec-
ture is illustrated in a case study where the LUNARC application portal [47] is
integrated with the Grid Job Management Framework [27] presented in papers
I and V. The performance of the proposed integration architecture is evaluated,
and findings from the integration efforts are presented throughout
the paper.
6.5 Paper V
Paper V [27] further elaborates on the work of Paper I, and proposes a compos-
able Service-Oriented Architecture-based framework architecture for middleware-
independent Grid job management. The proposed architecture is presented in
the context of development and deployment in an ecosystem of Grid compo-
nents, and software requirements and framework composition are discussed in
detail. The model of Paper I is extended with additional services for job descrip-
tion translation, system monitoring and logging, as well as a broader integration
support functionality range. Furthermore, a proof-of-concept implementation
of the entire framework is presented, together with a performance evaluation
that illustrates some of the major trade-offs in framework use.
The Grid ecosystem model of Paper II is further developed and discussed
in the context of the proposed job management architecture, and the software
composition techniques of Paper III are built upon and evaluated in the con-
text of this project. Throughout the paper, a number of software design and
implementation findings are presented, and the framework is related to a set of
similar software development efforts within adjoining Grid ecosystem niches.
Chapter 7
Future Work
A number of possible future extensions to the work of this thesis have been
identified, some of which are currently pursued within the GIRD project. Further
development of the software development model of Paper II, and documentation
of experiences from its use, is a continuous effort, and of current special
interest is adaptation of the model to Cloud computing software development
efforts. The model itself is currently utilized in a number of projects under the
GIRD multi-project umbrella, and is in the projects of this thesis combined
with the techniques of Paper III.
The service composition techniques of Paper III have been further developed
in work on Paper V, and are currently under investigation for extension in a
code-generation effort within multiple projects. The techniques lend themselves
well to software refactorization efforts and prototype implementations are being
developed for integration with the Apache Axis2 [7] SOAP [44] engine. Exten-
sion of these techniques to Representational State Transfer (REST)-based [32]
Resource-Oriented Architectures (ROA) [53] would possibly be a viable alter-
native to current Web Service Description Language (WSDL)-based [15] code
generation. In this case the abstraction of the mechanisms would naturally
be placed in API implementations, instead of in generated stub code. Extension
of these techniques to a more ubiquitous notification scheme, where
the current WSRF-based [35] approach could be extended to a more generic
MOM- [9] or ESB-based [14] approach, would also be possible. Development of
a more generic framework for service development adapted to a larger number
of service engines would further such efforts.
The job management framework of Papers I and V is currently being developed
into a more mature software product scheduled for use in SweGrid, the
Swedish national Grid, and a port of the framework to alternative SOAP stacks
is currently under investigation. Interesting research questions related to the
architecture of this framework include, e.g., development of data management
capabilities, (further) adaptation to standardization efforts, investigation of
advanced notification brokering capabilities, and inclusion of advanced resource
brokering features such as advance reservation and coallocation of resources,
and ClassAd-based match-making.
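To make the match-making feature mentioned above more concrete, the following Python sketch shows attribute-based, two-sided matching in the spirit of Condor ClassAds [61]. It is a minimal illustration under stated assumptions and not the ClassAd language or any existing broker implementation: advertisements are plain dictionaries, and the `requirements` and `rank` entries are hypothetical Python callables standing in for ClassAd expressions.

```python
from typing import Any, Dict, List, Optional

# An advertisement is a plain attribute dictionary (assumption for this sketch).
Ad = Dict[str, Any]


def matches(job: Ad, machine: Ad) -> bool:
    """Two-sided match: each side's requirements must accept the other ad."""
    return bool(job["requirements"](machine)) and bool(machine["requirements"](job))


def best_match(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Return the matching machine the job ranks highest, or None."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    # The job's rank function expresses its preference among valid matches.
    return max(candidates, key=job["rank"])
```

A job ad could, for example, require `memory_mb >= 2048` and rank machines by a benchmark attribute, while a machine ad could refuse jobs from certain owners. Both conditions are evaluated before a match is made, which is the property that distinguishes this style of match-making from one-sided resource filtering.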
Further development and integration of high-level job clients such as workflow
engines and Grid portals would be beneficial, as would further investigation
of integration architectures such as that of Paper IV, as these are expected
to increase the understanding of application-infrastructure integration issues.
Investigation of (minimalistic) implementation approaches for Grid middleware
development and simulation is also expected to render a deeper understanding
of these issues. Integration with Cloud computing solutions, and other
virtualization-based infrastructure techniques, is also of interest and can be
expected to increase the adoptability and flexibility of these techniques.
Bibliography
[1] C. Adams and S. Farrell. Internet X.509 public key infrastructure
certificate management protocols, 1999.
[2] D.P. Anderson. BOINC: A system for public-resource computing and
storage. In 5th IEEE/ACM International Workshop on Grid Computing,
pages 4–10, 2004.
[3] D.P. Anderson, J. Cobb, E. Korpela, M. Lebofsky, and D. Werthimer.
SETI@home: an experiment in public-resource computing.
Communications of the ACM, 45(11):56–61, 2002.
[4] T. Andrews, F. Curbera, H. Dholakia, Y. Goland, J. Klein, F. Leymann,
K. Liu, D. Roller, D. Smith, S. Thatte, et al. Business process execution
language for web services, version 1.1. Specification, BEA Systems, IBM
Corp., Microsoft Corp., SAP AG, Siebel Systems, 2003.
[5] E. Angerson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney,
J. Du Croz, S. Hammarling, J. Demmel, C. Bischof, and D. Sorensen.
LAPACK: A portable linear algebra library for high-performance computers.
In Proceedings of Supercomputing’90, pages 2–11, 1990.
[10] A. Bayucan, R.L. Henderson, C. Lesiak, B. Mann, T. Proett, and
D. Tweten. Portable Batch System: External reference specification.
Technical report, MRJ Technology Solutions, 1999.
[11] H. Benoit-Cattin, G. Collewet, B. Belaroussi, H. Saint-Jalmes, and
C. Odet. The SIMRI project: a versatile and interactive MRI simula-
tor. Journal of Magnetic Resonance, 173(1):97–115, 2005.
[12] BLAS (Basic Linear Algebra Subprograms). http://www.netlib.org/blas/.
September 2009.
[13] J. Knobloch (Chair) and L. Robertson (Project Leader). LHC computing
Grid technical design report. http://lcg.web.cern.ch/LCG/tdr/, Septem-
ber 2009.
[14] D. Chappell. Enterprise Service Bus. O’Reilly Media, Inc., 2004.
[15] E. Christensen, F. Curbera, G. Meredith, and S. Weerawarana. Web Ser-
vices Description Language (WSDL) 1.1. http://www.w3.org/TR/wsdl,
September 2009.
[16] E. Deelman, G. Singh, M. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta,
K. Vahi, G.B. Berriman, J. Good, A. Laity, J.C. Jacob, and D.S. Katz.
Pegasus: a framework for mapping complex scientific workflows onto dis-
tributed systems. Scientific Programming, 13(3):219–237, 2005.
[17] distributed.net. http://www.distributed.net/. September 2009.
[18] W. Allcock (editor). GridFTP: Protocol extensions to FTP for the Grid.
http://www.ogf.org/documents/GFD.20.pdf, September 2009.
[22] E. Elmroth, F. Hernández, and J. Tordsson. A light-weight Grid work-
flow execution engine enabling client and middleware independence. In
R. Wyrzykowski et al., editors, Parallel Processing and Applied Mathe-
matics, Lecture Notes in Computer Science, vol. 4967, pages 754–761.
Springer-Verlag, 2008.
[23] E. Elmroth, F. Hernández, and J. Tordsson. Three fundamental dimen-
sions of scientific workflow interoperability: Model of computation, lan-
guage, and execution environment. Future Generation Computer Systems.
The International Journal of Grid Computing: Theory, Methods and Ap-
plications, 2009, to appear.
[24] E. Elmroth, F. Hernández, J. Tordsson, and P-O. Östberg. Designing
Service-Based Resource Management Tools for a Healthy Grid Ecosystem.
In R. Wyrzykowski et al., editors, Parallel Processing and Applied Math-
ematics, Lecture Notes in Computer Science, vol. 4967, pages 259–270.
Springer-Verlag, 2008.
[25] E. Elmroth, S. Holmgren, J. Lindemann, S. Toor, and P-O. Östberg. Em-
powering a Flexible Application Portal with a SOA-based Grid Job Man-
agement Framework. In The 9th International Workshop on State-of-the-
Art in Scientific and Parallel Computing, to appear, 2009.
[26] E. Elmroth and P-O. Östberg. Dynamic and Transparent Service Compo-
sitions Techniques for Service-Oriented Grid Architectures. In S. Gorlatch,
P. Fragopoulou, and T. Priol, editors, Integrated Research in Grid Com-
puting, pages 323–334. Crete University Press, 2008.
[27] E. Elmroth and P-O. Östberg. A Composable Service-Oriented Archi-
tecture for Middleware-Independent and Interoperable Grid Job Manage-
ment. UMINF 09.14, Department of Computing Science, Umeå University,
Sweden. Submitted for Journal Publication, 2009.
[28] E. Elmroth and J. Tordsson. An interoperable, standards-based Grid re-
source broker and job submission service. In H. Stockinger, R. Buyya,
and R. Perrott, editors, e-Science 2005, First International Conference on
e-Science and Grid Computing, pages 212–220. IEEE CS Press, 2005.
[29] E. Elmroth and J. Tordsson. A Grid resource broker supporting advance
reservations and benchmark-based resource selection. In J. Dongarra,
K. Madsen, and J. Waśniewski, editors, Applied Parallel Computing - State
of the Art in Scientific Computing, Lecture Notes in Computer Science vol.
3732, pages 1061–1070. Springer-Verlag, 2006.
[30] E. Elmroth and J. Tordsson. Grid resource brokering algorithms enabling
advance reservations and resource selection based on performance predic-
tions. Future Generation Computer Systems. The International Journal of
Grid Computing: Theory, Methods and Applications, 24(6):585–593, 2008.
[31] E. Elmroth and J. Tordsson. A standards-based grid resource brokering
service supporting advance reservations, coallocation and cross-grid inter-
operability. Concurrency Computat.: Pract. Exper., 2009. accepted.
[32] R. T. Fielding. REST: Architectural Styles and the Design of Network-
based Software Architectures. Doctoral dissertation, University of Califor-
nia, Irvine, 2000.
[33] I. Foster. What is the grid? a three point checklist. GRID today, 1(6):22–
25, 2002.
[34] I. Foster. Globus toolkit version 4: Software for service-oriented systems.
In H. Jin, D. Reed, and W. Jiang, editors, IFIP International Conference
on Network and Parallel Computing, LNCS 3779, pages 2–13. Springer-
Verlag, 2005.
[35] I. Foster, J. Frey, S. Graham, S. Tuecke, K. Czajkowski, D. Ferguson,
F. Leymann, M. Nally, I. Sedukhin, D. Snelling, T. Storey, W. Vam-
benepe, and S. Weerawarana. Modeling stateful resources with Web ser-
vices. http://www-106.ibm.com/developerworks/library/ws-resource/ws-
modelingresources.pdf, September 2009.
[36] I. Foster, A. Grimshaw, P. Lane, W. Lee, M. Morgan, S. Newhouse,
S. Pickles, D. Pulsipher, C. Smith, and M. Theimer. OGSA basic execution
service version 1.0. http://www.ogf.org/documents/GFD.108.pdf,
September 2009.
[37] I. Foster and C. Kesselman. The grid: blueprint for a new computing
infrastructure. Morgan Kaufmann, 2004.
[38] I. Foster, C. Kesselman, and S. Tuecke. The anatomy of the grid: Enabling
scalable virtual organizations. International Journal of High Performance
Computing Applications, 15(3):200, 2001.
[39] I. Foster, H. Kishimoto, A. Savva, D. Berry, A. Grimshaw, B. Horn,
F. Maciel, F. Siebenlist, R. Subramaniam, J. Treadwell, and J. Von
Reich. The Open Grid Services Architecture, version 1.5, 2006.
http://www.ogf.org/documents/GFD.80.pdf, May 2009.
[40] I. Foster and S. Tuecke. Describing the elephant: The different faces of IT
as service. ACM Queue, 3(6):26–34, 2005.
[41] P. Gardfjäll, E. Elmroth, L. Johnsson, O. Mulmo, and T. Sandholm. Scal-
able Grid-wide capacity allocation with the SweGrid Accounting System
(SGAS). Concurrency Computat.: Pract. Exper., 20(18):2089–2122, 2008.
[42] Globus. http://www.globus.org. September 2009.
[43] S. Graham, D. Hull, and B. Murray. Web Services Base Notification 1.3
(WS-BaseNotification).
http://docs.oasis-open.org/wsn/wsn-ws_base_notification-1.3-spec-os.pdf,
September 2009.
[44] M. Gudgin, M. Hadley, N. Mendelsohn, J-J. Moreau, H. Frystyk Nielsen,
A. Karmarkar, and Y. Lafon. SOAP version 1.2 part 1: Messaging frame-
work. http://www.w3.org/TR/soap12-part1/, September 2009.
[45] T. Hansen, S. Tilak, S. Foley, K. Lindquist, F. Vernon, A. Rajasekar, and
J. Orcutt. ROADNet: A network of SensorNets. In Local Computer Net-
works, Proceedings 2006 31st IEEE Conference on, pages 579–587, 2006.
[56] B. Segal, L. Robertson, F. Gagliardi, and F. Carminati. Grid comput-
ing: The European Data Grid Project. In Nuclear Science Symposium
Conference Record, 2000 IEEE, volume 1, page 2/1, 2000.
[57] M. Snir, S.W. Otto, D.W. Walker, J. Dongarra, and S. Huss-Lederman.
MPI: The complete reference. MIT Press Cambridge, MA, USA, 1995.
[58] H. Stockinger. Defining the grid: a snapshot on the current view. The
Journal of Supercomputing, 42(1):3–17, 2007.
[59] A. Streit, D. Erwin, Th. Lippert, D. Mallmann, R. Menday, M. Rambadt,
M. Riedel, M. Romberg, B. Schuller, and Ph. Wieder. UNICORE - from
project results to production grids. In L. Grandinetti, editor, Grid Com-
puting: The New Frontiers of High Performance Processing, Advances in
Parallel Computing 14, pages 357–376. Elsevier, 2005.
[60] T.A. Tatusova and T.L. Madden. BLAST 2 Sequences, a new tool for
comparing protein and nucleotide sequences. FEMS microbiology letters,
174(2):247–250, 1999.
[61] D. Thain, T. Tannenbaum, and M. Livny. Distributed computing in
practice: The Condor experience. Concurrency Computat.: Pract. Exper.,
17(2–4):323–356, 2005.