10 Principles of Analytics
White Paper
It is a cliché, but nonetheless true, that the volumes of data pouring into organiza-
tions are growing at a frenzied pace. The pressure to find new and creative ways
to exploit this stream of raw facts for deeper insight is extreme. It stands to reason
that there is a pressing need to expand the population of people who can quickly
review, analyze and act upon data without assistance from Information Technology
(IT) or reliance on a small corps of “power users,” but the common perception is
that “analytics” is reserved for a small group of highly trained and quantitatively
oriented professionals. However, studies show that successful organizations
consistently get remarkable performance from ordinary people 1 by applying one
simple concept: learning by doing. By providing analytical tools that are
comprehensible and adopted by people across a wide range of skills and training,
organizations position themselves to find lasting competitive advantage by
leveraging the informational assets they already have.
The focus of this paper is to define the term “analytics,” contrast it with Business
Intelligence and provide a framework for understanding the requirements of
analytics, which are defined in the final section, “The Ten Principles of Enterprise
Analytics.” Analytics is suitable for large numbers of people, not just specialists,
and there is a pressing need for broader and more comprehensive proliferation of
true analytical tools.
What is “Analytics?”
The proper term for interacting with information at the speed of business,
analyzing, and discovering and following through with the appropriate action, is
“analytics.” The Business Intelligence community uses the term widely but
misapplies it, because its offerings fail too many critical criteria to be effective
as analytics. In particular, analytics requires: visualization + interactivity + utility.
Visualization goes beyond simple charting. Useful visualization adds to the analysis;
it doesn’t just display the results of it. When visualization is interactive, meaning a
person can conduct analysis by interacting with the visualization itself, not the
program behind it, an immense amount of business utility is unlocked. This interaction
has to happen on both a mechanical level (linked visualizations allowing the user to
simultaneously explore many variables) and a functional level (iterating
through the data to find other products that resemble a sell-through curve or
customers that exhibit similar behavior to define a “segment” for further analysis).
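As a rough illustration of the functional level described above, the following Python sketch ranks products by how closely the shape of their sell-through curve matches that of a chosen product. The product names, the weekly figures and the use of Euclidean distance on volume-normalized curves are all assumptions made for illustration; an interactive tool would let the analyst do the same thing by pointing at a curve rather than writing code.

```python
from math import sqrt

def normalize(curve):
    """Scale a curve so its values sum to 1, so shape is compared rather than volume."""
    total = sum(curve)
    return [v / total for v in curve] if total else list(curve)

def distance(a, b):
    """Euclidean distance between two equal-length curves."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(target, curves, k=2):
    """Return the k product names whose normalized curves lie closest to the target's."""
    ref = normalize(curves[target])
    ranked = sorted(
        (name for name in curves if name != target),
        key=lambda name: distance(ref, normalize(curves[name])),
    )
    return ranked[:k]

# Hypothetical weekly sell-through (units sold) for four products.
curves = {
    "A": [100, 80, 60, 40, 20],
    "B": [50, 40, 30, 20, 10],   # same shape as A at half the volume
    "C": [20, 40, 60, 80, 100],  # opposite shape
    "D": [90, 85, 55, 45, 25],   # close to A, slightly noisier
}

# Products B and D form a candidate "segment" around product A.
segment = most_similar("A", curves, k=2)
print(segment)
```

Normalizing first is a design choice: it groups products that sell out in the same pattern even when their volumes differ.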
Analytics is wrongly perceived to be too technical for most people to master. The
key to proliferating the use of these capabilities is to encourage people to discover
the benefits not only through training but through practice. Each person has his or
her own particular questions and theories, which are generally not addressed
through reporting and canned analysis such as dashboards. Unlocking these
observations for examination and discussion is essential. This typically starts by
incubating the process and allowing it to spread organically as the utility of it
becomes widely known. Sharing is key, and it has to include the ability to not only
publish the results of the last step, but to provide encapsulated and interactive
“guides” for others to replay, examine and experiment with the thread of analysis.
Reports and dashboards lack this quality; they do not allow people to learn by doing.
The Problem: The Current State of Affairs
Despite using the word “analytics” freely, most “solutions” in the market today fall
far short of providing support for real analytics. Business Intelligence tools have
barely made a dent2, hovering between five and fifteen percent adoption rates,
depending on which study you believe. Data mining and statistical software exist at
the margins, adding value but not widely used. This tends to push more and more of
the effort to the ubiquitous tool at hand, Excel spreadsheets, which exacerbates a
growing problem called Shadow IT3, distinguished by poor data quality,
misappropriation of critical skills, erosion of data warehousing ROI and soaring
maintenance costs. In addition, inefficient processes for handling data and analysis
lead to a knowing/doing gap4 – subject matter experts spend too much time
manipulating data, make too many manual steps in the analysis process and spend too
little time implementing decisions. Current Business Intelligence solutions
accommodate this gap by providing information without visualization, interactivity
or the ability to share findings and close the loop.
Data Warehousing/Business Intelligence: Not the Solution
Data warehousing, that is, extracting, transforming, integrating, loading and storing
data to provide fodder for digital dashboards and other reporting efforts, does little
more than rearrange the problem. Until the information can be examined, understood
and acted upon, it is, truly, just a “warehousing” operation. The solution to the
problem is to provide more adaptive tools that are accepted and used by a wider
audience across the business processes that most impact differentiation and
competitiveness. The BI industry uses the term “analytics” casually, but real
functionality for analytics is found mostly in the specialized tools and industry
verticals, not the most popular BI tools.
Analytics has not been prominent in data warehousing and BI, despite its obvious
benefits, for a number of reasons, including:
● Data warehousing methodologies are data centric, not analytical
● The deployment of BI has historically been driven by replacement of reporting systems
● The perception is that visualization is bound to statistics and too difficult for
people to use
1 Pfeffer, Jeffrey and Sutton, Robert I., The Knowing-Doing Gap (Boston, Harvard Business School
Press, 2000)
2 http://www.intelligententerprise.com/info_centers/data_warehousing/showArticle.jhtml?articleID=19502113
3 Shadow IT is defined loosely as IT work performed without the knowledge or control of the
formal IT organization. The bulk of this effort is personal productivity tools like spreadsheets
and databases.
5 Peter Drucker said, “No institution can possibly survive if it needs geniuses or supermen to manage it.
It must be organized in such a way as to be able to get along ... composed of average human
beings.”
The Need for Interactive, Visual Analytics
Two forces are at work that drive the need for analytics: data volumes are
increasing geometrically without corresponding increases in staff or tool efficiency,
and the pressure to rapidly respond to competitive and regulatory factors is forcing
organizations to improve their ability to manage, analyze and act on the information
at their disposal.
In addition to just the sheer volume of data that has to be digested, competitive
and regulatory pressures to deal with the problem are building. In the past, it was
enough for organizations to prepare external reports on a quarterly basis. Today,
daily operations are subject to examination and scrutiny. Product safety and quality
concerns are top of mind. Decisions about switching channels or suppliers are
made on the spot. Monitoring operations for minute improvements and under-
standing the behaviors of customers, suppliers and competitors are crucial. With
this (potential) wealth of information available, those that delay their efforts to
exploit it will find themselves slamming on the brakes rather than the accelerator.
Before defining the requirements for analytics in “The Ten Principles of Enterprise Analytics,” it
helps to define the term more completely and compare it to other approaches that
are incorrectly labeled analytical, such as OLAP, packaged analytical applications,
statistical tools and data mining.
Definition of Analytics
The formal definition of analysis is, roughly, the study of something by looking at its
parts and examining the relationships between them. Analytics is a more active
term, meaning the method or process of analysis. However, when the terms
analysis and analytics are applied to Information Technology, the meanings are
often quite different. Before specifying a set of rules or characteristics of analytics,
some clarity is needed.
Limitations of OLAP
OLAP tools deal with data depth by employing a combination of aggregation and
dimensional navigation. Because there is too much data to display in a grid, it is
compressed and rendered in aggregated form, from which an analyst may
decompose parts of it via pre-defined hierarchical paths. Most OLAP tools are not
capable of constraining queries or formulating new measures using the qualities or
attributes of these hierarchical units. In other words, in most OLAP applications, the
relationships that can be analyzed are parent-child, wasting the information embed-
ded in the attributes of parents and children. To introduce more attributes expands
the dimensional complexity of the model and produces an undesirable result called
sparsity, which limits the ability of the models to scale to the depth (amount of
data) and breadth (richness of the data) that is needed for true analytics. There
are exceptions to this, notably OLAP tools that are based in a relational model, but
there are drawbacks there as well, especially in performance and ease of use.
One barrier that cannot be breached with reports is the limited amount of data,
displayed as characters, which can be scanned and comprehended on a single display.
It is clear that a report cannot scale to millions of pieces of information.
Visualization, however, can not only place millions of pieces of data on a page, it
can expose relationships that can be discovered through reports only with great
effort, if at all.
Similarly, grid displays of numbers are only useful to a point. It is fairly easy to
spot a few outliers in numbers, such as very large, very small, negative or
inappropriate values in a report, but when numbers are aggregated, these anomalies
are often masked. Drilling down does not reveal them because the point of departure
is usually something that appears interesting at the aggregated level, not something
that appears normal. This is especially true where a large number of values are
rolled up – the aggregated amount may be within reasonable boundaries, but it can
mask extreme variations that cancel each other out. Because it is impossible to
visually scan thousands or even millions of values, the only way to spot these
anomalies is with statistics and visualization. OLAP tools provide little to no
support for this.
To a large extent, today’s analytical applications and OLAP tools are deployed for
reporting purposes. The interactive, speed-of-thought surfing through data to analyze
what is behind the numbers is the domain of a very small number of “power users using
complex tools.” In other words, analytical applications and on-line analytical
processing tools are, for the most part, not analytical at all. For analytics to
occur, a person has to consider the information in front of them, follow a thread of
reasoning, iterate through possible conclusions, share their findings and, most
importantly of all, act with confidence on their results.
Statistical Tools
Statistical analysis can generate useful information easily, but very few people are
comfortable enough with the concepts to place much trust in them, or themselves. In
fact, most people cannot explain the difference between a mean, a median and a
sample mean. The misapplication of statistics is so likely, that only those with a
solid background in the discipline use them routinely. This does not diminish the
importance of statistics; they’re just not for everyone. It’s not a needed feature
for the vast majority of applications. Deployed on their own, they cannot serve our
wide audience.
Interactive, Visual Analytics Is the Answer
On the other hand, interactive, visual analytics offers advantages to people of all
levels of skill and training. With interactive visualization, it is possible to scan
and analyze great volumes of data and to navigate through the data. This is a key
discriminating factor. One doesn’t navigate the data in visualizations; one visually
navigates the relationships between the data and draws inferences instantly. This
matches our definition of analytics as an active process, not just a set of
capabilities.
In the next section, we propose ten “principles” for enterprise analytics.
The Ten Principles of Enterprise Analytics
Given the strong evidence of “buyers’ remorse” for BI revealed in surveys6 and the
requirements for analytics described in this paper, the following ten principles lay out
what is needed for a complete analytics platform that can serve the needs of a
high percentage of knowledge workers.
1. Visual + interactive
Visualization of data is the quickest way to understand information. Analytics
that is visual is more usable than conventional BI, and making analytics visual
makes it ten times more powerful. To serve a
wide audience of business users, data has to be rendered unambiguously. It
must be visual. Knowledge workers must be able to navigate through the data
and all its attributes. It must be interactive.
2. Zero code
Tools designed for software engineers, database analysts, programmers,
network engineers and even specialized functions like chemical engineers or
computer animators require the use of programming languages. The art of
programming requires a great deal of skill and experience to be effective, and
should be managed for standards, reuse, quality and maintainability. Tools for
analytics, on the other hand, are designed for business users who mostly were
not hired for their programming skills. To be truly useful, analytical software
should provide the full range of capabilities without the need for programming,
scripting or any other kind of code development, with no exceptions.
3. Actually Easy to Use
Analytical software must be easy to use, but what exactly does that mean? A
Google search on “ease of use” will return 10,600,000 hits. The Google
interface is code-free, but needing to sift through millions of documents is far from
easy. Learning the nuance in using Google takes time and effort to get it right.
The GUI is only part of the answer. If you use a GUI and still have to decide
what order to place the parameters to a function or whether to do an inner or
an outer join, it’s not easy to use. The problem is that GUIs are sometimes a
thousand times more difficult to use than the old command line prompts. It
isn’t the code that’s the problem; it’s the conceptual model. When you’re not
sure what the effect is when you click something, or if you don’t understand
the effect of loosening a constraint and sliding a slide bar, it’s hard to use. To
make it truly easy to use, it has to deliver benefits of value and it has to be
understandable. The connection between the interface and the underlying
conceptual model has to be obvious.
4. Fast
Fast is always a relative term, but you know it when you see it. Interactive
visualization and data analysis require near-instantaneous response time to
keep up with the analyst. But speed is not limited to response time. Cycle time
is also critical. It is not just the speed of the first question but of the many oth-
ers to follow, all contributing to the total time to understanding. While data
warehouses employ data models and refresh the static model repeatedly, a
great deal of data that is the subject of analytics is not recurring, it is one-off
and/or sourced externally. Analytics has to be fast enough to ingest this kind of
information and make it available to the analyst in the least possible time.
5. Disposable/Persistent
Analytic opportunities are not uniform. Some are repeated consistently, some
happen only once and others have a longer but still limited lifespan. For the
latter two, it must be possible to conceive of and execute analysis quickly,
and to dispose of the analysis when it is no longer needed. For those scenar-
ios with a medium to long lifespan, a smooth mechanism must function so that
it can persist and be reused as needed. What is missing from most BI tools
today is a systematic discovery process, a way to continuously discover things
without the need for a development cycle. Data has to be organized beginning
with the business problem while remaining open to new and changing data –
static data models feeding information to static user displays won’t work.
6. Collaborative
No matter how useful or relevant a set of investigations or analyses is, unless
they can be smoothly conveyed to others and discussed, modified and per-
fected, the effort has only momentary value. OLAP allowed numerate analysts
to surf through mountains of data and draw useful conclusions, but it provided
no facility to share the insight gained, only the final report. Analytics is an active
process that continues even after the analyst completes a thread of analysis.
Making not only the final result, but the steps to get there, available to others is
the collaborative part of analytics that is missing in BI.
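One way to picture sharing the steps as well as the result is a recorded thread of analysis that a colleague can replay one step at a time. The Python sketch below is a minimal illustration under stated assumptions: the `AnalysisThread` class, its method names and the sales rows are all hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Tuple

Row = Dict[str, Any]
Step = Callable[[List[Row]], List[Row]]

@dataclass
class AnalysisThread:
    """An ordered, labeled list of steps that someone else can replay."""
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def add(self, label: str, fn: Step) -> "AnalysisThread":
        """Record a step and return the thread so that calls can be chained."""
        self.steps.append((label, fn))
        return self

    def replay(self, rows: List[Row]) -> Iterator[Tuple[str, List[Row]]]:
        """Yield (label, intermediate rows) after each step, not just the final result."""
        for label, fn in self.steps:
            rows = fn(rows)
            yield label, rows

# Hypothetical sales rows.
sales = [
    {"region": "East", "product": "X", "units": 120},
    {"region": "East", "product": "Y", "units": 15},
    {"region": "West", "product": "X", "units": 40},
]

thread = (
    AnalysisThread()
    .add("East region only", lambda rows: [r for r in rows if r["region"] == "East"])
    .add("Slow sellers (< 50 units)", lambda rows: [r for r in rows if r["units"] < 50])
)

# A colleague replays the thread and sees every intermediate state.
history = list(thread.replay(sales))
for label, rows in history:
    print(label, rows)
```

Because each step is kept separately, a reader can stop at any point, change a step and continue, which a final report alone does not allow.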
7. Conceptually Sound
Iterative manipulations by a user have to hold together because the underlying
model of the software is based on a sound set of relationships, not just one-at-
a-time graphics. Applying filters, constraints or other rules should perform
accurately and consistently so the analyst can be confident in the answer and
there is no need to double-check every result. Because most analytical inquiry
involves a series of steps, this conceptual soundness must apply across all the
steps in the current analysis. Conceptual soundness contributes to decision
confidence. Likewise, there should be no need to understand (relational) data
models. Analytical tools should present results to business users, not data
models. All references to the source or storage modes of the data should be
abstracted. Elements of the model should match the jargon of the business
user, not the software industry.
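The requirement that filters behave consistently across a series of steps can be stated as a simple check: the same set of filters should give the same answer in whatever order the analyst applies them. The Python sketch below tests that property on hypothetical order rows; the field names and thresholds are assumptions for illustration only.

```python
from itertools import permutations

def apply_filters(rows, filters):
    """Keep the rows that satisfy every filter, applied in the order given."""
    for keep in filters:
        rows = [r for r in rows if keep(r)]
    return rows

# Hypothetical order rows.
orders = [
    {"id": 1, "region": "East", "units": 80, "channel": "web"},
    {"id": 2, "region": "East", "units": 20, "channel": "web"},
    {"id": 3, "region": "West", "units": 90, "channel": "web"},
    {"id": 4, "region": "East", "units": 60, "channel": "store"},
    {"id": 5, "region": "East", "units": 55, "channel": "web"},
]

filters = [
    lambda r: r["region"] == "East",
    lambda r: r["units"] >= 50,
    lambda r: r["channel"] == "web",
]

# Apply the three filters in all six possible orders.
results = [apply_filters(orders, list(order)) for order in permutations(filters)]

# A conceptually sound model gives the same rows every time.
consistent = all(result == results[0] for result in results)
selected_ids = [r["id"] for r in results[0]]
print(consistent, selected_ids)
```

When this property holds, the analyst does not need to double-check a result merely because the steps were taken in a different sequence.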
8. Depth
All of the attributes of the data must be exposed, and not limited to small data
sets or only a few dimensions. A spreadsheet model can accommodate about
16 million cells, with a hard limit of 65,536 rows and 256 columns. For many
applications, this is far too small to be useful. Any tool can address huge
amounts of data by utilizing a separate database server, but these solutions
have proven to be too slow. CPU and memory resources are plentiful. Models
that exploit this abundance will be most effective for interactive visualization.
9. Good Software Citizen
Like any other enterprise software, analytics software has to make provisions
for all of the requirements to be a good participant in enterprise architecture
such as honoring and using security, adhering to standards and being driven
by a sensible architecture that minimizes risk due to maintenance, changes or
error.
10. Expressive
Most importantly of all, for analytics to be useful, people have to be able to
express themselves – the selection and managing of data, the design of the
model (relationships, calculations, hierarchies, etc.), and the appearance of the
visualizations. Moving from idea to visualization to action has to be simple and
straightforward and the range of ideas and approaches cannot be unreason-
ably constrained by the technology.
Conclusion
People in business, education, research, government and all other kinds of organi-
zations are being pressed to capitalize on the swarm of data that flows through
their networks. So far, the technology industry has done a good job handling
scalability and reliability in the capturing and storing of that information, but making
sense of it all by more than a handful of experts is still a green field adventure. Data
warehousing and Business Intelligence provide excellent tools and practices for
refining, formatting and reporting that information, but the need to understand it in
order to act is still largely unmet.
Analytics is a vital activity for a broad range of people, but the term has become
associated with Business Intelligence, on the one hand, and statistics and data
mining on the other. In point of fact, it is neither. Anyone in an organization who
views quantifiable information, whether in reports, spreadsheets, grids, bands,
graphics, dashboards or greenbar, has a need to understand what the underlying
numbers mean, and that implies a discovery process. Reports may satisfy a need
to standardize the information, but fail to address the fact that everyone has their
own questions. Well-made business analytics powered by interactive visualization
facilitates the discovery process in a painless way and opens up membership in
the “analysts” club up and down the organization. Every day, people are exposed
to active visualization, from their graphic equalizers to the animated weather and
traffic maps in the morning. It is an accessible metaphor that has not been widely
deployed because of the lingering mindset that it is too computer-intensive, is too
difficult to use, and requires a statistical background and the temperament of an
astrophysicist, none of which is true.
However, no matter how useful it is and uncomplicated to use, certain people will
never actually take the driver’s seat. For them, guided analysis is the answer,
allowing them to ride in the passenger seat, to stretch the metaphor, and follow a
thread of analysis performed by someone else.
Spotfire, Inc.
212 Elm Street
Somerville, MA 02144 U.S.A.
Telephone +1.617.702.1600
Fax +1.617.702.1700
Toll-Free +1.800.245.4211
Spotfire AB
(European Headquarters)
Första Långgatan 26
SE-413 28 Göteborg, Sweden
Telephone +46.31.704.1500
Fax +46.31.704.1501
Spotfire Japan KK
(Japanese Headquarters)
Kinokuniya Bldg. 7F, 13-5,
Hatchobori 4-chome
Chuo-Ku, Tokyo 104-0032 Japan
Telephone +81.3.5540.7321
Fax +81.3.3552.3166
www.spotfire.com