Data Architecture Principles
FOR SALESFORCE MARKETING CLOUD
ELIOT HARPER
Contents
Introduction
1. Purpose
2. Storage
3. Simplicity
4. Schema
5. Governance
   Understandable
   Compliant
   Trustworthy
6. Security
   User Access Reviews
   Principle of Least Privilege
   File Transfer
   File Encryption
   Encryption at Rest
Summary
About the Author
About CloudKettle
All rights reserved. This publication is protected by copyright and permission must be obtained from the
publisher prior to any reproduction, storage in a retrieval system, or transmission in any form or by any
means, electronic, mechanical, photocopying or otherwise. To obtain permission to use material from this
ebook, please email hello@cloudkettle.com with your request.
Introduction
Data architecture can quickly get complicated.
It wasn’t always this way, but over the past two
decades there has been an explosion in adoption of
voluminous or ‘big’ data, which has been fueled by the
evolution of digital storage mediums. And as a result,
data has quickly grown in variety, volume and velocity.
This data phenomenon has enabled organizations to provide highly
personalized experiences by unifying discrete customer data points across
website visits, sales, customer service, email engagement and more. And
while data effectively forms the backbone of customer engagement in
digital marketing platforms like Salesforce Marketing Cloud, the reality is
that fundamental structural choices need to be considered when integrating
different data points, as a poorly considered data architecture results in
decisions which can be very costly to change later.
Purpose
The first principle to consider is to determine the purpose of the data source.
Some marketers treat data with a sense of entitlement; if it is available,
then they believe that they should have it. And as long as Salesforce takes
a lenient view on data storage in the platform (essentially unlimited storage,
at no additional cost) then why not store everything, forever?
It turns out that there really is such a thing as “too much data” and there
are consequences of storing it.
To determine the data purpose for each data source, a well-developed
mission statement should be crafted, detailing how the data will be used
and who will use it. This, in turn, will help to validate whether the data is
actually required in the first place.
Storage
Once the purpose of the data source has been established, then the
organization must determine how long the data is actually needed for.
Once again, thanks to the notable absence of data storage fees in
Marketing Cloud (a policy that is likely to change in the future), platform
users quite happily store all their data, forever. It is not uncommon to see
data extensions containing landing page form submissions, journey logs,
send logs and Subscriber records from every past batch send buried away,
out of mind, in date-based nested folders, often amounting to thousands
of individual data extensions across business units.
And without any performance or cost impact, there is little incentive for
users to delete all this data. However, there are a few problems with this
storage philosophy.
If there is a valid reason to store the data perpetually, then chances are
that Marketing Cloud is not the best repository for it, as data can easily be
deleted or overwritten unintentionally on the platform. In this case, a data
lake or data warehouse would be a better option for long-term data storage.
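One way to make "how long is the data needed for" concrete is to record a retention period per type of data and review the inventory of data extensions against it. The following is a minimal sketch under stated assumptions: the retention periods, the data type labels and the data extension names are all hypothetical, and the inventory is a plain list rather than anything read from Marketing Cloud.

```python
from datetime import date, timedelta

# Hypothetical retention periods, in days, keyed by the kind of data held.
RETENTION_DAYS = {
    "form_submission": 365,
    "journey_log": 90,
    "send_log": 180,
}


def is_expired(data_type: str, created_on: date, today: date) -> bool:
    """Return True when data of this type has outlived its retention period."""
    limit = timedelta(days=RETENTION_DAYS[data_type])
    return today - created_on > limit


def expired_extensions(extensions, today):
    """Return the names of data extensions whose retention period has passed."""
    return [
        name
        for name, data_type, created_on in extensions
        if is_expired(data_type, created_on, today)
    ]


# Hypothetical inventory: (name, data type, creation date).
inventory = [
    ("Webinar_Form_2021_03", "form_submission", date(2021, 3, 1)),
    ("Welcome_Journey_Log", "journey_log", date(2024, 5, 20)),
]
print(expired_extensions(inventory, today=date(2024, 6, 1)))
```

A list like this gives the organization something to act on: expired data extensions are candidates for deletion, or for archiving to a data lake or data warehouse if there is a valid reason to keep them.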
Simplicity
The most common approach to processing data in Marketing Cloud is
using Automation Studio, which among other things is purpose-built for
performing Extract, Transform and Load (or ETL) operations on data from
different sources.
Schema
A data ‘schema’ is an abstract design that represents data storage (in data
extensions). It not only defines data types, lengths, and required fields, but
also identifies relationships between data extensions. Think of a schema as
a ‘blueprint’ to ensure efficient organization of data, making the data easier
to manage and maintain.
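To illustrate what a schema defines, the sketch below describes a few fields by data type, length and whether they are required, then checks a row against that description. It is an illustration only: `FieldSpec`, `validate` and the field lengths are hypothetical, and this is not how Marketing Cloud itself stores or enforces data extension definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """One field in a schema: its name, type, length limit and required flag."""
    name: str
    data_type: type
    max_length: Optional[int] = None
    required: bool = False


# Hypothetical schema for a Customers data extension (lengths are assumed).
CUSTOMER_SCHEMA = [
    FieldSpec("CustomerId", str, max_length=18, required=True),
    FieldSpec("FirstName", str, max_length=50),
    FieldSpec("PostalCode", str, max_length=10),
]


def validate(row: dict, schema: list) -> list:
    """Return a list of human-readable problems found in the row."""
    errors = []
    for spec in schema:
        value = row.get(spec.name)
        if value is None:
            if spec.required:
                errors.append(f"{spec.name}: required field is missing")
            continue
        if not isinstance(value, spec.data_type):
            errors.append(f"{spec.name}: expected {spec.data_type.__name__}")
        elif spec.max_length is not None and len(str(value)) > spec.max_length:
            errors.append(f"{spec.name}: longer than {spec.max_length} characters")
    return errors


print(validate({"CustomerId": "C001", "FirstName": "Ada"}, CUSTOMER_SCHEMA))
print(validate({"FirstName": "A" * 51}, CUSTOMER_SCHEMA))
```

Writing the schema down in one place, in whatever form, is what allows incoming data to be checked against it before it is loaded.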
The second step is to consider how data sources relate to each other.
The best way to describe this is by illustrating relationships between data
components. This is called an entity-relationship diagram, or an ERD.
Figure: entity-relationship diagram (ERD) of four data extensions.

Customers: CustomerId, FirstName, LastName, AddressLine1, AddressLine2,
City, State, PostalCode, Phone
Orders: OrderId, CustomerId, OrderDateTime
OrderLineItem: LineId, OrderId, ProductId, Quantity
Products: ProductId, Description, InStock, RetailPrice, CostPrice, Weight,
InventoryCount

Relationships:
Customers (1) to Orders (0..*)
Orders (1) to OrderLineItem (1..*)
Products (1) to OrderLineItem (1..*)

In this diagram, the cardinal relationship between data entities is
clear, for example, “one customer can have zero or many orders”, or “one
order can have one or many line items”.
Separating data extensions into definable entities not only enables data
to be relational, but makes it much easier to retrieve and manipulate.
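The benefit of separating data into entities can be sketched with a small relational example. The code below recreates a subset of the four entities from the diagram in an in-memory SQLite database and answers a question that spans all of them. SQLite stands in for the data extensions purely for illustration, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerId TEXT PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Orders (
    OrderId TEXT PRIMARY KEY,
    CustomerId TEXT NOT NULL REFERENCES Customers(CustomerId),
    OrderDateTime TEXT);
CREATE TABLE Products (
    ProductId TEXT PRIMARY KEY, Description TEXT, RetailPrice REAL);
CREATE TABLE OrderLineItem (
    LineId INTEGER PRIMARY KEY,
    OrderId TEXT NOT NULL REFERENCES Orders(OrderId),
    ProductId TEXT NOT NULL REFERENCES Products(ProductId),
    Quantity INTEGER NOT NULL);

-- Invented sample data: C2 has no orders (the "zero or many" case).
INSERT INTO Customers VALUES ('C1', 'Ada', 'Lovelace'), ('C2', 'Grace', 'Hopper');
INSERT INTO Orders VALUES ('O1', 'C1', '2024-06-01T10:00:00');
INSERT INTO Products VALUES ('P1', 'Kettle', 10.0), ('P2', 'Filter', 2.5);
INSERT INTO OrderLineItem VALUES (1, 'O1', 'P1', 2), (2, 'O1', 'P2', 4);
""")

# Order count and total spend per customer, keeping customers without orders.
rows = conn.execute("""
    SELECT c.CustomerId,
           COUNT(DISTINCT o.OrderId) AS order_count,
           COALESCE(SUM(li.Quantity * p.RetailPrice), 0) AS total_spend
    FROM Customers c
    LEFT JOIN Orders o ON o.CustomerId = c.CustomerId
    LEFT JOIN OrderLineItem li ON li.OrderId = o.OrderId
    LEFT JOIN Products p ON p.ProductId = li.ProductId
    GROUP BY c.CustomerId
    ORDER BY c.CustomerId
""").fetchall()

for row in rows:
    print(row)
```

Because each entity holds only its own attributes, a product price or a customer name is stored once and joined in when needed, rather than being repeated on every order row.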
Governance
Data governance is a multi-disciplinary process, but broadly speaking, the
term refers to the discipline of planning how an organization uses data so
that it is handled consistently throughout the business and value can be
extracted from all the information the business collects and stores.
Start by setting realistic and measurable goals (as you cannot control
what you cannot measure). Goals will differ by organization, but consider
adopting the following three goals as an initial framework.
Goal 1: Understandable
For data to be understood across the organization, it needs to be structured
in a way that it can be used effectively. This goal goes hand-in-hand with
the data schema discussed in the previous section, where a data dictionary
has a taxonomy that has context and meaning to all who use it. And as
stressed earlier, it needs to be well documented.
Goal 2: Compliant
The legal bases for data processing vary by data protection regulation.
The GDPR and CCPA both have six legal bases, the PDPB has seven, and
the LGPD has ten. Determining a legal basis for processing data is key, as
all regulations articulate that data can only be processed if there is at least
one legal basis for doing so.
While legal bases vary by regulation, one common basis that is shared across
all regulations is consent. And consent is always required for marketing.
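The two rules stated above (at least one legal basis per processing activity, and consent for marketing) can be expressed as a simple check. This is a sketch only: `ProcessingActivity`, `compliance_issues` and the basis labels are hypothetical, and which bases actually apply must come from the relevant regulation and legal advice, not from code.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingActivity:
    """A documented use of data and the legal bases recorded for it."""
    name: str
    purpose: str
    legal_bases: set = field(default_factory=set)


def compliance_issues(activity: ProcessingActivity) -> list:
    """Return the rule violations found for one processing activity."""
    issues = []
    if not activity.legal_bases:
        issues.append("no legal basis recorded")
    if activity.purpose == "marketing" and "consent" not in activity.legal_bases:
        issues.append("marketing without consent")
    return issues


# Hypothetical activities; the basis labels are illustrative strings.
newsletter = ProcessingActivity("Monthly newsletter", "marketing", {"consent"})
winback = ProcessingActivity("Win-back campaign", "marketing")

print(compliance_issues(newsletter))
print(compliance_issues(winback))
```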
Goal 3: Trustworthy
Data governance is a trust-based process. And if data cannot be trusted,
then it is a source of risk, as it can result in poor business decisions, and
compromise customer relationships by targeting them with irrelevant or
wrong content.
Data trust needs to be earned, not taken as a leap of faith. In order to trust
an organization’s data, the data must produce reliable analytics to support
well-informed business decisions. Trusted data maximizes the ability to
create value from it.
Security
Security is about keeping sensitive and confidential data secure, yet
accessible to those who need it. Some Marketing Cloud customers assume
that security is not actually their concern, as the platform is hosted in a
highly secure, managed environment. That may be true, but there is still
plenty of opportunity for data to be compromised.
SFTP is widely regarded today as the de facto protocol for secure
file transfer. Marketing Cloud offers different authentication options
when creating SFTP user accounts, which include a password, an SSH key,
or a combination of both.
These same rules also apply to SSH keys. Only designated users should be
able to access SSH keys and it is important to enforce diligent key rotation,
while also disallowing the use of matching passphrases across multiple
keys or iterations.
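Key rotation is easier to enforce when key ages are tracked and reviewed on a schedule. The sketch below flags keys that have reached a maximum age. The 90-day limit, the key names and the `keys_due_for_rotation` helper are assumptions for illustration; the creation dates would need to come from the organization's own key records.

```python
from datetime import date

MAX_KEY_AGE_DAYS = 90  # hypothetical rotation policy; set per organization


def keys_due_for_rotation(key_created_on, today, max_age_days=MAX_KEY_AGE_DAYS):
    """Return the names of keys that have reached or passed the allowed age."""
    return sorted(
        name
        for name, created_on in key_created_on.items()
        if (today - created_on).days >= max_age_days
    )


# Hypothetical register of SSH keys and the dates they were created.
sftp_keys = {
    "etl-import-key": date(2024, 1, 10),
    "reporting-export-key": date(2024, 5, 2),
}
print(keys_due_for_rotation(sftp_keys, today=date(2024, 6, 1)))
```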
The main justification is that if the private key is compromised (for example,
a device is stolen or malware is installed on it), the attacker would not be
able to compromise the SFTP account without the password. Similarly, if a
password is compromised, then they would still need the SSH key. It is still
not foolproof, but this does provide dual-factor authentication and makes
it harder for an attacker to compromise credentials.
SFTP and PGP have two different goals. PGP encrypts the data payload,
while SFTP encrypts the file transfer. At a minimum, transport encryption
is required. Data encryption provides an extra security layer.
Marketing Cloud supports importing and exporting both PGP and GPG files.
GPG and PGP are almost identical, with the major difference between them
being how they are licensed to the public. PGP or GPG encryption protects
data at rest, which ensures that the data file is not exposed after it is
transferred to an SFTP server or file system.
Encryption at Rest
Certain regulations and enterprise organization policies require data to be
encrypted at rest. This encryption type prevents attackers from accessing
unencrypted data. If an attacker obtains a hard drive with encrypted
data (from the data center) but not the encryption keys, then the attacker is
unable to read the data.
About CloudKettle
CloudKettle helps enterprises drive revenue with the Salesforce and Google
ecosystems. We do this by providing the strategy and hands-on keyboard
execution to leverage platforms like Salesforce Sales Cloud, Marketing Cloud,
Einstein, and CRM Analytics to create highly personalized cross-channel
experiences that drive revenue.