
Ensuring Data Integrity in Storage:

Techniques and Applications


Gopalan Sivathanu, Charles P. Wright, and Erez Zadok


Stony Brook University
Computer Science Department
Stony Brook, NY 11794-4400
{gopalan,cwright,ezk}@cs.sunysb.edu

ABSTRACT
Data integrity is a fundamental aspect of storage security and reliability. With the advent of network storage and new technology trends that result in new failure modes for storage, interesting challenges arise in ensuring data integrity. In this paper, we discuss the causes of integrity violations in storage and present a survey of integrity assurance techniques that exist today. We describe several interesting applications of storage integrity checking, apart from security, and discuss the implementation issues associated with these techniques. Based on our analysis, we discuss the choices and trade-offs associated with each mechanism. We then identify and formalize a new class of integrity assurance techniques that involve logical redundancy. We describe how logical redundancy can be used in today's systems to perform efficient and seamless integrity assurance.

Categories and Subject Descriptors
B.8.0 [Performance and Reliability]: Reliability, Testing, and Fault-Tolerance; D.4.2 [Operating Systems]: Storage Management; D.4.3 [Operating Systems]: File Systems Management; D.4.5 [Operating Systems]: Reliability—Fault-tolerance; D.4.6 [Operating Systems]: Security and Protection—Access controls; D.4.6 [Operating Systems]: Security and Protection—Cryptographic controls

General Terms
Reliability, Security

Keywords
Storage integrity, File systems, Intrusion detection

∗This work was partially made possible by NSF CAREER EIA-0133589 and CCR-0310493 awards and HP/Intel gifts numbers 87128 and 88415.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
StorageSS'05, November 11, 2005, Fairfax, Virginia, USA.
Copyright 2005 ACM 1-59593-223-X/05/0011 ...$5.00.

1. INTRODUCTION
Reliable access to data is a prerequisite for most computer systems and applications. There are several factors that cause unexpected or unauthorized modifications to stored data. Data can get corrupted due to hardware or software malfunctions. Disk errors are common today [26], and existing storage software is typically not designed to handle a large class of these errors. A minor integrity violation, when not detected by the higher-level software in time, could cause further loss of data. For example, a bit-flip while reading a file system inode bitmap could cause the file system to overwrite an important file. Therefore, prompt detection of integrity violations is vital for the reliability and safety of stored data.

Integrity violations could also be caused by malicious intrusions. Security advisory boards have noticed a steep rise in the number of intrusion attacks on systems over the last few years [4]. A large class of these attacks involve malicious modifications of disk data. An attacker who has gained administrator privileges could potentially make changes to the system, such as modifying system utilities (e.g., /bin files or daemon processes), adding back-doors or Trojans, changing file contents and attributes, or accessing unauthorized files. Such file system inconsistencies and intrusions can be detected using utilities like Tripwire [17, 18, 39].

There are different notions of integrity in storage. File system consistency is one of the most common. Most file systems today come with integrity checking utilities, such as the Unix fsck, that scan through the storage device to fix logical inconsistencies between data and meta-data. (Tools such as fsck are often said to perform "sanity" checking.) This reduces the likelihood of file corruption and wasted disk space in the event of a system crash. Advanced methods like journaling [12] and transactional file systems [8] ensure file system consistency even in the event of unexpected system faults. File system inconsistency can cause data corruption, but generally does not cause security threats; files might become inaccessible due to inconsistency between meta-data and data caused by a system crash. Apart from file-system inconsistencies, integrity violations in file data are a major problem that storage system designers have to solve. Even a perfectly consistent file system can have its data corrupted, and normal integrity checkers like fsck cannot detect these errors. Techniques like mirroring, parity, or checksumming can be used to detect data integrity violations at the file or block level. Cryptographic hash functions can even detect malicious forging of checksums.

In this paper, we begin by presenting a survey of integrity assurance techniques, classifying them along three different dimensions: the scope of integrity assurance, the logical layer of operation, and the mode of checking. We then discuss the various applications of integrity checking, such as security and performance enhancement.

We also describe the different implementation choices for integrity assurance mechanisms. Almost all integrity checking mechanisms that we analyzed adopt some form of redundancy to verify the integrity of data. Techniques such as checksumming and parity ignore the semantics of the data and treat it as a raw stream of bytes. They explicitly generate and store redundant information for the sole purpose of integrity checking. In contrast to these physical redundancy techniques, we identify a new class of techniques for integrity assurance, where the redundant information depends on the semantics of the data stored. Such logical redundancy techniques often obviate the extra cost of explicitly accessing redundant data and verifying integrity, by exploiting structural redundancies that already exist in the data. For example, if an application stores a B+ tree on disk, with back pointers from the children to parents (for more efficient scanning), those back pointers can also be used to ensure the integrity of pointers within a node. Although some existing systems perform a minimal amount of such logical integrity checking in the form of "sanity checks" on the structure of data, we believe that these techniques can be generalized into first-class integrity assurance mechanisms.

The rest of this paper is organized as follows. Section 2 describes causes of integrity violations. Section 3 describes the three most commonly used integrity checking techniques. Section 4 presents a more detailed classification of such techniques along three different dimensions. Section 5 explores several interesting applications of integrity checking. We discuss the various implementation choices for integrity checkers in Section 6. In Section 7 we present the new class of integrity assurance techniques that make use of logical redundancy. We conclude in Section 8.

2. CAUSES OF INTEGRITY VIOLATIONS
Integrity violations can be caused by hardware or software malfunctions, malicious activities, or inadvertent user errors. In most systems that do not have integrity assurance mechanisms, unexpected modifications to data either go undetected, or are not properly handled by the software running above, resulting in software crashes or further damage to data. In this section, we describe three main causes of integrity violations and provide scenarios for each cause.

2.1 Hardware and Software Errors
Data stored on a storage device, or transmitted across a network in response to a storage request, can be corrupted due to hardware or software malfunctions. A malfunction in hardware could also trigger software misbehavior, resulting in serious damage to stored data. For example, a hardware bit error while reading a file system's inode bitmap could cause the file system to overwrite important files. Hardware errors are not uncommon in modern disks. Disks today can corrupt data silently, without the corruption being detected [1]. Due to the increasing complexity of disk technology, new kinds of errors occur on modern disks—for example, a faulty disk controller can cause misdirected writes [40], where data gets written to the wrong location on disk. Most storage software is totally oblivious to these kinds of hardware errors, as it expects the hardware to be fail-stop in nature—where the hardware either functions or fails explicitly.

Bugs in software could also result in unexpected modification of data. Buggy device drivers can corrupt data that is read from the storage device. File system bugs can overwrite existing data or make files inaccessible. Most file systems that write asynchronously could end up in an inconsistent state upon an unclean system shutdown, thereby corrupting files or making portions of data inaccessible.

The ever-growing requirements for storage technology have given rise to distributed storage, where data needs to be transferred through unreliable networks. Unreliable networks can corrupt data that passes through them. Unless the higher-level protocols adopt appropriate error checking and correcting techniques, these errors can cause client software to malfunction.

2.2 Malicious Intrusions
Trustworthy data management in a computer system is an important challenge that hardware and software designers face today. As highly critical and confidential information is stored electronically and accessed through several different interfaces, new security vulnerabilities arise. For example, in a distributed storage system, data can be accessed from remote locations through untrusted network links; a network eavesdropper can gain access to confidential data if the data is not sufficiently protected by methods such as encryption. Damage to data integrity can often cause more serious problems than confidentiality breaches: important information may be modified by malicious programs, malicious users, or faulty system components. For example, virus code could be inserted into binary executables, potentially resulting in the loss of all data stored on a system. Operating systems that allow access to raw disks can inadvertently aid an attacker in bypassing security checks in the file system, causing damage to stored data.

2.3 Inadvertent User Errors
User errors can compromise data integrity at the application level. A user action can break application-level integrity semantics. For example, an inadvertent deletion of a database file can cause a DBMS to malfunction, resulting in data corruption. In general, integrity violations can occur whenever user actions invalidate the implicit assumptions made by the applications dealing with the data.

3. COMMON INTEGRITY TECHNIQUES
In this section, we discuss the three most common integrity assurance techniques that exist today in storage. All these techniques maintain some redundant information about the data and ensure integrity by recomputing the redundant data from the actual data and comparing it with the stored redundant information.

3.1 Mirroring
One simple way to implement integrity verification is data replication, or mirroring. By maintaining two or more copies of the same data in the storage device, integrity checks can be made by comparing the copies. An integrity violation in one of the copies can be easily detected using this method. While easy to implement, this method is inefficient both in terms of storage space and time. Mirroring can detect integrity violations caused by data corruption due to hardware errors, but cannot help in recovering from the damage, as a discrepancy during comparison does not indicate which of the copies is legitimate. Recovery is possible using majority rules if the mirroring is 3-way or more. Mirroring detects integrity violations caused by data corruption, but generally not malicious modification of data: a malicious user who wants to modify data can easily modify all copies of the data, unless the location of the copies is kept confidential. Mirroring also cannot detect integrity violations caused by user errors, because in most cases user modifications are carried out in all mirrors. RAID-1 uses mirroring to improve storage reliability, but does not perform online integrity checks using the redundant data.

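The comparison and majority-rule recovery that mirroring enables can be sketched as follows. This is a minimal in-memory illustration: byte strings stand in for mirrored blocks, and the `read_with_mirrors` helper is a hypothetical name, not part of any real RAID-1 implementation.

```python
from collections import Counter

def read_with_mirrors(copies: list[bytes]) -> bytes:
    """Detect (and, with 3 or more mirrors, repair) corruption by comparison.

    `copies` holds the same logical block as read from each mirror.
    """
    if len(set(copies)) == 1:
        return copies[0]  # all mirrors agree: no violation detected
    if len(copies) < 3:
        # With only two copies we can detect a mismatch but cannot tell
        # which copy is legitimate -- exactly the limitation noted above.
        raise IOError("integrity violation: mirrors disagree")
    # Majority rule: trust the content held by most mirrors.
    value, votes = Counter(copies).most_common(1)[0]
    if votes <= len(copies) // 2:
        raise IOError("integrity violation: no majority among mirrors")
    return value

# Example: a bit-flip corrupts one of three mirrors.
good = b"block contents"
bad = b"block contemts"
assert read_with_mirrors([good, bad, good]) == good
```

Note that, as the section above observes, this only defends against independent corruption: an attacker (or a user error) that modifies all mirrors identically passes the check.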
3.2 RAID Parity
Parity is used in RAID-3, RAID-4, and RAID-5 [24] to validate the data written to the RAID array. Parity across the array is computed using the XOR (exclusive OR) logical operation. XOR parity is a special kind of erasure code. The basic principle behind erasure codes is to transform N blocks of data into N + M blocks, such that upon loss of any M blocks, the data can be recovered from the remaining N blocks, irrespective of which blocks are lost. The parity information in RAID can either be stored on a separate, dedicated drive, or be mixed with the data across all the drives in the array. Most RAID schemes are designed to operate on fail-stop disks. Any single disk failure in RAID (including the parity disk) can be recovered from the remaining disks by just performing an XOR on their data. This recovery process is offline in nature. Although the parity scheme in RAID does not perform online integrity checks, it is used for recovering from a single disk failure in the array. The organization of RAID-5 parity is shown in Figure 1.

    A Blocks   B Blocks   C Blocks   D Blocks
    A0         B0         C0         Parity
    A1         B1         Parity     D1
    A2         Parity     C2         D2
    Parity     B3         C3         D3

Figure 1: RAID-5: Independent data disks with distributed parity

3.3 Checksumming
Checksumming is a well-known method for performing integrity checks. Checksums can be computed for disk data and stored persistently. Data integrity can then be verified by comparing the stored and the newly computed values on every data read. Checksums are generated using a hash function. The use of cryptographic hash functions has become a standard in Internet applications and protocols. Cryptographic hash functions map strings of different lengths to short, fixed-size results. These functions are generally designed to be collision resistant, which means that finding two strings that have the same hash result should be infeasible. In addition to basic collision resistance, functions like MD5 [30] and SHA1 [6] also have properties like randomness. HMAC [19] is a specific type of hash function where the generated hash is cryptographically protected. It works by applying an underlying hash function over a message and a key, thereby detecting unauthorized tampering with checksum values. It is currently one of the predominant means of ensuring that secure data is not corrupted in transit over insecure channels (like the Internet). It can also be used in the context of storage systems to ensure integrity during reads.

4. INTEGRITY CHECKING TAXONOMY
There are several methods used to detect and repair data-integrity violations. Almost all methods that exist today use some kind of redundancy mechanism to check for integrity. This is because integrity checking requires either a syntactic or semantic comparison of the data with some piece of information that is related to the actual data. Comparison of related data helps detect integrity violations, but cannot repair them. The main reason that comparison cannot repair violations is that when there is a mismatch between two different kinds of data being compared, it is usually not possible to determine which of them is legitimate. For example, if checksumming is used to detect integrity violations in file data, a checksum mismatch can only detect an integrity violation; it cannot indicate whether the data is corrupted or the checksum itself is invalid. There are a few techniques for recovering from integrity violations once they are detected. Those techniques are closely tied to the mechanism used to perform detection, and to the nature of the redundancy that is employed.

In this section we describe the taxonomy of integrity checking techniques that exist today. Figure 2 represents our classification of integrity assurance techniques. We analyze the techniques in three different dimensions: the scope of their assurance, the logical layer at which they are designed to operate, and their checking modes.

    Scope of Assurance: Avoidance, Detection, Correction
    Logical Layer:      Hardware, Device Driver, File system, User level
    Checking Mode:      Online, Offline

Figure 2: Taxonomy of techniques

4.1 Scope of Integrity Assurance
Data integrity can be guaranteed in several ways. Online integrity checks help to detect, and in some cases recover from, integrity violations. Some systems, instead of performing checks for integrity, employ preventive methods to reduce the likelihood of an integrity violation. In this section we classify integrity assurance mechanisms into three main types, based on their goals: those that perform preventive steps so as to avoid specific types of integrity violations; those that perform integrity checks and detect violations; and those that are capable of recovering from damage once a violation is detected.

4.1.1 Avoidance
Some systems provide a certain level of integrity guarantee for the data they store, so as to avoid explicit integrity checking mechanisms. These systems have the advantage that they do not incur additional overheads for integrity verification. The mechanism used to provide integrity guarantees could incur some overhead, but it is generally smaller than that of separate checking mechanisms. Moreover, these systems avoid the hassle of recovering from integrity damage once it is detected. In this section we discuss four existing methods that provide different levels of integrity assurance. Read-only storage is a straightforward means to avoid integrity violations due to malicious user activity or inadvertent user errors.
Journaling ensures file system consistency; encryption file systems prevent malicious modification of file data with, say, virus code; and transactional file systems provide ACID transactions which applications can use to ensure the semantic integrity of information.

Read-only Storage. Making the storage read-only is a simple means to ensure data integrity. The read-only limitation can be imposed at the hardware or software level. Hardware-level read-only storage is not vulnerable to software bugs or data modification through raw disk access. However, it is still vulnerable to hardware errors like bit-flipping. File systems that enforce read-only characteristics are still vulnerable to hardware and software errors, and to raw disk access. However, they can prevent integrity violations due to user errors. SFSRO [7], Venti [27], and Fossilization [14] are systems that are read-only to ensure data integrity.

Journaling. Journaling file systems were invented partly to take advantage of the reliability of logging. Modern examples include Ext3, NTFS, and Reiserfs. A journaling file system can recover from a system crash by examining its log, where any pending changes are stored, and replaying any operations it finds there. This means that even after an unexpected shutdown, it is not necessary to scan through the entire contents of the disk looking for inconsistencies (as with scandisk on Windows or fsck on Unix): the system just needs to figure out what has been added to the journal but not marked as done. In a journaling file system, the transaction interface provided by logging guarantees that either all or none of the file system updates are done. This ensures consistency between data and meta-data even in the case of unexpected system failures. Although journaling cannot protect data from malicious modifications or hardware bit errors, it can ensure file system consistency without performing any explicit integrity checks for each file.

Cryptographic File Systems. Cryptographic file systems encrypt file data (and even selected meta-data) to ensure the confidentiality of important information. Though confidentiality is the main goal of encryption file systems, a certain degree of integrity assurance comes as a side effect of encryption. Unauthorized modification of data by malicious programs or users, such as replacing system files with Trojans, becomes nearly impossible if the data is encrypted with an appropriate cipher mode. Although data can be modified, it is not feasible to do so in a predictable manner without knowledge of the encryption key. However, integrity violations due to hardware errors cannot be prevented by using encryption. Thus, cryptographic file systems protect integrity for a certain class of threat models. Several file systems, such as Blaze's CFS [3] and NCryptfs [41, 42], support encryption.

Transactional File Systems. We are working towards building a transactional file system that exports an ACID transaction facility to the user level. In addition to supporting custom user-level transactions for protecting the semantic integrity of data that applications see, our ACID file system aims at providing intra-operation transactions (e.g., an individual rename operation is transaction protected), such that atomicity and consistency guarantees are provided for every file system operation. This maintains the file system in a consistent state and makes the file system completely recoverable to a consistent state even in the event of unexpected system failures. We are planning to build our file system using the logging and locking features provided by the Berkeley Database Manager [32] in the Linux kernel [15].

4.1.2 Detection
Most of the storage integrity assurance techniques that exist today perform detection of integrity violations, but do not help in recovering from the violation. In this section, we discuss those techniques.

Checksumming. The checksumming techniques discussed in Section 3 help in detecting integrity violations. They generally cannot help recovery, for two reasons. First, a mismatch between the stored value and the computed value of the checksum just means that one of them was modified, but it does not provide information about which of them is legitimate. Stored checksums are also liable to be modified or corrupted. Second, checksums are generally computed using a one-way hash function, and the data cannot be reconstructed given a checksum value.

Mirroring. The mirroring technique described in Section 3 can detect a violation by comparing the copies of data, but suffers from the same problem as checksumming for correcting the data.

CRC. Cyclic Redundancy Check (CRC) is a powerful and easily implemented technique to obtain data reliability in network transmissions. It can be employed by network storage systems to detect integrity violations in data transmissions across nodes. The CRC technique is used to protect blocks of data called frames. Using this technique, the transmitter appends an extra N-bit sequence, called a Frame Check Sequence (FCS), to every frame. The FCS holds redundant information about the frame that helps the receiver detect errors in the frame. The CRC is one of the most commonly used techniques for error detection in data communications.

Parity. Parity is a form of error detection that uses a single bit to record whether the quantity of '1's in the data is odd or even. Parity usually consists of one parity bit for each eight bits of data, which can be verified by the receiving end to detect transmission errors. Network file systems benefit from parity error detection, as they use the lower-level network layers, which implement parity mechanisms.

4.1.3 Correction
When an integrity violation is detected by some means, some methods can be used to recover the data from the damage. We discuss two in this section.

Majority Vote. The majority vote strategy helps resolve the problem of determining whether the actual data or the stored redundant information is unmodified (legitimate), in the event of a mismatch between the two. The majority vote technique can be employed with detection techniques like mirroring. When there are N copies of the data (N > 2), upon an integrity violation, the data contained in the majority of the copies can be believed to be legitimate, to some level of certainty. The other copies can then be recovered from the content of the majority of the copies.

RAID Parity. RAID parity (e.g., RAID-5) uses an erasure code to generate parity information at the block level or bit level. The individual disks in RAID should be fail-stop in nature, which means that in the event of a failure of any kind, the disks should stop working, thereby explicitly notifying the RAID controller of a problem. A failure in a disk can mean several things, including a violation of the integrity of its data. Once a failure (integrity violation) is detected by the fail-stop nature of the hardware, the parity information stored can be used to reconstruct the data on the lost disk. The procedure to reconstruct the data in a RAID array is discussed in Section 3.
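The parity generation and single-block reconstruction described above can be sketched with byte-wise XOR. This is a minimal single-stripe illustration of the principle, not a real RAID controller; block contents and the lost-disk index are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Parity generation over one stripe of data blocks.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# If any single block is lost (fail-stop disk), XOR-ing the surviving
# data blocks with the parity block regenerates the lost one, because
# each byte of parity is the XOR of the corresponding bytes of all blocks.
lost = 1
survivors = [blk for i, blk in enumerate(data) if i != lost]
recovered = xor_blocks(survivors + [parity])
assert recovered == data[lost]
```

The same XOR works regardless of which block is lost, which is why a single parity block suffices for any single-disk failure, as long as the failure is explicitly signaled.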
mation stored can be used to reconstruct the data in the lost disk. perform better than software level mechanisms, because simple in-
The procedure to reconstruct the data in a RAID array is discussed tegrity checkers implemented using custom hardware run faster.
in Section 3. Second, hardware-level integrity checkers do not consume CPU
cycles, thereby reducing the CPU load on the main host proces-
4.1.4 Detection and Correction sor. There are two disadvantages of operating at the hardware level.
Error detection and correction algorithms are used widely in net- First, the amount of information available to ensure integrity is usu-
work transport protocols. These algorithms combine the function- ally more limited at the hardware level than at the upper levels such
alities of detection of integrity violations and even correcting them as the device driver or the core OS levels. Therefore, semantic data
to a certain level. Some of these algorithms are now being used in integrity cannot be ensured at the hardware level. For example, an
storage devices also, for detecting and correcting bit errors. In this on-disk error correcting mechanism can only ensure integrity at a
section we discuss three algorithms employed in local and network block level and cannot guarantee file system consistency, as it gen-
storage systems today. erally has no information about which blocks are data blocks and
meta-data blocks. An exception to this is Semantically-Smart Disk
ECC. Error Correction Codes (ECCs) [13] are an advanced form Systems (SDS) [34] which decipher file-system–level information
of parity detection often used in servers and critical data applica- at the firmware level. Second, integrity checking at the hardware
tions. ECC modules use multiple parity bits per byte (usually 3) to level can capture only a small subset of the integrity violations, as
detect double-bit errors. They are also capable of correcting single- the checked data should pass through several upper levels before it
bit errors without raising an error message. Some systems that sup- finally reaches the application, and hence there is enough room for
port ECC can use a regular parity module by using the parity bits subsequent data corruptions at those levels. Therefore, hardware-
to make up the ECC code. Several storage disks today employ er- level error checks are generally rudimentary in nature, and they
ror correcting codes to detect and correct bit errors at the hardware have to be augmented with suitable software level mechanisms so
level. Usually, to correct an N -bit sequence, at least lg(N ) bits of as to ensure significant integrity assurance. In this section we dis-
parity information are required for performing the correction part. cuss three existing systems that employ hardware level integrity
Hamming codes are one popular class of ECCs [25]. assurance mechanisms.

FEC. Forward Error Correction (FEC) [2] is a popular error de- On-Disk Integrity Checks. Data read from a disk is suscepti-
tection and correction scheme employed in digital communications ble to numerous bursts of errors caused by media defects, thermal
like cellular telephony and other voice and video communications. asperity and error propagation in electronics. Thermal asperity is
In FEC, the correction is performed at the receiver end using the a read signal spike caused by sensor temperature rise due to con-
check bits sent by the transmitter. FEC uses the Reed-Solomon tact with disk asperity or contaminant particles. Error bursts can be
algorithm [28] to perform correction. FEC is generally not used many bytes in length. The basis of all error detection and correc-
in storage hardware, but network file systems that use lower-level network protocols benefit from it.

RAID Level 2. RAID-2 [24] uses memory-style error-correcting codes to detect and recover from failures. In an instance of RAID-2, four data disks require three redundant disks, one fewer than mirroring. Since the number of redundant disks is proportional to the logarithm of the total number of disks in the system, storage efficiency increases as the number of data disks increases. The advantage of RAID-2 is that it can detect failures even if the disks are not fail-stop in nature. If a single component fails, several of the parity components will have inconsistent values, and the failed component is the one held in common by each incorrect subset. The lost information is recovered by reading the other components in a subset, including the parity component, and setting the missing bit to 0 or 1 to produce the proper parity value for that subset. Thus multiple redundant disks are required to identify the failed disk, but only one is needed to recover the lost information.

4.2 Logical Layers
Integrity assurance techniques can operate at various system levels, depending on the nature of the system and the requirements. Designing the integrity assurance technique at each of these system levels has distinct security, performance, and reliability implications. In this section we classify integrity assurance techniques into five different levels: hardware, device driver, network, file system, and user levels.

4.2.1 Hardware Level
The lowest physical level where an integrity assurance mechanism can operate is the hardware level. Mechanisms operating at the hardware level have two key advantages. First, they usually [...] An example of hardware-level integrity protection in storage disks is the inclusion of redundant information and special hardware or software to use it. Each sector of data on the hard disk contains 512 bytes, or 4,096 bits, of user data. In addition to these bits, an additional number of bits are added to each sector for ECC use (sometimes also called error-correcting circuits when implemented in hardware). These bits do not contain data, but contain information about the data that can be used to correct many problems encountered while trying to access the real data bits. Several different types of error-correcting codes have been invented over the years, but the type commonly used on magnetic disks is the Reed-Solomon algorithm, named for researchers Irving Reed and Gus Solomon. Reed-Solomon codes are widely used for error detection and correction in various computing and communications media, including optical storage, high-speed modems, and data transmission channels. They have been chosen because they are faster to decode than most other similar codes, can detect (and correct) large numbers of missing bits of data, and require the fewest extra ECC bits for a given number of data bits. On-disk integrity checks suffer from the general problem of hardware-level integrity checkers: they do not have much information to perform semantic integrity checks, and they capture only a subset of integrity violations.

Semantically-Smart Disk Systems. Semantically-smart disk systems attempt to provide file-system-like functionality without modifying the file system. Knowledge of a specific file system is embedded into the storage device, and the device provides additional functionality that would traditionally be implemented in the file system. Such systems are relatively easy to deploy, because they do not require modifications to existing file system code. As these systems decipher information from the file system running on top, they can perform file system integrity checks at the hardware level, thereby combining the performance advantages of hardware-level integrity checkers with as much information as is needed for performing semantic integrity assurance.
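Production disks use Reed-Solomon codes, which are involved, but the flavor of memory-style ECC that RAID-2 borrows can be illustrated with a minimal single-error-correcting Hamming(7,4) code. The sketch below is illustrative only, not how any controller implements it; the syndrome computation shows the mechanism described above, where each failing parity subset implicates a set of positions and the failed position is the one common to all failing subsets.

```python
# Hamming(7,4): a memory-style single-error-correcting code of the kind
# RAID-2 borrows.  Positions are 1-indexed; positions 1, 2, and 4 hold
# parity bits, and positions 3, 5, 6, 7 hold the four data bits.

def encode(data):                       # data: four bits
    c = [0, 0, 0, data[0], 0, data[1], data[2], data[3]]  # c[0] unused
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        c[p] = sum(c[i] for i in range(1, 8) if i & p) % 2
    return c[1:]

def decode(word):                       # word: seven bits, at most one flipped
    c = [0] + list(word)
    syndrome = 0
    for p in (1, 2, 4):
        # A failing parity check implicates the subset of positions it
        # covers; the corrupted position is the one held in common by
        # all failing subsets, and its index is exactly the syndrome.
        if sum(c[i] for i in range(1, 8) if i & p) % 2:
            syndrome += p
    if syndrome:
        c[syndrome] ^= 1                # correct the single-bit error
    return [c[3], c[5], c[6], c[7]]
```

Flipping any one of the seven stored bits still lets the decoder recover the original four data bits, which is the property RAID-2 scales up across disks.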
Hardware RAID. The hardware RAID parity discussed in Sections 3 and 4.1 is implemented in the RAID controller hardware.

4.2.2 Device Driver Level
In this section we discuss two systems that employ integrity assurance techniques at the device driver level.

NASD. Network Attached Secure Disks (NASD) [10] is a storage architecture for enabling cost-effective throughput scaling. The NASD interface is an object-store interface, based loosely on the inode interface for Unix file systems. Since network communication is required for storage requests such as read and write, NASDs perform integrity checks for each request sent through the network. NASD's security is based on cryptographic capabilities. Clients obtain capabilities from a file manager using a secure and private protocol external to NASD. A capability consists of a public portion and a private key. The private-key portion is a cryptographic key generated by the file manager using a keyed message digest (MAC). The network-attached drive can calculate the private key and compare it with the client-supplied message digest. If there is a mismatch, NASD rejects the request, and the client must return to the file manager to retry. NASDs do not perform integrity checks on data, as software implementations of cryptographic algorithms operating at disk rates are not feasible with the computational resources expected on a disk. Miller's scheme for securing network-attached disks uses encryption to prevent undetectable forging of data [21].

Software RAID. Software RAID is a device-driver-level implementation of the different RAID levels discussed in Sections 3 and 4.1. Here the integrity assurance and redundancy techniques are performed by the software RAID device driver instead of the RAID controller hardware. The advantage of software RAID is that no hardware infrastructure is required for setting up the RAID levels between any collection of disks or even partitions. However, software RAID generally does not perform as well as hardware RAID, and its atomicity guarantees for data and parity updates are generally weaker than those of hardware RAID.

4.2.3 File System Level
The file system is one of the most commonly used levels for implementing data integrity assurance mechanisms. This is because the file system level is the highest level in the kernel that deals with data management, and it has the bulk of the information about the organization of data on the disk needed to perform semantic integrity checks on the data. Moreover, since file systems run inside the kernel, the extent of security that they provide is generally higher than that of user-level integrity checkers.

On-Disk File Systems. Almost all on-disk file systems perform offline consistency checking using user-level programs like fsck. Most of these consistency-checking programs run at startup after an unclean shutdown of the operating system, or they are explicitly initiated by the administrator. For this reason, they cannot capture dynamic transient bit errors in the hardware that could compromise file system consistency at run time. Journaling file systems like Ext3, IBM's JFS, and ReiserFS use transactional semantics to avoid file system inconsistencies. Solaris's ZFS [37] avoids data corruption by keeping the data on the disk self-consistent at all times. It manages data using transaction groups that employ copy-on-write technology to write data to a new block on disk before changing the pointers to the data and committing the write. Because the file system is always consistent, time-consuming recovery procedures such as fsck are not required if the system is shut down in an unclean manner. ZFS is designed to provide end-to-end 64-bit checksumming for all data, helping to reduce the risk of data corruption and loss. ZFS constantly checks data to ensure that it is correct, and if it detects an error in a mirrored pool, it can automatically repair the corrupt data.

The Protected File System (PFS) [35] is an architecture for unifying the meta-data protection of journaling file systems with the data integrity protection of collision-resistant cryptographic hashes. PFS computes hashes from file system blocks and uses these hashes to later verify the correctness of their contents. The hashes are computed by an asynchronous thread called hashd and are stored in the file system journal log for easy reading. Using write ordering based on journaling semantics for the data and the hashes, PFS ensures the integrity of every block of data read from the disk.

Stackable File Systems. Our implementations of checksummed NCryptfs [33] and I3FS [16] perform integrity checking at the Virtual File System (VFS) level. Stackable file systems are a way to add new functionality to existing file systems. They operate transparently between the VFS and lower file systems, and require no modifications to lower-level file systems.

Distributed File Systems. Most distributed file systems perform integrity checks on their data, because in a distributed environment the data may be transmitted back and forth through untrusted networks. Distributed file systems that exist today adopt a wide range of mechanisms to ensure integrity. The Google File System [9] is a scalable distributed file system that stores data in 64MB chunks. Each chunkserver uses checksumming to detect corruption of the stored data. Every chunk is broken up into 64KB blocks, and a 32-bit checksum value is computed for each of them. For reads, the chunkserver verifies the checksum of the data blocks before returning any data to the requester. Therefore, chunkservers do not propagate corruption to other machines. The checksum read and update procedures are highly optimized for better performance in the Google File System.

SFSRO [7] is a read-only distributed file system that allows a large number of clients to access public, read-only data in a secure manner. In a secure area, a publisher creates a digitally signed database out of a file system's contents, and then replicates the data on untrusted content-distribution servers, allowing for high availability. SFSRO avoids performing any cryptographic operations on the servers and keeps the overhead of cryptography low on the clients. Blocks and inodes are named by handles, which are collision-resistant cryptographic hashes of their contents. Using the handle of the root inode of a file system, clients can verify the contents of any block by recursively checking hashes. Storing the hashes in naming handles is an efficient idea adopted by SFSRO, which not only improves performance but also simplifies integrity-checking operations.

4.2.4 Application Level
There are several utilities that run at the user level, performing file system integrity checks. We discuss four commonly used utilities in this section.

Tripwire. Tripwire [17, 18] is a popular integrity-checking tool designed for Unix, to aid system administrators in monitoring their file systems for unauthorized modifications. Tripwire reads the security policy for files in the file system and then performs scheduled integrity checks based on checksum comparison. The main goal of Tripwire is to detect and prevent malicious replacement of key files in the system by Trojans or other malicious programs.
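The checksum-comparison loop at the heart of such tools can be sketched as follows; this is a simplified illustration, not Tripwire's actual policy language or database format.

```python
# Baseline-and-compare integrity checking in the style of Tripwire:
# record a digest for each monitored file, then flag any file whose
# current contents no longer match the recorded digest.
import hashlib

def snapshot(paths):
    """Build the baseline database: path -> SHA-1 digest."""
    baseline = {}
    for path in paths:
        with open(path, 'rb') as f:
            baseline[path] = hashlib.sha1(f.read()).hexdigest()
    return baseline

def check(baseline):
    """Return the monitored files that changed since the snapshot."""
    violations = []
    for path, digest in baseline.items():
        with open(path, 'rb') as f:
            if hashlib.sha1(f.read()).hexdigest() != digest:
                violations.append(path)
    return violations
```

A real deployment would also protect the baseline itself (Tripwire signs its database) and would track meta-data such as ownership and permissions, not just file contents.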
Samhain. Samhain [31] is a multi-platform, open-source solution for centralized file integrity checking and host-based intrusion detection on POSIX systems (e.g., Unix, Linux, and Windows). It was designed to monitor multiple hosts with potentially different operating systems from a central location, although it can also be used as a standalone application on a single host. Samhain supports multiple logging facilities, each of which can be configured individually. Samhain offers PGP-signed databases and configuration files, and a stealth mode to protect against attempts to subvert the integrity of the Samhain client.

Radmind. Radmind [5] is a suite of Unix command-line tools and a server designed to administer the file systems of multiple Unix machines remotely. At its core, Radmind operates like Tripwire: it detects changes to any managed file system object (e.g., files, directories, links, etc.), and once a change is detected, Radmind can optionally reverse the change.

Osiris. Osiris [23] is a host integrity monitoring system that periodically monitors one or more hosts for change. It maintains detailed logs of changes to the file system, user and group lists, resident kernel modules, etc. Osiris can be configured to email these logs to the administrator. Hosts are periodically scanned and, if desired, the records can be maintained for forensic purposes. Osiris uses OpenSSL for encryption and authentication in all components.

4.3 Online vs. Offline Integrity Checks
Some systems, such as I3FS [16], SFSRO [7], and PFS [35], adopt online integrity checking, which means that they ensure integrity in the critical path of a read or write operation. Such systems are much more effective than offline integrity checkers like fsck, Tripwire, and Samhain, because online integrity checkers can detect integrity violations before the violation causes damage to the system. For example, I3FS can detect malicious replacement of binary executables at the time they are read (for execution) and hence can prevent execution immediately. With offline methods that perform integrity checks at scheduled intervals, there is a window of vulnerability during which they cannot prevent the damage caused by an intrusion. Therefore, from the security viewpoint, online integrity checkers are better than offline ones. However, online methods usually come with performance costs: performing operations like checksum comparison in the critical path of a file system read can slow down the system noticeably. Offline integrity checkers generally run in an asynchronous manner and hence do not pose significant performance problems. Depending on the importance of the data to be protected, a suitable integrity assurance mechanism can be chosen by the administrator. For vital data, online integrity checkers are better suited.

5. USES OF INTEGRITY CHECKING

5.1 Security
Data-integrity assurance techniques go a long way in making a computer system secure. A large class of attacks on systems today are made possible by malicious modification of key files stored on the file systems. If unauthorized modifications to files are detected in time, damage caused by the intrusion can be reduced or even prevented. In this section we discuss three different applications of integrity assurance from the viewpoint of systems security.

Intrusion Detection. In the last few years, security advisory boards have seen an increase in the number of intrusion attacks on computer systems. A large class of these intrusion attacks are performed by replacing key binary executables, like the ones in the /bin directory, with custom back-doors or Trojans. Integrity-checking utilities like Tripwire [18], checksummed NCryptfs [33], and I3FS [16] detect unauthorized modification or replacement of files with the help of checksums. Online notification of integrity violations and immediate prevention of access to the corrupted file help in reducing the damage caused by the intrusion. Self-Securing Storage [36] prevents intruders from undetectably tampering with or permanently deleting stored data, by internally auditing and logging operations within a window. System administrators can use this information to diagnose intrusions.

Non-Repudiation and Self-Certification. Distributed storage systems like SFSRO [7] and NASD [10] use public- or private-key-based signatures for integrity assurance. Each request sent between nodes of the network is appended with a signature generated from the request contents. This method provides authentication and assurance about the integrity of the request received at the receiver's end, and it also helps in ensuring non-repudiation and self-certification, because only the right sender can generate the signature.

Trusting Untrusted Networks. Distributed file systems exchange control and data information over untrusted networks. Integrity assurance mechanisms like tamper-resistant HMAC checksums and public-key signatures verify that the information sent through untrusted networks is not modified or corrupted.

5.2 Performance
The design of a certain class of integrity assurance mechanisms takes advantage of already-existing redundant information to improve system performance. We discuss two examples where redundant information helps improve performance.

Duplicate Elimination. The Low-Bandwidth Network File System (LBFS) [22] and Venti [27] use checksums to eliminate duplicates among their data objects. Since duplicate data objects share the same checksum value, a reasonably collision-resistant checksumming scheme can identify duplicates by comparing their checksums. This method of duplicate identification is efficient because the data that needs to be compared is usually a 128-bit checksum, as opposed to data blocks that can be on the order of kilobytes. Duplicate elimination helps in reducing storage space and enables better cache utilization, and hence improves performance. Checksums are also used for duplicate elimination in the rsync protocol [38].

Indexing. Checksums are a good way to index data. Object disks can use checksums for indexing their objects [20]. SFSRO uses collision-resistant checksums for naming blocks and inodes. Though highly collision-resistant checksums can be slightly larger than traditionally used integers, they achieve dual functionality at a small incremental cost. Using checksums as naming handles offers an easy method for retrieving the checksums associated with blocks, and thus it improves integrity-checking performance.
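A minimal sketch of these two ideas together (hypothetical class and method names, not the actual Venti or LBFS interfaces) shows how one digest serves both purposes: deduplicating on write and verifying integrity on read.

```python
# Content-addressed block store: a block's collision-resistant digest is
# its name, so identical blocks are stored only once, and the name
# itself lets a reader verify the block's integrity (as SFSRO does).
import hashlib

class BlockStore:
    def __init__(self):
        self.blocks = {}                     # digest -> block contents

    def put(self, data):
        handle = hashlib.sha1(data).hexdigest()
        # Duplicate detection is a lookup on the short digest rather
        # than a byte-by-byte comparison of kilobyte-sized blocks.
        self.blocks.setdefault(handle, data)
        return handle

    def get(self, handle):
        data = self.blocks[handle]
        # The handle doubles as an integrity check on the read path.
        if hashlib.sha1(data).hexdigest() != handle:
            raise IOError("block %s failed integrity check" % handle)
        return data
```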
5.3 Detecting Failures
Integrity checks on raw data can be used to identify disk failures. Data corruption is an important symptom of a disk failure, and disks that are not fail-stop in nature can continue to operate after silently corrupting the data they store. (Fail-stop disks stop functioning upon a failure, thereby explicitly notifying the controller that they failed.) Non-recoverable disk errors have to be identified in time so as to protect at least portions of the data; integrity-checking techniques can be used to achieve this goal. Some modern disks already detect failures using on-disk integrity-checking mechanisms. In systems like RAID-5 that assume the fail-stop nature of individual disks to detect failures, checksumming can be added to enable automatic detection and recovery of disk failures, even in the case of non-fail-stop disks.

6. IMPLEMENTATION CHOICES

6.1 Granularity of Integrity Checking
There are several different granularities at which integrity checks can be performed. For example, if checksumming is used, the unit of data to be checksummed can be a byte, a block, or even a whole file stored on disk. Choosing the right granularity of data to be checksummed for a particular system is important for achieving good performance and security. Operating at a finer granularity generally results in too many computations of redundant information, especially when the access pattern is large bulk reads. On the other hand, too large a granularity can result in more I/O for smaller reads, because an entire chunk of data needs to be read to perform an integrity check. Therefore, the optimal choice of granularity depends on the nature of the system and also on the general access pattern. In this section we discuss four different choices of granularity at which integrity checks can be performed, along with their trade-offs.

Block Level. Usually, hardware-level and device-driver-level integrity assurance mechanisms operate at the granularity of disk blocks. This is mainly because higher-level abstractions like pages, files, etc., are not visible at these lower levels. Most of the configurations of RAID and NASD discussed in Section 4.2 perform integrity checks at the block level. For physical redundancy checking like checksumming, checksums are computed for each block on disk and then stored. Upon reading a disk block, its checksum is recomputed and compared with the stored value. In block-level checksumming, for large reads that span many disk blocks, the number of checksum computations and comparisons required can be large. To mitigate this problem, most block-level integrity checkers are implemented in hardware. Block-level integrity checkers cannot perform file-system-level semantic integrity checking, and hence they are limited in their scope. The Protected File System (PFS) discussed in Section 4.2 employs checksumming at a block granularity, but at the file system level.

Network Request Level. In distributed file systems, the requests sent between different nodes of the network need to be authentic so as to ensure security. Several distributed storage systems such as NASD adopt public-key signatures or HMAC checksums to authenticate requests sent between nodes. The sender appends a signature to every request, and it is verified at the receiver's end. NASDs adopt integrity checking for requests but not for data, because network request transfers are more important than data transfers, and forging network requests could break confidentiality.

Page Level. Data integrity assurance techniques at the file system level generally operate at a page granularity, as the smallest unit of data read from the disk by a file system is usually a page. Since file system page sizes are usually larger than disk block sizes, operating at the page level often results in a better balance between the number of checksum computations required and the size of the data read. The checksummed NCryptfs discussed in Section 4.2 performs checksumming at the page level. Every time a new page is read from the disk, NCryptfs recomputes the checksum and compares it with the stored value to verify the integrity of the page data. I3FS also has a mode for performing per-page checksumming [16].

File Level. Application-level integrity checkers such as Tripwire perform checksumming at the file level. The advantage of file-level checksumming is that the storage space required for the checksums is reduced compared to page-level checksumming, because each file has only one checksum value associated with it. In page-level checksumming, large files can have several checksum values associated with them, requiring more storage space and efficient retrieval methods. For that reason, I3FS also includes a mode to perform whole-file checksumming.

6.2 Storing Redundant Data
Integrity checking requires the persistent management of redundant information (physical or logical). Because many integrity assurance methods are online in nature and operate in the critical path of reads and writes, efficiency is a key property that the storage and retrieval mechanisms should have. There are several different techniques that existing mechanisms adopt. In this section we discuss a few of them.

SFSRO stores collision-resistant cryptographic checksums of file system objects in the form of naming handles [7]. Venti [27], a network storage system, uses unique hashes of block contents to identify blocks. In SFSRO, blocks and inodes are named using the checksums of their contents, so that given the naming handle of the root inode of a file system, a client can verify the contents of any block by recursively checking hashes. This is an efficient way of optimizing the storage and retrieval of checksums. On-disk parity information for error-correcting codes is stored at the block level, close to the relevant blocks, to avoid additional disk rotations or seeks for reading the parity information.

Checksummed NCryptfs [33] uses parallel files to store the page-level checksums for each file. Each file in the file system has an associated hidden checksum file that stores the checksums of each of the file's pages. Whenever a file is read, the associated checksums are read from the hidden file and compared to detect integrity violations. The advantage of this method is that it can be implemented easily and transparently across different underlying file systems; checksummed NCryptfs can be used with any underlying on-disk or network file system. The problem with using parallel files is that each time a file is opened, the open is translated into opening two files (the checksum file as well). This affects performance to some extent.

I3FS uses in-kernel Berkeley databases (KBDB) [15] to store both page-level and whole-file checksums. Since KBDB supports efficient storage and retrieval of (key, value) pairs in the form of B-trees, hash tables, etc., it enables easy storage and retrieval of checksums, keyed by inode numbers and page numbers. PFS [35] stores block-level checksums for both data and meta-data blocks as part of the journal, indexed by a logical record number. It also maintains an in-memory hash table for easy retrieval of checksums from the journal.
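The bookkeeping that such page-level schemes perform can be sketched roughly as follows, with an ordinary dictionary standing in for the in-kernel Berkeley DB and CRC-32 standing in for the checksum algorithm; both stand-ins are assumptions of this sketch, not I3FS's actual data structures.

```python
# Page-level checksum bookkeeping in the style of I3FS: checksums are
# keyed by (inode number, page index) and consulted in the read path.
import zlib

checksum_db = {}                       # (inode, page_no) -> CRC-32

def record_page(inode, page_no, page_data):
    """Called in the write path: remember the page's checksum."""
    checksum_db[(inode, page_no)] = zlib.crc32(page_data)

def verify_page(inode, page_no, page_data):
    """Called in the read path: recompute and compare before returning."""
    expected = checksum_db.get((inode, page_no))
    if expected is not None and zlib.crc32(page_data) != expected:
        raise IOError("integrity violation: inode %d, page %d"
                      % (inode, page_no))
    return page_data
```

Keying by (inode, page) is what makes whole-file and per-page modes coexist naturally: a whole-file checksum is simply stored under a distinguished key for that inode.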
7. LOGICAL REDUNDANCY
Techniques that involve logical redundancy exploit the semantics of data to verify integrity. Logical redundancy mechanisms can perform integrity checks by exploiting the inherent semantic redundancy that already exists in the data. An example of a system that employs logical redundancy is the Pilot operating system [29], whose file system meta-data is advisory in nature, in that the entire meta-data of a file can be regenerated by scanning through the file data. It stores meta-data for faster access to the file, but it can potentially use this information to perform integrity checks on the file when needed by simply comparing the regenerated value with the stored value of the meta-data. Similarly, in some size-changing stackable file systems [43], the index files that store size-mapping information are completely reconstructible from the file data. This can also be used to perform online integrity checks that detect inconsistencies in file sizes.

Logical redundancy is a common method used by databases to maintain the semantic integrity of the information they store [11]. We have identified that the general technique of logical redundancy can be exploited by storage systems at various levels to implement seamless integrity checking. The main advantage of this method is that the integrity-checking mechanism imposes minimal additional overhead, because the storage, and perhaps retrieval, of the redundant information is already performed by the system for other purposes. There are several areas in storage where logical redundancy can be exploited to ensure the integrity of data. We discuss three in this section.

7.1 Pilot File System
The Pilot file system is part of the Pilot operating system [29]. The Pilot file system's uniqueness is its robustness, which is achieved primarily through the use of reconstructible maps. Many systems make use of a file scavenger, a startup consistency checker for a file system, like fsck. In Pilot, the scavenger is given first-class status, in the sense that the file structures were all designed from the beginning with the scavenger in mind. Each file page is self-identifying by virtue of its label, written as a separate physical record adjacent to the one holding the actual contents of the page. Conceptually, one can think of a file page access as proceeding to scan all known volumes, checking the label of each page encountered until the desired one is found. In practice, this scan is performed only once by the scavenger, which leaves behind maps on each volume describing what it found there. Pilot then uses the maps and incrementally updates them as file pages are created and deleted. The logical redundancy of the maps does not imply a lack of importance, because the system would not be efficient without them. Since they contain only redundant information, however, they can be completely reconstructed should they be lost. In particular, this means that damage to any page on the disk can compromise only the data on that page. The primary map structure is the volume file map, a B-tree keyed on file-UID and page number which returns the device address of the page. All file storage devices check the label of the page and abort the I/O operation in case of a mismatch; a mismatch does not occur in normal operation and generally indicates the need to scavenge the volume. The volume file map uses extensive compression of UIDs and run-encoding of page numbers to maximize the out-degree of the internal nodes of the B-tree and thus minimize its depth.

The volume allocation map is also an important part of the Pilot file system. It is a table that describes the allocation status of each page on the disk. Each free page is a self-identifying member of a hypothetical file of free pages, allowing reconstruction of the volume allocation map.

The Pilot file system is a very good example of how logical redundancy can be exploited to perform integrity checks. However, the Pilot operating system is a research OS and is not in use today. In general, with logical redundancy, although we cannot ensure block data integrity, we can ensure meta-data integrity without storing any additional information. For key files in the file system whose integrity has to be monitored, the meta-data can be reconstructed each time the file is read and compared with the stored meta-data. This can detect inconsistencies between the data and the meta-data. This method of online integrity checking is useful in cases where important files are frequently updated by concurrent processes, leaving room for inconsistencies during a system fault.

7.2 File System Bitmaps
Current Unix file systems such as Ext2 use several bitmaps for managing meta-data. For example, in Ext2, inode table entries are identified through inode bitmaps. Each bit represents the current state of an entry in the inode table within that group, where 1 means "used" and 0 means "free/available." In addition to this bitmap, the file system also stores the current state of the inode in the inode table itself: whenever an inode is deleted, the file system marks the entry in the inode table as "deleted." This is redundant information maintained by Ext2 that is presently used by file system startup integrity checkers like fsck to reconstruct the bitmaps when the file system is corrupted due to a system crash.

This logical redundancy can be used to perform online integrity checks. In this scenario, integrity checks help prevent two kinds of potential problems: unexpected loss of data, and wasted inodes and disk blocks. A transient hardware bit error while reading the inode bitmap could make an existing file inaccessible: if the bitmap value for a used inode is read as free, the file system could overwrite the existing inode with a newly created file, thereby making the file pointed to by the older inode unreachable. This also makes the blocks occupied by the older file inaccessible, resulting in wasted disk space. Similarly, a bit error while reading the inode bitmap could result in a free inode being read as used, and the corresponding inode will be wasted. By performing online integrity checks based on logical redundancy, bit errors while reading bitmap blocks can easily be identified, and their effects can be prevented.

7.3 On-Disk Data Structures
Several indexing schemes and data storage patterns are employed for efficient retrieval of data from secondary storage devices; hash files and B-trees are the most common examples. B-trees are balanced trees that are optimized for situations when part or all of the tree must be maintained in secondary storage such as a magnetic disk. Since disk accesses are expensive, time-consuming operations, a B-tree tries to minimize the number of disk accesses. For example, a B-tree with a height of 2 and a branching factor of 1,001 can store over one billion keys yet requires at most two disk accesses to search for any node. Each node other than the leaf nodes in a B-tree has pointers to several children. One of the common integrity problems with B-trees is pointer corruption. A corrupted pointer in a single node of a B-tree can cause serious problems while reading the data stored in it, as the entire branching path changes due to the wrong pointer value. Several methods can be used to perform integrity checks in B-trees. The most common among them is to store the checksums of the child nodes along with the pointers in each node. With these checksums, each time a pointer is followed, the checksum of the child node can be computed and compared with the stored value. This method can effectively detect integrity violations in B-trees, but it is not very efficient: a modification to the data in a single leaf node requires recomputing and storing the checksums along the entire ancestor path up to the root. Moreover, computing and comparing checksums for each pointer access could seriously affect performance.
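The checksum-per-pointer scheme just described can be sketched as follows; this is a simplified in-memory model, whereas a real implementation would keep the digest beside the child's disk address in the on-disk node format.

```python
# B-tree lookup that verifies each child against the digest stored next
# to its pointer before descending into it.
import hashlib

class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        # Store (digest, child) pairs: the parent vouches for each child.
        self.children = [(child.digest(), child) for child in children]

    def digest(self):
        # A node's digest covers its keys and its children's digests,
        # so a change anywhere below alters every ancestor's digest.
        body = repr(self.keys).encode()
        for child_digest, _ in self.children:
            body += child_digest
        return hashlib.sha1(body).hexdigest().encode()

def search(node, key):
    if not node.children:                      # leaf node
        return key in node.keys
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    stored_digest, child = node.children[i]
    if child.digest() != stored_digest:        # verify before descending
        raise IOError("corrupted B-tree node on the search path")
    return search(child, key)
```

The sketch also makes the cost visible: because a node's digest covers its children's digests, modifying one leaf forces digest recomputation up the whole ancestor path, which is exactly the inefficiency that parent back pointers avoid at the price of checking only the pointers, not the data.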
efficient. A modification to a single leaf node data requires the re- [6] P. A. DesAutels. SHA1: Secure Hash Algorithm.
computation and storing the checksums of the entire ancestor path www.w3.org/PICS/DSig/SHA1_1_0.html , 1997.
up to the root. Moreover, computing and comparing checksums for [7] K. Fu, M. F. Kaashoek, and D. Mazières. Fast and Secure
each pointer access could seriously affect performance. Distributed Read-Only File System. In Proceedings of the
Logical redundancy techniques can be employed to perform efficient integrity checks in B-trees. If each child node has a back pointer to its parent, then every time we follow a pointer we can check whether the child node we arrive at points back to its parent. This way, pointer corruptions can easily be detected without the cost of generating and comparing checksums. Although this method cannot ensure the integrity of the data in the child nodes, it can effectively detect pointer corruptions with minimal space and time overheads.
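The back-pointer check can be sketched in a few lines. Again this is a minimal in-memory illustration under assumed names, not an on-disk format: the redundant information is the parent pointer, and verification is a single comparison per traversal step.

```python
class Node:
    """Toy B-tree-like node carrying a back pointer to its parent,
    the redundant logical information used for the check."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []

    def add_child(self):
        child = Node(parent=self)
        self.children.append(child)
        return child

def follow(parent, index):
    """Follow a child pointer; a corrupted pointer is detected when the
    node it leads to does not point back to the parent."""
    child = parent.children[index]
    if child.parent is not parent:
        raise IOError("integrity violation: corrupted child pointer")
    return child

root = Node()
a = root.add_child()
b = root.add_child()
assert follow(root, 0) is a

# Simulate pointer corruption: root's first pointer now leads to a node
# that belongs to a different parent, so the back pointer gives it away.
stray = Node(parent=Node())
root.children[0] = stray
try:
    follow(root, 0)
except IOError as e:
    print("detected:", e)
```

No checksums are generated or compared, so the per-access cost is one pointer comparison; consistent with the limitation stated above, the check says nothing about whether the data inside the child node is intact.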
8. CONCLUSIONS AND FUTURE WORK
This paper presents a survey of the integrity assurance mechanisms in use today. We have analyzed integrity assurance techniques along three dimensions in our taxonomy: the scope of assurance, the logical layer, and the checking mode. We have also discussed several interesting applications of integrity assurance. We analyzed how existing systems that employ integrity checks can use redundant data to improve their performance and add new functionality, and we presented real examples of such systems. We discussed several implementation choices for integrity-checking granularity and for managing redundant information.
We formalized a new class of efficient integrity assurance mechanisms called logical redundancy and discussed three examples where it can be used. In our taxonomy we describe integrity assurance techniques from four different viewpoints: the redundancy mechanisms used, their scope, their level of operation, and the frequency at which checks are performed. We discussed the operation of several existing systems from each of those viewpoints.
Our experience describing this taxonomy of integrity assurance techniques has helped focus our thinking on exploring more logical redundancy techniques that perform integrity checking at low cost and with high efficiency. We hope to explore further systems that maintain redundant information as part of their normal operation but do not use it for integrity checks. Such systems can be made more secure and efficient by exploiting the information they already maintain.

9. REFERENCES
[1] W. Bartlett and L. Spainhower. Commercial fault tolerance: A tale of two systems. IEEE Transactions on Dependable and Secure Computing, pages 87–96, January 2004.
[2] E. W. Biersack. Performance evaluation of Forward Error Correction in ATM networks. In SIGCOMM '92: Conference Proceedings on Communications Architectures & Protocols, pages 248–257, New York, NY, USA, 1992. ACM Press.
[3] M. Blaze. A Cryptographic File System for Unix. In Proceedings of the First ACM Conference on Computer and Communications Security, pages 9–16, Fairfax, VA, 1993. ACM.
[4] CERT Coordination Center. CERT/CC Overview: Incident and Vulnerability Trends. Technical report. www.cert.org/present/cert-overview-trends.
[5] W. Craig and P. M. McNeal. Radmind: The Integration of Filesystem Integrity Checking with File System Management. In Proceedings of the 17th USENIX Large Installation System Administration Conference (LISA 2003), October 2003.
[6] P. A. DesAutels. SHA1: Secure Hash Algorithm. www.w3.org/PICS/DSig/SHA1_1_0.html, 1997.
[7] K. Fu, M. F. Kaashoek, and D. Mazières. Fast and Secure Distributed Read-Only File System. In Proceedings of the Fourth Symposium on Operating Systems Design and Implementation (OSDI 2000), pages 181–196, San Diego, CA, October 2000.
[8] E. Gal and S. Toledo. A transactional flash file system for microcontrollers. In Proceedings of the USENIX Annual Technical Conference, pages 89–104, Anaheim, CA, 2005. USENIX Association.
[9] S. Ghemawat, H. Gobioff, and S. T. Leung. The Google File System. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP '03), pages 29–43, Bolton Landing, NY, October 2003.
[10] G. A. Gibson, D. F. Nagle, W. Courtright II, N. Lanza, P. Mazaitis, M. Unangst, and J. Zelenka. NASD Scalable Storage Systems. In Proceedings of the 1999 USENIX Extreme Linux Workshop, Monterey, CA, June 1999.
[11] L. Golubchik and R. Muntz. Fault tolerance issues in data declustering for parallel database systems. Bulletin of the Technical Committee on Data Engineering, pages 14–28, September 1994.
[12] R. Hagmann. Reimplementing the Cedar file system using logging and group commit. ACM SIGOPS Operating Systems Review, 21(5):155–162, 1987.
[13] R. W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, Vol. XXVI, April 1950.
[14] W. Hsu and S. Ong. Fossilization: A Process for Establishing Truly Trustworthy Records. IBM Research Report (10331), 2004.
[15] A. Kashyap, J. Dave, M. Zubair, C. P. Wright, and E. Zadok. Using the Berkeley Database in the Linux Kernel. www.fsl.cs.sunysb.edu/project-kbdb.html, 2004.
[16] A. Kashyap, S. Patil, G. Sivathanu, and E. Zadok. I3FS: An In-Kernel Integrity Checker and Intrusion Detection File System. In Proceedings of the 18th USENIX Large Installation System Administration Conference (LISA 2004), pages 69–79, Atlanta, GA, November 2004. USENIX Association.
[17] G. Kim and E. Spafford. Experiences with Tripwire: Using Integrity Checkers for Intrusion Detection. In Proceedings of the USENIX System Administration, Networking and Security Conference (SANS III), 1994.
[18] G. Kim and E. Spafford. The Design and Implementation of Tripwire: A File System Integrity Checker. In Proceedings of the 2nd ACM Conference on Computer and Communications Security (CCS), November 1994.
[19] H. Krawczyk, M. Bellare, and R. Canetti. HMAC: Keyed-hashing for message authentication. Technical Report RFC 2104, Internet Activities Board, February 1997.
[20] M. Mesnier, G. R. Ganger, and E. Riedel. Object-based storage. IEEE Communications Magazine, 41, August 2003.
[21] E. Miller, W. Freeman, D. Long, and B. Reed. Strong Security for Network-Attached Storage. In Proceedings of the First USENIX Conference on File and Storage Technologies (FAST 2002), pages 1–13, Monterey, CA, January 2002.
[22] A. Muthitacharoen, B. Chen, and D. Mazières. A Low-Bandwidth Network File System. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01), Chateau Lake Louise, Banff, Canada, October 2001.
[23] Osiris. Osiris: Host Integrity Management Tool, 2004. www.osiris.com.
[24] D. Patterson, G. Gibson, and R. Katz. A case for redundant arrays of inexpensive disks (RAID). In Proceedings of ACM SIGMOD, pages 109–116, June 1988.
[25] W. W. Peterson and E. J. Weldon. Error-Correcting Codes. MIT Press, Cambridge, MA, 2nd edition, 1972.
[26] V. Prabhakaran, N. Agrawal, L. N. Bairavasundaram, H. S. Gunawi, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. IRON File Systems. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP '05), Brighton, UK, October 2005.
[27] S. Quinlan and S. Dorward. Venti: A new approach to archival storage. In Proceedings of the First USENIX Conference on File and Storage Technologies, pages 89–101, Monterey, CA, January 2002.
[28] R. J. McEliece and D. V. Sarwate. On sharing secrets and Reed-Solomon codes. Communications of the ACM, 24(9):583–584, 1981.
[29] D. D. Redell, Y. K. Dalal, T. R. Horsley, H. C. Lauer, W. C. Lynch, P. R. McJones, H. G. Murray, and S. C. Purcell. Pilot: An operating system for a personal computer. In Proceedings of the 7th ACM Symposium on Operating Systems Principles (SOSP), pages 106–107, 1979.
[30] R. L. Rivest. RFC 1321: The MD5 Message-Digest Algorithm. Internet Activities Board, April 1992.
[31] Samhain Labs. Samhain: File System Integrity Checker, 2004. http://samhain.sourceforge.net.
[32] M. Seltzer and O. Yigit. A new hashing package for UNIX. In Proceedings of the Winter USENIX Technical Conference, pages 173–184, Dallas, TX, January 1991. USENIX Association.
[33] G. Sivathanu, C. P. Wright, and E. Zadok. Enhancing File System Integrity Through Checksums. Technical Report FSL-04-04, Computer Science Department, Stony Brook University, May 2004. www.fsl.cs.sunysb.edu/docs/nc-checksum-tr/nc-checksum.pdf.
[34] M. Sivathanu, V. Prabhakaran, F. I. Popovici, T. E. Denehy, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Semantically-Smart Disk Systems. In Proceedings of the Second USENIX Conference on File and Storage Technologies (FAST '03), pages 73–88, San Francisco, CA, March 2003. USENIX Association.
[35] C. A. Stein, J. H. Howard, and M. I. Seltzer. Unifying file system protection. In Proceedings of the USENIX Annual Technical Conference, pages 79–90, Boston, MA, June 2001. USENIX Association.
[36] J. D. Strunk, G. R. Goodson, M. L. Scheinholtz, C. A. N. Soules, and G. R. Ganger. Self-Securing Storage: Protecting Data in Compromised Systems. In Proceedings of the Fourth Symposium on Operating Systems Design and Implementation, pages 165–180, San Diego, CA, October 2000.
[37] Sun Microsystems, Inc. Solaris ZFS file storage solution. Solaris 10 Data Sheets, 2004. www.sun.com/software/solaris/ds/zfs.jsp.
[38] A. Tridgell and P. Mackerras. The rsync algorithm. Technical report, Australian National University, 1998.
[39] Tripwire, Inc. Tripwire Software. www.tripwire.com.
[40] G. Weinberg. The Solaris Dynamic File System. 2004.
[41] C. P. Wright, J. Dave, and E. Zadok. Cryptographic File Systems Performance: What You Don't Know Can Hurt You. Technical Report FSL-03-02, Computer Science Department, Stony Brook University, August 2003. www.fsl.cs.sunysb.edu/docs/nc-perf/perf.pdf.
[42] C. P. Wright, M. Martino, and E. Zadok. NCryptfs: A Secure and Convenient Cryptographic File System. In Proceedings of the USENIX Annual Technical Conference, pages 197–210, San Antonio, TX, June 2003. USENIX Association.
[43] E. Zadok, J. M. Anderson, I. Bădulescu, and J. Nieh. Fast Indexing: Support for size-changing algorithms in stackable file systems. In Proceedings of the USENIX Annual Technical Conference, pages 289–304, Boston, MA, June 2001. USENIX Association.
