NetWorker 19.5 Performance Optimization Planning Guide
June 2021
Rev. 01
Notes, cautions, and warnings
NOTE: A NOTE indicates important information that helps you make better use of your product.
CAUTION: A CAUTION indicates either potential damage to hardware or loss of data and tells you how to avoid
the problem.
WARNING: A WARNING indicates a potential for property damage, personal injury, or death.
© 2000 - 2021 Dell Inc. or its subsidiaries. All rights reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries.
Other trademarks may be trademarks of their respective owners.
Contents
Figures..........................................................................................................................................5
Tables........................................................................................................................................... 6
Preface.........................................................................................................................................................................................7
Chapter 1: Overview..................................................................................................................... 11
Introduction.......................................................................................................................................................................... 11
NetWorker data flow......................................................................................................................................................... 11
Chapter 3: Tune Settings............................................................................................................ 46
Optimize NetWorker parallelism.................................................................................................................................... 46
Server parallelism........................................................................................................................................................ 46
Server's client parallelism.......................................................................................................................................... 46
Action parallelism......................................................................................................................................................... 47
Multiplexing................................................................................................................................................................... 47
File system density............................................................................................................................................................47
Disk optimization................................................................................................................................................................47
Device performance tuning methods........................................................................................................................... 48
Input/output transfer rate........................................................................................................................................ 48
Built-in compression................................................................................................................................................... 48
Drive streaming............................................................................................................................................................ 48
Device load balancing................................................................................................................................................. 48
Fragmenting a disk drive............................................................................................................................................48
Network devices................................................................................................................................................................49
Fibre Channel latency................................................................................................................................................. 49
Data Domain................................................................................................................................................................. 50
CloudBoost.................................................................................................................................................................... 51
AFTD device target and max sessions.................................................................................................................... 51
Number of virtual device drives versus physical device drives........................................................................52
Network optimization....................................................................................................................................................... 53
Advanced configuration optimization..................................................................................................................... 53
Operating system TCP stack optimization............................................................................................................53
Advanced tuning..........................................................................................................................................................53
Network latency.......................................................................................................................................................... 54
Ethernet duplexing...................................................................................................................................................... 54
Firewalls.........................................................................................................................................................................55
Jumbo frames...............................................................................................................................................................55
Congestion notification..............................................................................................................................................55
TCP buffers.................................................................................................................................................................. 56
Increase TCP backlog buffer size............................................................................................................................57
IRQ balancing and CPU affinity............................................................................................................................... 58
Interrupt moderation.................................................................................................................................................. 58
TCP chimney offloading.............................................................................................................................................58
Name resolution...........................................................................................................................................................59
Operating system specific settings for SLES SP2....................................................................................................59
Figures
Tables
1 Revision history...........................................................................................................................................................7
2 Style conventions.......................................................................................................................................................9
3 Sizing information for a physical server..............................................................................................................16
4 Sizing information for a virtual server................................................................................................................. 17
5 Reliability events....................................................................................................................................................... 18
6 Bus specifications.................................................................................................................................................... 20
7 Disk write latency results and recommendations.............................................................................................22
8 PSS support by NetWorker release.................................................................................................................... 26
9 Required IOPS for NetWorker server operations............................................................................................. 31
10 Disk drive IOPS values............................................................................................................................................ 32
11 Distribution of workflows and jobs...................................................................................................................... 36
12 The effect of blocksize on an LTO-4 tape drive.............................................................................................. 49
13 Tolerable Range for Low Density file system................................................................................................... 66
14 Tolerable Range for High Density file system................................................................................................... 67
Preface
As part of an effort to improve product lines, periodic revisions of software and hardware are released. Therefore, all versions of
the software or hardware currently in use might not support some functions that are described in this document. The product
release notes provide the most up-to-date information on product features.
If a product does not function correctly or does not function as described in this document, contact a technical support
professional.
NOTE: This document was accurate at publication time. To ensure that you are using the latest version of this document,
go to the Support website https://www.dell.com/support.
Purpose
This document describes how to size and optimize the NetWorker software.
Audience
This document is intended for the NetWorker software administrator.
Revision history
The following table presents the revision history of this document.
Related documentation
The NetWorker documentation set includes the following publications, available on the Support website:
● NetWorker E-LAB Navigator
Provides compatibility information, including specific software and hardware configurations that NetWorker supports. To
access E-LAB Navigator, go to https://elabnavigator.emc.com/eln/elnhome.
● NetWorker Administration Guide
Describes how to configure and maintain the NetWorker software.
● NetWorker Network Data Management Protocol (NDMP) User Guide
Describes how to use the NetWorker software to provide data protection for NDMP filers.
● NetWorker Cluster Integration Guide
Contains information related to configuring NetWorker software on cluster servers and clients.
● NetWorker Installation Guide
Provides information on how to install, uninstall, and update the NetWorker software for clients, storage nodes, and servers
on all supported operating systems.
● NetWorker Updating from a Previous Release Guide
Describes how to update the NetWorker software from a previously installed release.
● NetWorker Release Notes
Contains information on new features and changes, fixed problems, known limitations, environment and system requirements
for the latest NetWorker software release.
● NetWorker Command Reference Guide
Provides reference information for NetWorker commands and options.
● NetWorker Data Domain Boost Integration Guide
Provides planning and configuration information on the use of Data Domain devices for data deduplication backup and
storage in a NetWorker environment.
● NetWorker Performance Optimization Planning Guide
Contains basic performance tuning information for NetWorker.
● NetWorker Server Disaster Recovery and Availability Best Practices Guide
Describes how to design, plan for, and perform a step-by-step NetWorker disaster recovery.
● NetWorker Snapshot Management Integration Guide
Describes the ability to catalog and manage snapshot copies of production data that are created by using mirror technologies
on storage arrays.
● NetWorker Snapshot Management for NAS Devices Integration Guide
Describes how to catalog and manage snapshot copies of production data that are created by using replication technologies
on NAS devices.
● NetWorker Security Configuration Guide
Provides an overview of security configuration settings available in NetWorker, secure deployment, and physical security
controls needed to ensure the secure operation of the product.
● NetWorker VMware Integration Guide
Provides planning and configuration information on the use of VMware in a NetWorker environment.
● NetWorker Error Message Guide
Provides information on common NetWorker error messages.
● NetWorker Licensing Guide
Provides information about licensing NetWorker products and features.
● NetWorker REST API Getting Started Guide
Describes how to configure and use the NetWorker REST API to create programmatic interfaces to the NetWorker server.
● NetWorker REST API Reference Guide
Provides the NetWorker REST API specification used to create programmatic interfaces to the NetWorker server.
● NetWorker 19.5 with CloudBoost 19.5 Integration Guide
Describes the integration of NetWorker with CloudBoost.
● NetWorker 19.5 with CloudBoost 19.5 Security Configuration Guide
Provides an overview of security configuration settings available in NetWorker and CloudBoost, secure deployment, and
physical security controls needed to ensure the secure operation of the product.
● NetWorker Management Console Online Help
Describes the day-to-day administration tasks performed in the NetWorker Management Console and the NetWorker
Administration window. To view the online help, click Help in the main menu.
● NetWorker User Online Help
Describes how to use the NetWorker User program, which is the Windows client interface, to connect to a NetWorker
server to back up, recover, archive, and retrieve files over a network.
NOTE: Data Domain is now PowerProtect DD. References to Data Domain or DD systems in this documentation, in the UI,
and elsewhere in the product include PowerProtect DD systems and older Data Domain systems. In many cases the UI has
not yet been updated to reflect this change.
Typographical conventions
The following type style conventions are used in this document:
You can use the following resources to find more information about this product, obtain support, and provide feedback.
Knowledgebase
The Knowledgebase contains applicable solutions that you can search for either by solution number (for example, KB000xxxxxx)
or by keyword.
To search the Knowledgebase:
1. Go to https://www.dell.com/support.
2. On the Support tab, click Knowledge Base.
3. In the search box, type either the solution number or keywords. Optionally, you can limit the search to specific products by
typing a product name in the search box, and then selecting the product from the list that appears.
Live chat
To participate in a live interactive chat with a support agent:
1. Go to https://www.dell.com/support.
2. On the Support tab, click Contact Support.
3. On the Contact Information page, click the relevant support, and then proceed.
Service requests
To obtain in-depth help from Licensing, submit a service request. To submit a service request:
1. Go to https://www.dell.com/support.
2. On the Support tab, click Service Requests.
NOTE: To create a service request, you must have a valid support agreement. For details about either an account or
obtaining a valid support agreement, contact a sales representative. To find the details of a service request, in the
Service Request Number field, type the service request number, and then click the right arrow.
Online communities
For peer contacts, conversations, and content on product support and solutions, go to the Community Network at https://www.dell.com/community. Interactively engage with customers, partners, and certified professionals online.
1
Overview
This chapter includes the following topics:
• Introduction
• NetWorker data flow
Introduction
The NetWorker software is a network storage management application that is optimized for high-speed backup and recovery of large amounts of complex data across entire datazones. This guide addresses non-disruptive performance tuning options. Although some physical devices may not meet the expected performance, it is understood that when a physical component is replaced with a better performing device, another component becomes the bottleneck.
This guide addresses NetWorker performance tuning with minimal disruption to the existing environment. It describes how to fine-tune NetWorker features to achieve better performance with the same set of hardware, and assists administrators to:
● Understand data transfer fundamentals
● Determine requirements
● Identify bottlenecks
● Optimize and tune NetWorker performance
Figure: NetWorker backup data flow, showing the initial handshake communication, the Client Direct data path, and tracking and interprocess communication among the nsrexecd, nsrjobd, nsrsnmd, nsrindexd, nsrmmdbd, and nsrmmgd processes.
Figure: NetWorker recovery data flow, showing the initial handshake communication, the Client Direct data path, and the recover, ansrd, nsrjobd, nsrmmgd, nsrmmd, nsrlcpd, nsrindexd, and nsrmmdbd processes.
2
Size the NetWorker Environment
This chapter describes how to best determine backup and system requirements. The first step is to understand the
environment. Performance issues are often attributed to hardware or environmental issues. An understanding of the entire
backup data flow is important to determine the optimal performance expected from the NetWorker software.
Topics:
• Expectations
• System components
• Storage considerations
• Backup operation requirements
• Components of a NetWorker environment
• Recovery performance factors
• Parallel restore
• Connectivity and bottlenecks
Expectations
You can determine the backup performance expectations and required backup configurations for your environment based on the
Recovery Time Objective (RTO) for each client.
System components
Every backup environment has a bottleneck. It may be a fast bottleneck, but the bottleneck will determine the maximum
throughput obtainable in the system. Backup and restore operations are only as fast as the slowest component in the backup
chain.
Performance issues are often attributed to hardware devices in the datazone. This guide assumes that hardware devices are
correctly installed and configured.
This section discusses how to determine requirements. For example:
● How much data must move?
● What is the backup window?
● How many drives are required?
● How many CPUs are required?
Devices on backup networks can be grouped into four component types. These are based on how and where devices are used.
In a typical backup network, the following four components are present:
● System
● Storage
● Network
● Target device
System
Several system configuration components impact performance:
● CPU
● Memory
● System bus (this determines the maximum available I/O bandwidth)
CPU requirements
Determine the optimal number of CPUs required, if 5 MHz is required to move 1 MB of data from a source device to a target
device. For example, a NetWorker server, or storage node backing up to a local tape drive at a rate of 100 MB per second,
requires 1 GHz of CPU power:
● 500 MHz is required to move data from the network to a NetWorker server or storage node.
● 500 MHz is required to move data from the NetWorker server or storage node to the backup target device.
NOTE: 1 GHz on one type of CPU does not directly compare to 1 GHz on a CPU from a different vendor.
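The sizing guideline can be expressed as a quick calculation. The following is a minimal sketch (shell arithmetic, illustrative values only) that applies the 5 MHz per MB/s rule, counted twice because the data crosses the host once inbound and once outbound:
# Rule of thumb: 5 MHz of CPU per 1 MB/s moved, applied twice because the
# data crosses the host twice (network in, backup device out).
THROUGHPUT_MBS=100                         # target backup rate in MB/s
CPU_MHZ=$(( THROUGHPUT_MBS * 5 * 2 ))      # 100 * 5 * 2 = 1000 MHz (1 GHz)
echo "Approximate CPU power required: ${CPU_MHZ} MHz"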
The CPU load of a system is impacted by many additional factors. For example:
Because the NMC UI uses more memory when processing messages from the RabbitMQ service, it is recommended that you change the default heap memory from 4 GB to 12 GB for small, medium, and large scale environments.
Table: Sizing information for physical and virtual servers (workload tiers, for example, up to 10,000 jobs per day).
a. For example, if 8 vCPUs are configured on a virtual server, then a minimum of 4 vCPUs must be reserved, which is 50% of
8 vCPUs.
b. For example, if 16 GB of RAM is configured on a virtual server, then a minimum of 8 GB must be reserved, which is 50% of
16 GB RAM.
NOTE:
● For virtual machines running as NetWorker server, ensure that you reserve memory and vCPU.
● Ensure that the swap space is equal to or double the RAM size.
● In the case of cloning, if RPS is enabled (nsrrecopy processes are spawned), the server requires additional memory of around 1.5 GB for each nsrrecopy process to run smoothly. For example, if five nsrrecopy processes are running on a local or remote storage node, then an additional 7.5 GB of memory is required for the clone to complete in a large scale environment.
● Media database related operations using mminfo queries with different switch options (-avot, -avVS, -aS, and so on) can consume a considerable amount of memory to process the mminfo query request. For example, on a scaled media database of around 5 million records, processing certain mminfo query requests requires additional memory of around 7 GB.
● For better performance and scalability of NMC, Dell EMC recommends that you have a separate NMC server and a
separate NMC UI client. Ensure that the NMC server and the NetWorker server are running inside the same subnet or
vLAN to avoid latency impact when communicating with the NetWorker server.
● Dell EMC recommends that you configure a maximum of 2000 jobs per workflow and a maximum of 100 workflows per
policy. Exceeding these limits will significantly increase the load on the NetWorker server and on the user interface (UI)
response time. NetWorker can process 1024 jobs at a time; the remaining jobs are queued while multiple workflows are started concurrently. Do not exceed 6000 jobs in the queue at any point in time. To prevent overloading the server, stagger the workflow start times.
Message: NetWorker Recommended Threshold Limit Exceeded: Concurrent Running Workflows. Recommendation is 100, Current value is 106.
Resolution: Schedule workflows in a manner that does not involve running 100 workflows concurrently. The section Distribution of workflows and jobs provides more information.
Bus specifications
Bus specifications are based on bus type, MHz, and MB per second.
Bus specifications for specific buses are listed in the following table.
Storage considerations
There are components that impact the performance of storage configurations. They are as follows:
● Storage connectivity:
○ Local versus SAN attached versus NAS attached.
○ Use of storage snapshots.
The type of snapshot technology that is used determines the read performance.
● Some storage replication technologies add significant latency to write access which slows down storage access.
● Storage type:
○ Serial ATA (SATA) computer bus is a storage-interface for connecting host bus adapters to storage devices such as hard
disk drives and optical drives.
○ Fibre Channel (FC) is a gigabit-speed network technology that is primarily used for storage networking.
○ Flash is a non-volatile computer storage that is used for general storage and the transfer of data between computers and
other digital products.
● I/O transfer rate of storage is influenced by different RAID levels, where the best RAID level for the backup server is RAID1
or RAID5. Backup to disk should use RAID3.
● If the target system is scheduled to perform I/O intensive tasks at a specific time, schedule backups to run at a different
time.
● I/O data:
○ Raw data access offers the highest level of performance, but does not logically sort saved data for future access.
○ File systems with a large number of files have degraded performance due to additional processing required by the file
system.
● If data is compressed on the disk by the operating system or an application, the data is decompressed before a backup. The CPU requires time to re-compress the files, and disk speed is negatively impacted.
NOTE: Avoid using synchronous replication technologies or any other technology that adversely impacts latency.
● For NDMP backups, configure a separate location on the NetWorker server for the /nsr/tmp folder to accommodate large
temporary file processing.
● Use the operating system to handle parallel file system I/O even if all mount points are on the same physical location. The
operating system handles parallel file system I/O more efficiently than the NetWorker software.
● Use RAID 3 for disk storage for Advanced File Type Device (AFTD).
● For antivirus software, disable scanning of the NetWorker databases. If the antivirus software scans the /nsr folder,
performance degradation, time-outs, or NetWorker database corruption can occur because of frequent file open/close
requests. The antivirus exclude list should also include NetWorker storage node locations that are used for AFTD.
In NetWorker 9.0 and later, the Bootstrap backup runs as part of the server protection policy. However, the IOPS requirement remains almost the same as mentioned in this section.
● IOPS requirements increase if the NetWorker software is configured to start many jobs simultaneously.
To accommodate load spikes, add one IOPS for each parallel session that is started.
It is recommended not to start more than 40 clients per group with the default client parallelism of four. The result is 160
IOPS during group startup.
Starting many clients simultaneously can lead to I/O system starvation.
● Each volume request results in a short I/O burst of approximately 200 IOPS for a few seconds.
For environments running a small number of volumes the effect is minimal. However, for environments with frequent mount
requests, a significant load is added to the NetWorker server. In this case, add 100 IOPS for high activity (more than 50
mount requests per hour). To avoid the excessive load, use a smaller number of large volumes.
● NDMP backups add additional load due to index post-processing.
For large NDMP environment backups with more than 10 million files, add an additional 120 IOPS.
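As a rough illustration of the IOPS guidance above, the following sketch (shell arithmetic, illustrative values only) adds up the contributions for a group of 40 clients with the default client parallelism of four, frequent volume mount activity, and a large NDMP backup:
# 40 clients x 4 parallel sessions = 160 IOPS at group startup
CLIENTS=40
CLIENT_PARALLELISM=4
STARTUP_IOPS=$(( CLIENTS * CLIENT_PARALLELISM ))
MOUNT_IOPS=100        # add for high mount activity (more than 50 mount requests per hour)
NDMP_IOPS=120         # add for NDMP backups with more than 10 million files
echo "Estimated peak IOPS: $(( STARTUP_IOPS + MOUNT_IOPS + NDMP_IOPS ))"   # 380 IOPS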
Add the following TCP parameters when the NetWorker server runs with a heavy load (concurrent runs with a large number of
socket requests being made on the server application ports):
● On a Linux NetWorker server, add the following TCP parameters in the /etc/sysctl.conf file and run the sysctl
--system command:
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 15000 65535
net.core.somaxconn = 1024
● On a Linux NMC server, update the file-max value (the fs.file-max kernel parameter) to 65536 to ensure Postgres database connectivity when the NetWorker server runs with heavy loads.
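For example, on most Linux distributions the file-max value corresponds to the fs.file-max kernel parameter. A minimal sketch, assuming the standard /etc/sysctl.conf location (verify the path and procedure for your distribution):
echo "fs.file-max = 65536" >> /etc/sysctl.conf   # persist the setting
sysctl --system                                  # reload kernel parameters
cat /proc/sys/fs/file-max                        # confirm the active value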
● On a Windows NetWorker server, set the following registry values:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Value Name: TcpTimedWaitDelay
Data type: REG_DWORD
Base: Decimal
Value: 30
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Value Name: MaxUserPort
Data Type: REG_DWORD
Base: Decimal
Value: 65535
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Value Name: TcpNumConnections
Data Type: REG_DWORD
Base: Decimal
Value: 1024
NOTE: Use the default startup script on the NetWorker storage nodes and clients. The open file descriptor parameter is
not required on storage nodes and clients.
Figure 3. PSS performance gains between NetWorker 8.2 and NetWorker 9.0.x releases
Subsequent sections in this chapter break down the behavior of PSS in each release.
When a PSS enabled UNIX Client resource's parallelism value is greater than the resource's number of save points, the
scheduled backup savegroup process divides the parallelism among the save points and starts PSS save processes for all the
save points at approximately the same time. However, this is done within the limits of the following:
● The NetWorker server
Example 1
The following provides performance configuration alternatives for a PSS enabled client with the following backup requirements
and constraints:
● Two savepoints: /sp200GB and /sp2000GB
● Save streams able to back up at 100 GB/hr
● Client parallelism is set to four (No more than four concurrent streams to avoid disk IO contention)
Based on these requirements and constraints, the following are specific configuration alternatives with the overall backup time
in hours:
● A non-PSS Client resource with both savepoints at one stream each: 20 hours
● A single PSS Client resource with both /sp200GB at two streams and /sp2000GB at two streams for the same save
group: 10 hours
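The overall backup time in these alternatives is driven by the largest save point. A small sketch of the arithmetic for the PSS alternative (shell arithmetic, illustrative only, using the 100 GB/hr stream rate stated above):
SP1_GB=200; SP2_GB=2000          # save point sizes
RATE_GB_HR=100                   # throughput per save stream
STREAMS=2                        # streams per save point in the PSS alternative
T1=$(( SP1_GB / (RATE_GB_HR * STREAMS) ))    # 1 hour for /sp200GB
T2=$(( SP2_GB / (RATE_GB_HR * STREAMS) ))    # 10 hours for /sp2000GB
echo "Overall backup time: $(( T1 > T2 ? T1 : T2 )) hours"   # longest save point dominates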
Example 2
With client parallelism set to eight and three save points /sp1, /sp2, and /sp3 explicitly listed or expanded by the keyword
ALL for UNIX, the number of PSS streams for each savepoint backup is three, three, and two respectively. The number of
mminfo media database save set records is also three, three, and two respectively.
For a particular save point /sp1, mminfo and NMC save set query results show three save set records each named /sp1,
<1>/sp1, and <2>/sp1. These related records have unique save times that are close to one another. The /sp1 record always
has the latest save time, that is, maximum save time, as it starts last. This makes time-based recovery aggregation for the entire
save point /sp1 work automatically.
Example 3
For a PSS Windows save point backup, the number of streams per save point is estimated in the following two scenarios:
● Client parallelism is divided across the save points. With client parallelism=5 and two save points, the number of PSS streams is three for the first save point and two for the second.
For the save set ALL, with two volumes and client parallelism=5, each volume (save point) gets two streams.
● With client parallelism=4, every save point is given two save streams. Both DISASTER_RECOVERY:\ volumes (C:\ and D:\) are also given two streams each.
For the save set ALL, the DISASTER_RECOVERY:\ save set is considered to be a single save point. For this example, the
system has C:\, D:\, and E:\, where C:\, and D:\ are the critical volumes that make up the DISASTER_RECOVERY:\
save set.
The save operation controls how the save points are started, and the total number of streams never exceeds the client
parallelism value of 4.
This setting will use one stream per client save set entry by default, with the exception of two streams for each of /data1, /
data2 and /data3, and eight streams for each of /data4 and /data5. Client-supported wildcard characters can be used.
After setting the environment variable, restart the NetWorker services for the changes to take effect. Increasing the default
maximum value can improve the performance for clients with very fast disks.
On Windows, launch NMC and the NetWorker Administration window, and then go to View > Diagnostic Mode >
Protection > Clients > Client Properties > Apps & Modules > Save operations and set the following:
PSS:streams_per_ss=2,C:\, D:\, 8, E:\, F:\HR
This Windows PSS client setting will continue to use the default four streams for each save point not explicitly listed here, but
two streams each for the C:\ and D:\ drives, and eight streams each for the E:\ drive and F:\HR folder.
NOTE: PSS backups currently ignore the policy workflow action's parallelism, previously known as the savegrp parallelism.
When you set the client parallelism to a value less than the number of save points, some save point backups run in PSS mode,
with only a single stream, and other save points will run in the default mode (non-PSS). Therefore, for consistent use of PSS,
maintain the default setting or set the client parallelism to the number of save points. This ensures multiple streams for each
save point.
NOTE: The PSS incremental backup of a save point with zero to few files changed since its prior backup will result in one or
more empty media database save sets (actual size of 4 bytes), which is to be expected.
PSS enabled, CP=6 with 3 client save points
In NetWorker releases previous to NetWorker 19.5, if you set CP=6 and have three client save points, PSS will start all save
points together at two streams each, and each save point will remain at two streams, with each stream actively backing up files
from the start.
NetWorker 19.5, however, would start save point one with four active backup streams, and simultaneously start save point
two with two active streams and two idle streams. If save point one finishes first, then save point two could end up with four
active streams, and save point three would then start with two active streams and two idle streams. Depending on the time it
takes the save point to complete, save point two could remain as it started and save point three may start similar to how save
point one started, with four active streams. An idle stream is one that has not yet started saving data and will only become
active when CP allows. The total number of active streams from all save sets at any one point in time will not exceed CP. It is
recommended that you specify a CP value of four or a multiple of four to avoid idle streams.
● To list only the primary save sets for all /sp1 full and incremental backups, type the following command:
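A hypothetical query of the following form, which matches the exact save set name /sp1 and therefore excludes the <1>/sp1 and <2>/sp1 partial save sets, illustrates one way to do this; verify the exact syntax in the NetWorker Command Reference Guide:
mminfo -ot -q "name=/sp1" -r "client,name,level,savetime"   # order by time, report client, save set, level, and save time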
(1) A small NetWorker server environment is considered to have less than 500 clients, or 256 concurrent backup sessions.
(2) A medium NetWorker server environment is considered to have more than 500, and up to 1000 clients or 512 concurrent
backup sessions.
(3) A large NetWorker server environment is considered to have more than 1000 clients, and up to 2000 clients or 1024
concurrent backup sessions.
IOPS considerations
The following are considerations and recommendations for IOPS values:
● The NetWorker software does not limit the number of clients per datazone, but a maximum of 1000 clients is recommended
due to the complexity of managing large datazones, and the increased hardware requirements on the NetWorker server.
NOTE: As the I/O load on the NetWorker server increases, so does the storage layer service time. If service times
exceed the required values there is a direct impact on NetWorker server performance and reliability. Information on the
requirements for maximum service times are available in NetWorker server and storage node disk write latency on page
22.
● The NetWorker server performs the data movement itself. If the backup device resides on the server rather than the
NetWorker storage node, the backup performance is directly impacted.
Examples 2 and 3 are based on the preceding requirements that are listed in Table 7.
File history processing creates a significant I/O load on the backup server, and increases IOPS requirements by 100-120 I/O
operations per second during processing. If minimum IOPS requirements are not met, file history processing can be significantly
slower.
Network
Several components impact network configuration performance:
● IP network:
A computer network made of devices that support the Internet Protocol to determine the source and destination of network
communication.
● Storage network:
The system on which physical storage, such as tape, disk, or file system resides.
● Network speed:
The speed at which data travels over the network.
● Network bandwidth:
The maximum throughput of a computer network.
● Network path:
The communication path used for data transfer in a network.
● Network concurrent load:
The point at which data is placed in a network to ultimately maximize bandwidth.
● Network latency:
The measure of the time delay for data traveling between source and target devices in a network.
Target device
Storage type and connectivity are the component types that impact performance in target device configurations. They are as follows:
● Storage type:
○ Raw disk versus Disk Appliance:
■ Raw disk: Hard disk access at a raw, binary level, beneath the file system level.
■ Disk Appliance: A system of servers, storage nodes, and software.
○ Physical tape versus virtual tape library (VTL):
■ VTL presents a storage component (usually hard disk storage) as tape libraries or tape drives for use as storage
medium with the NetWorker software.
■ Physical tape is a type of removable storage media, generally referred to as a volume or cartridge, that contains
magnetic tape as its medium.
● Connectivity:
○ Local, SAN-attached:
Figure: Components of a NetWorker environment, including the Console server, NetWorker servers, NetWorker clients, storage nodes, devices, and datazones.
Datazone
A datazone is a single NetWorker server and its client computers. Additional datazones can be added as backup requirements
increase.
NOTE: It is recommended to have no more than 1500 clients or 3000 client instances per NetWorker datazone. This
number reflects an average NetWorker server and is not a hard limit.
If the save sets are distributed across multiple volumes, then a delay in restore can be expected, which is proportional to
the number of volumes involved.
Evaluate the workload for any datazone with more than 100K jobs per day, and consider moving some of the jobs to other datazones.
If the NetWorker server and NMC are migrated from a prior NetWorker 9.x installation to a NetWorker 9.1.x installation, then it is recommended to distribute the multiple workflows (previously configured as savegroups) among multiple policies according to the above recommendations. For example, if a NetWorker 9.x datazone with 800 savegroups is migrated to NetWorker 9.1.x, then all the savegroups are converted into workflows under a single backup policy. It is recommended to distribute these workflows among multiple policies and schedule them with interleaved time intervals. Adhere to the above recommendations when you are running multiple policies and workflows simultaneously.
NetWorker 18.1 and later has optimized Java heap memory within the UI so that large scale environments (100K jobs per day) can be handled with 6 GB to 12 GB of Java heap memory.
Console database
Use formulas to estimate the size and space requirements for the Console database.
Formulas for estimating the space required for the Console database information
There are existing formulas used to estimate the space needed for different types of data and to estimate the total space
required.
NetWorker server
NetWorker servers provide services to back up and recover data for the NetWorker client computers in a datazone. The
NetWorker server can also act as a storage node and control multiple remote storage nodes.
Index and media management operations are some of the primary processes of the NetWorker server:
NetWorker client
A NetWorker client computer is any computer whose data must be backed up. The NetWorker Console server, NetWorker
servers, and NetWorker storage nodes are also NetWorker clients.
NetWorker clients hold mission critical data and are resource intensive. Applications on NetWorker clients are the primary users
of CPU, network, and I/O resources. Only read operations performed on the client do not require additional processing.
Client speed is determined by all active instances of a specific client backup at a point in time.
NOTE: Compared to traditional (non-DFA) backups, backups utilizing DDBoost require 2-40% additional CPU, but for a much shorter period. Overall, the CPU load of a backup utilizing DDBoost is lower than that of a traditional backup.
NetWorker databases
Several factors determine the size of NetWorker databases.
These factors are available in NetWorker database bottlenecks on page 44.
Parallel restore
Starting with NetWorker 19.2, the recover workflow for file system backups has been enhanced to perform restores in parallel.
The improved logic splits a recover request into multiple recover requests, resulting in more than one recover thread and better recover performance in comparison to earlier versions.
The following restore workflows are supported with Data Domain and AFTD devices:
● File level restore
● Save set restore
Figure: NetWorker client, storage node, and server on a 1 Gbps network; the network is the bottleneck (control/metadata path and data path shown).
As illustrated in the following figure, the network is upgraded from a 1 GigE network to a 10 GigE network, and the
bottleneck has moved to another device. The host is now unable to generate data fast enough to use the available network
bandwidth. System bottlenecks can be due to lack of CPU, memory, or other resources.
Figure: After the upgrade to a 10 GigE network, the client host becomes the bottleneck (NetWorker client, storage node, NetWorker server, and Data Domain; control/metadata path and data path shown).
Figure: NetWorker client, storage node, and Data Domain on a GigE network (control/metadata path and data path shown).
As illustrated in the following figure, higher performance tape devices on a SAN remove the tape devices as the bottleneck. The bottleneck is now the local storage volumes. Although the local volumes are performing at optimal speeds, they are unable to use the available system, network, and target device resources. To improve the storage performance, move the data volumes to high performance external RAID arrays.
Figure: Higher performance tape devices attached through a SAN; the local storage volumes are the bottleneck (GigE network, Client Direct data path shown).
As illustrated in the following figure, the external RAID arrays have improved the system performance. The RAID arrays
perform nearly as well as the other components in the chain ensuring that performance expectations are met. There will
always be a bottleneck, however the impact of the bottleneck device is limited as all devices are performing at almost the
same level as the other devices in the chain.
Figure: Data volumes moved to an external RAID array, with a SAN-attached tape device; all components perform at nearly the same level (GigE network, Client Direct data path shown).
NOTE: The index database can be split over multiple locations, and the location is determined on a per client basis.
The following figure illustrates the overall performance degradation when the performance of the disk on which the NetWorker media database resides becomes a bottleneck. The chart on the right illustrates net data write throughput (save set + index + bootstrap) and the chart on the left illustrates save set write throughput.
Server parallelism
The server parallelism attribute controls how many save streams the server accepts simultaneously. The more save streams the
server can accept, the faster the devices and client disks run. Client disks can run at their performance limit or the limits of the
connections between them. The default server parallelism value is 32; you can configure the parallelism to a maximum of 1024.
Server parallelism is not used to control the startup of backup jobs, but as a final limit of sessions accepted by a backup server.
The server parallelism value should be as high as possible while not overloading the backup server itself.
NOTE: If you schedule more than 50 concurrent clone workflows in a data zone, ensure that you configure the server
parallelism value to 1024 to avoid the starvation of streams reserved by clone operation.
Action parallelism
Action parallelism defines the maximum number of simultaneous data streams that can occur on all clients in a group that is associated with the workflow that contains the action.
Data streams include backup data streams, savefs processes, and probe jobs. For a Backup action, the default parallelism value is
100. For all other action types, the default value is 0, or unlimited.
Multiplexing
The Target Sessions attribute sets the target number of simultaneous save streams that write to a device. This value is not a
limit, therefore a device might receive more sessions than the Target Sessions attribute specifies. The more sessions specified
for Target Sessions, the more save sets that can be multiplexed (or interleaved) onto the same volume.
AFTD device target and max sessions provides additional information on device Target Sessions.
Performance tests and evaluation can determine whether multiplexing is appropriate for the system. Follow these guidelines
when evaluating the use of multiplexing:
● Find the maximum rate of each device. Use the bigasm test described in The bigasm directive.
● Find the backup rate of each disk on the client. Use the uasm test described in The uasm directive.
If the sum of the backup rates from all disks in a backup is greater than the maximum rate of the device, do not increase server
parallelism. If more save groups are multiplexed in this case, backup performance will not improve, and recovery performance
might slow down.
Disk optimization
NetWorker uses an intelligent algorithm when reading files from a client to choose an optimal block size value in the range of 64 KB to 8 MB, based on the current read performance of the client system.
This block size selection occurs during the actual data transfer and does not add any overhead to the backup process, and
potentially significantly increases disk read performance.
NOTE: Read block size is not related to device block size used for backup, which remains unchanged.
This feature is transparent to the rest of the backup process and does not require any additional configuration.
You can override the dynamic block size by setting the NSR_READ_SIZE environment variable to a desired value in the
NetWorker client. For example, NSR_READ_SIZE=65536 forces the NetWorker software to use 64 KB block size during the
read process.
Built-in compression
Turn on device compression to increase effective throughput to the device.
Some devices have a built-in hardware compression feature. Depending on how compressible the backup data is, this can
improve effective data throughput, from a ratio of 1.5:1 to 3:1.
Drive streaming
To obtain peak performance from most devices, stream the drive at its maximum sustained throughput.
Without drive streaming, the drive must stop to wait for its buffer to refill or to reposition the media before it can resume
writing. This can cause a delay in the cycle time of a drive, depending on the device.
NOTE: The maximum number of Data Domain devices recommended per storage node is 30.
d. When Windows is finished analyzing the disk, check the percentage of fragmentation on the disk in the Last Run column.
If the number is above 10%, defragment the disk.
e. Click Defragment disk. If prompted for an administrator password or confirmation, type the password or provide
confirmation.
NOTE: The defragmentation might take from several minutes to a few hours to complete, depending on the size and
degree of fragmentation of the hard disk. You can still use the computer during the defragmentation process.
Network devices
When data is backed up from remote clients, the routers, network cables, and network interface cards can affect the backup and recovery operations.
This section lists the performance variables in network hardware, and suggests some basic tuning for networks. The following
items address specific network issues:
● Network I/O bandwidth:
The maximum data transfer rate across a network rarely approaches the specification of the manufacturer because of
network protocol overhead.
NOTE: The following statement concerning overall system sizing must be considered when addressing network
bandwidth.
Each attached tape drive (physical VTL or AFTD) uses available I/O bandwidth, and also consumes CPU as data still requires
processing.
● Network path:
Networking components such as routers, bridges, and hubs consume some overhead bandwidth, which degrades network
throughput performance.
● Network load:
○ Do not attach a large number of high-speed NICs directly to the NetWorker server, as each IP address uses significant amounts of CPU resources. For example, a mid-size system with four 1 GB NICs uses more than 50 percent of its resources to process TCP data during a backup.
○ Other network traffic limits the bandwidth available to the NetWorker server and degrades backup performance. As the
network load reaches a saturation threshold, data packet collisions degrade performance even more.
○ The nsrmmdbd process performs CPU-intensive operations when thousands of save sets are processed in a single operation. Therefore, cloning operations with huge save sets and NetWorker maintenance activities should run outside of the primary backup window.
NOTE: High bandwidth does not directly increase performance if latency is the cause of slow data.
Table 12. The effect of blocksize on an LTO-4 tape drive (continued)
Blocksize Local backup performance Remote backup performance
512 KB 173 MB/second 130 MB/second
1024 KB 173 MB/second 130 MB/second
The following figure illustrates that the NetWorker backup throughput drops from 100 percent to 0 percent when the delay is
set from 0.001 ms to 2.0 ms.
Data Domain
Backup to Data Domain storage can be configured by using multiple technologies:
● NetWorker 8.1 and later supports DD Boost over Fibre Channel. This feature leverages the advantage of the boost protocol
in a SAN infrastructure. It provides the following benefits:
○ DD Boost over Fibre Channel (DFC) backup with Client Direct is 20–25% faster when compared to backup with DD VTL.
○ The subsequent full backup is three times faster than the first full backup.
○ Recovery over DFC is 2.5 times faster than recovery using DD VTL.
● Backup to VTL:
○ NetWorker devices are configured as tape devices and data transfer occurs over Fibre Channel.
○ Information on VTL optimization is available in Number of virtual device drives versus physical device drives.
● Backup to AFTD over CIFS or NFS:
○ Overall network throughput depends on the CIFS and NFS performance which depends on network configuration.
Network optimization provides best practices on backup to AFTD over CIFS or NFS.
○ Inefficiencies in the underlying transport limit backup performance to 70-80% of the link speed.
● The Client Direct attribute to enable direct file access (DFA):
○ Client Direct to Data Domain (DD) using Boost provides much better performance than DFA-AFTD using CIFS/NFS.
○ Backup performance with client direct enabled (DFA-DD/DFA-AFTD) is 20–60% faster than traditional backup using
nsrmmd.
○ With an increasing number of streams to single device, DFA handles the backup streams much better than nsrmmd.
● The minimum required memory for a NetWorker Data Domain Boost device with each device total streams set to 10 is
approximately 250 MB. Each OST stream for BOOST takes an additional 25 MB of memory.
● Compared to traditional (non-DFA) backups, backups utilizing DDBoost require 2-40% additional CPU, but for a much shorter period. Overall, the CPU load of a backup utilizing DDBoost is lower than that of a traditional backup.
CloudBoost
The CloudBoost device leverages the CloudBoost appliance and creates the NetWorker device on the cloud object store that is
hosted on a CloudBoost appliance.
The following are CloudBoost benefits:
● Enables sending NetWorker client backups to the cloud for long term retention.
● Data can be sent directly to the Cloud from Linux x64 clients. For other client types, data is written to the cloud via a
CloudBoost Storage Node.
● Data can be restored directly from the Cloud for Linux x64 clients. For other client types, the restore is performed via a
CloudBoost Storage Node.
● NetWorker 18.1 allows Windows x64 clients to perform backup and recovery directly to and from the Cloud.
● The default target sessions for a CloudBoost device type are 10 for NetWorker 9.1, and 4 for NetWorker 9.0.1. The default
maximum sessions are 80 for NetWorker 9.1, and 60 for NetWorker 9.0.1. For better performance, it is recommended that
you keep the default values for Target and Maximum sessions.
● CloudBoost performs native deduplication. As with a Data Domain device, consecutive backups can be 2–3 times faster, based on the rate of change in data.
The NetWorker with CloudBoost Integration Guide provides details on configuring the Cloud appliance and device.
● Max nsrmmd count is an advanced setting that can be used to increase data throughput by restricting the number of
backup processes that the storage node can simultaneously run. When the target or max sessions are changed, the max nsrmmd count is automatically adjusted according to the formula (max sessions / target sessions) + 4. The default values are 12 (FTD/AFTD) and 4 (DD Boost devices).
NOTE: It is not recommended to modify both session attributes and max nsrmmd count simultaneously. If you must
modify all these values, adjust the sessions attributes first, apply the changes, and then update max nsrmmd count.
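For example, a minimal sketch of the adjustment formula with illustrative values (max sessions of 60 and target sessions of 10):
MS=60   # max sessions on the device
TS=10   # target sessions on the device
echo "max nsrmmd count: $(( MS / TS + 4 ))"   # 60 / 10 + 4 = 10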
Network optimization
Adjust the following components of the network to ensure optimal performance.
Advanced tuning
IRQ processing for high-speed NICs is very expensive, but binding interrupt processing to specific CPU cores can provide enhanced performance. Specific recommendations depend on the CPU architecture.
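As an illustration of core selection on Linux, the following sketch pins the interrupts of a NIC to a chosen CPU core by writing a CPU mask to smp_affinity. The interface name eth0 and IRQ number 24 are assumptions; identify the correct values on your system, and note that the irqbalance service may override manual settings:
grep eth0 /proc/interrupts            # list the IRQ numbers used by the NIC
echo 1 > /proc/irq/24/smp_affinity    # mask 0x1 pins IRQ 24 to CPU core 0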
Network latency
Increased network TCP latency has a negative impact on overall throughput, regardless of the amount of available link bandwidth.
Longer distances or more hops between network hosts can result in lower overall throughput.
Network latency has a high impact on the efficiency of bandwidth use.
For example, the following figures illustrate backup throughput on the same network link, with varying latency.
For these examples, non-optimized TCP settings were used.
Ethernet duplexing
Network links that perform in half-duplex mode cause decreased NetWorker traffic flow performance.
For example, a 100 Mb/s half-duplex link results in backup performance of less than 1 MB per second.
The default configuration setting on most operating systems for duplexing is automatically negotiated as recommended by
IEEE802.3. However, automatic negotiation requires that the following conditions are met:
● Proper cabling
● Compatible NIC adapter
● Compatible switch
Automatic negotiation can result in a link performing as half-duplex.
To avoid issues with auto negotiation, force full-duplex settings on the NIC. Forced full-duplex setting must be applied to both
sides of the link. Forced full-duplex on only one side of the link results in failed automatic negotiation on the other side of the
link.
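On Linux, duplex settings can typically be verified and forced with ethtool. A minimal sketch, assuming an interface named eth0 and a 1 Gb/s link; apply the equivalent change on the switch port as well:
ethtool eth0                                        # show negotiated speed and duplex
ethtool -s eth0 speed 1000 duplex full autoneg off  # force 1 Gb/s full duplex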
Firewalls
The additional layer on the I/O path in a hardware firewall increases network latency and reduces the overall bandwidth use. Avoid software firewalls on the backup server because the server processes many packets, which results in significant overhead.
Details on firewall configuration and impact are available in the NetWorker Administration Guide.
Jumbo frames
Use jumbo frames in environments capable of handling them. If the source and target computers, and all equipment in the data path, can handle jumbo frames, increase the MTU to 9 KB.
These examples are for Linux and Solaris operating systems:
● On Linux, type the following command to configure jumbo frames:
ifconfig eth0 mtu 9000 up
● On Solaris, to configure jumbo frames for an nxge device, first determine the instance number of the device by typing the following command:
grep nxge /etc/path_to_inst
Congestion notification
Methods to disable congestion notification algorithms vary based on the operating system.
On Windows Server 2012, and 2012 R2:
● Disable optional congestion notification algorithms by typing the following command:
C:\> netsh interface tcp set global ecncapability=disabled
● Compound TCP is an advanced TCP algorithm that provides the best results on Windows via the TCP Global parameter
Add-On Congestion Control Provider. The value for this parameter is none if Compound TCP is disabled, or ctcp if
Compound TCP is enabled.
If both sides of the network conversation are not capable of the negotiation, you can disable Add-On Congestion Control Provider by typing the following command:
C:\> netsh interface tcp set global congestionprovider=none
NOTE: A reboot of the system is required if you enable Add-On Congestion Control Provider by typing the command
C:\> netsh int tcp set global congestionprovider=ctcp.
On Linux systems:
● Check for non-standard algorithms by typing the following command:
cat /proc/sys/net/ipv4/tcp_available_congestion_control
● To disable ECN type the following command:
echo 0 >/proc/sys/net/ipv4/tcp_ecn
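To make the ECN change persistent across reboots on Linux, a minimal sketch (assuming the standard /etc/sysctl.conf location):
echo "net.ipv4.tcp_ecn = 0" >> /etc/sysctl.conf   # persist the setting
sysctl --system                                   # reload kernel parameters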
On Solaris systems:
● To disable TCP Fusion, if present, type the following command:
set ip:do_tcp_fusion = 0x0
TCP buffers
When the rate of inbound TCP packets is higher than the system can process, the operating system drops some of the packets.
This scenario can lead to an undetermined NetWorker state and unreliable backup operations. For NetWorker server or storage
node systems that are equipped with high-speed interfaces, it is critical to monitor the system TCP statistics for dropped TCP
packets, commonly done by using the netstat -s command. To avoid dropped TCP packets, increase the TCP buffer size.
Depending on the operating system, this parameter is referred to as buffer size, queue size, hash size, backlog, or connection
depth.
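For example, on Linux, a quick hedged check of the TCP statistics for retransmissions and socket overflows (counter names vary by operating system and kernel version):
netstat -s | grep -i -E 'retrans|overflow|fail'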
For high-speed network interfaces, increase the size of the TCP send and receive buffers.
NetWorker server
● Linux:
To modify the TCP buffer settings on Linux:
1. Add the following parameters to the /etc/sysctl.conf file:
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 8192 524288 16777216
net.ipv4.tcp_wmem = 8192 524288 16777216
net.ipv4.tcp_fin_timeout = 120
2. Type the following command:
/sbin/sysctl -p
3. Set the recommended RPC value:
sunrpc.tcp_slot_table_entries = 64
4. Enable dynamic TCP window scaling which requires compatible equipment in the data path:
sysctl -w net.ipv4.tcp_window_scaling=1
● Solaris:
To modify the TCP buffer settings on Solaris, set the following TCP parameters:
tcp_max_buf 10485760
tcp_cwnd_max 10485760
tcp_recv_hiwat 65536
tcp_xmit_hiwat 65536
● Windows:
The default Windows TCP buffer sizes are sufficient. To tune related settings on Windows:
○ Set the registry entry:
AdditionalCriticalWorkerThreads: DWORD=10
○ If the NIC driver can create multiple buffers or queues, enable that capability at the driver level. For example, Intel
10 Gb NIC drivers have RSS Queues set to 2 by default; the recommended value for best performance is 16.
○ Speed up the recycling of ports in the TIME_WAIT state (as observed in netstat output) by setting the following registry entry:
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpTimedWaitDelay
Data type: REG_DWORD
Range: 0x1E–0x12C (30–300 seconds)
Default value: 0xF0 (240 seconds = 4 minutes)
● AIX:
To modify the TCP buffer settings on AIX, modify the values of the following parameters in the /etc/rc.net file if the existing
values are lower than the recommended values:
○ The number of bytes a system can buffer in the kernel on the receiving sockets queue:
no -o tcp_recvspace=524288
○ The number of bytes an application can buffer in the kernel before the application is blocked on a send call:
no -o tcp_sendspace=524288
The default value of net.core.somaxconn is 128. Raise the value substantially to support bursts of requests. For example, to
support a burst of 1024 requests, set net.core.somaxconn to 1024:
net.core.somaxconn = 1024
To set the value persistently on Linux:
1. Open the /etc/sysctl.conf file in an editor:
# vi /etc/sysctl.conf
2. Add the parameter and its value in the form variable=value.
3. Save the changes and load the sysctl settings from the /etc/sysctl.conf file:
# sysctl -p
IRQ balancing and CPU affinity
A configuration that uses either multiple 1 Gb network interfaces or one 10 Gb interface benefits from disabling IRQ
balancing and binding interrupt processing to specific CPU cores.
NOTE: The general rule is that only one core per physical CPU should handle NIC interrupts. Use multiple cores per CPU
only if there are more NICs than CPUs. Handle transmitting and receiving with the same CPU without exception.
These examples are for Linux and Solaris operating systems:
● Linux:
1. Disable IRQ balancing and set CPU affinity manually:
service irqbalance stop
chkconfig irqbalance off
2. Determine the IRQ numbers that the eth0 interface uses:
grep eth0 /proc/interrupts
3. Set the CPU affinity for each IRQ, from the highest to the lowest. For example:
echo 80 > /proc/irq/177/smp_affinity
echo 40 > /proc/irq/166/smp_affinity
SMP affinity works only for IO-APIC enabled device drivers. Check for the IO-APIC capability of a device by using
cat /proc/interrupts, or by referencing the device documentation.
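The value that is written to smp_affinity is a hexadecimal CPU bit mask. As a hypothetical example that assumes the IRQ numbers shown above, the following command binds IRQ 177 to CPU core 1 (mask value 2 sets bit 1):
echo 2 > /proc/irq/177/smp_affinity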
● Solaris:
Allow interrupts on only one core per CPU. For example, for a system with four CPUs and four cores per CPU, use the following command to disable interrupts on the remaining cores:
psradm -i 1-3 5-7 9-11 13-15
Some NIC drivers artificially limit interrupt rates to reduce peak CPU use, which limits the maximum achievable throughput. If
a NIC driver is set for “Interrupt moderation,” disable it for optimal network throughput.
Interrupt moderation
On Windows, for 10 Gb networks, it is recommended that you disable interrupt moderation on the network adapter to improve
network performance.
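As an assumed example, on recent Windows Server releases you can often view and change this setting with the NetAdapter PowerShell cmdlets; the adapter name and the exact display name of the property depend on the NIC driver:
Get-NetAdapterAdvancedProperty -Name "Ethernet" -DisplayName "Interrupt Moderation"
Set-NetAdapterAdvancedProperty -Name "Ethernet" -DisplayName "Interrupt Moderation" -DisplayValue "Disabled"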
Connection reset by peer
NOTE: TCP chimney offloading can cause CRC mismatches. Ensure that you consistently monitor for errors when you
enable TCP chimney offloading.
Name resolution
The NetWorker server relies heavily on the name resolution capabilities of the operating system.
To avoid performance issues, configure low-latency access to the DNS server by using either of the following:
● Local DNS cache
● Local non-authoritative DNS server with zone transfers from the main DNS server
Ensure that the server name and the hostnames that are assigned to each IP address on the system are defined in the hosts file
to avoid DNS lookups for local hostname checks.
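For example, a minimal sketch of hosts file entries (the names and addresses are placeholders; on UNIX and Linux the file is /etc/hosts, and on Windows it is %SystemRoot%\System32\drivers\etc\hosts):
192.168.10.5   nwserver.example.com   nwserver
192.168.10.6   storagenode1.example.com   storagenode1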
However, in a large NetWorker environment, it might be necessary to temporarily retire or decommission a client. During the
retired phase, you might want the client to remain part of the infrastructure but be removed from active protection (scheduled or
manual backup). A retired client can still have valid backup copies that you might want to restore or clone. In the decommissioned
phase, you might not want to perform any further backup, restore, or clone operations with the client. To retire or
decommission a client, you remove the client from the DNS entries. This removal results in DNS lookup failures and therefore slower
NetWorker startup times.
To simplify this process, NetWorker 19.4 and later provides an option to set the state of the client by using an attribute in
the RAP resource. Based on the client state, NetWorker decides whether to perform a DNS lookup.
In NetWorker 19.4 and later, DNS lookups are skipped for clients in the retired and decommissioned states. This reduces
NetWorker startup time by up to two to three times compared to previous releases when 40 percent of the clients in
the datazone are marked as retired or decommissioned.
As a performance best practice and to take advantage of the new feature, after upgrading to NetWorker 19.4 or later, it is
recommended that you mark the clients which are not going to be used for backup as retired. For more information about the
feature, see Decommission a Client resource in the NetWorker Administration Guide.
4
Test Performance
This chapter describes how to test and understand bottlenecks by using available tools including NetWorker programs such as
bigasm and uasm.
Topics:
• Determine symptoms
• Monitor performance
• Determining bottlenecks by using a generic FTP test
• Testing setup performance using the dd test
• Test disk performance by using bigasm and uasm
• TCP window size and network latency considerations
• Clone performance
• Limit memory usage on the host during clone operations
Determine symptoms
There are many considerations when determining the reason for poor backup performance.
Ask the following questions to help identify the cause:
● Is the performance consistent for the entire duration of the backup?
● Do the backups perform better when started at a different time?
● Is it consistent across all save sets for the clients?
● Is it consistent across all clients with similar system configuration using a specific storage node?
● Is it consistent across all clients with similar system configuration in the same subnet?
● Is it consistent across all clients with similar system configuration and applications?
Observe how the client performs with different parameters. Inconsistent backup speed can indicate problems with software or
firmware.
For each NetWorker client, answer these questions:
● Is the performance consistent for the entire duration of the backup?
● Is there a change in performance if the backup is started at a different time?
● Is it consistent across all clients using a specific storage node?
● Is it consistent across all save sets for the client?
● Is it consistent across all clients in the same subnet?
● Is it consistent across all clients with similar operating systems, service packs, applications?
● Does the backup performance improve during the save or does it decrease?
These and similar questions can help to identify the specific performance issues.
Monitor performance
You can monitor the I/O, disk, CPU, and network performance by using native performance monitoring tools.
The tools available for performance monitoring include the following:
● Windows: perfmon program
● UNIX: iostat, vmstat, or netstat commands
Unusual activity before, during, and after backups can indicate that devices are using excessive resources. By using these
tools to observe performance over a period of time, you can clearly identify the resources that each application, including NetWorker,
consumes. If slow backups are found to be caused by excessive network use by other applications, you can correct this by
changing backup schedules.
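For example, on UNIX and Linux hosts, the following hedged invocations sample disk, memory, and network statistics at 5-second intervals (tool availability and option syntax vary by platform):
iostat -x 5
vmstat 5
netstat -i 5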
High CPU use is often the result of waiting for external I/O, not insufficient CPU power. This is indicated by high CPU use in
system (kernel) space compared to user space.
On Windows, if much time is spent on Deferred Procedure Calls, it often indicates a problem with device drivers.
The uasm module tests disk read speeds and, by writing data to a null device, can identify disk-based bottlenecks.
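As a hedged example, assuming the NetWorker client binaries are in the command path and /data/testdir is a placeholder directory to sample, the following command reads the directory with uasm in save mode and discards the output, isolating the disk read speed:
uasm -s /data/testdir > /dev/null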
TCP window size and network latency considerations
Increased network TCP latency has a negative impact on overall throughput, regardless of the amount of available link bandwidth.
Longer distances or more hops between network hosts can result in lower overall throughput. Because the propagation delay of
a TCP packet depends on the distance between the two locations, increased bandwidth does not help if high latency exists
between the two sites.
Throughput also depends on the TCP window size and the amount of latency on the link. A large TCP window size generally
results in better performance; however, on a high latency link, increasing the TCP window can significantly impact the backup
window because of packet loss. Every packet that is sent must be kept in memory until it is acknowledged, and must be retransmitted
if it is lost. Therefore, on high latency links, it is recommended that you maintain the default TCP window size.
The network latency impact on NetWorker backup, clone, and recovery performance depends on the control path and data
path:
● Latency between clients and NetWorker server (control path)—The latency impact on the NetWorker control path
(metadata update) can vary based on the type of data you process during NetWorker backup and recovery operations.
For example, if NetWorker clients and the server are separated by a high latency link, and clients back up a high density file
system dataset, the large amount of metadata (file indexes) being sent over the wire impacts the index commit.
● Latency between client and target device (DD) (data path)—Latency between the NetWorker client and the target device
significantly impacts throughput. Any packet loss will further impact throughput. The high latency link in the data path
affects throughput irrespective of the type of data being processed.
The following section provides best practices and recommendations when using high latency networks such as WAN for
NetWorker application data and control paths for backup, clone, and recovery operations.
These examples show the results from using high density datasets (many files but with a low overall size) and large density
datasets (a small number of files but with a large overall size) during backup, clone, and restore workflows.
The data layout is as follows:
● High density file system (FS): 1 million files with approximately 4000 MB overall size
● Large density file system: <1000 files with approximately 390 GB overall size
NOTE: These tests were conducted using the WANem simulation tool by inducing latency and packet loss between the
NetWorker control and data path. Allow for a 10–20% error margin in the results due to the simulation technique.
Figure 15. Large Density FS Backup - WAN between NetWorker Clients and Server (High Packet Loss)
NOTE: The items that are marked in RED are reattempted backups with client retries >0 and <=5
Figure 16. Large Density FS Backup - WAN between NetWorker Clients and Server (Low Packet Loss)
Figure 17. High Density FS Backup Performance - WAN between NetWorker Clients and Server
WAN Latency impact in data path (NetWorker clients and target device
such as Data Domain separated by high latency) for backup
Figure 18. Large Density FS Backup Performance - WAN between NetWorker Clients and Data Domain
Figure 19. High Density FS Backup Performance - WAN between NetWorker Clients and Data Domain
WAN latency impact in data path (Latency between source and target
DDR) for cloning
Figure 20. Large Density FS Clone Performance - WAN between NetWorker Clients and Data Domain
Figure 21. High Density FS Clone Performance - WAN between NetWorker Clients and Data Domain
NOTE: Clone-controlled replication (CCR) performance completely depends on the Data Domain model, the existing load on
DDRs and the latency between two different Data Domain systems that are separated by WAN. The preceding results show
the WAN latency impact on a Large Density File System and High Density File System.
Observations and recommendations:
● If there is a high latency link between the source and target DDR, there is a significant impact on clone throughput.
● Every 10 ms increase in latency reduces the clone throughput by 4-45 times.
● Packet loss on the WAN link further reduces the clone throughput by 4-300 times for large density datasets and by
4-500 times for high density datasets.
● It is not recommended that you exceed 50 ms latency for large density datasets or 20 ms latency for high density datasets
when cloning.
WAN latency impact in data path (Latency between source and target
DDR) for recovery
Figure 22. Large Density FS Recovery Performance - WAN between NetWorker Clients and Data Domain
Figure 23. High Density FS Recovery Performance - WAN between NetWorker Clients and Data Domain
NOTE: For large density file system and high density file system dataset restore, time indicates the time that is taken to
perform 10 simultaneous restores.
Observations and recommendations:
● Latency impacts recovery performance similar to the backup and clone workflows.
● If a high latency link exists between the NetWorker client and DDR during recovery, then the performance slows down
drastically.
● Every 10 ms increase in latency reduces the recover throughput by 1-2 times for a high density dataset with multiple client
restores. For a large density dataset with multiple client restores, throughput decreases by 2-10 times as latency increases.
● Packet loss on the WAN link further reduces the restore throughput by 2-12 times.
● It is not recommended that you exceed 50 ms latency (with multiple restores) for a high density dataset or 20 ms latency (with
multiple restores) for a large density dataset during recovery.
Summary
Table 13. Tolerable Range for Low Density file system
WAN path                      Latency     Packet loss
Client - NetWorker server     0-100 ms    0-1%
Client - Data Domain (DFA)    0-50 ms     0-0.1%
NOTE: Higher latency and packet loss in the data path impact throughput significantly. You can still use a high latency link
for the data path, but the NetWorker server might reattempt backups that fail because of packet loss. It is recommended
that you follow the preceding recommendations to avoid failures with high latency WAN links.
Clone performance
For small save sets (KB-sized files), a Recover Pipe to Save (RPS) clone takes about 30 seconds longer than a non-RPS clone. When the
dataset size is more than 2 GB, an RPS clone performs better than a non-RPS clone.