Enterprise Laptop Backup: Deduplication - It’s the chunking, Stupid!

This post is part of a sub-series on deduplication requirements in an overall Series on planning for corporate PC backup in your organization.  In my last post, I evaluated the best place to perform deduplication for laptops and desktops.

This post looks at the various approaches available for dividing your data into smaller chunks so that they can be analyzed for duplication.  As mentioned in a previous post, this is the first step in every deduplication process and perhaps the most important one: the better an approach is at dividing data into chunks that are likely to occur multiple times, the better the dedup efficiency.  Broadly speaking, there are 4 approaches to dividing data into chunks:

Figure: Comparison of deduplication methods

  1. File-level: This is the most basic form of deduplication, which identifies identical files and stores them only once.  Also known as Single Instance Storage, this is perhaps the easiest approach for a vendor to implement.  The downside is that if you change a file by even a single byte, the entire file needs to be stored again.  This happens more often than one may think.  For example, let’s say you create a Word document or a PowerPoint presentation and email it to a colleague, who doesn’t make any changes to it.  You’d expect the two files to be identical, right?  You may be surprised to learn that, more often than not, the files, while visually identical, will be ever so slightly different.  This is because applications store metadata such as the last user and the last open time inside the document, which changes the file’s bytes.
  2. Delta-block: While there is debate about whether delta-block approaches fall under deduplication, they merit a mention.  There are several variations, but essentially, delta-block technologies identify the changes made to an already backed-up document and back up only those changes.  The key is that the file must already have been backed up under the same name, so that the delta-block processing has the file’s ancestor to compare against.  Therefore, if you change a file and save it under a different name, the entire file will be backed up again.  While better than purely file-level deduplication, it is still fairly basic and is only useful when you have large files that keep changing but keep their name.
  3. Block level: Block level deduplication breaks a file into fixed-size blocks and backs up only the unique blocks, using the process described in the anatomy of a dedup system post.  While better than file-level and delta-block technologies, this approach is best suited to database-type stores whose physical block layout doesn’t change.  For document-type data, which is the most prevalent on PCs and where a simple save can completely alter the layout of the document, block level dedup isn’t very effective.  This is because it has two limitations: identifying common data for the first backup and identifying common data when the physical layout changes.  I’ll cover these in my next post.
  4. Object-based: Object-based data deduplication is the current state of the art and the most effective approach for detecting duplicate data.  It can detect common embedded data across completely unrelated files, even on the first backup and even when the physical block layout changes.  Unlike block based technologies, object-based dedup is “content aware”: it chunks a file into well-known logical objects such as slides, images, paragraphs, worksheets and attachments.  The advantage is that even if the physical layout of a file changes, which can happen with a simple save operation, the logical objects can still be detected and stored only once.  As a result, object-based dedup provides the best efficiency for PC data, with as much as 5-10x better deduplication than block based approaches.
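To make the difference between the first and third approaches concrete, here is a minimal sketch that counts how many bytes each method would have to store for a new version of a file. It is an illustration only: the function names are hypothetical, SHA-256 stands in for whatever hash a product uses, and the 4-byte block size is deliberately tiny (real block sizes are measured in kilobytes). Object-based chunking is not sketched because it depends on parsing each file format.

```python
import hashlib
from typing import Callable


def file_level_chunks(data: bytes) -> list[bytes]:
    # File-level (single instance storage): the whole file is one chunk.
    return [data]


def fixed_block_chunks(data: bytes, block_size: int = 4) -> list[bytes]:
    # Block level: cut the file at fixed byte offsets.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def bytes_to_store(old: bytes, new: bytes,
                   chunker: Callable[[bytes], list[bytes]]) -> int:
    """Bytes of `new` that are not already stored from `old`."""
    stored = {hashlib.sha256(c).digest() for c in chunker(old)}
    total = 0
    for chunk in chunker(new):
        digest = hashlib.sha256(chunk).digest()
        if digest not in stored:
            stored.add(digest)  # a repeat inside `new` is stored only once
            total += len(chunk)
    return total


old = b"AAAABBBBCCCCDDDD"
in_place_edit = b"AAAABBBXCCCCDDDD"  # one byte changed, layout unchanged
shifted = b"X" + old                 # one byte inserted at the front
for name, chunker in [("file-level", file_level_chunks), ("block", fixed_block_chunks)]:
    print(name, bytes_to_store(old, in_place_edit, chunker), bytes_to_store(old, shifted, chunker))
```

With an in-place edit, block level stores only the one changed block while file-level stores the whole file again. With a single inserted byte, every block boundary shifts and block level ends up storing the entire file too, which is the layout-change limitation mentioned above.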

The graphic above displays the relative efficacy of the 4 methods just described.  While a lot goes into determining the actual dedup efficiency, the graphic should be viewed as a good indicator of the relative efficiency of the different methods.

What do you think?  In my next post, I'll cover the limitations of block level deduplication and why Object-based deduplication is the better choice.

Corporate PC backup: Whither Deduplication?

This post is part of a sub-series on Deduplication requirements in an overall Series on planning for corporate PC backup in your organization.  In my last post, I explained the anatomy of a dedup system.  This post looks at the various places where Deduplication can be performed and the suitability of those approaches for corporate PC backup.

The old adage “location, location, location” applies to Deduplication too.  Where the 3 steps of Deduplication (chunk, compute hash, and look up then store; see the post anatomy of a dedup system) are performed is crucial because it determines WAN efficiency, which is critical if you have a large number of laptops or remote users, who are likely to connect over WAN links.  Broadly, the following approaches are available:

  • Target based: In a Target based Deduplication approach, all 3 steps of Deduplication are performed on a storage device that serves as the storage target for an application.  All data is sent over the network to the storage device, which then identifies duplicate data and stores only unique data.  This approach is the least network efficient, as it requires that all data be sent over potentially slow WAN links for every backup!  This approach should be ruled out for PCs.
  • Purely source based: In purely source-based Deduplication, all 3 steps of dedup are performed on the source system itself, without communicating with the central server.  The source system, for example a desktop or a laptop, identifies duplicate data on that system and sends only data that is unique on that PC to the server.  However, if other systems hold the same data, each of them will transmit and store that duplicate data on the server again.  This approach can work for a small number of PCs, but should otherwise be avoided because it results in too much duplicate data being transmitted and stored.
  • Global: Global Deduplication combines the best of source and target based Deduplication.  In this approach, Deduplication responsibility is shared between the source system and the server.  The source system performs the first 2 steps, chunking and hash computation, while the 3rd step, lookup, is performed on the central server.  As a result, the source machine, in conjunction with the server, identifies duplicate data across the entire organization, and only data that is truly unique is transmitted and stored.  This is the most WAN and storage efficient approach, and also the most scalable one.
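The split of work in global deduplication can be sketched as follows. This is a simplified model under stated assumptions, not any product’s protocol: the `Server` class and the function names are hypothetical, chunks are fixed 4-byte pieces purely for illustration, and only chunk payload bytes are counted (the small hashes the PC sends for the lookup are ignored).

```python
import hashlib

CHUNK = 4  # hypothetical, tiny chunk size for illustration


def chunk_hashes(data: bytes) -> dict[bytes, bytes]:
    """Steps 1 and 2, run on the PC: chunk the data and hash each chunk."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    return {hashlib.sha256(c).digest(): c for c in chunks}


class Server:
    def __init__(self) -> None:
        self.store: dict[bytes, bytes] = {}  # hash -> unique chunk, for all PCs

    def missing(self, hashes) -> set[bytes]:
        """Step 3, run on the server: which hashes are not stored yet?"""
        return {h for h in hashes if h not in self.store}

    def put(self, chunks: dict[bytes, bytes]) -> None:
        self.store.update(chunks)


def global_backup(pc_data: bytes, server: Server) -> int:
    """Return the chunk bytes a PC sends when the lookup is organization-wide."""
    local = chunk_hashes(pc_data)
    needed = server.missing(local.keys())  # only hashes cross the WAN here
    to_send = {h: local[h] for h in needed}
    server.put(to_send)
    return sum(len(c) for c in to_send.values())


def source_only_backup(pc_data: bytes) -> int:
    """Purely source based: duplicates within one PC are removed, nothing more."""
    return sum(len(c) for c in chunk_hashes(pc_data).values())


server = Server()
pc1 = b"AAAABBBBCCCCDDDD"  # two PCs holding mostly the same data
pc2 = b"AAAABBBBCCCCEEEE"
print("global:", global_backup(pc1, server) + global_backup(pc2, server))
print("source only:", source_only_backup(pc1) + source_only_backup(pc2))
```

The second PC only has to send the one chunk the server has never seen, whereas with purely source based dedup both PCs ship their full unique sets and the shared chunks are transmitted and stored twice.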

Figure: Global Deduplication - Save Storage and Bandwidth

Corporate PC backup: Anatomy of a Data Deduplication System

This post is part of a sub-series on de-duplication requirements in an overall Series on planning for corporate PC backup in your organization.  This post looks at the various components that make up a Deduplication system.

Deduplication is the process of finding duplicate chunks of data and then acting on them only once.  For example, if you can identify duplicate data stored on your file server and store it only once, you could free up a lot of storage, which translates to immediate savings.  Across all the data in a company, there is a lot of duplication: identical files, PowerPoint presentations that use the same slides, or PST files that contain the same message attachment.  The ability to identify this common data and to transmit and store only unique data is crucial for cost-effectiveness.  The industry jargon for this capability is data de-duplication, or dedup for short.

Figure: Anatomy of a deduplication system

Broadly speaking, a de-duplication system consists of the following 3 components:

  • Chunking original data: The first step is to divide the original data into chunks, which in turn will be analyzed for duplicate occurrences in the system.  Typically, this involves dividing the original data into smaller chunks, whether sub-blocks or sub-objects.  This is perhaps the most important step in the entire de-duplication process, because the better an approach is at dividing data into chunks that are likely to be found in duplicate, the better the dedup efficiency.  Broadly speaking, there are 4 approaches to how data is chunked:
    1. File based
    2. Delta block based
    3. Block based
    4. Object-based

All of the approaches above yield a set of data that can then be analyzed for duplication within the data repository.  I’ll analyze each of these approaches as part of this series.

  • Computing a unique identifier for the chunks created in step 1: Once candidate chunks for duplicate data analysis have been identified, we need an efficient way to detect whether each one already exists in our data repository.  We could compare entire chunks byte by byte, but that would be computationally expensive and wouldn’t scale.  Hence, the most common approach is to compute a hash or checksum of the data and then look up that checksum in the data repository.  Checksums are usually much smaller than the original data (3-4 orders of magnitude smaller), which makes it much faster to look up whether a set of data already exists, even with terabytes of data.
  • Lookup in data repository: The data repository consists of two components:
    1. Unique data repository: this is where all unique data objects are stored
    2. Metadata repository: a catalog of unique hashes in the repository corresponding to all the unique data objects stored in the system, optimized for quick lookup. 

Once the hash or checksum has been created, it needs to be looked up against the catalog of unique hashes in the metadata repository.  If the checksum already exists there, the corresponding data is already stored and there is no need to store it again.  If the checksum doesn’t exist, the data object corresponding to it is added to the unique data repository and the checksum/hash itself is added to the metadata repository.
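The three steps and the two-part repository described above fit in a few lines of code. The sketch below is a toy model, assuming fixed-size chunking as a placeholder for step 1 and SHA-256 for step 2; the class and method names are hypothetical. `ingest` returns a “recipe” (the ordered list of hashes) so the original data can be rebuilt from the unique chunks.

```python
import hashlib


class DedupRepository:
    """Toy model of a unique data repository plus a metadata repository."""

    def __init__(self) -> None:
        self.unique_data: list[bytes] = []  # unique data repository
        self.catalog: dict[str, int] = {}   # metadata repository: hash -> position

    def ingest(self, data: bytes, chunk_size: int = 8) -> list[str]:
        """Chunk, hash, look up, and store only new chunks; return the recipe."""
        recipe = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]              # step 1: chunk
            digest = hashlib.sha256(chunk).hexdigest()  # step 2: hash
            if digest not in self.catalog:              # step 3: look up
                self.catalog[digest] = len(self.unique_data)
                self.unique_data.append(chunk)          # store unique data once
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        return b"".join(self.unique_data[self.catalog[h]] for h in recipe)


repo = DedupRepository()
deck_v1 = repo.ingest(b"slide-01slide-02slide-03")
deck_v2 = repo.ingest(b"slide-01slide-02slide-99")  # shares two "slides"
print(len(repo.unique_data))  # chunks actually stored across both decks
```

On the size claim: with a 32-byte SHA-256 digest and, say, 128 KB chunks (an assumed chunk size), each catalog entry is 131,072 / 32 = 4,096 times smaller than the data it describes, which falls within the 3-4 orders of magnitude mentioned above.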

Above is a simple view of a de-duplication system.  Besides the chunking algorithm and the scalability of the data repository, another important consideration is where the chunking and lookups are performed.  That’s the subject of the next post.

Corporate PC backup: Data Deduplication Requirements

This post is part of a Series on planning for corporate PC backup in your organization.  Previously, I’ve focused on backup requirements, but in this series of posts on data deduplication I’ll explore the various deduplication technologies that are available and which ones are best suited for desktop laptop backup.  I’ll cover the topics listed below, including why global deduplication is the only type of dedup that makes sense for desktop and laptop backup, why Object-based Deduplication provides the best efficiency, and how it overcomes the shortcomings of block level dedup.

  • Anatomy of a dedup system: this post provides an introduction to the de-dup process, the various components and how they work together to identify duplicate data in your system.
  • Where to deduplicate: The old adage “location, location, location” applies to dedup too.  This post looks at the various places in the system where de-duplication is performed and analyzes their suitability for managing laptop and desktop data.
  • Identifying duplicate data: This post evaluates 4 approaches to chunking data for purposes of de-duplication: file based, delta-block based, block level and object based, and their relative merits.
  • Block level Deduplication challenges: This post identifies two major shortcomings of block level de-duplication, namely 1) finding duplicate data for the first pass or first backup, and 2) finding duplicate data when the physical layout of a file changes e.g. when the slides in a PowerPoint are re-ordered.
  • Variable length deduplication: This post examines what is commonly called “Variable Length Deduplication”, which is a slight variation of fixed length block Deduplication, and the situations where it is useful.
  • Object based data de-duplication: This post introduces the concept of Object level data de-duplication and explains why it is superior to block level de-duplication.

Enterprise Laptop Backup and Recovery – Forever Incremental

This post is part of a Series on planning for Enterprise desktop laptop backup in your organization.  In my last post, I discussed why versioning capability is important when backing up PC data.  In this post, I’ll discuss the need for forever incremental backup capability when backing up enterprise laptops and desktops.

Before we get into why forever incremental backups are important for backing up desktops and laptops, let’s review the various backup types and their uses.  This article on SearchDataBackup does a good job of describing the various backup types: full, incremental, differential, synthetic full and forever incremental.  There are multiple backup types because different types of data have different characteristics in terms of data size, change rate and recovery requirements, and one size doesn’t fit all.  For example, if you are backing up an Exchange server, the recommended method is to do a weekly full backup and then incremental or differential backups during the week.  This is efficient, since it avoids a full backup every day, yet provides recovery granularity because the transaction logs can be played back to get to pretty much any point in time.  However, when it comes to backing up PCs, forever incremental backup with synthetic full capability is the best choice for the following reasons:

  • Bandwidth and backup window constraints: A full backup is perhaps the most intrusive task on any system.  It requires that all the data be read on the source system and then transmitted to and processed on the backup server.  It requires a large amount of bandwidth and CPU horsepower.  This is why, for servers, full backups often happen over the weekend, when the workload is low and the corporate network can spare the bandwidth.  However, unlike for servers, doing a regular full backup of PCs is not feasible, for the following reasons:
    • PC may not be switched on during the weekend: Typically, PCs are not switched on during the weekend, or even if they are turned on, many of them, especially laptops, are not on the corporate network – thus making the full backup over the weekend impractical.
    • Sheer number of PCs in a company: In any corporate setting, PCs outnumber any other type of compute resource, usually by an order of magnitude. Imagine 5,000 PCs doing a full backup every Friday afternoon (since the weekend is not an option!): not only would they clog the network, but the server would also have to ingest a large amount of data from a large number of endpoints.
    • WAN bandwidth constraints: PCs, especially laptops, tend to be quite distributed and often connect over the WAN.  Due to the amount of data that needs to be moved in a full backup, it is impractical to perform regular full backups for laptop users, who tend to be mobile.
    • Impact to the PC: Given the intrusive nature of the full backup, the impact to the end user’s productivity is quite severe and is another reason why periodic full backups on the PC should be avoided.
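A back-of-the-envelope calculation shows the scale of the problem. Only the 5,000 PC figure comes from the scenario above; the per-PC data size, daily change rate and WAN link speed are hypothetical round numbers chosen for illustration, so substitute your own. The sketch also assumes the one-time initial backup has already happened, so the incremental case covers only changed data.

```python
PCS = 5_000               # from the scenario above
USED_GB_PER_PC = 50       # hypothetical: data selected for backup on each PC
DAILY_CHANGE_RATE = 0.01  # hypothetical: share of that data changed per day
WAN_MBIT_PER_S = 10       # hypothetical: one remote user's WAN uplink


def fleet_tb(gb_per_pc: float, pcs: int = PCS) -> float:
    """Total data moved across all PCs, in TB (decimal units)."""
    return gb_per_pc * pcs / 1000


def hours_over_wan(gb: float, mbit_per_s: float = WAN_MBIT_PER_S) -> float:
    """Hours needed to push `gb` gigabytes over a `mbit_per_s` link."""
    return gb * 8_000 / mbit_per_s / 3600


full_gb = USED_GB_PER_PC
incremental_gb = USED_GB_PER_PC * DAILY_CHANGE_RATE

print(f"full: {fleet_tb(full_gb):.1f} TB in total, "
      f"{hours_over_wan(full_gb):.1f} h per laptop")
print(f"incremental: {fleet_tb(incremental_gb):.1f} TB in total, "
      f"{hours_over_wan(incremental_gb):.2f} h per laptop")
```

Under these assumptions a weekly full moves a hundred times more data than a daily incremental and keeps a single remote laptop busy for the better part of a working day.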

It is clear from the constraints listed above that periodic full backups followed by incrementals are not practical for the enterprise laptop and desktop backup scenario.  The only reliable and efficient way of backing up your enterprise laptop and desktop population is to do forever incremental backups.  Now, let’s look at the recovery side of the equation to see how these forever incremental backups can be recovered.

  • Self-service recovery: Backups are only useful if they can be used for recovery.  In the case of PC data, a majority of data recovery requests are for single files.  Enabling your end users to perform self-service recovery of their data is not only useful for the end users, but is also a tremendous cost saver for IT, as they don’t have to field helpdesk calls for data recovery.  However, end user recovery requires that the recovery process be simple, ideally a single click.  It is impractical to expect end users to perform multi-step recoveries, such as restoring a full backup first and then applying multiple incremental backups.  That is why, for enterprise desktop laptop backup, it is imperative that forever incremental backups be combined with synthetic full backups, so that the end user always sees the entire data set available for recovery in one place and can recover any file with a single click.
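A synthetic full is essentially a merged view: the initial backup plus every later incremental, collapsed so that each file points at its latest copy as of a chosen point in time. The sketch below models that idea with a hypothetical `Change` record per backed-up file; how deletions and older versions are retained is a product and policy decision that this toy model leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    path: str
    deleted: bool = False  # hypothetical flag: file removed since the last backup


def synthetic_full(backups: list[list[Change]], as_of: int) -> dict[str, int]:
    """Merge backup 0 (the initial one) and incrementals 1..as_of into one view.

    Returns path -> index of the backup holding that file's latest copy,
    i.e. the single place an end user would browse to restore any file.
    """
    view: dict[str, int] = {}
    for seq, changes in enumerate(backups[: as_of + 1]):
        for change in changes:
            if change.deleted:
                view.pop(change.path, None)
            else:
                view[change.path] = seq
    return view


backups = [
    [Change("a.docx"), Change("b.pptx")],                # initial backup
    [Change("a.docx")],                                  # incremental 1
    [Change("c.xlsx"), Change("b.pptx", deleted=True)],  # incremental 2
]
print(synthetic_full(backups, as_of=2))
```

The user never has to replay incrementals by hand: restoring `a.docx` as of the latest backup is a single lookup that resolves to incremental 1.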

Enterprise PC Backup & Recovery – The need for versions!

This post is part of a Series on planning for Enterprise desktop laptop backup in your organization.  Whether you are considering software or online options for your enterprise PC backup solution, there are several items that need to be considered, and this series takes a look at those items.  In my last post, I discussed who is the best initiator for the backup, the desktop laptop being backed up or the backup server, highlighting issues you're likely to face if the backup server tries to initiate the backup to a desktop laptop.  In this post, I'll discuss the versioning capabilities you should look for in a backup product.

Versioning capability is what separates backup from archiving and replication and requires some thought.  When evaluating a desktop or laptop backup software or solution, at a minimum pay attention to the following:

  • Ensure that versioning is available: It is important for an enterprise backup software or solution to provide versioning, i.e. the ability to go back to an older version of your data, ideally multiple older versions. Backup has two purposes in life: disaster recovery and recovering one or more files after they have been corrupted. On a PC, whether a laptop or a desktop, both scenarios are possible: your laptop could be lost or the hard drive on your desktop could crash. Both are examples of a disaster: infrequent, but likely. Corruption, on the other hand, is actually quite common. For example, your PC gets infected by a virus, you accidentally overwrite a good version while editing a document, or you delete a file on your laptop or desktop thinking you won't need it again. Because of its versioning capabilities, a backup software or solution protects you against both problems. Replication, on the other hand, lacks versioning and therefore only protects you against disasters. So, if a virus infects your PC and you are using replication, good luck getting your data back: the infected files would likely have been replicated too, so your only copy is also corrupted. A backup system, because of its versioning capabilities, would allow you to recover an older version of your data.
  • Different retention policies for different data types and different users: In an enterprise, there are many categories of data and users. One-size-fits-all retention policies don't work for an enterprise laptop or desktop backup solution. Look for a solution that lets you define different retention policies for older versions based on users and data types. For example, for your executive users' PCs, you may need to retain data for longer than for the rank and file employees' desktops or laptops. You may also want to retain older versions of user generated data on a PC for a longer duration than application generated data.
  • Ability to place a legal hold: In an enterprise, there are times when you need to place a legal hold on the data stored on an end user's desktop or laptop. This may require that you suspend deletion of older versions of that user's data stored in the enterprise desktop or laptop backup system.
  • Access to different versions: It's great to have older versions available, but they are not of much use if they are not easily accessible. I'll cover this in detail in the posts on recovery requirements, but look for a solution that provides easy access to and recovery of older versions of desktop laptop backup data for both end users and administrators.
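Per-user and per-data-type retention combined with a legal hold can be expressed as a small pruning rule. In this sketch the user groups, data type labels and retention periods are hypothetical examples, not recommendations; the point is the shape of the rule: never prune the latest copy, never prune anything for a user on legal hold, and otherwise expire older versions according to the (group, data type) policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Version:
    user: str
    data_type: str  # hypothetical categories: "user" or "application"
    created: date
    is_latest: bool


# Hypothetical policy table: (user group, data type) -> days to keep old versions.
RETENTION_DAYS = {
    ("executive", "user"): 7 * 365,
    ("executive", "application"): 90,
    ("staff", "user"): 365,
    ("staff", "application"): 30,
}


def versions_to_prune(versions: list[Version], groups: dict[str, str],
                      legal_holds: set[str], today: date) -> list[Version]:
    """Return older versions whose retention period has expired."""
    prune = []
    for v in versions:
        if v.is_latest or v.user in legal_holds:
            continue  # keep the current copy and anything under legal hold
        keep_days = RETENTION_DAYS[(groups[v.user], v.data_type)]
        if today - v.created > timedelta(days=keep_days):
            prune.append(v)
    return prune


two_years_ago = date(2022, 1, 1)
versions = [
    Version("ceo", "user", two_years_ago, False),         # executive: kept
    Version("bob", "user", two_years_ago, False),         # staff: expired
    Version("bob", "user", two_years_ago, True),          # latest: kept
    Version("eve", "application", two_years_ago, False),  # legal hold: kept
]
groups = {"ceo": "executive", "bob": "staff", "eve": "staff"}
print(versions_to_prune(versions, groups, {"eve"}, date(2024, 1, 1)))
```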

Enterprise Laptop Desktop Backup – Backup Initiator!

This post is part of a Series on planning for Enterprise desktop laptop backup in your organization.  Whether you are considering software or online options for your enterprise PC backup solution, there are several items that need to be considered, and this series takes a look at those items.  In my last post, I explored the need for restartability, i.e. the ability of a desktop laptop backup application to automatically handle environmental errors as they happen and resume the operation exactly where it was interrupted once the error condition has gone away.  In this post, I'll discuss who should initiate the desktop laptop backup: the backup server or the PC itself.

Who initiates the desktop laptop backup, the PC or the backup server, is a key consideration.  Many backup products with roots in server backup have the server initiate the backup, but that doesn't really work for PCs, for the following reasons:

  • Unreliability: unlike a server, which is always running, the desktop or laptop may be switched off at the time of the backup - causing the backup to fail. This is a problem on two fronts: first the PC is not backed up; second the backup server will report a failure on its management console or report. The second problem is an issue because in an enterprise with a large desktop laptop population the administrator may have to deal with hundreds of such failures on a daily basis.
  • Won't work for VPN connections: if the PC is connecting over VPN, it may not even be reachable from the server, because DNS usually doesn't reflect the PC's VPN connection address. This is actually a severe problem, because your remote users, who are perhaps the most vulnerable to data loss, connect primarily over VPN and would be left unprotected for long periods of time.
  • Security Risk: When the server initiates the backup, it requires that there be an incoming TCP/UDP port open on the PC to allow the server to connect. This is a major security hole in this day and age of mobile and remote users who are constantly using their laptop at various public WiFi spots: airports, cafés and the like. Security at these public WiFi spots is already suspect, but having an open incoming communication port via a hole in the PC firewall is simply inviting trouble.
  • Poor Recovery Point Objective (RPO): Server initiated backup of the enterprise laptop desktop population can have a poor RPO for two reasons: first, the PC may be switched off or unreachable from the server for long periods of time (e.g. when connected only over VPN); and second, the PC may have a large amount of new data that stays unprotected until the server successfully initiates the backup, increasing the risk of data loss.

The best enterprise class solutions have the desktop or laptop initiate the backup connection to the server.  This ensures that a backup can be performed whenever conditions on the PC are favorable (i.e. it is not in use, connectivity is available, etc.).  Of course, this means that the backup server needs to be enterprise class, i.e. able to handle a large number of concurrent incoming connections, so make sure that you're aware of the peak capacity of the backup server you're looking at and of its behavior when the load exceeds that peak capacity.
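One way to picture PC-initiated backup together with sane behavior at peak load is sketched below. Everything here is hypothetical: the `BackupServer` class with a session cap, the `pc_should_connect` rule, and the retry policy. For the retry delay I swapped in a common technique, exponential backoff with “full jitter”, so that PCs turned away at peak do not all reconnect at the same moment; it is not taken from the post.

```python
import random


class BackupServer:
    """Hypothetical server that accepts PC-initiated sessions up to a cap."""

    def __init__(self, max_sessions: int) -> None:
        self.max_sessions = max_sessions
        self.active: set[str] = set()

    def try_open_session(self, pc_id: str) -> bool:
        # Over capacity: turn the PC away so it retries later, rather than
        # accepting more work than the server can ingest.
        if len(self.active) >= self.max_sessions:
            return False
        self.active.add(pc_id)
        return True

    def close_session(self, pc_id: str) -> None:
        self.active.discard(pc_id)


def pc_should_connect(idle: bool, network_up: bool) -> bool:
    # The PC decides when to dial out; no inbound port is ever opened on it.
    return idle and network_up


def retry_delay_s(attempt: int, base: float = 30.0, cap: float = 3600.0) -> float:
    """Exponential backoff with full jitter for a PC that was turned away."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


server = BackupServer(max_sessions=2)
for pc in ["pc1", "pc2", "pc3"]:
    if pc_should_connect(idle=True, network_up=True):
        accepted = server.try_open_session(pc)
        print(pc, "accepted" if accepted else f"retry in {retry_delay_s(0):.0f} s")
```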

Enterprise Laptop Desktop Backup – Restartability!

This post is part of a Series on planning for Enterprise desktop laptop backup in your organization.  Whether you are considering software or online options for your enterprise PC backup solution, there are several items that need to be considered, and this series takes a look at those items.  In my last post, I explored a crucial issue that plagues almost every solution, aggressive backup agents that adversely impact the user, and what to look for to avoid it.  In this post, I'll explore the key requirement of restartability.

A PC, especially a laptop, is perhaps the most volatile environment in the entire IT infrastructure.  Unlike a server, a desktop or laptop can power down or lose connectivity at any time, or the end user can kill any software agent that may be running on the PC (c'mon, admit it, we've all done that!).  Any of this can happen during a backup on a single PC.  Now multiply it by hundreds or thousands and imagine it happening every day!  It becomes pretty clear that it is extremely important for an enterprise desktop laptop backup solution to have restartability, i.e. the ability to:

  • Gracefully handle severe failures: For a PC backup solution any of the above failures are expected, not out of the norm and hence should be handled gracefully and automatically without requiring administrator or end user action. I liken it to continuing your tap dance without missing a beat even though the rug has been pulled out from under your feet!
  • Automatically restart: If a backup process is interrupted, it should automatically restart as soon as conditions permit. For example, if the backup was interrupted because network connectivity was lost, the backup process should watch for connectivity to come back so it can automatically resume the backup when connectivity is restored and it is appropriate to perform the backup (i.e. the user is not using the PC).
  • Resume at the point of the last stoppage: If you were backing up a 100MB file and the backup was interrupted when you had already backed up 90MB, the backup process should resume at the point of stoppage by only backing up and sending the remaining 10MB of data. This is especially important when the backup is happening over a WAN.

The ability to restart a backup exactly where it stopped is crucial for an enterprise class desktop laptop backup solution, because environmental errors are so much more frequent on a PC than on a server.  Many times there is a temptation to use the same product that backs up your servers for your enterprise PC population as well.  Just make sure that any solution you look at can automatically handle the multitude of environmental errors you'll have to deal with on a daily basis for your enterprise laptop desktop population.

Enterprise Laptop Desktop Backup – Don’t Impact the End User!

This post is part of a Series on planning for Enterprise desktop laptop backup in your organization.  Whether you are considering software or online options for your enterprise PC backup solution, there are several items that need to be considered, and this series takes a look at those items.  In my last post, I explored backup frequency options that you should look at for your enterprise laptop and desktop population.  In this post, I'll share my thoughts on a crucial issue that plagues almost every solution: aggressive backup agents that adversely impact the user.

One of the biggest concerns about PC backup solutions has been the impact on end users.  Stories abound about what is considered to be the only widely deployed solution, which is notorious for taking over your PC.  Imagine that you've just started your laptop or desktop and you're desperate to check your email, but the backup agent has different ideas: it must do a backup! Productivity can go take a hike!  End users often end up killing or disabling such agents on their PC, defeating the purpose of the backup.

Some PC backup solutions offer CPU throttling as a means to prove that they are not disruptive to the end user during the backup.  The premise is that they monitor CPU usage and won't start the backup if CPU usage is above a certain percentage.  However, unlike for servers, CPU usage is not an effective indicator of whether a PC is in use.  Think about it: when you are editing a document or checking your email, what percentage of your PC's CPU is in use? Maybe 5-10%.  At that usage, these backup agents have a free pass to start pounding your disk for backup, exactly when you don't want them to.  For desktop laptop backup, the CPU throttling rule is pretty much useless in preventing end user disruption.

Look for a solution that can detect that the PC is in use by the end user and only performs backup when the desktop or laptop is not in use.  Make sure that the rule applies not only to the start of the backup, but to the entire backup process.  For example, the backup application should not only NOT start if the PC is in use, but should also pause immediately if the user starts using the PC.
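The “detect use, and pause immediately” rule can be sketched as a loop that checks for user activity before every small unit of backup work. This is a minimal illustration under assumptions: `seconds_since_input` is a hypothetical callable standing in for an OS facility that reports time since the last keyboard or mouse input (on Windows, the `GetLastInputInfo` function is one such source), and the 5-minute idle threshold is an arbitrary example. Note that the gate is user input, not CPU load.

```python
import time
from typing import Callable, Iterable

IDLE_THRESHOLD_S = 300  # hypothetical: user counts as away after 5 idle minutes


def run_backup(units: Iterable[str],
               seconds_since_input: Callable[[], float],
               back_up: Callable[[str], None],
               wait: Callable[[float], None] = time.sleep,
               poll_s: float = 30.0) -> None:
    """Back up small units of work, re-checking for user activity before each.

    Because the check runs before every unit, the backup not only refuses to
    start while the user is active but also pauses as soon as the user returns.
    """
    for unit in units:
        while seconds_since_input() < IDLE_THRESHOLD_S:
            wait(poll_s)  # user is active: stay paused
        back_up(unit)


# Simulated idle readings: away, then back at the keyboard twice, then away.
readings = iter([400, 10, 20, 400, 400])
done: list[str] = []
run_backup(["a.docx", "b.pptx", "c.xlsx"], lambda: next(readings), done.append,
           wait=lambda s: None)
print(done)
```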

There are two sides to enterprise desktop and laptop backup, IT and end users, and both have needs that must be met.  Don't forget the end user: not impacting end users is key, otherwise they will find a way to work around the solution you deploy, defeating the whole purpose of the backup.

Enterprise Desktop Laptop Backup – Backup Frequency

This post is part of a Series on planning for Enterprise desktop laptop backup in your organization.  Whether you are considering software or online options for your enterprise PC backup solution, there are several items that need to be considered, and this series takes a look at those items.  In my last post, I explored whether scheduled backup can really work for your enterprise laptop and desktop population.  In this post, I'll share my thoughts on backup frequency, i.e. how frequently the data on your enterprise laptop or desktop population should be backed up.

Choosing the appropriate backup frequency for your laptop and desktop data is important, because getting it wrong can result in too much data being backed up, impact on end users, and a poor RPO (Recovery Point Objective). Also, different classes of data need different backup frequencies based on their importance and change rate. Broadly speaking, the following options exist:

  • Backup every x interval: This option means that the laptop or desktop is backed up every x units of time. The most basic form is the once-a-day backup, but other frequencies can also be specified. Choose this option for data that has the following characteristics:
    • The RPO requirements are not too stringent, i.e., it's acceptable to lose, on average, the data generated in half an interval (x/2) in case of a disaster, assuming a failure is equally likely to strike at any point between two backups.
    • The change rate is quite high, and the incremental cost of backing up intermediate data is not justified by the additional coverage it may provide.
  • Continuous backup: Conceptually, continuous backup, or continuous data protection (CDP), of your desktop and laptop data sounds perfect: there are no schedules to manage and you have continuous coverage. But there are some things to watch out for here.
    • Intermediate version proliferation: If a user saves a document 20 times during the day, CDP backs up 20 versions of that document, most of which the user will never go back to. Similarly, if you back up a PST file, a backup is created every time Microsoft Outlook writes to it. This proliferation of intermediate versions increases the storage and bandwidth footprint with arguably little added value.
    • Impact on the end user: The other side of continuous backup is that it is, well, continuous. It can impact end users by running backups on the PC exactly while they are using it. Make sure the disruption is worth the benefit of having access to every save event.
    • Disconnected backup: A PC, especially a laptop, is frequently disconnected from the network. Make sure the CDP solution you are evaluating can back up PC data even when it is disconnected.
    • Recovery experience: The recovery experience, i.e., how users access the intermediate versions, is equally important. If a user saved 20 versions of a document and CDP stored 20 backup versions, how does the user find the one they need? Do they have to recover all 20 files one by one, or can they search across versions?
  • Near continuous backup: Based on conversations with several customers, near continuous backup seems to be the best option for most enterprise laptop and desktop data, because it retains the best features of CDP while solving the issues listed above. Like CDP, there are no schedules to manage. But near continuous backup prevents intermediate version proliferation by limiting the total number of backup versions kept for a file. A well-designed implementation also doesn't impact the end user and can handle the lack of a connection. Make sure you understand the scaling capabilities of the near continuous backup implementation: specifically, how the backup server handles a large number of desktop and laptop backup requests, and how it behaves when peak load capacity is exceeded.
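
To make the CDP versus near continuous contrast concrete, here is a minimal sketch of a per-file version history. It assumes the version limit is implemented by simply keeping the newest N versions per file; real products may cap or thin versions differently, and `VersionStore`, `max_versions` and `find_versions` are hypothetical names. The `find_versions` method hints at the recovery question above: searching retained versions instead of restoring them one by one.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional, Tuple


class VersionStore:
    """Toy per-file version history.

    max_versions=None keeps every save (CDP-like). An integer keeps only the
    newest N versions per file (one assumed way to cap version proliferation).
    """

    def __init__(self, max_versions: Optional[int] = None) -> None:
        self.max_versions = max_versions
        # deque(maxlen=N) silently drops the oldest entry once N entries are held;
        # maxlen=None means unbounded.
        self._versions: Dict[str, Deque[Tuple[int, str]]] = defaultdict(
            lambda: deque(maxlen=self.max_versions)
        )
        self._clock = 0  # save counter standing in for a timestamp

    def record_save(self, path: str, content: str) -> None:
        """Record one save event for `path`."""
        self._clock += 1
        self._versions[path].append((self._clock, content))

    def versions(self, path: str) -> List[int]:
        """Save counters of the retained versions of `path`, oldest first."""
        return [t for t, _ in self._versions.get(path, [])]

    def find_versions(self, path: str, text: str) -> List[int]:
        """Recovery aid: retained versions of `path` whose content contains `text`."""
        return [t for t, c in self._versions.get(path, []) if text in c]


cdp = VersionStore()                # keep every save
ncb = VersionStore(max_versions=5)  # hypothetical cap of 5 versions per file
for i in range(1, 21):              # a user saves report.docx 20 times in one day
    body = f"draft {i}" + (" budget table" if i >= 18 else "")
    cdp.record_save("report.docx", body)
    ncb.record_save("report.docx", body)

print(len(cdp.versions("report.docx")))            # 20
print(ncb.versions("report.docx"))                 # [16, 17, 18, 19, 20]
print(ncb.find_versions("report.docx", "budget"))  # [18, 19, 20]
```

A count cap is the simplest possible policy; whether the newest N versions are the right ones to keep is exactly the kind of detail worth asking a vendor about.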

For desktop and laptop backup, you need different options for different types of data. For example, near continuous backup may make the most sense for end user documents, while backing up every x interval may make the most sense for data like PST files. Look for software or a solution that gives you the flexibility to specify different frequency options for different types of data.
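
A per-data-type policy can be expressed as a small rule table. The sketch below assumes file-extension patterns are a good enough proxy for "type of data", uses a hypothetical once-a-day interval for PST files and a cap of 5 versions for documents, and invents the `Policy`, `policy_for` and `is_due` names purely for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Policy:
    mode: str                               # "near_continuous" or "interval"
    interval_hours: Optional[float] = None  # used only when mode == "interval"
    max_versions: Optional[int] = None      # used only when mode == "near_continuous"


# Hypothetical rule table, checked top to bottom; the first matching pattern wins.
RULES: List[Tuple[str, Policy]] = [
    ("*.pst", Policy(mode="interval", interval_hours=24)),
    ("*.docx", Policy(mode="near_continuous", max_versions=5)),
    ("*.xlsx", Policy(mode="near_continuous", max_versions=5)),
    ("*.pptx", Policy(mode="near_continuous", max_versions=5)),
]
DEFAULT = Policy(mode="interval", interval_hours=24)


def policy_for(path: str) -> Policy:
    """Return the first policy whose pattern matches the (lower-cased) path."""
    name = path.lower()
    for pattern, policy in RULES:
        if fnmatch(name, pattern):
            return policy
    return DEFAULT


def is_due(policy: Policy, hours_since_last_backup: float, changed: bool) -> bool:
    """Decide whether a file should be backed up now under its policy."""
    if not changed:
        return False
    if policy.mode == "near_continuous":
        return True  # back up on change; the version cap limits proliferation
    return hours_since_last_backup >= policy.interval_hours


print(policy_for(r"C:\Users\ann\Outlook\mail.PST").mode)   # interval
print(is_due(policy_for("plan.docx"), 0.1, changed=True))  # True
print(is_due(policy_for("mail.pst"), 3, changed=True))     # False
```

First-match ordering keeps the table readable: specific exceptions go at the top, and anything unmatched falls back to the once-a-day default.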
