
Avamar space reclamation processes - Part 1: Garbage Collection

Summary: This KB article describes the first part of the Avamar space reclamation process, known as garbage collection.


Article Content


Instructions

This article is the first in a series which documents how Avamar recycles space, both within the GSAN and on the hard drives.


The current implementation of garbage collection was introduced along with Avamar v7.0, and its design has remained largely unchanged.

What does garbage collection do?

Garbage collection is the first stage of the process where Avamar reclaims space that was used to store backup data.

It operates on the cur directory, and frees up space within the GSAN by removing data chunks which are no longer referenced by any backup:

  • Data is said to be "defined" if it can be looked up in the index.
  • Data is said to be "referenced" if it exists as part of a backup (that is, the hash is present in the User Accounting System, composite stripes, or directory elements).

Space that is reclaimed by garbage collection cannot be reused until after crunching has run. Crunching runs immediately after the daily scheduled garbage collection has finished. See Avamar space reclamation processes - Part 2: Crunching.


When does garbage collection run?

    Garbage collection runs at the beginning of the maintenance window, before the checkpoint/hfs/checkpoint cycle. During this time, incoming backups to the system should be limited, so garbage collection can run without loading the system heavily.


    How long does garbage collection run for?

    By default, garbage collection runs for 4 hours. If two passes do not complete within this time, the run time of the next garbage collection will be increased by 15 minutes. This continues until either two passes complete successfully, or the default limit of 7 hours (420 minutes) is reached.
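
The run-time adjustment described above can be sketched as simple arithmetic. This is an illustration only: the function names are hypothetical, and the behavior when two passes do complete (the limit is left unchanged) is an assumption, since the article does not state it.

```python
DEFAULT_MINUTES = 4 * 60   # default garbage collection run time (4 hours)
STEP_MINUTES = 15          # increase applied after an incomplete run
MAX_MINUTES = 7 * 60       # default upper limit (420 minutes)


def next_run_limit(current_minutes: int, passes_completed: int) -> int:
    """Return the time limit (in minutes) for the next garbage collection run."""
    if passes_completed >= 2:
        # Assumption: the limit stays where it is once two passes complete.
        return current_minutes
    return min(current_minutes + STEP_MINUTES, MAX_MINUTES)


def runs_to_reach_max() -> int:
    """Count consecutive incomplete runs needed to go from the default to the limit."""
    limit, runs = DEFAULT_MINUTES, 0
    while limit < MAX_MINUTES:
        limit = next_run_limit(limit, passes_completed=1)
        runs += 1
    return runs
```

Under these assumptions, a system that never completes two passes would reach the 420-minute limit after (420 - 240) / 15 = 12 consecutive runs.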
     
      What can prevent garbage collection from running successfully?

      Common issues are listed below. Some articles may require authentication on the Dell Support site to be viewed.



      How garbage collection works

      Step 1 - Building the table of reference counts (TORC):

      Garbage collection reads entries in the user accounting system, the composite stripes, and directory elements to build a Table Of Reference Counts (TORC).
      In the TORC, garbage collection records all hashes on the system and how many times each hash is referenced.
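
As a rough mental model, the TORC behaves like a counter keyed by hash. The sketch below is not Avamar code: the function name and the data layout (each source is a sequence of entries, and each entry is a list of the hashes it references) are assumptions made for illustration.

```python
from collections import Counter


def build_torc(user_accounting, composites, directory_elements):
    """Count how many times each hash is referenced across the three sources."""
    torc = Counter()
    for source in (user_accounting, composites, directory_elements):
        for referenced_hashes in source:
            # Each entry contributes one reference per hash it lists.
            torc.update(referenced_hashes)
    return torc


# Hypothetical example data: short strings stand in for real hashes.
torc = build_torc(
    user_accounting=[["a", "b"]],
    composites=[["b", "c"]],
    directory_elements=[["c"]],
)
```

In this example, "a" is referenced once and "b" and "c" are each referenced twice.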


      Step 2 - Reading the indexes:

      Once the TORC is complete, each node loads a subset of its individual index stripes into memory. The number of stripes read is defined by the gccount parameter. For each hash defined in the index, garbage collection looks up the hash in the TORC to check if it is referenced.

      • If the hash exists in both the index and the TORC, there is nothing to do. Every hash in the TORC has a reference count of at least 1, so the hash is both defined and referenced.

      • If the hash exists in the index but not in the TORC, the hash is defined but not referenced, so it can be removed.

      Note: If the hash existed in the TORC but not in the index, this would be a data integrity error (hash that is referenced but not defined). This results in an hfscheck failure.
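
The three cases above amount to set comparisons between the index and the TORC. The following sketch uses a hypothetical `classify` function and assumes, for simplicity, that `index_hashes` is the complete index rather than the subset of stripes loaded at one time.

```python
def classify(index_hashes, torc):
    """Sort hashes into the three cases: keep, remove, or integrity error."""
    defined = set(index_hashes)
    referenced = {h for h, count in torc.items() if count > 0}
    return {
        "keep": defined & referenced,             # defined and referenced
        "remove": defined - referenced,           # defined but not referenced
        "integrity_error": referenced - defined,  # referenced but not defined
    }


result = classify(index_hashes=["a", "b", "c"], torc={"a": 2, "b": 1, "x": 1})
```

Here "a" and "b" are kept, "c" is a removal candidate, and "x" is the kind of entry that would surface as an hfscheck failure.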


      Step 3 - Remove unreferenced hashes:

      As noted earlier, hashes that are not referenced are not part of any backup, so they can be safely removed from the Avamar server. To do this, garbage collection:

      1. Removes the entry in the index.
      2. Zeros out the entry for the hash in the Chunk Header Descriptor (CHD). The CHD defines where individual chunks are inside the stripe container.

      At this point, Avamar has only marked the area that the hash occupied as empty. For performance and/or capacity reasons, the data itself is not deleted at this stage.
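
The two removal steps can be pictured with a toy model. Everything here is an assumption for illustration: the index is modeled as a dict from hash to CHD slot, the CHD as a list of (offset, length) entries, and the stripe as a byte array. The real on-disk structures are not described in this article.

```python
def remove_unreferenced(h, index, chd):
    """Remove a hash from the toy index and zero its CHD entry."""
    slot = index.pop(h)   # 1. remove the entry in the index
    chd[slot] = (0, 0)    # 2. zero out the CHD entry for the chunk
    # The stripe bytes are deliberately not touched at this stage.


index = {"aaa": 0, "bbb": 1}          # hash -> CHD slot (hypothetical layout)
chd = [(0, 4), (4, 4)]                # (offset, length) of each chunk
stripe = bytearray(b"AAAABBBB")       # chunk data inside the stripe container

remove_unreferenced("bbb", index, chd)
```

After the call, "bbb" can no longer be looked up and its CHD entry is zeroed, but the bytes in `stripe` are unchanged, which mirrors the point that the data is not deleted yet.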


      Step 4 - Update the TORC:

      If the chunk which garbage collection removed was a composite, the TORC must be updated.

      If we look back at step 1, the reference counts in the TORC include references made by composite stripes, which contain composite chunks.

      Since a composite chunk was removed, we can decrement the reference count in the TORC by one for any hashes referenced by that composite chunk.

      Garbage collection does this by reading in the composite, to see which hashes it references, and then updating the TORC.
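
A minimal sketch of this update, assuming the TORC is a plain dict of counts and that reading a composite yields the list of hashes it references (both are modeling assumptions, and the function name is hypothetical):

```python
def on_composite_removed(child_hashes, torc):
    """Decrement the reference count of every hash the removed composite referenced."""
    for child in child_hashes:
        torc[child] -= 1


torc = {"a": 2, "b": 1}
# A removed composite chunk that referenced both "a" and "b".
on_composite_removed(["a", "b"], torc)
```

In this example "a" is still referenced by something else, while "b" drops to 0 references and becomes a candidate for removal.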


      Step 5 - Read the next set of indexes:

      Garbage collection unloads the previous set of index stripes from memory, and then loads a new set.

      Steps 2, 3 and 4 are repeated for these new index stripes.

      Once all the index stripes have been read, any data chunks (known as 'atomic' chunks) in the TORC that have 0 references as a result of step 4 are removed.


      Step 6 - Start a new pass:

      Once all the indexes have been read, garbage collection starts a new pass.
      All the index stripes are re-read, looking for data that is no longer referenced thanks to our previous passes.

      This is necessary because hashes are not read in a logical order, but rather in the order they are stored in the indexes.
      Garbage collection is not certain to find the hashes in the optimal order. A hash can remain referenced until the end of the pass.

      Two passes of garbage collection can comfortably maintain a "steady-state" capacity in most Avamar server environments.
      Garbage collection performs passes until it runs out of time, or a pass completes without removing any data.
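
The effect of index ordering on the number of passes can be shown with a small simulation. This is a simplified model rather than the actual algorithm: the names are hypothetical, the index is a list in stored order, composites are a dict mapping a composite hash to the hashes it references, and a `max_passes` cap stands in for the real time limit.

```python
def run_pass(index_order, composites, torc):
    """Run one pass over the index in stored order and return the hashes removed."""
    removed = []
    for h in list(index_order):
        if torc.get(h, 0) > 0:
            continue                        # still referenced: nothing to do
        index_order.remove(h)
        removed.append(h)
        for child in composites.pop(h, ()):
            torc[child] -= 1                # step 4: update the TORC
    # End of pass: remove atomic chunks whose count dropped to 0 during the pass.
    for h in list(index_order):
        if h not in composites and torc.get(h, 0) == 0:
            index_order.remove(h)
            removed.append(h)
    return removed


def collect(index_order, composites, torc, max_passes=10):
    """Repeat passes until one removes nothing or the cap is reached."""
    history = []
    for _ in range(max_passes):
        removed = run_pass(index_order, composites, torc)
        history.append(removed)
        if not removed:
            break
    return history


# "top" references "mid", which references the atomic chunk "leaf".
# No backup references "top", and the index stores the chain in the worst order.
index_order = ["leaf", "mid", "top"]
composites = {"top": ["mid"], "mid": ["leaf"]}
torc = {"mid": 1, "leaf": 1}
history = collect(index_order, composites, torc)
```

In this example the first pass can only remove "top", because "mid" and "leaf" were read while they were still referenced. The second pass removes "mid" and then "leaf", and the third pass removes nothing, so collection stops.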



      Manual garbage collection

      Micromanaging an Avamar server should not be required. The scheduler is intended to automate the running of maintenance tasks. If Avamar capacity is high, see the Avamar Operational Best Practices Guide and Avamar: Capacity Management Concepts and Training.

      On rare occasions, running garbage collection manually might help alleviate acute issues where the GSAN "User capacity" is so high that the system enters read-only mode.
      In these cases, garbage collection is run manually to bring down the capacity level to just below the read-only threshold. This allows the backup window to run.
      Automated garbage collection can continue working as normal.

      Avamar Support should fully investigate and understand the situation before manual garbage collection is considered.

      It is never appropriate to request that Support runs manual garbage collection on a system without authorization from an L2 support engineer after such an investigation.
      See Avamar - About the use of manual Garbage Collection.


      Article Properties


      Affected Product

      Avamar

      Product

      Avamar, Avamar Server

      Last Published Date

      11 Apr 2024

      Version

      7

      Article Type

      How To