Article Number: 000018664


Running clean on a Data Domain Restorer (DDR) does not reclaim the amount of physical space indicated by 'Cleanable GiB'

Summary: Running clean on a Data Domain Restorer (DDR) does not reclaim the amount of physical space indicated by 'Cleanable GiB'

Article Content


Symptoms

Data Domain Restorers (DDRs) allow users to display system utilization via the 'filesys show space' or 'df' commands. For example:

# filesys show space
Active Tier:
Resource           Size GiB    Used GiB   Avail GiB   Use%   Cleanable GiB*
----------------   --------   ---------   ---------   ----   --------------
/data: pre-comp           -   1970382.1           -      -                -
/data: post-comp   150830.3    111365.1     39465.2    74%           8252.4
/ddvar                308.1        95.6       196.9    33%                -
----------------   --------   ---------   ---------   ----   --------------
* Estimated based on last cleaning of 2016/06/24 09:45:56.


One of the uses of this command is to determine the physical (post-comp) utilization of the system. When run against the active tier, the output includes a figure called 'Cleanable GiB'. To explain this figure further:
  • When a file is removed from the Data Domain File System (DDFS) it is no longer visible on the DDR (it is removed from the DDFS namespace); however, any data referenced by that file is not immediately removed
  • As a result, the amount of free space on the system does not immediately increase
  • To reclaim this space, a process called garbage collection (GC) or cleaning must be run
  • The purpose of cleaning is to look for data on disk which is superfluous (i.e. referenced only by deleted objects such as files, snapshots, or MTrees) and physically remove/clean up this data, freeing space for new ingest
  • The purpose of the 'Cleanable GiB' figure is to provide an indication of how much space is likely to be freed if cleaning were started at that point in time
Note, however, that 'Cleanable GiB' is an estimate and is not expected to be exact (in some cases, due to workload or use case, it may be very inaccurate). As a result, running cleaning can free up significantly more or less physical disk space than indicated by 'Cleanable GiB'.
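As a back-of-the-envelope illustration only (the figures are taken from the sample 'filesys show space' output above; the simple addition/subtraction is an assumption about how the estimate is typically read, not a documented formula):

used_gib      = 111365.1    # /data: post-comp, Used GiB
avail_gib     = 39465.2     # /data: post-comp, Avail GiB
cleanable_gib = 8252.4      # Cleanable GiB (estimate only)

est_used_after_clean  = used_gib - cleanable_gib     # ~103112.7 GiB
est_avail_after_clean = avail_gib + cleanable_gib    # ~47717.6 GiB
# The actual post-clean figures can be significantly higher or lower, for the reasons below.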

This behavior is expected and is due to the design of DDFS.

Cause

To understand how 'Cleanable GiB' functions, it is first necessary to understand what happens when a file is written to a DDR:
  • The file is sent to the DDR from the backup client - the DDR records the original (i.e. logical) size of the file (original bytes)
  • The file is anchored and segmented (split into 4-12 KiB chunks called segments), with a unique fingerprint being generated for each segment
  • The fingerprint of each segment is checked against the indices on the DDR - if the fingerprint already exists in the indices, the corresponding segment already exists on disk (i.e. it is a duplicate)
  • Duplicate segments do not need to be written out to disk, so they are effectively replaced by a pointer to the segment already on disk
  • Segments whose fingerprints do not exist in the indices are unique/new, so they must be written to disk
  • Once all duplicate segments have been replaced (i.e. the file has been de-duplicated), the DDR records the size of the unique segments (globally compressed bytes)
  • The unique segments are compressed (by default with lz) before being placed in 4.5 MiB containers and written out to disk
  • The DDR records the size of the unique segments after compression, i.e. the physical disk space consumed by the file's unique data (locally compressed bytes)
  • In addition to the above, a map of the segments making up the file (the segment tree) is generated and written to disk - this allows the file to be read back/reconstructed in the future
Once the file has been ingested, values for original bytes, globally compressed bytes, locally compressed bytes, and metadata are recorded. These values are known as 'per file compression statistics'.
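A very rough sketch of this bookkeeping may help (everything here is invented for illustration: a fixed 8 KiB segment size stands in for variable 4-12 KiB anchoring, zlib stands in for the DDR's local compression, and none of the names correspond to a real Data Domain interface):

import hashlib
import zlib

SEGMENT_SIZE = 8 * 1024          # simplified: fixed-size segments instead of variable 4-12 KiB

segment_store = {}               # fingerprint -> locally compressed size already on disk

def ingest(data: bytes) -> dict:
    """Simulate ingesting one file and return its per file compression statistics."""
    stats = {"original_bytes": len(data),
             "globally_compressed_bytes": 0,    # unique data after de-duplication
             "locally_compressed_bytes": 0}     # unique data after local compression
    for off in range(0, len(data), SEGMENT_SIZE):
        segment = data[off:off + SEGMENT_SIZE]
        fingerprint = hashlib.sha1(segment).hexdigest()
        if fingerprint in segment_store:
            continue                             # duplicate: only a reference is recorded
        compressed = zlib.compress(segment)      # stand-in for the DDR's lz compression
        segment_store[fingerprint] = len(compressed)
        stats["globally_compressed_bytes"] += len(segment)
        stats["locally_compressed_bytes"] += len(compressed)
    return stats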

Over time, as the DDR is used, it is expected that files are created, files are deleted, and clean is run. This can change the way in which an existing file's data is referenced and de-duplicated against. For example, let's assume that a file is written to the DDR which contains 1 MiB of unique data - at the point of ingesting that file, this 1 MiB of data is referenced only by this file. Over time, however, a further 9 files are written to the DDR, all of which contain the same 1 MiB of data. Now this 1 MiB of data is referenced by 10 files in total (i.e. the way in which that 1 MiB of data is referenced has changed).

Note, however, that despite the above life cycle of a file's data, the per file compression statistics for each of the files written are not changed (i.e. the statistics for the very first file which wrote the 1 MiB of unique data to disk will still show that this file 'owns' the data, even though it is now referenced by 10 files).

As a result, these per file compression statistics effectively go stale over time.
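A minimal illustration of this staleness, using made-up structures that do not exist on a real DDR:

per_file_stats = {"file1": {"locally_compressed_bytes": 1 * 1024 * 1024}}   # file1 wrote the 1 MiB
referrers = {"fingerprint_of_the_1MiB": ["file1"]}                          # who references that data

for n in range(2, 11):                                          # 9 further files with the same 1 MiB
    per_file_stats[f"file{n}"] = {"locally_compressed_bytes": 0}            # pure duplicates
    referrers["fingerprint_of_the_1MiB"].append(f"file{n}")

# The data now has 10 referrers, but file1's ingest-time statistics still claim the full 1 MiB:
print(len(referrers["fingerprint_of_the_1MiB"]))                            # 10
print(per_file_stats["file1"]["locally_compressed_bytes"])                  # 1048576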

Now it is possible to discuss how the 'Cleanable GiB' value is generated:
  • Over time, files get deleted from the DDR
  • Every time a file is deleted, the file's 'locally compressed bytes' (i.e. the physical amount of space the file took on disk when it was ingested) is added to the current value of 'Cleanable GiB'
Note, however, that as per file compression statistics may be stale, 'Cleanable GiB' may not be accurate.
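In the same illustrative terms, the delete-side bookkeeping amounts to a simple running total of possibly stale, ingest-time figures (again, nothing below corresponds to a real Data Domain interface):

per_file_stats = {"file1": {"locally_compressed_bytes": 1 * 1024 * 1024}}   # recorded at ingest time
cleanable_bytes = 0

def delete_file(name: str) -> None:
    # Deleting a file adds its ingest-time figure to the estimate; nothing is freed yet -
    # physical space is only reclaimed when clean/GC later removes unreferenced segments.
    global cleanable_bytes
    cleanable_bytes += per_file_stats.pop(name)["locally_compressed_bytes"]

delete_file("file1")
print(cleanable_bytes)    # 1048576 - possibly stale, so the eventual clean may free more or less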

Finally, an example of this in practice:
  • A 1 TiB file (file1) of completely random data (which neither de-duplicates nor compresses) is written to a DDR. Per file compression statistics for file1 will show 'locally compressed bytes' of ~1 TiB
  • A 1 TiB file (file2) is written to the DDR. This file shares 500 GiB of data with file1 and has 500 GiB of unique random data. Per file compression statistics for file2 will show 'locally compressed bytes' of ~500 GiB
  • A 1 TiB file (file3) is written to the DDR. This file is identical to file2. Per file compression statistics for file3 will show 'locally compressed bytes' of 0 bytes (as the entire file de-duplicates against file2)
At some point later, the files start to be deleted and clean is run:
  • Initially, 'Cleanable GiB' is 0
  • file1 is deleted - as file1's 'locally compressed bytes' is 1 TiB, 'Cleanable GiB' is incremented by 1 TiB
  • Clean is run but only reclaims 500 GiB of space - this is expected, as the other 500 GiB of data written by file1 must stay on disk because it is still referenced by file2/file3
  • file2 is deleted - as file2's 'locally compressed bytes' is 500 GiB, 'Cleanable GiB' is incremented by 500 GiB
  • Clean is run but no space is reclaimed - this is expected, as all data referenced by file2 is also referenced by file3 (so it must stay on disk)
  • file3 is deleted - as file3's 'locally compressed bytes' is 0, 'Cleanable GiB' is incremented by 0
  • Clean is run and 1 TiB of physical space is reclaimed
Note that the above examples/explanations are extremely simplistic, but they do serve to demonstrate why 'Cleanable GiB' (and per file compression statistics) are estimates only and must be treated as such.
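The same sequence can be walked through with a toy reference-counting model (purely illustrative: sizes are in GiB, the structures are invented, and resetting the estimate after each clean is a simplifying assumption). It reproduces the mismatch between the estimate and the space actually reclaimed:

# Regions of unique data on disk (sizes in GiB) and the files that currently reference them.
regions = {
    "file1_only":  {"size": 500, "refs": {"file1"}},                      # data only in file1
    "shared_all":  {"size": 500, "refs": {"file1", "file2", "file3"}},    # data in all three files
    "file2_and_3": {"size": 500, "refs": {"file2", "file3"}},             # data only in file2/file3
}
# Ingest-time 'locally compressed bytes' (in GiB here) - never updated afterwards.
locally_compressed = {"file1": 1000, "file2": 500, "file3": 0}
cleanable_gib = 0

def delete(name):
    # Deleting a file grows the estimate by its stale, ingest-time figure.
    global cleanable_gib
    cleanable_gib += locally_compressed[name]
    for region in regions.values():
        region["refs"].discard(name)

def clean():
    # Clean/GC physically removes only data that no remaining file references.
    global cleanable_gib
    freed = sum(r["size"] for r in regions.values() if not r["refs"])
    for key in [k for k, r in regions.items() if not r["refs"]]:
        del regions[key]
    cleanable_gib = 0            # assume the estimate is recalculated after each clean
    return freed

delete("file1"); print(cleanable_gib, clean())    # estimate 1000 GiB, clean actually frees 500 GiB
delete("file2"); print(cleanable_gib, clean())    # estimate  500 GiB, clean actually frees   0 GiB
delete("file3"); print(cleanable_gib, clean())    # estimate    0 GiB, clean actually frees 1000 GiB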

Resolution

This behavior is as expected given the design of DDFS. There is no way to work around this or to increase the accuracy of the 'Cleanable GiB' figure.

Article Properties


Affected Product

Data Domain

Product

Data Domain

Last Published Date

02 Jun 2021

Version

4

Article Type

Solution