
Data Domain: How to solve high space consumption or low available capacity on Data Domain Restorers (DDRs)

Summary: This article describes how to troubleshoot and resolve issues relating to high space usage or a lack of available capacity on Data Domain Restorers (DDRs).


Article Content


Symptoms

 
All Data Domain Restorers (DDRs) contain a pool/area of storage known as the 'active tier':
  • This is the area of disk where newly ingested files/data reside; on most DDRs, files remain here until they are expired/deleted by the client backup application
  • On DDRs configured with Extended Retention (ER) or Long Term Retention (LTR), the data movement process may periodically run to migrate old files from the active tier to the archive or cloud tier
  • The only way to reclaim space in the active tier which was used by deleted/migrated files is to run the garbage collection/clean process (GC)
Current utilization of the active tier can be displayed using the 'filesys show space' or 'df' commands:
 
# df

Active Tier:
Resource           Size GiB   Used GiB   Avail GiB   Use%   Cleanable GiB*
----------------   --------   --------   ---------   ----   --------------
/data: pre-comp           -    33098.9           -      -                -
/data: post-comp    65460.3      518.7     64941.6     1%              0.0
/ddvar                 29.5       19.7         8.3    70%                -
/ddvar/core            31.5        0.2        29.7     1%                -
----------------   --------   --------   ---------   ----   --------------

If configured, details of archive/cloud tiers will be shown below the active tier.

Utilization of the active tier must be carefully managed otherwise the following may occur:
  • The active tier may start to run out of available space causing alerts/messages such as the following to be displayed:
EVT-SPACE-00004: Space usage in Data Collection has exceeded 95% threshold.
  • If the active tier becomes 100% full no new data can be written to the DDR which may cause backups/replication to fail - in this scenario alerts/messages such as the following may be displayed:
CRITICAL: MSG-CM-00002: /../vpart:/vol1/col1/cp1/cset: Container set [container set ID] out of space
  • In some circumstances, the active tier becoming full may cause the Data Domain File System (DDFS) to become read-only at which point existing files cannot be deleted
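To check whether a DDR currently has outstanding space-related alerts such as the above, the standard 'alerts show current' command can be used alongside the 'filesys show space' command shown earlier (the exact alert text and output format vary by DDOS version):

# alerts show current
# filesys show space
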
This knowledge base article attempts to:
  • Explain why the active tier may become full
  • Describe a simple set of checks which can be performed to determine the cause of high utilization of the active tier and corresponding remedial steps
Note that:
  • This article is not exhaustive (there may be some situations where the active tier of a DDR becomes highly utilized or full for a reason not discussed in this document).
  • This article does not cover high utilization of archive or cloud tiers

Cause

The active tier of a DDR can experience higher than expected utilization for several reasons:
  • Backup files/save sets are not being correctly expired/deleted by client backup applications due to incorrect retention policy or backup application configuration
  • Replication lag causing a large amount of old data to be kept on the active tier pending replication to replicas
  • Data being written to the active tier has a lower than expected overall compression ratio
  • The system has not been sized correctly, that is, it is simply too small for the amount of data being stored on it
  • Backups consist of many small files - these files consume more space than expected when initially written, however this space should be reclaimed during clean/garbage collection
  • Data movement is not being run regularly on systems configured with ER/LTR causing old files which should be migrated to archive/cloud tiers to remain on the active tier
  • Cleaning/garbage collection is not being run regularly
  • Excessive or old mtree snapshots exist on the DDR, preventing clean from reclaiming space from deleted files/data

Resolution

Step 1 - Determine whether an active tier clean must be run.

The Data Domain Operating System (DDOS) attempts to maintain a counter called 'Cleanable GiB' for the active tier. This is an estimation of how much physical (post-comp) space could potentially be reclaimed in the active tier by running clean/garbage collection. This counter is shown by the 'filesys show space'/'df' commands:
 
Active Tier:
Resource           Size GiB    Used GiB   Avail GiB   Use%   Cleanable GiB*
----------------   --------   ---------   ---------   ----   --------------
/data: pre-comp           -   7259347.5           -      -                -
/data: post-comp   304690.8    251252.4     53438.5    82%           51616.1 <=== NOTE
/ddvar                 29.5        12.5        15.6    44%                -
----------------   --------   ---------   ---------   ----   --------------

If either:
  • The value for 'Cleanable GiB' is large
  • DDFS has become 100% full (and is therefore read-only)
Clean should be performed and allowed to run to completion before continuing with any further steps in this document. To start clean the 'filesys clean start' command should be used, that is:
 
# filesys clean start
Cleaning started.  Use 'filesys clean watch' to monitor progress.

To confirm that clean has started as expected the 'filesys status' command can be used, that is:
 
# filesys status
The filesystem is enabled and running.
Cleaning started at 2017/05/19 18:05:58: phase 1 of 12 (pre-merge)
 50.6% complete, 64942 GiB free; time: phase  0:01:05, total  0:01:05

Note that:
  • If clean is not able to start please contact your contracted support provider for further assistance - this may indicate that the system has encountered a 'missing segment error' causing clean to be disabled
  • If clean is already running the following message will be displayed when it is attempted to be started:
**** Cleaning already in progress.  Use 'filesys clean watch' to monitor progress.
  • No space in the active tier will be freed/reclaimed until clean reaches its copy phase (by default phase 9 in DDOS 5.4.x and earlier, phase 11 in DDOS 5.5.x and later). For further information about the phases used by clean see: https://support.emc.com/kb/446734
  • Clean may not reclaim the amount of space indicated by 'Cleanable GiB' as this value is essentially an estimation. For further information about this see: https://support.emc.com/kb/485637
  • Clean may not reclaim all potential space in a single run - this is because on DDRs containing large data sets clean will work against the portion of the file system containing the most superfluous data (that is to give the best return in free space for time taken for clean to run). In some scenarios clean may need to be run multiple times before all potential space is reclaimed
  • If the value for 'Cleanable GiB' is large, this indicates that clean has not been running at regular intervals. Check that a clean schedule has been set:
# filesys clean show schedule

If necessary set an active tier clean schedule - for example to run every Tuesday at 6am:
# filesys clean set schedule Tue 0600
Filesystem cleaning is scheduled to run "Tue" at "0600".
On systems configured with Extended Retention (ER) clean may be configured to run after data movement completes and may not have its own separate schedule. This scenario is covered later in this document.

Once clean has completed use the 'filesys show space'/'df' commands to determine whether utilization issues have been resolved. If usage is still high proceed to run through the remaining steps in this article.

Step 2 - Check for large amounts of replication lag against source replication contexts

Native Data Domain replication is designed around the concept of 'replication contexts'. For example when data needs to be replicated between systems:
  • Replication contexts are created on source and destination DDRs
  • The contexts are initialised
  • Once initialisation is complete replication will periodically send updates/deltas from source to destination to keep data on the systems synchronised
If a source replication context lags then it can cause old data to be held on disk on the source system (note that lagging replication contexts cannot cause excessive utilisation on the destination system):
  • Directory replication contexts (used when replicating a single directory tree under /data/col1/backup between systems):
Directory replication uses a replication log on the source DDR to track outstanding files which have not yet been replicated to the destination
If a directory replication context is lagging then the replication log on the source DDR will track many files which are pending replication.
Even if these files are deleted, whilst they continue to be referenced by the replication log, clean cannot reclaim space on the disk used by these files.
  •  Mtree replication contexts (used when replicating any mtree other than /data/col1/backup between systems):
Mtree replication uses snapshots created on source and destination systems to determine differences between systems and therefore which files must be sent from source to destination.
If an mtree replication context is lagging, then the corresponding mtree may have old snapshots created against it on source and destination systems.
Even if files are deleted from the replicated mtree on the source system, clean cannot reclaim the space on disk used by those files while they remain referenced by the mtree replication snapshots (that is, if the files existed when the snapshots were created).
  • Collection replication contexts (used when replicating the entire contents of one DDR to another system):
Collection replication performs 'block based' replication of all data on a source system to a destination system.
If a collection replication context is lagging, then clean on the source system cannot operate optimally - in this scenario an alert will be generated on the source indicating that a partial clean is being performed to avoid losing synchronization with the destination system.
Clean will therefore be unable to reclaim as much space as expected on the source DDR.

 To determine if replication contexts are lagging the following steps should be performed:
  • Determine the hostname of the current system:
sysadmin@dd4200# hostname
The Hostname is: dd4200.ddsupport.emea
  • Determine the date/time on the current system:
sysadmin@dd4200# date
Fri May 19 19:04:06 IST 2017
  • List replication contexts configured on the system along with their 'synced as of time'. Contexts of interest are those where the 'destination' does NOT contain the hostname of the current system (which indicates that the current system is the source) and the 'synced as of time' is old:
sysadmin@dd4200# replication status
CTX   Destination                                                                          Enabled   Connection     Sync'ed-as-of-time   Tenant-Unit
---   ----------------------------------------------------------------------------------   -------   ------------   ------------------   -----------    
3     mtree://dd4200.ddsupport.emea/data/col1/DFC                                          no        idle           Thu Jan 8 08:58     -   <=== NOT INTERESTING  - CURRENT SYSTEM IS THE DESTINATION
9     mtree://BenDDVE.ddsupport.emea/data/col1/BenMtree                                    no        idle           Mon Jan 25 14:48     -   <=== INTERESTING - LAGGING AND CURRENT SYSTEM IS THE SOURCE
13    dir://DD2500-1.ddsupport.emea/backup/dstfolder                                       no        disconnected   Thu Mar 30 17:55     -   <=== INTERESTING - LAGGING AND CURRENT SYSTEM IS THE SOURCE
17    mtree://DD2500-1.ddsupport.emea/data/col1/oleary                                     yes       idle           Fri May 19 18:57     -   <=== NOT INTERESTING - CONTEXT IS UP TO DATE   
18    mtree://dd4200.ddsupport.emea/data/col1/testfast                                     yes       idle           Fri May 19 19:18     -   <=== NOT INTERESTING - CONTEXT IS UP TO DATE
---   ----------------------------------------------------------------------------------   -------   ------------   ------------------   -----------

Contexts for which the current system is the source and which show significant lag, or contexts which are no longer required, should be broken. This can be performed by running the following command on both the source and destination systems:
 
# replication break [destination]

For example, to break the 'interesting' contexts shown above, the following commands would be run on source and destination:
(dd4200.ddsupport.emea): # replication break mtree://BenDDVE.ddsupport.emea/data/col1/BenMtree
(BenDDVE.ddsupport.emea): # replication break mtree://BenDDVE.ddsupport.emea/data/col1/BenMtree

(dd4200.ddsupport.emea): # replication break dir://DD2500-1.ddsupport.emea/backup/dstfolder
(DD2500-1.ddsupport.emea): # replication break dir://DD2500-1.ddsupport.emea/backup/dstfolder
Note that:
  • Once contexts are broken active tier clean will need to be performed to reclaim potential space in the active tier
  • If using mtree replication, once contexts are broken, mtree replication snapshots may remain on disk. Make sure that step 4 is followed to expire any superfluous snapshots prior to running clean
  • If the source/destination mtree is configured to migrate data to archive or cloud tiers, care should be taken when breaking corresponding mtree replication contexts as these contexts may not be able to be recreated/initialised again in the future. The reason for this is that when an mtree replication context is initialised, an mtree snapshot is created on the source system which contains details of all files in the mtree (regardless of tier). This snapshot is then replicated in full to the active tier of the destination. As a result, if the active tier of the destination does not have sufficient free space to ingest all of the mtree's data from the source, the initialise will not be able to complete. For further information on this issue please contact your contracted support provider
  • If a collection replication context is broken the context will not be able to be recreated/initialised without first destroying the instance of DDFS on the destination DDR (and losing all data on this system). As a result a subsequent initialise can take considerable time/network bandwidth as all data from the source must be physically replicated to the destination again
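Once superfluous contexts have been broken, it can be confirmed that they have been removed by re-running 'replication status' (as shown above). On most DDOS versions the 'replication show config' command can also be used to list the contexts which remain configured (output format varies by release):

# replication status
# replication show config
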
Step 3 - Check for mtrees which are no longer required

The contents of DDFS are logically divided into mtrees. It is common for individual backup applications/clients to write to individual mtrees. If a backup application is decommissioned it will no longer be able to write data to, or delete data from, the DDR, which may leave old/superfluous mtrees on the system. Data in these mtrees will continue to exist indefinitely, using space on disk on the DDR. As a result any such superfluous mtrees should be deleted. For example:
  • Obtain a list of mtrees on the system:
# mtree list
Name                                                            Pre-Comp (GiB)   Status 
-------------------------------------------------------------   --------------   -------
/data/col1/Budu_test                                                     147.0   RW     
/data/col1/Default                                                      8649.8   RW     
/data/col1/File_DayForward_Noida                                          42.0   RW/RLCE
/data/col1/labtest                                                      1462.7   RW     
/data/col1/oscar_data                                                      0.2   RW     
/data/col1/test_oscar_2                                                  494.0   RO/RD     
-------------------------------------------------------------   --------------   -------
  • Any mtrees which are no longer required should be deleted with the 'mtree delete' command, i.e.:
# mtree delete [mtree name]

For example:
# mtree delete /data/col1/Budu_test
...
MTree "/data/col1/Budu_test" deleted successfully.
 
  • Space consumed on disk by the deleted mtree will be reclaimed the next time active tier clean/garbage collection is run.
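Where it is not obvious which mtrees hold the most data, the Pre-Comp (GiB) column of 'mtree list' (shown above) indicates the logical size of each mtree. On most DDOS versions, per-mtree compression statistics can additionally be displayed with the 'mtree show compression' command - for example, using one of the mtrees listed above (output varies by release):

# mtree show compression /data/col1/labtest
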
Note that:
  • Mtrees which are destinations for mtree replication (i.e. have a status of RO/RD in the output of mtree list) should have their corresponding replication context broken before the mtree is deleted
  • Mtrees which are used as DDBoost logical storage units (LSUs) or as virtual tape library (VTL) pools may not be able to be deleted via the 'mtree delete' command - refer to the Data Domain Administration Guide for further details on deleting such mtrees
  • Mtrees which are configured for retention lock (i.e. have a status of RLCE or RLGE) cannot be deleted - instead individual files within the mtree must have any retention lock reverted and be deleted individually - refer to the Data Domain Administration Guide for further details
Step 4 - Check for old/superfluous mtree snapshots

A Data Domain snapshot represents a point-in-time copy of the corresponding mtree. As a result:
  • Any files which exist within the mtree when the snapshot is created will be referenced by the snapshot
  • Whilst the snapshot continues to exist, clean will not be able to reclaim any physical space these files use on disk even if they are removed/deleted - this is because the data must stay on the system in case the copy of the file in the snapshot is later accessed
To determine whether any mtrees have old/superfluous snapshots the following steps should be performed:
  • Obtain a list of mtrees on the system using the 'mtree list' command as shown in step 3
  • List snapshots which exist for each mtree using the 'snapshot list' command:
# snapshot list mtree [mtree name]

When run against an mtree with no snapshots the following will be displayed:
# snapshot list mtree /data/col1/Default
Snapshot Information for MTree: /data/col1/Default
----------------------------------------------
No snapshots found.

When run against an mtree with snapshots the following will be displayed:
 
# snapshot list mtree /data/col1/labtest
Snapshot Information for MTree: /data/col1/labtest
----------------------------------------------
Name                                  Pre-Comp (GiB)   Create Date         Retain Until        Status 
------------------------------------  --------------   -----------------   -----------------   -------
testsnap-2016-03-31-12-00                     1274.5   Mar 31 2016 12:00   Mar 26 2017 12:00   expired
testsnap-2016-05-31-12-00                     1198.8   May 31 2016 12:00   May 26 2017 12:00          
testsnap-2016-07-31-12-00                     1301.3   Jul 31 2016 12:00   Jul 26 2017 12:00          
testsnap-2016-08-31-12-00                     1327.5   Aug 31 2016 12:00   Aug 26 2017 12:00          
testsnap-2016-10-31-12-00                     1424.9   Oct 31 2016 12:00   Oct 26 2017 13:00          
testsnap-2016-12-31-12-00                     1403.1   Dec 31 2016 12:00   Dec 26 2017 12:00          
testsnap-2017-01-31-12-00                     1421.0   Jan 31 2017 12:00   Jan 26 2018 12:00          
testsnap-2017-03-31-12-00                     1468.7   Mar 31 2017 12:00   Mar 26 2018 12:00      
REPL-MTREE-AUTO-2017-05-11-15-18-32           1502.2   May 11 2017 15:18   May 11 2018 15:18         
-----------------------------------   --------------   -----------------   -----------------   ------
  • Where snapshots exist use the output from 'snapshot list mtree [mtree name]' to determine snapshots which:
Are not 'expired' (see status column)
Were created a significant time in the past (for example snapshots created in 2016 from the above list)

These snapshots should be expired so that they can be removed when clean runs and the space they are holding on disk can be freed:

# snapshot expire [snapshot name] mtree [mtree name]

For example:
 
# snapshot expire testsnap-2016-05-31-12-00 mtree /data/col1/labtest
Snapshot "testsnap-2016-05-31-12-00" for mtree "/data/col1/labtest" will be retained until May 19 2017 19:31.
  • If the snapshot list command is run again these snapshots will now be listed as expired:
# snapshot list mtree /data/col1/labtest
Snapshot Information for MTree: /data/col1/labtest
----------------------------------------------
Name                                  Pre-Comp (GiB)   Create Date         Retain Until        Status 
------------------------------------  --------------   -----------------   -----------------   -------
testsnap-2016-03-31-12-00                     1274.5   Mar 31 2016 12:00   Mar 26 2017 12:00   expired
testsnap-2016-05-31-12-00                     1198.8   May 31 2016 12:00   May 26 2017 12:00   expired       
testsnap-2016-07-31-12-00                     1301.3   Jul 31 2016 12:00   Jul 26 2017 12:00          
testsnap-2016-08-31-12-00                     1327.5   Aug 31 2016 12:00   Aug 26 2017 12:00          
testsnap-2016-10-31-12-00                     1424.9   Oct 31 2016 12:00   Oct 26 2017 13:00          
testsnap-2016-12-31-12-00                     1403.1   Dec 31 2016 12:00   Dec 26 2017 12:00          
testsnap-2017-01-31-12-00                     1421.0   Jan 31 2017 12:00   Jan 26 2018 12:00          
testsnap-2017-03-31-12-00                     1468.7   Mar 31 2017 12:00   Mar 26 2018 12:00      
REPL-MTREE-AUTO-2017-05-11-15-18-32           1502.2   May 11 2017 15:18   May 11 2018 15:18         

-----------------------------------   --------------   -----------------   -----------------   -------
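
The remaining old snapshots (for example, the other 2016 snapshots in the listing above) can be expired in the same way:

# snapshot expire testsnap-2016-07-31-12-00 mtree /data/col1/labtest
# snapshot expire testsnap-2016-08-31-12-00 mtree /data/col1/labtest
# snapshot expire testsnap-2016-10-31-12-00 mtree /data/col1/labtest
# snapshot expire testsnap-2016-12-31-12-00 mtree /data/col1/labtest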

Note that:
  • It is not possible to determine how much physical data an individual snapshot or set of snapshots holds on disk - the only value for 'space' associated with a snapshot is an indication of the pre-compressed (logical) size of the mtree when the snapshot was created (as shown in the above output)
  • Snapshots which are named 'REPL-MTREE-AUTO-YYYY-MM-DD-HH-MM-SS' are managed by mtree replication and in normal circumstances should not need to be manually expired (replication will automatically expire these snapshots when they are no longer required). If such snapshots are extremely old, this indicates that the corresponding replication context is likely showing significant lag (as described in step 2)
  • Snapshots which are named 'REPL-MTREE-RESYNC-RESERVE-YYYY-MM-DD-HH-MM-SS' are created by mtree replication when an mtree replication context is broken. Their intent is that they can be used to avoid a full resynchronization of replication data if the broken context is later recreated (for example if the context was broken in error). If replication will not be re-established these snapshots can be manually expired as described above
  • Expired snapshots will continue to exist on the system until the next time clean/garbage collection is run - at this point they will be physically deleted and will be removed from the output of 'snapshot list mtree [mtree name]' - clean can then reclaim any space these snapshots were using on disk
Step 5 - Check for an unexpected number of old files on the system

Autosupports from the DDR contain histograms showing a breakdown of files on the DDR by age - for example:
 
File Distribution
-----------------
448,672 files in 5,276 directories

                          Count                         Space
               -----------------------------   --------------------------
         Age         Files       %    cumul%        GiB       %    cumul%
   ---------   -----------   -----   -------   --------   -----   -------
       1 day         7,244     1.6       1.6     4537.9     0.1       0.1
      1 week        40,388     9.0      10.6    63538.2     0.8       0.8
     2 weeks        47,850    10.7      21.3    84409.1     1.0       1.9
     1 month       125,800    28.0      49.3   404807.0     5.0       6.9
    2 months       132,802    29.6      78.9   437558.8     5.4      12.3
    3 months         8,084     1.8      80.7   633906.4     7.8      20.1
    6 months         5,441     1.2      81.9  1244863.9    15.3      35.4
      1 year        21,439     4.8      86.7  3973612.3    49.0      84.4
    > 1 year        59,624    13.3     100.0  1265083.9    15.6     100.0
   ---------   -----------   -----   -------   --------   -----   -------

This can be useful to determine if there are files on the system which have not been expired/deleted as expected by the client backup application. For example, if the above system were written to by a backup application where the maximum retention period for any one file was 6 months, it is immediately obvious that the backup application is not expiring/deleting files as expected, as there are approximately 80,000 files older than 6 months on the DDR (the '1 year' and '> 1 year' rows).

Note that:
  • It is the responsibility of the backup application to perform all file expiration/deletion
  • A DDR will never expire/delete files automatically - unless instructed by the backup application to explicitly delete a file the file will continue to exist on the DDR using space indefinitely
As a result, issues such as this should first be investigated by the backup application vendor's support team.

If required Data Domain support can provide additional reports to:
  • Give the name/modification time of all files on a DDR ordered by age (so the name/location of any old data can be determined)
  • Split out histograms of file age into separate reports for the active/archive/cloud tier (where the ER/LTR features are enabled)
To perform this:
  • Collect evidence as described in the 'Collecting sfs_dump' paragraph of the notes section of this document
  • Open a service request with your contracted support provider
Once old/superfluous files are deleted active tier clean/garbage collection will need to be run to physically reclaim space in the active tier

Step 6 - Check for backups which include a large number of small files

Due to the design of DDFS, small files (essentially any file which is smaller than approximately 10 MB in size) can consume excessive space when initially written to the DDR. This is due to the 'SISL' (Stream Informed Segment Layout) architecture causing small files to consume multiple individual 4.5 MB blocks of space on disk. For example a 4 KB file may actually consume up to 9 MB of physical disk space when initially written.

This excessive space is subsequently reclaimed when clean/garbage collection is run (as data from small files is then aggregated into a smaller number of 4.5 MB blocks) but can cause smaller models of DDR to show excessive utilization and fill when such backups are run.

Autosupports contain histograms of files broken down by size, for example:
 
                          Count                         Space
               -----------------------------   --------------------------
        Size         Files       %    cumul%        GiB       %    cumul%
   ---------   -----------   -----   -------   --------   -----   -------
       1 KiB         2,957    35.8      35.8        0.0     0.0       0.0
      10 KiB         1,114    13.5      49.3        0.0     0.0       0.0
     100 KiB           249     3.0      52.4        0.1     0.0       0.0
     500 KiB         1,069    13.0      65.3        0.3     0.0       0.0
       1 MiB           113     1.4      66.7        0.1     0.0       0.0
       5 MiB           446     5.4      72.1        1.3     0.0       0.0
      10 MiB           220     2.7      74.8        1.9     0.0       0.0
      50 MiB         1,326    16.1      90.8       33.6     0.2       0.2
     100 MiB            12     0.1      91.0        0.9     0.0       0.2
     500 MiB           490     5.9      96.9      162.9     0.8       1.0
       1 GiB            58     0.7      97.6       15.6     0.1       1.1
       5 GiB            29     0.4      98.0       87.0     0.5       1.6
      10 GiB            17     0.2      98.2      322.9     1.7       3.3
      50 GiB            21     0.3      98.4     1352.7     7.0      10.3
     100 GiB            72     0.9      99.3     6743.0    35.1      45.5
     500 GiB            58     0.7     100.0    10465.9    54.5     100.0
   > 500 GiB             0     0.0     100.0        0.0     0.0     100.0
   ---------   -----------   -----   -------   --------   -----   -------

If there is evidence of backups writing very large numbers of small files then the system may be affected by significant temporary increases in utilization between each invocation of clean/garbage collection. In this scenario it is preferable to change the backup methodology to combine all small files into a single larger archive (such as a tar file) before writing them to the DDR, as sketched below. Note that any such archive should not be compressed or encrypted (as this would damage the compression/de-duplication ratio of that data).
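
As a sketch of this approach on a generic Unix backup client (the paths and archive name below are purely illustrative), small files would be combined into a single uncompressed tar archive which is then written to the DDR in place of the individual files:

client# tar -cf /mnt/ddr/backup/smallfiles-2017-05-19.tar /data/smallfiles

Note that no compression option (such as '-z' or '-j') is used, so the resulting archive is left uncompressed and can still be de-duplicated/compressed effectively by the DDR.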

Step 7 - Check for lower than expected de-duplication ratio

The main purpose of a DDR is to de-duplicate and compress data ingested by the device. The de-duplication/compression ratio is very much dependent on the use case of the system and the type of data which it holds; however, in many cases there will be an 'expected' overall compression ratio based on results obtained through proof-of-concept testing or similar. To determine the current overall compression ratio of the system (and therefore whether it is meeting expectations) the 'filesys show compression' command can be used. For example:
 
# filesys show compression

From: 2017-05-03 13:00 To: 2017-05-10 13:00

Active Tier:
                   Pre-Comp   Post-Comp   Global-Comp   Local-Comp      Total-Comp
                      (GiB)       (GiB)        Factor       Factor          Factor
                                                                     (Reduction %)
----------------   --------   ---------   -----------   ----------   -------------
Currently Used:*    20581.1       315.4             -            -    65.3x (98.5)
Written:
  Last 7 days         744.0         5.1         80.5x         1.8x   145.6x (99.3)
  Last 24 hrs
----------------   --------   ---------   -----------   ----------   -------------
 * Does not include the effects of pre-comp file deletes/truncates
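
For reference, the 'Total-Comp Factor' is simply the ratio of pre-compressed to post-compressed data; using the 'Currently Used' row from the output above:

Total-Comp Factor = Pre-Comp / Post-Comp = 20581.1 GiB / 315.4 GiB ≈ 65.3x
Reduction %       = (1 - Post-Comp / Pre-Comp) x 100 ≈ 98.5%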

In the above example the system is achieving an overall compression ratio of 65.3x for the active tier (which is extremely good). If, however, this value shows that the overall compression ratio is not meeting expectations then further investigation is likely to be required. Note that investigating a lower than expected compression ratio is a complex subject which can have many root causes. For further information please see the following article: https://support.emc.com/kb/487055

Step 8 - Check whether the system is a source for collection replication

When using collection replication, if the source system is physically larger than the destination, the size of the source system will be artificially limited to match that of the destination (i.e. there will be an area of disk on the source which is marked as unusable). The reason for this is that when using collection replication the destination is required to be a block level copy of the source; if the source is physically larger than the destination, there is a chance that excessive data may be written to the source which then cannot be replicated to the destination (as it is already full). This scenario is avoided by limiting the size of the source to match the destination.
  • Using the commands from step 2, check whether the system is a source for collection replication. To do this run 'replication status' and determine if there are any replication contexts starting 'col://' (indicating collection replication) which do NOT contain the hostname of the local system in the destination (indicating that this system must be a source for the replication context)
  • If the system is a source for collection replication, check the size of each system's active tier by logging into both and running the 'filesys show space' command - compare the active tier's 'post-comp' size on each (as shown below)
  • If the source is significantly larger than the destination then its active tier size will be artificially limited
  • To allow all space on the source to be usable for data, one of the following should be performed:
Add additional storage to the destination active tier such that its size is >= the size of the source active tier
Break the collection replication context (using commands from step 2) - note that this will obviously prevent data being replicated from the source to the destination DDR

As soon as either of these has been performed, additional space will be made available immediately in the active tier of the source system (i.e. there is no need to run active tier clean/garbage collection before using this space)
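
For example, to compare active tier sizes, 'filesys show space' would be run on both systems and the '/data: post-comp' Size GiB value for the active tier compared on each (the labels below simply indicate which system each command is run on):

(source DDR):      # filesys show space
(destination DDR): # filesys show space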

Step 9 - Check whether data movement is being regularly run

If the DDR is configured with either Extended Retention (ER) or Long Term Retention (LTR) it will have a second tier of storage attached (archive tier for ER or cloud tier for LTR). In this scenario data movement policies are likely configured against mtrees to migrate older/unmodified data requiring long term retention from the active tier out to the alternate tier of storage such that space used by these files in the active tier can be physically reclaimed by clean/garbage collection. If data movement policies are incorrectly configured or if the data movement process is not regularly run then old data will remain in the active tier longer than expected and will continue to use physical space on disk.
  • Initially confirm whether the system is configured for ER or LTR by running 'filesys show space' and checking for the existence of an archive or cloud tier - note that to be usable these alternative tiers of storage must have a post-comp size of > 0 GiB:
# filesys show space
...
Archive Tier:
Resource           Size GiB   Used GiB   Avail GiB   Use%   Cleanable GiB
----------------   --------   --------   ---------   ----   -------------
/data: pre-comp           -     4163.8           -      -               -
/data: post-comp    31938.2     1411.9     30526.3     4%               -
----------------   --------   --------   ---------   ----   -------------

# filesys show space
...
Cloud Tier
Resource           Size GiB   Used GiB   Avail GiB   Use%   Cleanable GiB
----------------   --------   --------   ---------   ----   -------------
/data: pre-comp           -        0.0           -      -               -
/data: post-comp   338905.8        0.0    338905.8     0%             0.0
----------------   --------   --------   ---------   ----   -------------

Note that ER and LTR are mutually exclusive, so a system will contain either only an active tier (no ER/LTR configured), an active and archive tier (ER configured), or an active and cloud tier (LTR configured)
  • If the system is configured with ER/LTR check data movement policies against mtrees to ensure that these are as expected and set such that old data will be pushed out to the alternate tier of storage:
ER: # archive data-movement policy show
LTR: # data-movement policy show

If data movement policies are incorrect/missing these should be corrected - refer to the Data Domain Administrators Guide for assistance in performing this
  • If the system is configured with ER/LTR check that data movement is scheduled to run at regular intervals to physically migrate files/data from the active tier to alternate storage:
ER: # archive data-movement schedule show
LTR: # data-movement schedule show

Note that Data Domain generally recommends running data movement via an automated schedule; however, some customers choose to run this process in an ad-hoc manner (i.e. when required). In this scenario data movement should be started regularly by running:
 
ER: # archive data-movement start
LTR: # data-movement start

For more information on modifying the data movement schedule refer to the Data Domain Administrators Guide
  • If the system is configured for ER/LTR check the last time data movement was run:
ER: # archive data-movement status
LTR: # data-movement status

If data movement has not been run for some time attempt to manually start the process then monitor as follows:
 
ER: # archive data-movement watch
LTR: # data-movement watch

If data movement fails to start for any reason please contact your contracted support provider for further assistance.
  • Once data movement is complete, active tier clean should be run (note that it may be configured to start automatically on completion of data movement) to ensure that space used by migrated files in the active tier is physically freed:
# filesys clean start

On ER systems it is common to schedule data movement to run regularly (for example, once a week) and then configure active tier clean to run on completion of data movement. In this scenario active tier clean does not have its own independent schedule. To configure this, first remove the current active tier clean schedule:

# filesys clean set schedule never


Configure data movement to run periodically followed by automatic active tier clean - for example to run data movement every Tuesday at 6am followed by active tier clean:

# archive data-movement schedule set days Tue time 0600
The Archive data movement schedule has been set.
Archive data movement is scheduled to run on day(s) "tue" at "06:00" hrs


It can be confirmed that active tier clean is configured to run after completion of data movement as follows:

# archive show config
Enabled                                         Yes                               
Data movement Schedule                          Run on day(s) "tue" at "06:00" hrs   <=== SCHEDULE
Data movement throttle                          100 percent                       
Default age threshold data movement policy      14 days                           
Run filesys clean after archive data movement   Yes   <=== RUN CLEAN ON COMPLETION
Archive Tier local compression                  gz                                
Packing data during archive data movement       enabled                           
Space Reclamation                               disabled                          
Space Reclamation Schedule                      No schedule

On LTR systems active tier clean should still be configured with its own schedule
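
For example, reusing the commands from step 1, an independent weekly active tier clean schedule can be set and confirmed on an LTR system as follows:

# filesys clean set schedule Tue 0600
# filesys clean show schedule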

Step 10 - Add additional storage to the active tier

If all previous steps have been performed and active tier clean has run to completion, but there is still insufficient space available on the active tier, it is likely that the system has not been sized correctly for the workload it is receiving. In this case one of the following should be performed:
  • Reduce workload hitting the system - for example:
Redirect a subset of backups to alternate storage
Reduce retention period of backups such that they are expired/deleted more quickly
Reduce the number/expiration period of scheduled snapshots against mtrees on the system
Break superfluous replication contexts for which the local system is a destination then delete corresponding mtrees
  • Add additional storage to the active tier of the system and expand its size:
# storage add [tier active] enclosure [enclosure number] | disk [device number]
# filesys expand
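
After the file system has been expanded, the additional capacity can be confirmed by re-running 'filesys show space'; on most DDOS versions the 'storage show all' command can also be used to list the storage now configured in each tier (output varies by release):

# storage show all
# filesys show space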

To discuss addition of storage please contact your sales account team.

Article Properties


Affected Product

Data Domain

Product

Data Domain

Last Published Date

11 Dec 2023

Version

5

Article Type

Solution