Skip to main content
  • Place orders quickly and easily
  • View orders and track your shipping status
  • Enjoy members-only rewards and discounts
  • Create and access a list of your products
  • Manage your Dell EMC sites, products, and product-level contacts using Company Administration.

Article Number: 000065554


Avamar avsql hourly incremental backup and restore performance severely impacted due to large backup label numbers

Summary: Avamar avsql hourly incremental backup and restore performance severely impacted due to large backup label numbers.

Article Content


Symptoms

Affected Avamar SQL Plug-in versions:
7.1.x
7.2.x
7.3.x
7.4.x


Below is an example of a case scenario where the SQL plug-in Incremental backup is scheduled to run at every hour.

The backup size is small (around few GBs) but the time that is taken to run has gradually risen taking over 60 minutes to complete. This means that it is no longer possible to complete the hourly backups within the one hour slot as it was required and expected.

During the whole backup process, we may experience two different unexpected Wait Times that contribute to the overall slow performance.

In the SQL plug-in log, we see the following two Wait Times gaps:
 
First delay:
2016-09-13 11:00:02 avsql Info <19491>: SMO DLL version 11 detected.
2016-09-13 11:00:02 avsql Info <19493>: SMO 2011 managed DLL loaded.
2016-09-13 11:00:03 avsql Info <17380>: Number of databases: 45.
2016-09-13 11:00:03 avsql Info <6837>: Excluding target '(local)/xxxx'
2016-09-13 11:00:03 avsql Info <7065>: Working on target '(local)/xxxx'
2016-09-13 11:00:06 avsql Info <40399>: Last full backup of target '(local)/xxx' was found, labelnum = '532265'.

>>>> Here, see the gap where the process it seems to stop responding for 34 minutes approximately <<<<<<
2016-09-13 11:34:08 avsql Info <40401>: Only the LSN was verified, during log gap detection.
2016-09-13 11:34:08 avsql Info <14293>: Creating a database task for '(local)/xxxx/i-11' (hcount 669)


Second delay:
2016-09-13 11:43:53 avsql Info <7067>: Creating manager for snapview ...
2016-09-13 11:43:53 avsql Info <6686>: Process 8000 (C:\Program Files\avs\bin\avtar) for workorder <schedule-name>-<group-name>-<WID-number>#0 started
2016-09-13 11:43:53 avsql Info <7732>: Waiting for avtar to finish
2016-09-13 11:43:54 avsql Info <10684>: Setting ctl message version to 3 (from 1)
2016-09-13 11:43:54 avsql Info <16136>: Setting ctl max message size to 268435456

>>>> Here, see the gap where the process it seems to stop responding for 46 minutes approximately <<<<<<
2016-09-13 12:29:39 avsql Info <6688>: Process 8000 (C:\Program Files\avs\bin\avtar) finished (code 0: success)
2016-09-13 12:29:39 avsql Info <7271>: Final summary generated subwork 44, cancelled/aborted 0, snapview 520, exitcode 0

These two Wait Times cause the backups to run for 1 hour and 20 minutes approximately, more than it should.

In the avtar log for the snap-view process (#0), we see ~46 minutes of Wait Time. It should complete in a matter of a few seconds instead.

NOTE: The lines below are coming from a different set of logs, but the behavior is the same.
 
2017-02-28 13:03:58 avtar Info <7209>: Finished reading makeview input 2017-02-28 13:03:58 avtar Info <7210>: Finished generating makeview tables 2017-02-28 13:03:58 avtar Info <16522>: Starting copy of paths in account '/domain/client-name' label number 632806 2017-02-28 13:12:02 avtar Info <7694>: Server(<avamar-server-IP>) not responding (possible network congestion?) (300 seconds) 2017-02-28 13:17:02 avtar Info <7694>: Server(<avamar-server-IP>) not responding (possible network congestion?) (600 seconds) ... 2017-02-28 13:37:03 avtar Info <7694>: Server(<avamar-server-IP>) not responding (possible network congestion?) (1800 seconds) 2017-02-28 13:42:03 avtar Info <7694>: Server(<avamar-server-IP>) not responding (possible network congestion?) (2100 seconds) ... 2017-02-28 13:50:02 avtar Info <16522>: Starting copy of paths in account '/domain/client-name' label number 632804 ... 2017-02-28 13:50:17 avtar Info <6083>: Backed-up 33.59 GB in 46.40 minutes: 43 GB/hour (0 files/hour) 2017-02-28 13:50:17 avtar Info <7883>: Finished at 2017-02-28 13:50:17 Arabian Standard Time, Elapsed time: 0000h:46m:23s 2017-02-28 13:50:17 avtar Info <8468>: Sending wrapup message to parent 2017-02-28 13:50:17 avtar Info <5314>: Command completed (6 warnings, exit code 0: success)
 
We see that the error is a bit misleading since it points to an issue with the "network" or "a nonresponsive Avamar server." In reality, this is due to the high Label Number that, in this case, is above 600 K.

Cause

As observed, the cause of the slow performance can be split into two parts;
  • "Initial Wait Time."
  • "Finalizing Wait time."

Cause of slow "Initial Wait Time":
The slow performance is due to a well-known issue where, when a client account has high label numbers, the Avamar server User Accounting System can take a long time to query and fetch the metadata for previous backups that are taken by the client.

This is checked by running a command similar to the following on the Avamar server utility node. The command queries how many backups are saved under Avamar.
 
avmgr getb --path=/clients/sqlclient.customer.com
 
The command may take a long time to complete. Even >1-hr. Prefixing the avtar command with Linux 'time' command times it for you.

Why do the Avamar backup label numbers become so large in such a short space of time? The Avamar SQL plug-in creates one label per database per backup.

If the SQL server has 10 databases, we see 11 label numbers being produced per scheduled backup (1 per database plus one for the wrap-up backup). If 23 incremental backups run each day, in addition to one full backup, this produces 264 label numbers per day. It is easy to see how the label numbers can grow to become large. After 1 day, it equals 264 labels. After 1 week, it equals 1.848 labels. After 1 month, it equals 7.392 labels. After 1 year, it equals 88.704 labels, and so on. 

If the SQL server had 500 databases to be backed up, which is not uncommon, the situation can be far more problematic:
 
After 1 day = 12.024 labels
After 1 week = 84.168 labels
After 1 month = 336.672 labels
After 1 year = 4.040.064 labels

Also, users sometimes run backups with a frequency greater than one hour which can exacerbate the situation even further.

Cause of slow "Finalizing Wait Time":
The second cause is due to a software defect that is also related to the high label number and which has been resolved in Bug 274997.

Resolution

Resolution:
Apply hotfixes as shown below. If there are many clients, there is a server-side hotfix to improve performance where client backup label numbers are high. For more information, see KB article 65560: Avamar GSAN User Accounting system experiences poor performance where a client's backup label numbers are very high

Workarounds:
For versions which have not been addressed, consider the following workarounds to mitigate or reset the behavior.

Workaround #1:
In the Avamar Administrator UI, retire the SQL client and register it again. The label number for the client drops back to zero.  

This mitigates the issue until the label number rises again. Performance issues usually become apparent where the label number rises above 100,000.

Note: This workaround addresses both Wait Times in one go.

Workaround #2:
Alternatively, to mitigate the issue, we can pass the "--after" flag to the avtar.cmd in order to restrict how much backup history avtar searches for in order to fetch ONLY the metadata for recent backups.

The flag should be placed in the client's avtar.cmd file (if for a cluster, the cluster node conducting the backup). For example:
 
--after=2016-12-20
 
Since the --after flag is set with a static date, it 'ages' and occasionally need updating by the user.

Note: This workaround only addresses the first Wait Time.

For Avamar client version 7.4, engineering added the flag "--within" which is similar to the flag "--after" but it autocalculates a date that is based off the current date. This means that it is not necessary to "babysit" the backup to occasionally be updating the flag. Using "--within," we only search metadata for recent backups.

Additional Information

High backup label numbers can also have of a serious impact on restore performance.

It is possible to measure the effect that the "--after" flag has on a backup. For example:

Without --after:
 
avtar --showlog --hfsaddr="avaserver" --hfsport="27000" --id=MCUser --password=XXXXX --path="/clients/sqlclient.customer.com"
real    34m37.527s
user    0m0.860s
sys     0m0.204s
root@avaserver:~/#:

With --after
 
time avtar --showlog --hfsaddr="avaserver" --hfsport="27000" --id=MCUser --password=XXXXX --path="/clients/sqlclient.customer.com" --quiet --after="2016-11-22 00:00:00"
real    0m4.083s
user    0m0.128s
sys     0m0.020s
root@avaserver:~/#:

Article Properties


Affected Product

Avamar

Product

Avamar, Avamar Plug-in for SQL

Last Published Date

27 Aug 2021

Version

3

Article Type

Solution