Article Number: 000157795


VPLEX: Single component failures in the fabric or array controllers may lead to ongoing performance data unavailability on hosts accessing storage through VPLEX

Summary: This article describes how to mitigate issues caused by a single component failure that may impact performance in a VPLEX environment.

Article Content


Instructions

Issue Summary
End users may experience severe impact on some or all hosts connected to VPLEX as a result of issues such as slow drain, array target controller faults, CRC errors, switch ASIC faults, or switch reboots. The VPLEX back end uses a round-robin policy, so an issue on one fabric may impact all host paths on that fabric and may affect paths on the other fabric as well.
  
For switch and array teams   
If an end user reports widespread impact as a result of a single component failure, slow drain, or a similar issue, check with the end user whether VPLEX is in the environment. If VPLEX is in the environment and the extent of the problem is known, request that the end user block the affected path(s) on the switch. If VPLEX is in the environment and the affected paths are not known, engage Dell EMC Customer Support, explain the issue, and mention this article.
 
For the VPLEX Team 
If there is an SR in which the end user reports ongoing impact and poorly performing back-end paths are the suspected cause, identify those paths and block them in VPLEX. If the affected paths are not evident, engage a coach for assistance. Switch and array collaborations can be done once the impact has ended.
 
Background
VPLEX to Array I/O Flow
VPLEX operates much like a clustered host environment. Each director, which receives I/O from the host, is responsible for completing that I/O. Each director has multiple paths across both fabrics to each LUN. Each VPLEX director is responsible for balancing the I/O across all the available active paths.
 
VPLEX Fault Detection and Mitigation
The primary method VPLEX uses to detect and mitigate path faults is to monitor the ratio of timeouts on each path. If 90 percent of the I/O on a path times out in two consecutive 15-second periods, VPLEX banishes the affected path and no longer uses it. VPLEX then periodically probes the banished path and unbanishes it if I/O completes successfully on that path again.
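
The rule described above can be sketched as a small model. This is an illustration only, not VPLEX code: the class and method names are hypothetical, and the sketch assumes that "90 percent" means a ratio of at least 0.90, that an idle period counts as a healthy period, and that a single successful probe is enough to unbanish a path.

```python
from dataclasses import dataclass

BANISH_TIMEOUT_RATIO = 0.90  # from the article: 90 percent of I/O timing out
CONSECUTIVE_PERIODS = 2      # from the article: two consecutive 15-second periods


@dataclass
class PathMonitor:
    """Hypothetical model of one back-end path (illustration only)."""
    bad_periods: int = 0
    banished: bool = False

    def end_of_period(self, ios: int, timeouts: int) -> None:
        """Record the I/O totals observed during one 15-second period."""
        if self.banished:
            return
        # Assumption: a period with no I/O is treated as healthy.
        ratio = timeouts / ios if ios else 0.0
        if ratio >= BANISH_TIMEOUT_RATIO:
            self.bad_periods += 1
        else:
            self.bad_periods = 0  # the bad periods must be consecutive
        if self.bad_periods >= CONSECUTIVE_PERIODS:
            self.banished = True

    def probe(self, succeeded: bool) -> None:
        """Assumption: one successful probe unbanishes the path."""
        if self.banished and succeeded:
            self.banished = False
            self.bad_periods = 0


# A path that times out on 80 percent of its I/O is never banished.
flaky = PathMonitor()
for _ in range(10):
    flaky.end_of_period(ios=1000, timeouts=800)
print(flaky.banished)  # False
```

In this model a path can lose most of its I/O to timeouts indefinitely and still remain in rotation, because it never crosses the banish threshold.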
 
How Problems Can Arise
Because the threshold for banishing a path is high, probing is frequent, and the threshold for unbanishing a path is low, VPLEX may continue to use unhealthy paths. As a result, VPLEX may send a significant amount of I/O through poorly performing paths or paths that have experienced soft faults. This I/O either times out or takes an excessive amount of time to complete, which significantly elevates response times across all host paths and may result in performance data unavailability for any or all hosts connected to the VPLEX.
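
A rough calculation shows how one degraded path can dominate response times. The function below is a hypothetical sketch: it assumes I/O is spread evenly across all paths, and every number in the example (four paths, 1 ms healthy latency, a 10-second timeout, a 50 percent timeout ratio) is illustrative rather than taken from VPLEX.

```python
def mean_latency_ms(n_paths: int, healthy_ms: float,
                    timeout_ms: float, timeout_ratio: float) -> float:
    """Mean I/O latency when one of n_paths is degraded.

    Assumes each path carries an equal share of the I/O, and that a
    fraction timeout_ratio of the I/O on the degraded path waits the
    full timeout_ms before completing or failing.
    """
    degraded_ms = timeout_ratio * timeout_ms + (1 - timeout_ratio) * healthy_ms
    return ((n_paths - 1) * healthy_ms + degraded_ms) / n_paths


# Illustrative values only: one of four paths times out on half of its I/O,
# which is well below the 90 percent banish threshold.
print(mean_latency_ms(n_paths=4, healthy_ms=1.0,
                      timeout_ms=10_000.0, timeout_ratio=0.5))  # 1250.875
```

Under these assumed numbers the average response time rises from 1 ms to more than a second, even though three of the four paths are healthy and the degraded path is never banished.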


Recommendation
Upgrade to VPLEX GeoSynchrony target code 6.2 P3 or later for improved handling of the above conditions. Refer to the release notes for more details about back-end path management functionality.

Article Properties


Affected Product: VPLEX Series

Product: VPLEX for All Flash, VPLEX Series, VPLEX VS2, VPLEX VS6

Last Published Date: 20 Nov 2020

Version: 2

Article Type: How To