
Article Number: 000167989


VPLEX: VS6: How to interpret FC driver responses to Emulex chip errors in GeoSynchrony 6.1 and later

Summary: This article addresses an issue on the VS6 I/O module in which a reset caused by chip errors may cause a director to restart.

Article Content


Symptoms



The VS6 hardware uses RainfallX SLICs for FC connectivity, and the SLICs have Emulex Lancer chips. Since the release of the VS6, there have been several reports of Emulex Lancer chip errors. These issues can have several different underlying causes (hardware issues, Lancer chip bugs). This article gives guidance on what actions to take if the VPLEX FC driver reports an issue. It applies to all GeoSynchrony 6.0.x versions and to the 6.1 code version.

Symptom:
The following call-home events were introduced in GeoSynchrony 6.1 and may be sent due to issues with Lancer Chip errors on VPLEX:

0x8a66901D FC/29 (FC_CHIP_RESET_NEEDED) - Chip error requires manual reset by Dell EMC Customer Service.
0x8a66901E FC/30 (FC_CHIP_UNRECOVERABLE_ERROR_DETECTED) - Unrecoverable chip error requires manual reset by Dell EMC Customer Service.
 
In 6.1 P2 and earlier code versions, the chip can in rare circumstances become unresponsive, resulting in stuck I/O (which is likely to result in Data Unavailable (DU)). Manual intervention from Dell EMC Customer Service is required in this case. Additional symptoms in this case are streaming stdf/10 Abort Task events in /var/log/VPlex/cli/firmware.log on the management-server, and possibly the stdf/53 event (call-home event 0x8a349035), which indicates that stuck I/O has been detected on the system.
 
Note: In 6.1 P2 and earlier, if the Lancer chip has produced a chip dump, the director firmware restarts. In addition to the above symptoms, this issue may be associated with a director death or STONITH (other directors attempt to restart the firmware of a director). These appear in /var/log/VPlex/cli/firmware.log on the management-server as the following firmware events, which have been present since GeoSynchrony 5.2.x:

zpem/215 (0x8a4830d7) - The fault code on the director has changed
ecom/602 (0x8a4c925a) - GeoSynchrony has failed

Event that will be reported in the firmware logs:
nmg/58 (no call-home) - director STONITH
 
The VPLEX ports associated with the Lancer chip may not come back up following the firmware restart (that automatically occurs when the Lancer chip dump is detected), in which case a manual reset by DELL EMC Customer Service is needed.
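
To check a copy of the firmware log for the events named above, a small script can count how often each event tag appears. The following is a minimal sketch, not a Dell EMC tool. It assumes only that each log line contains the event tag (for example stdf/10) as plain text; the real line layout of firmware.log is not described in this article, and the function names and sample lines are hypothetical.

```python
import re
from collections import Counter

# Event tags and descriptions taken from this article.
# The layout of a firmware.log line is NOT known here; we only assume
# that the tag itself (e.g. "stdf/10") appears somewhere in the line.
EVENTS = {
    "FC/29": "FC_CHIP_RESET_NEEDED (call-home 0x8a66901D)",
    "FC/30": "FC_CHIP_UNRECOVERABLE_ERROR_DETECTED (call-home 0x8a66901E)",
    "stdf/10": "Abort Task",
    "stdf/53": "stuck I/O detected (call-home 0x8a349035)",
    "zpem/215": "fault code on the director has changed (0x8a4830d7)",
    "ecom/602": "GeoSynchrony has failed (0x8a4c925a)",
    "nmg/58": "director STONITH (no call-home)",
}

# Map lowercase tags back to the spelling used in EVENTS.
CANONICAL = {tag.lower(): tag for tag in EVENTS}

# Match a tag only when it is not part of a longer token,
# so "stdf/10" does not match inside "stdf/105".
TAG_RE = re.compile(
    r"(?<![\w/])(" + "|".join(re.escape(t) for t in EVENTS) + r")(?![\w/])",
    re.IGNORECASE,
)


def count_events(lines):
    """Count occurrences of each known event tag in an iterable of lines."""
    counts = Counter()
    for line in lines:
        for match in TAG_RE.finditer(line):
            counts[CANONICAL[match.group(1).lower()]] += 1
    return counts


def summarize(counts):
    """Return one readable line per event tag that was seen at least once."""
    return [f"{tag}: {counts[tag]} x {EVENTS[tag]}" for tag in EVENTS if counts[tag]]


if __name__ == "__main__":
    # Invented sample lines for illustration only; not real log output.
    sample = [
        "example line stdf/10 Abort Task",
        "example line stdf/10 Abort Task",
        "example line nmg/58 STONITH",
    ]
    for row in summarize(count_events(sample)):
        print(row)
```

To scan a real log copy, pass an open file object to count_events, for example the file /var/log/VPlex/cli/firmware.log from the management-server. A high and growing stdf/10 count together with any FC/29, FC/30, or stdf/53 hit matches the symptom pattern described in this section.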

Cause

The Lancer chip on a VS6 FC SLIC has crashed.  These issues can have several different underlying causes (HW issues, Lancer firmware issues). The system has detected this and is reporting the issue.

Notes:
  • The VS6 Midplane P/N 303-330-000B has a known manufacturing flaw that can lead to a higher than normal rate of parity errors, which can cause the Emulex chip to fail.
  • Several occurrences of Emulex chip failures have been associated with alpha particles (see the "Additional Information" section below for more information on alpha particles).

Resolution

Permanent Fix:
Fixes for firmware driver issues are in GeoSynchrony 6.2 and later.
Newer Emulex firmware (FW) has been added to GeoSynchrony 6.2 and later.

If the VPLEX is running GeoSynchrony 6.2 and either of the call-home events listed in the Symptoms section is received, contact Dell EMC Customer Service immediately and mention this article.

Several causes of Emulex chip faults are addressed by the new FW, including alpha-particle-related chip failures.
 
GeoSynchrony 6.1 Patch 1 and later code has been enhanced to:
  • Detect chip errors where no chip dump is produced.
  • Perform chip-level firmware resets to recover the Emulex chips in cases where recoverable errors are detected.
As of GeoSynchrony 6.2, VPLEX has added a chip monitor on its Front-End (FE) and Back-End (BE) ports. In cases where the chip becomes unresponsive, VPLEX gathers a chip dump and performs a hard reset of the chip. In these cases, the director FW resets, which results in a director reset call-home. Collect diagnostics gathers all the data required for investigation.


NOTE: If data unavailable (DU) is ongoing in association with one of the call-home events listed in the Symptoms section, the VPLEX ports associated with the Lancer chip in question cannot be manually disabled in the VPlexcli. The VPLEX Remote Support team needs to take other actions to manually reset the port(s) and end the DU. Reach out to the Dell EMC Customer Service team if you receive any of these call-home events, and mention this article.

Additional Information

*Alpha particles and what they mean with regard to electronic components.

The alpha particle is a helium nucleus; it consists of two protons and two neutrons. It contains no electrons to balance the two positively charged protons. Alpha particles are therefore positively charged particles moving at high speeds. These can produce what is known today as "soft errors".

The alpha particle emission issue was first observed in 1977. Radioactive decay in the chip package usually causes a soft error by alpha particle emission. The positively charged alpha particle travels through the semiconductor and disturbs the distribution of electrons there. If the disturbance is large enough, a digital signal can change from a 0 to a 1 or vice versa.

If detected, a soft error may be corrected by rewriting correct data in place of erroneous data. Highly reliable systems use error correction to correct soft errors on the fly. However, in many systems, it may be impossible to determine the correct data, or even to discover that an error is present at all. In addition, before the correction can occur, the system may have crashed, in which case the recovery procedure must include a reboot. Soft errors involve changes to data (the electrons in a storage circuit, for example) but not changes to the physical circuit itself (the atoms). If the data is rewritten, the circuit will work perfectly again.
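
As an illustration of correcting a soft error on the fly, the sketch below uses a textbook Hamming(7,4) code: four data bits are stored with three parity bits, and any single flipped bit can be located and flipped back. This is a generic teaching example only. It is not the error-correction scheme used by the Emulex Lancer chip or by VPLEX, and the function names are chosen for this example.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword.

    Bit positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def correct(codeword):
    """Return (corrected codeword, error position 1..7, or 0 if no error)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The three syndrome bits spell out the position of a single flipped bit.
    position = s1 + 2 * s2 + 4 * s3
    if position:
        c[position - 1] ^= 1  # rewrite the erroneous bit
    return c, position


def decode(codeword):
    """Extract the 4 data bits from a (corrected) codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    stored = encode(data)
    upset = list(stored)
    upset[4] ^= 1  # simulate a soft error: one bit flips in storage
    fixed, position = correct(upset)
    print("flipped position:", position)
    print("data recovered:", decode(fixed) == data)
```

A code like this corrects only one flipped bit per codeword. If two bits flip, the syndrome points at the wrong position, which is one way a system can fail to determine the correct data as described above.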
 

Article Properties


Affected Product

VPLEX Series

Product

VPLEX Series, VPLEX VS6

Last Published Date

20 Nov 2020

Version

2

Article Type

Solution