Article Number: 000019983

OneFS: How to reset the CELOG database and clear events in OneFS 7.x

Summary: The Clusterwide Event Log (CELOG) provides a single location for logging events that happen on the cluster and provides a single point from which notifications about the events are generated, including sending alert emails and SNMP traps. Events are used to maintain a picture of cluster health for various cluster components. ...

Article Content

Instructions

BEFORE YOU START: This procedure is for clusters running OneFS 7.x only. Do not perform these steps on OneFS 8.x.

If a root cause analysis (RCA) is required, capture a full set of logs and fully investigate the CELOG problem before you begin.

Introduction

The Clusterwide Event Log (CELOG) provides a single location for logging events that happen on the cluster and provides a single point from which notifications about the events are generated, including sending alert emails and SNMP traps. Events are used to maintain a picture of cluster health for various cluster components.

This article describes how to manually reset and clear the CELOG database. This can be useful when the system is not automatically clearing historical cluster events.

IMPORTANT!
If your cluster is logging multiple repeating events about the same issue, do not perform this procedure until you have confirmed that the events are resolved.

Procedure

IMPORTANT!
This procedure does not work if your cluster is in SmartLock compliance mode. The compadmin user does not have privileged access to run the rm commands described in the procedure.

NOTE
If the /var partition is full, new CELOG database files cannot be created and the procedure below will fail. If you think this might be an issue, see the following article:

Event notification: The /var partition is near capacity (95% used), see LKB 000169344.
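As a quick pre-check, the sketch below reads the used-capacity percentage of a filesystem from `df -P` and compares it with a limit. The function names `usage_percent` and `var_has_room` are hypothetical, and the default limit of 95 is taken from the event title above. It checks only the node it runs on, so treat it as a starting point rather than a supported tool.

```shell
# usage_percent PATH: print the used-capacity percentage (digits only)
# of the filesystem that holds PATH, using the portable df -P layout.
usage_percent() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# var_has_room [PATH] [LIMIT]: succeed when usage is below LIMIT percent.
# Defaults (assumptions for illustration): PATH=/var, LIMIT=95.
var_has_room() {
    path="${1:-/var}"
    limit="${2:-95}"
    used="$(usage_percent "$path")"
    [ -n "$used" ] && [ "$used" -lt "$limit" ]
}
```

For example, `var_has_room /var 95 || echo "/var is at or near capacity"` prints a warning only when the local /var partition is 95% full or more.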

When the /var partition is no longer at or near capacity, perform the following procedure:

Open an SSH connection on any node in the cluster and log in using the "root" account.
To gather diagnostic information for Isilon Technical Support, run the following commands in order. Replace <SR> with the number of the Service Request that is open for this issue. If no Service Request is open, use any other identifiable name, such as your company name, for the directory that holds the saved files:

mkdir -p /ifs/.ifsvar/db/celog /ifs/data/Isilon_Support/<SR> /ifs/data/Isilon_Support/celog_backups
isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_monitor.core $(pgrep isi_celog_monitor)'

isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_coalescer.core $(pgrep isi_celog_coalescer)'

isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/<SR>/$(hostname)_$(date +"%Y-%m-%dT%H.%M.%S")_isi_celog_notification.core $(pgrep isi_celog_notifi)' ;sleep 120
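To avoid editing <SR> by hand in three places, the commands above can be generated from a single value. The sketch below only prints the three `gcore` commands for review and runs nothing. The function name `gcore_cmds` is hypothetical, and the shortened `isi_celog_notifi` pattern is kept exactly as it appears in the article's command.

```shell
# gcore_cmds SR: print the three gcore commands with SR substituted.
# Nothing is executed; $(hostname), $(date ...) and $(pgrep ...) stay
# literal so that each node evaluates them when the command is run.
gcore_cmds() {
    sr="$1"
    # Each pair is "core-file suffix:pgrep pattern".
    for pair in isi_celog_monitor:isi_celog_monitor \
                isi_celog_coalescer:isi_celog_coalescer \
                isi_celog_notification:isi_celog_notifi; do
        name="${pair%%:*}"
        pattern="${pair##*:}"
        printf "isi_for_array -sX 'gcore -c /ifs/data/Isilon_Support/%s/\$(hostname)_\$(date +\"%%Y-%%m-%%dT%%H.%%M.%%S\")_%s.core \$(pgrep %s)'\n" \
            "$sr" "$name" "$pattern"
    done
}
```

Running `gcore_cmds MyCompany` prints the commands with `MyCompany` as the directory name, ready to be checked and pasted into the root session.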
Reset the CELOG database by running the following commands, in order. Alternatively, you can run the script listed below the commands.
1. Disable CELOG services by running the following three commands:
  
  isi services -a celog_coalescer disable
  isi services -a celog_monitor disable
  
  isi services -a celog_notification disable
2. Stop all CELOG processes that might be lingering on the cluster:
  
  isi_for_array -sX 'pkill isi_celog_'
3. Create a backup of the CELOG database:
  
  mv -vf /ifs/.ifsvar/db/celog/* /ifs/data/Isilon_Support/celog_backups/
4. Clear the CELOG database by running the following three commands:
  
  isi_for_array -sX 'rm -f /var/db/celog/*'
  
  isi_for_array -sX 'rm -f /var/db/celog_master/*.db'
  
  rm -f /ifs/.ifsvar/db/celog/*.db
5. Enable CELOG services by running the following three commands:
  
  isi services -a celog_coalescer enable
  
  isi services -a celog_monitor enable
  
  isi services -a celog_notification enable
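Steps 1 through 5 can also be wrapped in a small shell function. The sketch below is an illustration written for this rewrite, not the script supplied with the original article. The names `run`, `celog_reset`, and `DRY_RUN` are hypothetical. By default it only prints the commands, so nothing is changed until `DRY_RUN` is set to 0.

```shell
#!/bin/sh
# Hypothetical wrapper for steps 1-5 above; not the vendor-supplied script.
# DRY_RUN defaults to 1: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# run CMD: print CMD in dry-run mode, otherwise execute it with sh -c.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1"
    fi
}

celog_reset() {
    # Step 1: disable the CELOG services.
    for svc in celog_coalescer celog_monitor celog_notification; do
        run "isi services -a $svc disable"
    done
    # Step 2: stop lingering CELOG processes on every node.
    run "isi_for_array -sX 'pkill isi_celog_'"
    # Step 3: back up the CELOG database files stored under /ifs.
    run "mv -vf /ifs/.ifsvar/db/celog/* /ifs/data/Isilon_Support/celog_backups/"
    # Step 4: clear the database files on each node and under /ifs.
    run "isi_for_array -sX 'rm -f /var/db/celog/*'"
    run "isi_for_array -sX 'rm -f /var/db/celog_master/*.db'"
    run "rm -f /ifs/.ifsvar/db/celog/*.db"
    # Step 5: re-enable the CELOG services.
    for svc in celog_coalescer celog_monitor celog_notification; do
        run "isi services -a $svc enable"
    done
}
```

Calling `celog_reset` prints the eleven commands in order for review. Setting `DRY_RUN=0` before the call executes them as root. The sketch does not stop when a command fails, which mirrors typing the commands by hand, so check the output of each step.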
Verify that the CELOG processes restarted:

isi_for_array -sX "pgrep celog | wc -l | sed 's/[^ 3]/FAIL/'"

The output should display a value of 3 for each node. If any node reports a different value, one or more of the CELOG processes did not start on that node. Wait 120 seconds and try again. If one or more processes still do not start, contact Isilon Technical Support.
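In the verification command, the `sed` expression replaces the first character that is neither a space nor `3` with `FAIL`, so a clean `3` passes through unchanged. A more explicit version of the same check, plus the "wait and try again" step, might look like the sketch below. The helper names `celog_count_ok`, `celog_running`, and `retry` are hypothetical, and `celog_running` checks only the node it runs on.

```shell
# celog_count_ok COUNT: succeed only when COUNT, ignoring any padding
# whitespace that wc -l may add, is exactly 3.
celog_count_ok() {
    count="$(printf '%s' "$1" | tr -d '[:space:]')"
    [ "$count" = "3" ]
}

# celog_running: apply the check to the CELOG processes on the local node.
celog_running() {
    celog_count_ok "$(pgrep celog | wc -l)"
}

# retry ATTEMPTS DELAY CMD...: run CMD up to ATTEMPTS times, sleeping
# DELAY seconds between attempts; succeed as soon as CMD succeeds.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For example, `retry 2 120 celog_running || echo "CELOG processes did not start"` checks once, waits 120 seconds if needed, checks again, and prints the message only if both attempts fail.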
Send a test event to verify that CELOG is working properly:

isi events sendtest

This should generate a test event that will be listed in the output of the isi events command.
Run the isi events command and verify that the test event is listed. If not, wait 120 seconds and try again. If it is still not listed, contact Isilon Technical Support.
Gather cores and logs by running the following two commands, where <SR> is the Service Request number or other identifiable name:

isi_gather_info --local -f /ifs/data/Isilon_Support/<SR>
isi_gather_info
