Module 7: Server Cluster Maintenance and Troubleshooting
Troubleshooting Cluster Service
The key point of this section is to give the students the tools and techniques
that are useful in reducing the time it takes to find a root cause for common
Cluster service problems.
• Troubleshooting Tools: The tools that are used to help troubleshoot a
problem with Cluster service are the same tools that are used to help
troubleshoot a server running Microsoft Windows® 2000.
• Examining the Cluster Log: Cluster service logs every configuration
change and problem to the cluster log. It is important for the
students to become familiar with the syntax of the log.
• Troubleshooting Network Communications: Students need to know that
there are different troubleshooting paths to follow depending on whether
the network problem is a node-to-node or a client-to-node problem.
• SCSI Configuration Problems: SCSI is less reliable than Fibre Channel. There
can be problems with the SCSI controller, SCSI termination, and SCSI
cabling.
• Group and Resource Failures: Remind students to keep dependency trees
vertical so that, if a group fails, it is easier to trace the failure to the
resource that caused it.
• Quorum Log Corruption: If Cluster service cannot write information to
the quorum log, it will not start. You can attempt to reset the quorum
log, or you can delete the quorum log and let Cluster service create a
new log.
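The dependency-tree advice above can be illustrated with a short sketch. It assumes that you have written down which resources each resource depends on; the failed resources whose own dependencies are all healthy are then the likely root causes. The resource names and the find_root_causes helper are hypothetical and are not part of Cluster service or any cluster API.

```python
def find_root_causes(depends_on, failed):
    """Return the failed resources whose own dependencies are all healthy.

    depends_on maps a resource name to the resources it depends on.
    failed is the collection of resources currently in a failed state.
    """
    failed = set(failed)
    return sorted(
        resource
        for resource in failed
        if not any(dep in failed for dep in depends_on.get(resource, ()))
    )


# Hypothetical vertical dependency chain: each resource depends on one other.
depends_on = {
    "File Share": ["Network Name"],
    "Network Name": ["IP Address"],
    "IP Address": ["Physical Disk"],
    "Physical Disk": [],
}

# The IP Address failure took the two resources above it offline as well.
failed = ["File Share", "Network Name", "IP Address"]
print(find_root_causes(depends_on, failed))
```

With a vertical chain, the function returns a single candidate. With a wide, flat tree, several unrelated resources can qualify at once, which is why the vertical layout shortens the search.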
Instructor Setup for a Lab
Lab Strategy
This lab is designed to prepare the students to use Backup and Clusrest.exe to
perform the proper backup and restore procedures. Students will uninstall
Cluster service in preparation for the Network Load Balancing (NLB) portion
of the course. NLB and Cluster service cannot run on the same computer.
Lab A: Cluster Maintenance
To conduct this lab:
• Read through the lab carefully, paying close attention to the instructions and
details.
• Students will need the Clusrest utility from c:\moc\2087\labfiles\mscs
• Students work in teams of two, grouped together by their shared bus.
• Help the students determine whether they are Node A or Node B. In these
exercises each node performs a specific task in the backup and restoration
procedures. Both nodes will uninstall Cluster service.
Overview
Cluster Maintenance
Troubleshooting Cluster Service
*****************************
ILLEGAL FOR NON-TRAINER USE******************************
Server cluster maintenance and troubleshooting are considered two separate
disciplines. Maintenance is continuous, whereas troubleshooting has a
beginning when the problem is discovered, and an end when the problem is
resolved. The two disciplines are complementary, however. When every
troubleshooting procedure that you follow fails, you will need to rebuild the
cluster from a backup tape that was generated during a maintenance procedure.
After completing this module, you will be able to:
• Perform the steps to successfully back up a server cluster.
• Perform the steps to successfully restore a server cluster.
• Evict a node from a server cluster.
• Identify the tools that are necessary to troubleshoot a cluster failure.
• Interpret the entries on the cluster log.
• Identify and troubleshoot common server cluster failures: network
communications, small computer system interface (SCSI) configuration
problems, group, resource, and quorum failures.
Topic Objective
To provide an overview of the module topics and objectives.
Lead-in
In this module, we will cover Cluster maintenance in the form of backing up and
restoring a cluster, and troubleshooting Cluster service.
Cluster Maintenance
Backup
Restoring the First Node
Restoring Cluster Disks
Restoring the Second Node
Evicting a Node
Cluster service uses the self-tuning features of Microsoft® Windows® 2000 and
requires very little maintenance. The only day-to-day maintenance operation
that you need to perform is to back up the cluster.
Under special circumstances, a node in the cluster may need to be replaced, for
example, when your organization decides to perform a hardware upgrade. In
this situation, you need to evict a node from the cluster and add the upgraded
node to the cluster.
Topic Objective
To introduce the fundamental tasks for maintaining a server cluster.
Lead-in
The only maintenance performed on a cluster is backing up and restoring
Cluster service.
Backup
Backing Up the System State
Backing Up the Local Disk
Backing Up the Cluster Disk
Backing up the cluster is no different from backing up Microsoft
Windows 2000 Advanced Server. It is recommended that you perform regular
backups by using the Windows 2000 Backup program (NTBackup), or other
compatible backup programs. Additional backup agents are still necessary to
back up applications running on the cluster, such as Microsoft SQL Server™
and Microsoft Exchange.
A cluster-aware backup program will be able to perform the same backup
operations as NTBackup, especially with regard to backing up the System State
and the cluster configuration database.
Backing Up the System State
The configuration information for the cluster is located in the registry on each
node (HKEY_LOCAL_MACHINE\Cluster). The Backup tool that is included
with Windows 2000 backs up the cluster database when you back up each
node’s system state.
NTBackup backs up the system state on each node. The system state includes:
• The quorum log.
• The local registry.
• The Cluster registry hive.
Topic Objective
To describe how to back up the system state, node, and cluster disks.
Lead-in
A backup of the cluster includes the system state, the node, and the cluster
disk.
Backing Up the Local Disk
Follow standard computer backup procedures to back up the operating system
and the data on the local drives. You must also back up key cluster files on the
local disks.
On each node, back up the cluster database files:
%systemroot%\cluster\CLUSDB
%systemroot%\cluster\CLUSDB.LOG
On each node, back up the Cluster service files:
%systemroot%\cluster\*.*
Note: Backup is essential, but regular testing to make sure that backups and
restores actually work as expected is also necessary. A good practice is to
schedule test backup and restore operations frequently.
Backing Up the Cluster Disks
It is critical to back up cluster files on the quorum disk and data on the cluster
disks, because Cluster service will write information to files in the
\mscs directory on the quorum disk and cluster-aware applications will likely be
placing data on the cluster disk. Because either node of the cluster could own
the cluster disk resource at any time, it is possible for each node to back up the
data on the drive. However, having each node back up data would require you
to install backup hardware and software on each cluster node, which is not the
best solution.
One possibility is to identify a nonclustered server running Windows 2000
Server and schedule it to back up data remotely through a network connection
to the Cluster disk’s administrative share or a hidden share that you create. For
example, you might create FBackup$, GBackup$, HBackup$, and WBackup$
file share resources on the virtual server for the root of drives F, G, H, and W.
F, G, and H would be cluster disks with data, and W would be the drive letter
for the quorum disk. Hidden shares would not appear in a browse list and you
could configure them to allow access only to members of the Backup Operators
group.
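As a rough illustration of the naming scheme in the example above, the following sketch builds the UNC paths that a remote backup server would connect to. The virtual server name CLUSTERVS and the hidden_backup_shares helper are hypothetical; the sketch only builds path strings and does not create shares or set permissions.

```python
def hidden_backup_shares(virtual_server, drive_letters):
    """Map each drive letter to the UNC path of its hidden backup share."""
    shares = {}
    for letter in drive_letters:
        letter = letter.upper()
        # A trailing "$" keeps the share out of the browse list.
        shares[letter] = "\\\\{}\\{}Backup$".format(virtual_server, letter)
    return shares


# F, G, and H are data disks and W is the quorum disk, as in the example above.
# "CLUSTERVS" is a hypothetical virtual server name.
for drive, unc_path in hidden_backup_shares("CLUSTERVS", "FGHW").items():
    print(drive, unc_path)
```

Connecting through the virtual server name rather than a node name means that the backup job reaches the disk regardless of which node currently owns it.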
Restoring the First Node
Steps For Restoring a Server Cluster:
1. Restore the first node
2. Restore the cluster disks
3. Restore the second node
4. Perform node testing
The following sections describe the procedure for restoring a server cluster in
the event that both nodes and the cluster disk fail. It is possible that any one of
the components in the cluster could fail independently. In the case of a failed
component, you follow the same procedure for restoring that specific
component.
Performing a complete restore of a server cluster is a straightforward process.
1. Restore a node of the cluster.
2. Restore the cluster disks of the restored first node.
3. Restore the remaining node of the cluster.
4. Perform node testing.
Topic Objective
To list the steps for restoring a server cluster and describe how to restore the
first node.
Lead-in
In the event of a complete cluster failure, you first restore a node.
Delivery Tip
This page lists the four steps that are involved in restoring a complete cluster
and covers the first step, Restoring a Node. Details about the other three steps
follow on the next pages.
Restoring a Node of the Cluster
To restore a node in a server cluster, you follow the same procedure that you
would use in restoring a Windows 2000 operating system.
1. Install a fresh copy of Windows 2000 Advanced Server on the node to be
restored.
2. Log on as Administrator and restore the system and boot partition, system
state, and associated volumes from the backup. Make sure that you select
the option to restore the system state to the original location in the backup
program.
3. Restart the node.
4. Perform the steps for restoring the cluster disk. These steps follow in the
next section.
Note: The difference between the time of the backup and the time of the
restoration to the new computer may affect the computer account on the domain
controller. You may have to join a workgroup and then rejoin the domain.
Restoring Cluster Disks
Restoring Disk Signature Files
Restoring the Data on the Cluster Disk
Restoring the Cluster Configuration Files
After you have restored a node in the cluster, you must restore the cluster disks.
Restoring the cluster disks involves restoring the disk signature file that the
cluster uses to identify the disk. You may also need to restore a cluster disk if
you are running out of disk space or if a disk failure is impending.
It can be costly to make mistakes while replacing a cluster disk; the
consequence can be the irrecoverable loss of all of the data on that disk. If the
disk is the quorum disk, the server cluster's configuration data is at risk.
Before restoring the cluster disks, stop Cluster service on all of the nodes of the
cluster. Stopping Cluster service will ensure that it will not attempt to start,
which would place a lock on the disks.
Restoring Disk Signature Files
Because Cluster service relies on disk signatures to identify and mount
volumes, if a disk is replaced, or if the bus is re-enumerated, Cluster service
will not find the disk signatures that it is expecting and will not function.
You can run Dumpcfg.exe to extract the disk signature from the registry and
write it to the new disk. Cluster service will recognize the new disk and
successfully start the resource.
Note: Dumpcfg.exe is a resource kit utility that restores an old disk
signature file to a new disk.
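To make the idea of a disk signature concrete, here is a minimal sketch that reads the signature from a copy of a disk's first sector. It assumes a standard MBR-partitioned disk, where the signature is a 4-byte little-endian value at offset 0x1B8; that layout is general MBR knowledge rather than something stated in this module. The read_disk_signature helper is hypothetical, works only on bytes that you supply, and is not a substitute for Dumpcfg.exe.

```python
import struct

# Assumption: standard MBR layout, with the 4-byte disk signature at 0x1B8.
MBR_SIGNATURE_OFFSET = 0x1B8


def read_disk_signature(mbr_sector):
    """Return the disk signature from a 512-byte MBR sector as 8 hex digits."""
    if len(mbr_sector) < 512:
        raise ValueError("expected a full 512-byte MBR sector")
    (signature,) = struct.unpack_from("<I", mbr_sector, MBR_SIGNATURE_OFFSET)
    return "{:08X}".format(signature)


# Fabricated sector for illustration only; no real disk is read or written.
sector = bytearray(512)
struct.pack_into("<I", sector, MBR_SIGNATURE_OFFSET, 0x1234ABCD)
print(read_disk_signature(bytes(sector)))
```

Recording the signature of each cluster disk while the cluster is healthy gives you a value to compare against after a disk is replaced.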
If the disk that you are replacing is the quorum disk, use Cluster Administrator
to move the quorum to a different disk, and proceed in the replacement of the
disk. After the disk is brought back online, you can move the quorum back to
the new disk.
Topic Objective
To describe how to restore the cluster disk by restoring signature files, data and
cluster configuration files.
Lead-in
Restoring a cluster disk involves restoring the disk signature file.
For Your Information
Be familiar with Q224075, “Disk Replacement for Windows 2000 Server
Cluster,” found on the Student compact disk.
Restoring the Data on the Cluster Disk
Restoring the data on the cluster disk is the same as a restore of a local disk.
Before restoring the data, make sure that you have associated each cluster disk
to the same drive letter as before the disaster or failure. When restoring, make
sure that you restore the data to the original location and verify the integrity
after you have completed the restore.
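One possible way to verify integrity after a restore is to compare file hashes against a reference copy of the data. The sketch below assumes that such a reference copy exists (for example, a test restore to a second location); the function names are hypothetical, and this is only one approach, not a procedure prescribed by the backup program.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its digest."""
    digests = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            digests[os.path.relpath(full_path, root)] = file_digest(full_path)
    return digests


def compare_trees(reference_root, restored_root):
    """Return (missing, changed) relative paths in the restored tree."""
    reference = tree_digests(reference_root)
    restored = tree_digests(restored_root)
    missing = sorted(set(reference) - set(restored))
    changed = sorted(
        path
        for path in reference
        if path in restored and reference[path] != restored[path]
    )
    return missing, changed
```

Two empty lists from compare_trees indicate that every file in the reference copy is present in the restored location with identical content.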
Restoring the Cluster Configuration Files
The cluster configuration files include the cluster database and the quorum log.
The cluster database is the database of configuration data (cluster objects and
their settings) that is pertinent to the cluster. This database is the product of
the cluster registry key checkpoint and the changes that are recorded in the
quorum log. Every node of the cluster maintains a local copy of this
database in the cluster hive of its local registry.
After you have restored the disk signature file and data, you can start the server
cluster. If the cluster files were not restored, or were corrupted, the following
procedure can restore the cluster database from the registry of the restored node.
Identify the node on which you will restore the database (in the case of a
disaster restore, this will be the first node that you have restored). Restore the
cluster database on the selected node by restoring the system state. Restoring
the system state creates a temporary folder under the %Systemroot%\Cluster
folder called Cluster_backup.
You use NTBackup to restore the cluster configuration files, which places them
on the node. You then restore the cluster database to the node’s registry by
using the Clusrest.exe tool. Clusrest.exe restores both the quorum log
(Quorum.log) file and the cluster database (Clusdb).
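Before you run Clusrest.exe, it can help to confirm that the system state restore actually produced the temporary Cluster_backup folder described above. The following sketch performs only that check. The function names are hypothetical, treating an empty folder as "not ready" is an assumption, and the sketch deliberately does not invoke Clusrest.exe.

```python
import os


def cluster_backup_folder(systemroot):
    """Return the expected path of the temporary Cluster_backup folder."""
    return os.path.join(systemroot, "Cluster", "Cluster_backup")


def ready_for_clusrest(systemroot):
    """Return True if the Cluster_backup folder exists and is not empty."""
    folder = cluster_backup_folder(systemroot)
    # Assumption: an empty folder means the system state restore did not
    # place the cluster configuration files on the node.
    return os.path.isdir(folder) and bool(os.listdir(folder))


# The C:\WINNT fallback is an assumption used when SystemRoot is not set.
systemroot = os.environ.get("SystemRoot", r"C:\WINNT")
if ready_for_clusrest(systemroot):
    print("Cluster_backup found; the node is ready for Clusrest.exe.")
else:
    print("Cluster_backup not found; restore the system state first.")
```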
Note: The Clusrest.exe tool is available in the Windows 2000 Resource Kit.
This tool is a free download from www.microsoft.com.