d0b1t2d0   Disk Drive (DASD)  FUJITSU MAP3367NP   0MB       Optimal
d0b0t2d0   Disk Drive (DASD)  FUJITSU MAP3367NP   0MB       Optimal
Solution type:
Workaround
Solution Description:
The following workaround solved the problem properly:
1) The customer power cycled node A on site so that the missing data
disk on node A became visible.
2) Performed the workaround below to delete and re-create the faulty
RAID without deleting data, copying all data from the disks on node B
to node A.
3) Waited until the rebuild of the disks finished to make sure that the
second disk on node A was working properly.
The node is now up and running with all RAIDs in Optimal state. If this
problem recurs frequently with the same data disk on node A, consider
replacing node A. Currently it is working fine.
Workaround to delete and re-create the faulty RAID without deleting data:
1. Collect information for further analysis.
Log the information below from both nodes of the AP and send the result
to the owner of this solution.
AP Command:
hostname
prcstate
date/t
time/t
raidutil -L all
raidutil -K
raidutil -e soft d0
raidutil -e recov d0
raidutil -e nonrecov d0
raidutil -e status d0
mktr <YYMMDD>-<HHMM> -c
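The <YYMMDD>-<HHMM> label passed to mktr can be generated rather than
typed by hand. The sketch below is illustrative only: it assumes a
two-digit year and 24-hour local time, and the helper name mktr_label is
hypothetical (it is not part of the AP tooling).

```python
from datetime import datetime


def mktr_label(now: datetime) -> str:
    """Format a timestamp as <YYMMDD>-<HHMM> (assumed: 2-digit year, 24h clock)."""
    return now.strftime("%y%m%d-%H%M")


# Example with a fixed timestamp: 7 March 2024, 09:15
print(mktr_label(datetime(2024, 3, 7, 9, 15)))  # 240307-0915
```

Using the same label on both nodes makes it easier to match the two
logs when they are sent for analysis.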
2. Determine the source disk for the RAID re-create.
When the RAID is deleted and re-created, a disk must be chosen as the
source of the data for the RAID.
Contact APG40 second line support for assistance.
3. Connect to the node which does not contain the source disk.
AP Command:
hostname
4. Shut down the node.
AP Command:
prcboot -s
5. Connect to the node which contains the source disk.
AP Command:
hostname
6. Set the "Cluster Server" and ACS_PRC_ClusterControl startup type to
Disabled.
Do not disable the "Cluster Disk" device as this will prevent the RAID
from being deleted.
C:\> del C:\TEMP\ClusSvc_Disabled.reg
7. Set BIOS "Cluster Support" to Disabled (Off).
AP Command:
raidutil +cluster off
8. Reboot the node.
Do not use prcboot. The normal "prcboot" command sets the "Cluster
Server" startup type to automatic.
There may be no response from the terminal until the AP finishes
rebooting after the shutdown command is entered. This will take several
minutes.
Windows 2003 Command:
shutdown /f /r /t 0
Windows NT Command:
shutdown /f /r %COMPUTERNAME%
9. Check the size of the RAID.
Make a note of the size of the RAID that will be deleted and re-created.
Note: If the capacities of the disks differ, the size of the RAID must
be set when it is re-created.
AP Command:
raidutil -L raid
Example where the RAID size is 17432:
C:\> raidutil -L raid
Address    Type               Manufacturer/Model  Capacity  Status
---------------------------------------------------------------------------
d0b0t0d0   RAID 1 (Mirrored)  DPT RAID-1          17432MB   Optimal
d0b0t0d0   Disk Drive (DASD)  FUJITSU MAT3073NP   17522MB   Optimal
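If the printout is saved to a file, the RAID size can be read out with a
small script instead of by eye. This is a minimal sketch, assuming each
row of the printout is on a single line, starts with an address of the
form d<n>b<n>t<n>d<n>, and ends with "<capacity>MB <status>". The
function names are hypothetical and the sample text mirrors the example
above.

```python
import re

# One table row: address, free-text description, capacity in MB, status.
ROW = re.compile(r"^(d\d+b\d+t\d+d\d+)\s+(.*?)\s+(\d+)MB\s+(\S+)\s*$")

SAMPLE = """\
Address    Type               Manufacturer/Model  Capacity  Status
---------------------------------------------------------------------------
d0b0t0d0   RAID 1 (Mirrored)  DPT RAID-1          17432MB   Optimal
d0b0t0d0   Disk Drive (DASD)  FUJITSU MAT3073NP   17522MB   Optimal
"""


def parse_rows(printout):
    """Return one dict per table row; header and ruler lines are skipped."""
    rows = []
    for line in printout.splitlines():
        match = ROW.match(line.strip())
        if match:
            address, description, capacity, status = match.groups()
            rows.append({
                "address": address,
                "description": description,
                "capacity_mb": int(capacity),
                "status": status,
            })
    return rows


def raid_sizes(printout):
    """Map each RAID address to its size in MB (rows whose type starts with 'RAID')."""
    return {
        row["address"]: row["capacity_mb"]
        for row in parse_rows(printout)
        if row["description"].startswith("RAID")
    }


print(raid_sizes(SAMPLE))  # {'d0b0t0d0': 17432}
```

The value found this way is the one to pass with "-s" later if the size
has to be set when the RAID is re-created.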
10. Delete the RAID.
Example, deleting RAID d0b0t2d0:
C:\> raidutil -D d0b0t2d0
d0b0t2d0
11. Check that the RAID has been deleted.
If the RAID has not been deleted then follow the note "Additional steps
to delete the RAID" below and then continue with the next step.
AP Command:
raidutil -L logical
Expected Printout:
Failure:Can't find component by address
12. Re-create the RAID.
The first disk specified after the "-g" parameter is used as the source
of the data when re-creating the RAID.
The "-s" parameter is only required if the size of the RAID has to be
set as described above. If the "-s" parameter is not specified then the
size of the RAID is set to the capacity of the first disk specified
after the "-g" parameter.
If it is not possible to re-create the RAID then replace the other,
faulty, node and repeat the procedure.
If a spare node is not immediately available then follow the note
"Disconnect SCSI cables" and continue with this procedure. This will
leave the RAID deleted and allow the AP to run as a single node. The
faulty node should be left shut down, as it cannot become active.
If this is done, the RAID must be re-created when the faulty node is
replaced, using the note "RAID re-create during node change" below.
AP Command:
raidutil -l 1 -g d0b0t<#>d0,d0b1t<#>d0 [-i -s <size>]
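Because the first disk after "-g" becomes the data source, the order of
the two addresses matters. The sketch below only builds the command
string so that it can be reviewed before it is typed; it does not run
anything. It assumes the syntax shown above, with "-i -s <size>" added
as a group only when a size is given. The function name is hypothetical.

```python
import re

# Disk address form used in this procedure, e.g. d0b0t2d0.
ADDRESS = re.compile(r"^d\d+b\d+t\d+d\d+$")


def recreate_command(source_disk, other_disk, size_mb=None):
    """Build the re-create command with the source disk listed first."""
    for disk in (source_disk, other_disk):
        if not ADDRESS.match(disk):
            raise ValueError("unexpected disk address: " + disk)
    if source_disk == other_disk:
        raise ValueError("source and other disk must differ")
    command = "raidutil -l 1 -g {},{}".format(source_disk, other_disk)
    if size_mb is not None:
        # Optional group from the syntax line: [-i -s <size>]
        command += " -i -s {}".format(int(size_mb))
    return command


print(recreate_command("d0b0t2d0", "d0b1t2d0"))
print(recreate_command("d0b1t2d0", "d0b0t2d0", size_mb=17432))
```

In the second call the disk on bus 1 is listed first, so it would be the
source of the data.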
C:\> del C:\TEMP\Cluster_Enabled.reg
16. Reboot the node.
AP Command:
prcboot
17. Check the status of the RAIDs.
The re-created RAIDs should have the status Reconstruct or
Reconstruct/Pending.
If the RAID status has returned to Failed, replace the other, faulty,
node and repeat the procedure.
If a spare node is not immediately available then follow the note
"Disconnect SCSI cables" below and repeat this procedure. This will
leave the RAID deleted and allow the AP to run as a single node. The
faulty node should be left shut down until a replacement is available.
If this is done, the RAID must be re-created when the faulty node is
replaced, using the note "RAID re-create during node change" below.
AP Command:
raidutil -L logical
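The decision in this step can be written as a small lookup on the
Status column. This is a sketch only: the statuses Reconstruct,
Reconstruct/Pending and Optimal are taken from this document, while the
exact text printed for a failed RAID is an assumption (anything starting
with "fail" is treated as failed). The function name is hypothetical.

```python
def next_action(status):
    """Map a RAID status string to the follow-up described in this step."""
    normalized = status.strip().lower()
    if normalized in ("reconstruct", "reconstruct/pending"):
        return "expected: wait for the rebuild to finish"
    if normalized == "optimal":
        return "rebuild complete"
    if normalized.startswith("fail"):  # assumed spelling of the failed status
        return "replace the faulty node and repeat the procedure"
    return "unrecognized status: check manually"


for status in ("Reconstruct", "Reconstruct/Pending", "Optimal", "Failed"):
    print(status, "->", next_action(status))
```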
18. Wait for all resources to come online.
Note: The resources owned by the shutdown node will not come online.
If the shutdown node is going to be replaced then the procedure is
complete.
19. Reboot the shutdown node.
Note: Do not perform this step if the RAID was not re-created above. In
that case the faulty node should be left shut down.
AP Command:
fcc_reset other
20. Perform a health check of the AP.