APG RAID rebuild

gisjxy

Posted 2010-9-5 16:42:51

Logical View
Address     Type               Manufacturer/Model    Capacity  Status
---------------------------------------------------------------------------
d0b0t0d0    RAID 1 (Mirrored)  DPT      RAID-1       17522MB   Drive Failed
  d0b1t0d0  Disk Drive (DASD)  FUJITSU  MAP3367NP    0MB       Optimal
  d0b0t0d0  Disk Drive (DASD)  FUJITSU  MAP3367NP    0MB       Optimal
d0b0t1d0    RAID 1 (Mirrored)  DPT      RAID-1       17522MB   Degraded
  d0b0t3d0  Disk Drive (DASD)  DPT      --UNKNOWN--  0MB       Missing
  d0b0t1d0  Disk Drive (DASD)  FUJITSU  MAP3367NP    0MB       Optimal
d0b0t2d0    RAID 1 (Mirrored)  DPT      RAID-1       17522MB   Drive Failed
  d0b1t2d0  Disk Drive (DASD)  FUJITSU  MAP3367NP    0MB       Optimal
  d0b0t2d0  Disk Drive (DASD)  FUJITSU  MAP3367NP    0MB       Optimal

Solution type:
Workaround

Solution Description:
The following workaround solved the problem:

1) The customer power-cycled node A on site so that the missing data
disk on node A became visible again.

2) The workaround below was performed to delete and re-create the
faulty RAID without deleting data, copying all data from the disks on
node B to node A.

3) We waited until the rebuild of the disks had finished to make sure
that the second disk on node A was working correctly.

The node is now up and running with all RAIDs in the Optimal state. If
this problem recurs frequently with the same data disk on node A,
consider replacing node A, but at present it is working fine.

Workaround to delete and re-create a faulty RAID without deleting data:

1. Collect information for further analysis.

Log the information below from both nodes of the AP and send the result
to the owner of this solution.

AP Command:
hostname
prcstate
date/t
time/t
raidutil -L all
raidutil -K
raidutil -e soft d0
raidutil -e recov d0
raidutil -e nonrecov d0
raidutil -e status d0
mktr <YYMMDD>-<HHMM> -c
2. Determine the source disk for the RAID re-create.

When the RAID is deleted and re-created a disk must be chosen as the
source of the data for the RAID.

Contact APG40 second line support for assistance.
3. Connect to the node which does not contain the source disk.

AP Command:
hostname
4. Shut down the node.

AP Command:
prcboot -s
5. Connect to the node which contains the source disk.

AP Command:
hostname
6. Set the "Cluster Server" and ACS_PRC_ClusterControl startup type to
Disabled.

Do not disable the "Cluster Disk" device as this will prevent the RAID
from being deleted.

Windows 2003 Command:
sc config Clussvc start= Disabled
sc config ACS_PRC_ClusterControl start= Disabled

Windows NT Command:
echo REGEDIT4 > C:\TEMP\Cluster_Disabled.reg
echo. >> C:\TEMP\Cluster_Disabled.reg
echo [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusSvc] >> C:\TEMP\Cluster_Disabled.reg
echo "Start"=dword:00000004 >> C:\TEMP\Cluster_Disabled.reg
echo. >> C:\TEMP\Cluster_Disabled.reg
echo [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ACS_PRC_ClusterControl] >> C:\TEMP\Cluster_Disabled.reg
echo "Start"=dword:00000004 >> C:\TEMP\Cluster_Disabled.reg

type C:\TEMP\Cluster_Disabled.reg

regedit /s C:\TEMP\Cluster_Disabled.reg

del C:\TEMP\Cluster_Disabled.reg

Example:
C:\> echo REGEDIT4 > C:\TEMP\Cluster_Disabled.reg

C:\> echo. >> C:\TEMP\Cluster_Disabled.reg

C:\> echo [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusSvc] >> C:\TEMP\Cluster_Disabled.reg

C:\> echo "Start"=dword:00000004 >> C:\TEMP\Cluster_Disabled.reg

C:\> echo. >> C:\TEMP\Cluster_Disabled.reg

C:\> echo [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ACS_PRC_ClusterControl] >> C:\TEMP\Cluster_Disabled.reg

C:\> echo "Start"=dword:00000004 >> C:\TEMP\Cluster_Disabled.reg

C:\> type C:\TEMP\Cluster_Disabled.reg
REGEDIT4

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusSvc]
"Start"=dword:00000004

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ACS_PRC_ClusterControl]
"Start"=dword:00000004

C:\> regedit /s C:\TEMP\Cluster_Disabled.reg

C:\> del C:\TEMP\Cluster_Disabled.reg
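
The echo sequence above builds a small REGEDIT4 file one line at a time. As an illustration only, the same file text can be rendered by a short Python function, which makes it easier to see which service gets which "Start" value. Python is not part of the AP procedure, and the names `render_reg`, `SERVICES_KEY` and the `START_*` constants are hypothetical. The values 2, 3 and 4 are the standard Windows service startup codes for Automatic, Manual and Disabled.

```python
# Sketch only: renders the same REGEDIT4 text that the echo commands build.
# render_reg and the constants below are hypothetical names, not AP tools.

SERVICES_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"

# Standard Windows service "Start" values.
START_AUTOMATIC = 2
START_MANUAL = 3
START_DISABLED = 4


def render_reg(start_values):
    """Return REGEDIT4 file text for a {service_name: start_value} mapping."""
    lines = ["REGEDIT4"]
    for service, start in start_values.items():
        lines.append("")  # blank line before each key, like 'echo.'
        lines.append("[{}\\{}]".format(SERVICES_KEY, service))
        lines.append('"Start"=dword:{:08x}'.format(start))
    return "\n".join(lines) + "\n"


# Content equivalent to Cluster_Disabled.reg in this step.
disabled = render_reg({
    "ClusSvc": START_DISABLED,
    "ACS_PRC_ClusterControl": START_DISABLED,
})
print(disabled)
```

Comparing the printed text with the output of the `type` command is a quick way to spot a wrong value or a misspelled key before running `regedit /s`.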
7. Set BIOS "Cluster Support" to Disabled (Off).

AP Command:
raidutil +cluster off
8. Reboot the node.

Do not use prcboot. The normal "prcboot" command sets the "Cluster
Server" startup type to automatic.

There may be no response from the terminal until the AP finishes
rebooting after the shutdown command is entered. This will take several
minutes.

Windows 2003 Command:
shutdown /f /r /t 0

Windows NT Command:
shutdown /f /r %COMPUTERNAME%
9. Check the size of the RAID.

Make a note of the size of the RAID that will be deleted and re-created.

Note: If the capacities of the disks differ, the size of the RAID must
be set when it is re-created.

AP Command:
raidutil -L raid

Example where the RAID size is 17432:
C:\> raidutil -L raid
Address     Type               Manufacturer/Model    Capacity  Status
---------------------------------------------------------------------------
d0b0t0d0    RAID 1 (Mirrored)  DPT      RAID-1       17432MB   Optimal
  d0b0t0d0  Disk Drive (DASD)  FUJITSU  MAT3073NP    17522MB   Optimal
  d0b1t0d0  Disk Drive (DASD)  FUJITSU  MAH3182MP    17432MB   Optimal
d0b0t1d0    RAID 1 (Mirrored)  DPT      RAID-1       17432MB   Optimal
  d0b0t1d0  Disk Drive (DASD)  FUJITSU  MAT3073NP    17522MB   Optimal
  d0b1t1d0  Disk Drive (DASD)  FUJITSU  MAH3182MP    17432MB   Optimal
d0b0t2d0    RAID 1 (Mirrored)  DPT      RAID-1       17432MB   Failed
  d0b0t2d0  Disk Drive (DASD)  FUJITSU  MAT3073NP    17522MB   Failed drive
  d0b1t2d0  Disk Drive (DASD)  FUJITSU  MAH3182MP    17432MB   Failed drive
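
The note in this step amounts to a simple check on the printout. A minimal sketch in Python (not an AP tool), assuming the RAID size and member capacities have been read by hand from `raidutil -L raid`; the function name `size_argument` is hypothetical. It returns the noted RAID size when the member capacities differ, and nothing when they match, in which case the size does not have to be set.

```python
# Sketch only: decides whether the RAID size has to be set on re-create.
# size_argument is a hypothetical helper; values are typed in from the
# 'raidutil -L raid' printout taken before the RAID is deleted.

def size_argument(raid_size_mb, member_capacities_mb):
    """Return the size to use with -s, or None if -s is not needed."""
    if len(set(member_capacities_mb)) <= 1:
        return None  # equal capacities: size does not have to be set
    return raid_size_mb  # capacities differ: reuse the noted RAID size


print(size_argument(17432, [17522, 17432]))  # disks from the example above
print(size_argument(17522, [17522, 17522]))  # two disks of equal capacity
```

In the example printout the noted RAID size (17432) matches the smaller of the two member disks, but the helper deliberately reuses the noted value rather than recomputing it.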
10. Delete the RAID.

Only delete the RAIDs that are Failed, Impacted or Dead.

If it is not possible to delete the RAID then follow the note
"Additional steps to delete the RAID" below and then continue with the
next step.

AP Command:
raidutil -D d0b0t<#>d0

Examples:
Delete RAID d0b0t0d0:
C:\> raidutil -D d0b0t0d0
d0b0t0d0

Delete RAID d0b0t1d0:
C:\> raidutil -D d0b0t1d0
d0b0t1d0

Delete RAID d0b0t2d0:
C:\> raidutil -D d0b0t2d0
d0b0t2d0
11. Check that the RAID has been deleted.

If the RAID has not been deleted then follow the note "Additional steps
to delete the RAID" below and then continue with the next step.

AP Command:
raidutil -L logical

Expected Printout:
Failure:Can't find component by address

12. Re-create the RAID.

The first disk specified after the "-g" parameter is used as the source
of the data when re-creating the RAID.

The "-s" parameter is only required if the size of the RAID has to be
set as described above. If the "-s" parameter is not specified then the
size of the RAID is set to the capacity of the first disk specified
after the "-g" parameter.

If it is not possible to re-create the RAID then replace the other,
faulty, node and repeat the procedure.
If a spare node is not immediately available then follow the note
"Disconnect SCSI cables" and continue with this procedure. This will
leave the RAID deleted and allow the AP to run as a single node. The
faulty node should be left shut down as it will not be able to become
active. If this is done, the RAID must be re-created when the faulty
node is replaced, using the note "RAID re-create during node change"
below.

AP Command:
raidutil -l 1 -g d0b0t<#>d0,d0b1t<#>d0 [-i -s <size>]

Examples:
Re-create RAID d0b0t0d0:
C:\> raidutil -l 1 -g d0b0t0d0,d0b1t0d0
Created:
RAID 1

Re-create RAID d0b0t1d0:
C:\> raidutil -l 1 -g d0b0t1d0,d0b1t1d0
Created:
RAID 1

Re-create RAID d0b0t2d0:
C:\> raidutil -l 1 -g d0b0t2d0,d0b1t2d0
Created:
RAID 1

Re-create RAID d0b0t0d0 with size 17432MB:
C:\> raidutil -l 1 -g d0b0t0d0,d0b1t0d0 -i -s 17432
Created:
RAID 1

Re-create RAID d0b0t1d0 with size 17432MB:
C:\> raidutil -l 1 -g d0b0t1d0,d0b1t1d0 -i -s 17432
Created:
RAID 1

Re-create RAID d0b0t2d0 with size 17432MB:
C:\> raidutil -l 1 -g d0b0t2d0,d0b1t2d0 -i -s 17432
Created:
RAID 1
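
The command syntax in this step can be assembled mechanically, which helps keep the disk order and the optional size consistent across the three RAIDs. The sketch below is illustrative Python, not an AP tool; `recreate_command` and its parameters are hypothetical. It assumes the addressing used in the examples above, with d0b0t<#>d0 listed first after "-g" as the source disk.

```python
# Sketch only: builds the re-create command line shown in this step.
# recreate_command is a hypothetical helper; it does not run raidutil.

def recreate_command(target, size_mb=None):
    """Return the raidutil command line for re-creating RAID d0b0t<target>d0."""
    if not isinstance(target, int) or target < 0:
        raise ValueError("target must be a non-negative integer")
    # The first disk after -g is the source of the data.
    group = "d0b0t{0}d0,d0b1t{0}d0".format(target)
    command = "raidutil -l 1 -g {}".format(group)
    if size_mb is not None:
        # Only needed when the RAID size has to be set explicitly.
        command += " -i -s {}".format(size_mb)
    return command


print(recreate_command(0))
print(recreate_command(2, size_mb=17432))
```

The printed lines are the same as the first and last examples above and are meant to be reviewed before being typed on the node.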
13. Check that the RAID has been re-created.

If the RAID has not been re-created then contact the next level of
support.

AP Command:
raidutil -L logical

Example:
C:\> raidutil -L logical
Address     Type               Manufacturer/Model    Capacity  Status
---------------------------------------------------------------------------
d0b0t0d0    RAID 1 (Mirrored)  DPT      RAID-1       17522MB   Reconstruct 0%
d0b0t1d0    RAID 1 (Mirrored)  DPT      RAID-1       17522MB   Reconstruct/Pending
d0b0t2d0    RAID 1 (Mirrored)  DPT      RAID-1       17522MB   Reconstruct/Pending
14. Set BIOS "Cluster Support" to Enabled (On).

AP Command:
raidutil +cluster on
15. Set the "Cluster Server" and ACS_PRC_ClusterControl startup type to
Automatic and Manual respectively.

Note: In APZ 11.3 and later the ACS_PRC_ClusterControl service startup
type should be set to automatic. This will be done by prcboot in the
next step.

Windows 2003 Command:
sc config ClusSvc start= Auto
sc config ACS_PRC_ClusterControl start= Auto

Windows NT Command:
echo REGEDIT4 > C:\TEMP\Cluster_Enabled.reg
echo. >> C:\TEMP\Cluster_Enabled.reg
echo [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusSvc] >> C:\TEMP\Cluster_Enabled.reg
echo "Start"=dword:00000002 >> C:\TEMP\Cluster_Enabled.reg
echo. >> C:\TEMP\Cluster_Enabled.reg
echo [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ACS_PRC_ClusterControl] >> C:\TEMP\Cluster_Enabled.reg
echo "Start"=dword:00000003 >> C:\TEMP\Cluster_Enabled.reg

type C:\TEMP\Cluster_Enabled.reg

regedit /s C:\TEMP\Cluster_Enabled.reg

del C:\TEMP\Cluster_Enabled.reg

Example:
C:\> echo REGEDIT4 > C:\TEMP\Cluster_Enabled.reg

C:\> echo. >> C:\TEMP\Cluster_Enabled.reg

C:\> echo [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusSvc] >> C:\TEMP\Cluster_Enabled.reg

C:\> echo "Start"=dword:00000002 >> C:\TEMP\Cluster_Enabled.reg

C:\> echo. >> C:\TEMP\Cluster_Enabled.reg

C:\> echo [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ACS_PRC_ClusterControl] >> C:\TEMP\Cluster_Enabled.reg

C:\> echo "Start"=dword:00000003 >> C:\TEMP\Cluster_Enabled.reg

C:\> type C:\TEMP\Cluster_Enabled.reg
REGEDIT4

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusSvc]
"Start"=dword:00000002

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ACS_PRC_ClusterControl]
"Start"=dword:00000003

C:\> regedit /s C:\TEMP\Cluster_Enabled.reg

C:\> del C:\TEMP\Cluster_Enabled.reg
16. Reboot the node.

AP Command:
prcboot
17. Check the status of the RAIDs.

The re-created RAIDs should have the status Reconstruct or
Reconstruct/Pending.

If the RAID status has returned to Failed then replace the other,
faulty, node and repeat the procedure.
If a spare node is not immediately available then follow the note
"Disconnect SCSI cables" below and repeat this procedure. This will
leave the RAID deleted and allow the AP to run as a single node. The
faulty node should be left shut down until a replacement is available.
If this is done, the RAID must be re-created when the faulty node is
replaced, using the note "RAID re-create during node change" below.

AP Command:
raidutil -L logical
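
The decision in this step can be summarised as a status check over the re-created RAIDs. A small Python sketch, assuming the address and status of each RAID are typed in by hand from the `raidutil -L logical` printout; `raids_not_rebuilding` is a hypothetical name. Accepting Optimal alongside Reconstruct and Reconstruct/Pending is an assumption for a RAID that has already finished rebuilding.

```python
# Sketch only: flags RAIDs whose status is not an expected post-re-create
# state. raids_not_rebuilding is a hypothetical helper, not an AP command.

def raids_not_rebuilding(statuses):
    """Return the addresses that are not rebuilding or already rebuilt.

    statuses maps a RAID address to the status text from the printout,
    e.g. "Reconstruct 0%" or "Reconstruct/Pending".
    """
    ok_prefixes = ("Reconstruct", "Optimal")  # Optimal is an assumption
    return [address for address, status in statuses.items()
            if not status.startswith(ok_prefixes)]


observed = {
    "d0b0t0d0": "Reconstruct 0%",
    "d0b0t1d0": "Reconstruct/Pending",
    "d0b0t2d0": "Failed",
}
print(raids_not_rebuilding(observed))
```

Any address in the returned list corresponds to the case described above, where the RAID has gone back to Failed and the other node has to be replaced or isolated.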
18. Wait for all resources to come online.

Note: The resources owned by the shutdown node will not come online.

If the shutdown node is going to be replaced then the procedure is
complete.
19. Reboot the shutdown node.

Note: Do not perform this step if the RAID was not re-created above; in
that case the faulty node should be left shut down.

AP Command:
fcc_reset other
20. Perform a health check of the AP.
