Wednesday, April 21, 2010

How to replace a dead hard disk in Solaris using SVM/SDS

For this example, consider the following two disks:

# format < /dev/null

Searching for disks...done

AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@1f,4000/scsi@3/sd@0,0
1. c1t1d0
/pci@1f,4000/scsi@3/sd@1,0

Specify disk (enter its number):

# metastat -p
d300 -m d310 d320 1
d310 1 1 c3t1d0s0
d320 1 1 c3t2d0s0
d3 -m d13 d23 1
d13 1 1 c1t0d0s3
d23 1 1 c1t1d0s3
d5 -m d15 d25 1
d15 1 1 c1t0d0s5
d25 1 1 c1t1d0s5
d1 -m d11 d21 1
d11 1 1 c1t0d0s1
d21 1 1 c1t1d0s1
d0 -m d10 d20 1
d10 1 1 c1t0d0s0
d20 1 1 c1t1d0s0

The failed disk in this example is c1t1d0, which mirrors c1t0d0. The first thing we need to do is check whether this disk held any state database (metadb) replicas:

# metadb -i

flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s6
a p luo 8208 8192 /dev/dsk/c1t0d0s6
a p luo 16 8192 /dev/dsk/c1t0d0s7
a p luo 8208 8192 /dev/dsk/c1t0d0s7
a p luo 16 8192 /dev/dsk/c1t1d0s6
a p luo 8208 8192 /dev/dsk/c1t1d0s6
a p luo 16 8192 /dev/dsk/c1t1d0s7
a p luo 8208 8192 /dev/dsk/c1t1d0s7

There are 4 metadb replicas on the failed disk, two on each of slices s6 and s7. We can delete these records; metadb -d removes every replica on each slice it is given.

# metadb -d /dev/dsk/c1t1d0s6 /dev/dsk/c1t1d0s7

The metadb records for the failed disk should now be gone. (This does not actually delete anything from the disk, since the disk is dead. It only tells SVM that no replicas live there anymore.)

# metadb -i
flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s6
a p luo 8208 8192 /dev/dsk/c1t0d0s6
a p luo 16 8192 /dev/dsk/c1t0d0s7
a p luo 8208 8192 /dev/dsk/c1t0d0s7

Next we need to unconfigure the device. Run ‘cfgadm -al’ to find its attachment point, then unconfigure it:

# cfgadm -al | grep c1t1d0
c1::dsk/c1t1d0 disk connected configured unknown

# cfgadm -c unconfigure c1::dsk/c1t1d0


The disk should now be unconfigured from the system:

# cfgadm -al | grep c1t1d0

c1::dsk/c1t1d0 disk connected unconfigured unknown
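
If this check is part of a script, the occupant state can be pulled out of the cfgadm output instead of being read by eye. A minimal sketch, assuming the usual cfgadm -al column order (Ap_Id, Type, Receptacle, Occupant, Condition); the function name occupant_state and the AWK variable are made up for this example:

```shell
# Print the occupant state (configured / unconfigured) of one disk.
# Reads `cfgadm -al` output on stdin.
# Assumption: columns are Ap_Id, Type, Receptacle, Occupant, Condition.
# Assumption: on Solaris set AWK=nawk, the legacy /usr/bin/awk may lack -v.
occupant_state() {
    disk=$1
    "${AWK:-awk}" -v id="dsk/${disk}" '
        # Ap_Id looks like "c1::dsk/c1t1d0"; compare the part after "::"
        split($1, part, "::") == 2 && part[2] == id { print $4 }'
}
```

Used as `cfgadm -al | occupant_state c1t1d0`, this should print unconfigured at this point in the procedure.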

You can now go ahead and physically replace the drive.

Once the new drive is in place, we need to configure it.

# cfgadm -c configure c1::dsk/c1t1d0

To check it has been configured run:

# cfgadm -al | grep c1t1d0
c1::dsk/c1t1d0 disk connected configured unknown

Now copy the partition table (VTOC) across from the working disk to the new disk. prtvtoc prints the VTOC of the good disk, and fmthard -s - reads it from standard input and writes it as the label of the new disk:

# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
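
Note that nothing later in this procedure recreates the metadb replicas that were deleted from c1t1d0s6 and c1t1d0s7. Once the new disk has its VTOC, they can be added back with metadb -a (the -c option sets the number of replicas per slice). A sketch that rebuilds those commands from a metadb -i listing taken before the deletion; the function name readd_replica_cmds and the file name metadb.before are made up for this example, and the commands are only printed for review, not run:

```shell
# Print (do not run) the metadb commands that would recreate the
# replicas a disk used to hold.
# Reads saved `metadb -i` output on stdin; the device is the last field,
# and each replica is one line, so lines per slice = replicas per slice.
# Assumption: on Solaris set AWK=nawk, the legacy /usr/bin/awk may lack -v.
readd_replica_cmds() {
    disk=$1
    "${AWK:-awk}" -v prefix="/dev/dsk/${disk}s" '
        index($NF, prefix) == 1 { count[$NF]++ }
        END {
            for (slice in count)
                printf "metadb -a -c %d %s\n", count[slice], slice
        }' | sort
}
```

With the first metadb -i listing in this post saved as metadb.before, `readd_replica_cmds c1t1d0 < metadb.before` prints one `metadb -a -c 2` line for each of /dev/dsk/c1t1d0s6 and /dev/dsk/c1t1d0s7.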

The next step is to run metareplace on all the metadevices which lie on the dead disk c1t1d0. To identify these you can run ‘metastat -p’ again:

# metastat -p
d300 -m d310 d320 1
d310 1 1 c3t1d0s0
d320 1 1 c3t2d0s0
d3 -m d13 d23 1
d13 1 1 c1t0d0s3
d23 1 1 c1t1d0s3 <-----HERE
d5 -m d15 d25 1
d15 1 1 c1t0d0s5
d25 1 1 c1t1d0s5 <-----HERE
d1 -m d11 d21 1
d11 1 1 c1t0d0s1
d21 1 1 c1t1d0s1 <-----HERE
d0 -m d10 d20 1
d10 1 1 c1t0d0s0
d20 1 1 c1t1d0s0 <----and HERE

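As you can see from the above, c1t1d0 holds one submirror of each of the mirrors d0, d1, d3 and d5.

Re-enable each of those components in place with metareplace -e, which also starts the resync: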

# metareplace -e d0 c1t1d0s0
# metareplace -e d1 c1t1d0s1
# metareplace -e d3 c1t1d0s3
# metareplace -e d5 c1t1d0s5
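
On a box with many mirrors, the list of metareplace commands can be derived from the metastat -p output rather than typed by hand. A rough sketch, assuming the simple layout shown above (each submirror is a single-slice concat, listed as "dNN 1 1 slice"); the function name metareplace_cmds is made up for this example, and the commands are only printed so they can be checked before being run:

```shell
# Print (do not run) one metareplace command per submirror slice that
# lives on the given disk. Reads `metastat -p` output on stdin.
# Assumption: on Solaris set AWK=nawk, the legacy /usr/bin/awk may lack -v.
metareplace_cmds() {
    disk=$1
    "${AWK:-awk}" -v prefix="${disk}s" '
        # Mirror line, e.g. "d0 -m d10 d20 1": record the parent of
        # every submirror named on it (the trailing number is skipped)
        $2 == "-m" {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^d[0-9]+$/) parent[$i] = $1
        }
        # One-slice concat, e.g. "d20 1 1 c1t1d0s0", on the wanted disk
        $2 == "1" && $3 == "1" && index($4, prefix) == 1 { slice[$1] = $4 }
        END {
            for (sm in slice)
                if (sm in parent)
                    printf "metareplace -e %s %s\n", parent[sm], slice[sm]
        }' | sort
}
```

Running `metastat -p | metareplace_cmds c1t1d0` against the configuration above prints the same four commands listed in this step.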


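The submirrors should now start resynchronising. When this is complete, all of the metadevices will be in the Okay state. The loop below polls the full metastat output for resync progress every 5 seconds until you interrupt it (metastat -p only prints the configuration, not the state, so it cannot be used for this):

# while :; do metastat | grep -i resync; sleep 5; done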
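
The polling loop above runs until it is interrupted. A variant that exits by itself once the resync has finished may be handier in a script. This is only a sketch: the function name wait_for_resync is made up, and it assumes metastat prints a "Resync in progress" line for every mirror that is still resyncing:

```shell
# Poll until no mirror reports a resync in progress, then return.
# Assumption: metastat prints a "Resync in progress: N % done" line
# for each mirror that is still resyncing.
wait_for_resync() {
    interval=${1:-30}    # seconds between polls, default 30
    # grep prints the progress lines and fails once there are none left
    while metastat | grep -i 'resync in progress'; do
        sleep "$interval"
    done
}
```

For example, `wait_for_resync 5` polls every 5 seconds. Once it returns, run metastat one more time to confirm that every submirror reports Okay.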
