It's plugging hot!
After vlad died last week, I rebuilt him with a new hard drive in the main system RAID array. This drive was twice the size of the old one – 160GiB, not 80GiB – so I had a bunch of spare space not being used. Yesterday, I bought another 160GiB drive, and decided to test the whole SATA hotplug thing…
It works. Beautifully.
However, I wouldn’t recommend trying it without LVM on your side, and probably the RAID subsystem too. Here’s what I did.
First, check that I really do know what my RAID configuration is:
$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda2[0] sdb2[1]
      78019584 blocks [2/2] [UU]

unused devices: <none>
$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sun May 14 18:37:29 2006
     Raid Level : raid1
     Array Size : 78019584 (74.41 GiB 79.89 GB)
    Device Size : 78019584 (74.41 GiB 79.89 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Apr 20 18:47:27 2008
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 368229f2:0e38f898:b369f97a:73d67d7e
         Events : 0.14487488

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
I have sda2 and sdb2 in the array, and it’s fully working (so even if I do pull the wrong drive out, I’ll still have my data safe). Then make sure that I know which drive is which:
$ cat /sys/block/sda/device/model
ST380815AS
$ cat /sys/block/sdb/device/model
ST3160815AS
So, sdb is the new, larger drive and sda is the older, smaller one. The first job is to ensure that the old drive isn’t expected to be there. I suspect that I could have just yanked out the drive and let the RAID layer deal with the sudden non-existence of one of its drives, like it’s meant to do, but I was already about to do something that my 25+ years’ experience with computers said was dangerous, so I didn’t want to invite more disaster.
Therefore, I told the RAID system to drop the old drive from the array:
$ sudo mdadm --manage /dev/md0 --fail /dev/sda2
That gave me a rude email in my inbox telling me that I’d lost a drive in my RAID array.
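If you'd rather not rely on the email alone, the failed state can also be read back from /proc/mdstat, where a failed member carries an "(F)" marker after its slot number. The following is only a minimal sketch: `failed_members` is a name made up for illustration, and the function takes an optional file argument purely so it can be tried on saved mdstat text.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): print "array: member" for every
# md member flagged "(F)" in mdstat-style text.
# $1 is the file to read; it defaults to /proc/mdstat.
failed_members() {
    awk '/^md[0-9]+ :/ {
        # Member fields look like "sda2[0]" or, once failed, "sda2[0](F)".
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\[[0-9]+\]\(F\)$/) {
                dev = $i
                sub(/\[.*/, "", dev)   # strip the "[slot](F)" suffix
                print $1 ": " dev
            }
        }
    }' "${1:-/proc/mdstat}"
}
```

Run with no arguments straight after the `--fail` step, I'd expect it to print the array and the partition that was just failed, and nothing at all on a healthy array.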
Then… the moment of truth. I opened up the case, found the drive I wanted[1], and pulled out the data cable. I got a bunch of scary-looking messages in the syslog. Everything still seemed to be working. No sparks. No blue smoke. Mahler 5 played on.
Something of an anticlimax, really.
After plugging in the new drive, I got another bunch of messages in the syslog telling me that the new drive was /dev/sdd. Still no blue smoke. Comets failed to pass overhead. No two-headed lambs were reported in the village outside the castle.
So now for putting it all back together. First, create some partitions on the new, virgin disk. (Aha… that’s where the blood came from):
$ sudo cfdisk /dev/sdd
Then, in quick succession, add the new partition to the RAID array, and remove the old drive completely:
$ sudo mdadm --manage /dev/md0 --add /dev/sdd2
$ sudo mdadm --manage /dev/md0 --remove /dev/.static/dev/sda2
Finally, wait until it’s rebuilt, and check that it’s all OK:
$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sun May 14 18:37:29 2006
     Raid Level : raid1
     Array Size : 78019584 (74.41 GiB 79.89 GB)
    Device Size : 78019584 (74.41 GiB 79.89 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Apr 20 18:47:27 2008
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 368229f2:0e38f898:b369f97a:73d67d7e
         Events : 0.14487488

    Number   Major   Minor   RaidDevice State
       0       8       50        0      active sync   /dev/sdd2
       1       8       18        1      active sync   /dev/sdb2
That looks like what I was expecting. The one remaining problem now is that it’s still only 74GiB in size, despite having considerably more than that in the underlying volumes. This calls for some enlargement. First, grow the RAID volume to the maximum size allowed by the partitions it’s sitting on:
$ sudo mdadm --grow /dev/md0 --size max
This starts a resync process:
$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdd2[0] sdb2[1]
      156039232 blocks [2/2] [UU]
      [==========>..........]  resync = 50.2% (78459776/156039232) finish=20.5min speed=62884K/sec

unused devices: <none>
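Both the rebuild after `--add` and this resync show up in /proc/mdstat as a progress line, so waiting for either can be scripted rather than watched. Here is a rough sketch of a polling loop. The function names and the 30-second default interval are my own choices for illustration, and as far as I know `mdadm --wait /dev/md0` does much the same job on mdadm versions that have that option.

```shell
#!/bin/sh
# Hypothetical helpers: detect and wait for md sync activity.

# Succeeds (exit status 0) while the mdstat-style text in file $1
# (default /proc/mdstat) contains a resync, recovery or reshape line.
sync_in_progress() {
    grep -Eq '(resync|recovery|reshape) *=' "${1:-/proc/mdstat}"
}

# Sleep in $2-second steps (default 30) until no sync line is left.
wait_for_sync() {
    while sync_in_progress "$1"; do
        sleep "${2:-30}"
    done
}
```

Something like `wait_for_sync && sudo mdadm --detail /dev/md0` would then only print the array details once everything has settled.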
Secondly, tell LVM that the physical volume that is contained in the RAID array should be made bigger:
$ sudo pvresize /dev/md0
  Physical volume "/dev/md0" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized
$ sudo vgdisplay primary
  --- Volume group ---
  VG Name               primary
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  29
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                12
  Open LV               9
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               148.81 GB
  PE Size               4.00 MB
  Total PE              38095
  Alloc PE / Size       17305 / 67.60 GB
  Free PE / Size        20790 / 81.21 GB
  VG UUID               VlBSZF-p0DK-Gm7I-sZjE-LBA0-TW5q-EgGmQV
That’s up to the size I was expecting… I’d say that’s all done (once my RAID array finishes syncing in 15 minutes’ time or so). Total server downtime from all this: nil.
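As a sanity check on "the size I was expecting", the numbers from mdstat and vgdisplay can be reconciled with a little arithmetic. This sketch assumes that mdstat counts 1 KiB blocks and that this version of LVM prints binary units (GiB and MiB) under the labels "GB" and "MB". The figures in the output above are consistent with both assumptions. The function names are made up.

```shell
#!/bin/sh
# Hypothetical helpers for cross-checking the sizes reported above.

# Convert a count of 1 KiB blocks (as printed in /proc/mdstat) to GiB.
blocks_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1048576 }'
}

# Convert a number of LVM physical extents to GiB.
# $1 = extent count, $2 = extent size in MiB.
extents_to_gib() {
    awk -v n="$1" -v mib="$2" 'BEGIN { printf "%.2f\n", n * mib / 1024 }'
}

blocks_to_gib 156039232    # grown md0        -> 148.81
extents_to_gib 38095 4     # Total PE x 4 MiB -> 148.81
extents_to_gib 20790 4     # Free PE x 4 MiB  -> 81.21
```

The array and the volume group agree at 148.81, and the free space works out at the 81.21 that vgdisplay reports.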
[1] OK, I’ll admit it: I pulled the cable out of the wrong drive. A few seconds later, after the audio buffer emptied, xmms stopped playing Mahler 5. However, plugging it back in and restarting the LVM volume group that the drive was in, plus NFS, got it all back.
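With more than two drives in the case, checking models one `cat` at a time is easy to get wrong, as the footnote shows. A small loop over sysfs lists every sd* device with its model in one go. This is only a sketch: `list_disk_models` is an invented name, and the directory argument exists just so the function can be pointed at a test tree instead of the real /sys/block.

```shell
#!/bin/sh
# Hypothetical helper: print "name<TAB>model" for each sd* disk found
# under a sysfs-style directory. $1 defaults to /sys/block.
list_disk_models() {
    root="${1:-/sys/block}"
    for dev in "$root"/sd*; do
        # Skip entries without a readable model file (and the unexpanded
        # glob when no sd* devices exist at all).
        [ -r "$dev/device/model" ] || continue
        # sysfs may pad the model string with trailing spaces; trim them.
        model=$(sed 's/ *$//' "$dev/device/model")
        printf '%s\t%s\n' "${dev##*/}" "$model"
    done
}
```

Matching the printed model against the label on the physical drive before reaching for a cable would have kept Mahler 5 playing.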