It's plugging hot!

Sun, Apr 20, 2008

After vlad died last week, I rebuilt him with a new hard drive in the main system RAID array. This drive was twice the size of the old one – 160GiB, not 80GiB – so I had a bunch of spare space not being used. Yesterday, I bought another 160GiB drive, and decided to test the whole SATA hotplug thing…

It works. Beautifully.

However, I wouldn’t recommend trying it without LVM on your side, and probably the RAID subsystem too. Here’s what I did.

First, check that I really do know what my RAID configuration is:
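A minimal sketch of that check, assuming Linux software RAID (md) and an array called md0. The array name is a placeholder, since it isn't given here, and list_members is a hypothetical helper that pulls the non-failed members out of /proc/mdstat:

```shell
#!/bin/sh
# Assumption: the array is md0. Substitute your own md device name.
MD_NAME="${MD_NAME:-md0}"

# Hypothetical helper: read /proc/mdstat-style text on stdin and print the
# members of one array that are not marked failed. Members look like
# "sda2[0]"; a failed one carries an "(F)" suffix.
list_members() {
    awk -v md="$1" '$1 == md && $2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\[[0-9]+\]/ && $i !~ /\(F\)/) {
                sub(/\[.*/, "", $i)
                print $i
            }
    }'
}

# Only touch /proc/mdstat where it exists (Linux with the md driver loaded).
if [ -r /proc/mdstat ]; then
    cat /proc/mdstat
    list_members "$MD_NAME" < /proc/mdstat
fi
# As root, "mdadm --detail /dev/md0" gives a fuller report.
```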

I have sda2 and sdb2 in the array, and it’s fully working (so even if I do pull the wrong drive out, I’ll still have my data safe). Then make sure that I know which drive is which:
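One way to tell the drives apart is by size and model. A rough sketch, assuming a Linux system that lists disks under /sys/block, where each size file counts 512-byte sectors; sectors_to_gib is a made-up helper name:

```shell
#!/bin/sh
# Convert a count of 512-byte sectors to whole GiB, rounding down.
# 2097152 sectors of 512 bytes make exactly 1 GiB.
sectors_to_gib() {
    echo $(( $1 / 2097152 ))
}

# Print size and model for every sdX disk the kernel knows about.
for dev in /sys/block/sd*; do
    [ -r "$dev/size" ] || continue   # also skips the unexpanded glob
    model=$(cat "$dev/device/model" 2>/dev/null || echo unknown)
    echo "${dev##*/}: $(sectors_to_gib "$(cat "$dev/size")") GiB, model: $model"
done
```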

So, sdb is the new, larger drive and sda is the older, smaller one. The first job is to ensure that the old drive isn’t expected to be there. I suspect that I could have just yanked out the drive and let the RAID layer deal with the sudden non-existence of one of its drives, like it’s meant to do, but I was already about to do something that my 25+ years’ experience of computers said was dangerous, so I didn’t want to invite more disaster.

Therefore, I told the RAID system to drop the old drive from the array:
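With mdadm, that is usually done by marking the member as faulty. A hedged sketch, assuming the array is /dev/md0 (a placeholder name) and the old member is /dev/sda2; the run wrapper is an illustrative safety net that only prints the command unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Assumptions: array /dev/md0 (placeholder name); /dev/sda2 is the old member.
MD_DEV="${MD_DEV:-/dev/md0}"
OLD_PART="${OLD_PART:-/dev/sda2}"

# Safety net: print the command unless DRY_RUN=0 is set (and you are root).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "would run: $*"; fi
}

# Mark the old member as faulty so the array stops using it.
run mdadm "$MD_DEV" --fail "$OLD_PART"
```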

That gave me a rude email in my inbox telling me that I’d lost a drive in my RAID array.

Then… the moment of truth. I opened up the case, found the drive I wanted[1], and pulled out the data cable. I got a bunch of scary-looking messages in the syslog. Everything still seemed to be working. No sparks. No blue smoke. Mahler 5 played on.

Something of an anticlimax, really.

After plugging in the new drive, I got another bunch of messages in the syslog telling me that the new drive was /dev/sdd. Still no blue smoke. Comets failed to pass overhead. No two-headed lambs were reported in the village outside the castle.

So now for putting it all back together. First, create some partitions on the new, virgin disk. (Aha… that’s where the blood came from):
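Any partitioning tool will do here, fdisk included. One common shortcut, sketched below, is to copy the partition table from the disk that is staying in the array. This assumes /dev/sdb is that disk, /dev/sdd is the blank one, and the blank disk is at least as large; clone_partition_table is a hypothetical helper that only prints the pipeline unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Assumptions: /dev/sdb is the surviving disk, /dev/sdd the blank one, and the
# blank disk is at least as large as the one being copied.
SRC_DISK="${SRC_DISK:-/dev/sdb}"
NEW_DISK="${NEW_DISK:-/dev/sdd}"

# Copy the partition table (sizes and types) from one disk to another.
# Prints the pipeline unless DRY_RUN=0 is set, because this overwrites
# whatever partition table the target already has.
clone_partition_table() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        sfdisk -d "$1" | sfdisk "$2"
    else
        echo "would run: sfdisk -d $1 | sfdisk $2"
    fi
}

clone_partition_table "$SRC_DISK" "$NEW_DISK"
```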

Then, in quick succession, add the new partition to the RAID array, and remove the old drive completely:
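A sketch of those two steps with mdadm, assuming the array is /dev/md0 and the new partition is /dev/sdd2 (both names are guesses based on the layout of the other drives). As a precaution, the run wrapper prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Assumptions: array /dev/md0, new member /dev/sdd2, old member /dev/sda2.
MD_DEV="${MD_DEV:-/dev/md0}"
NEW_PART="${NEW_PART:-/dev/sdd2}"
OLD_PART="${OLD_PART:-/dev/sda2}"

# Safety net: print the command unless DRY_RUN=0 is set (and you are root).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "would run: $*"; fi
}

# Add the new partition; the array starts rebuilding onto it.
run mdadm "$MD_DEV" --add "$NEW_PART"

# Drop the failed member from the array's records. If the device node for
# the unplugged disk has already vanished, mdadm versions that support it
# accept the keyword "detached" in place of the device name.
run mdadm "$MD_DEV" --remove "$OLD_PART"
```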

Finally, wait until it’s rebuilt, and check that it’s all OK:
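The thing to look for is the member map on the array's status line: [UU] means both halves of a two-disk mirror are up, and an underscore marks a missing one. A small sketch, assuming the array is md0; array_status and is_healthy are hypothetical helpers that read /proc/mdstat-style text:

```shell
#!/bin/sh
# Print the member map (for example "[UU]") for one array, taken from the
# last field of the line that follows the array's own line.
array_status() {
    awk -v md="$1" '
        $1 == md && $2 == ":" { found = 1; next }
        found { print $NF; exit }'
}

# Succeed only if the array was found and no member slot shows "_".
is_healthy() {
    case "$(array_status "$1")" in
        *_*|"") return 1 ;;
        *)      return 0 ;;
    esac
}

# Assumption: the array is md0.
if [ -r /proc/mdstat ]; then
    if is_healthy md0 < /proc/mdstat; then
        echo "md0: all members up"
    else
        echo "md0: degraded or not found"
    fi
fi
```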

That looks like what I was expecting. The one remaining problem now is that it’s still only 74GiB in size, despite having considerably more than that in the underlying volumes. This calls for some enlargement. First, grow the RAID volume to the maximum size allowed by the partitions it’s sitting on:
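With mdadm, growing to the largest size the members allow looks roughly like this. A sketch, assuming the array is /dev/md0 (a placeholder name); the command is only printed unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Assumption: the array is /dev/md0.
MD_DEV="${MD_DEV:-/dev/md0}"

# Safety net: print the command unless DRY_RUN=0 is set (and you are root).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "would run: $*"; fi
}

# Use as much of each member partition as the smallest member allows.
run mdadm --grow "$MD_DEV" --size=max
```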

This starts a resync process:
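Progress shows up in /proc/mdstat. The sketch below picks out the interesting bits, assuming the usual layout of the progress line (a bar, the operation name, a percentage, block counts, then finish= and speed=); sync_progress is a hypothetical helper:

```shell
#!/bin/sh
# Print operation, percentage and estimated time left for any resync or
# recovery reported in /proc/mdstat-style text on stdin.
sync_progress() {
    awk '$2 ~ /^(resync|recovery)$/ && $3 == "=" { print $2, $4, $6 }'
}

if [ -r /proc/mdstat ]; then
    sync_progress < /proc/mdstat
fi
# As root, "mdadm --wait /dev/md0" (array name assumed) blocks until the
# resync is done, on mdadm versions that support it.
```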

Secondly, tell LVM that the physical volume that is contained in the RAID array should be made bigger:
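In LVM terms this is a pvresize on the device that holds the physical volume. A sketch, assuming the physical volume sits directly on /dev/md0 (a placeholder name); the commands are only printed unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Assumption: the LVM physical volume lives directly on /dev/md0.
MD_DEV="${MD_DEV:-/dev/md0}"

# Safety net: print the command unless DRY_RUN=0 is set (and you are root).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "would run: $*"; fi
}

# Grow the physical volume to fill the (now larger) device underneath it.
run pvresize "$MD_DEV"

# Show the physical volume's new size and free space.
run pvs "$MD_DEV"
```

The extra space then appears as free extents in the volume group, ready to be handed out to logical volumes.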

That’s up to the size I was expecting… I’d say that’s all done (once my RAID array finishes syncing in 15 minutes’ time or so). Total server downtime from all this: nil.

1. OK, I’ll admit it: I pulled the cable out of the wrong drive. A few seconds later, once the audio buffer had emptied, xmms stopped playing Mahler 5. However, plugging the drive back in and restarting the LVM volume group it belonged to, plus NFS, got everything back.