RAID-5 swap all HDD question
d***@hotmail.com
2014-08-19 23:41:16 UTC
We have a RAID-5 server with 3 HDDs. It has been in continuous 24/7/365 operation for approximately 7 years. We probably ought to replace the HDDs after such a long period, so we are planning to swap them all out for new ones, and we have drives of the same type, size, and even the same maker.

Can this simply be done one by one, over a period, exchanging each of the three existing HDDs for a new one? My understanding is that RAID-5 needs a minimum of 3 HDDs to function, but one drive can be missing at a time. So can we, for example, swap out the 1st drive, wait a period, swap the 2nd, wait, and swap the 3rd?

Will that work? If so, what sort of period needs to be allowed for the new 1st HDD to establish itself before moving on to the 2nd, and likewise from the 2nd to the 3rd?

If this method is not possible, what is the best way to swap out these HDDs, bearing in mind that taking this particular device offline and/or powering it down is not an easy option?

--
Nick
Don Kuenz
2014-08-22 16:17:19 UTC
FWIW my Intel (LSI) RAID includes software to monitor the health of the
arrays. Among other things, it shows me when a degraded drive is back
online.
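Watching the controller's monitoring software until the array is healthy again can be sketched as a simple polling loop. This is a minimal Python sketch, assuming a hypothetical `get_status()` callable that returns the array state as a string; real Intel/LSI (or other) management tools have their own commands and status names, none of which are modelled here.

```python
import time
from typing import Callable, List

# Hypothetical status strings; real RAID tools use their own wording.
OPTIMAL, DEGRADED, REBUILDING = "optimal", "degraded", "rebuilding"


def wait_until_optimal(get_status: Callable[[], str],
                       poll_seconds: float = 60.0,
                       timeout_seconds: float = 24 * 3600) -> bool:
    """Poll get_status() until it reports OPTIMAL.

    Returns True once the array is optimal, or False if the timeout
    expires first (e.g. a rebuild that has stalled).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if get_status() == OPTIMAL:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


def fake_status_source(states: List[str]) -> Callable[[], str]:
    """Hypothetical stand-in for a real controller query: replays
    `states` in order, then keeps returning the last one."""
    remaining = list(states)

    def get_status() -> str:
        return remaining.pop(0) if len(remaining) > 1 else remaining[0]

    return get_status


# A new drive goes in, the array rebuilds, then reports optimal.
print(wait_until_optimal(fake_status_source([DEGRADED, REBUILDING, OPTIMAL]),
                         poll_seconds=0, timeout_seconds=5))  # prints True
```

The timeout matters more than the poll interval: a rebuild that never finishes is exactly the situation in which you do not want to pull the next drive.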

You may want to rethink changing out all of your hard drives at the
same time. A mix of drives of different ages protects you from the
near-simultaneous failure of all the drives from the same bad batch.

--
Don Kuenz
Dave Warren
2014-08-26 05:02:11 UTC
As with many things in life, there's no right or wrong answer. Given the
age, is the system even using modern drives, or is it using IDE drives?
If it's IDE or anything other than SATA or SAS, the answer is probably
to build out a new array on a new controller and migrate the data from
one array to another.

And while we're on the topic of controllers, are you prepared to replace
your 7-year-old controller if it fails? Or would a controller failure
mean a 100% data loss scenario? Consider that question carefully. If you
do need to buy a new controller, it might be smart to buy two, or to use
software RAID-10 rather than hardware RAID-5, so that you're not
dependent on any particular piece of hardware.
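For the RAID-5 versus RAID-10 trade-off, the usable-capacity arithmetic for equal-sized drives is simple: RAID-5 gives up one drive's worth of space to parity, while RAID-10 gives up half the drives to mirroring. A minimal sketch that ignores controller metadata overhead; the 146 GB drive size in the example is a hypothetical figure, not the OP's.

```python
def usable_capacity(level: str, drives: int, drive_size_gb: float) -> float:
    """Usable space for equal-sized drives, ignoring controller metadata."""
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        return (drives - 1) * drive_size_gb    # one drive's worth of parity
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID-10 needs an even number of drives, at least 4")
        return (drives // 2) * drive_size_gb   # every block is mirrored once
    raise ValueError(f"unknown RAID level: {level}")


print(usable_capacity("raid5", 3, 146))   # prints 292
print(usable_capacity("raid10", 4, 146))  # prints 292
```

So matching the usable space of a 3-drive RAID-5 with RAID-10 takes 4 drives of the same size, in exchange for not depending on parity calculations in a specific controller.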

As for the minimum number of drives, this depends on your
configuration, but typically RAID-5 means that you have 'x' drives' worth
of data plus one drive's worth of parity, spread across all the drives.
You can lose any one drive and rebuild safely. However, if a second drive
fails during the rebuild, you're out of luck, and so is all of your data
until you have time to restore it from backup. And your odds of a failure
go up during a recovery, because drive activity rises well above normal
levels: rebuilding an array normally maxes out the capacity of the
controller or the drives (or both), which means more heat and a higher
chance of a failure.
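The "one drive's worth of parity" idea can be shown with XOR: the parity block is the XOR of the data blocks in a stripe, so any single missing block equals the XOR of the survivors, while two missing blocks leave too little information to recover either. A minimal sketch of one stripe on a 3-drive set (two data blocks plus parity); real controllers rotate the parity block between drives from stripe to stripe and work on much larger blocks.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Data blocks followed by one parity block (their XOR)."""
    return data_blocks + [xor_blocks(data_blocks)]


def reconstruct(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild a stripe in which at most one block (None) is missing."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("more than one block missing: the stripe is lost")
    rebuilt = list(stripe)
    if missing:
        # The XOR of every block in the stripe is zero, so the missing
        # block is the XOR of all the surviving ones.
        rebuilt[missing[0]] = xor_blocks([b for b in stripe if b is not None])
    return rebuilt


# Three drives: two data blocks plus parity, then "drive 1" disappears.
stripe = make_stripe([b"AB", b"CD"])
print(reconstruct([stripe[0], None, stripe[2]]) == stripe)  # prints True
```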

If your controller is on the higher end, it may be able to do live
migrations. With a smart enough controller, you could insert a new
drive, tell the controller you want to take an existing drive out of the
system, and migrate to the new drive without actually failing a drive. If
the migration to the new drive fails, the array simply carries on with
the existing drives. Be aware that live migrations can take substantial
time and will cause a performance hit for the duration of the migration.
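A rough way to put numbers on "substantial time" and on the exposure during a degraded rebuild: the duration is the drive capacity divided by the sustained rebuild rate and, assuming independent failures at a constant rate, the chance that one of the surviving drives fails in that window follows from an annualised failure rate (AFR). The 146 GB size, 20 MB/s rate and 5% AFR below are hypothetical inputs. Drives from the same batch under extra rebuild stress are unlikely to fail independently, so treat the result as likely optimistic.

```python
import math


def rebuild_hours(drive_size_gb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to write one whole drive at a sustained rate
    (decimal units: 1 GB = 1000 MB)."""
    return drive_size_gb * 1000 / rebuild_mb_per_s / 3600


def p_second_failure(surviving_drives: int, afr: float, hours: float) -> float:
    """Probability that at least one surviving drive fails within `hours`,
    assuming independent failures at a constant rate equivalent to the
    annualised failure rate `afr`."""
    rate_per_hour = -math.log(1 - afr) / (365 * 24)
    return 1 - math.exp(-surviving_drives * rate_per_hour * hours)


# Hypothetical inputs: 146 GB drives, 20 MB/s rebuild rate, 5% AFR,
# 2 surviving drives in a 3-drive RAID-5.
hours = rebuild_hours(146, 20)        # about 2 hours
risk = p_second_failure(2, 0.05, hours)
print(f"rebuild ~{hours:.1f} h, chance of a second failure ~{risk:.1e}")
```

With a live migration the array stays redundant while the copy runs, so this exposure window mainly applies when a drive is pulled and the array rebuilds in a degraded state.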
d***@hotmail.com
2014-08-26 10:37:25 UTC
Thanks

I'm not sure how we will proceed.

It's a sort of chicken-and-egg scenario: we're not allowed to make changes until the HDDs are somehow replaced, but in some ways we ought not to touch the HDDs until the changes are made.

The HDDs are SCSI.

The server is actually dual-redundant: there are two identical servers, each with its own RAID-5, and at any one time one is on duty and one is on standby. We only get a single-shot changeover on a fault, though; we have to manually bring the server that incurred the fault back up to standby.

Attempts to clone a server to an external HDD have so far failed.

--
Nick
Jimmy Mac
2014-08-26 17:17:03 UTC
I have done this dozens of times and had no issues, but before you
begin, you will want to be sure that the new drives are similar in speed
to the old ones. SCSI doesn't play quite as nicely as SAS, so it's kind of
important.

Each time I've swapped drives, I begin with a FULL BACKUP of the server.
If there are ANY services running that cannot be backed up while
running, stop them before running the backup.

Once the backup is complete, begin with one drive - ONLY one drive. It
rarely matters which, but I begin with 0 and then move on. After
replacing the first drive, wait until the array has fully rebuilt before
swapping the second one, and so on. On older SCSI systems a rebuild can
take quite some time, even a few hours, depending on the controller,
drives and speeds.
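The "one drive at a time, wait for the rebuild" rule can be written down as a guard: refuse to pull a drive unless the array is fully in sync, and refuse to start at all without a backup. A toy Python model, assuming a hypothetical `Raid5Array` class that merely tracks which slots are rebuilding; on real hardware the wait happens in the controller's management tool, not in code.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Raid5Array:
    """Toy model: the array survives while at most one slot is out of sync."""
    members: List[str]
    out_of_sync: Set[int] = field(default_factory=set)

    def is_optimal(self) -> bool:
        return not self.out_of_sync

    def replace(self, slot: int, new_drive: str) -> None:
        # Pulling a second drive while one is still rebuilding would leave
        # two members without valid data, which RAID-5 cannot survive.
        if not self.is_optimal():
            raise RuntimeError("array not optimal: wait for the rebuild to finish")
        self.members[slot] = new_drive
        self.out_of_sync.add(slot)      # the new drive starts blank

    def finish_rebuild(self, slot: int) -> None:
        self.out_of_sync.discard(slot)


def replace_all(array: Raid5Array, new_drives: List[str],
                backup_done: bool) -> List[str]:
    """Swap every member, one slot at a time, starting with slot 0."""
    if not backup_done:
        raise RuntimeError("take a full backup before touching any drive")
    log = []
    for slot, new_drive in enumerate(new_drives):
        array.replace(slot, new_drive)
        log.append(f"slot {slot}: swapped, rebuilding")
        # On real hardware, this is the wait (possibly hours on older SCSI
        # controllers) until the RAID tool reports the array optimal again.
        array.finish_rebuild(slot)
        log.append(f"slot {slot}: rebuilt")
    return log


array = Raid5Array(["old0", "old1", "old2"])
for line in replace_all(array, ["new0", "new1", "new2"], backup_done=True):
    print(line)
```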

Hopefully you have hot-swappable drives, which minimizes any issues
related to slight differences in drive specs that may take a new drive
offline during boot. I've seen a couple of replacements fail because the
server had to be powered down between swaps.

Even if you have hot swap, it's best to reboot after all drives have
been replaced and the RAID rebuild is complete, just to be sure that all
is well. I did see a situation once on a much older system running NT
Server that didn't like the signature of the new drives, but that was
easily resolved by editing the registry from a parallel install. Again,
that was Windows NT and I haven't seen it since.

Good luck!
Again... BACKUP!!! ;-)
d***@hotmail.com
2014-08-28 16:03:35 UTC
Thanks

They are hot-swappable, yes.

Rest of the post noted.

I'll let you know how we get on, if I remember (or if I still have a job at the end of it).