Hardware vs. Software RAID in the Real World
Tags: hardware, linux, RAID, server, software, sysadmin
I’m a sysadmin by trade, so I deal with RAID-enabled servers on a daily basis. Today a server with a hardware RAID controller reported a bad disk (and by “reported” I mean it lit a small red LED on the front of the machine), which is not uncommon. But when I replaced the failed disk with a shiny new one, both drives suddenly went red and the system crashed. Upon reboot the hardware controller told me, “sorry buddy, I don’t see any drives,” and wouldn’t boot. Luckily I had a similar system sitting idle, so I tested the same disks in that server and they worked just fine. After cursing loudly at the RAID controller, I started wondering whether the pros of hardware RAID really outweigh the cons for a general-purpose, nothing-special, 1U or 2U server that just needs disk mirroring.
Pros of Hardware RAID
Easy to set up – Most controllers have a menu-driven wizard to guide you through building your array, and some even arrive configured right out of the box.
Easy to replace disks – If a disk fails, just pull it out and replace it with a new one.
Performance improvements (sometimes) – If you are running tremendously IO-intense workloads or using an underpowered CPU, hardware RAID can offer a performance improvement.
Cons of Hardware RAID
Proprietary – Minimal or complete lack of detailed hardware and software specifications.
On-disk metadata can make it nearly impossible to recover data without a compatible RAID card – If your controller goes casters-up, you’ll have to find a compatible model to replace it; your disks are useless without the controller. This is especially bad with a discontinued model that has failed after years of operation.
Monitoring implementations are all over the map – Depending on the vendor and model, the ability to monitor the health and performance of your array, and the interface for doing so, vary greatly. Often you are tied to a specific piece of software that the vendor provides.
Additional cost – Hardware RAID cards cost more than standard disk controllers. High end models can be very expensive.
Lack of standardization between controllers (configuration, management, etc.) – The keystrokes and software that you use to manage and monitor one card likely won’t work on another.
Inflexible – Your ability to reshape, split and perform other maintenance on arrays varies tremendously with each card.
Pros of Software RAID
Hardware independent – RAID is implemented in the OS, which keeps things consistent regardless of the hardware manufacturer.
Standardized RAID configuration (for each OS) – The management toolkit is OS specific rather than specific to each individual type of RAID card.
Standardized monitoring (for each OS) – Since the toolkit is consistent you can expect to monitor the health of each of your servers using the same methods.
Good performance – CPUs just keep getting faster and faster. Unless you are performing tremendous amounts of IO, the extra cost of dedicated hardware just doesn’t seem worthwhile.
Very flexible – Software RAID allows you to reconfigure your arrays in ways that I have not found possible with hardware controllers.
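To make the “standardized management” point concrete: on Linux, for example, the md/mdadm toolkit handles creation and monitoring with the same commands on any server, regardless of which disk controller sits underneath. A sketch (device names and the mail address are illustrative, and these commands require root and real disks):

```shell
# Build a RAID 1 mirror from two partitions (illustrative device names)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# Check array health -- identical commands on every box
cat /proc/mdstat
mdadm --detail /dev/md0

# Run the monitor daemon so failure events are mailed to an admin
mdadm --monitor --scan --daemonise --mail=root@localhost
```

Because the interface is the same everywhere, one monitoring check (polling /proc/mdstat or mdadm --detail) covers your whole fleet.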
Cons of Software RAID
Need to learn the software RAID tool set for each OS – These tools are often well documented, but not as quick to get off the ground as their hardware counterparts.
Typically need to incorporate the RAID build into the OS install process – Servers won’t come mirrored out of the box; you’ll need to make sure this happens yourself.
Slower performance than dedicated hardware – A high-end dedicated RAID card will match or outperform software RAID.
Additional load on the CPU – RAID operations have to be calculated somewhere, and in software that work runs on your CPU instead of dedicated hardware. I have yet to see a real-world performance degradation from this, however.
Disk replacement sometimes requires prep work – You typically should tell the software RAID system to stop using a disk before you yank it out of the system. I’ve seen systems panic when a failed disk was physically removed before being logically removed.
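With Linux mdadm, for instance, that prep work amounts to a couple of commands before you pull the drive. A sketch, assuming a two-disk mirror with /dev/sdb failing (device names are placeholders, and this needs root):

```shell
# Mark the dying member failed, then remove it from the array
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1

# After physically swapping the drive: copy the partition table
# from the healthy disk, then add the new partition back in
sfdisk -d /dev/sda | sfdisk /dev/sdb
mdadm /dev/md0 --add /dev/sdb1

# Watch the mirror rebuild
watch cat /proc/mdstat
```

It’s a few more keystrokes than hot-swapping behind a hardware controller, but the procedure is the same on every machine.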
So what do I make of all this?
I can certainly understand the appeal of simple deployment and of having a vendor to blame. But at the end of the day, if you lose your data, it’s gone; no vendor will be able to bring it back. And what about manageability after the initial deployment?
It seems to me that if you care about the integrity of your data and do not need ultra-intense IO performance, software RAID is a good choice. You get the same core features as hardware RAID with a standard management and monitoring suite, and you are no longer bound to a specific piece of hardware: in the event of a disk controller failure you can simply replace the controller (or the whole server) and the disks remain usable. I also find it desirable that software RAID lets you define the RAID configuration during the installation process, which, coupled with kickstart or a similar automated build process, ensures proper RAID configuration and monitoring as soon as the server is brought online.
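As one way this can look in practice: on Red Hat-style systems, a kickstart fragment along these lines builds the mirror during the unattended install (a sketch only; disk names, sizes, and mount points are arbitrary assumptions for illustration):

```
# Kickstart excerpt: RAID 1 mirrors across two disks
part raid.01 --size=512 --ondisk=sda
part raid.02 --size=512 --ondisk=sdb
part raid.11 --size=1 --grow --ondisk=sda
part raid.12 --size=1 --grow --ondisk=sdb
raid /boot --level=RAID1 --device=md0 raid.01 raid.02
raid /     --level=RAID1 --device=md1 raid.11 raid.12
```

Every server built from that kickstart comes up mirrored from its first boot, with no per-machine controller fiddling.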
Considering these benefits, and my sour experiences with hardware RAID, I plan to deploy software RAID on a much wider scale from this point forward. But I’m curious…
Have others struggled with similar problems? How many people are running software RAID in an enterprise setting? And how many people have been similarly burned by software RAID?