If you’re looking to set up a simple file server on Ubuntu, you can use software RAID to improve performance. Here’s how to do it.


Do you need a file server on the cheap that is easy to set up and “rock solid” reliable, with Email alerting? We will show you how to use Ubuntu, software RAID and SaMBa to accomplish just that.

Overview

Despite the recent buzz to move everything to the “almighty” cloud, sometimes you may not want your information on someone else’s server, or it may simply be unfeasible to download the volumes of data that you require from the internet every time (for example, image deployment). So before you clear out a place in your budget for a storage solution, consider a configuration that is free of licensing costs with Linux.

With that said, going cheap/free does not mean “throwing caution to the wind”. To that end, we will note points to be aware of and configurations that should be set in place in addition to using software RAID, to achieve the maximum reliability for the price.

Image by Filomena Scalise

About software RAID

As the name implies, this is a RAID (Redundant Array of Inexpensive Disks) setup that is done completely in software instead of using a dedicated hardware card. The main advantage is cost, as the dedicated card is a premium added to the base configuration of the system. The main disadvantages are performance and some reliability, as such a card usually comes with its own RAM and CPU to perform the calculations required for the redundancy math, data caching for increased performance, and an optional backup battery that keeps unwritten operations in the cache until power has been restored after a power outage.

With a software RAID setup you’re sacrificing some of the system’s CPU performance in order to reduce total system cost; however, with today’s CPUs the overhead is relatively negligible (especially if you’re going to mainly dedicate this server to be a “file server”). As far as disk performance goes, there is a penalty… however, I have never hit a disk subsystem bottleneck on such a server, so I can’t say how profound it is. The Tom’s Hardware guide “Tom’s goes RAID5” is an oldie-but-goodie exhaustive article on the subject, which I personally use as a reference; however, take the benchmarks with a grain of salt, as it discusses the Windows implementation of software RAID (as with everything else, I’m sure Linux is much better :P).

Prerequisites

Patience, young one, this is a long read. It is assumed you know what RAID is and what it is used for. This guide was written using Ubuntu Server 9.10 x64, so it is assumed that you have a Debian-based system to work with as well. You will see me use VIM as the editor; this is just because I’m used to it… you may use any other editor you’d like. The Ubuntu system I used for writing this guide was installed on a disk-on-key. Doing so allowed me to use sda1 as part of the RAID array, so adjust accordingly for your setup. Depending on the type of RAID you want to create, you will need at least two disks in your system; in this guide we are using 6 drives.


Choosing the disks that make the array

The first step in avoiding a trap is knowing of its existence (Thufir Hawat from Dune).

Choosing the disks is a vital step that should not be taken lightly, and you would be wise to capitalize on yours truly’s experience and heed this warning:

Do NOT use “consumer grade” drives to create your array, use “server grade” drives!!!!!!

Now I know what you’re thinking: didn’t we say we were going to go on the cheap? Yes we did, but this is exactly one of the places where doing so is reckless and should be avoided. Despite their attractive price, consumer grade hard drives are not designed for 24/7 “always on” use. Trust me, yours truly has tried this for you. At least four consumer grade drives in the 3 servers I have set up like this (due to budget constraints) failed about 1.5 to 1.8 years after the server’s initial launch day. While there was no data loss, because the RAID did its job well and survived… moments like this shorten the life expectancy of the sysadmin, not to mention the downtime for the company during server maintenance (something which may end up costing more than the higher grade drives).

Some may say that there is no difference in failure rate between the two types. That may be true; however, despite these claims, server grade drives still have stricter S.M.A.R.T restrictions and more QA behind them (as can be observed by the fact that they are not released to the market as soon as the consumer drives are), so I still highly recommend that you fork out the extra $$$ for the upgrade.

Choosing the RAID level.

While I’m not going to go into all of the options available (this is very well documented in the RAID Wikipedia entry), I do feel that it is noteworthy to say that you should always opt for at least RAID 6 or even higher (we will be using Linux RAID10). This is because when a disk fails, there is a higher chance of a neighboring disk failing as well, and then you have a “two disk” failure on your hands. Moreover, if you’re going to use large drives, the chance of failure is higher, as larger disks have a higher data density on the platter’s surface. IMHO disks of 2 TB and beyond will always fall into this category, so be aware.

Let’s get cracking

Partitioning disks

While in GNU/Linux we could use the entire block device for storage needs, we will use partitions because it makes it easier to use disk rescue tools in case the system has gone bonkers. We are using the “fdisk” program here, but if you’re going to use disks larger than 2 TB you will need a partitioning program that supports GPT partitioning, like parted.

Note: I have observed that it is possible to make the array without changing the partition type, but because this is the way described all over the net I’m going to follow suit (again when using the entire block device this is unnecessary).

Once in fdisk the keystrokes are:

n          ; for a new partition
enter
p          ; for a primary partition
enter
1          ; number of partition
enter      ; accept the default
enter      ; accept the default
t          ; to change the type
fd         ; sets the type to “Linux raid autodetect” (type code fd)
w          ; write changes to disk and exit

Rinse and repeat for all the disks that will be part of the array.
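For disks larger than 2 TB, where fdisk falls short, the same layout can be scripted with parted. The sketch below is one possible way to do it, not the procedure used for this guide: the disk names in DISKS are assumptions based on a six-drive system, and the run helper only prints each command unless APPLY=1 is set, so nothing is written to disk by accident.

```shell
# Dry-run helper: prints each command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Hypothetical disk list (assumption); adjust it to your system.
DISKS="/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf"

partition_disks() {
    for disk in $DISKS; do
        run parted -s "$disk" mklabel gpt                         # new GPT label
        run parted -s -a optimal "$disk" mkpart primary 0% 100%   # one partition spanning the disk
        run parted -s "$disk" set 1 raid on                       # flag partition 1 as a RAID member
    done
}

partition_disks
```

Review the printed commands first, then rerun with APPLY=1 as root once the disk list is right.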

Creating a Linux RAID10 array

The advantage of using “Linux raid10” is that it knows how to take advantage of an odd number of disks to boost performance and resiliency even further than vanilla RAID10; in addition, the “10” array can be created in one single step.

Create the array from the disks we have prepared in the last step by issuing:

Note: This is all just one line despite the fact that the representation breaks it into two.

Let’s break the parameters down:

“--chunk=256”: The chunk size, in kilobytes, that the RAID stripes are broken into; this size is recommended for new/large disks (the 2 TB drives used to make this guide were without a doubt in that category).
“--level=10”: Uses the Linux raid10 (if a traditional RAID10 is required, for whatever reason, you would have to create two arrays and join them).
“-p f2”: Uses the “far” layout (see the note below for more info); the “2” means that the array will keep two copies of the data.
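Putting the parameters above together, the create command might look like the following sketch. The array name /dev/md0, the member partitions, the device count and the --verbose flag are assumptions for a six-drive system; substitute your own. The run helper prints the command by default and executes it only when APPLY=1 is set.

```shell
# Dry-run helper: prints the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

ARRAY=/dev/md0   # assumed array device name
MEMBERS="/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1"   # assumed member partitions

create_array() {
    # $MEMBERS is intentionally unquoted so that it splits into one argument per partition.
    run mdadm --create "$ARRAY" --verbose --chunk=256 --level=10 -p f2 \
        --raid-devices=6 $MEMBERS
}

create_array
```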

Note: We use the “far” plan because it causes the physical data layout on the disks to NOT be the same. This helps to overcome the situation where the hardware of one of the drives fails due to a manufacturing fault (and don’t think “this won’t happen to me” like yours truly did). Because the two disks are of the same make and model, have been used in the same fashion and traditionally have been keeping the data in the same physical location, the risk exists that the drive holding the copy of the data has failed too, or is close to failing, and will not provide the required resiliency until a replacement disk arrives. The “far” plan places the copies in a completely different physical location on the copy drives, in addition to using disks that are not close to each other within the computer case. More information can be found here and in the links below.

Once the array has been created it will start its synchronization process. While you may wish to wait for traditions’ sake (as this may take a while), you can start using the array immediately.

The progress can be observed using:
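The kernel reports array status, including synchronization progress, in /proc/mdstat. As a minimal sketch, the helper below (sync_progress is a made-up name) prints only the progress lines; the optional file argument is there so it can be tried on a saved copy of the status.

```shell
# Print only the resync/recovery progress lines from an mdstat-style file.
# Defaults to /proc/mdstat when no file is given.
sync_progress() {
    grep -E 'resync|recovery' "${1:-/proc/mdstat}"
}

# Interactive alternative that refreshes the full status every two seconds:
#   watch -d cat /proc/mdstat
if [ -r /proc/mdstat ]; then
    sync_progress || echo "no resync or recovery in progress"
fi
```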

Create the mdadm.conf Configuration File

Note: It has been said that: “Most distributions expect the mdadm.conf file in /etc/, not /etc/mdadm. I believe this is a “ubuntu-ism” to have it as /etc/mdadm/mdadm.conf”. Due to the fact that we are using Ubuntu here, we will just go with it.
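A common way to generate the array definition is to append the output of an mdadm scan to the configuration file. The sketch below assumes that this is the kind of command meant here; the exact flags are an assumption, not the author's listing. By default it only prints what it would do, and it writes to the file only when APPLY=1 is set.

```shell
CONF=/etc/mdadm/mdadm.conf   # the Ubuntu location mentioned in the note above

save_array_config() {
    if [ "${APPLY:-0}" = "1" ]; then
        # Append one ARRAY line per running array (run as root).
        mdadm --detail --scan >> "$CONF"
    else
        echo "+ mdadm --detail --scan >> $CONF"
    fi
}

save_array_config
```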

IMPORTANT! You need to remove one “0” from the newly created file, because the syntax resulting from the command above isn’t completely correct (GNU/Linux isn’t an OS yet).

If you want to see the problem that this wrong configuration causes, you can issue the “scan” command at this point, before making the adjustment:

To overcome this, edit the file /etc/mdadm/mdadm.conf and change:

To read:

Running the mdadm --examine --scan command now should return without an error.
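The manual edit can also be scripted. The sketch below rests on an assumption: that the extra “0” is the one mdadm releases of that era wrote into the metadata version, producing metadata=00.90 where metadata=0.90 is expected. If your file differs, adjust the pattern. fix_metadata_version is a hypothetical helper name.

```shell
# Rewrite the metadata version in place, keeping a .bak copy of the original.
# Assumes the faulty token is "metadata=00.90" (see the lead-in).
fix_metadata_version() {
    sed -i.bak 's/metadata=00\.90/metadata=0.90/' "$1"
}

# Usage, as root:
#   fix_metadata_version /etc/mdadm/mdadm.conf
#   mdadm --examine --scan
```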

Filesystem setup on the array

I used ext4 for this example because for me it just built upon the familiarity of the ext3 filesystem that came before it while providing promised better performance and features. I suggest taking the time to investigate what filesystem better suits your needs and a good start for that is our “Which Linux File System Should You Choose?” article.

Note: In this case I didn’t partition the resulting array because I simply didn’t need to at the time, as the requesting party specifically asked for at least 3.5 TB of contiguous space. With that said, had I wanted to create partitions, I would have had to use a GPT-capable partitioning utility like “parted”.
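Creating the filesystem directly on the unpartitioned array could look like the sketch below. The device name /dev/md0 is an assumption; the command is printed rather than executed unless APPLY=1 is set, because formatting destroys whatever is on the device.

```shell
# Dry-run helper: prints the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

ARRAY=/dev/md0   # assumed array device name

make_filesystem() {
    run mkfs.ext4 "$ARRAY"
}

make_filesystem
```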

Mounting

Create the mount point:

Note: This can be any location, the above is only an example.

Because we are dealing with an “assembled device”, we will not use the filesystem’s UUID for mounting (as recommended for other types of devices in our “what is the linux fstab and how does it work” guide), as the system may actually see part of the filesystem on an individual disk and try to incorrectly mount it directly. To overcome this, we want to explicitly wait for the device to be “assembled” before we try mounting it, and we will use the assembled array’s name (“md”) within fstab to accomplish this. Edit the fstab file:

And add to it this line:

Note: If you change the mount location or filesystem from the example, you will have to adjust the above accordingly.
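As a sketch of the two steps above, the snippet below builds an fstab entry that refers to the array by its device name. The device /dev/md0, the mount point /media/raid10 (inferred from the share path used later in this guide) and the options “defaults 0 2” are assumptions; fstab_entry is a hypothetical helper name.

```shell
ARRAY=/dev/md0              # assumed array device name
MOUNT_POINT=/media/raid10   # assumed mount point

# Fields: device, mount point, filesystem type, options, dump flag, fsck order.
fstab_entry() {
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$ARRAY" "$MOUNT_POINT" ext4 defaults 0 2
}

fstab_entry

# To apply, as root:
#   mkdir -p "$MOUNT_POINT"
#   fstab_entry >> /etc/fstab
```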

Use mount with the automatic parameter (-a) to simulate a system boot, so you know that the configuration is working correctly and that the RAID device will be automatically mounted when the system restarts:

You should now be able to see the array mounted with the “mount” command with no parameters.
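The check can also be scripted rather than eyeballed. In this sketch, is_mounted is a hypothetical helper that looks for the mount point in a mounts table (/proc/mounts by default), and /media/raid10 is an assumed mount point; mount -a itself is only printed unless APPLY=1 is set.

```shell
# Dry-run helper: prints the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Succeed if the given mount point appears in the mounts table.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

run mount -a
if is_mounted /media/raid10 2>/dev/null; then
    echo "array is mounted"
else
    echo "array is NOT mounted"
fi
```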

Email Alerts for the RAID Array

Unlike with hardware RAID arrays, with a software array there is no controller that would start beeping to let you know when something went wrong. Therefore the Email alerts are going to be our only way to know if something happened to one or more disks in the array, which makes this the most important step.

Follow the “How To Setup Email Alerts on Linux Using Gmail or SMTP” guide and when done come back here to perform the RAID specific steps.

Confirm that mdadm can Email

The command below will tell mdadm to fire off just one email and close.

If successful you should be getting an Email, detailing the array’s condition.
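A one-shot test of this kind can be run with mdadm’s monitor mode. The sketch below is one way to phrase it: --test generates a test alert for every array found, and --oneshot makes the monitor check once and exit. The command is printed unless APPLY=1 is set, and it relies on a mail address already being configured for mdadm.

```shell
# Dry-run helper: prints the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

send_test_alert() {
    run mdadm --monitor --scan --test --oneshot
}

send_test_alert
```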

Set the mdadm configuration to send an Email on startup

While not an absolute must, it is nice to get an update from the machine from time to time, to let us know that the email ability is still working and what condition the array is in. You’re probably not going to be overwhelmed by Emails, as this setting only affects startups (of which there shouldn’t be many on a server). Edit the mdadm configuration file:

Add the --test parameter to the DAEMON_OPTIONS section so that it would look like:

You may restart the machine just to make sure you’re “in the loop”, but it isn’t a must.
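The DAEMON_OPTIONS change can be made with a small idempotent edit. This sketch assumes the setting lives in /etc/default/mdadm as a double-quoted DAEMON_OPTIONS="..." line, which is where Debian-based systems usually keep it; check your own system before applying. enable_startup_test_mail is a hypothetical helper name.

```shell
# Append --test to the DAEMON_OPTIONS line, keeping a .bak copy of the file.
enable_startup_test_mail() {
    # Skip if --test is already present so repeated runs change nothing.
    grep -q '^DAEMON_OPTIONS=.*--test' "$1" && return 0
    sed -i.bak 's/^DAEMON_OPTIONS="\(.*\)"/DAEMON_OPTIONS="\1 --test"/' "$1"
}

# Usage, as root (file location is an assumption):
#   enable_startup_test_mail /etc/default/mdadm
```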

Samba Configuration

Installing SaMBa on a Linux server enables it to act like a Windows file server. So in order to make the data we are hosting on the Linux server available to Windows clients, we will install and configure SaMBa. It’s funny to note that the package name SaMBa is a pun on Microsoft’s file sharing protocol, SMB (Server Message Block).

In this guide the server is used for testing purposes, so we will enable access to its share without requiring a password. You may want to dig a bit more into how to set up permissions once setup is complete.

Also it is recommended that you create a non-privileged user to be the owner of the files. In this example we use the “geek” user we have created for this task. Explanations on how to create a user and manage ownership and permissions can be found in our “Create a New User on Ubuntu Server 9.10” and “The Beginner’s Guide to Managing Users and Groups in Linux” guides.

Install Samba:

Edit the samba configuration file:

Add a share called “general” that will grant access to the mount point “/media/raid10/general” by appending the below to the file.

The settings above make the share accessible to anyone without a password and make the user “geek” the default owner of the files.

For your reference, this smb.conf file was taken from a working server.
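A share definition matching the behaviour described above might look like the following sketch. Only the share name, the path and the “geek” user come from this guide; the remaining parameter values are assumptions and are not the author’s file. print_general_share is a hypothetical helper name.

```shell
# Print a passwordless [general] share whose files are owned by "geek".
print_general_share() {
    cat <<'EOF'

[general]
    path = /media/raid10/general
    force user = geek
    read only = no
    guest ok = yes
    guest only = yes
    create mask = 0664
    directory mask = 0775
EOF
}

print_general_share

# To apply, as root:
#   print_general_share >> /etc/samba/smb.conf
```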

Restart the samba service for the settings to take effect:

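Once done, you can use the testparm command to see the settings applied to the samba server. That’s it; the server should now be accessible from any Windows box using a UNC path of the form \\<server-name>\general, where <server-name> is the server’s hostname or IP address.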
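The command-line side of the Samba steps can be summarized in a short sketch. The package manager invocation and the service name are assumptions: the service is called smbd on newer Ubuntu releases and may be called samba on older ones. Commands are printed rather than executed unless APPLY=1 is set.

```shell
# Dry-run helper: prints each command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

samba_steps() {
    run apt-get install -y samba   # install the server package
    run testparm -s                # parse smb.conf and print the effective settings
    run service smbd restart       # assumed service name; "samba" on older releases
}

samba_steps
```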

Troubleshooting

When you need to troubleshoot a problem or a disk has failed in an array, I suggest referring to the mdadm cheat sheet (that’s what I do…).

In general, you should remember that when a disk fails you need to “remove” it from the array, shut down the machine, swap the failing drive for a replacement, and then “add” the new drive to the array after you have created the appropriate disk layout (partitions) on it, if necessary.

Once that’s done you may want to make sure that the array is rebuilding and watch the progress with:
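The whole replacement cycle, ending with the progress check, could be sketched as follows. The array name /dev/md0 and the partition /dev/sdb1 are placeholders for illustration; the function names are made up. Commands are printed rather than executed unless APPLY=1 is set.

```shell
# Dry-run helper: prints each command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

ARRAY=/dev/md0   # assumed array device name

# Before shutting down: mark the member as failed and take it out of the array.
remove_failed_member() {
    run mdadm "$ARRAY" --fail "$1" --remove "$1"
}

# After swapping and partitioning the new drive: add it and check the rebuild.
add_replacement_member() {
    run mdadm "$ARRAY" --add "$1"
    run cat /proc/mdstat
}

remove_failed_member /dev/sdb1     # hypothetical failed partition
add_replacement_member /dev/sdb1
```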

Good luck! :)

References:
mdadm cheat sheet
RAID levels break down
Linux RAID10 explained
mdadm command man page
mdadm configuration file man page
Partition limitations explained

Using software RAID won’t cost much… Just your VOICE ;-)