2012-12-02

System Reconfig

A year and a half ago, I bought a new system. Meet Virgil:

[photo of Virgil]

Virgil's main specs:

  • 2x Xeon E5645 (2.4GHz, 6 cores, 12MB L3, and hyperthreading apiece)
  • the legendary EVGA SR-2 motherboard
  • 24GB RAM
  • a pair of WD RE4 2TB hard drives
  • in a Lian Li PC-V2120 case
  • all kinds of I/O ports and a ho-hum-but-CUDA-capable GPU

The SR-2 motherboard is a non-standard size and will not fit in a normal ATX case; it necessitated the mondo case. At the time, the SR-2 was about the only desktop board that allowed for dual Xeons (otherwise you needed to buy a true server; blech). A year and a half after purchase, it is still far more computer than I need, and I sometimes feel bad considering how much it idles. With its cores at only 2.4GHz, some laptops can beat Virgil at single-threaded tasks, but Virgil excels at parallel compiles, data processing, and the like. With its Xeon fans, Virgil sounds like a red-lined Ferrari when it's under load.

Operating Systems

I started off running Ubuntu, which I mostly liked. However, the upgrade to 12.04 failed completely: Virgil could not boot, and I had no interest in troubleshooting the failure. Additionally, I found Ubuntu's Unity interface a little tiresome. I probably appreciated it more than most Linux users, but it involved a bit more mousing around than I wanted.

A colleague is an expert Fedora user, so I installed Fedora. I figured it'd be worthwhile to learn an alternate approach, and I could always piggyback off his knowledge. However, I hated Fedora. Gnome 3 is a disaster, Xfce felt like 1998 all over again, and Fedora demanded far too much sysadmin time from me. I didn't want to re-learn how to set up Samba configuration files, as if it were still 2002; I wanted to click a checkbox and have my home folder shared so I could access it from VMs without any fuss. This was possible on Ubuntu... not so much on Fedora.
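This is the sort of thing I mean; a minimal hand-edited share stanza in smb.conf looks something like this (details vary):

[homes]
   comment = Home directories
   browseable = no
   read only = no
   valid users = %S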

Additionally, I gradually realized that the Fedora installer had messed up my RAID. Under Ubuntu, I'd configured the two drives to act as an LVM RAID-1. The Fedora installer could not recognize the partition, which was hugely annoying, but it did seem to recognize the underlying LVM RAID. I was wrong, though: it had turned the RAID-1 into a spanned volume. I no longer had a redundant 2TB drive; I had a 4TB volume with twice the failure probability of a single drive and no redeeming performance qualities.
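In hindsight, a peek at the logical volume's segment type would have caught this early. A sketch (output columns assumed):

sudo lvs -o lv_name,lv_size,segtype,devices
# a mirrored LV reports a segtype of "mirror" or "raid1";
# mine had quietly become "linear", i.e., a spanned volume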

Of course, by the time I realized this (disclosure: I am really bad at IT/sysadmin), I'd already accumulated a little over 2TB of data, thanks in large part to VirusShare. It would not be so simple to go back to the RAID-1.

Reconfig

So I bought another 2TB WD RE4 hard drive and a 256GB OCZ Vertex 4 SSD. I had two main goals. First, I wanted to install Mint as the main Linux system, and I wanted to do it in such a way that I could trash it, install some other system, and be up and running again in a day or less, without too much time spent sysadmin'ing. Second, I wanted a RAID for my main data, and I wanted the volume somewhat separate from /home so I could simply remount it as a subdirectory in my home directory under a new system, perhaps issue a recursive chown & chgrp, and be good to go. (Fedora and Ubuntu use different numbering schemes for UIDs, of course; it should be possible to preserve one's UID from system to system with a little bit of work, but the underlying theme of this blog post is that I am too lazy to do such things.)
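In other words, on some future system the data side of the migration should amount to something like this (a sketch, using the device and paths that show up later in this post):

sudo mount /dev/md0 /mnt/raid
sudo chown -R jon:jon /mnt/raid/jon      # adopt the new system's UID/GID
ln -s /mnt/raid/jon/data /home/jon/data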

I first installed the SSD in the system and installed the newly released Mint 14 (with the Cinnamon UI) on a single / partition. I then backed up all of my data to an external 3TB drive, installed the 3rd WD drive, and blew my old data away.
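The backup itself was nothing fancy; roughly this (the external drive's mount point here is hypothetical):

sudo rsync -aHx --progress /home/jon/ /media/external-3tb/jon/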

Although the SR-2 board supports various RAID options in hardware, I decided I wanted a software RAID so I could survive a motherboard failure. Virgil has compute power to spare, and these are only 7200 RPM drives, so I shouldn't suffer significant performance loss by going the software RAID route. I first created an unformatted partition on each of the drives, leaving a small buffer before and after in case I someday need to add another drive that differs slightly in its sector count. I then constructed a RAID5 array from the three partitions, following the instructions on the Linux RAID wiki:
mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sda1 /dev/sdc1 /dev/sdd1
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
This created a 4TB RAID5 across the three disks, with mdadm's default chunk size of 512KB.
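Creating the array kicks off a lengthy background build of the parity; its progress (and the array's health generally) can be checked with:

cat /proc/mdstat
mdadm --detail /dev/md0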

XFS

I narrowed my choices for the filesystem to EXT4 or XFS. Since I have a mix of very large files (evidence and other archives) and very small files (source code), I wanted an extent-based filesystem. BtrFS has the more advanced feature-set, but reading up on it does not fill one with confidence about its reliability. Of course, one can find data loss stories about EXT4 and XFS as well. In the end I decided to go with XFS: it is an older design that has been strenuously tested on far more advanced RAID hardware, yet it is still under active development and maintenance. This is an intuitive choice more than anything.

So I issued the command:
mkfs -t xfs /dev/md0
and was immediately presented with an error: the 512KB chunk size was too large for XFS. It seems that the 512KB default chunk size is a recent change in the mdadm tools, likely geared towards EXT4. The maximum chunk size XFS can handle is 256KB, however, and XFS developers still recommend a 32KB chunk size in the general case. So I deleted the array and recreated it, this time with an explicit 32KB chunk:

mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 --chunk=32 /dev/sda1 /dev/sdc1 /dev/sdd1
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
mkfs -t xfs /dev/md0
mkdir -p /mnt/raid
mount /dev/md0 /mnt/raid
mkdir -p /mnt/raid/jon/data
ln -s /mnt/raid/jon/data /home/jon/data
I now had my XFS RAID5, easily accessible inside my home directory, with the overall volume still available for other uses.
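To get the volume back at every boot, /etc/fstab also needs an entry along these lines (a sketch; the UUID reported by blkid is sturdier than the /dev/md0 name, which can change between boots):

/dev/md0   /mnt/raid   xfs   defaults   0   0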

One thing I noticed in reading about Linux RAIDs is that many people use LVM on top of the raw RAID device and then create different filesystems on top of LVM. If one were managing a system with a few large RAIDs for a large number of users, I could see how LVM would be useful. However, I cannot see any utility for it when you just want a single RAID volume. There's likely no performance loss to using LVM, but it adds another layer of complexity... and I am too lazy for unnecessary complexity.

The Disposable System

The RAID solves my data problem, but what about being able to get up and running with another system in the future? This enters the realm of configuration management. Two newer configuration management systems are taking the open source world by storm: Puppet and Chef. Both are implemented in Ruby, both seem like good pieces of software, and both are geared towards sysadmins who need to manage many systems. Essentially, both take configuration files specifying which packages, files, user accounts, and other configuration details should apply to a given system, and then make the system conform to that description.

The main differentiator between Puppet and Chef is that Puppet has its own syntax for its config files, whereas Chef's config files are valid Ruby source code. What I want to do is simply track the packages I install over time, so that I can get them reinstalled quickly if I need to. After looking at the syntax of each, I decided that Puppet would be slightly simpler.

So, what I have been doing is building a directory of Puppet config files, each containing a list of software packages. The config files are roughly categorized, so I have one for basic C/C++ development, another for Python, one for system tools, one for GUI applications, etc. When I find myself in need of a package, I add it to the appropriate config file, and I can then bring my system into compliance with:
sudo puppet apply config-file.pp
Puppet figures out which, if any, packages are missing and then invokes apt-get to download and install them. If I switched back to Fedora, or to FreeBSD or some other crazy system, I would likely need to remap some of these package names slightly, but at least I'm tracking them. Additionally, I will sometimes create an EC2 instance to accomplish a specific task, and having categorized Puppet config files should help me bootstrap barebones instances in short order. I track my config files with git, natch, and mirror them on Github as I do with all my other source code.
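For a sense of what these config files look like, each one is little more than a package list. A hypothetical example (the package names here are illustrative):

# python.pp: one of my category files
package { ['python-dev', 'python-pip', 'ipython']:
  ensure => installed,
}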

Data has mass; I now have a RAID that should be able to survive a move to a different environment without having to make copies. And by using Puppet to reduce system management into, essentially, software development, I have reduced the mass of my operating system to almost nothing. All I need to do on a new system is get it to mount my RAID, drop my ssh keys, install Puppet, and then wait while all the rest of the software I depend on is downloaded and installed automatically.

etckeeper

Speaking of git, another useful package for anyone running Linux is etckeeper. It hooks into your package management system and tracks changes to /etc in version control. If you inadvertently screw up some aspect of your system by editing various /etc config files, you can always go back in time to restore things. In this respect, it's sort of like System Restore, except that it works.
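On a Debian-flavored system, and assuming the git backend, usage is roughly this (a sketch; the commit message is whatever you like):

sudo apt-get install etckeeper                   # begins tracking /etc right away
cd /etc && sudo git log --oneline                # browse the history of changes
sudo etckeeper commit "checkpoint before edits"  # take a manual snapshot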
