Monday, October 13, 2008

Linux: update drivers

For an updated driver to be correctly recognized on the next boot:

To save the current ramdisk image and make a new one:
> mv /boot/initrd.img-`uname -r` /boot/initrd.img-`uname -r`.old
> mkinitramfs -o /boot/initrd.img-`uname -r`

Thursday, October 2, 2008

RAID configuration

I will introduce the different RAID configurations commonly used.

1) RAID 0: stripes data across several drives so they appear as one big drive.
Good things:
* costs almost no CPU, so an onboard (motherboard) RAID controller or software RAID is adequate.
* simple configuration
* for big files, read/write speed often improves because you are reading/writing several disks at the same time. For the same reason, don't put two IDE drives on a single cable (master and slave), since they would share the cable's bandwidth.
Bad thing:
* No redundancy. If one drive goes bad, everything is lost; with two, three, ... drives you double, triple, ... your chances of losing data.
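A tiny sketch of how striping lays out data (a hypothetical 3-disk array; the block numbers are made up for illustration):

```shell
# RAID 0 striping sketch: logical block i lands on disk (i % n), stripe (i / n),
# so consecutive blocks hit different disks and can be accessed in parallel.
n=3   # hypothetical number of disks in the array
for i in 0 1 2 3 4 5; do
  echo "block $i -> disk $(( i % n )), stripe $(( i / n ))"
done
```

Blocks 0, 1, 2 land on disks 0, 1, 2, then block 3 wraps around to disk 0 again.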

2) RAID 1: simply a mirror of two drives. This is a way to protect your data at the cost of a little write performance.

3) RAID 1+0 (also called RAID 10): stripe across several disks (RAID 0) and mirror them with the same number of disks (RAID 1). It takes advantage of both speed and safety, but it is not space-efficient: half the capacity goes to the mirrors.

4) RAID 5: RAID 5 is the most popular RAID configuration because it combines data redundancy (via additional parity data) with the read/write speed of accessing many disks at the same time. RAID 5 uses block-level striping with parity data distributed across all member disks.
* You will need at least three drives for RAID 5.
* You had better buy a decent RAID controller card (one with its own CPU) rather than a cheap controller (99% of motherboard controllers), because the parity calculations would otherwise load your main CPU.
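The parity idea behind RAID 5 can be sketched in a few lines of shell (the byte values are made up; a real array does this per block across the member disks):

```shell
# RAID 5 parity sketch: the parity block is the XOR of the data blocks.
# If one block is lost, XOR of the survivors and the parity recovers it.
d1=$(( 0xA5 )); d2=$(( 0x3C )); d3=$(( 0x77 ))   # made-up data bytes
parity=$(( d1 ^ d2 ^ d3 ))
# pretend the disk holding d2 failed; rebuild it from the rest:
recovered=$(( d1 ^ d3 ^ parity ))
printf 'parity=0x%02X recovered=0x%02X\n' "$parity" "$recovered"
# prints: parity=0xEE recovered=0x3C
```

This XOR is exactly the calculation a cheap controller would push onto your CPU for every write.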

5) JBOD: JBOD stands for Just a Bunch Of Disks (or Just a Box Of Drives). Some RAID controllers use JBOD to refer to configuring drives without RAID features; each drive then shows up separately in the OS. Concatenation (spanning) of disks is not one of the numbered RAID levels, but it is a popular method for combining multiple physical disk drives into a single virtual disk. It provides no data redundancy: as the name implies, disks are merely concatenated together, end to end, so they appear to be a single large disk. The good thing compared with RAID 0 is that you can combine physical drives of different sizes (120 GB, 300 GB, ...) into a single logical drive. This is similar to the Logical Volume Manager (LVM) in Linux.
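The capacity difference is easy to see with made-up drive sizes (RAID 0 needs equal stripes, so it is limited by the smallest member; concatenation uses everything):

```shell
# Capacity sketch: JBOD/concatenation sums the mixed sizes,
# RAID 0 yields only n times the smallest member disk.
sizes="120 300 500"          # hypothetical drive sizes in GB
jbod=0; min=
for s in $sizes; do
  jbod=$(( jbod + s ))
  if [ -z "$min" ] || [ "$s" -lt "$min" ]; then min=$s; fi
done
set -- $sizes                # count the drives
raid0=$(( $# * min ))
echo "JBOD: ${jbod}GB  RAID 0: ${raid0}GB"
# prints: JBOD: 920GB  RAID 0: 360GB
```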

6) Linux soft RAID compared with a hardware RAID controller.
Performance is about the same, except that soft RAID 5 costs a bit of CPU for the parity calculations.


FAQ:
1) How to build a RAID 1 from an existing drive?
Creating the array directly on the existing disk would destroy its data. The proper procedure: a) back up the data first (actually recommended before any RAID change), b) create a degraded array with only the new drive, c) copy the data onto the array, d) add the old drive to fill the missing slot.
In Linux, use commands like
mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb2
mdadm /dev/md0 -a /dev/sda2

2) Add a new drive to RAID 5 (grow).
In general this is possible, but check the RAID controller manual to see whether growing is supported. If it is, add the drive to the existing RAID in the controller's configuration menu (BIOS).

For soft RAID, you can skip that step and boot Linux directly.

The general procedure is 1) add the drive to the RAID array, then 2) reshape the data across the new layout.

In Linux, use commands like
mdadm --add /dev/md1 /dev/sdx1
mdadm --grow /dev/md1 --raid-devices=n (n = the new total number of physical drives).
Then wait hours for it to reshape.

3) Recover from a hard drive failure in RAID 5. If one drive fails, you still have a chance to fix it. Normally the procedure is to replace the failed drive and rebuild the data from the other drives. But be sure to read the hardware/software manual first.
In Linux: a) use # smartctl -a /dev/hdX to check the failing drive, b) shut down the computer and replace
the failed drive, c) reboot (very slow), partition the new drive as type fd (that is 0xfd in hexadecimal, "Linux raid autodetect"), d) /sbin/mdadm /dev/md0 -a /dev/sdX1.






What is SELinux (Security-Enhanced Linux) ?

Wiki: Security-Enhanced Linux (SELinux) is a Linux feature that provides a variety of security policies, including U.S. Department of Defense style mandatory access controls, through the use of Linux Security Modules (LSM) in the Linux kernel. It is not a Linux distribution, but rather a set of modifications that can be applied to Unix-like operating systems, such as Linux and BSD. Its architecture strives to streamline the volume of software charged with security policy enforcement, which is closely aligned with the Trusted Computer System Evaluation Criteria (TCSEC, referred to as Orange Book) requirement for trusted computing base (TCB) minimization (applicable to evaluation classes B3 and A1) but is quite unrelated to the least privilege requirement (B2, B3, A1) as is often claimed.[citation needed] The germinal concepts underlying SELinux can be traced to several earlier projects by the U.S. National Security Agency.

More resources:
WIKI

I will take Fedora 9 as an example to explain the settings for SELinux. More to come ....

SELinux and Intel Fortran/C compilers

I have met several problems with SELinux when using the Intel Fortran/C compilers on EM64T systems. For example, one with the libifcore.so.5 conflict: http://physicscomputing.blogspot.com/2008/04/intel-fortran-compiler-selinux-and.html.

A recent problem: a program stopped after running for a couple of hours, reporting only page errors in /var/log/messages, and the system sometimes crashed. After I disabled SELinux, everything was fine.

SELinux restricts certain kinds of access to shared libraries. You can configure it to allow that access; in Fedora Core, bring up "system-config-security". If computation stability is your first priority, or your computer is otherwise well protected (such as nodes in a cluster), you can simply disable SELinux.
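If you do decide to disable it: running setenforce 0 as root turns enforcement off until the next reboot, and to make it permanent on Fedora you edit /etc/selinux/config and reboot. A minimal config fragment:

```
# /etc/selinux/config
# SELINUX can be: enforcing | permissive | disabled
SELINUX=disabled
SELINUXTYPE=targeted
```

Setting SELINUX=permissive instead keeps the logging (useful for diagnosing which access was denied) without blocking anything.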