HIDI: How I Did It.
A collection of examples for how I managed to accomplish a few random things in the world of computing. I don't promise that "How I Did It" is necessarily the correct way to get things done, and I'm always open to suggestions for improvements, but this seemed like a fairly decent way of doing documentation.
It isn't a normal day at work when you discover that the only way to get an application to run is to manually invoke an alternate loader.
So for a bit of background, one of my colleagues uses a proprietary library from a vendor for part of his scientific computing. This is normally not a big deal, but due to differences in system upgrade cycles, a few versioning issues have cropped up. The first, and simplest, was that the vendor is using gfortran version 4.4. At work, most of our systems use Red Hat Enterprise Linux 4, which is a bit old, and doesn't ship with gfortran-4.4. A simple download and build and we're all set. We can now link with this library and run code.
My colleague found a bug in the library, and submitted a bug report. The
vendor fixed the bug and sent back a new shared library. However, in the time
between sending us the original library, and building the new library, the
vendor had upgraded their systems. This new .so (the shared library)
requires GLIBC 2.5 or greater. RHEL4 ships with GLIBC 2.3. For whatever
reason, the vendor is unwilling to ship us a .so that links against
libc-2.3.so, nor are they willing to ship us a static library (.a). Had
they shipped us a static library, we could just link statically on a RHEL5
machine, and run on a RHEL4 box without issue. By requiring a dynamic
library, we have to build a dynamic executable, so we depend a bit more on our
running system.
At this point, I was asked if there was anything I could do. Now, this seems like it should be fairly straightforward. I mean, in the end, it's just hunks of executable code. The trick is getting libraries to merge right.
Step 1: Find a RHEL5 box. Luckily, we happen to have a machine in our lab running RHEL5 that can be used as a compile system, but it isn't sufficient for running actual jobs.
Step 2: Compile test program with gfortran-4.4 and link with vendor's
library. To minimize the number of extraneous libraries, include
-static-libgcc -static-libgfortran on the link line.
Step 3: Copy the binary over to our RHEL4 system and try running it:
rhel4> ./a.out
./a.out: error while loading shared libraries: ./libVENDOR.so: requires glibc 2.5 or later dynamic loader.
rhel4>
Step 4: OK, so we NEED the GLIBC 2.5 loader to get this library into memory
for some reason. Luckily, we just happen to know that the loader is
/lib/ld-linux.so.2. Follow the symlinks, and see that the loader is
/lib/ld-2.5.so. Contrary to the .so extension, this is an executable
that can be used to load subsequent executables. Copy over the loader from
the RHEL5 box and try using it to launch our executable.
rhel4> ./ld-2.5.so ./a.out
./a.out: relocation error: /lib/tls/libc.so.6: symbol _dl_out_of_memory, version GLIBC_PRIVATE not defined in file ld-linux.so.2 with link time reference
rhel4>
Well, shoot. Looks like we need to bring over the libc from our RHEL5 box. While we're at it, let's pull over ALL the libraries we need.
Step 5: Determine which libraries are used.
rhel5> ldd a.out
linux-gate.so.1 => (0xffffe000)
libVENDOR.so => ./libVENDOR.so (0xec983000)
libm.so.6 => /lib/libm.so.6 (0x00b42000)
libc.so.6 => /lib/libc.so.6 (0x0093d000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00a3200)
/lib/ld-linux.so.2 (0x0091f000)
rhel5>
Step 6: Now, with some work, you might be able to convince the compiler and
linker to statically link libm, libc and libpthread. For now, let's
just copy those shared libs to our RHEL4 system.
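The list of files to copy in Step 6 can be pulled out of the ldd output instead of being typed by hand. This is only a minimal sketch, assuming ldd output shaped like the listing in Step 5; the function name list_shared_libs is made up for illustration.

```shell
#!/bin/sh
# list_shared_libs (hypothetical helper): read `ldd` output on stdin and
# print the absolute path of every library that resolved to a file.
# Entries with no path (linux-gate.so.1) or with a relative path (the
# vendor library sitting next to the binary) are skipped.
list_shared_libs() {
    awk '
        # e.g. "libc.so.6 => /lib/libc.so.6 (0x...)"
        $2 == "=>" && $3 ~ /^\// { print $3; next }
        # the loader line, e.g. "/lib/ld-linux.so.2 (0x...)"
        $1 ~ /^\// { print $1 }
    '
}
```

On the RHEL5 box, `ldd ./a.out | list_shared_libs` would then print the files to bring over. These paths are usually symlinks, so copy what they point to (for example with cp -L) rather than the links themselves.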
Step 7: Attempt to run the binary on the RHEL4 system
rhel4> ./a.out
./a.out: relocation error: /lib/tls/libc.so.6: symbol _dl_out_of_memory, version GLIBC_PRIVATE not defined in file ld-linux.so.2 with link time reference
rhel4>
Step 8: Of course! We need to force the use of these libraries. The
environment variable LD_LIBRARY_PATH is referenced AFTER the system
library paths. Let's use LD_PRELOAD to force these three system libraries
to the versions we pulled over.
rhel4> env LD_PRELOAD="./libc-2.5.so ./libm-2.5.so ./libpthread-2.5.so" ./ld-2.5.so ./a.out
Hello, we have success!
rhel4>
And there you have it! A dynamic executable avoiding all base-system supplied
support (loader, system libraries, etc). It is as if it were a statically
linked program. Now, an exercise left to the reader is to figure out how to
get gfortran to statically link libc, libm and libpthread in a dynamic
executable so we can avoid the LD_PRELOAD environment variable.
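Until that exercise is solved, the long command line from Step 8 can at least be hidden behind a small helper. The sketch below only builds the command string; the function name bundled_cmd and the idea of keeping the loader and libraries together in one directory are my assumptions, and the file names simply mirror the ones used in Step 8.

```shell
#!/bin/sh
# bundled_cmd (hypothetical helper): print the command that runs a program
# through a copied loader with the copied libc, libm and libpthread
# preloaded. $1 is the directory holding ld-2.5.so and the three libraries;
# the remaining arguments are the program and its arguments.
bundled_cmd() {
    bundle_dir=$1
    shift
    preload="$bundle_dir/libc-2.5.so $bundle_dir/libm-2.5.so $bundle_dir/libpthread-2.5.so"
    printf 'env LD_PRELOAD="%s" %s/ld-2.5.so' "$preload" "$bundle_dir"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}
```

Calling `bundled_cmd . ./a.out` prints the same command line as in Step 8. It can be pasted into a wrapper script or passed to eval, as long as none of the arguments contain spaces, since the sketch does no quoting.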
Recently, the forkit.org server started running out of space on /home and
/var. This isn't really a problem, for we use LVM, the Logical Volume
Manager on marvin. LVM allows us to dynamically change the size of
partitions. We currently have 2 147GB SCSI Ultra320 disks in a RAID1
configuration (mirrored). 15GB of that 147GB is unallocated. I wanted to add
10GB to /home and 2GB to /var. Given our historical usage patterns, this
should allow us to live for another year or so before we need to look into
more disk space, and leaves us 3GB for "emergency" expansions.
So, marvin is a production server. Obviously, since it is a hobby-machine,
there are no SLAs or such, but a fair number of people depend on it for
webhosting and email and the like. We try to minimize downtime as much as
possible. I initially thought that I would be able to do the partition
resizing without any downtime. The ext3 filesystem (which is what we use)
supports online resizing, which should mean that I can resize the partition
(using LVM), and the filesystem (using resize2fs) without needing to
unmount or reboot the server. A Good Thing!
The typical mechanism to add 2GB to the /var partition would be:
# lvresize -L +2G /dev/mapper/vg-lv_var
# resize2fs /dev/mapper/vg-lv_var
I was able to do the lvresize portion - growing the partition.
However, when I attempted to resize the filesystem, resize2fs complained:
# resize2fs /dev/mapper/vg-lv_var
resize2fs 1.40-WIP (14-Nov-2006)
Filesystem at /dev/mapper/vg-lv_var is mounted on /var; on-line resizing required
resize2fs: Filesystem does not support online resizing
Well, after a bit of Googling, it appears that our ext3 partition was
created long enough ago that the default mkfs configuration did not
include support for online resizing. Specifically, the resize_inode option
was not set. Well, bummer. There is apparently a script out there that will
set that option, but requires that the filesystem be offline when running.
That doesn't really help us here, as the point is to avoid taking /var
offline, which is a somewhat difficult thing to do on a running system, and
even more difficult to do on a system that is remotely hosted.
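Whether a filesystem has the resize_inode feature can be checked before trying an online resize. A minimal sketch, assuming the "Filesystem features:" line that tune2fs -l prints; the helper name has_resize_inode is hypothetical.

```shell
#!/bin/sh
# has_resize_inode (hypothetical helper): read `tune2fs -l` output on stdin
# and succeed only if the "Filesystem features:" line lists resize_inode.
has_resize_inode() {
    grep '^Filesystem features:' | grep -qw 'resize_inode'
}
```

Run as root, `tune2fs -l /dev/mapper/vg-lv_var | has_resize_inode` returns success if the feature is present. If it fails, expect the same "does not support online resizing" complaint shown above.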
So what is the solution? Well, as I just alluded to, we'll need to unmount
/home and /var. /home can be done while the system is up; we just
need to be careful to stop services like mail delivery and cron. /var is
much more difficult to unmount. Lots of things need /var to run. You are
much better served by booting into Single User Mode for doing work on the
/var partition. Single User Mode isn't really a great option for us, as
marvin is hosted in Louisville, KY, and I'm in Schenectady, NY. Luckily,
Jeff (another marvin owner/operator) lives in Louisville, and can go in to
get physical access if needed. This is our fallback.
The actual solution is to reboot marvin, force a fsck and insert the resizing commands into the boot sequence before non-root partitions are mounted.
Forcing the fsck is simple. First, create a file '/forcefsck'. This
tells the boot system (on Debian systems at least) that a filesystem check
should be run during the next boot. By default, this check won't fix errors
automatically. This is fine for systems where you can sit there during boot,
but colocated systems without network consoles like marvin should probably
be set to automatically fix the errors. Do this by setting FSCKFIX=yes in
/etc/rcS.d/S30checkfs.sh.
The next part, resizing the filesystems, requires writing a script to do the
resize, and scheduling the script to run after the check of the filesystems,
and before the mounting. This can be done by making a script called
/etc/rcS.d/S32resize-disks. The scripts in /etc/rcS.d/ are run before the
runlevel-dependent scripts, and basically initialize the system. The
filesystems are checked by script S30checkfs.sh, and they are mounted by
script S35mountall.sh. So using the number '32' ensures the resize will
happen after the check, and before the mounting.
> cat S32resize-disks
#!/bin/sh
PATH=/sbin:/bin
LOGFILE=/var/log/fsck/resize
logsave -as $LOGFILE resize2fs /dev/mapper/vg0-lv_var
logsave -as $LOGFILE resize2fs /dev/mapper/vg0-lv_home
The logsave prefix to the resize2fs command causes the output to be
saved to the specified logfile, and stored in memory until /var is mounted.
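The choice of '32' relies on the rcS.d scripts being run in sorted name order. That can be sanity-checked without rebooting by sorting the names. A small sketch; boot_order is a made-up helper and it only sorts the names it is given, it does not look at the real directory.

```shell
#!/bin/sh
# boot_order (hypothetical helper): print the given init script names in
# the order a sorted directory listing would give them.
boot_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

boot_order S35mountall.sh S32resize-disks S30checkfs.sh
```

The resize script should come out between S30checkfs.sh and S35mountall.sh. On the real system, `ls /etc/rcS.d/` gives the same kind of listing for everything installed.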
Now, just reboot and hope! It took marvin approximately 30 minutes to do
the reboot, fsck and resize. After boot, just remove the S32resize-disks
script (don't need it to run on subsequent reboots). The /forcefsck file
should have been removed automatically.
Enjoy your new space!
At one point in the life of my web server, .php files were not being sent
with an appropriate MIME type. I think they were being sent as text/plain,
which caused some web browsers to display the HTML rather than rendering it.
As a kludge to get around this (rather than have the web admin change the
server config), I added to my start_header() function:
header("Content-Type: application/xhtml+xml; charset=utf-8");
and all was well.
Or so I thought.
As it turns out, I don't really test my web pages on Internet Explorer. I figure that if they work in Firefox and Safari and the W3C validator says that my page is fine, then I'm good. Even if the page doesn't look right in IE, the content should be there.
Well, I was wrong, kinda.
Apparently, IE (even through IE 7) doesn't understand the media-type
application/xhtml+xml and so prompts the user to just download and save the
web page. Sigh. I'm not real sure when I made that change, but I'm
guessing over a year ago, and I just discovered the problem today. Whoops.
Reading the W3C's XHTML Media Types web page gives a bit of insight, in
that you should only send application/xhtml+xml if the web client (browser)
explicitly claims to support that type, otherwise send text/html.
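The rule from that page boils down to looking at the Accept header the browser sends. Here is a rough sketch of the decision in shell (my site does this in PHP, so treat it as an illustration of the logic only). The function name pick_content_type is hypothetical, and the check is deliberately crude: it ignores q-values, so a client sending application/xhtml+xml;q=0 would be misjudged.

```shell
#!/bin/sh
# pick_content_type (hypothetical helper): given the value of the HTTP
# Accept header, print the Content-Type to send. XHTML is only chosen when
# the client names application/xhtml+xml explicitly; a bare */* is not
# treated as a claim of support.
pick_content_type() {
    case "$1" in
        *application/xhtml+xml*) echo 'application/xhtml+xml; charset=utf-8' ;;
        *)                       echo 'text/html; charset=utf-8' ;;
    esac
}
```

In a CGI script the header value is available as $HTTP_ACCEPT, so the call would look like `pick_content_type "$HTTP_ACCEPT"`.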
I should have known better than to force the type of the document that I'm
sending, I guess. Oh well, removed the header call from my PHP, the web
server has been fixed at some time in the distant past, and now IE users can
see my web pages.
Hrm, I wonder if that's a good thing. 
It turns out that the Biostar Small-Form-Factor PC that I use as a NAS doesn't
have gigabit ethernet onboard. The onboard NIC is some RealTek 100Mbit
thing. To match with my new gigabit switch, I bought an Intel 82541PI PCI
Gigabit NIC to upgrade the NAS to Gigabit. I pop the card in, boot up the
NAS, expecting the new Intel card to be eth1, but...
> /sbin/ifconfig eth1
eth1: error fetching interface information: Device not found
Um, not good. I check dmesg, and see:
Intel(R) PRO/1000 Network Driver - version 8.0.9-NAPI
Copyright (c) 1999-2008 Intel Corporation.
ACPI: PCI Interrupt 0000:00:08.0[A] -> GSI 16 (level, low) -> IRQ 16
e1000: 0000:00:08.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:2e:3b:64
8139too Fast Ethernet driver 0.9.28
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:00:0d.0[A] -> GSI 19 (level, low) -> IRQ 17
eth1: RealTek RTL8139 at 0xd000, 00:e0:4c:ee:6e:71, IRQ 17
eth1: Identified 8139 chip type 'RTL-8100B/8139D'
This is weird. The kernel comes up and sees the Intel card as eth0, and
the RealTek card as eth1. Fine, right? But once init takes over, it
doesn't appear that way at all. eth0 is the RealTek device, and eth1 is
nowhere to be found.
After an hour or so of frustration: upgrading the e1000 driver,
disabling/enabling the RealTek NIC from the BIOS, Googling, etc. I finally
find a web site that reminds me of /proc/net/dev:
> cat /proc/net/dev | awk '{print $1;}'
Inter-|
face
lo:
eth2:
eth0:
Ah ha! Somehow, I no longer have an eth1, but rather eth0 (RealTek) and
eth2 (Intel). ifconfig will configure it, the card talks on the
network, and is just fine and dandy.
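Matching interface names to physical cards is easier with the MAC addresses next to the names, since dmesg prints those for both NICs. A minimal sketch, assuming a kernel that exposes /sys/class/net/<name>/address; list_ifaces is a made-up name, and the optional argument exists only so the function can be pointed at a different directory.

```shell
#!/bin/sh
# list_ifaces (hypothetical helper): print "name mac" for every network
# interface found under the given sysfs directory.
list_ifaces() {
    net_dir=${1:-/sys/class/net}
    for dev in "$net_dir"/*; do
        # skip anything without a readable address file
        [ -r "$dev/address" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/address")"
    done
}
```

Comparing its output with the addresses in the dmesg lines above (00:1b:21:2e:3b:64 for the Intel card, 00:e0:4c:ee:6e:71 for the RealTek) shows which name each card ended up with.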
I am still stumped as to why the network cards get renamed. I've poked around
in /etc/udev/, but have not found anything that looks like it would cause
this renaming to occur.
So I put this query to the Lazyweb... Any ideas why the network cards get renamed?
Background
In my previous entry, I discussed how I used iSCSI to make
a hard drive that I have hosted at a friend's home available as a normal
hard drive on my machine. In this article, I'm going to discuss how I used
dm-crypt and LUKS to encrypt the remote hard drive for security.
I used the following web sites as references:
- http://www.matthew.ath.cx/articles/cryptkey
- http://madduck.net/docs/cryptdisk/
- http://blog.dlgeek.net/?p=84
Setup
First of all, I needed to install support for encrypted volumes, which is handled by the cryptsetup package in Debian.
> sudo aptitude install cryptsetup
Now, my iSCSI-attached drive is known on my system as sdc. I need to create
a partition to use. Using fdisk, I created a single partition that
uses the entire disk. This becomes known as sdc1.
Next, I need to format the partition, but since I'm going to use encryption,
rather than just a normal ext3 filesystem, I don't use mkfs yet, but
first use cryptsetup.
> sudo cryptsetup luksFormat /dev/sdc1
Enter a passphrase at the prompt. I advise you not to forget the passphrase, or else you'll never be able to get at your data.
Now that we have an encrypted volume, we need to make it available via
dm-crypt.
> sudo cryptsetup luksOpen /dev/sdc1 sdc1_crypt
Enter LUKS passphrase:
key slot 0 unlocked.
Command successful.
In this command, we tell cryptsetup to use LUKS to open /dev/sdc1 and name
it sdc1_crypt. This name is the Device-Mapper name. If you look in
/dev/mapper/, you should see something like this:
> ls -l /dev/mapper/
total 0
crw-rw---- 1 root root 10, 63 2009-02-01 13:45 control
brw-rw---- 1 root disk 253, 0 2009-02-01 13:49 sdc1_crypt
/dev/mapper/sdc1_crypt is now our device that we'll use. We'll need a
filesystem on it, as normal:
> sudo mkfs.ext3 /dev/mapper/sdc1_crypt
Now we have a normal block device that can be mounted, read, written and unmounted as normal.
As we will be using this drive for unattended backups, we need to create another key that will be used in place of a passphrase.
> sudo cryptsetup luksClose /dev/mapper/sdc1_crypt
> sudo dd if=/dev/urandom of=/etc/keys/sdc1.luks bs=1k count=1
> sudo cryptsetup luksAddKey /dev/sdc1 /etc/keys/sdc1.luks
In step 1, I close the device. This isn't strictly necessary, but we will have the device starting in a closed state below in the Usage section, so I do it here. Step 2 creates a keyfile to use, and in step 3, we are adding that key file to the list of valid keys for our encrypted drive.
That's it. We're all set up. You could remove the passphrase key, but if you do so, make sure you back up the key file somewhere safe, otherwise, you won't be able to recover the data from your offsite backup in the case of needing to recover your entire system.
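Since the key file unlocks the drive without a passphrase, it is worth creating it with tight permissions from the start. The sketch below wraps the dd step in a restrictive umask; that restriction is my own addition rather than part of the steps above, and make_keyfile is a hypothetical name.

```shell
#!/bin/sh
# make_keyfile (hypothetical helper): write 1 KiB of random data to the
# given path. The subshell keeps the umask change local, so the file is
# created readable and writable by its owner only.
make_keyfile() {
    keyfile=$1
    ( umask 077 && dd if=/dev/urandom of="$keyfile" bs=1k count=1 2>/dev/null )
}
```

Run as root, `make_keyfile /etc/keys/sdc1.luks` would stand in for the dd line above, assuming the /etc/keys directory already exists.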
Usage
To use the encrypted drive, it's just a matter of opening the device, mounting it, using it as normal, then unmounting and closing.
> sudo cryptsetup luksOpen /dev/sdc1 sdc1_crypt --key-file=/etc/keys/sdc1.luks
> sudo mount /dev/mapper/sdc1_crypt /mnt
> $do_backup
> sudo umount /mnt
> sudo cryptsetup luksClose /dev/mapper/sdc1_crypt
That's all. Pretty simple, really.
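For unattended runs, it helps to wrap those five steps so the device still gets unmounted and closed when the backup itself fails. This is a sketch rather than the script I actually run: backup_cycle and the RUN dry-run switch are made-up names, it assumes it is run as root (no sudo), and the backup command is whatever you pass after the first four arguments.

```shell
#!/bin/sh
# Set RUN=echo for a dry run that only prints the commands.
RUN=${RUN:-}

# backup_cycle (hypothetical helper):
#   backup_cycle DEVICE MAPPER_NAME KEYFILE MOUNTPOINT COMMAND [ARGS...]
# Opens the LUKS device, mounts it, runs COMMAND, then always unmounts and
# closes again. Returns the exit status of COMMAND.
backup_cycle() {
    dev=$1 name=$2 keyfile=$3 mnt=$4
    shift 4
    $RUN cryptsetup luksOpen "$dev" "$name" --key-file="$keyfile" || return 1
    if $RUN mount "/dev/mapper/$name" "$mnt"; then
        $RUN "$@"
        status=$?
        $RUN umount "$mnt"
    else
        status=1
    fi
    $RUN cryptsetup luksClose "/dev/mapper/$name"
    return $status
}
```

With RUN=echo, `backup_cycle /dev/sdc1 sdc1_crypt /etc/keys/sdc1.luks /mnt rdiff-backup /home /mnt/home` prints the same sequence of commands as the listing above, with the rdiff-backup call standing in for $do_backup.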
Background
My coworker, Steve, and I have been talking about backup strategies for a while
now. Personally, I use a combination of RAID1 on my main machine, and a
nightly rdiff-backup to my NAS, which keeps a month or two of revisions.
This is a good start, but is not exactly tolerant to physical disaster, such
as fire, break-in, etc. Those reasons are why the concept of offsite backups
was invented.
So what is a good way to do offsite backups? I don't really want to dish out the cash for a tape system, and I've got a good amount of data that I'd like backed up (documents, photos, music, etc.). Hard drives make good sense. However, the big problem with offsite backup strategies that require remembering to swap out drives is that they require somebody to swap out drives.
So Steve and I both keep a server online in our homes 24/7. We thought about
giving each other shell access so that we could run rsync or rdiff-backup,
but that seemed like a somewhat sub-par solution. I really wanted something
that would have very little impact on the other's computers. I don't want to
waste his disk space, and while I do trust him, I don't really want to have my
backups stored on his machine unencrypted.
I think Steve was the first to think about iSCSI as a solution. We would each purchase a drive to have installed on the other's server. We then export that raw drive over iSCSI, where the other can connect and use the drive as they wish. In a future post, I'll explain how I set up the encrypted filesystem on my drive that is hosted at Steve's place. In this article, I'll explain how I went about getting iSCSI to work for this setup. I made use of this HOWTO.
To make the HDD available:
Add the hard drive to your computer, and note the device that it'll be. In my
case, it was hda.
Install the iSCSI Target packages. Assuming you're running a Debian
derivative, install iscsitarget and its kernel module:
> sudo aptitude install iscsitarget iscsitarget-source
Compile the module and install it:
> sudo module-assistant auto-install iscsitarget
> sudo dpkg -i /usr/src/iscsitarget-module-X.YY.ZZ-^I
Yes, that is a tab-complete on the dpkg install line. Use the .deb that was
created by module-assistant.
Add the following to your /etc/ietd.conf, setting the 'path' as appropriate:
Target iqn.2001-01.org.forkit:storage.erwin.bar.remotedrive0
IncomingUser bmoore ****************
OutgoingUser bmoore ++++++++++++++++
Lun 0 Path=/dev/hda,Type=blockio
ImmediateData Yes
I should explain some of this. The Target line specifies a new iSCSI
"controller" with the name
"iqn.2001-01.org.forkit:storage.erwin.bar.remotedrive0". The first part of
this Target name has a standard that it follows. Quoting from the man page
for ietd.conf:
Target iqn.<yyyy-mm>.<tld.domain.some.host>[:<identifier>]
A target definition and the target name. The targets name (the iSCSI Qualified Name ) must be a globally unique name (as defined by the iSCSI standard) and has to start with iqn followed by a single dot. The EUI-64 form is not supported. <yyyy-mm> is the date (year and month) at which the domain is valid. This has to be followed by a single dot and the reversed domain name. The optional <identifier> - which is freely selectable - has to be separated by a single colon. For further details please check the iSCSI spec.
IncomingUser and OutgoingUser specify the username that should be expected from an Initiator (a device trying to mount the disk), and the username that we should send back. In this case, I set them to the same id. The stars and plusses represent two unique passwords. I used pwgen to generate a couple of strings. The Lun line specifies a device to be present on the controller. I specify the path to my drive, and state that it is to use block I/O, rather than file I/O to access it (as it is a block device, not a file that we are exporting). Turning on ImmediateData is an attempt to optimize for the relatively slow link we'll have going over our broadband connections. It may not be useful, I'm not sure.
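A typo in the Target name is easy to make, so a quick shape check before restarting the target can save a round trip. The sketch below is a loose check of the iqn.<yyyy-mm>.<reversed domain>[:<identifier>] pattern quoted from the man page, not a full implementation of the iSCSI naming rules; is_iqn is a made-up name.

```shell
#!/bin/sh
# is_iqn (hypothetical helper): succeed if the argument looks like
#   iqn.<yyyy-mm>.<reversed domain>[:<identifier>]
# Only the overall shape is checked, not whether the domain really exists
# or was valid at that date.
is_iqn() {
    printf '%s\n' "$1" |
        grep -Eq '^iqn\.[0-9]{4}-(0[1-9]|1[0-2])\.[a-z0-9.-]+(:.+)?$'
}
```

For example, `is_iqn iqn.2001-01.org.forkit:storage.erwin.bar.remotedrive0` succeeds, while a name missing the date part fails.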
If applicable, update /etc/default/iscsitarget to have iSCSI start up at
boot time:
> cat /etc/default/iscsitarget
ISCSITARGET_ENABLE=true
> /etc/init.d/iscsitarget start
Finally, forward Port 3260 on your router to the server hosting the drive.
To mount the remote drive:
Install open-iscsi, and search for your targets.
> sudo aptitude install open-iscsi
> sudo iscsiadm -m discovery -t sendtargets -p remote.example.com:3260
192.168.0.5:3260,1 iqn.2001-01.org.forkit:storage.erwin.bar.remotedrive0
Now that we've found the remote drive, we need to add our passwords.
> cd /etc/iscsi/nodes/iqn.2001-01.org.forkit:storage.erwin.bar.remotedrive0
> sudo mv 192.168.0.5,3260 remote.example.com,3260
> sudo vi remote.example.com,3260
Delete the line that says node.session.auth.authmethod = None, and add the
following lines, where the passwords match up as above:
node.session.auth.authmethod = CHAP
node.session.auth.username = bmoore
node.session.auth.password = ****************
node.session.auth.username_in = bmoore
node.session.auth.password_in = ++++++++++++++++
Also, because of NAT, we need to update one more setting:
node.conn[0].address = remote.example.com. It is currently set to the
"Portal" address reported during discovery, but we need the address that we actually connect to.
Now log in:
> sudo iscsiadm --mode node \
--targetname iqn.2001-01.org.forkit:storage.erwin.bar.remotedrive0 \
--portal remote.example.com:3260 --login
Currently, I'm not getting anywhere. The login works, and in dmesg, I get a line like:
scsi2 : iSCSI Initiator over TCP/IP
However, no devices show up. I'm having Steve check his system logs, and will get back to you.
Update: Steve played around with a few things, for he was getting an error -16 when trying to launch the iSCSI Target.
ERRNO 16 is device busy. Googling around led us to believe that perhaps LVM had the device open. He added a filter line to his lvm.conf file and rebooted. Apparently, that wasn't quite enough. He had to remove the Type=blockio part of the Lun line in his /etc/ietd.conf file. I'm really not sure why, but now it all works.
Check your kernel logs or dmesg. You should hopefully see something like:
scsi2 : iSCSI Initiator over TCP/IP
scsi 2:0:0:0: Direct-Access IET VIRTUAL-DISK 0 PQ: 0 ANSI: 4
sd 2:0:0:0: [sdc] 625142448 512-byte hardware sectors (320073 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 77 00 00 08
sd 2:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sdc] 625142448 512-byte hardware sectors (320073 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 77 00 00 08
sd 2:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
 sdc: sdc1
sd 2:0:0:0: [sdc] Attached SCSI disk
That means that your disk has been detected, and is now sdc. From here, you
can fdisk the drive, partition, etc, just as if the drive was plugged
in directly to your system.
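If the login is scripted, the device name can be picked out of the kernel log instead of being read by eye. A minimal sketch, assuming log lines shaped like the "Attached SCSI disk" line above; attached_disks is a hypothetical name, and it reports every disk that appears in the log it is given, not only iSCSI ones.

```shell
#!/bin/sh
# attached_disks (hypothetical helper): read kernel log text on stdin and
# print the device name from each "[sdX] Attached SCSI disk" line.
attached_disks() {
    sed -n 's/.*\[\(sd[a-z]*\)\] Attached SCSI disk.*/\1/p'
}
```

Typical use would be `dmesg | attached_disks | tail -n 1` right after the login, which gives the most recently attached disk.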
To disconnect your iSCSI disk, use the following line:
> sudo iscsiadm --mode node \
--targetname iqn.2001-01.org.forkit:storage.erwin.bar.remotedrive0 \
--portal remote.example.com:3260 --logout