Storage Nirvana – Unlimited Storage using LVM, iSCSI and AoE

UPDATE: I don’t recommend this. See comment #4. Anything you do is at your own risk.

If you want seemingly unlimited storage space, the way Google offers it with Gmail, here is what you need to do. I tested the solution on Ubuntu Feisty Fawn and Gutsy Gibbon.

  1. Set up LVM and understand how it is used
  2. Set up iSCSI or AoE so that storage on other machines can be used easily as network devices
  3. Set up RAID so that the LVM Physical Volumes have redundancy

Important: The following steps may corrupt your data or partitions. Use them at your own risk. I have listed the steps in the order I performed them; they may not be accurate in your environment, and not all scenarios are covered. I recommend reading the entire article before attempting anything.

Assumptions

  1. You are comfortable using Linux and fdisk
  2. You have backed up your data so that in case something goes wrong, nothing is lost
  3. You are connected to a network and have access to more than one Linux machine for testing

This article is not yet complete: it does not cover the iSCSI and AoE setup. I am working on it and will update it soon. In the meantime, you can continue on this page and learn about LVM 🙂

Set Up LVM

Much help from http://linuxhelp.blogspot.com/2005/04/creating-lvm-in-linux.html

Quick Steps

  1. Identify the partitions which are to be used. These are referred to as physical volumes (pv). The steps are
    1. Identify the partitions such as /dev/hda1, /dev/sda1 etc.
    2. Set the correct partition type of Linux LVM, which is 8e
    3. Initialize them as physical volumes using pvcreate
  2. Create a Volume Group (vg) which consists of the partitions identified above. This is where you add or remove partitions to increase or decrease space.
  3. Create the Logical Volumes (lv), which are the partitions that you can format (using ext3, gfs, etc.).
  4. Use the partitions created as per your requirements. Add them into /etc/fstab if necessary.
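
The quick steps above can be sketched as a dry run. This is only an illustration: lvm_plan is a hypothetical helper name, it prints the commands rather than running them, and it assumes the partitions have already been given type 8e in fdisk.

```shell
# Dry-run sketch: prints the LVM setup commands instead of executing them.
# lvm_plan is a hypothetical helper, not part of lvm2.
# Usage: lvm_plan VG LV EXTENTS PARTITION...
lvm_plan() (
    vg=$1; lv=$2; extents=$3
    shift 3
    # Step 1: initialize each partition as a physical volume
    echo "pvcreate $*"
    # Step 2: group the physical volumes into a volume group
    echo "vgcreate $vg $*"
    # Step 3: create a logical volume of the given number of extents
    echo "lvcreate -l $extents -n $lv $vg"
    # Step 4: put a filesystem on the logical volume
    echo "mkfs.ext3 /dev/$vg/$lv"
)

lvm_plan my_disks vol1 75 /dev/sda5 /dev/sda6
```

Reading the printed commands before running them by hand is a cheap way to catch a wrong device name.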

Diagrammatically, it looks something like the following:

LVM Diagram

Analysis

So the analysis is

  1. We can add or remove partitions in a volume group, even on the fly.
  2. A logical volume is created from the volume group. Its size is defined in extents.
    1. So, whenever we want, we can increase or decrease its size as per requirements.
    2. If there are two partitions, /backup and /recordings, and we have space available in /recordings, then we can reduce the size of /recordings and add the freed space to /backup
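
Since logical volume sizes are counted in extents, a little arithmetic helps when planning a resize. A minimal sketch, assuming the default extent size of 4 MB; the function names are my own:

```shell
# EXTENT_MB is the extent size in MB (4 is assumed here as the default).
EXTENT_MB=4

# Size in MB of a volume made of N extents.
extents_to_mb() { echo $(( $1 * EXTENT_MB )); }

# Number of extents needed to hold at least N MB (rounded up).
mb_to_extents() { echo $(( ($1 + EXTENT_MB - 1) / EXTENT_MB )); }

extents_to_mb 75     # 300
mb_to_extents 400    # 100
```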

Hands On LVM

Following is what I did for experimentation

  1. Install lvm2 through synaptic

  2. Created three partitions of 500 MB each (/dev/sda5, /dev/sda6 and /dev/sda7) and tagged them as Linux LVM (type code 8e in fdisk)

  3. Rebooted the machine so that lvm2 gets initialized and the changed partition table is picked up

  4. Ran the command pvcreate

    1. pvcreate /dev/sda5 /dev/sda6 /dev/sda7
  5. Ran vgscan to check if any Volume Groups were present. None were

  6. Created a Volume Group using vgcreate for a total size of 1 GB.

    1. vgcreate my_disks /dev/sda5 /dev/sda6 # Uses the default extent size of 4 MB. With the older LVM1 metadata format this caps a logical volume at about 256 GB; the lvm2 format does not have this limit
    2. I could have also used vgcreate -s 16 my_disks /dev/sda5 /dev/sda6 for an extent size of 16 MB, which raises that maximum to 1 TB. Since this is an experiment, there is no such need.
  7. Ran vgdisplay to view the details of the my_disks volume group

  8. Created a logical volume using
    1. lvcreate -l 75 -n vol1 my_disks
    2. -l 75 means 75 extents. At 4 MB per extent, the total size of the logical volume is 4 × 75 = 300 MB
    3. vol1 is the name of the logical volume. This can now be formatted.
    4. Verify it using lvdisplay
  9. Created another logical volume vol2 of 75 extents.

  10. Formatted both volumes as ext3. I am using ext3 only because I have experience with it, and because it can be resized while mounted (as of now, online resizing is supported only for growing a filesystem, not for shrinking it)
    1. mkfs.ext3 /dev/my_disks/vol1
    2. mkfs.ext3 /dev/my_disks/vol2
  11. Mounted the partitions in /lvm

    1. mkdir /lvm
    2. mkdir /lvm/recordings /lvm/data
    3. mount /dev/my_disks/vol1 /lvm/data
    4. mount /dev/my_disks/vol2 /lvm/recordings
  12. Added the following entries in /etc/fstab so that they auto mount on reboot

    1. /dev/my_disks/vol1 /lvm/data ext3 defaults 0 3
    2. /dev/my_disks/vol2 /lvm/recordings ext3 defaults 0 3
  13. Rebooted the system and all was fine 🙂
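
The maximum sizes quoted in step 6 follow from simple arithmetic. The sketch below assumes a ceiling of 65,536 extents per logical volume, which is what those figures imply; that ceiling is an assumption tied to the older LVM1 metadata format, and max_size_gb is a made-up helper name.

```shell
# Assumed ceiling on the number of extents (implied by the figures in step 6).
MAX_EXTENTS=65536

# Largest volume, in GB, for a given extent size in MB.
max_size_gb() { echo $(( $1 * MAX_EXTENTS / 1024 )); }

max_size_gb 4    # 256
max_size_gb 16   # 1024, i.e. 1 TB
```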

Important executables

  1. pvcreate

  2. pvmove

  3. pvdisplay

  4. vgdisplay

  5. vgextend

  6. vgreduce

  7. lvdisplay

  8. lvresize

  9. resize2fs

Test Cases for LVM

A. /lvm/data needs 400 MB instead of the existing 300 MB; the extra space can be taken from the free space in the Volume Group

  1. Extents required : 100
  2. Commands
    1. lvresize -l 100 /dev/my_disks/vol1
    2. resize2fs /dev/my_disks/vol1 400M

The resize (resize2fs) happened online, with the volume still mounted, only because we were growing the filesystem and it was ext3 on a 2.6 kernel.
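
Case A can be generalized into a small dry-run helper. This is a sketch only: grow_plan is a hypothetical name, it prints the two commands rather than running them, and it assumes 4 MB extents unless told otherwise.

```shell
# Dry-run sketch: print the commands that grow an ext3 logical volume.
# Usage: grow_plan DEVICE TARGET_MB [EXTENT_MB]
grow_plan() (
    dev=$1; target_mb=$2; extent_mb=${3:-4}
    # Round the target up to a whole number of extents
    extents=$(( (target_mb + extent_mb - 1) / extent_mb ))
    # Grow the logical volume first so the filesystem has room to expand
    echo "lvresize -l $extents $dev"
    # Then grow the filesystem to the target size
    echo "resize2fs $dev ${target_mb}M"
)

grow_plan /dev/my_disks/vol1 400
```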

B. /lvm/data needs to be resized to 800 MB (200 extents), but space available in volume group (my_disks) is only 268 MB (67 extents)

  1. Extents required : 200
  2. Extents available : 67
  3. Commands
    1. lvresize -l 200 /dev/my_disks/vol1

lvresize fails with an error message about insufficient free extents.
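
Whether a resize like this can succeed is easy to check beforehand. A minimal sketch (extent_shortfall is a hypothetical helper) that reports how many extents are missing, using the numbers from this case:

```shell
# Print how many extents are missing for a resize (0 if there are enough).
# Usage: extent_shortfall CURRENT_EXTENTS TARGET_EXTENTS FREE_EXTENTS
extent_shortfall() (
    needed=$(( $2 - $1 ))      # extra extents the volume has to gain
    short=$(( needed - $3 ))   # how far the free pool falls short
    [ "$short" -gt 0 ] || short=0
    echo "$short"
)

extent_shortfall 100 200 67   # 33
```

The result, 33, is the number of extents that has to be freed elsewhere in the volume group before the resize can go through.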

C. Free up extents in vol2 so that they can be used in vol1 (for Case B.)

  1. Extents to be freed : 33
  2. Current Extents : 75 (300 MB)
  3. Commands
    1. umount /lvm/recordings
    2. e2fsck -f /dev/my_disks/vol2
    3. resize2fs /dev/my_disks/vol2 168M
    4. lvresize -l 42 /dev/my_disks/vol2 # This gave a warning which I ignored
    5. vgdisplay # 100 Extents available
    6. Run Case B. again – It works!!

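The command e2fsck needs to be run before resize2fs; otherwise resize2fs gives an error. Note the order when shrinking: the filesystem is reduced first (resize2fs) and the logical volume second (lvresize), so that the volume never becomes smaller than the filesystem it holds.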
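
The shrink sequence from Case C, written as a dry-run sketch. shrink_plan is a hypothetical helper that only prints the commands; it assumes 4 MB extents by default.

```shell
# Dry-run sketch: print the commands that shrink an ext3 logical volume.
# Usage: shrink_plan DEVICE MOUNTPOINT TARGET_EXTENTS [EXTENT_MB]
shrink_plan() (
    dev=$1; mnt=$2; extents=$3; extent_mb=${4:-4}
    size_mb=$(( extents * extent_mb ))
    echo "umount $mnt"                  # shrinking ext3 must be done offline
    echo "e2fsck -f $dev"               # resize2fs insists on a fresh check
    echo "resize2fs $dev ${size_mb}M"   # shrink the filesystem first
    echo "lvresize -l $extents $dev"    # then shrink the volume to match
)

shrink_plan /dev/my_disks/vol2 /lvm/recordings 42
```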

D. The Logical Volume (vol2) of 42 extents holds up to 120 MB of data (30 extents) and has to be resized to 28 extents

  1. Commands
    1. mount /lvm/recordings
    2. cd /lvm/recordings
    3. Created a 120 MB file using dd if=/dev/zero of=myfile bs=122880 count=1024
    4. echo "hi" >> myfile
    5. md5sum myfile # Record the checksum before resizing, so that any corruption can be detected afterwards
    6. umount /lvm/recordings
    7. e2fsck -f /dev/my_disks/vol2
    8. resize2fs /dev/my_disks/vol2 112M

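This reports an error: resize2fs will not shrink the filesystem below the space occupied by its data (the 112 MB target is less than the 120 MB in use) 🙂

Try again, but this time with a larger target of 140 MB (35 extents). As before, the filesystem is shrunk before the logical volume.

  1. resize2fs /dev/my_disks/vol2 140M
  2. lvresize -L 140M /dev/my_disks/vol2
  3. mount /lvm/recordings

It works! Running md5sum /lvm/recordings/myfile returns the same checksum as before.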
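
The checksum comparison used in this case can be scripted. A minimal sketch with a small scratch file standing in for the 120 MB file on the logical volume; record_sum and sum_unchanged are made-up helper names.

```shell
# Print only the MD5 hash of a file.
record_sum() { md5sum "$1" | cut -d' ' -f1; }

# Succeed only if FILE still hashes to EXPECTED.
sum_unchanged() { [ "$(record_sum "$1")" = "$2" ]; }

# Example with a 16 KB scratch file (the article used a 120 MB file on the LV)
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=1024 count=16 2>/dev/null
echo "hi" >> "$tmp"
before=$(record_sum "$tmp")
# ... the umount, e2fsck, resize2fs, lvresize and mount steps would go here ...
sum_unchanged "$tmp" "$before" && echo "data intact"
rm -f "$tmp"
```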

E. vgdisplay shows 242 extents, of which only 7 are free. Now add the /dev/sda7 partition without unmounting anything

  1. Commands
    1. vgextend my_disks /dev/sda7

And it is extended without unmounting anything. Total Extents available : 363

F. Hard Disk /dev/sda6 has gone bad and has to be taken out of service, so its data has to be moved

  1. Commands
    1. modprobe dm-mirror # This has to be done as dm-mirror module is not loaded by default. You may add it in /etc/modules
    2. pvmove /dev/sda6

pvmove moves the data off a physical volume onto the other physical volumes in the same volume group, which makes it easy to remove that physical volume afterwards.

Now remove the pv /dev/sda6 from the volume group

  1. vgreduce my_disks /dev/sda6

And it works too! The md5sum check is also successful, so no data got corrupted.
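
pvmove can only succeed if the rest of the volume group has enough free extents to take the data. The sketch below checks that before printing the two commands. evacuate_plan is a hypothetical helper, and the example figures are assumptions: 363 total extents over three equal partitions suggests 121 extents per physical volume, and /dev/sda6 is taken to be fully allocated.

```shell
# Dry-run sketch: print the commands that retire a physical volume,
# but only if the other PVs have room for its data.
# Usage: evacuate_plan VG PV ALLOCATED_ON_PV FREE_ON_OTHER_PVS
evacuate_plan() (
    vg=$1; pv=$2; allocated=$3; free_elsewhere=$4
    if [ "$allocated" -gt "$free_elsewhere" ]; then
        echo "not enough free extents: need $allocated, have $free_elsewhere" >&2
        return 1
    fi
    echo "pvmove $pv"           # move the data to the other PVs
    echo "vgreduce $vg $pv"     # then drop the emptied PV from the VG
)

# Assumed figures: 121 extents in use on sda6, 121 free on the new sda7
evacuate_plan my_disks /dev/sda6 121 121
```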

G. System got corrupted and is now unbootable. Now the data needs to be retrieved from the logical volumes.

  1. Boot the machine from an Ubuntu Gutsy Gibbon (7.10) Alternate Install CD, which has a rescue mode available.
  2. Start the rescue mode
  3. The volume groups and logical volumes (/dev/my_disks, /dev/my_disks/vol1, and /dev/my_disks/vol2) were available and I was able to mount them successfully. md5sum also returned correct results.

It seems that Ubuntu Rescue detects the LVM volumes and initializes them so that they’re usable.
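
If a rescue environment does not activate the volumes by itself, the manual equivalent would look roughly like this. It is a sketch under assumptions: I did not need these steps, the mount point /mnt/recordings is hypothetical, and the run wrapper only prints each command unless RUN=1 is set.

```shell
# run: print the command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

run vgscan                                      # look for volume groups
run vgchange -a y my_disks                      # activate the logical volumes
run mkdir -p /mnt/recordings                    # hypothetical mount point
run mount /dev/my_disks/vol2 /mnt/recordings
run md5sum /mnt/recordings/myfile               # compare with the recorded sum
```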

Observations

  • Over a WiFi network, pvmove on an iSCSI device was painfully slow in one direction. Say /dev/sdb1 is an iSCSI target exported by a different machine (machine B) and is part of my volume group on machine A. Removing it with “pvmove /dev/sdb1” moves all the data on /dev/sdb1 to free extents on machine A, and that took ages (9m14sec for 50 MB of data). In the reverse case, with /dev/sdb1 part of my volume group, freeing up a partition on machine A (say /dev/sda8) with “pvmove /dev/sda8” worked flawlessly and was fast. No such problems occurred over Ethernet.
  • AoE performs better than iSCSI for pvmove. I was able to transfer 836 MB of data in approx 1.35 minutes, as compared to 2.19 minutes for iSCSI.
  • iSCSI seems to be more reliable than AoE. With AoE my system hung a few times, and there were other issues that make it unfit for production, where anyone may be asked to manage a machine. For example, it got stuck while doing vgscan/lvdisplay, and I could view the listing only after I somehow managed to rmmod aoe; it probably started working only once I had made the partition available again (using vblade). Another issue: I was able to do ‘pvmove /dev/sda8’ (on machine A) successfully, with the network traffic appearing on machine B (sdb1), but when I tried ‘pvmove /dev/sdb1’, it gave some ioctl related errors and within 16 seconds reported that 100% of the data was transferred. An md5sum was also successful! I don’t know what happened here.
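
To put the slow WiFi transfer in perspective, the average throughput can be worked out from the figures above. A small sketch; throughput_kbs is a made-up helper that uses integer division.

```shell
# Average throughput in KB/s for MB of data moved in SECONDS.
throughput_kbs() { echo $(( $1 * 1024 / $2 )); }

# 50 MB in 9m14s, i.e. 554 seconds
throughput_kbs 50 $(( 9 * 60 + 14 ))   # 92
```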

Errors Encountered

  • /dev/sdc1: read failed after 0 of 4096 at 0: Input/output error

This occurred because I had changed the partition table of the primary hard disk in fdisk, so it was not synchronized properly. After a reboot, all was fine.

One more problem occurred: if you restart open-iscsi, it changes the device name from /dev/sdb1 to /dev/sdc1, which causes the above-mentioned error. This still needs to be worked on.
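
One way to at least see which kernel name a disk currently has is to go through the persistent symlinks under /dev/disk. This is only a sketch and not a fix I have tested: it assumes udev populates /dev/disk/by-path on your system, and current_device is a hypothetical helper.

```shell
# Print the kernel device (e.g. /dev/sdc1) that a persistent symlink points to.
current_device() { readlink -f "$1"; }

# List whatever persistent path names exist on this machine (possibly none).
for link in /dev/disk/by-path/*; do
    [ -e "$link" ] || continue
    echo "$link -> $(current_device "$link")"
done
```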

6 thoughts on “Storage Nirvana – Unlimited Storage using LVM, iSCSI and AoE”


  1. One more problem occurred – if you restart open-iscsi then open-iscsi changes the device name from /dev/sdb1 to /dev/sdc1 which causes the above mentioned problem. This needs to be worked upon.

    Were you able to resolve this? What looks like a loss of filesystem when the iSCSI-device is unplugged after being securely unmounted looks like a serious bug. This is currently hitting me in my tests…

  2. Unfortunately I was not able to spend any time on the troubleshooting and all. No resolution to that as yet. Will start working again in last week of September. Will update if I find anything.

  3. I liked the article a lot, but I feel I should leave a caution note concerning pvmove. It still has bugs and even memory leaks.

    If you have the choice, use the free version of Veritas Storage Foundation instead (4 volumes, 4 filesystems max – but each up to 8 exabytes)

    best wishes, florian

  4. I’ve had my fair share of issues with pvmove. I’ve seen data inconsistency amounting to around 1 TB with millions of files (call recordings) lost – as the partition could not be moved without errors. I was using two 500 GB removable USB disks (which I shouldn’t have in the first place), which either developed bad sectors, or got disconnected while pvmove was in progress. Also, pvmove was transferring the data over the network to a SAN drive which was mounted through iscsi (in lvm). I think there was no data consistency check while the data was being moved or it was error-prone, or using iscsi volume as a lvm pv is a bad idea.

    I have, for the time being, decided to do away with LVM, and am instead focusing on building applications which can work with multiple partitions seamlessly, over the network.


  5. One more problem occurred – if you restart open-iscsi then open-iscsi changes the device name from /dev/sdb1 to /dev/sdc1 which causes the above mentioned problem. This needs to be worked upon.

    I had the same problem and it only occurs if one has created a logical volume with it. If you remove the LV and keep the PV and VG, it doesn’t change.

    It’s driving me crazy.
