zfs

ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs. [read more at http://en.wikipedia.org/wiki/ZFS]




  • I've made many errors when building my NAS server, and
    they forced me to give up on Sun's Zettabyte File System (ZFS), at least
    for this year... In fact, I had decided to build a NAS before
    I even knew ZFS existed, and bought the following
    hardware components:
    • 1 Promise SuperTrak EX8350 with 8 SATA II 3 Gb/s ports (RAID6)
    • The cheapest integrated mainboard available: nForce4 IGP
    • AMD64 3000+
      It took me half a day to update both the mainboard BIOS (so that the PCIe x4 Promise EX8350 could be used in the PCIe x16 slot) and the controller BIOS (for RAID6 support)! The crazy process of updating BIOS and firmware from a floppy disk has still not disappeared. The second challenge was creating a boot floppy on a system without any OS.

    The solution came, of course, from Knoppix. I was able to find old DOS boot floppy images at www.bootdisk.com (all DOS and Windows versions are available there). I quickly booted my diskless machine with Knoppix and formatted a new floppy:
    # fdformat /dev/fd0

    and wrote the boot image onto it by typing:
    # dd if=bootdisk.img of=/dev/fd0 bs=1440k
    This allowed me to flash the mainboard with the latest ASUS BIOS available (1001), as well as the Promise controller.

    I've contacted Promise support twice (Europe AND USA); their response is below:


    So if you ever want to build a NAS powered by a Solaris flavor, first consult the Hardware Compatibility List (HCL), and avoid Promise Technology. I've found that all the other major manufacturers, such as Adaptec and ARECA, provide Solaris drivers (HERE), even if they are quite old (mid-2005).

    I've also tried some Solaris flavors which I can definitely recommend if you decide to play with ZFS:
    Both versions seem not to be based on OpenSolaris Nevada build 44, so I was not able to play with RAIDZ2 (the ZFS equivalent of a RAID6 array).

    A replicated RAID-Z configuration can now have either single- or double-parity, which means that one or two device failures can be sustained respectively, without any data loss. You can specify the raidz2 keyword for a double-parity RAID-Z configuration. Or, you can specify the raidz or raidz1 keyword for a single-parity RAID-Z configuration.
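    To see why one parity block is enough to survive one device failure, here is a minimal sketch using plain XOR parity, as in RAID-5. It is only an illustration under simplifying assumptions: RAID-Z's real on-disk layout uses variable stripe widths, raidz2 adds a second, differently computed parity so that two failures can be survived, and the function names and block contents below are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (one column of bytes at a time)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Single parity: the XOR of all data blocks in the stripe."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """XOR of the surviving data blocks and the parity yields the lost block."""
    return xor_blocks(list(surviving_blocks) + [parity])


data = [b"NAS-", b"ZFS!", b"disk"]   # three 4-byte data blocks (made-up contents)
parity = make_parity(data)

# Simulate losing the device that held the second block.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered)  # b'ZFS!'
```

    Any single block, data or parity, can be rebuilt this way; losing two blocks of the same stripe cannot be repaired with one XOR parity, which is exactly the gap raidz2 closes.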

    I've also tried Solaris Express 10 (Live CD), which is also available for free (non-commercial use), but I was really not convinced by the desktop, and the hardware was not recognized any better.
    What may also stop you from using ZFS is that the encryption subproject has not delivered yet, and that NFS is the only supported way to share a pool (Windows can access NFS shares with the 300 MB "Windows Services for UNIX" package); Samba export is still in development.




    This gives me 2 options: use either a Windows or a Linux operating system. Windows has a major advantage in driver support (Cool'n'Quiet, nForce4 chipset, Promise driver and management console), but all its insecurities and the fully fledged desktop are NOT needed on a true file server. Linux, on the other hand, also has all drivers available (except the Promise WebPAM management console), and is a lot more modular: I can remove all functionality not needed (no FTP, no desktop, no HTTP daemon, ...). Samba, ssh2 and ReiserFS are all I need!

    For the job, I may choose:
    • openSUSE 10.1, since I have been using SuSE for 3 years, or
    • FreeBSD, a leader in stability and security in the Unix world.
    Right now, I've put five 320 GB disks in a RAID5 logical array; initializing the 1.2 TB array took 18 hours!
    [Photo: Promise EX8350 initializing the NAS]
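    The 1.2 TB figure and the 18-hour initialization can be cross-checked with a little arithmetic. This sketch assumes decimal gigabytes (as disk vendors count them) and averages the rate over the usable capacity only; the controller actually writes data and parity to all five disks, so the real per-disk throughput is different.

```python
DISKS = 5
DISK_GB = 320        # decimal gigabytes (10**9 bytes), as sold
INIT_HOURS = 18

# RAID5 spends one disk's worth of space on parity.
usable_gb = (DISKS - 1) * DISK_GB
# The same capacity expressed in binary units (2**40 bytes per TiB).
usable_tib = usable_gb * 10**9 / 2**40
# Average rate over the usable space, in MB/s (10**6 bytes per second).
init_rate_mb_s = usable_gb * 1000 / (INIT_HOURS * 3600)

print(f"usable: {usable_gb} GB ({usable_tib:.2f} TiB)")   # usable: 1280 GB (1.16 TiB)
print(f"average init rate: {init_rate_mb_s:.1f} MB/s")    # average init rate: 19.8 MB/s
```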

    This box has 14 SATA ports and 8 USB ports. I've added old disks full of data (300 GB and 160 GB), plus 2 Maxtor 300 GB USB disks.

    [Photo: wattage controller checking the power consumption of the NAS]
    The power consumption is quite high, not only because of all the hard disks (15 W * 7 = 105 W), but also because of the AMD64 CPU (95 W at 1800 MHz and 63 W at 800 MHz when Cool'n'Quiet is active). The Promise controller's onboard Intel IOP CPU also draws a lot of energy: without the card in the box, total power consumption was below 100 W; with it, 150 W!

    To better tune the box for power consumption (downclocking, lowering the CPU core voltage), I bought a cheap wattage meter (7 euros). On the left, the NAS is running during the array initialization, without Cool'n'Quiet.

     
  • Putting OpenSolaris on a NAS server

    OpenSolaris is an open source project created by Sun Microsystems to build a developer community around the Solaris Operating System technology.
    OpenSolaris Express is the official distribution and can be downloaded HERE, but I will use a fork of that code.

    Raid @ home with OpenSolaris and ZFS: Why Solaris for a NAS server?

    Solaris itself, while being a rock-solid operating system, is not really needed for a NAS server (it is oversized). What has increased my interest in it is ZFS, the Zettabyte File System. Here is an extract from opensolaris.org; all of its arguments fit my needs nicely:

    <quote>

    • ZFS is a new kind of filesystem that provides simple administration, transactional semantics, end-to-end data integrity, and immense scalability. ZFS is not an incremental improvement to existing technology; it is a fundamentally new approach to data management. We've blown away 20 years of obsolete assumptions, eliminated complexity at the source, and created a storage system that's actually a pleasure to use.
    • ZFS presents a pooled storage model that completely eliminates the concept of volumes and the associated problems of partitions, provisioning, wasted bandwidth and stranded storage. Thousands of filesystems can draw from a common storage pool, each one consuming only as much space as it actually needs. The combined I/O bandwidth of all devices in the pool is available to all filesystems at all times.
    • All operations are copy-on-write transactions, so the on-disk state is always valid. There is no need to fsck(1M) a ZFS filesystem, ever. Every block is checksummed to prevent silent data corruption, and the data is self-healing in replicated (mirrored or RAID) configurations. If one copy is damaged, ZFS will detect it and use another copy to repair it.
    • ZFS introduces a new data replication model called RAID-Z. It is similar to RAID-5 but uses variable stripe width to eliminate the RAID-5 write hole (stripe corruption due to loss of power between data and parity updates). All RAID-Z writes are full-stripe writes. There's no read-modify-write tax, no write hole, and — the best part — no need for NVRAM in hardware. ZFS loves cheap disks.
    • But cheap disks can fail, so ZFS provides disk scrubbing. Like ECC memory scrubbing, the idea is to read all data to detect latent errors while they're still correctable. A scrub traverses the entire storage pool to read every copy of every block, validate it against its 256-bit checksum, and repair it if necessary. All this happens while the storage pool is live and in use.
    • ZFS has a pipelined I/O engine, similar in concept to CPU pipelines. The pipeline operates on I/O dependency graphs and provides scoreboarding, priority, deadline scheduling, out-of-order issue and I/O aggregation. I/O loads that bring other filesystems to their knees are handled with ease by the ZFS I/O pipeline.
    • ZFS provides unlimited constant-time snapshots and clones. A snapshot is a read-only point-in-time copy of a filesystem, while a clone is a writable copy of a snapshot. Clones provide an extremely space-efficient way to store many copies of mostly-shared data such as workspaces, software installations, and diskless clients.
    • ZFS backup and restore are powered by snapshots. Any snapshot can generate a full backup, and any pair of snapshots can generate an incremental backup. Incremental backups are so efficient that they can be used for remote replication — e.g. to transmit an incremental update every 10 seconds.
    • There are no arbitrary limits in ZFS. You can have as many files as you want; full 64-bit file offsets; unlimited links, directory entries, snapshots, and so on.
    • ZFS provides built-in compression. In addition to reducing space usage by 2-3x, compression also reduces the amount of I/O by 2-3x. For this reason, enabling compression actually makes some workloads go faster.
    • In addition to filesystems, ZFS storage pools can provide volumes for applications that need raw-device semantics. ZFS volumes can be used as swap devices, for example. And if you enable compression on a swap volume, you now have compressed virtual memory.
    • ZFS administration is both simple and powerful.

    </quote>
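    The checksum, self-healing and scrubbing bullets above can be illustrated with a toy model: each logical block is stored on two mirror sides, its expected checksum is kept apart from the data (ZFS keeps it in the parent block pointer), and every read verifies the data and rewrites a bad copy from a good one. This is a sketch, not ZFS code: the class and function names are hypothetical, and SHA-256 stands in for whichever checksum algorithm the pool uses.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    # SHA-256 stands in for the pool's block checksum.
    return hashlib.sha256(block).digest()


class MirroredBlock:
    """Toy model: one logical block stored on two mirror sides, plus the
    checksum that is stored separately from the data itself."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)

    def read(self) -> bytes:
        """Return a copy that verifies and repair any copy that does not."""
        good = next((bytes(c) for c in self.copies
                     if checksum(bytes(c)) == self.expected), None)
        if good is None:
            raise IOError("all copies are corrupt")
        for i, c in enumerate(self.copies):
            if checksum(bytes(c)) != self.expected:
                self.copies[i] = bytearray(good)   # self-healing rewrite
        return good


def scrub(blocks):
    """Read every block (verifying and repairing it); return the number of bad copies fixed."""
    repaired = 0
    for b in blocks:
        repaired += sum(checksum(bytes(c)) != b.expected for c in b.copies)
        b.read()
    return repaired


pool = [MirroredBlock(b"photo-%d" % i) for i in range(3)]
pool[1].copies[0][0] ^= 0xFF      # flip bits on one side: silent corruption
repaired = scrub(pool)
print(repaired)                   # 1
print(pool[1].read())             # b'photo-1'
```

    The key design point is that the checksum does not live next to the data it protects, so a disk returning wrong but well-formed data is still caught.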

    This speaks for itself. I've seen 2 demos HERE, and although the hardware support is not that great, I've decided to give it a try. Note that Linux may have a port of ZFS before July 2006, as it is a sponsored Google Summer of Code project.


    Which Solaris flavor?

    In fact, it is possible to use one of the following OpenSolaris distributions:
    • BeleniX is a *NIX distribution that is built using the OpenSolaris source base. It is currently a LiveCD distribution but is intended to grow into a complete distro that can be installed to hard disk. BeleniX has been developed out of Bangalore, the silicon capital of India, and it was born at the India Engineering Center of Sun Microsystems. And... it USES KDE: the best open source desktop.
    • SchilliX, a live CD
    • marTux, a live CD/DVD, for SPARC
    • Nexenta, a Debian-based distribution combining GNU software and Solaris' SunOS kernel
    • Polaris, a PowerPC port

    BeleniX
    Status: stable, in development
    Number of developers: __
    Homepage: http://belenix.sarovar.org
    Version: 0.4.3a
    Based on: OpenSolaris
    Supported protocols:
    • NFS
    • SMB/CIFS
    • HTTP/WebDAV
    • FTP
    Network directories support: ???
    Software RAID: 0, 1, 5, 6
    Hardware RAID: -
    Interface: none
    • Remote root login is deactivated but can be re-enabled: comment out the line "CONSOLE=/dev/console" in the file /etc/default/login.
    • Maybe VNC remote access.
    Size: ??
    Can be installed:
    • Live CD (but mount points have to be recreated)
    • On hard disk only because of its size
    File systems: EXT2/EXT3, ZFS
    Hard drives: ATA/SATA, SCSI, USB and FireWire
    Network: not well...

    Installation

    Since BeleniX is a Live CD, it is more than enough just for playing around with ZFS.

    Playing with ZFS



    Future






    Links and resources


     
  • I am still testing my NAS system (seven 300 GB disks), and while testing OpenSolaris (under BeleniX) and Googling, I found this page:

    This blog is about the Google Summer of Code project "ZFS filesystem for FUSE/Linux"

    For those of you who do not know FUSE: it is the Filesystem in Userspace Linux kernel module, which allows non-privileged users to create their own filesystems without writing any kernel code.

    ZFS has many features which can benefit all kinds of users, from the simple end-user to the biggest enterprise systems:
    • Provable integrity - it checksums all data (and meta-data), which makes it possible to detect hardware errors (hard disk corruption, flaky IDE cables..). 
    • Atomic updates - means that the on-disk state is consistent at all times, there's no need to perform a lengthy filesystem check after forced reboots/power failures.
    • Instantaneous snapshots and clones - it makes it possible to have hourly, daily and weekly backups efficiently, as well as experiment with new system configurations without any risks.
    • Built-in compression, encryption
    • Highly scalable
    • Pooled storage model - creating filesystems is as easy as creating a new directory. You can efficiently have thousands of filesystems, each with its own quotas and reservations, and different properties (compression algorithm, checksum algorithm, etc.).
    • Built-in stripes (RAID-0), mirrors (RAID-1) and RAID-Z (it's like software RAID-5, but more efficient due to ZFS's copy-on-write transactional model). 
    • Variable sector sizes, adaptive endianness etc...
    In fact, this is a sponsored Google Summer of Code project. Note that Apple is also currently porting ZFS to OS X. That could mean that ZFS becomes mainstream within the next 2 years.
    And I look forward to testing RAID-Z... For those interested in raw RAID-Z performance, you can read this highly technical blog entry: WHEN TO (AND NOT TO) USE RAID-Z

    Sun expects to have a stable ZFS version by June 2006.
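    The snapshot, clone and incremental-backup ideas from the lists above can be modelled in a few lines. In this hypothetical sketch a dataset is just a map from file name to an immutable block; a write builds a new map instead of modifying the old one (a stand-in for ZFS writing the changed block and its parents to new locations), so a snapshot costs no more than keeping a reference to the current map. Real ZFS finds changed blocks by their birth transaction rather than by comparing contents as incremental() does here.

```python
class Dataset:
    """Toy copy-on-write dataset: file name -> immutable block (bytes)."""

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})
        self.snapshots = {}

    def write(self, name, data: bytes):
        # Build a new version of the map; the old map (and any snapshot
        # referring to it) is never modified.
        self.blocks = {**self.blocks, name: data}

    def snapshot(self, label):
        # Constant time: just keep a reference to the current map.
        self.snapshots[label] = self.blocks

    def clone(self, label):
        # Writable dataset that starts out sharing the snapshot's blocks.
        return Dataset(self.snapshots[label])


def incremental(older, newer):
    """Names whose blocks differ between two snapshots: what an incremental backup would carry."""
    return sorted(n for n in newer if older.get(n) != newer[n])


home = Dataset()
home.write("notes.txt", b"v1")
home.write("photo.jpg", b"jpeg")
home.snapshot("monday")
home.write("notes.txt", b"v2")
home.snapshot("tuesday")

print(home.snapshots["monday"]["notes.txt"])                              # b'v1'
print(incremental(home.snapshots["monday"], home.snapshots["tuesday"]))  # ['notes.txt']

test = home.clone("monday")
test.write("notes.txt", b"experiment")
print(home.blocks["notes.txt"])                                           # b'v2'
```

    Only the unchanged photo.jpg block is shared by both snapshots and the clone, which is why snapshots and clones stay cheap no matter how large the dataset is.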
  • Before putting my monster NAS online (pictures will follow soon), I am playing a lot with Nexenta under VMware Player.

    I've found this excellent PDF (THE LAST WORD IN FILE SYSTEMS), which explains why ZFS may be the Holy Grail of file systems; if you want to learn how to administer pools, I recommend the ZFS admin guide.

    Here is my first try, with 7 simulated disks (this example uses files rather than real devices, even though I have 7 real disks sitting next to me ;-)). The next steps will be to export the pool as an NFS share, unplug some disks, activate encryption, schedule snapshots with crontab, and back up some vital data remotely over ssh.


    # mkdir /vault
    Create a directory for storing all the virtual disks.
    # mkfile 64m /vault/disk1
    # mkfile 64m /vault/disk2
    # mkfile 64m /vault/disk3
    # mkfile 64m /vault/disk4
    # mkfile 64m /vault/disk5
    # mkfile 64m /vault/disk6
    # mkfile 64m /vault/disk7
    I created 7 virtual disks named disk1 to disk7.
    # zpool status
    no pools available
    Check whether any pool is already defined...
    # zpool create nasvault raidz /vault/disk1 /vault/disk2 /vault/disk3 /vault/disk4 /vault/disk5 /vault/disk6
    The 6 disks will be in a raidz pool.
    # zpool status
      pool: nasvault
     state: ONLINE
     scrub: none requested
    config:

            NAME             STATE     READ WRITE CKSUM
            nasvault            ONLINE       0     0     0
              raidz              ONLINE       0     0     0
                /vault/disk1  ONLINE       0     0     0
                /vault/disk2  ONLINE       0     0     0
                /vault/disk3  ONLINE       0     0     0
                /vault/disk4  ONLINE       0     0     0
                /vault/disk5  ONLINE       0     0     0
                /vault/disk6  ONLINE       0     0     0
    RAIDZ:

    A replicated RAID-Z configuration can now have either single- or double-parity, which means that one or two device failures can be sustained respectively, without any data loss. Disks can be of different sizes (each device then only contributes as much space as the smallest one in the group), and there is no write hole as found in other RAID arrays.
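    How much space the parity costs follows from a simple rule: a RAID-Z group of N devices with P parity devices holds roughly (N - P) times the size of its smallest device. The sketch below applies it to seven 300 GB disks like the ones in my NAS, plus a mixed-size example; the function name is hypothetical and metadata overhead is ignored, so real figures come out somewhat lower.

```python
def raidz_usable(sizes_gb, parity):
    """Approximate usable space of one RAID-Z group, ignoring metadata overhead.

    Every device contributes only as much as the smallest one, and `parity`
    devices' worth of space (1 for raidz/raidz1, 2 for raidz2) holds parity.
    """
    if len(sizes_gb) < parity + 1:
        raise ValueError("a RAID-Z group needs at least parity + 1 devices")
    return (len(sizes_gb) - parity) * min(sizes_gb)


seven_disks = [300] * 7
print(raidz_usable(seven_disks, 1))       # 1800 (raidz: survives 1 failure)
print(raidz_usable(seven_disks, 2))       # 1500 (raidz2: survives 2 failures)
print(raidz_usable([300, 300, 160], 1))   # 320 (the 160 GB disk caps the others)
```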
    # df -h /nasvault
    Filesystem             size   used  avail capacity  Mounted on
    nasvault                  384M    16K   384M     1%    /nasvault
    Check the size of the pool.
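    # mkfile 64m /vault/disk8
    # zpool add nasvault raidz /vault/disk7 /vault/disk8
    Extending the pool on the fly with 2 new disks (disk5 and disk6 already belong to the first raidz group, so the new group uses the spare disk7 plus a new disk8). zpool may warn about a mismatched replication level (a 2-disk raidz next to a 6-disk one) and require -f.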
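    For the crontab snapshots and the remote ssh backup listed among the next steps, the commands could be generated by a small script. This is only a sketch: the `auto-` naming scheme, the host `backuphost` and the target dataset `backup/nasvault` are hypothetical, and the functions merely build the command lines (`zfs snapshot`, and `zfs send -i` piped into `ssh ... zfs receive`) without running them. The incremental receive assumes the remote side already holds the older snapshot from a previous full send.

```python
from datetime import datetime


def snapshot_cmd(dataset, when):
    """Argument list for a timestamped snapshot, e.g. nasvault@auto-2006-06-01-1200."""
    return ["zfs", "snapshot", f"{dataset}@auto-{when:%Y-%m-%d-%H%M}"]


def send_over_ssh_cmd(prev_snap, new_snap, host, target):
    """Shell pipeline sending the changes between two snapshots to a remote pool."""
    return f"zfs send -i {prev_snap} {new_snap} | ssh {host} zfs receive {target}"


now = datetime(2006, 6, 1, 12, 0)
print(" ".join(snapshot_cmd("nasvault", now)))
# zfs snapshot nasvault@auto-2006-06-01-1200
print(send_over_ssh_cmd("nasvault@auto-2006-05-31-1200",
                        "nasvault@auto-2006-06-01-1200",
                        "backuphost", "backup/nasvault"))
# zfs send -i nasvault@auto-2006-05-31-1200 nasvault@auto-2006-06-01-1200 | ssh backuphost zfs receive backup/nasvault
```

    Keeping the snapshot name sortable by date makes it easy for a cron job to pick the previous snapshot as the base of the next incremental send.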

    Some noise has popped up on the OpenSolaris forums about the development of a mini OpenSolaris boot image (miniroot.gz) under 60 MB that is able to boot from a USB disk. That is exactly the right schedule for my NAS project: if it comes out in less than 2 weeks, it will be perfect!
  • ZFS has so much promise that it sounds too good to be true! I will test it extensively soon.

    From ZFS: Threat or Menace? Pt. I

    .... In a storage industry where the hardware cost to protect data keeps rising, ZFS represents a software solution to the problem of wobbly disks and data corruption. Thus it is a threat to hardened disk array model of very expensive engineering on the outside to protect the soft underbelly of ever-cheaper disks on the inside...
    and part 2 is also here

    I also found some benchmarks of ZFS against EXT3, ReiserFS and UFS.

    I will soon publish a lot of ZFS how-tos as well.