CCEI Storage Appliance

The plankton.che.udel.edu appliance is a home-grown medium-scale storage system. Under the hood it uses a ZOL (ZFS on Linux) filesystem. ZFS has several benefits:

  • Per-directory quota control: Any directory (be it a user's storage, content to be visible on the web, whatever) can have a quota (maximum size) or a reservation (guaranteed size).
  • Storage pools: The filesystem is not tied to any single physical hard disk. Instead, it is spread across groups of disks, and it is easily grown by adding more disks.
  • Increased integrity: ZFS has advanced data-integrity features like triple-parity RAID and self-healing of silent corruption. Triple parity means that up to three hard disks in the same RAID group can fail without data being lost. Silent corruption is the mangling of bits on the hard disk itself, such that when you later read the data off disk you cannot tell whether it is correct; ZFS detects and repairs this using additional checksums and parity data.
  • Snapshots: More on this below.
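
The checksum idea behind self-healing can be illustrated with a toy model. This is only a sketch: it keeps two mirrored copies of a block instead of parity data, and it uses SHA-256 purely for illustration; the function names are hypothetical and nothing here reflects how plankton's pool is actually configured.

```python
import hashlib

def checksum(data):
    """Checksum stored separately from the data block (SHA-256 is an assumption)."""
    return hashlib.sha256(bytes(data)).hexdigest()

def read_with_self_heal(copies, expected):
    """Return the first copy matching the checksum; rewrite any copy that does not."""
    good = next((bytes(c) for c in copies if checksum(c) == expected), None)
    if good is None:
        raise OSError("every copy fails its checksum; nothing to heal from")
    for c in copies:
        if checksum(c) != expected:
            c[:] = good  # repair the damaged copy in place
    return good

block = b"experimental results, run 42"
stored_checksum = checksum(block)
copies = [bytearray(block), bytearray(block)]  # two mirrored copies (toy redundancy)

copies[0][0] ^= 0x01  # simulate silent corruption: one bit flips on "disk"
data = read_with_self_heal(copies, stored_checksum)
```

Without the stored checksum, the reader would have no way to tell which of the two copies is the correct one.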

ZFS on Linux is an open source project derived from the ZFS code that Sun (now Oracle) developed as part of the Solaris operating system.

With ZOL providing the storage, the appliance needs interfaces through which users (CCEI staff and students) can access it. The following file-sharing mechanisms are currently configured on plankton:

  • Samba: Samba implements the SMB protocol (also known as CIFS). It allows you to mount directories on plankton on your Windows/Mac/Linux desktop and work with them as you would any other disk (drag and drop to copy, double-click to open and edit). Samba is NOT a secure file transfer mechanism, though, so it is available ONLY when you are on-campus. Even when on-campus, don't use it for any data you consider to be private.
  • SFTP/SCP: The SFTP (SSH File Transfer Protocol) and SCP (secure copy) programs, both part of the SSH client suite, can be used from anywhere on the Internet to connect to plankton and manipulate files.
  • NFS: Each CCEI student with an account on squidward as well as on plankton can access their plankton directory directly from the head node of squidward.

Please note that the storage appliance is currently a demonstration unit and all information on this page is subject to change.

When a file is written on ZFS, the data is always written to unused blocks: this is known as copy-on-write. So long as there are enough unused blocks available, the blocks containing the old copy of the file will not be overwritten with the new data. This also serves to increase write performance, since it usually means that a contiguous set of blocks can be allocated and written in one pass (versus a disparate set of blocks that must each be located and written).
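
A toy model may make copy-on-write more concrete. This is a minimal sketch, not how ZFS is implemented: the CowDisk class, its block numbering, and the filename are all hypothetical, and the model never reclaims old blocks.

```python
class CowDisk:
    """Toy disk: a write always lands in unused blocks, never over old data."""

    def __init__(self, n_blocks):
        self.blocks = [None] * n_blocks     # block contents
        self.free = list(range(n_blocks))   # unused block numbers, in order
        self.files = {}                     # filename -> block numbers in use

    def write(self, name, chunks):
        if len(chunks) > len(self.free):
            raise OSError("not enough unused blocks")
        new = [self.free.pop(0) for _ in chunks]   # a contiguous run in this toy
        for number, chunk in zip(new, chunks):
            self.blocks[number] = chunk
        old = self.files.get(name, [])
        self.files[name] = new                     # metadata now points at new blocks
        return old                                 # old blocks were left untouched

disk = CowDisk(8)
disk.write("charts.xls", ["page1", "page2"])
old_blocks = disk.write("charts.xls", ["page1"])   # save a shorter version
previous_copy = [disk.blocks[n] for n in old_blocks]
```

After the second write, previous_copy still holds both pages, because the new version went to a different block.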

Another benefit of copy-on-write and the retention of older blocks is that for some period of time one or more older copies of a file will still be present. Imagine that on Wednesday the filesystem creates a copy of the metadata that maps filenames to blocks they occupy. On Thursday I delete a page from charts.xls and save it, then realize I wanted to keep that page! If I could consult Wednesday's metadata copy, I could discover the blocks that contained the file prior to my mistake and recover it. This is exactly what ZFS snapshots are: a point-in-time copy of the metadata associated with the files/directories on the filesystem.
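
The Wednesday/Thursday scenario can be sketched in a few lines. This is an illustration under simplifying assumptions (a flat dictionary of blocks, hypothetical function names), not the real ZFS on-disk format.

```python
blocks = {}      # block number -> contents; old blocks are never overwritten here
live = {}        # current metadata: filename -> block numbers
snapshots = {}   # snapshot name -> frozen copy of the metadata
next_block = 0

def write_file(name, chunks):
    """Write every chunk to a fresh block, then repoint the live metadata."""
    global next_block
    numbers = []
    for chunk in chunks:
        blocks[next_block] = chunk
        numbers.append(next_block)
        next_block += 1
    live[name] = numbers

def take_snapshot(label):
    """Copy only the metadata; no file data is duplicated."""
    snapshots[label] = {name: list(nums) for name, nums in live.items()}

def read_file(metadata, name):
    return [blocks[n] for n in metadata[name]]

write_file("charts.xls", ["summary", "chart page"])
take_snapshot("Wed")                          # Wednesday's metadata copy
write_file("charts.xls", ["summary"])         # Thursday: a page is deleted and saved

current = read_file(live, "charts.xls")
recovered = read_file(snapshots["Wed"], "charts.xls")
```

Reading through Wednesday's metadata returns both pages, while the live metadata returns only the one that survived Thursday's edit.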

There is a hidden directory in every plankton file share called .zfs that contains any snapshots that are available. For example, if I SFTP to plankton I can use the commands cd .zfs/snapshot and ls to see what snapshots are available:

sftp> cd .zfs/snapshot
sftp> ls -l
dr-xr-xr-x    1 root     root            0 Mar 18 13:42 0200
dr-xr-xr-x    1 root     root            0 Mar 18 13:42 0800
dr-xr-xr-x    1 root     root            0 Mar 18 13:42 1400
dr-xr-xr-x    1 root     root            0 Mar 18 13:42 2000
dr-xr-xr-x    1 root     root            0 Mar 18 13:42 Fri
dr-xr-xr-x    1 root     root            0 Mar 18 13:42 Mon
dr-xr-xr-x    1 root     root            0 Mar 18 13:42 Sat
dr-xr-xr-x    1 root     root            0 Mar 18 13:42 Sun
dr-xr-xr-x    1 root     root            0 Mar 18 13:42 Thu
dr-xr-xr-x    1 root     root            0 Mar 18 13:42 Tue
dr-xr-xr-x    1 root     root            0 Mar 18 13:42 Wed

As plankton is currently configured, snapshots are made every six hours, at 2 a.m., 8 a.m., 2 p.m., and 8 p.m. In addition, a daily snapshot is taken at 11 p.m. and is named for the day of the week. Snapshot names are reused, so the snapshot directory 0200 represents the filesystem as of the most recent 2 a.m. and 1400 as of the most recent 2 p.m. Likewise, Wed represents the filesystem as of 11 p.m. on the most recent Wednesday.
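
To work out which moment a given snapshot directory represents, the schedule can be expressed as a small function. This is a sketch that assumes snapshots run exactly on schedule and that all times are plankton's local time; the function and constant names are made up for illustration.

```python
from datetime import datetime, timedelta

SIX_HOURLY = {"0200": 2, "0800": 8, "1400": 14, "2000": 20}  # name -> hour of day
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]     # matches datetime.weekday()
DAILY_HOUR = 23                                              # daily snapshot at 11 p.m.

def snapshot_time(name, now):
    """Most recent scheduled time at or before `now` for the snapshot `name`."""
    if name in SIX_HOURLY:
        t = now.replace(hour=SIX_HOURLY[name], minute=0, second=0, microsecond=0)
        if t > now:                      # that hour has not happened yet today
            t -= timedelta(days=1)
        return t
    if name in DAYS:
        t = now.replace(hour=DAILY_HOUR, minute=0, second=0, microsecond=0)
        t -= timedelta(days=(now.weekday() - DAYS.index(name)) % 7)
        if t > now:                      # today's 11 p.m. snapshot is still to come
            t -= timedelta(days=7)
        return t
    raise ValueError("unknown snapshot name: " + name)

now = datetime(2014, 3, 18, 18, 3)       # a Tuesday evening
for name in ["1400", "2000", "Wed", "Tue"]:
    print(name, snapshot_time(name, now))
```

At that moment the 2000 snapshot is from the previous evening, and Tue is from the previous week, because Tuesday's 11 p.m. snapshot has not been taken yet.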

Inside a particular snapshot directory you will find all files and directories that existed at that point in time and are still present on disk:

sftp> cd Wed
sftp> ls -al
drwx------    2 frey     cadmin          2 Feb 27 13:51 .
dr-xr-xr-x    3 root     root            3 Mar 18 08:00 ..
-rw-r--r--    2 frey     cadmin       4105 Mar 01 09:13 charts.xls

From this snapshot directory I can download that (older) copy of the file the same as any other file:

sftp> get charts.xls
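
The same recovery can be scripted by generating a batch file for the sftp client's -b option. The helper below is a hypothetical sketch: it assumes the SFTP session starts in the top-level directory of the share (where .zfs lives), it does not quote paths containing spaces, and it saves the old copy under a different local name so the current file is not overwritten.

```python
import posixpath

def recovery_batch(snapshot, relative_path):
    """Build sftp batch commands that fetch one file from a snapshot."""
    remote = posixpath.join(".zfs", "snapshot", snapshot, relative_path)
    # Prefix the local name with the snapshot name, e.g. Wed-charts.xls
    local = snapshot + "-" + posixpath.basename(relative_path)
    return "get {} {}\nbye\n".format(remote, local)

batch = recovery_batch("Wed", "charts.xls")
with open("recover.batch", "w") as handle:
    handle.write(batch)
```

The resulting file could then be run with something like sftp -b recover.batch plankton.che.udel.edu. Note that batch mode requires non-interactive authentication, such as an SSH key.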

The .zfs directory is accessible when using Samba or SFTP/SCP to access plankton. It currently does not work properly for NFS access from squidward.

  • Last modified: 2014/03/18 18:03
  • by frey