====== CCEI Storage Appliance ======
The plankton.che.udel.edu appliance is a home-grown, medium-scale storage system. Under the hood it uses a ZOL (ZFS on Linux) filesystem. ZFS has several benefits:
- Per-directory quota control: Any directory (be it a user's storage, content to be visible on the web, whatever) can have a quota (maximum size) or a reservation (guaranteed size).
- Storage pools: The filesystem isn't tied to individual physical hard disks; it is spread across groups of disks, and it is easily grown by adding more disks.
- Increased integrity: ZFS has advanced data-integrity features like triple-parity RAID and self-healing of silent corruption. Triple parity means that up to three hard disks in a given storage pool can fail without data being lost. Silent corruption is the mangling of bits on the hard disk itself, such that when you later read the data off disk you cannot tell whether it is correct; ZFS mitigates this using additional checksums and parity data.
- Snapshots: More on this below.
ZFS on Linux is an open source project that derives from the Sun (now Oracle) ZFS code that is a part of the Solaris operating system.
With ZOL providing the storage, the appliance needs interfaces through which users (CCEI staff and students) can access it. The following file-sharing mechanisms are currently configured on plankton:
- Samba: Also known as CIFS or SMB, Samba allows you to mount directories on plankton on your Windows/Mac/Linux desktop and work with them as you would any other disk (drag and drop to copy, double-click to open and edit). Samba is NOT a secure file transfer protocol, though, so it is available ONLY when you are on-campus. Even when on-campus, don't use it on any data you consider to be private.
- SFTP/SCP: The Secure FTP and Secure CP programs (part of the SSH client suite) can be used from anywhere on the Internet to connect to plankton and manipulate files.
- NFS: Each CCEI student with an account on squidward as well as on plankton can access his/her plankton directory directly from the head node of squidward.
Please note that the storage appliance is currently a demonstration unit and all information on this page is subject to change.
===== Samba Access =====
Students: Samba URLs look like smb://plankton.che.udel.edu/students-[username]/ where [username] is your username (e.g. for me, frey).
Staff: Samba URLs look like smb://plankton.che.udel.edu/staff-[username]/ where [username] is your username (e.g. for me, frey).
==== Mac ====
In the Finder, choose “Connect to Server…” from the Go menu. Enter your plankton URL and click the “Connect” button. You will be prompted for your username and password. If successful, your plankton directory will appear on the desktop and/or in the sidebar of Finder windows.
==== Windows ====
Given the URL mentioned above, you can find your Windows “folder name” by:
- Replacing all forward slashes with backslashes
- Removing the leading smb:
E.g. the URL smb://plankton.che.udel.edu/students-frey/ becomes the folder name \\plankton.che.udel.edu\students-frey\
Given your “folder name,” follow the directions presented on this Microsoft support page. You'll need to enable the checkbox for “Connect using different credentials.”
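The two conversion steps above are mechanical, so they can be sketched in a few lines of Python. This is only an illustration: the function name smb_url_to_unc is made up for this page and is not part of any tool installed on plankton.

```python
def smb_url_to_unc(url: str) -> str:
    """Convert an smb:// URL into a Windows "folder name" (UNC path).

    Hypothetical helper for illustration only.
    """
    prefix = "smb:"
    if not url.startswith(prefix):
        raise ValueError(f"not a Samba URL: {url!r}")
    # Drop the leading "smb:" and turn every forward slash into a backslash.
    return url[len(prefix):].replace("/", "\\")


print(smb_url_to_unc("smb://plankton.che.udel.edu/students-frey/"))
```

Run as-is, this prints the same folder name as the example above for the students-frey share.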
===== Snapshots =====
When a file is written on ZFS, the data is always written to unused blocks: this is known as copy-on-write. So long as there are enough unused blocks available, the blocks containing the old copy of the file will not be overwritten with the new data. This also serves to increase write performance, since it usually means that a contiguous set of blocks can be allocated and written in one pass (versus a disparate set of blocks that must each be located and written).
Another benefit of copy-on-write and the retention of older blocks is that for some period of time one or more older copies of a file will still be present. Imagine that on Wednesday the filesystem creates a copy of the metadata that maps filenames to blocks they occupy. On Thursday I delete a page from charts.xls and save it, then realize I wanted to keep that page! If I could consult Wednesday's metadata copy, I could discover the blocks that contained the file prior to my mistake and recover it. This is exactly what ZFS snapshots are: a point-in-time copy of the metadata associated with the files/directories on the filesystem.
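The idea can be made concrete with a toy model in Python. This is a deliberately simplified sketch and not how ZFS is implemented: the class ToyCowFilesystem and its one-block-per-file layout are invented for illustration, and old blocks are never reclaimed here, whereas a real filesystem eventually reuses blocks that no snapshot references.

```python
class ToyCowFilesystem:
    """Toy copy-on-write model: blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = []     # append-only block store
        self.files = {}      # live metadata: filename -> block index
        self.snapshots = {}  # snapshot name -> frozen copy of the metadata

    def write(self, name, data):
        # New data always goes to an unused block; the old block is untouched.
        self.blocks.append(data)
        self.files[name] = len(self.blocks) - 1

    def snapshot(self, label):
        # A snapshot copies only the filename -> block map, not the data.
        self.snapshots[label] = dict(self.files)

    def read(self, name, snapshot=None):
        table = self.files if snapshot is None else self.snapshots[snapshot]
        return self.blocks[table[name]]


fs = ToyCowFilesystem()
fs.write("charts.xls", "pages 1-3")
fs.snapshot("Wed")                   # Wednesday's metadata copy
fs.write("charts.xls", "pages 1-2")  # Thursday's mistaken save
print(fs.read("charts.xls"))         # the current copy
print(fs.read("charts.xls", "Wed"))  # the copy Wednesday's snapshot points at
```

The second read still finds the three-page version because the snapshot's map points at the old block, which the Thursday write never touched.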
There is an invisible directory in every plankton file share called .zfs that contains any snapshots that are available. For example, if I SFTP to plankton I can use the commands cd .zfs/snapshot and ls to see what snapshots are available:
<code>
sftp> cd .zfs/snapshot
sftp> ls -l
dr-xr-xr-x 1 root root 0 Mar 18 13:42 0200
dr-xr-xr-x 1 root root 0 Mar 18 13:42 0800
dr-xr-xr-x 1 root root 0 Mar 18 13:42 1400
dr-xr-xr-x 1 root root 0 Mar 18 13:42 2000
dr-xr-xr-x 1 root root 0 Mar 18 13:42 Fri
dr-xr-xr-x 1 root root 0 Mar 18 13:42 Mon
dr-xr-xr-x 1 root root 0 Mar 18 13:42 Sat
dr-xr-xr-x 1 root root 0 Mar 18 13:42 Sun
dr-xr-xr-x 1 root root 0 Mar 18 13:42 Thu
dr-xr-xr-x 1 root root 0 Mar 18 13:42 Tue
dr-xr-xr-x 1 root root 0 Mar 18 13:42 Wed
</code>
As plankton is currently configured, snapshots are made every six hours each day, starting at 2 a.m. A daily snapshot is also taken at 11 p.m. each day and is named to match the day of the week. So the snapshot directory 0200 represents the filesystem as of the most recent 2 a.m., and 1400 as of the most recent 2 p.m. Likewise, Wed is the snapshot taken the last time it was 11 p.m. on a Wednesday.
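To work out how old a given snapshot directory is, the schedule above can be turned into a small calculation. The following Python sketch assumes snapshots are taken exactly on the hour and ignores time zones; the function name last_snapshot_time and the example date are made up for illustration.

```python
from datetime import datetime, timedelta

# Schedule as described above; the names match the snapshot directory listing.
INTRADAY = {"0200": 2, "0800": 8, "1400": 14, "2000": 20}
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
DAILY_HOUR = 23  # daily snapshots are taken at 11 p.m.


def last_snapshot_time(name: str, now: datetime) -> datetime:
    """Return the most recent time at or before `now` that `name` was taken."""
    if name in INTRADAY:
        candidate = now.replace(hour=INTRADAY[name], minute=0, second=0, microsecond=0)
        if candidate > now:
            candidate -= timedelta(days=1)  # today's slot has not happened yet
        return candidate
    if name in WEEKDAYS:
        candidate = now.replace(hour=DAILY_HOUR, minute=0, second=0, microsecond=0)
        # datetime.weekday() numbers Monday as 0, matching WEEKDAYS.
        candidate -= timedelta(days=(now.weekday() - WEEKDAYS.index(name)) % 7)
        if candidate > now:
            candidate -= timedelta(days=7)  # this week's 11 p.m. is still ahead
        return candidate
    raise ValueError(f"unknown snapshot name: {name!r}")


# Hypothetical "now": Thursday, 21 March 2024 at 10:00.
now = datetime(2024, 3, 21, 10, 0)
for name in ("0200", "1400", "Wed", "Thu"):
    print(name, last_snapshot_time(name, now))
```

For that Thursday morning, 0200 is from earlier the same day, 1400 and Wed are from Wednesday, and Thu is almost a week old because that evening's 11 p.m. snapshot has not been taken yet.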
Inside a particular snapshot directory you will find all files and directories that existed at that point in time and are still present on disk:
<code>
sftp> cd Wed
sftp> ls -al
drwx------ 2 frey cadmin 2 Feb 27 13:51 .
dr-xr-xr-x 3 root root 3 Mar 18 08:00 ..
-rw-r--r-- 2 frey cadmin 4105 Mar 01 09:13 charts.xls
</code>
From this snapshot directory I can download that (older) copy of the file the same as any other file:
<code>
sftp> get charts.xls
</code>
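The same file can also be named by its full path relative to the top of the share, which is handy when a client wants a single path rather than a series of cd commands. A minimal Python sketch of that layout follows; the helper name snapshot_path is hypothetical, and it assumes the path is interpreted from the top of your plankton share, as in the SFTP session above.

```python
from pathlib import PurePosixPath


def snapshot_path(snapshot: str, relative_path: str) -> str:
    """Build the share-relative path of a file inside a named snapshot.

    Hypothetical helper for illustration only.
    """
    if PurePosixPath(relative_path).is_absolute():
        raise ValueError("relative_path must be relative to the top of the share")
    # Snapshots live under the hidden .zfs/snapshot directory of the share.
    return str(PurePosixPath(".zfs") / "snapshot" / snapshot / relative_path)


print(snapshot_path("Wed", "charts.xls"))
```

For the example above this prints .zfs/snapshot/Wed/charts.xls.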
The .zfs directory is accessible when using Samba or SFTP/SCP to access plankton. It currently does not work properly for NFS access from squidward.