====== squidward.che.udel.edu ======
  
<php>
// Pull this cluster's metadata out of the cluster database.
include_once('clusterdb.php');

CDBOpen();

// Look up the cluster by hostname, then display its properties.
if ( ($clusterID = CDBClusterIDForClusterHost('squidward.che')) !== FALSE ) {
  if ( $vendorTag = CDBClusterVendorTagForClusterID($clusterID) ) {
    printf("<b>Vendor:</b>&nbsp;&nbsp;%s<br>\n", $vendorTag);
  }
  if ( $udPropTag = CDBClusterUDPropertyTagForClusterID($clusterID) ) {
    printf("<b>UD Property Tag:</b>&nbsp;&nbsp;%s<br>\n", $udPropTag);
  }
  echo "<br>";
  // Node list, then the asset inventory with its legend alongside.
  CDBListNodes($clusterID);
  echo "<br>\n<b>Inventory:</b><br>\n<table border=\"0\"><tr valign=\"bottom\"><td>";
  CDBListAssets($clusterID);
  echo "</td><td>";
  CDBAssetsLegend();
  echo "</td></tr></table>\n\n";

  if ( CDBClusterHasWebInterface($clusterID) ) {
    printf("<a href=\"http://squidward.che.udel.edu/\">Cluster status</a> web pages.<br>\n");
  }
}
</php>

Links to additional information:

  * [[pdu|Power Distribution Unit layouts]]
  * [[ammasso|Ammasso ethernet adapter info]]
  * [[mpich-selection|MPICH selection]]
  * [[c2050-info|Using the Tesla C2050 node]]
  * [[plankton|The CCEI storage appliance]]
  * [[nbo6|Using NBO6 with Gaussian '09]]

===== [2011-05-13] File Server Upgrade =====

The design of Squidward has come full circle: in the last round of upgrades, a new head node was purchased with what should have been a high-availability RAID disk system, and the old non-RAID file server was retired.  This added a level of redundancy to users' data, but the 3ware RAID solution turned out to scale poorly once 48 compute nodes were accessing it.  So we now return to hosting user home directories on an external file server.

The new file server needed to be:
  * Easily integrated into the existing Squidward infrastructure
  * Scalable
    * Add more storage space easily and transparently
    * Perform at least as well as the 3ware solution

File server performance comes at a premium, and funding for this purchase was limited.  A parallel filesystem like Lustre would have been desirable, but its cost was prohibitive.

An EonNAS 5100N Network-Attached Storage (NAS) system was purchased and added to Squidward.  The appliance uses "storage pools," so adding more storage amounts to inserting a hard drive and telling the appliance to start using that disk as well (hence, transparent capacity scaling).  The appliance currently has a capacity of 10 TB.  The "disks" involved are logical devices in an EonSTOR RAID enclosure; each "disk" is composed of six (6) 2 TB hard disks in a RAID6 set.  RAID6 maintains two independent parity blocks per stripe, so even if two disks fail at the same time the data remains recoverable.  This translates to better protection of users' home directory data -- though it doesn't mean users shouldn't keep copies of truly important data off-system.
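
As a quick sanity check on those capacities (assuming dual parity is the only overhead): a RAID6 set of $N$ disks of size $S$ gives up two disks' worth of space to parity, so each six-disk set of 2 TB drives provides

$$(N - 2) \times S = (6 - 2) \times 2\,\mathrm{TB} = 8\,\mathrm{TB}$$

of usable space.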

User home directories are now mounted at ''/home/vlachos/{username}'' where before they were mounted at ''/home/{username}''.
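
For example, from a login shell (the username ''jdoe'' below is illustrative only):

<code bash>
# Home directories now live one level deeper than before:
ls -d /home/vlachos/$USER

# Any script with a hard-coded home path must be updated to match,
# e.g. /home/jdoe/data is now /home/vlachos/jdoe/data.
</code>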

===== [2011-05-13] Cluster OS Upgrade, Cleanup =====

Squidward had been running uninterrupted for over 550 days, so the time had come for some OS updates and general cleanup.

  * Squidward's head node and the compute nodes' VNFS images have been updated to CentOS 4.9, with kernel 2.6.9-100.
  * In preparation for the removal of the first-generation nodes from the cluster, those nodes (named ''node00-##'') have been turned off.
  * The first-generation nodes have been removed from GridEngine, and the ''amso.q'' queue that serviced those nodes alone has been removed.
  * The new (third-generation) nodes which will be added to Squidward in the near future are the same as the second-generation nodes save for core count and memory size.  This makes the ''myricom.q'' queue unnecessary: since all nodes are alike, they can all go in the default ''all.q'' queue.

GridEngine is now set up more simply than before: the default queue is the only queue.  Since all node differentiation has historically come from the parallel environment you choose for your job, you should not need to change how you submit your jobs; a sample submission appears below.
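
For example, a minimal job script now needs no ''-q'' option at all.  A sketch, with the caveat that the ''mpich'' parallel environment name and the slot count are placeholders for whatever your job already requests:

<code bash>
#!/bin/bash
# Minimal GridEngine submission sketch.  The "mpich" PE name and the
# 8-slot request are placeholders; keep whatever your job already uses.
#$ -cwd
#$ -N example_job
#$ -pe mpich 8

mpirun -np $NSLOTS ./my_program
</code>

Submitting with ''qsub example_job.sh'' places the job in ''all.q'' automatically.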