squidward.che.udel.edu
Vendor: TeamHPC #75989
UD Property Tag: 143757
Node | Operating System | Architecture | RAM |
---|---|---|---|
(24) compute nodes | CentOS release 4.9 (Final) | 2 x AMD Opteron 2352 (4 cores) @ 2100 MHz | 4096 MB |
head node | CentOS release 4.9 (Final) | 2 x AMD Opteron 2352 (4 cores) @ 2100 MHz | 8192 MB |
Inventory:
Links to additional information:
[2011-05-13] File Server Upgrade
The design of Squidward has come full circle: in the last iteration of upgrades, a new head node was purchased with what should have been a high-availability RAID disk system, and the old non-RAID file server was retired. This gained a level of redundancy for users' data, but the 3ware RAID solution turned out to scale poorly when 48 compute nodes were accessing it. So now we return to hosting user home directories on an external file server.
The new file server needed to:
- Integrate easily into the existing Squidward infrastructure
- Scale well
- Add more storage space easily and transparently
- Perform at least as well as the 3ware solution
File server performance comes at a premium, and funding for this server was limited. A parallel file system like LUSTRE would have been desirable, but its cost was prohibitive.
An EonNAS 5100N Network-Attached Storage (NAS) system was purchased and added to Squidward. The appliance uses “storage pools,” so adding more storage amounts to inserting a hard drive and telling the appliance to start using that disk as well (hence, transparent capacity scaling). The appliance currently has a capacity of 10 TB. The “disks” involved are logical devices in an EonSTOR RAID enclosure; each “disk” is composed of six (6) 2 TB hard disks in a RAID6 set. RAID6 keeps two independent parity blocks alongside the data, so even if two disks were to fail at the same time the filesystem can still be rebuilt. This equates to better protection of users' home directory data, though it doesn't mean users shouldn't keep copies of truly important data off-system.
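As a rough back-of-the-envelope check (a generic RAID6 calculation, not vendor documentation): two disks' worth of space in each set go to parity, so the usable capacity of an $N$-disk RAID6 set built from drives of size $S$ is

$$ C_{\text{usable}} = (N - 2)\,S = (6 - 2) \times 2\ \text{TB} = 8\ \text{TB per set}. $$

Those two independent parity blocks per stripe are also what let a set survive any two simultaneous disk failures.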
User home directories are now mounted at /home/vlachos/{username}, where before they were mounted at /home/{username}.
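For anyone with the old prefix hard-coded in job scripts, a minimal sketch of the kind of one-off fix that may be needed (the script name, username, and helper function below are illustrative, not part of Squidward's actual setup):

```python
# Rewrite the retired /home/{username} prefix to the new
# /home/vlachos/{username} location in a job script.
from pathlib import Path

OLD_PREFIX = "/home"          # old home-directory root
NEW_PREFIX = "/home/vlachos"  # new NAS-backed home-directory root

def update_home_paths(text: str, username: str) -> str:
    """Replace references to the old home path with the new location."""
    return text.replace(f"{OLD_PREFIX}/{username}", f"{NEW_PREFIX}/{username}")

script = Path("job_script.sh")  # hypothetical script to update
script.write_text(update_home_paths(script.read_text(), "jdoe"))
```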
[2011-05-13] Cluster OS Upgrade, Cleanup
Squidward had been running uninterrupted for over 550 days, so the time had come for some OS updates and general cleanup.
- Squidward's head node and compute nodes' VNFS images have been updated to CentOS 4.9, with kernel 2.6.9-100.
- In preparation for the removal of the first-generation nodes from the cluster, those nodes (names in the node00-## range) have been turned off.
- First-generation nodes have been removed from GridEngine. The amso.q queue that serviced those nodes alone has been removed.
- The new (third-generation) nodes which will be added to Squidward in the near future are the same as the second-generation nodes save for core count and memory size. This makes the myricom.q queue unnecessary: since all nodes are the same, they can all just go in the default all.q queue.
So GridEngine is now set up more simply than before. The default queue is now the only queue. Since all node differentiation has historically come from the parallel environment you choose for your job, you should not need to change how you submit your jobs.
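In other words, because submissions name a parallel environment rather than a queue, an existing submission keeps working unchanged. A sketch of what that looks like (the parallel environment name "mpi", the slot count, and the script name are placeholders, not Squidward's actual configuration):

```python
# Submit a job as before: request a parallel environment and slot count,
# and let GridEngine schedule it into the (now only) default queue.
import subprocess

subprocess.run(
    ["qsub",
     "-pe", "mpi", "8",   # parallel environment name and slot count are placeholders
     "job_script.sh"],    # hypothetical job script
    check=True,
)
```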