PERCEUS is a new clustering solution from the makers of Warewulf. The instructions contained herein are for version 1.3. The central source of documentation at this point is http://www.perceus.org/portal/book.
The source package for PERCEUS 1.3 was obtained via Subversion, from the PERCEUS development repository:
% svn co https://perceus.org/svn/perceus/perceus/1.3
Don’t be alarmed that the package doesn’t come with a configure script! Execute the autogen.sh script first; if all goes well, you’ll have yourself a configure script to run.
Personally, I don’t like third-party products invading the /usr filesystem on my systems. My tendency is toward installing into /opt, yielding a structure something akin to:
% ls -l /opt/perceus
lrwxrwxrwx  1 root root    3 Jun 27 13:51 current -> 1.3
drwxr-xr-x  9 root root 4096 Jun 26 12:37 1.3
drwxr-xr-x  9 root root 4096 Jun 26 12:37 1.2.1
/opt/perceus/1.3:
drwxr-xr-x  2 root root 4096 Jun 26 12:37 bin
drwxr-xr-x  2 root root 4096 Jun 26 12:37 include
drwxr-xr-x  3 root root 4096 Jun 26 12:37 lib
drwxr-xr-x  3 root root 4096 Jun 26 12:37 libexec
drwxr-xr-x  2 root root 4096 Jun 26 12:37 sbin
drwxr-xr-x  4 root root 4096 Jun 26 12:37 share
drwxr-xr-x 13 root root 4096 Jun 26 12:36 src
The source for each version resides inside the src subdirectory, and all source configurations begin with a --prefix pointing to the top of the versioned directory that contains src.
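If you adopt the same versioned layout, an upgrade largely comes down to repointing the current symlink at the new version directory. A minimal Python sketch of doing that safely, assuming a POSIX filesystem where rename(2) replaces an existing symlink in one step; the switch_current and demo names are hypothetical and not part of PERCEUS:

```python
import os
import tempfile


def switch_current(base_dir: str, version: str, link_name: str = "current") -> None:
    """Atomically repoint base_dir/link_name at the sibling directory `version`."""
    if not os.path.isdir(os.path.join(base_dir, version)):
        raise FileNotFoundError(f"no such version directory: {version}")
    link_path = os.path.join(base_dir, link_name)
    tmp_path = link_path + ".new"
    # Clear out a temporary link left behind by an interrupted earlier run.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    # Relative target, matching the "current -> 1.3" entry in the listing.
    os.symlink(version, tmp_path)
    # rename(2) swaps the old link for the new one in a single step, so
    # "current" never disappears, even briefly, during the switch.
    os.replace(tmp_path, link_path)


def demo() -> str:
    """Build a throwaway /opt/perceus-like tree and switch versions in it."""
    with tempfile.TemporaryDirectory() as base:
        for version in ("1.2.1", "1.3"):
            os.mkdir(os.path.join(base, version))
        switch_current(base, "1.2.1")
        switch_current(base, "1.3")
        return os.readlink(os.path.join(base, "current"))


if __name__ == "__main__":
    print(demo())  # prints 1.3
```

Building the new link beside the old one and renaming it into place avoids the window that a plain "rm current; ln -s 1.3 current" would leave open.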
PERCEUS installs fairly easily in this manner, but still puts some of its configuration in /etc/perceus. One other caveat is that PERCEUS maintains a state directory that houses, in particular, the Berkeley DB files containing all of the provisioning data, the DHCP leases served out by the dnsmasq daemon, and the VNFS filesystems. The first time I configured and built PERCEUS I used only --prefix=/opt/perceus/1.3, and as a result this state directory wound up in /opt/perceus/1.3/usr/var/lib/perceus, which was then shared out to the compute nodes. This alone would not have been a bad thing, except that I was already sharing /opt and had a mount of it in the nodes’ fstab. The /opt filesystem was also configured to be a hybrid link in the VNFS capsule. What resulted was a rather ridiculous set of hard-to-diagnose errors and inconsistencies as the nodes attempted to mount /opt over top of what wound up being a symlink in the VNFS image (due to the marking of /opt as a hybrid exclusion). The same thing happened with my attempt to export and NFS-mount /usr/X11R6 from the head node1).
The solution to my mount-point woes was to un-hybridize both /opt and /usr/X11R6 in the capsule’s hybridize configuration file (the state directory is /usr/var/lib and the capsule is named centos-4.5-1.stateless.x86_64):
% cd /usr/var/lib/perceus/vnfs/centos-4.5-1.stateless.x86_64/
% cat hybridize
# Hybridization can be done pending several scenerios are met. First, the
# "vnfs transfer method" in the perceus.conf must be set to "nfs". Second,
# you can NOT hybridize the directory that you configured to be the --localstatedir
# when Perceus was compiled (if installed via RPM, it is usually set to
# /var/lib/perceus) otherwise it is under --prefix(/var/lib/perceus).
#
# To activate any changes in this file you must mount and then umount this
# VNFS using Perceus:
#
# > perceus vnfs mount centos-4.5-1.stateless.x86_64.vnfs
# > perceus vnfs umount centos-4.5-1.stateless.x86_64.vnfs
/usr/share
/usr/local
#/usr/X11R6
/usr/lib/locale
/usr/lib64/locale
#/opt
Notice that my PERCEUS state lives in /usr/var/lib: when I configured the source package the second time, I also included --localstatedir=/usr/var/lib. This was the second change I made so that PERCEUS would no longer need to share a subdirectory of /opt.
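The rules in the hybridize comments, together with the /opt lesson above, can be checked mechanically before you mount and umount the capsule. A minimal Python sketch, assuming that hybridizing a directory also hybridizes everything below it, and that the "protected" paths are the PERCEUS state directory (here /usr/var/lib/perceus) plus anything the nodes NFS-mount from their fstab; the helper names are hypothetical:

```python
from pathlib import PurePosixPath


def active_entries(hybridize_text: str) -> list[str]:
    """Return the directories a hybridize file actually hybridizes.

    Blank lines and lines starting with '#' (comments, or entries that
    have been commented out such as '#/opt') are skipped.
    """
    entries = []
    for raw in hybridize_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def overlaps(a: str, b: str) -> bool:
    """True if a and b name the same directory or one lies inside the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def conflicts(hybridized: list[str], protected: list[str]) -> list[tuple[str, str]]:
    """Pair every hybridized path with each protected path it overlaps."""
    return [(h, p) for h in hybridized for p in protected if overlaps(h, p)]


if __name__ == "__main__":
    # A hybridize list that still includes /opt and /usr/X11R6.
    before = "/usr/share\n/usr/local\n/usr/X11R6\n/usr/lib/locale\n/opt\n"
    protected = ["/usr/var/lib/perceus", "/opt", "/usr/X11R6"]
    print(conflicts(active_entries(before), protected))
    # [('/usr/X11R6', '/usr/X11R6'), ('/opt', '/opt')]
```

The containment test is deliberately conservative: hybridizing a parent such as /usr would also be flagged, because it would sweep the /usr/var/lib/perceus state directory into the hybrid set.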
One thing I liked about Warewulf was its simplicity: the nodes’ addressing information was added to the /etc/hosts file, a node group was set up to allow a certain set of users and groups, and those users and groups were automatically distributed to the nodes (in addition to the system users in a VNFS image’s /etc/passwd and /etc/group files). DHCP leasing was handled by the standard Linux DHCP daemon.
The initial run of the perceus utility will help you generate the initial preference files; what follows are some edits that I made to those files.
PERCEUS uses a combined DHCP and DNS daemon (dnsmasq) running on the master node. The /etc/hosts files remain empty, save for what you statically add to them. The dnsmasq daemon is configured in /etc/perceus/dnsmasq.conf:
interface=eth0
enable-tftp
tftp-root=/usr/var/lib/perceus//tftp
dhcp-boot=pxelinux.0
local=//
domain=darkstar
expand-hosts
dhcp-range=172.16.64.1,172.16.64.254
dhcp-lease-max=21600
read-ethers
There’s no documentation explaining what a lot of this stuff is; when you first run the perceus executable it should prompt you for the necessary settings in this and other configuration files. Pay particular attention to that domain line: you can’t leave it empty! I also don’t much like using the head node’s domain, which is the default value assigned; the nodes are only visible on the cluster, so I don’t really need a fully qualified domain for them2). Whatever domain you use, add it as a search domain in the head node’s /etc/resolv.conf and you’ll still be able to use unqualified names. Nodes boot with an /etc/resolv.conf that points back to the dnsmasq daemon and includes the domain from /etc/perceus/dnsmasq.conf as a search domain, so they should handle unqualified hostnames as well.
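Both points (a non-empty domain line, and mirroring that domain as a search domain on the head node) can be checked from dnsmasq.conf itself. A minimal Python sketch with hypothetical helper names; it assumes the simple key=value / bare-flag syntax shown above, keeps only the first comma-separated field of a domain= value, and simply uses the last domain= line if several appear:

```python
def parse_dnsmasq_conf(text: str) -> dict[str, list[str]]:
    """Collect options from a dnsmasq-style config file.

    Each non-comment line is either 'key=value' or a bare flag such as
    'expand-hosts'; flags are recorded with an empty-string value, and a
    key that appears several times keeps all of its values in order.
    """
    options: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        options.setdefault(key.strip(), []).append(value.strip())
    return options


def cluster_search_line(options: dict[str, list[str]]) -> str:
    """Return a resolv.conf 'search' line naming the cluster domain.

    Raises ValueError when there is no non-empty domain= setting, which is
    exactly the situation to avoid.
    """
    # Keep only the domain name from each non-empty domain= value.
    domains = [v.split(",")[0] for v in options.get("domain", []) if v]
    if not domains or not domains[-1]:
        raise ValueError("dnsmasq.conf needs a non-empty domain= line")
    return f"search {domains[-1]}"


if __name__ == "__main__":
    with open("/etc/perceus/dnsmasq.conf") as fh:
        print(cluster_search_line(parse_dnsmasq_conf(fh.read())))
```

In practice you would append the cluster domain to the head node’s existing search line rather than replace that line outright, so the head node’s own domain keeps resolving too.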
In /etc/perceus/defaults.conf you’ll find the default values assigned to certain properties when a new node is provisioned:
# This is the template name for all new nodes as they are configured.
# Define the node name range. The '#' characters symbolize the node number
# in the order of initalized.
Node Name = node##
# What is the default group for new nodes (this doesn't have to exist
# anywhere before hand)
Group Name = darkstar
# Define the default VNFS image that should be assigned to new nodes
Vnfs Name = centos-4.5-1.stateless.x86_64
# Are new nodes automatically enabled and provisionined?
Enabled = 1
The cluster in question has only 8 nodes, so there’s no reason to go with a four-digit numbering scheme (which is the default). The Group Name is a rather cool attribute: nodes can be provisioned into distinct groups, and operations can subsequently be performed on entire node groups (disable all nodes, etc.). I wiped the default and named my default group after the cluster itself. You need not provide a value for the default VNFS, but I figured: why not be as sure as possible that PERCEUS is using the VNFS capsule I want it to!
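To see what a Node Name template produces, here is a small Python sketch of the '#' expansion described in the defaults.conf comments. It is my reading of that comment, not PERCEUS’s own code: it assumes a single contiguous run of '#', zero-padding to the run’s width, and numbering that starts at 0 (consistent with the node00 used later). The node_name function is hypothetical:

```python
import re


def node_name(template: str, index: int) -> str:
    """Expand the run of '#' characters in a template such as 'node##'.

    The run's length sets the zero-padded width, so 'node##' covers
    node00 through node99.
    """
    match = re.search(r"#+", template)
    if match is None:
        raise ValueError(f"template {template!r} has no '#' placeholder")
    width = len(match.group())
    if not 0 <= index < 10 ** width:
        raise ValueError(f"index {index} does not fit in {width} digit(s)")
    return f"{template[:match.start()]}{index:0{width}d}{template[match.end():]}"


if __name__ == "__main__":
    # Eight nodes with a two-digit template.
    print([node_name("node##", i) for i in range(8)])
```

With two '#' characters the names stay short while leaving room for up to 100 nodes, which is plenty for an 8-node cluster.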
PERCEUS implements the concept of modules, which extend its behavior with respect to provisioning and configuring VNFS capsules as they are served to nodes. Excellent examples are the passwdfile and groupfile modules, which push a constructed passwd and group file to nodes when they boot; you’ve got to do something like this if you want your cluster users to actually be able to log in to nodes.
Modules are activated (and deactivated) using the perceus utility:
% perceus module activate groupfile init/all
% perceus module activate passwdfile init/all
% perceus module summary
groupfile:
init/all
hostfile:
hostname:
ipaddr:
modprobe:
passwdfile:
init/all
syncnodes:
I won’t go into what init/all means; check the real PERCEUS documentation for that. I have it on good authority that it’s relatively easy to create modules of your own: for the sake of rsh I’ll probably be making a hosts.equiv module at some point.
Once these modules are activated you must install pieces of a passwd and group file in /etc/perceus/modules/{passwdfile,groupfile}. These directories contain a base file, named all, and directories that allow you to fine-tune the user and group list on a per-node-group, per-node, and per-VNFS basis; only those pieces that you’ve activated will be processed, though (hence the all in the module activate statements above). The module builds each node’s file from all applicable pieces, starting with all and appending anything from the directories for the node’s node group and VNFS capsule, and finally from the per-node directory.
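The merge order described above amounts to an ordered concatenation. A minimal Python sketch: the (scope, name) keys stand in for the all file and the per-group, per-VNFS, and per-node directories, whose exact names under /etc/perceus/modules aren’t spelled out here, so the mapping and the assemble function are hypothetical. It also assumes no de-duplication of entries takes place:

```python
from typing import Iterable, Mapping, Tuple

Piece = Tuple[str, str]  # (scope, name), e.g. ("group", "darkstar")


def assemble(
    pieces: Mapping[Piece, str],
    group: str,
    vnfs: str,
    node: str,
    active: Iterable[str] = ("all",),
) -> str:
    """Concatenate passwd/group pieces in the documented order.

    Order: the base 'all' piece, then the node's group, its VNFS capsule,
    and finally the node itself. Scopes not listed in `active` are skipped,
    as are pieces that are missing or empty.
    """
    enabled = set(active)
    order = [("all", ""), ("group", group), ("vnfs", vnfs), ("node", node)]
    parts = []
    for scope, name in order:
        text = pieces.get((scope, name))
        if scope not in enabled or not text:
            continue
        # Make sure each piece ends in a newline so entries never fuse.
        parts.append(text if text.endswith("\n") else text + "\n")
    return "".join(parts)
```

With only init/all activated, as in the commands above, the default active=("all",) means the per-group, per-VNFS, and per-node pieces are ignored even if they exist.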
Major caveat, though: if you activate these modules, don’t expect the contents of all to be appended to the existing /etc/passwd or /etc/group file from the VNFS capsule! When I first activated them I added only cluster users and groups to the modules’ all files, and the next time I booted the nodes the user root no longer existed! So the all files must contain at least the full complement of system users and groups for the cluster nodes.
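Given that surprise, a pre-flight check is worth having. The Python sketch below (hypothetical helper names) reports system accounts present in the capsule’s own /etc/passwd that are missing from a module’s all file; because the numeric ID is the third field in both formats, it works on /etc/group as well. It assumes the CentOS 4 convention that IDs below 500 belong to system accounts, so adjust id_max if your distribution differs:

```python
def names(text: str) -> set[str]:
    """First field of every non-blank, non-comment passwd/group line."""
    return {
        line.split(":", 1)[0]
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def system_names(text: str, id_max: int = 499) -> set[str]:
    """Names whose numeric UID/GID (third field) is at or below id_max."""
    found = set()
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit() and int(fields[2]) <= id_max:
            found.add(fields[0])
    return found


def missing_system_names(all_text: str, capsule_text: str, id_max: int = 499) -> list[str]:
    """System accounts in the capsule's file that the module's 'all' file lacks."""
    return sorted(system_names(capsule_text, id_max) - names(all_text))
```

Run it against the mounted capsule’s etc/passwd and the module’s all file before booting any nodes: a non-empty result (especially one containing root) means the nodes will come up without those accounts.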
I created a base CentOS 4.5 capsule using the instructions on the PERCEUS website:
% /opt/perceus/1.3/share/perceus/vnfs-scripts/centos-4.5-genchroot.sh
% /opt/perceus/1.3/share/perceus/vnfs-scripts/chroot2stateless.sh \
> /var/tmp/vnfs/centos-4.*/ /opt/perceus/centos-4.5-1.stateless.x86_64.vnfs
% perceus vnfs import /opt/perceus/centos-4.5-1.stateless.x86_64.vnfs
I saved the original chroot image in /opt/perceus so that if I need to create a fresh VNFS capsule in the future I won’t need to redo the lengthy chroot filesystem creation; plus, I’m assured that the next capsule starts from the same base as the current one on which I’m working.
The capsule is “mounted” using the perceus utility:
% perceus vnfs mount centos-4.5-1.stateless.x86_64
The “mount” will appear in /mnt/centos-4.5-1.stateless.x86_64 and you can treat that directory as a chroot filesystem:
% yum --installroot=/mnt/centos-4.5-1.stateless.x86_64 update
% cp /etc/profile.d/matlab.* /mnt/centos-4.5-1.stateless.x86_64/etc/profile.d
% cp /etc/init.d/gmond /mnt/centos-4.5-1.stateless.x86_64/etc/init.d
% chroot /mnt/centos-4.5-1.stateless.x86_64 /sbin/chkconfig --add gmond
% chroot /mnt/centos-4.5-1.stateless.x86_64 /sbin/chkconfig --levels=345 gmond on
:
When you have finished making modifications, commit them back to the capsule using the command
% perceus vnfs umount centos-4.5-1.stateless.x86_64
One special note: while a VNFS capsule is mounted, nodes will not boot properly (at least in 1.3, they complain that the VNFS is not present).
There are some instructions on the PERCEUS web site regarding the installation of the Sun GridEngine execution daemon in VNFS capsules. While these instructions do technically get sgeexecd started on the nodes, they don’t do anything at all to configure those nodes to actually be GridEngine workers!
My standard setup is to have GridEngine installed in /opt/GridEngine and (as already demonstrated) to mount /opt on cluster nodes. As shown in the documentation, I do add the sgeexecd startup to the VNFS. But after provisioning a node, I log in and do the following:
[head]% qconf -ah node00 # node must be allowed to act as an
# admin host for now
[head]% ssh node00
[node00]% service sgeexecd stop # stop the daemon if it was started
[node00]% cd /opt/GridEngine # or wherever you have it
[node00]% ./install_execd
You’ll be prompted through the setup, and in the process the node will be set up on the GridEngine queue master, too, with appropriate metrics for scheduling purposes as well as a default queue for serial jobs. This is an invaluable time-saver in my book: if everything is set up properly on the queue master, the process takes about a minute per node, and on the next reboot of the node the sgeexecd in the VNFS capsule will start up once again, but this time it will actually be ready to do work!
When you’re done doing all this, it’s usually best to remove the nodes’ status as admin hosts:
[head]% qconf -dh node00
This does not affect their ability to run jobs – they’ll still be execution hosts.
I also install Ganglia’s gmond on the nodes (the commands were shown above but I’ll reproduce my exact M.O. here)3). I install Ganglia on (you guessed it) /opt. Assuming I’ve mounted the VNFS capsule at /mnt/centos-4.5-1.stateless.x86_64:
% cp /opt/ganglia-3.0.3/sbin/gmond /mnt/centos-4.5-1.stateless.x86_64/etc/init.d
% chroot /mnt/centos-4.5-1.stateless.x86_64 /sbin/chkconfig --add gmond
% chroot /mnt/centos-4.5-1.stateless.x86_64 /sbin/chkconfig --levels=345 gmond on
% cp /etc/gmond.conf /mnt/centos-4.5-1.stateless.x86_64/etc
I’ve always switched my Ganglia monitoring from multicast groups to simple unicast UDP messages sent to the head node of the cluster, period: honestly, there’s no reason for compute nodes to maintain state info for all the other nodes of the cluster, and if that’s the case there’s really no resource savings in using multicast anyway. What’s more, I don’t see the reason to keep TCP channels open for low-importance monitoring data, so I use UDP. Here are the important sections of the gmond.conf:
cluster {
name = "darkstar.coastal"
owner = "that guy who ponied-up the cash"
latlong = "unspecified"
url = "unspecified"
}
host {
location = "darkstar.coastal"
}
udp_send_channel {
host = 172.16.100.1
port = 8649
}
udp_recv_channel {
bind = 172.16.100.1
port = 8649
}
Of course, the receive channel is not actually necessary on the cluster nodes since nothing will be sending them data.
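Since only the head node needs the receive channel, one option is to generate the head-node and compute-node variants from a single template instead of copying one gmond.conf everywhere. A minimal Python sketch with a hypothetical gmond_channels helper; it emits only the send and receive sections shown above, uses the 172.16.100.1 collector address and port 8649 from this configuration, and leaves the cluster and host blocks to be copied as-is:

```python
def gmond_channels(collector_ip: str, port: int = 8649, receive: bool = False) -> str:
    """Render unicast UDP channel sections for gmond.conf.

    Every gmond sends to the collector; only the collector (the head node)
    also needs a udp_recv_channel bound to that address.
    """
    lines = [
        "udp_send_channel {",
        f"  host = {collector_ip}",
        f"  port = {port}",
        "}",
    ]
    if receive:
        lines += [
            "udp_recv_channel {",
            f"  bind = {collector_ip}",
            f"  port = {port}",
            "}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(gmond_channels("172.16.100.1", receive=True), end="")  # head node
    print(gmond_channels("172.16.100.1"), end="")  # compute nodes
```

The compute-node variant is then a strict prefix of the head-node one, which makes it easy to see that the only difference is the receive channel.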
1) … /usr filesystem, rather than isolating it in /usr/X11R6. But the VNFS capsule methodology should actually allow one to hybridize all of /usr and make this change somewhat moot.
2) /etc/hosts was perfectly satisfactory for me in Warewulf.