Sentinix 0.70.5 is a clustering/grid Linux operating system patched with openMosix. Once it is installed to a hard disk drive and boots into openMosix, there is no need for manual installation or configuration over and over again. It will add itself to a cluster automatically or start a new cluster. It also boots from a CD. If a node can obtain an IP address by DHCP that is within the masked range of a cluster, it will automatically add itself to that cluster.
Part of the description from the Sentinix web page: "SENTINIX is a GNU/Linux distribution designed for monitoring, intrusion detection, penetration testing, auditing, statistics/graphing and anti-spam. It's completely free; free to use, free to modify and free to distribute. SENTINIX includes the following software, installed and pre-configured; Nagios, Nagat, Snort, SnortCenter, ACID, Cacti, RRDTool, Nessus, Postfix, MailScanner, SpamAssassin, openMosix, MySQL, Apache, PHP, Perl, Python and lots more."
Cluster Knoppix 3.3 is also a CD-bootable way to add a node to a cluster.
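The "masked range" condition for automatic joining can be pictured as a simple subnet check. This is only a sketch of that one condition, not of how openMosix discovers nodes. The network address and the function name are made up for illustration.

```python
import ipaddress

def in_cluster_range(node_ip, cluster_network):
    """Return True if node_ip falls inside the cluster's address/netmask range."""
    return ipaddress.ip_address(node_ip) in ipaddress.ip_network(cluster_network)

# Hypothetical private network for the cluster (an assumption, not from the test setup).
CLUSTER_NET = "192.168.1.0/255.255.255.0"

print(in_cluster_range("192.168.1.57", CLUSTER_NET))  # True: inside the masked range
print(in_cluster_range("192.168.2.57", CLUSTER_NET))  # False: outside the masked range
```

A node whose DHCP lease lands outside the range would fail this check and would not be expected to join on its own.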
In testing, I tried turning a machine/node off and back on while executing six awk processes on a three-node cluster. Two of the processes failed to complete. When the node came back up, it added itself back to the cluster and started working on processes.
I have been running a 6-node cluster of 600 MHz machines for a few weeks under a light load. It is a mixed Sentinix and Cluster Knoppix cluster. The cluster is working fine. I have run the test program on it once in its entirety, plus a couple of partial runs to demo multiple processes being distributed to each node. I did over 60 billion math calculations in about 2 hours and 25 minutes.
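Working those figures through gives roughly 6.9 million calculations per second for the whole cluster. The per-node figure below assumes the work was spread evenly over the six nodes, which the run itself did not measure.

```python
# Figures taken from the run described above.
calculations = 60_000_000_000          # "over 60 billion" math calculations
elapsed_seconds = (2 * 60 + 25) * 60   # 2 hours 25 minutes = 8700 seconds
nodes = 6

per_second = calculations / elapsed_seconds
# Assumption: the load was spread evenly across all six nodes.
per_node_per_second = per_second / nodes

print(f"{per_second:,.0f} calculations/s across the cluster")
print(f"{per_node_per_second:,.0f} calculations/s per node")
```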
Because “top” shows the process running on the originating node and the completion information happens on the originating node, I surmise that if the originating node goes down then all the processed data can be lost.
The advantages that I have observed to this kind of clustering are:
The most obvious downside is that the cluster has to be isolated from users who might insert a hostile node. This problem is particular to automatic openMosix clustering. Security can be increased by making it more difficult to add a node. I have read that Mosix clusters can use different authentication schemes. Another approach to reducing this problem is to have one or more secured master nodes that interface to the unsecured network, and to put the attached unsecured nodes on a dedicated/private network.
A version of POV-Ray has been written for cluster graphics rendering. I haven’t tried it. This should be one of the more useful things for a cluster to do.
I haven’t been able to get the Numpy module to work because I don’t know where the multiarray module is.
I am starting to focus my attention on MPIBlast. It has the potential to be the most immediately useful to us by speeding up searches on large databases. It looks like any time we have a search to do on a database of 1 GB or bigger, this is the way to do it. A 40-node cluster with 0.25 GB on each node would have 10 GB of available memory. This would allow searching and basic manipulation of fairly large databases.
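One way to picture spreading a database across the memory of several nodes is to deal the records out into fragments, search each fragment on its own, and merge the hits. The sketch below is a toy that runs on a single machine. It is not how MPIBlast itself is written, and the function names and sample records are made up for illustration.

```python
def split_into_fragments(records, node_count):
    """Deal records out round-robin so each node holds about 1/node_count of them."""
    return [records[i::node_count] for i in range(node_count)]

def search_fragment(fragment, query):
    """Plain substring search over one node's share of the records."""
    return [record for record in fragment if query in record]

def search_all(fragments, query):
    """Search every fragment and merge the hits.

    Here the fragments are searched one after another; on a cluster each
    search would be expected to run on the node holding that fragment.
    """
    hits = []
    for fragment in fragments:
        hits.extend(search_fragment(fragment, query))
    return sorted(hits)

# Made-up sample records standing in for database entries.
records = ["ACGTTGCA", "TTGACCGT", "GGCATACG", "ACGTACGT", "CCGTTAGC"]
fragments = split_into_fragments(records, 3)
print(search_all(fragments, "ACGT"))  # ['ACGTACGT', 'ACGTTGCA']
```

The point of the split is that no single node has to hold the whole database in memory, only its own fragment.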
A 40-node rack-mounted cluster of blade servers, each with a 2 GHz processor and 512 MB of SDRAM, could be put together for less than $20k. It would have about 80 GHz of combined processing speed and 20 GB of RAM.
This could possibly make for a very fast search engine.
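The memory and clock totals quoted above for the two 40-node configurations are simple multiplications. The helper names below are made up for illustration. Note that the GHz total is only a sum of clock rates; it does not mean a single process runs at that speed.

```python
def total_ram_gb(nodes, ram_mb_per_node):
    """Combined RAM in GB for identical nodes (1 GB taken as 1024 MB)."""
    return nodes * ram_mb_per_node / 1024

def total_ghz(nodes, ghz_per_node):
    """Sum of the clock rates of identical nodes."""
    return nodes * ghz_per_node

print(total_ram_gb(40, 256))   # 10.0 GB for 40 nodes with 0.25 GB each
print(total_ram_gb(40, 512))   # 20.0 GB for 40 blades with 512 MB each
print(total_ghz(40, 2.0))      # 80.0 GHz combined for 40 blades at 2 GHz
```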
There is a parallel version of PostgreSQL called Clusgres. I need to find out more about it.
It is a given that clusters could be used for faster statistical number crunching. If we have a need for large matrix solving, a cluster would also be helpful. Some of this we haven’t explored because we have had no inexpensive way to do it in the past.
It looks like it would be useful for cracking passwords, not only when that becomes necessary but also to better understand the actual risk a low-cost openMosix cluster poses to our security.
Running the real video server and encoder on an openMosix cluster should be beneficial not only for the server and encoder but also for voice recognition captioning.
There are other cluster monitoring GUI gadgets that I haven’t mentioned because right now they seem to be only fluff.
Python and Java applications are almost tailored to be cluster adaptable, but C++ and Perl programs appear to be really difficult to adapt to clustering/gridding. In C there is something called a green thread. It is supposed to distribute well across a cluster. Not all threads distribute well.
There is a collection of stress test programs for openMosix clusters that generates a log of its stress test. Internal to it is a Perl program that does statistical analysis on multiple stocks at the same time. While this program is not much help to us as is, it looks like it may be of use to us in two other ways.
1. An example of how to write a Perl cluster program.
2. A pattern for doing a statistical cluster program for looking at multilevel educational information.
It could also enhance email and FTP servers. I can’t think of a lot it can do for FTP, except that memory-resident file searches could make the finding part of a large file system faster. Email, on the other hand, seems to have more possibilities. Besides distributing sendmail, qmail or Postfix processes and list servers, it could help SpamAssassin, particularly its Bayesian filter. Neural networks may be more helpful than Bayesian filters for this, and their potential would be enhanced by clustering/grid. Bayesian filters and neural networks working together may be the way to go. Bayesian filters by themselves have been hard for small shops to maintain. The spammers have easily circumvented them.
Of course, neural network/AI applications would have more possibilities than just email. OpenCyc is an AI program that would probably benefit from a cluster.
OmFS could be used as a type of RAID of old hard drives for an inexpensive storage array. It seems like something you might want to do only once a machine has run out of all other usefulness. But if you are getting rid of one hundred 600 MHz machines with 20 GB hard drives, you could build a 2 TB raw drive array out of them instead. The drive sizes don’t particularly have to match; a larger drive can still be partially useful in a collection of smaller drives. There may be some maintenance questions also. This is probably not particularly fast.
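The capacity figure can be checked with a couple of lines. The second function below is an assumption about what "partially useful" might mean in practice: it models a layout where every drive contributes only as much space as the smallest drive. I have not confirmed how OmFS actually lays data out, and the function names are made up for illustration.

```python
def raw_capacity_gb(drive_sizes_gb):
    """Total raw space if every drive is used in full."""
    return sum(drive_sizes_gb)

def matched_capacity_gb(drive_sizes_gb):
    """Space available if each drive contributes only as much as the smallest one.

    Assumption: this models a layout that needs equal-size members.
    """
    return min(drive_sizes_gb) * len(drive_sizes_gb)

uniform = [20] * 100              # one hundred 20 GB drives
print(raw_capacity_gb(uniform))   # 2000 GB, about 2 TB raw

mixed = [20] * 99 + [40]          # one larger drive among smaller ones
print(raw_capacity_gb(mixed))     # 2020 GB if the big drive is used in full
print(matched_capacity_gb(mixed)) # 2000 GB if it only counts as a 20 GB drive
```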
There are some circuit and electricity losses that are not negligible.
The computers don’t all have to stay up and on all the time. You can bring nodes up before you need them, or even after you have started some CPU- or memory-intensive programs, as you need them. The cluster has an obvious ability to redistribute CPU and memory usage as nodes are added.
I tested this on a 7-node cluster. I turned off 3 nodes and started 7 awk loops. The first time I did this, 2 nodes added themselves right away. The 3rd node I had to turn on manually with a different switch; it came up but didn’t add itself to the cluster. I rebooted it and it added itself to the cluster, but by then only one awk process was still running. I was able to turn off the 3 nodes while the process was running on another node. All the processes completed successfully.
The second time, the first 2 nodes added themselves in less than a minute. The 3rd node still didn’t come up, but when I rebooted it, it added itself to the cluster and started processing.
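A test like this can be driven from a small script that starts several independent CPU-bound processes and reports how many finish. The sketch below is not the awk test I ran. It stands in a Python one-liner for each awk loop, and the worker, the function name and the iteration count are made up for illustration. It does nothing openMosix-specific; it only creates separate processes, which is the kind of work the cluster can spread across nodes.

```python
import subprocess
import sys
import time

# Hypothetical CPU-bound worker: sums squares in a tight loop, much like an awk loop.
WORKER = "import sys; n = int(sys.argv[1]); print(sum(i * i for i in range(n)))"

def run_workers(count, iterations):
    """Start `count` independent processes, wait for all, return results and elapsed time."""
    start = time.time()
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(iterations)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(count)
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()  # blocks until this worker exits
        results.append((proc.returncode, out.strip()))
    return results, time.time() - start

if __name__ == "__main__":
    results, elapsed = run_workers(7, 2_000_000)
    completed = sum(1 for code, _ in results if code == 0)
    print(f"{completed} of {len(results)} processes completed in {elapsed:.1f} s")
```

Running it once with all nodes up and once with some nodes switched off part way through would give a completed count and a time to compare.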
Other problems:
Terms
Openmosix – command that starts and stops openMosix
Omdiscd – command that finds an openMosix cluster and attaches the node to it
Openmosixview – GUI gadget for viewing cluster CPU and memory usage
Mosmon – ASCII text monitor for viewing CPU usage
Mtop – besides the normal process information, it also tells you which node the processes that originated on this node are running on.
Openmosixcollector – history and information collector
Openmosixapplet – a web applet that displays node usage
Sigshm – node distributive memory management for use by other nodes
OmFS – Openmosix file system – allows you to attach disk storage from other nodes.
MPI – a framework that allows programs to be distributed across the nodes of a cluster.
MPIBlast – a database system that tries to distribute tables to the RAM of the cluster nodes to speed up processing across the cluster. (It is presently being used in genetic research to find genetic sequence pairs.)
Python things
Zope – a Web server written in Python that is Python Extensible
Plone – a workflow program written in Python that runs with Zope (the Governor’s office uses this to manage workflow on a Zope server)
Gadfly – a Database and database server building program written in Python
Numpy – a Python module for numerical processing
Multiarray – a Python module required by Numpy that is hard to find and is not well documented