[[T(Header|!AsynCluster|Asynchronous Cluster Computing)]]

OK, you're probably looking at the !AsynCluster source and thinking, "Hey, this
is cool, but why can't this guy seem to write any documentation?" The answer,
as with a lot of free & open source software, is that the ''author'' needs no
documentation; he understands how to use this code just fine and is busily
doing so! Explaining it to other people is something that sadly gets put off
and forgotten. Another reason is that -- let's face it -- it's usually a whole
lot more fun to write code than to write documentation about it. The one saving
grace is that the classes and methods in this code do tend to have ample
docstrings, and those result in pretty decent
[http://foss.eepatents.com/trac/AsynCluster/api API] documentation.

Anyhow, let's take a look at how ''you'' can put !AsynCluster to work to run
computing jobs on a cluster of PCs or CPU cores.

== Installation ==

Make sure you have [http://twistedmatrix.com Twisted] installed on all the PCs
that will be running !AsynCluster. Then install !AsynCluster, and customize the
{{{/etc/asyncluster.conf}}} config file.

One of the PCs will be your master, and the rest will be computing nodes. (If
you have a multi-core CPU on the master PC, you will probably want to run one
or more node processes on it, too.) The config file has a common section, a
section that is only used by the master server, and a section that is used to
specify how nodes connect to the master as TCP clients.

You can check out the config file template that comes with the package
[http://foss.eepatents.com/trac/AsynCluster/source/misc/etc_asyncluster.conf here].
Let's start with the '''server''' section, which is used by the master PC:

{{{
# AsynCluster Client & Server Common Configuration File

#--- Server-specific config items -------------------------
[server]

# URL to Privilege & Usage Database
database = DEFINE_A_URL

# Comma-separate list of accepted client address definition(s)
# Example: "subnets = 127.0.0.1, 192.168.1.0/24"
subnets = 127.0.0.1, 192.168.135.0/24

}}}

Specify a URL of a ''database'' that you'll be using to keep track of the
privileges and usage of the people using your cluster nodes as
workstations. The format is explained in the
[http://www.sqlalchemy.org/docs/04/dbengine.html#dbengine_establishing documentation]
for the underlying SQLAlchemy package. (Now there's a guy who knows how to
document his code!) If you don't care about restricting and monitoring user
access on the nodes, you can use {{{sqlite://:memory:}}} as your URL to have
things hum away on an in-memory SQLite database that will simply evaporate on
power-down.

You can specify one or more ''subnets'' that match all clients you expect to
have connecting to the master. The default permits connections from the master
PC itself, ''e.g.'', for multi-core usage, and from the local network
192.168.135.0/24, that is, IP addresses from 192.168.135.0 to 192.168.135.255.
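
To make the ''subnets'' matching concrete, here is a minimal sketch of how such a comma-separated value could be checked against a connecting client's address. It uses Python's standard {{{ipaddress}}} module; the helper names {{{parse_subnets}}} and {{{is_allowed}}} are made up for illustration and are not taken from the !AsynCluster source, which may do this differently.

```python
import ipaddress


def parse_subnets(value):
    """Turn a comma-separated 'subnets' value into a list of network objects.

    A bare address such as 127.0.0.1 becomes a single-host (/32) network.
    """
    return [
        ipaddress.ip_network(item.strip())
        for item in value.split(",")
        if item.strip()
    ]


def is_allowed(address, networks):
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in networks)


# The default value from the [server] section shown above.
networks = parse_subnets("127.0.0.1, 192.168.135.0/24")

print(is_allowed("192.168.135.42", networks))  # True
print(is_allowed("192.168.136.1", networks))   # False
```

A node on a different subnet, such as 192.168.136.1 here, would need its own entry in the ''subnets'' list.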

The '''client''' section is used by the nodes, defining how they connect as TCP clients to
the master:

{{{
#--- Client-specific config items -------------------------
[client]

# Server host for node-master TCP connections
host = main

# User name for the client connection
user = test

# Password for the client connection
password = YOU-MUST-CHANGE-THIS
}}}

It's pretty self-explanatory. The ''host'' is a qualified hostname or IP
address. The ''user'' is a user name that is assigned to the node, not to any
user accessing the node as a workstation. The ''password'' is stored in plain
text.
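
If you want to double-check a node's settings before starting it, something like the following sketch may help. It reads the '''client''' section with Python's standard {{{configparser}}} module and refuses a password that is still the template placeholder. The function name {{{load_client_settings}}} and the sample values are assumptions for illustration only; this is not how !AsynCluster itself loads its config.

```python
import configparser

# Placeholder value shipped in the config template's [client] section.
PLACEHOLDER = "YOU-MUST-CHANGE-THIS"

# Hypothetical sample; in practice you would read /etc/asyncluster.conf.
SAMPLE = """
[client]
# Server host for node-master TCP connections
host = main
# User name for the client connection
user = test
# Password for the client connection
password = not-the-placeholder
"""


def load_client_settings(text):
    """Parse the [client] section and return host, user and password."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    client = parser["client"]
    settings = {
        "host": client["host"],
        "user": client["user"],
        "password": client["password"],
    }
    if settings["password"] == PLACEHOLDER:
        raise ValueError("the [client] password is still the template placeholder")
    return settings


settings = load_client_settings(SAMPLE)
print(settings["host"], settings["user"])
```

To check the real file, pass the text of {{{/etc/asyncluster.conf}}} instead of {{{SAMPLE}}}.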

The '''common''' section is next:

{{{
#--- Common config items ----------------------------------
[common]

# Server port for node-master TCP connections
tcp port = 9080

# UNIX Socket for master control connections
socket = /tmp/.ndm

# Server password for reverse login to client
server password = YOU-MUST-CHANGE-THIS-TOO
}}}

The nodes connect to the master via the specified ''tcp port''. There is also a
control client that runs on the master, which we'll be discussing a bit
later. It connects via a UNIX domain ''socket''.

When running jobs, the nodes will be accepting chunks of unknown Python code
from the master. To be a bit more comfortable with that leap of faith, the
nodes require the server to authenticate itself to the client after the client
has satisfied the server with its own login. Set a ''server password'' for that
reverse login. Theoretically, a hostile server that you accidentally connect to
could spit your client login password back to you in a reverse login attempt,
so use a different password here. (That's all very hypothetical, but why not
use the extra security?)
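
The advice about keeping the two passwords different is easy to check mechanically. The sketch below, again using the standard {{{configparser}}} module, reports a problem if the '''client''' password and the ''server password'' are identical or if either is still a template placeholder. The function name {{{password_problems}}} and the sample values are hypothetical and not part of !AsynCluster.

```python
import configparser

# Hypothetical sample; in practice you would read /etc/asyncluster.conf.
SAMPLE = """
[client]
host = main
user = test
password = node-secret

[common]
tcp port = 9080
socket = /tmp/.ndm
server password = master-secret
"""


def password_problems(text):
    """Return a list of human-readable problems with the two passwords."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    client_pw = parser.get("client", "password")
    # Option names may contain spaces, so this key is looked up as written.
    server_pw = parser.get("common", "server password")

    problems = []
    if client_pw == server_pw:
        problems.append("client and server passwords are identical")
    for name, value in (("client", client_pw), ("server", server_pw)):
        # Both template placeholders begin with this text.
        if value.startswith("YOU-MUST-CHANGE-THIS"):
            problems.append(name + " password is still a template placeholder")
    return problems


print(password_problems(SAMPLE))  # []
```

An empty list means both passwords have been changed and differ from each other.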

Now, if you are going to use the [wiki:NDM Node Display Manager], you'll want
to configure the '''display''' section:

{{{
#--- Display manager items --------------------------------
[display]

# NDM Window size in pixels (fixed)
size = 300, 200

# The window manager to launch for a new user session
window manager = /usr/bin/startkde

# Niceness level at which to run the window manager and thus all programs
# launched by the user from there
niceness = 10
}}}

The default window manager is KDE, but I've actually switched to
[http://icewm.org/ IceWM] for simplicity and ease of maintenance. The correct
''window manager'' value for that configuration is
{{{/usr/bin/icewm-session}}}.

You can annoy your workstation users and give your jobs more CPU time by
setting a higher ''niceness'' value, which lowers the scheduling priority of
the window manager and everything the user launches from it.
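
For anyone unfamiliar with niceness, the sketch below shows the general idea on a POSIX system: a child process raises its own niceness before it starts, and every program launched from it inherits that lower priority. It relies only on the standard {{{os}}} and {{{subprocess}}} modules. The helper {{{run_niced}}} is hypothetical, and how the Node Display Manager actually applies the ''niceness'' setting is not shown here.

```python
import os
import subprocess
import sys


def run_niced(argv, niceness):
    """Run argv in a child process whose niceness is raised by `niceness`."""
    return subprocess.run(
        argv,
        # Called in the child just before the program starts (POSIX only).
        preexec_fn=lambda: os.nice(niceness),
        capture_output=True,
        text=True,
        check=True,
    )


# Demonstration: a child Python process reports its own niceness.
# os.nice(0) adds nothing and simply returns the current value.
result = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"], 10)
print("parent niceness:", os.nice(0))
print("child niceness:", result.stdout.strip())
```

With the template's {{{niceness = 10}}}, a window manager started this way would yield the CPU to cluster jobs running at the default priority.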


== A Simple Cluster Computing Job ==

== Running the Job ==

== Conclusions ==