root/projects/AsynCluster/trunk/doc/AsynCluster_Example.txt

Revision 117, 5.4 kB (checked in by edsuom, 1 year ago)

Working on usage example

[[T(Header|!AsynCluster|Asynchronous Cluster Computing)]]

OK, you're probably looking at the !AsynCluster source and thinking, "Hey, this
is cool, but why can't this guy seem to write any documentation?" The answer,
as with a lot of free & open source software, is that the ''author'' needs no
documentation; he understands how to use this code just fine and is busily
doing so! Explaining it to other people is something that sadly gets put off
and forgotten. Another reason is that -- let's face it -- it's usually a whole
lot more fun to write code than to write documentation about it. The one saving
grace is that the classes and methods in this code do tend to have ample
docstrings, and those result in pretty decent
[http://foss.eepatents.com/trac/AsynCluster/api API] documentation.

Anyhow, let's take a look at how ''you'' can put !AsynCluster to work to run
computing jobs on a cluster of PCs or CPU cores.


== Installation ==

Make sure you have [http://twistedmatrix.com Twisted] installed on all the PCs
that will be running !AsynCluster. Then install !AsynCluster, and customize the
{{{/etc/asyncluster.conf}}} config file.

One of the PCs will be your master, and the rest will be computing nodes. (If
you have a multi-core CPU on the master PC, you will probably want to run one
or more node processes on it, too.) The config file has a common section, a
section that is only used by the master server, and a section that is used to
specify how nodes connect to the master as TCP clients.

You can check out the config file template that comes with the package
[http://foss.eepatents.com/trac/AsynCluster/source/misc/etc_asyncluster.conf here].
Let's start with the '''server''' section, which is used by the master PC:

{{{
# AsynCluster Client & Server Common Configuration File

#--- Server-specific config items -------------------------
[server]

# URL to Privilege & Usage Database
database = DEFINE_A_URL

# Comma-separated list of accepted client address definition(s)
# Example: "subnets = 127.0.0.1, 192.168.1.0/24"
subnets = 127.0.0.1, 192.168.135.0/24
}}}

Specify the URL of a ''database'' that will keep track of the privileges and
usage of the people using your cluster nodes as workstations. The URL format
is explained in the
[http://www.sqlalchemy.org/docs/04/dbengine.html#dbengine_establishing documentation]
for the underlying SQLAlchemy package. (Now there's a guy who knows how to
document his code!) If you don't care about restricting and monitoring user
access on the nodes, you can use {{{sqlite:///:memory:}}} as your URL to have
things hum away on an in-memory SQLite database that will simply evaporate
when the master process exits or the machine powers down.

You can specify one or more ''subnets'' that together match all clients you
expect to connect to the master. The default permits connections from the
master PC itself (''e.g.'', for multi-core usage) and from the
192.168.135.0/24 local network, that is, host addresses 192.168.135.1 through
192.168.135.254.
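
To get a feel for which addresses a ''subnets'' value admits, here is a minimal sketch using Python's standard {{{ipaddress}}} module. The helper names {{{parse_subnets}}} and {{{is_allowed}}} are made up for illustration, and this is not necessarily how !AsynCluster does its own matching; it only demonstrates the address notation used in the config file.

```python
import ipaddress

def parse_subnets(value):
    """Split a comma-separated 'subnets' value into network objects."""
    return [ipaddress.ip_network(item.strip()) for item in value.split(",")]

def is_allowed(address, networks):
    """Return True if the address falls inside any of the networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in networks)

# Hypothetical check against the value from the template
networks = parse_subnets("127.0.0.1, 192.168.135.0/24")
print(is_allowed("127.0.0.1", networks))        # True
print(is_allowed("192.168.135.42", networks))   # True
print(is_allowed("192.168.136.42", networks))   # False
```

A bare address such as {{{127.0.0.1}}} is treated as a single-host network, so only that exact address matches it.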

The '''client''' section is used by the nodes, defining how they connect as
TCP clients to the master:

{{{
#--- Client-specific config items -------------------------
[client]

# Server host for node-master TCP connections
host = main

# User name for the client connection
user = test

# Password for the client connection
password = YOU-MUST-CHANGE-THIS
}}}

It's pretty self-explanatory. The ''host'' is the hostname or IP address of
the master PC. The ''user'' is a user name that is assigned to the node, not
to any user accessing the node as a workstation. The ''password'' is stored in
plain text.

The '''common''' section is next:

{{{
#--- Common config items ----------------------------------
[common]

# Server port for node-master TCP connections
tcp port = 9080

# UNIX Socket for master control connections
socket = /tmp/.ndm

# Server password for reverse login to client
server password = YOU-MUST-CHANGE-THIS-TOO
}}}

The nodes connect to the master via the specified ''tcp port''. There is also a
control client that runs on the master, which we'll be discussing a bit
later. It connects via a UNIX domain ''socket''.
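
Because the config file is plain INI-style text, you can inspect it with Python's standard {{{configparser}}} module. The following is only a sketch: the sample text is abbreviated from the template, the helper name {{{load_settings}}} is hypothetical, and whether !AsynCluster reads its config this way internally is an assumption. Note that option names such as {{{tcp port}}} contain a space, which {{{configparser}}} accepts.

```python
import configparser

# Abbreviated copy of the template sections shown in this document
SAMPLE = """
[common]
tcp port = 9080
socket = /tmp/.ndm
server password = YOU-MUST-CHANGE-THIS-TOO

[client]
host = main
user = test
password = YOU-MUST-CHANGE-THIS
"""

def load_settings(text):
    """Parse INI-style text and return the values a node needs to connect."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        "host": parser.get("client", "host"),
        "port": parser.getint("common", "tcp port"),
        "user": parser.get("client", "user"),
        "socket": parser.get("common", "socket"),
    }

settings = load_settings(SAMPLE)
print(settings["host"], settings["port"])   # main 9080
```

To look at a real installation, {{{parser.read("/etc/asyncluster.conf")}}} would take the place of {{{read_string}}}.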

When running jobs, the nodes will be accepting chunks of unknown Python code
from the master. To be a bit more comfortable with that leap of faith, the
nodes require the server to authenticate itself to the client after the client
has satisfied the server with its own login. Set a ''server password'' for that
reverse login. Theoretically, a hostile server that you accidentally connect to
could spit your client login password back to you in a reverse login attempt,
so use a different password here. (That's all very hypothetical, but why not
use the extra security?)
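
Since the two passwords should differ and the template placeholders have to be replaced, a quick sanity check of your config might look like the sketch below. The function {{{check_passwords}}} is hypothetical and not part of !AsynCluster; it simply applies the advice above to the {{{password}}} and {{{server password}}} options.

```python
import configparser

# Both template placeholders start with this text
PLACEHOLDER_PREFIX = "YOU-MUST-CHANGE-THIS"

def check_passwords(text):
    """Return a list of problems found with the two login passwords."""
    # interpolation=None so a '%' in a password is taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    client_pw = parser.get("client", "password")
    server_pw = parser.get("common", "server password")
    problems = []
    if client_pw.startswith(PLACEHOLDER_PREFIX):
        problems.append("client password is still the template placeholder")
    if server_pw.startswith(PLACEHOLDER_PREFIX):
        problems.append("server password is still the template placeholder")
    if client_pw == server_pw:
        problems.append("client and server passwords are identical")
    return problems

sample = """
[client]
password = YOU-MUST-CHANGE-THIS

[common]
server password = YOU-MUST-CHANGE-THIS-TOO
"""
for problem in check_passwords(sample):
    print(problem)
```

Run against the untouched template values, this reports both placeholders; an empty list means neither issue was found.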

Now, if you are going to use the [wiki:NDM Node Display Manager], you'll want
to configure the '''display''' section:

{{{
#--- Display manager items --------------------------------
[display]

# NDM Window size in pixels (fixed)
size = 300, 200

# The window manager to launch for a new user session
window manager = /usr/bin/startkde

# Niceness level at which to run the window manager and thus all programs
# launched by the user from there
niceness = 10
}}}

The default window manager is KDE, but I've actually switched to
[http://icewm.org/ IceWM] for simplicity and ease of maintenance. The correct
''window manager'' value for that configuration is
{{{/usr/bin/icewm-session}}}.

You can annoy your workstation users and give your jobs more CPU time by
setting a higher ''niceness'' value, which lowers the scheduling priority of
the user code.
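
To illustrate what a niceness of 10 means on a Unix-like system, the sketch below starts a child process with its niceness raised via {{{os.nice}}}. This shows the general mechanism only and is an assumption, not NDM's actual launch code; the helper name {{{run_niced}}} is made up, and the child command merely reports its own niceness.

```python
import os
import subprocess
import sys

def run_niced(command, niceness):
    """Run a command with its niceness raised by the given increment (Unix only)."""
    return subprocess.run(
        command,
        preexec_fn=lambda: os.nice(niceness),  # runs in the child before exec
        capture_output=True,
        text=True,
        check=True,
    )

# The child prints its own niceness; os.nice(0) reads it without changing it.
result = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"], 10)
print("parent niceness:", os.nice(0))
print("child niceness:", result.stdout.strip())
```

Everything the child launches inherits the raised niceness, which is why setting it once on the window manager affects all programs started from the user session.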


== A Simple Cluster Computing Job ==

== Running the Job ==

== Conclusions ==