Following description works for MPICH2, using mpiexec ot mpirun. That is how I tested.
On the node where you launch the program, type
>> mpdtrace -l
It gives you <node name>_<port>(IP)
blade50_51094 (IP **)
blade49_56382 (IP **)
blade47_35763 (IP **)
blade48_49526 (IP **)
blade51_53029 (IP **)
If not all nodes are listed here, for example, if blade46 is in the hosts file, and is available. ssh to blade46, type
>>mpd -h blade50 -p 51094 &
If you want to start more mpd
>>mpd -h blade50 -p 51094 -n &
Then the blade46 can be used.
To clean up mpd daemon, use mpdcleanup
Besides, if you want to launch m consecutive ranks on the same node, use mpd --ncpus=m
For example:
mpd --ncpus=2 &
or
mpd --ncpus=2 -h blade50 -p 51094 &
QUOTE"
If an mpd is started with the --ncpus option, then when it is its turn to start a process, it will start several application processes rather than just one before handing off the task of starting more processes to the next mpd in the ring. For example, if the mpd is started with
"END of QUOTE
To clean up mpd daemon, use mpdcleanup
Besides, if you want to launch m consecutive ranks on the same node, use mpd --ncpus=m
For example:
mpd --ncpus=2 &
or
mpd --ncpus=2 -h blade50 -p 51094 &
QUOTE"
If an mpd is started with the --ncpus option, then when it is its turn to start a process, it will start several application processes rather than just one before handing off the task of starting more processes to the next mpd in the ring. For example, if the mpd is started with
mpd --ncpus=4then it will start as many as four application processes, with consecutive ranks, when it is its turn to start processes. This option is for use in clusters of SMP's, when the user would like consecutive ranks to appear on the same machine. (In the default case, the same number of processes might well run on the machine, but their ranks would be different.) (A feature of the --ncpus=[n] argument is that it has the above effect only until all of the mpd's have started n processes at a time once; afterwards each mpd starts one process at a time. This is in order to balance the number of processes per machine to the extent possible.)
"END of QUOTE
1 comment:
I got the same error, and your solution worked. Thanks!
Post a Comment