Oct 18, 2010

Add blank page in latex without page number

To add blank pages / empty pages in latex, we can do this:

\documentclass[11pt, a4paper]{article}
\begin{document}
This is page one.
\newpage
\thispagestyle{empty} % no page number on the blank page
\mbox{}
\newpage
\setcounter{page}{2} % This resets the page number counter.
This is page two after one blank page without numbering.
\end{document}

Of course, there are ways to put words like "This page is intentionally left blank." on it.


Sep 5, 2010

Use GDL to run IDL code and parse GOES AOD binary file

IDL (Interactive Data Language) has a free replacement named GDL (GNU Data Language).
The NASA GASP GOES satellite provides AOD data in .aod files that need IDL to parse. Its idl_read code gives some hints about the format of that binary file.

It is still painful to parse with C code directly, so GDL is used in place of IDL, which requires a non-free license.
To install GDL, download it, along with the GSL, PLplot, CMake, etc. packages.
Install all required packages.

After that, download CMSVLIB, and put the untarred files into <gdl>/share/gnudatalanguage/lib (this is needed for running 'save' in GDL).

Add GDL's bin path to .bashrc.
Possibly, you need to launch the program by running:
LD_LIBRARY_PATH=../../plplot/lib:$LD_LIBRARY_PATH gdl
which provides PLplot's lib path.

Besides, for the .aod binary file, its format is guessed to be 10 byte arrays of size 2128 x 880. I wrote a C function to parse it as well, which is a translation of the IDL reader from Appendix A in http://www.ssd.noaa.gov/PS/FIRE/GASP/20090107_GASP_Algorithm_Updates.doc
The C function is also a demo; it is not fully optimized or prettified.

But notice that the archived data may be 17000000 bytes in size, from before they switched GOES-EAST from GOES-12 to GOES-13. In that case, the array dimension is 2000x850, and there are other associated lat/lon .dat files.

The key points in this function are:
1. there are 10 byte arrays
2. each is of size 2128 x 880 = 1872640 bytes
3. the variables can be declared as uint8_t
4. Matlab can also do this, by
    >>baod=fread(fid, [2128,880],'uint8'); % similar for the remaining 9 arrays
5. the data is stored column major, so if you loop with one index from 0 to 1872640, each column is traversed first.


#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main()
{
    long i = 0;
    FILE *fp;
    long fsize;
    uint8_t *baod;
    uint8_t *bmsk;
    uint8_t *bcls;
    uint8_t *baodstd;
    uint8_t *bsfc;
    uint8_t *bch1;
    uint8_t *bmos;
    uint8_t *bcld;
    uint8_t *bsig;
    uint8_t *bsca;

    fp = fopen("2010244171519_i18_US.all.aod", "rb");
    if (fp == NULL)
        return 1;

    fseek(fp, 0, SEEK_END);
    fsize = ftell(fp);
    rewind(fp); /* go back to the beginning before reading */

    long arrsize = fsize / 10;

    baod = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bmsk = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bcls = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    baodstd = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bsfc = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bch1 = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bmos = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bcld = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bsig = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bsca = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);

    /* the 10 arrays are stored back to back in the file */
    fread(baod, 1, arrsize, fp);
    fread(bmsk, 1, arrsize, fp);
    fread(bcls, 1, arrsize, fp);
    fread(baodstd, 1, arrsize, fp);
    fread(bsfc, 1, arrsize, fp);
    fread(bch1, 1, arrsize, fp);
    fread(bmos, 1, arrsize, fp);
    fread(bcld, 1, arrsize, fp);
    fread(bsig, 1, arrsize, fp);
    fread(bsca, 1, arrsize, fp);
    fclose(fp);

    float *faod = (float *)malloc(sizeof(float) * arrsize);
    float *faodstd = (float *)malloc(sizeof(float) * arrsize);
    float *fsfc = (float *)malloc(sizeof(float) * arrsize);
    float *fch1 = (float *)malloc(sizeof(float) * arrsize);
    float *fmos = (float *)malloc(sizeof(float) * arrsize);
    float *fsig = (float *)malloc(sizeof(float) * arrsize);
    //float *fsca = (float *)malloc(sizeof(float) * arrsize);

    /* unscale the raw bytes into physical values */
    for (i = 0; i < arrsize; i++) {
        faod[i] = (float)baod[i] / 100.0 - 0.5;
        faodstd[i] = (float)baodstd[i] / 100.0;
        fsfc[i] = (float)bsfc[i] / 500.0 - 0.1;
        fch1[i] = (float)bch1[i] / 600.0;
        fmos[i] = (float)bmos[i] / 600.0;
        fsig[i] = (float)bsig[i] / 250.0 - 0.5;
        //fsca[i] = (float)bsca[i] / 1.0;
    }

    /* dump the raw bytes for inspection */
    fp = fopen("test0.txt", "w");
    for (i = 0; i < arrsize; i++)
        fprintf(fp, "%ld %d %d %d %d %d %d %d %d %d %d\n", i, baod[i],
                bmsk[i], bcls[i], baodstd[i], bsfc[i], bch1[i], bmos[i],
                bcld[i], bsig[i], bsca[i]);
    fclose(fp);

    /* quality screening: mark bad retrievals with a fill value */
    for (i = 0; i < arrsize; i++) {
        if ((faodstd[i] >= 0.3) || (fsig[i] <= 0.01) || (fsfc[i] >= 0.15) ||
            (fsfc[i] <= 0.005) || (bcls[i] <= 15) || (faod[i] >= 10.0) ||
            (fch1[i] <= 0.0) || (bcld[i] != 1) || (bsca[i] <= 70) ||
            (bsca[i] >= 170))
            faod[i] = -9999.0;
    }

    fp = fopen("test.txt", "w");
    for (i = 0; i < arrsize; i++)
        fprintf(fp, "%ld %f\n", i, faod[i]);
    fclose(fp);

    free(baod); free(bmsk); free(bcls); free(baodstd); free(bsfc);
    free(bch1); free(bmos); free(bcld); free(bsig); free(bsca);
    free(faod); free(faodstd); free(fsfc); free(fch1); free(fmos); free(fsig);

    return 0;
}

Jun 14, 2010

Use SVN and Google Code host

A step-by-step tutorial for hosting a project on Google Code with SVN command lines.
You need svn installed on the Linux machine, and a project page on Google Code.
Example: you have a working project on your local machine at <dir>/myproject.
You also have a project page on Google Code.

1. in xterm, go to <dir>
>> svn import myproject/ https://aproject.googlecode.com/svn/trunk/ --username yourgoogleaccount -m "Initial import"

2. At the Google code "source page", find the password.

3. After this initial import, at your local machine, remove the myproject/ contents. In xterm input
>>svn checkout https://aproject.googlecode.com/svn/trunk/ myproject --username yourgoogleaccount
This would attach svn info to this directory and its files.

4. After any modification of files in this project, in xterm, cd to this directory
>> cd <dir>/project
>>svn commit -m "any messages"

5. To delete one file, in the xterm
>>svn rm file1
>>svn commit -m "any messages"

The same applies to "svn mv", "svn mkdir", etc.

Everything will be OK.

Apr 16, 2010

Parallel code performance study

I have a parallel code using LETKF for data assimilation, with the shallow water equation as the model problem. It runs on Intel blades with Nehalem CPUs at 1.5GHz, 8 cores sharing 4MB of L3 cache, ~24 GB memory, and a 10G Ethernet network.

A test case: MPICH2-1.3, a 1024x1024 problem, 16 processes on 8 nodes, 2 processes per node, bound to cores 0 and 1.
Some profiling tools are used.
1. Timing with MPI_Wtime()
LETKF time at timestep 48 = 53.640148 comm = 0.001367
LETKF time at timestep 98 = 53.694284 comm = 0.002735
LETKF time at timestep 148 = 53.624503 comm = 0.003942
Finished analysis at 481.756978 with 16 processes
Total LETKF iter Time taken = 160.958935 :: Average LETKF Time taken = 53.652978
Total comm Time for LETKF taken = 0.003942 :: Average comm time taken = 0.001314
Total LETKF time taken = 160.962877 :: Average LETKF time taken = 53.654292
Total Model Time taken = 78.915314 :: Average Model Time taken = 0.526102
Total IO Time taken = 230.198382 :: Average IO Time taken = 1.534656
2. Timing with gprof.
To do that, compile and link with -g -pg. Run the code; a file named 'gmon.out' is generated; post-process it with 'gprof <my program name> gmon.out'

Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total          
 time   seconds   seconds    calls  Ts/call  Ts/call  name   
 42.72    104.10   104.10          MPIDI_CH3I_Progress
 27.01    169.91    65.81           single_time_step
 18.78    215.66    45.75           MPID_nem_tcp_connpoll
  4.60    226.87    11.21            MPID_nem_vc_terminate
  2.15    232.10     5.23             MPID_nem_network_poll
  2.13    237.29     5.19             MPID_nem_tcp_sm_init
  1.00    239.72     2.43             point_associates
  0.41    240.73     1.01             matrix_sqrt_invert_once
  0.36    241.61     0.88             mat2fvec
  0.20    242.09     0.48             vector_scale
  0.16    242.49     0.40             eigen_decomposition
  0.13    242.80     0.31             matrix_transpose
  0.10    243.04     0.24             matrix_invert
  0.06    243.19     0.15             initial_vector
  0.05    243.31     0.12             letkf_mod
 %          the percentage of the total running time of the
time        program used by this function.

cumulative  a running sum of the number of seconds accounted
 seconds    for by this function and those listed above it.

 self       the number of seconds accounted for by this
seconds     function alone.  This is the major sort for this
            listing.

calls       the number of times this function was invoked, if
            this function is profiled, else blank.

 self       the average number of milliseconds spent in this
ms/call     function per call, if this function is profiled,
            else blank.

 total      the average number of milliseconds spent in this
ms/call     function and its descendents per call, if this
            function is profiled, else blank.

name        the name of the function.  This is the minor sort
            for this listing.  The index shows the location of
            the function in the gprof listing.  If the index is
            in parenthesis it shows where it would appear in
            the gprof listing if it were to be printed.
3. MPI performance monitoring with IPM,
whose profile is stored in an XML file and can be converted into HTML for a graphical view.
# command : unknown (running)
# host    : intel01                        mpi_tasks : 16 on 8 nodes
# start   : 04/16/10/09:02:57              wallclock : 479.626000 sec
# stop    : 04/16/10/09:10:57              %comm     : 62.54
# gbytes  : 0.00000e+00 total              gflop/sec : -3.33593e-02 total
#                           [total]         <avg>           min           max
# wallclock                  7672.59       479.537       479.517       479.626
# user                       6970.61       435.663       323.966       452.923
# system                      556.12       34.7575          9.68         47.97
# mpi                        4799.27       299.955       23.8315        348.25
# %comm                                    62.5393       4.96982       72.6102
# gflop/sec               -0.0333593   -0.00208496   -0.00208496   -0.00208496
# gbytes                           0             0             0             0
#                            [time]       [calls]        <%mpi>      <%wall>
# MPI_Barrier                4196.35          2400         87.44        54.68
# MPI_Sendrecv               560.118        821280         11.67         7.30
# MPI_Gather                 42.7768          2400          0.89         0.56
# region : ipm_noregion        [ntasks] = 16
#                           [total]         <avg>           min           max
# entries                         32             2             2             2
# wallclock              5.93667e+07   3.71042e+06        384240    4.8597e+06
# user                       6970.61       435.663        323.97        452.92
# system                     556.131       34.7582        9.6795        47.969
# mpi                        4799.27       299.954        23.831        348.25
# %comm                                 0.00808412    0.00615256     0.0896497
#                            [time]       [calls]        <%mpi>      <%wall>
# MPI_Barrier                4196.35          2400         87.44         0.01
# MPI_Sendrecv               560.118        821280         11.67         0.00
# MPI_Gather                 42.7768          2400          0.89         0.00

4. CPU, memory, and process instrumentation with an unknown lib, procmon.a

System load avg. = 1.99
utime + stime= 478.955
involuntary context switches =25741
voluntary context switches=62
CMD                         THCNT   PID   TID %CPU %MEM     TIME    SZ PSR
./main 1022 1022 tests/         1  5995  5995 99.3  2.8 00:07:57 1087732 0
*** READING /proc/5995/statm ***
total_program_size, memorykb, shared_pages, code_pages, data_pages, lib_pages, dirty_pages
284696 253338 1054 306 0 271934 0
5. Cache instrumentation with PinTool
I am not sure whether this works well for parallel code.

   Load Hits:         3017757
   Load Misses:          1699
   Load Accesses:     3019456
   Load Miss Rate:      0.06%

  Store Hits:               0
  Store Misses:             0
  Store Accesses:           0
  Store Miss Rate:       nan%

  Total Hits:         3017757
  Total Misses:          1699
  Total Accesses:     3019456
  Total Miss Rate:      0.06%
  Flushes:                  0
  Stat Resets:              0

   Load Hits:         1213480
   Load Misses:          1020
   Load Accesses:     1214500
   Load Miss Rate:      0.08%

  Store Hits:               0
  Store Misses:             0
  Store Accesses:           0
  Store Miss Rate:       nan%

  Total Hits:         1213480
  Total Misses:          1020
  Total Accesses:     1214500
  Total Miss Rate:      0.08%
  Flushes:                  0
  Stat Resets:              0

L1 Instruction Cache:
   Load Hits:         3016101
   Load Misses:          3355
   Load Accesses:     3019456
   Load Miss Rate:      0.11%

  Store Hits:               0
  Store Misses:             0
  Store Accesses:           0
  Store Miss Rate:       nan%

  Total Hits:         3016101
  Total Misses:          3355
  Total Accesses:     3019456
  Total Miss Rate:      0.11%
  Flushes:                  0
  Stat Resets:              0

L1 Data Cache:
   Load Hits:          692391
   Load Misses:          9721
   Load Accesses:      702112
   Load Miss Rate:      1.38%

  Store Hits:          482670
  Store Misses:         29718
  Store Accesses:      512388
  Store Miss Rate:      5.80%

  Total Hits:         1175061
  Total Misses:         39439
  Total Accesses:     1214500
  Total Miss Rate:      3.25%
  Flushes:                  0
  Stat Resets:              0

L2 Unified Cache:
   Load Hits:            9292
   Load Misses:          3784
   Load Accesses:       13076
   Load Miss Rate:     28.94%

  Store Hits:           28007
  Store Misses:          1711
  Store Accesses:       29718
  Store Miss Rate:      5.76%

  Total Hits:           37299
  Total Misses:          5495
  Total Accesses:       42794
  Total Miss Rate:     12.84%
  Flushes:                  0
  Stat Resets:              0

L3 Unified Cache:
   Load Hits:             315
   Load Misses:          3469
   Load Accesses:        3784
   Load Miss Rate:     91.68%

  Store Hits:               0
  Store Misses:          1711
  Store Accesses:        1711
  Store Miss Rate:    100.00%

  Total Hits:             315
  Total Misses:          5180
  Total Accesses:        5495
  Total Miss Rate:     94.27%
  Flushes:                  0
  Stat Resets:              0

Mar 26, 2010

Install MPICH2-1.3a1 for CPU affinity

I tested MPICH2-1.3a1, which uses hydra as the default process manager.
The test environment is the IBM H22 Intel Nehalem blades.
>>./configure --prefix=/home/mydir/mpich2-1.3a
>>make install

No special configuration option is required. (In 1.2.1, we need --with-pm=hydra)
Setup the 'myhost' file as
intel01:1 binding=user:0
intel02:1 binding=user:0
intel03:1 binding=user:0
intel04:1 binding=user:0

>>LD_LIBRARY_PATH=../socIntel/goto:$LD_LIBRARY_PATH mpiexec -f myhost -n 4 ./main 62 62 tests/
I have to say that there is no process migration among CPUs. However, I
cannot say this installation really has CPU affinity, because when I use
-binding user:2,4, processes are not actually bound to CPUs 2 and 4. Even
if I use intel01:4 binding=user:4,5,6,7, I see CPUs 0,1,2,3 busy.

Nevertheless, this is the best result I can get on Bluegrit. There,
OpenMPI can do CPU affinity only on one node because of the TCP
firewall. Besides, MVAPICH2 cannot really support CPU affinity, since
there is no IB, iWARP, etc. Last, early versions of MPICH do not
support core binding. It is really hard to get core mapping as a
non-root user. I don't know why the admins are reluctant to install these
for the users. I wasted a lot of time on that!

Install MVAPICH with HWLOC as a non-root

Here is a description of installing MVAPICH with hwloc as a non-root user.
With hwloc, we can make MVAPICH support CPU affinity.
Download hwloc from http://www.open-mpi.org/software/hwloc
Download mvapich2, and untar both to hwloc-0.9.3 and mvapich-1.4.1
>>mkdir hwloc0.9.3
>>mkdir mvapich1.4.1
>>cd hwloc-0.9.3
>>./configure --prefix=/home/username/hwloc0.9.3
>>make install
>>cd ../mvapich-1.4.1
>>export LDFLAGS='-L/home/username/hwloc0.9.3/lib'
>>export CPPFLAGS='-I/home/username/hwloc0.9.3/include'
>>./configure --with-hwloc --with-rdma=gen2 --prefix=/home/username/mvapich1.4.1 --disable-f90
>>make install
After that, add the mvapich1.4.1 path to .bashrc

Feb 27, 2010

Plot data file with PGFPlot

Tikz and pgfplot can plot data file in Tex.

\begin{figure}
\begin{tikzpicture}
\begin{axis}[xlabel=nstep, ylabel=WaveHeight]
\addplot[mark=none] file {case14.data};   % Here is the data file
\addplot[color=green,mark=none] file {case14xb.data};
\legend{True, Predicted};
\end{axis}
\end{tikzpicture}
\begin{tikzpicture}
\begin{axis}[xlabel=nstep, ylabel=error(2-norm)]
\addplot[mark=none] file {case14err.data};
\end{axis}
\end{tikzpicture}
\caption{(a) True and predicted wave height at the domain center; (b) 2-norm of numerical error on the whole domain}
\end{figure}

The data file's format can be multiple columns separated by spaces, one column per variable. Such as
# data file
0.00001 2.123243
0.00003 3.123452
Sometimes, Matlab's fprintf function may help to generate such a data file.
For example:
fid = fopen('case14err.data','w');
y=[b ; a];
fprintf(fid,'%1.5f %1.5f\n', y);

Feb 26, 2010

A script to create shared lib

   A good script to create shared lib

   1. With root privileges: create directories /usr/local/lib and /usr/local/include (if they do not exist)
   2. add the path /usr/local/lib to the file /etc/ld.so.conf (if not there)
   3. run the script create_shared_lib.sh (text below)
      e.g. you have a money.c file that is prepared for creating the library "money":
      $sh create_shared_lib money
   4. copy your money.h to /usr/local/include
   5. to use your library:
      include money.h in the global area of your program (#include <money.h>)
      gcc -Wall -g -o my_prog.exe my_prog.c -I/usr/local/include -lmoney

   1. Without root privileges: create directories lib and include in your $HOME directory
   2. change the script's line 'mv "$libname".* /usr/local/lib/' to 'mv "$libname".* ~/lib/'
   3. comment out the line 'ldconfig &'
   4. run the script (as explained above)
   5. copy your header file into the ~/include directory
   6. to use your library:
      include money.h in the global area of your program (#include <money.h>)
      gcc -Wall -g -o my_prog.exe my_prog.c -I$HOME/include -L$HOME/lib -lmoney



ARG=1
BADARG=65

if [ "$#" -ne "$ARG" ]; then
echo -e "\t\tUsages: `basename $0` name_of_shared_library"
exit $BADARG
fi

libname=lib"$1"

# uncomment next 2 lines and comment 3-4 lines if you create C++ library #

#g++ -fPIC -c "$1".cpp
#g++ -shared -Wl,-soname,"$libname".so.1 -o "$libname".so.1.0 "$1".o
gcc -fPIC -c "$1".c
gcc -shared -Wl,-soname,"$libname".so.1 -o "$libname".so.1.0 "$1".o
ln -s "$libname".so.1.0 "$libname".so.1
ln -s "$libname".so.1 "$libname".so

echo -e "\t\tSHARED LIBRARY $libname IS CREATED"
echo -e "\t\t==================================\n"
mv "$libname".* /usr/local/lib/

echo -e "\t\tldconfig is working"
ldconfig &

exit 0

Feb 13, 2010

Node plot and position in Tikz

It is difficult to create a neat, accurate, and fancy scientific figure. Tikz/pgf, pstricks, and metapost are not easy to learn, nor do I have full knowledge of their capabilities. On the other hand, gimp or photoshop is not suitable for producing figures from calculated data. I hope there will be a handy tool that helps me use Tikz or metapost easily, but I tested several Java-based front ends and none of them was a good choice.

Here is an example of plotting nodes and pointing them at each other with arrows, with position control. It is mainly about '\node', '\draw' and their decorations. It is easy to understand how the figure is produced, hence I put no comments here.


\begin{tikzpicture}
\draw[dash pattern=on 2pt off 3pt on 4pt off 4pt](2,.2) -- (2,3.2);
\draw[dash pattern=on 2pt off 3pt on 4pt off 4pt](4,.2) -- (4,3.2);
\draw[dash pattern=on 2pt off 3pt on 4pt off 4pt](6,.2) -- (6,3.2);
\node at (2,2.9) [fill=blue!50,draw,circle, drop shadow,
label=left:\tiny{$x_k^b$}] (n1){};
\node at (2,1.9) [fill=red!50,draw,circle, drop shadow,label=left:\tiny{$x_k^a$}] (n2){};
\node at (2,.9) [fill=green!50,draw,circle, drop shadow,label=left:\tiny{$y_k^o$}] (n3){};
\node at (4,1.9) [fill=red!50,draw,circle, drop shadow,label=left:\tiny{$x_{k+1}^a$}] (n4){};
\node at (4,2.9) [fill=blue!50,draw,circle, drop shadow,label=left:\tiny{$x_{k+1}^b$}] (n5){};
\node at (4,.9) [fill=green!50,draw,circle, drop shadow,label=left:\tiny{$y_{k+1}^o$}] (n6){};
\node at (6,1.9) [fill=red!50,draw,circle, drop shadow,label=left:\tiny{$x_{k+2}^a$}] (n7){};
\node at (6,.9) [fill=green!50,draw,circle, drop shadow,label=left:\tiny{$y_{k+2}^o$}] (n8){};
\node at (6,2.9) [fill=blue!50,draw,circle, drop shadow,label=left:\tiny{$x_{k+2}^b$}] (n9){};
\node at (8,2.9) [fill=blue!20, anchor=base](n10){$\cdots$};
\draw[->, line width=1pt] (1,0.2) -- (10,0.2) node[right]{time};
\node at (2,0.2)[below]{\tiny{$k$}};
\node at (4,0.2)[below]{\tiny{$k+1$}};
\node at (6,0.2)[below]{\tiny{$k+2$}};
\path[->] (n2) edge [bend right] (n5);
\path[->] (n4) edge [bend right] (n9);
\path[->] (n7) edge [bend right] (n10);
\end{tikzpicture}
\begin{itemize}
\item \tikz\node [fill=blue!50,draw,circle]{}; Background state $x^b$
\item \tikz\node [fill=red!50,draw,circle]{}; Analysis state $x^a$
\item \tikz\node [fill=green!50,draw,circle]{}; Observation $y^o$
\end{itemize}

Feb 12, 2010

Install TeXLive on Ubuntu

To install texlive2008

1. sudo mount -o loop textlive2008-20080822.iso /mnt
2. cd /mnt
3. ./install-tl -gui

If there is an error like "Cannot load Tk, maybe something is missing" or "perl/Tk unusable, cannot create main windows.",
follow its suggestion and visit http://tug.org/texlive/distro.html#perltk

4. sudo apt-get install perl-tk
5. sudo ./install-tl -gui
Make wise choices: keep the default dir /usr/local/texlive/, and click Install TeX Live.
Download TexMaker and install it. In its configuration, add the correct path for each executable,
such as /usr/local/texlive/2008/bin/i386-linux/latex, so those programs can be found.

6. sudo vi ~/.bashrc  and/or sudo vi /root/.bashrc
Add one line at last
export PATH=$PATH:/usr/local/texlive/2008/bin/i386-linux

To launch the tlmgr, 
7. sudo su
8. tlmgr -gui
If the path is not set in the bashrc, there is an error when you launch tlmgr, even if you execute it from its directory. The errors are like:
Can't exec "kpsewhich"
Can't locate TeXLive/TLPOBJ.pm in @INC
So just add the path of the TeX Live installation directory.

It is better to install texlive somewhere that does not require root, e.g. /home/yourname/texlive. But step 7 solves this problem.

Feb 7, 2010

mpdtrace and using multiple nodes to run mpi

In a cluster where the user may need to launch mpd manually, here are descriptions of what to do. You may need to do so when you launch an executable on multiple nodes but only the local node is used, or when you see errors like MPI connection or communication errors. That is the time to check whether all the nodes listed in the hosts file are able to communicate with each other. The error looks like: mpiexec: unable to start all procs; may have invalid machine names remaining specified hosts.

The following description works for MPICH2, using mpiexec or mpirun. That is how I tested.

On the node where you launch the program, type
>> mpdtrace -l
It gives you  <node name>_<port>(IP)
blade50_51094 (IP **)
blade49_56382 (IP **)
blade47_35763 (IP **)
blade48_49526 (IP **)
blade51_53029 (IP **) 

If not all nodes are listed here (for example, blade46 is in the hosts file and is available), ssh to blade46 and type
>>mpd -h blade50 -p 51094 &
If you want to start more mpds
>>mpd -h blade50 -p 51094 -n &
Then the blade46 can be used.

To clean up mpd daemon, use mpdcleanup

Besides, if you want to launch m consecutive ranks on the same node, use mpd --ncpus=m
For example:
mpd --ncpus=2 &
mpd --ncpus=2 -h blade50 -p 51094 &

If an mpd is started with the --ncpus option, then when it is its turn to start a process, it will start several application processes rather than just one before handing off the task of starting more processes to the next mpd in the ring. For example, if the mpd is started with
mpd --ncpus=4
then it will start as many as four application processes, with consecutive ranks, when it is its turn to start processes. This option is for use in clusters of SMP's, when the user would like consecutive ranks to appear on the same machine. (In the default case, the same number of processes might well run on the machine, but their ranks would be different.) (A feature of the --ncpus=[n] argument is that it has the above effect only until all of the mpd's have started n processes at a time once; afterwards each mpd starts one process at a time. This is in order to balance the number of processes per machine to the extent possible.)

Feb 5, 2010

Animate image files

With a sequence of image files, we can encode them into a video file. Suppose they are named beginning from 1; use

ffmpeg -qscale 1 -r 20 -b 96000 -i %08d.png animate.mp4

If the file names do not start from 1, we can use a script to rename them in batch.


d=1
for fname in *.png
do
  mv $fname `printf "%08d.png" $d`
  d=$((d+1))
done

To enable mp3 etc.
>>sudo apt-get install ffmpeg libavcodec-extra-52
One example of extracting audio from mp4:
>>ffmpeg -ss 00:05:00:00 -t 00:02:00:00 -i input.mp4 -acodec libmp3lame -ab 128k output.mp3
ss: time offset from beginning of input in hh:mm:ss:frames.
t: duration of encode
ab: audio bitrate

Jan 16, 2010

GotoBlas and Lapack_wrapper

Gotoblas + lapack + lapack_wrapper
Rod Heylen has a good lapack wrapper, which can be used with ATLAS.
Find it here (http://itf.fys.kuleuven.be/~rob/computer/lapack_wrapper/index.html)
However, ATLAS is very hard and time-consuming (>6hr for me) to
install, while GotoBlas is said to perform better than ATLAS
and is easy to install.
I replaced ATLAS with GotoBlas and compiled the lapack_wrapper
examples successfully, on Ubuntu 9.10, 32-bit Intel Core2 Duo 2.2GHz,
gcc 4.4. Of course, we need to change lapack_wrapper a little,
because some function and variable declarations are
incompatible between ATLAS and GotoBlas.
Download GotoBLAS2-1.10.tar.gz.
Untar it to a directory, i.e. /home/shiming/GOTO
>>tar xvfz GotoBLAS2-1.10.tar.gz
cd to that directory
>> make
>>sudo cp libgoto2_* /usr/lib
>>sudo mkdir /usr/local/include/goto
>>sudo cp cblas.h /usr/local/include/goto
Download CLAPACK3.1.1, untar it, and copy CLAPACK-3.1.1/INCLUDE/f2c.h
and CLAPACK-3.1.1/INCLUDE/clapack.h to /usr/local/include/goto
>>cd /usr/local/include/goto
>>sudo chmod 755 *
>>cd /usr/lib
>>sudo ln -s libgoto2_*.a libgoto.a
>>sudo ln -s libgoto2_*.so libgoto.so
Change lapack_wrapper.c a little. I do not list the changes here; to see
the difference, just download the original one and use 'diff'.
The changed version is available here

Also edit the cblas.h by adding one line
#define blasint int
And edit the f2c.h by changing its 10th line to
typedef int integer;

To compile, use
>>gcc -o lapack_example lapack_example.c lapack_wrapper.c -lm -lgoto -lgfortran
Everything works well.

On a machine where I have no root rights, running the code is a little different.
For example, a PPC blade, RHEL, 64-bit, gcc 4.1.2: just compile GotoBlas and copy the necessary header files and lib files to somewhere like /home/shiming1/goto, and chmod 755 *.

In the directory where lapack_example.c exists, compile with
>> gcc -o test lapack_example.c lapack_wrapper.c -lm -L../soc/goto -lgoto2 -lgfortran
Run it with

Here is a good article about using dynamic libraries.
One example with cblas_dgemv. Suppose matrix A is a 4 by 3 double array, stored as a 'cvec' (row major):
A=[1 3 2; 6 4 1; 2 8 7; 3 4 5]
x1=[1; 1; 1]
x2=[1; 1; 1; 1]
For y = A x1 (M=4 rows, N=3 columns, lda=3 for row major):

cblas_dgemv(CblasRowMajor, CblasNoTrans, 4, 3, 1.0, A, 3, x1, 1, 0.0, y, 1);
For y = A'x2 (M and N still describe A itself):
cblas_dgemv(CblasRowMajor, CblasTrans, 4, 3, 1.0, A, 3, x2, 1, 0.0, y, 1);

Jan 6, 2010

A Matlab file saving format

Let me take this as a good start.
There is one special requirement for saving Matlab vectors into a file with a certain format: the updated vector vec in each step should be saved like
[1, 2, 3, 4, 5, 6]
[2, 3, 4, 5, 6, 7]
which means each vector is enclosed in [ and ], with entries separated by commas and no comma after the last entry.
There could be more special requirements for the file format. Combinations of other commands, such as "dlmwrite" and "csvwrite", can accomplish more complicated tasks.

A Matlab code snippet is:

fid = fopen('Hdata.dat','wb');
% Inside a loop body
% vec is updated
fprintf(fid, '[');
fprintf(fid,'%12.8f,', vec); % add a comma after each entry
fseek(fid,-1,0); % move the file pointer back one byte from the current position
fprintf(fid, ']\n'); % cover the last comma with a ]
% end of a loop body