## Oct 18, 2010

### Add blank page in latex without page number

To add blank pages / empty pages in LaTeX without page numbers, we can do this:

\documentclass[11pt, a4paper]{article}
\usepackage{fancyhdr}
\begin{document}
\newpage
%---------------------
\fancyhf{}
\thispagestyle{empty}
\newpage
\mbox{}
\newpage
%---------------------
\setcounter{page}{2} % This resets the page number counter.
\newpage
This is page two after one blank page without numbering.
\end{document}

Of course, there are also ways to put words like "This page is intentionally left blank." on the blank page.
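One common way is to fill the empty page with a centered line (a sketch; the spacing and centering choices here are just one option):

```latex
\newpage
\thispagestyle{empty}   % suppress header/footer on this page
\vspace*{\fill}         % push the text to the vertical center
\begin{center}
This page is intentionally left blank.
\end{center}
\vspace*{\fill}
\newpage
```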


## Sep 5, 2010

### Use GDL to run IDL code and parse GOES AOD binary file

IDL (Interactive Data Language) has a free replacement named GDL (GNU Data Language).
The NASA GASP GOES satellite product provides AOD data in .aod files that need IDL to parse. Its idl_read code gives some hints about the data format of that binary file.

It is still painful to parse with C code, so GDL is used in place of IDL, which requires a non-free licence.
Install all required packages.

After that, download CMSVLIB and put the untarred files into <gdl>/share/gnudatalanguage/lib (this is needed for running 'save' in GDL).

Add GDL's bin path to .bashrc.
Possibly, the program needs to be launched by running:
LD_LIBRARY_PATH=../../plplot/lib:$LD_LIBRARY_PATH gdl
which provides plplot's lib path.

Besides, for the .aod binary file, its format is guessed to be 10 byte arrays of size 2128 x 880. I wrote a C function to parse it as well, which is a translation of the IDL reader from Appendix A in http://www.ssd.noaa.gov/PS/FIRE/GASP/20090107_GASP_Algorithm_Updates.doc The C function is also a demo; it is not fully optimized or prettified. But notice that the archived data may be of size 17000000 bytes, from before GOES-EAST was switched from GOES-12 to GOES-13. In that case, the array dimension is 2000x850, and there are other associated lat/lon .dat files.

The key points in this function are:
1. there are 10 byte arrays
2. each is of size 2128 x 880 = 1872640 bytes
3. the variables can be declared as uint8_t
4. Matlab can also do this, by
   fid=fopen('2010244171519_i18_US.all.aod')
   baod=fread(fid, [2128,880],'uint8'); % similar for the remaining 9 arrays
5. in C, each array's elements are stored column major, so a loop with one index from 0 to 1872640 walks each column first.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main()
{
    int i = 0;
    FILE *fp;
    long fsize;
    uint8_t *baod, *bmsk, *bcls, *baodstd, *bsfc;
    uint8_t *bch1, *bmos, *bcld, *bsig, *bsca;

    fp = fopen("2010244171519_i18_US.all.aod", "rb");
    fseek(fp, 0, SEEK_END);
    fsize = ftell(fp);
    rewind(fp);
    long arrsize = fsize / 10;

    baod    = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bmsk    = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bcls    = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    baodstd = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bsfc    = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bch1    = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bmos    = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bcld    = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bsig    = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);
    bsca    = (uint8_t *)malloc(sizeof(uint8_t) * arrsize);

    fread(baod, 1, arrsize, fp);
    fread(bmsk, 1, arrsize, fp);
    fread(bcls, 1, arrsize, fp);
    fread(baodstd, 1, arrsize, fp);
    fread(bsfc, 1, arrsize, fp);
    fread(bch1, 1, arrsize, fp);
    fread(bmos, 1, arrsize, fp);
    fread(bcld, 1, arrsize, fp);
    fread(bsig, 1, arrsize, fp);
    fread(bsca, 1, arrsize, fp);
    fclose(fp);

    /* Convert the byte arrays to physical values. */
    float *faod    = (float *)malloc(sizeof(float) * arrsize);
    float *faodstd = (float *)malloc(sizeof(float) * arrsize);
    float *fsfc    = (float *)malloc(sizeof(float) * arrsize);
    float *fch1    = (float *)malloc(sizeof(float) * arrsize);
    float *fmos    = (float *)malloc(sizeof(float) * arrsize);
    float *fsig    = (float *)malloc(sizeof(float) * arrsize);

    for (i = 0; i < arrsize; i++) {
        faod[i]    = (float)baod[i] / 100.0 - 0.5;
        faodstd[i] = (float)baodstd[i] / 100.0;
        fsfc[i]    = (float)bsfc[i] / 500.0 - 0.1;
        fch1[i]    = (float)bch1[i] / 600.0;
        fmos[i]    = (float)bmos[i] / 600.0;
        fsig[i]    = (float)bsig[i] / 250.0 - 0.5;
    }

    /* Dump the raw byte values. */
    fp = fopen("test0.txt", "w");
    for (i = 0; i < arrsize; i++) {
        fprintf(fp, "%d %d %d %d %d %d %d %d %d %d %d\n",
                i, baod[i], bmsk[i], bcls[i], baodstd[i], bsfc[i],
                bch1[i], bmos[i], bcld[i], bsig[i], bsca[i]);
    }
    fclose(fp);

    /* Quality screening: mark bad retrievals as missing. */
    for (i = 0; i < arrsize; i++) {
        if ((faodstd[i] >= 0.3) || (fsig[i] <= 0.01) ||
            (fsfc[i] >= 0.15) || (fsfc[i] <= 0.005) ||
            (bcls[i] <= 15)   || (faod[i] >= 10.0)  ||
            (fch1[i] <= 0.0)  || (bcld[i] != 1)     ||
            (bsca[i] <= 70)   || (bsca[i] >= 170)) {
            faod[i] = -9999.0;
        }
    }

    fp = fopen("test.txt", "w");
    for (i = 0; i < arrsize; i++) {
        fprintf(fp, "%d %f\n", i, faod[i]);
    }
    fclose(fp);

    free(baod); free(bmsk); free(bcls); free(baodstd); free(bsfc);
    free(bch1); free(bmos); free(bcld); free(bsig); free(bsca);
    free(faod); free(faodstd); free(fsfc); free(fch1); free(fmos); free(fsig);
    return 0;
}

## Jun 14, 2010

### Use SVN and Google Code host

A step-by-step tutorial for hosting a project on Google Code with SVN command lines. svn needs to be installed on the Linux machine, and you need a project page on Google Code.

Example: you have one working project on your local machine at <dir>/myproject, and a project page on Google Code at https://aproject.googlecode.com

1. In xterm, go to <dir>
>>svn import myproject/ https://aproject.googlecode.com/svn/trunk/ --username yourgoogleaccount -m "Initial import"
2. At the Google Code "source" page, find the password.
3. After this initial import, remove the myproject/ contents on your local machine, then in xterm:
>>svn checkout https://aproject.googlecode.com/svn/trunk/ myproject --username yourgoogleaccount
This attaches svn info to the directory and its files.
4. After any modification of files in this project, cd to the directory and commit:
>>cd <dir>/myproject
>>svn commit -m "any messages"
5. To delete one file:
>>svn rm file1
>>svn commit -m "any messages"
The same goes for "mv", "mkdir", etc. Everything will be OK.

This is a reference

## Apr 16, 2010

### Parallel code performance study

I have a parallel code using LETKF for data assimilation, with the shallow water equation as the model problem.
It runs on Intel Nehalem blades: 1.5GHz CPUs, 8 cores sharing a 4MB L3 cache, ~24 GB memory, 10G Ethernet. A test case: MPICH2-1.3, a 1024x1024 size problem, 16 processes on 8 nodes, 2 processes per node, bound to cores 0 and 1. Several profiling tools are used.

----------------------------------------
1. Timing with MPI_Wtime()

LETKF time at timestep 48 = 53.640148 comm = 0.001367
LETKF time at timestep 98 = 53.694284 comm = 0.002735
LETKF time at timestep 148 = 53.624503 comm = 0.003942
Finished analysis at 481.756978 with 16 processes
Total LETKF iter Time taken = 160.958935 :: Average LETKF Time taken = 53.652978
Total comm Time for LETKF taken = 0.003942 :: Average comm time taken = 0.001314
Total LETKF time taken = 160.962877 :: Average LETKF time taken = 53.654292
Total Model Time taken = 78.915314 :: Average Model Time taken = 0.526102
Total IO Time taken = 230.198382 :: Average IO Time taken = 1.534656

----------------------------------------
2. Timing with gprof

To do that, compile and link with -g -pg. Run the code; a file named 'gmon.out' is generated. Post-process it with 'gprof <my program name> gmon.out'.

Flat profile (each sample counts as 0.01 seconds; the calls and Ts/call columns are blank here):

  %   cumulative   self
 time   seconds   seconds   name
42.72    104.10    104.10   MPIDI_CH3I_Progress
27.01    169.91     65.81   single_time_step
18.78    215.66     45.75   MPID_nem_tcp_connpoll
 4.60    226.87     11.21   MPID_nem_vc_terminate
 2.15    232.10      5.23   MPID_nem_network_poll
 2.13    237.29      5.19   MPID_nem_tcp_sm_init
 1.00    239.72      2.43   point_associates
 0.41    240.73      1.01   matrix_sqrt_invert_once
 0.36    241.61      0.88   mat2fvec
 0.20    242.09      0.48   vector_scale
 0.16    242.49      0.40   eigen_decomposition
 0.13    242.80      0.31   matrix_transpose
 0.10    243.04      0.24   matrix_invert
 0.06    243.19      0.15   initial_vector
 0.05    243.31      0.12   letkf_mod

% time: the percentage of the total running time of the program used by this function.
cumulative seconds: a running sum of the number of seconds accounted for by this function and those listed above it.
self seconds: the number of seconds accounted for by this function alone. This is the major sort for this listing.
calls: the number of times this function was invoked, if this function is profiled, else blank.
self ms/call: the average number of milliseconds spent in this function per call, if this function is profiled, else blank.
total ms/call: the average number of milliseconds spent in this function and its descendents per call, if this function is profiled, else blank.
name: the name of the function. This is the minor sort for this listing. The index shows the location of the function in the gprof listing. If the index is in parenthesis it shows where it would appear in the gprof listing if it were to be printed.

----------------------------------------
3. MPI performance monitoring with IPM

The profile is stored in an XML file and can be converted into HTML for a graphic view.

##IPMv0.982#####################################################################
#
# command : unknown (running)
# host    : intel01            mpi_tasks : 16 on 8 nodes
# start   : 04/16/10/09:02:57  wallclock : 479.626000 sec
# stop    : 04/16/10/09:10:57  %comm     : 62.54
# gbytes  : 0.00000e+00 total  gflop/sec : -3.33593e-02 total
#
#               [total]      <avg>        min          max
# wallclock     7672.59      479.537      479.517      479.626
# user          6970.61      435.663      323.966      452.923
# system        556.12       34.7575      9.68         47.97
# mpi           4799.27      299.955      23.8315      348.25
# %comm                      62.5393      4.96982      72.6102
# gflop/sec     -0.0333593   -0.00208496  -0.00208496  -0.00208496
# gbytes        0            0            0            0
#
#               [time]       [calls]      <%mpi>       <%wall>
# MPI_Barrier   4196.35      2400         87.44        54.68
# MPI_Sendrecv  560.118      821280       11.67        7.30
# MPI_Gather    42.7768      2400         0.89         0.56
###############################################################################
# region : ipm_noregion      [ntasks] = 16
#
#               [total]      <avg>        min          max
# entries       32           2            2            2
# wallclock     5.93667e+07  3.71042e+06  384240       4.8597e+06
# user          6970.61      435.663      323.97       452.92
# system        556.131      34.7582      9.6795       47.969
# mpi           4799.27      299.954      23.831       348.25
# %comm                      0.00808412   0.00615256   0.0896497
#
#               [time]       [calls]      <%mpi>       <%wall>
# MPI_Barrier   4196.35      2400         87.44        0.01
# MPI_Sendrecv  560.118      821280       11.67        0.00
# MPI_Gather    42.7768      2400         0.89         0.00

----------------------------------------
4. CPU, memory and process instrumentation with an unknown lib procmon.a

System load avg. = 1.99
utime + stime = 478.955
involuntary context switches = 25741
voluntary context switches = 62
CMD                      THCNT  PID   TID   %CPU  %MEM  TIME      SZ       PSR
./main 1022 1022 tests/  1      5995  5995  99.3  2.8   00:07:57  1087732  0
*** READING /proc/5995/statm ***
total_program_size, memorykb, shared_pages, code_pages, data_pages, lib_pages, dirty_pages
284696 253338 1054 306 0 271934 0

----------------------------------------
5. Cache instrumentation with PinTool

Not sure whether this works well for parallel code.

ITLB:
Load Hits: 3017757  Load Misses: 1699  Load Accesses: 3019456  Load Miss Rate: 0.06%
Store Hits: 0  Store Misses: 0  Store Accesses: 0  Store Miss Rate: nan%
Total Hits: 3017757  Total Misses: 1699  Total Accesses: 3019456  Total Miss Rate: 0.06%
Flushes: 0  Stat Resets: 0

DTLB:
Load Hits: 1213480  Load Misses: 1020  Load Accesses: 1214500  Load Miss Rate: 0.08%
Store Hits: 0  Store Misses: 0  Store Accesses: 0  Store Miss Rate: nan%
Total Hits: 1213480  Total Misses: 1020  Total Accesses: 1214500  Total Miss Rate: 0.08%
Flushes: 0  Stat Resets: 0

L1 Instruction Cache:
Load Hits: 3016101  Load Misses: 3355  Load Accesses: 3019456  Load Miss Rate: 0.11%
Store Hits: 0  Store Misses: 0  Store Accesses: 0  Store Miss Rate: nan%
Total Hits: 3016101  Total Misses: 3355  Total Accesses: 3019456  Total Miss Rate: 0.11%
Flushes: 0  Stat Resets: 0

L1 Data Cache:
Load Hits: 692391  Load Misses: 9721  Load Accesses: 702112  Load Miss Rate: 1.38%
Store Hits: 482670  Store Misses: 29718  Store Accesses: 512388  Store Miss Rate: 5.80%
Total Hits: 1175061  Total Misses: 39439  Total Accesses: 1214500  Total Miss Rate: 3.25%
Flushes: 0  Stat Resets: 0

L2 Unified Cache:
Load Hits: 9292  Load Misses: 3784  Load Accesses: 13076  Load Miss Rate: 28.94%
Store Hits: 28007  Store Misses: 1711  Store Accesses: 29718  Store Miss Rate: 5.76%
Total Hits: 37299  Total Misses: 5495  Total Accesses: 42794  Total Miss Rate: 12.84%
Flushes: 0  Stat Resets: 0

L3 Unified Cache:
Load Hits: 315  Load Misses: 3469  Load Accesses: 3784  Load Miss Rate: 91.68%
Store Hits: 0  Store Misses: 1711  Store Accesses: 1711  Store Miss Rate: 100.00%
Total Hits: 315  Total Misses: 5180  Total Accesses: 5495  Total Miss Rate: 94.27%
Flushes: 0  Stat Resets: 0

## Mar 26, 2010

### Install MPICH2-1.3a1 for CPU affinity

I tested MPICH2-1.3a1, which uses hydra as the default process manager. The test environment is the IBM H22 Intel Nehalem blades.
>>./configure --prefix=/home/mydir/mpich2-1.3a
>>make
>>make install
No special configuration option is required. (In 1.2.1, we needed --with-pm=hydra.)
Set up the 'myhost' file as
intel01:1 binding=user:0
intel02:1 binding=user:0
intel03:1 binding=user:0
intel04:1 binding=user:0
>>LD_LIBRARY_PATH=../socIntel/goto:$LD_LIBRARY_PATH mpiexec -f myhost -n 4 ./main 62 62 tests/
I have to say that there is no process migration among CPUs. However, I cannot say this installation really has CPU affinity, because when I use -binding user:2,4, processes are not really bound to cpus 2 and 4. Even if I use intel01:4 binding=user:4,5,6,7, I see cpus 0,1,2,3 are busy.

Nevertheless, this is the best result I can get from the Bluegrit. On it, OpenMPI can do CPU affinity only on one node, because of the TCP firewall. Besides, MVAPICH2 cannot really support CPU affinity, since there is no IB, iWARP, etc. Last, early versions of MPICH do not support core binding. It is really hard to get core mapping as a non-root user. I don't know why the admins are reluctant to install these for the users. I wasted a lot of time on that!

### Install MVAPICH with HWLOC as a non-root

Here is a description of installing MVAPICH with hwloc as a non-root user.
With hwloc, we can make MVAPICH support CPU affinity.
>>mkdir hwloc-0.9.3
>>mkdir mvapich-1.4.1
Untar the sources and configure each with a --prefix under your home directory, then:
>>cd hwloc-0.9.3
>>make
>>make install
>>cd ../mvapich-1.4.1
>>make
>>make install
After that, add the mvapich-1.4.1 path to .bashrc.

## Feb 27, 2010

### Plot data file with PGFPlot

Tikz and pgfplot can plot data files in TeX.

\begin{figure}
\begin{center}
\begin{tikzpicture}[scale=0.3]
\begin{axis}[xlabel=nstep, ylabel=WaveHeight]
\addplot[mark=none] file {case14.data};   % Here is the data file
\legend{True, Predicted};
\end{axis}
\end{tikzpicture}
\begin{tikzpicture}[scale=0.3]
\begin{axis}[xlabel=nstep, ylabel=error(2-norm)]
\end{axis}
\end{tikzpicture}
\end{center}
\caption{(a)True and predicted wave height at the domain center; (b)2-norm of numerical error on the whole domain}
\end{figure}

The data file's format can be multiple columns separated by spaces, one column per variable. Such as
# data file
0.00001 2.123243
0.00003 3.123452
......
Matlab's fprintf function may sometimes help to generate such a data file.
For example:
fid = fopen('case14err.data','w');
b=1/650:6.5/650:6.5;   % the x values
y=[b ; a];             % a is an existing row vector of data values
fprintf(fid,'%1.5f %1.5f\n', y);
fclose(fid);

## Feb 26, 2010

### A script to create shared lib

A good script to create shared lib

With root privileges:
1. create directories /usr/local/lib and /usr/local/include (if they do not exist)
2. add the path /usr/local/lib in file /etc/ld.so.conf (if it is not there)
3. run the script create_shared_lib.sh (text below); e.g., you have a money.c file that is prepared for creating the library "money":
$sh create_shared_lib money
4. copy your money.h to /usr/local/include
5. to use your library, include money.h in the global area of your program (#include <money.h>):
gcc -Wall -g -o my_prog.exe my_prog.c -I/usr/local/include -lmoney

Without root privileges:
1. create directories lib and include in your $HOME directory
2. change the script's line 'mv "$libname".* /usr/local/lib/' to 'mv "$libname".* ~/lib/'
3. comment out the line 'ldconfig &'
4. run the script (as explained above)
To use your library, include money.h in the global area of your program (#include <money.h>):
gcc -Wall -g -o my_prog.exe my_prog.c -I$HOME/include -L$HOME/lib -lmoney

create_shared_lib.sh

#!/bin/bash

ARG=1
if [ "$#" -ne "$ARG" ]
then
echo -e "\t\tUsage: `basename $0` name_of_shared_library"
exit $BADARG
fi

libname=lib"$1"
##########################################################################
# uncomment the next 2 lines and comment the 2 gcc lines for a C++ library #
##########################################################################
#g++ -fPIC -c "$1".cpp
#g++ -shared -Wl,-soname,"$libname".so.1 -o "$libname".so.1.0 "$1".o
gcc -fPIC -c "$1".c
gcc -shared -Wl,-soname,"$libname".so.1 -o "$libname".so.1.0 "$1".o
ln -s "$libname".so.1.0 "$libname".so.1
ln -s "$libname".so.1 "$libname".so
echo -e "\t\tSHARED LIBRARY $libname IS CREATED"
echo -e "\t\t==================================\n"
echo
mv "$libname".* /usr/local/lib/
echo -e "\t\tldconfig is working"
ldconfig &
exit 0

## Feb 13, 2010

### Node plot and position in Tikz

It is difficult to create a neat, accurate and fancy scientific figure. Tikz/pgf, pstricks and metapost are not easy to learn, nor do I have full knowledge of their capabilities. On the other hand, gimp or photoshop is not suitable for producing figures from calculated data. I hope there will be a very handy tool that can help me use Tikz or metapost easily. I tested several Java-based front ends, but none of them would be my choice.

Here is an example of plotting nodes and pointing them to each other with arrows, with position determination. It is mainly about '\node', '\draw' and their decoration. It is easy to understand how it is produced; hence, I put no comments here.

-----------------------
\begin{tikzpicture}
\draw[dash pattern=on 2pt off 3pt on 4pt off 4pt](2,.2) -- (2,3.2);
\draw[dash pattern=on 2pt off 3pt on 4pt off 4pt](4,.2) -- (4,3.2);
\draw[dash pattern=on 2pt off 3pt on 4pt off 4pt](6,.2) -- (6,3.2);
\node at (2,2.9) [fill=blue!50,draw,circle, drop shadow, label=left:\tiny{$x_k^b$}] (n1){};
\node at (2,1.9) [fill=red!50,draw,circle, drop shadow,label=left:\tiny{$x_k^a$}] (n2){};
\node at (2,.9) [fill=green!50,draw,circle, drop shadow,label=left:\tiny{$y_k^o$}] (n3){};
\node at (4,1.9) [fill=red!50,draw,circle, drop shadow,label=left:\tiny{$x_{k+1}^a$}] (n4){};
\node at (4,2.9) [fill=blue!50,draw,circle, drop shadow,label=left:\tiny{$x_{k+1}^b$}] (n5){};
\node at (4,.9) [fill=green!50,draw,circle, drop shadow,label=left:\tiny{$y_{k+1}^o$}] (n6){};
\node at (6,1.9) [fill=red!50,draw,circle, drop shadow,label=left:\tiny{$x_{k+2}^a$}] (n7){};
\node at (6,.9) [fill=green!50,draw,circle, drop shadow,label=left:\tiny{$y_{k+2}^o$}] (n8){};
\node at (6,2.9) [fill=blue!50,draw,circle, drop shadow,label=left:\tiny{$x_{k+2}^b$}] (n9){};
\node at (8,2.9) [fill=blue!20, anchor=base](n10){$\cdots$};
\draw[->, line width=1pt] (1,0.2) -- (10,0.2) node[right]{time};
\node at (2,0.2)[below]{\tiny{$k$}};
\node at (4,0.2)[below]{\tiny{$k+1$}};
\node at (6,0.2)[below]{\tiny{$k+2$}};
\end{tikzpicture}
\begin{tikzpicture}[overlay]
\path[->] (n2) edge [bend right] (n5);
\path[->] (n4) edge [bend right] (n9);
\path[->] (n7) edge [bend right] (n10);
\end{tikzpicture}
\vspace{7mm}
\begin{itemize}
\item \tikz\node [fill=blue!50,draw,circle]{}; Background state $x^b$
\item \tikz\node [fill=red!50,draw,circle]{}; Analysis state $x^a$
\item \tikz\node [fill=green!50,draw,circle]{}; Observation $y^o$
\end{itemize}

## Feb 12, 2010

### Install TeXLive on Ubuntu

To install texlive2008:
1. sudo mount -o loop texlive2008-20080822.iso /mnt
2. cd /mnt
3. ./install-tl -gui
If there is an error like "Cannot load Tk, maybe something is missing" or "perl/Tk unusable, cannot create main windows.", follow its suggestion and visit http://tug.org/texlive/distro.html#perltk
4. sudo apt-get install perl-tk
5. sudo ./install-tl -gui
Make wise choices (default dir /usr/local/texlive/), then click Install TeX Live.
Download TexMaker and install it. In its configuration, add the correct path for each executable, such as /usr/local/texlive/2008/bin/i386-linux/latex, so those programs can be found.
6. sudo vi ~/.bashrc and/or sudo vi /root/.bashrc
Add one line at the end:
export PATH=$PATH:/usr/local/texlive/2008/bin/i386-linux

To launch the tlmgr,
7. sudo su
8. tlmgr -gui
If the path is not set in the bashrc, there are errors when you launch tlmgr, even if you execute it from its own directory. The errors look like:
Can't exec "kpsewhich"
Can't locate TeXLive/TLPOBJ.pm in @INC
So just add the path of the TeXLive installation directory.

It is better to install TeXLive somewhere that does not require root, e.g. /home/yourname/texlive. But step 7 below solves this problem.

## Feb 7, 2010

### mpdtrace and using multiple nodes to run mpi

In a cluster where the user may need to launch mpd manually, here are descriptions of what to do. One situation where you may need to do so is when you launch an executable on multiple nodes but only the local node is used, or when you see errors about MPI connection or communication, such as: "mpiexec: unable to start all procs; may have invalid machine names remaining specified hosts." That is the time to check whether all the nodes listed in the hosts file are able to communicate with each other.

The following description works for MPICH2, using mpiexec or mpirun. That is how I tested.

On the node where you launch the program, type
>> mpdtrace -l
It gives you <node name>_<port> (IP)

If not all nodes are listed here — for example, blade46 is in the hosts file and available but missing from the trace — ssh to blade46 and type
>>mpd -h blade50 -p 51094 &
where the host and port come from the mpdtrace output above. If you want to start more mpds:
>>mpd -h blade50 -p 51094 -n &
Then blade46 can be used.

To clean up mpd daemon, use mpdcleanup

Besides, if you want to launch m consecutive ranks on the same node, use mpd --ncpus=m
For example:
mpd --ncpus=2 &
or
mpd --ncpus=2 -h blade50 -p 51094 &

QUOTE:
If an mpd is started with the --ncpus option, then when it is its turn to start a process, it will start several application processes rather than just one before handing off the task of starting more processes to the next mpd in the ring. For example, if the mpd is started with
mpd --ncpus=4

then it will start as many as four application processes, with consecutive ranks, when it is its turn to start processes. This option is for use in clusters of SMP's, when the user would like consecutive ranks to appear on the same machine. (In the default case, the same number of processes might well run on the machine, but their ranks would be different.) (A feature of the --ncpus=[n] argument is that it has the above effect only until all of the mpd's have started n processes at a time once; afterwards each mpd starts one process at a time. This is in order to balance the number of processes per machine to the extent possible.)
END OF QUOTE

## Feb 5, 2010

### Animate image files

With a sequence of image files, we can code them into a video file. Suppose that they are named beginning from 1, use

ffmpeg -qscale 1 -r 20 -b 96000 -i %08d.png animate.mp4

If the file names do not start from 1, we can use a script to rename them in batch.

#!/bin/bash

d=1
for fname in *.png
do
mv "$fname" "$(printf '%08d.png' "$d")"
d=$(($d+1))
done

To enable mp3 etc.
>>sudo apt-get install ffmpeg libavcodec-extra-52
One example of extracting audio from mp4:
>>ffmpeg -ss 00:05:00:00 -t 00:02:00:00 -i input.mp4 -acodec libmp3lame -ab 128k output.mp3
ss: time offset from beginning of input in hh:mm:ss:frames.
t: duration of encode
ab: audio bitrate

## Jan 16, 2010

### GotoBlas and Lapack_wrapper

GotoBlas + lapack + lapack_wrapper
Rod Heylen has a good lapack wrapper, which can be used with ATLAS.
Find it here (http://itf.fys.kuleuven.be/~rob/computer/lapack_wrapper/index.html)
However, ATLAS is very hard and time-consuming (>6hr for me) to
install, while GotoBlas is said to have better performance than ATLAS
and is easy to install.
I tested replacing ATLAS with GotoBlas, and compiled the lapack_wrapper
examples successfully, on Ubuntu 9.10, 32bit Intel Core2Duo 2.2GHz,
gcc 4.4. Of course, we need to change lapack_wrapper a little, because
some function and variable declarations are incompatible between ATLAS
and GotoBlas.
Untar it to a directory, e.g. /home/shiming/GOTO
>>tar xvfz GotoBLAS2-1.10.tar.gz
cd to that directory
>> make
>>sudo cp libgoto2_* /usr/lib
>>sudo mkdir /usr/local/include/goto
>>sudo cp cblas.h /usr/local/include/goto
Also copy CLAPACK-3.1.1/INCLUDE/clapack.h to /usr/local/include/goto
>>cd /usr/local/include/goto
>>sudo chmod 755 *
>>cd /usr/lib
>>sudo ln -s libgoto2_*.a libgoto.a
>>sudo ln -s libgoto2_*.so libgoto.so
Change a little in lapack_wrapper.c. I do not list them here. To see
The changed version is available here

Also edit the cblas.h by adding one line
#define blasint int
And edit f2c.h by changing its 10th line to
typedef int integer;

To compile, use
>>gcc -o lapack_example lapack_example.c lapack_wrapper.c -lm -lgoto -lgfortran
>>./lapack_example
Everything works well.

On a machine where I have no root rights, running the code is a little different.
For example, on a PPC blade with RHEL, 64bit, gcc 4.1.2: just compile GotoBlas, copy the necessary header and lib files to somewhere like /home/shiming1/goto, and chmod 755 *.

In the directory where lapack_example.c exists, compile with
>> gcc -o test lapack_example.c lapack_wrapper.c -lm -L../soc/goto -lgoto2 -lgfortran
Run it with
>>LD_LIBRARY_PATH=goto/:\$LD_LIBRARY_PATH ./test

Here is a good article about using dynamic library
(http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html)
--------------------------------------------------------------------
One example with cblas_dgemv. Suppose matrix A is a 4 by 3 double array, stored as a 'cvec' (row major):
A=[1 3 2; 6 4 1; 2 8 7; 3 4 5]
x1=[1; 1; 1]
x2=[1; 1; 1; 1]

For y = A*x1 (y has length 4; with row-major storage the leading dimension is the number of columns, 3):
cblas_dgemv(CblasRowMajor, CblasNoTrans, 4, 3, 1.0, A, 3, x1, 1, 0.0, y, 1);
For y = A'*x2 (y has length 3; M and N still describe A as stored):
cblas_dgemv(CblasRowMajor, CblasTrans, 4, 3, 1.0, A, 3, x2, 1, 0.0, y, 1);

## Jan 6, 2010

### A Matlab file saving format

Let me take this as a good start.
There is one special requirement for saving Matlab vectors into a file with a certain format: the updated vector Vec in each step should be saved like
[1, 2, 3, 4, 5, 6]
[2, 3, 4, 5, 6, 7]
......
This means each vector is enclosed in [ and ], with entries separated by commas and no comma after the last entry.
There could be more special requirements for the file format. Combinations of other commands, such as "dlmwrite" and "csvwrite", can accomplish more complicated tasks.

A snippet Matlab code is:

fid = fopen('Hdata.dat','wb');
% Inside a loop body
% vec is updated
fprintf(fid, '[');
fprintf(fid,'%12.8f,', vec); % add a comma after each entry
fseek(fid,-1,0); % move the file pointer back one byte (origin 0 = current position)
fprintf(fid, ']\n'); % cover the last comma with a ]
% end of a loop body

fclose(fid);