First of all, this implementation is not mine. The original page is here, by Jan Simon. I am keeping these notes only for future reference.
In recent data processing, a large number of date strings were parsed into a Matlab cell array. Converting a long cell array (more than a hundred thousand elements) with 'datenum()' is painfully slow, and I did not want to bother with parallel code. So I looked around and found Jan Simon's function.
First, unfortunately, our date string format differs from the cases implemented in Jan's function. Our date format is
'mm/dd/yyyy hh:mm:ss AP', where the month, day, and hour can each be one or two digits, and the last two characters are AM or PM.
Second, fortunately, Jan's code is well structured. Since I am sure that all of our strings share the same format, I could easily add a case 2 to handle our date string, even with my very limited knowledge of Matlab C (Mex) coding. I love such easy-to-read code.
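To give an idea of what parsing this format involves, here is a minimal standalone C sketch. It is not Jan Simon's code and not the actual case 2 I added; the names `read_int`, `days_from_civil`, and `parse_us_ampm` are hypothetical and chosen only for illustration. It assumes every string is well formed, uses the well-known "days from civil" day-count arithmetic for the date part, and adds 719529 (the Matlab date number of 1970-01-01) to land on the datenum scale.

```c
#include <assert.h>
#include <ctype.h>

/* Read an unsigned decimal integer (one or more digits) and
   advance the cursor past it. */
static int read_int(const char **p) {
    int v = 0;
    while (isdigit((unsigned char)**p)) {
        v = v * 10 + (**p - '0');
        (*p)++;
    }
    return v;
}

/* Days since 1970-01-01 in the proleptic Gregorian calendar
   ("days from civil" arithmetic; the year is counted from March
   so that the leap day falls at the end). */
static long days_from_civil(int y, int m, int d) {
    y -= (m <= 2);
    long era = (y >= 0 ? y : y - 399) / 400;
    long yoe = y - era * 400;                  /* year of era: 0..399 */
    long mp  = (m > 2) ? m - 3 : m + 9;        /* March = 0 ... February = 11 */
    long doy = (153 * mp + 2) / 5 + d - 1;     /* day of the March-based year */
    long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Convert 'mm/dd/yyyy hh:mm:ss AP' to a Matlab-style serial date number.
   ASSUMPTION: the input is well formed; separators are skipped blindly
   and nothing is validated, which is only safe for homogeneous data. */
double parse_us_ampm(const char *s) {
    assert(s != 0);
    int mon  = read_int(&s); s++;   /* skip '/' */
    int day  = read_int(&s); s++;   /* skip '/' */
    int year = read_int(&s); s++;   /* skip ' ' */
    int hour = read_int(&s); s++;   /* skip ':' */
    int min  = read_int(&s); s++;   /* skip ':' */
    int sec  = read_int(&s); s++;   /* skip ' ' */

    /* 12-hour clock to 24-hour clock: 12 AM is hour 0, 12 PM is hour 12. */
    hour %= 12;
    if (*s == 'P' || *s == 'p') {
        hour += 12;
    }

    /* 719529 is the Matlab date number of 1970-01-01. */
    return (double)(days_from_civil(year, mon, day) + 719529L)
         + (hour * 3600 + min * 60 + sec) / 86400.0;
}
```

Calling `parse_us_ampm("1/1/2000 12:00:00 PM")` gives 730486.5, which is noon on the day that datenum numbers 730486. In a Mex file a function like this would be called once per cell element inside the loop over the cell array.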
The result is very encouraging and exciting. For a cell array with ~260,000 elements, the adapted DateStr2Num takes 0.021095 seconds of elapsed time, while the built-in datenum takes 87.961205 seconds, a speedup of about 4170x! Amazing. The results are identical. For another cell array with ~176,000 elements, DateStr2Num takes 0.014715 seconds and datenum takes 59.845605 seconds, a speedup of about 4067x! Consistent performance!
The timings were measured with Matlab R2013b, using VC++ 2008 Express as the Mex compiler, on 64-bit Windows 7 with 8 GB of memory and an Intel Core i5-3570 @ 3.40GHz.
I don't want to belabor the principles of code optimization. Clearly, the built-in Matlab function has to handle many situations. But when you are sure that the data are homogeneous and run time is critical, it is better to strip out the handling of cases that cannot occur. The same applies to R functions.