Saturday, September 16, 2006

Data Analysis Programming Languages

My specialty is data-crunching software: the nitty-gritty work involved in data analysis, such as preparing datasets for analysis and transforming and cleaning up dirty data. This is the stuff that is a lot more time-consuming than running the actual statistical analyses themselves.

There's an interesting comparison of SAS vs. R on the internet, but it misses the fact that data analysis is actually two things:

data analysis = data crunching + statistical analysis

Statistical analysis means running the actual statistical tests and using a statistician's expertise to decide what type of mathematical analysis is appropriate. Of course it does involve some coding, but much of the work is done with pen and paper, or by just plain thinking (what are the strengths and possible risks of this statistical model, and so on).

Data crunching, by comparison, is grunt work. It involves far more time spent writing and maintaining code. It means transforming, merging, integrating, conforming, and cleaning data. The data can be dirty and come in varied formats, and the transformations required can be complex. It can be extraordinarily time-consuming.
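
To give a feel for what this grunt work looks like, here is a minimal sketch in Python of two typical chores: cleaning up messy fields and merging two datasets. Everything in it is made up for illustration. The field names (`id`, `name`, `joined`, `amount`), the helper functions, the sample records, and the two accepted date formats are all my assumptions, not taken from any real dataset or tool.

```python
from datetime import datetime


def clean_name(raw):
    """Collapse runs of whitespace and normalize capitalization."""
    return " ".join(raw.split()).title()


def parse_date(raw):
    """Try a couple of (assumed) date formats; return an ISO date string or None."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable dates are flagged as missing, not guessed


def merge_on_id(customers, orders):
    """Attach each customer's order amounts by 'id' (customers with no orders get [])."""
    amounts_by_id = {}
    for order in orders:
        amounts_by_id.setdefault(order["id"], []).append(order["amount"])
    return [dict(c, amounts=amounts_by_id.get(c["id"], [])) for c in customers]


# Hypothetical dirty input: inconsistent spacing, casing, and date formats.
raw_customers = [
    {"id": 1, "name": "  alice   SMITH ", "joined": "09/16/2006"},
    {"id": 2, "name": "bob jones", "joined": "2006-09-01"},
    {"id": 3, "name": "Carol  Lee", "joined": "not recorded"},
]
raw_orders = [
    {"id": 1, "amount": 20.0},
    {"id": 1, "amount": 5.5},
    {"id": 3, "amount": 12.0},
]

customers = [
    {"id": r["id"], "name": clean_name(r["name"]), "joined": parse_date(r["joined"])}
    for r in raw_customers
]
merged = merge_on_id(customers, raw_orders)

for row in merged:
    print(row)
```

Even in a toy case like this, most of the code handles formats and edge cases rather than doing any statistics, which is exactly why the crunching side eats up so much time.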

Coming up with innovations that reduce the time needed for data-crunching tasks is my specialty. Just check out my free download, my new data-crunching programming language.
