Thursday, September 21, 2006

Innovation: Incremental and Quantum Leap

Vilno was not what I had envisioned in the late 1990s. But SAS was too proprietary and too expensive, and SQL SELECT did not have the features I wanted. Instead of taking an existing data crunching standard and building a high-level productivity explosion tool on top of it, I opted to create a new data crunching standard from scratch.

It turns out, however, that this approach has some advantages. With Vilno developed, I can now approach the innovation puzzle from two directions: incremental innovation and a quantum leap.

Incremental innovation: Vilno 0.85 already shows much of this, and the core infrastructure leaves tremendous room to grow. For example, a lot of data crunching tasks can be done with much less code if the GRIDFUNC transform can use a composite where clause. Vilno was designed from the beginning to get better and better. With just incremental innovations, in time, Vilno will leave SAS and SPSS in the dust.

Quantum leap: a tool that sits on top of Vilno and delivers a quantum explosion in productivity. Kind of like comparing Python to Fortran, if you get my drift. I know, intuitively, that this problem is largely solvable, and solve it I will.

I still have a deep familiarity with the nitty-gritty problems of biostatistical programming (and data crunching in particular), and a healthy respect for the power and limitations of mathematical models, so the odds of success are very good.

Tuesday, September 19, 2006

Oracle and Ingersoll-Rand

An interesting Wall Street Journal article the other day said that an industrial-products company, Ingersoll-Rand, spent 18 million dollars on Oracle database software five years ago. It's clear from the context that this was not for services, just for pure software.

From the viewpoint of a middle-class software developer, that is simply astonishing!
If it were total sales of software across a vertical industry, say all industrial-goods companies that are Oracle customers, then that would be one thing, but for a single customer that is truly eye-popping! I hope they're getting their money's worth.

I would love to know the size of the annual check that Pfizer, Merck, and company send to SAS Institute. Alas, I have not seen that in a newspaper. It's got to be in the millions (per customer). It doesn't really matter though, because my pitch to them relies not so much on getting the price down (important, and easy to explain) but on reversing the slowdown in technological innovation and productivity (extremely important, and more subtle than the first point).

Which leads to the question: when does it make sense for a software customer (a bank, a pharmaceutical company, or a government department) to stop playing a passive role (waiting for the salesman to come and just saying yes or no) and instead play a more active role, a deliberate venture to change the future of the software industry?

Obviously that depends on how much they spend each year on software. But it depends on other things as well.
As I've argued previously, when a certain software category is controlled by a monopoly, that can have long-term negative effects that go beyond price. In the case of statistical software for the pharmaceutical companies, the negative effects are very serious indeed, as I've argued elsewhere. And the easiest way to change the future of the statistical software industry, to make it more competitive and more innovative, is an open source buyout: financing the conversion of an already existing proprietary software product to open source status.

Saturday, September 16, 2006

Data Analysis Programming Languages

My specialty is data crunching software: the nitty-gritty work involved in data analysis, preparing datasets for analysis, transforming and cleaning up dirty data, the stuff that is a lot more time-consuming than running the actual statistical analyses themselves.

There's an interesting comparison of SAS vs R on the internet, but it kind of misses out on the fact that data analysis is actually two things:

data analysis = data crunching + statistical analysis

Statistical analysis is running the actual statistical tests, and using a statistician's expertise to decide what type of mathematical analysis is appropriate. Of course it does involve some coding, but much of the work is with pen and paper, or just plain thinking (what are the strengths and possible risks of this statistical model, etc.).

Data crunching, by comparison, is grunt work. It involves much greater amounts of time writing and maintaining code. It involves transforming, merging, integrating, conforming, and cleaning data, where the data can be dirty and in varied formats, and the transformations required can be complex. It can be extraordinarily time-consuming.
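To make the grunt work concrete, here is a minimal sketch in plain Python (not Vilno, SAS, or R) of a tiny clean, conform, and merge job. Everything in it is made up for illustration: the two datasets, the field names, the date formats, and the helper functions are all assumptions, not taken from any real study or tool.

```python
from datetime import datetime

# Hypothetical raw inputs: two sources that disagree on ID style and date format.
demographics = [
    {"subject": " 001 ", "birth_date": "1970-03-15", "sex": "m"},
    {"subject": "002", "birth_date": "15/07/1982", "sex": "F"},
    {"subject": "003", "birth_date": "", "sex": "f"},
]
labs = [
    {"subject": "1", "test": "ALT", "value": " 42 "},
    {"subject": "2", "test": "ALT", "value": "n/a"},
    {"subject": "4", "test": "ALT", "value": "37"},
]

# Assumed set of date formats seen in the raw data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def clean_id(raw):
    """Conform subject IDs to a zero-padded three-character string."""
    return raw.strip().zfill(3)


def parse_date(raw):
    """Try each known format; return None for blank or unparseable input."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_number(raw):
    """Convert a text value to float; return None when it is not numeric."""
    try:
        return float(raw.strip())
    except ValueError:
        return None


def crunch(demographics, labs):
    """Clean both sources, then left-join lab values onto demographics."""
    lab_by_subject = {
        clean_id(row["subject"]): parse_number(row["value"]) for row in labs
    }
    merged = []
    for row in demographics:
        subject_id = clean_id(row["subject"])
        merged.append({
            "subject": subject_id,
            "birth_date": parse_date(row["birth_date"]),
            "sex": row["sex"].strip().upper(),
            "alt": lab_by_subject.get(subject_id),  # None if no lab record
        })
    return merged


analysis_ready = crunch(demographics, labs)
for row in analysis_ready:
    print(row)
```

Even in this toy case, nearly all of the code deals with dirty IDs, mixed date formats, and missing values, and none of it is statistics. That is the time sink described above.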

Creating new innovations to reduce the time needed for data-crunching tasks is my specialty. Just check out my free download, my new data crunching programming language

Wednesday, September 13, 2006

new to blogging!

This is only my second blog. (the other one is at, same name).
So, got a lot to learn about the blogosphere, how to talk to other bloggers,
how to get their attention, how to create buzz.

I'm a software developer, I've created a new data crunching programming

Got a question, folks (is there anybody out there listening?). If you think
you can enlighten me on this issue, please leave a comment on my blog.

All-purpose programming languages (C, Perl, etc.) get a ton of comments
(even flamewars) on the internet. Specialized programming languages (specific
to a certain usage, such as row/column data manipulation) do not get a lot
of talk on the internet, but they can be very important for getting the job done.
(By far the most famous specialized programming language is SQL, it does just
one thing, and does it well - like Kentucky Fried Chicken, but there are many

Any ideas or thoughts, folks?

Robert (i.e. datahelper)

Tuesday, August 29, 2006

my new blog!

I have a new blog.

Let's try it out