Yesterday I tried to figure out what a proper, scalable vector/array/matrix math API would look like. Like OpenGL, its job would be to abstract the underlying hardware and provide asynchronous math operations on data sets ranging from small to huge.
Now, I just found out that Apple and Khronos are designing/finalizing OpenCL, which might be just that API, but so far I have not been able to find any real definition of its scope or philosophy other than “utilizing the GPU”.
To me that sounds a bit narrow-minded, given that you may very well want to use such an API to abstract anything from single-core CPUs to networked clusters of PS3s. I also see great potential for CPU manufacturers (AMD, Intel, IBM) to include custom on-chip multi-core co-processors, perhaps a mix between a GPU and the vector cores found in the Cell chip, that can be used via the OpenCL API.