I got an email from Intel yesterday announcing what amounts to 256-bit SSE instructions. They call it AVX, for Advanced Vector eXtensions. I presume this is going to eventually rely on the forthcoming technology currently code-named Larrabee, which seeks to push the CPU and GPU closer together and make both addressable with everyday tools, rather than having to write DX shaders or use some special C++-like language (e.g. Brook or CUDA). A little sleuthing turns up some hack's report on the spring IDF, which mentions that AVX will be a post-Nehalem (Sandy Bridge) CPU feature. However, I suspect we'll see more of a CPU-GPGPU blend once Nehalem is out.
The only thing I can really get my hands on right now is a marketing site: http://softwareprojects.intel.com/avx/, which has some interesting PDFs about how you can use these extensions, but no engineering samples to give away. Why not?
AVX should make graphics more fun, since we can pack 4 doubles into a single 256-bit register and do a matrix multiply on them. Since 4 doubles make a 3D coordinate in projective space (homogeneous x, y, z, w), highly accurate spatial processing should get a lot faster.
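To make the "4 doubles in a register" idea concrete, here is a rough scalar sketch in plain C of how a 4x4 matrix times a homogeneous point breaks down into four broadcast-multiply-add steps, each of which would map onto one 256-bit lane-wise operation. This is not AVX code: the `vec4d` type and the `madd_broadcast` and `mat4_mul_vec4` names are made up for illustration, the inner loop stands in for what the hardware would do across all four lanes at once, and I'm assuming the matrix is stored as four columns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: four doubles, standing in for one 256-bit register. */
typedef struct { double lane[4]; } vec4d;

/* Lane-wise multiply-add: acc + col * s, with the scalar s broadcast
 * to every lane. The loop models a single 4-wide vector operation. */
static vec4d madd_broadcast(vec4d acc, vec4d col, double s)
{
    for (size_t i = 0; i < 4; ++i)
        acc.lane[i] += col.lane[i] * s;
    return acc;
}

/* Transform the homogeneous point p by a 4x4 matrix given as four
 * columns (assumed layout): out = sum over c of cols[c] * p[c]. */
static vec4d mat4_mul_vec4(const vec4d cols[4], vec4d p)
{
    assert(cols != NULL);
    vec4d out = {{0.0, 0.0, 0.0, 0.0}};
    for (size_t c = 0; c < 4; ++c)
        out = madd_broadcast(out, cols[c], p.lane[c]);
    return out;
}
```

With the matrix kept as columns, the whole transform is four broadcasts and four multiply-adds, with no horizontal shuffling inside a register, which is why the column layout tends to suit wide vector units.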
On a related note, I hope to get a Penryn soon, and will finally be able to use the SSE4.1 dot product instructions (DPPS for floats, DPPD for doubles) on some vertex data. Projections should get a lot faster... we'll see!
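For anyone wondering what the SSE4.1 packed dot product actually computes, here is a small scalar model of the single-precision form (DPPS, exposed as the `_mm_dp_ps` intrinsic in `<smmintrin.h>`). The `dpps_model` name is my own and this is only a sketch of the semantics as I understand them: the high four bits of the immediate pick which lane products go into the sum, and the low four bits pick which output lanes receive that sum (the rest are zeroed). The real instruction may add the products in a different order, so the rounding could differ slightly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of the DPPS dot product.
 * imm8 bits 4..7: include the product a[i]*b[i] in the sum if bit (4+i) is set.
 * imm8 bits 0..3: write the sum to dst[i] if bit i is set, otherwise 0. */
static void dpps_model(const float a[4], const float b[4],
                       unsigned imm8, float dst[4])
{
    assert(a != NULL && b != NULL && dst != NULL);

    float sum = 0.0f;
    for (size_t i = 0; i < 4; ++i)
        if (imm8 & (1u << (4 + i)))
            sum += a[i] * b[i];

    for (size_t i = 0; i < 4; ++i)
        dst[i] = (imm8 & (1u << i)) ? sum : 0.0f;
}
```

For vertex work, a mask of `0x71` would give a 3-component dot product (ignoring w) in lane 0 only, while `0xFF` gives the full 4-component dot product copied to every lane.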