I've just finished reading a book entitled "The Connection Machine" -- it's about a massively parallel machine built in the early 1980s, consisting of 65,536 single-bit processors packed into a cube roughly 1 m on a side. The architecture was designed by an MIT graduate student who wanted both to model intelligence at a level not yet attained and to depart from the traditional von Neumann architecture, which separates memory and processing resources into distinct areas with a very low-bandwidth connection between the two. As a result, the author notes, while a modern processor may have a huge number of gates, only a very small fraction of the gates devoted to 'processing' are active at any given time, and in the memory portion the active fraction is smaller still by many orders of magnitude: only a few bytes out of many kilobytes or megabytes are being touched at a given moment.
The Connection Machine was designed to be something quite different -- a huge number of extremely simple processors with very little memory per processor, connected by a flexible network whose topology could be configured to resemble the actual physical or structural arrangement of a problem, with individual processors representing individual neurons in a neural network, or individual particles in fluid dynamics, or cells in a cellular automaton. From what I can understand from reading the book (which was based on Hillis' dissertation), each processor is essentially a simple 1-bit Arithmetic Logic Unit, and every processor receives the same instructions (each has the option not to execute a given instruction), which are carried out on the data in its very small local memory (~4k bits). With 64k processors, one essentially has 65,536 very simple arithmetic logic units acting in parallel on a huge amount of data! To picture this in a conceptually different way, I sometimes think of it as though someone had taken a traditional array of memory and attached a very simple processor to each arbitrarily small chunk of it. As a result, a far greater number of gates are active at any given time, and a great deal of data can be processed at once.
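To make that picture a bit more concrete, here is a rough Python sketch of the idea as I understand it: many 1-bit processing elements, each with a tiny local memory, all receiving the same broadcast instruction, with a per-processor flag that lets a processor sit an instruction out. Everything in it is my own assumption for illustration (the names `Processor`, `broadcast`, `add_step`, `parallel_add`, `run_demo`, the 32-bit memory size, the memory layout); it is not the real Connection Machine instruction set, and the Python loop only stands in for what the hardware would do simultaneously.

```python
from typing import Callable, List


class Processor:
    """One hypothetical 1-bit processing element with a small local memory."""

    def __init__(self, mem_bits: int) -> None:
        self.mem = [0] * mem_bits  # local memory, one bit per cell
        self.carry = 0             # 1-bit flag, used here as the adder carry
        self.active = True         # context flag: execute or skip instructions


def store(p: Processor, base: int, width: int, value: int) -> None:
    """Write `value` into `width` bits of local memory, least significant bit first."""
    for i in range(width):
        p.mem[base + i] = (value >> i) & 1


def load(p: Processor, base: int, width: int) -> int:
    """Read `width` bits of local memory back as an integer."""
    return sum(p.mem[base + i] << i for i in range(width))


def broadcast(procs: List[Processor], op: Callable[[Processor], None]) -> None:
    """Send one instruction to every processor; inactive ones ignore it.

    Real hardware would do this in lockstep; the loop is only a simulation.
    """
    for p in procs:
        if p.active:
            op(p)


def clear_carry(p: Processor) -> None:
    p.carry = 0


def add_step(a: int, b: int, out: int) -> Callable[[Processor], None]:
    """Build a single 1-bit full-adder instruction on three memory addresses."""
    def op(p: Processor) -> None:
        total = p.mem[a] + p.mem[b] + p.carry
        p.mem[out] = total & 1
        p.carry = total >> 1
    return op


def parallel_add(procs: List[Processor], a: int, b: int, out: int, width: int) -> None:
    """Bit-serial add: `width` broadcast steps, regardless of processor count."""
    broadcast(procs, clear_carry)
    for i in range(width):
        broadcast(procs, add_step(a + i, b + i, out + i))


def run_demo() -> List[int]:
    width = 8
    procs = [Processor(mem_bits=32) for _ in range(8)]
    for i, p in enumerate(procs):
        store(p, 0, width, i)        # operand A at bits 0..7
        store(p, 8, width, 10 * i)   # operand B at bits 8..15
        p.active = (i % 2 == 0)      # odd-numbered processors opt out
    parallel_add(procs, a=0, b=8, out=16, width=width)
    return [load(p, 16, width) for p in procs]


if __name__ == "__main__":
    print(run_demo())  # [0, 0, 22, 0, 44, 0, 66, 0]
```

The point of the sketch is that an 8-bit add takes 8 broadcast steps whether there are 8 processors or 65,536, which is where the throughput would come from. The processors that opted out simply keep whatever was in their result bits (zero here).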
I am quite fascinated by this architecture, and it reminds me a lot of modern FPGA designs. It also has me thinking about the feasibility of either emulating these ideas in a cluster of microcontrollers (for instance, PICs), or building a small, interesting, fun cluster of PICs (or some other microcontroller) to sit on a desk off in a corner and chug away at problems. Does this idea fascinate anyone else? Is anyone familiar with efforts to build computing clusters out of microcontrollers, and does anyone have an idea of their conceptual design or performance? I have tried some initial googling, but haven't found anything very promising -- it seems like a relatively unexplored problem.