February 8, 2012

Postcards from the Strange New World Beyond the Moore's Law Horizon


I'm sure everyone has heard of Moore's Law; first described in 1965 by Gordon Moore, it has held true for close to 50 years. But recent evidence shows that we are nearing the end of this remarkable run:

The major processor manufacturers and architectures, from Intel and AMD to Sparc and PowerPC, have run out of room with most of their traditional approaches to boosting CPU performance. Instead of driving clock speeds and straight-line instruction throughput ever higher, they are instead turning en masse to hyperthreading and multicore architectures.

But the coming end of Moore's Law is, by now, pretty old news. What's more interesting is: what lies beyond? From what I've been reading recently, the future is likely to involve significant changes, with lots of opportunity but also lots of unexpected new situations to be aware of.

One technique that has received a fair amount of attention is to use some of the other spare capacity in the typical computer: namely, the powerful graphics processors that manage your computer display. GPU capabilities are so advanced at this point that top-end supercomputers now use GPU acceleration routinely:

in the top 500 ranking of the world’s fastest supercomputers, 39 systems use GPUs as accelerators and 35 of these use Nvidia chips. The graphics processors are used in supercomputers because they can handle massively parallel tasks that high-end computing requires while using less energy than the typical CPUs.

Until recently, taking advantage of GPU power in this way was quite complex, but a recent project called OpenCL is trying to lower the barrier to using this alternate compute power in general-purpose applications. The guys over at Object Computing, Inc. have written a nice introductory paper on using OpenCL: GPU Computing with OpenCL.

The OCI paper is a great introduction to OpenCL programming, with a good overview of the library, a lot of examples of code using the library, and some benchmarking results. Interestingly, the paper presents both a case where using the available GPU power results in a five-fold speedup and a case where parallelizing a weaker algorithm did not yield the fastest overall performance; using a better sequential algorithm turned out to be superior:

The OpenCL kernel running on the CPU device was the slowest of all, but the same code on the GPU device performed the best of the bitonic sort variants. The OpenMP variation, using all CPUs, was not far behind. The fastest sort, however, was std::sort(), which demonstrates that the selection of the best algorithm may not be intuitive — for this problem size and platform, against conventional wisdom, a parallel algorithm was not faster than a sequential one.

Programming language designers are also considering the implications of widespread parallelism. For a great recent article, check out Herb Sutter's Welcome to the Jungle:

2011 was special: it’s the year that we completed the transition to parallel computing in all mainstream form factors, with the arrival of multicore tablets (e.g., iPad 2, Playbook, Kindle Fire, Nook Tablet) and smartphones (e.g., Galaxy S II, Droid X2, iPhone 4S). 2012 will see us continue to build out multicore with mainstream quad- and eight-core tablets (as Windows 8 brings a modern tablet experience to x86 as well as ARM), and the last single-core gaming console holdout will go multicore (as Nintendo’s Wii U replaces Wii).

As Sutter observes in his article, the heterogeneity of these systems is increasing rapidly:

As software developers, we will be expected to enable a single application to exploit a “jungle” of enormous numbers of cores that are increasingly different in kind (specialized for different tasks) and different in location (from local to very remote; on-die, in-box, on-premises, in-cloud).

As "Welcome to the Jungle" observes, most current programming languages aren't prepared to deal with the upcoming new world of heterogenous processors with weak memory models:

On the software side, all of the mainstream general-purpose languages and environments (C, C++, Java, .NET) have largely rejected weak memory models, and require a coherent model

For an idea of what it might be like to challenge some of these assumptions, and envision a new style of programming, have a look at the work being done by the Renaissance project underway at IBM Research, Portland State University, and Vrije University of Brussels.

Some of the ideas being considered by this team are truly startling. For example, in their paper: Harnessing emergence for manycore programming: early experience integrating ensembles, adverbs, and object-based inheritance, the authors look for solutions in natural systems:

Nature provides us with many examples of massively parallel systems. Without the benefit of any global synchronization at all, these systems manage to solve complex problems and achieve robust behavior.

And in their paper: Which Problems Does a Multi-Language Virtual Machine Need to Solve in the Multicore/Manycore Era?, the authors suggest that the implications for programming languages will be extensive:

On the language level, the support for different concurrency models needs to allow language designs to vary the given guarantees with respect to encapsulation, scheduling, and properties like immutability.

And in a recent presentation at the Dynamic Languages Symposium 2011: Everything You Know (about Parallel Programming) Is Wrong! A Wild Screed about the Future, the authors suggest that we may even find ourselves giving up on the idea of program "correctness", and instead designing software that can get the wrong answer, but can then detect that situation and repair itself. They call this "race and repair", and have written more about this idea here, saying:

To the extent that an application remains acceptable to its end users with respect to their business needs, we aim to permit – or no longer attempt to prevent – inconsistency relating to the order in which concurrent updates and queries are received and processed.

There was even an entire conference organized to discuss such things: Inconsistency Robustness 2011 and an International Society for Inconsistency Robustness. Crazy!

What I see connecting all these various efforts is a three-pronged approach to the problems posed by the end of Moore's Law and the transition to massive parallelism as the standard computing environment:

  • Library and framework projects, such as OpenCL, offer programmers tools for building parallel applications
  • Programming language designers, such as C++'s Herb Sutter, will continue to design and prototype new languages for distributed and parallel environments
  • And theorists, such as those on the Renaissance project, are exploring radically new ways to define problems so that parallel algorithms can be freed entirely from the bounds of our previous ways of thinking about computation.

I hope you enjoy reading about these issues as much as I did!