Concurrency is hard because we haven't figure out how to make it easy. For most developers, specifically web developers, concurrency doesn't really matter. I envy that assuasive confident feeling of a sequential execution of http requests. The number of cores on my machine quadrupled in last three years and I don't know a single reliable, comforting (easy) way of harnessing it as much as possible, I feel a little sad about current state of concurrency support.
Utilizing all the processing power consistently is a lot easier for well defined and not so concurrent tasks such as map-reduce. I have done it a lot, processing gigabytes of data by reducing the problem to independent subsets is programmatic triviality. On the other hand, I have always found developing a relatively concurrent application the "right way" to be a nightmare. Concurrency applications come in two mutually exclusive flavours: slow or complex.
At this point enthusiasts will point out java.util.concurrent and move on. While j.u.concurrent is nice and a significant improvement over explicit synchronization, it still mandates that API users be concurrency wizards and its complexity exposure is nearly at par with explicit synchronization. Here's one example blog post explaining common gotcha with ConcurrentHashMap. The only benefit j.u.concurrency provides is finer grained control over where to do CAS. I am a huge fan of j.u.concurrent and have been using it pre-1.5 but I still don't think it makes concurrency so easy. For one more example,
synchronized(this){ aRef = newVal;  return aRef;}
v/s
 while (true) {
            V x = atomicRef.get();
            if (atomicRef.compareAndSet(x, newValue))
                return atomicRef.get();
 }
Which one do you think is easier to grasp?
Many people think that Actors are the next big thing to tackle concurrency monster and complexities introduced by these shared memory model primitives. I too initially thought so, but then I found that Actor model isn't really the sweet spot in practice as it is touted. The very notion that Actors can fail and code must handle the tricky bits to recover from it makes it even more complex than using locks/mutexes etc. I am in constant a awe to see people talking so lightly about  fault tolerant/fail safe systems without giving thought on the amount of complexity it adds. I am not necessarily protesting that philosophy but that behaviour is just not common in yer average regular applications (will your user be happy if one actor failed to process her payment and was asked to retry?). We still live in dark ages of transparent concurrency.
I remain as ignorant and unsatisfied about concurrency support as I was several years ago. For me, concurrency is hard so I am off to shopping!
