I did some profiling during my lunch break, since I was curious to see how a one-day implementation of a parallel library would perform against a heavyweight rival like Microsoft's TPL (Task Parallel Library). The test is a simple console app, built and run in release mode, so there is no debug overhead skewing the numbers. My work laptop is a Fujitsu-Siemens Amilo Pro with an Intel Core Duo at 1.86 GHz and 1.75 GB of RAM.
100000 iterations Bessel function.
Parallel (RNW): 21076 msecs elapsed.
Parallel (TPL): 15877 msecs elapsed.
Sequential: 44302 msecs elapsed.
Ratio: Parallel RNW is 2,1020117669387 times faster.
Ratio: Parallel TPL is 2,79032562826731 times faster.
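For anyone who wants to check the numbers: each ratio above is just the sequential time divided by the corresponding parallel time. Here is a minimal sketch of that arithmetic. It is in Python purely for brevity, it is not the code of the console app that produced the output, and the names `speedup` and `timings_ms` are made up for illustration.

```python
# Timings copied from the console output above, in milliseconds.
sequential_ms = 44302
timings_ms = {"RNW": 21076, "TPL": 15877}  # hypothetical container name


def speedup(baseline_ms: float, parallel_ms: float) -> float:
    """How many times faster the parallel run is than the baseline."""
    return baseline_ms / parallel_ms


# Speedup of each parallel implementation over the sequential run.
speedups = {name: speedup(sequential_ms, ms) for name, ms in timings_ms.items()}
for name, ratio in speedups.items():
    print(f"Parallel {name} is {ratio:.4f} times faster than sequential")

# The same formula compares the two parallel implementations directly.
tpl_vs_rnw = speedup(timings_ms["RNW"], timings_ms["TPL"])
print(f"TPL is {tpl_vs_rnw:.2f} times faster than RNW")
```

Applying the same division to the two parallel timings gives the head-to-head gap between the libraries, which the console output does not print.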
The results are pretty self-explanatory, even allowing for measurement noise: TPL finished in about three quarters of the time my implementation (RNW) needed. While this makes me a bit sad, I'm still happy to see that the difference is not that humiliating. At least, that's how I see it, and granted, it could be oh so much better. All in all, my implementation is neither optimized nor the product of much forethought, so I consider this a small step towards Nirvana and a great first-hand experience in parallel coding, which is invaluable.
I'd like to congratulate the TPL team on an awesome job. Can't wait for the release of this great library.