Have you seen significant performance gains after vectorizing and threading your code?