Modern Intel/AMD processors have an instruction, FSINCOS, that calculates the sine and cosine of a value simultaneously. If you need aggressive optimization, it may be worth using.
Here is a small example: http://home.broadpark.no/~alein/fsincos.html
Here is another example (for MSVC): http://www.codeguru.com/forum/showthread.php?t=328669
Here is yet another example (with gcc): http://www.allegro.cc/forums/thread/588470
I hope one of them helps. (I haven't used this instruction myself, sorry.)
As it is supported at the processor level, I expect it to be much faster than table lookups.
Wikipedia suggests that FSINCOS was introduced with the 387 processors, so you will hardly find an x86 processor that doesn't support it.
Intel's documentation states that FSINCOS is only about 5 times slower than FDIV (i.e., floating-point division).
Please note that not all modern compilers optimize the calculation of sine and cosine into a single FSINCOS instruction. In particular, my VS 2008 didn't do it that way.
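Since the compiler may not emit the instruction by itself, one option is to issue it through inline assembly. Here is a minimal sketch for GCC-style compilers on x86; the helper name sincos_x87 is made up for illustration, and the fallback to sin() and cos() on other targets is my own assumption rather than something taken from the linked examples. I have not benchmarked it.

```c
#include <math.h>

/* Hypothetical helper: computes sin(x) and cos(x) with a single FSINCOS
   when built with GCC or Clang for x86. The caller must keep |x| below
   2^63; outside that range the instruction leaves its operand untouched. */
static void sincos_x87(double x, double *s, double *c)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    double sin_val, cos_val;
    /* FSINCOS replaces ST(0) with the sine and then pushes the cosine,
       so afterwards ST(0) = cos(x) and ST(1) = sin(x).
       "=t" names the top of the x87 stack, "=u" the entry below it,
       and "0" puts the input in the same register as operand 0. */
    __asm__ ("fsincos" : "=t" (cos_val), "=u" (sin_val) : "0" (x));
    *s = sin_val;
    *c = cos_val;
#else
    /* Assumed fallback for other compilers or architectures. */
    *s = sin(x);
    *c = cos(x);
#endif
}
```

Calling sincos_x87(angle, &s, &c) fills s and c in one go. MSVC uses a different inline assembly syntax (and has none for 64-bit targets), so this sketch does not apply there as written.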
The first example link is dead, but there is still a version at the Wayback Machine.