Modern Intel/AMD processors have instruction FSINCOS for calculating sine and cosine functions simultaneously. If you need strong optimization, perhaps you should use it.

Here is a small example: http://home.broadpark.no/~alein/fsincos.html

Here is another example (for MSVC): http://www.codeguru.com/forum/showthread.php?t=328669

Here is yet another example (with gcc): http://www.allegro.cc/forums/thread/588470

Hope one of them helps. (I didn't use this instruction myself, sorry.)

As they are supported on processor level, I expect them to be way much faster than table lookups.

Edit:
Wikipedia suggests that FSINCOS was added at 387 processors, so you can hardly find a processor which doesn't support it.

Edit:
Intel's documentation states that FSINCOS is just about 5 times slower than FDIV (i.e., floating point division).

Edit:
Please note that not all modern compilers optimize calculation of sine and cosine into a call to FSINCOS. In particular, my VS 2008 didn't do it that way.

Edit:
The first example link is dead, but there is still a version at the Wayback Machine.