Review of the 3dfx Voodoo5 5500 by Mike Andrawes, published on 11 July 2000
Introduction
Last year's Fall Comdex in Las Vegas was the big debut of 3dfx's VSA-100 architecture used in the Voodoo 4/5 series of cards. 3dfx had kept everything under tight wraps before the announcement was finally made at the wax museum in the Venetian. After the launch, we published the following thoughts in our 3dfx Voodoo4 / Voodoo5 Comdex 99 Preview:
We all expected a fill rate monster out of 3dfx and the products they announced based on the VSA-100 managed to fulfill every last one of our expectations.
Of course, that was almost nine months ago and things have changed quite a bit since then. The "monster fill rate" of 1.33 gigapixels/s promised by the top of the line Voodoo5 6000 AGP has yet to actually appear anywhere - in the hands of consumers or reviewers alike. The Voodoo5 6000, originally scheduled to be released shortly after the 5500 model, has been delayed yet again with a new release date of summer 2000. 3dfx blames the situation on "component shortages," which is pretty vague. However, our sources tell us that the components that are in short supply are the VSA-100 chips themselves, which makes sense given the situation at hand. Similarly, the Voodoo4 4500 is not being launched at this time because 3dfx would rather divert the available VSA-100 chips to the higher margin Voodoo5 5500.
The Voodoo5 5500 AGP model is, however, available now and has delivered the promised 667 megapixels/s fillrate originally announced at Comdex. At the time of the announcement, 667 megapixels/s seemed massive with NVIDIA's just released GeForce pumping out 480 megapixels/s. But now, NVIDIA's top of the line GeForce 2 GTS cranks out a solid 800 megapixels/s. More importantly, however, the GeForce 2 GTS' texel fillrate is 1.6 gigatexels/s, which is more than double that of the Voodoo5 5500 and higher than even that of the unreleased Voodoo5 6000, since pixel and texel fillrates are identical on the 3dfx cards. Times have certainly changed - there's no doubt about that.
But 3dfx still believes they have better features on their side, despite the lack of hardware T&L on their cards. The key is 3dfx's T-Buffer, named for its creator Gary Tarolli, that provides a number of special effects, including FSAA, motion blur, depth of field, and soft shadows. NVIDIA has jumped on the bandwagon with their own FSAA implementation but 3dfx argues their solution provides better image quality. We'll examine the FSAA image quality debate in an upcoming article focused specifically on that, but for now we'll briefly say that our initial impressions give 3dfx the edge here.
Funny how times have changed, with NVIDIA becoming the speed king and 3dfx the biggest proponent of image quality. It wasn't too long ago that NVIDIA was proclaiming 32-bit color as the next key feature for 3D accelerators, while 3dfx kept insisting that speed was king.
Had 3dfx hit their original target release of Fall 1999, the Voodoo5 would be competing with the GeForce SDR and GeForce DDR. In such a situation, the Voodoo5 would have the clear advantage from the standpoint of raw power. In terms of features, 3dfx wouldn't be hurting nearly as badly, since they would be the only one with a reasonable FSAA implementation; the GeForce 256 is simply not powerful enough to perform FSAA at playable frame rates in most games. Although the Voodoo5 doesn't have T&L, titles taking advantage of T&L are just now appearing in the market - so while it would have made little difference in the Fall, it is a feature that is becoming increasingly desirable now.
Could have, would have, should have doesn't matter for today. What we have today is the Voodoo5 5500 that 3dfx needs to do reasonably well to keep them afloat until their next generation product. We previewed the Voodoo5 5500 back in April, but final silicon and drivers have finally arrived in the AnandTech lab and boards are now available after one last compatibility related delay. Let's take a look and see if the Voodoo5 5500 has a chance.
VSA-100
The VSA-100 is the first in a line of "Voodoo Scalable Architecture" chips and is the heart and soul of 3dfx’s Voodoo4 and Voodoo5 products. The word "Scalable" refers to the fact that multiple chips can be run in parallel, signifying the long awaited return of SLI to 3dfx's product line. There is support for up to 32-way SLI, which Quantum 3D will likely produce for extremely high-end applications. Each chip has its own 128-bit wide interface to memory, meaning that, unlike NVIDIA’s GeForce, the VSA-100 should not be limited by memory bandwidth. Although the memory is set up to be completely shared, texture data has to be repeated for each VSA-100 chip. This is typical of all current multi-chip solutions, such as 3dfx’s own Voodoo2 and ATI’s Rage Fury MAXX. However, the VSA-100 does have the advantage of FXT1/DXTC texture compression in hardware, which will minimize the amount of wasted space.
The SLI on the VSA-100 is actually a little bit different from the setup used on the Voodoo2 and is in fact quite a bit improved. The first change is that the number of scan lines that each chip renders can be adjusted from 1 to 128 lines depending on a number of factors, including number of chips, resolution, fillrate required, etc. This is something 3dfx will have to tweak for optimal performance. They have yet to determine whether this will be something fixed in the driver or if it will dynamically adjust depending on the load or the application involved. Since it is definitely software programmable, chances are that it will be a variable that end users will be able to tweak through the registry or any number of tweaking utilities.
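As a rough illustration of the adjustable band height described above, the sketch below hands out bands of scan lines to the chips in turn and counts how many lines each chip ends up rendering. The function names and the round-robin assignment are assumptions made for illustration; the only details taken from the text are the 1 to 128 line range and the idea that each chip renders its own bands.

```python
def chip_for_scanline(line, band_height, num_chips):
    """Return the index of the chip that renders a given scan line.

    Assumption: bands are assigned round-robin. This is a hypothetical
    scheme, not a description of how the VSA-100 actually schedules work.
    """
    if not 1 <= band_height <= 128:
        raise ValueError("band height must be between 1 and 128 lines")
    if num_chips < 1:
        raise ValueError("need at least one chip")
    return (line // band_height) % num_chips


def lines_per_chip(frame_height, band_height, num_chips):
    """Count how many scan lines of one frame each chip renders."""
    counts = [0] * num_chips
    for line in range(frame_height):
        counts[chip_for_scanline(line, band_height, num_chips)] += 1
    return counts


if __name__ == "__main__":
    # Two chips at 768 lines with 16-line bands: an even split.
    print(lines_per_chip(768, 16, 2))
    # Two chips at 600 lines with 128-line bands: an uneven split.
    print(lines_per_chip(600, 128, 2))
```

Under this simplified model, large bands can leave one chip with noticeably more lines than the other at some resolutions, which is one plausible reason the band height would need tuning per resolution and chip count.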
The other big change is the SLI interface. On the Voodoo2, the interface was a multi-board, analog solution that required very strict timing control and identical boards. Furthermore, the analog signal was prone to degradation as it passed over a short cable inside the PC. VSA-100 features a 30-bit digital interface between the chips, which are all on one board this time. There are 28 data bits, a data valid bit, and a clock.
One advantage of the Voodoo2’s multi board SLI setup is that you could buy one Voodoo2 board to get started and add a second one at a later date to nearly double performance. Of course, this time that’s not possible as all chips must be on one board.
Compared to the Voodoo3, the VSA-100 adds support for 32-bit color rendering, 32-bit textures, 32/24-bit Z & W, and an 8-bit stencil buffer. Furthermore, the VSA-100 can also render two single-textured pixels per clock or one dual-textured pixel per clock. Support for 2048 x 2048 textures has now been implemented into the VSA-100, thus the VSA-100 offers essentially everything the Voodoo3 lacked and was criticized for tremendously.
The chip is an AGP 4X part, with support for AGP 2X, AGP 1X and PCI operating modes. In spite of this, the VSA-100 does not support AGP texturing. 3dfx still feels that AGP texturing is not truly beneficial and thus there is no reason to pursue support for it with their products. The chip itself is composed of 14 million transistors, a little more than half the count of the original GeForce, and is manufactured on an enhanced 0.25-micron, 6-layer metal process. The "enhanced" 0.25-micron process just means that it takes advantage of shorter gate lengths, which allow for faster switching, thus allowing for higher frequencies and greater yields at those frequencies.
At the launch, 3dfx claimed that they would get better yields out of the tried and true 0.25 micron process than they would by moving to a 0.22 or 0.18 micron process like their competitors. Thanks to the long delays in getting the VSA-100 products to market, this strategy has more or less backfired on 3dfx, leaving them with a slower, hotter, more expensive chip. And to compound things, they're apparently not able to get enough chips out of their fab plant, TSMC in Taiwan, even though that's the same fab that NVIDIA uses.
The VSA-100 supports all T-Buffer effects, Full Screen Antialiasing, FXT1/DXTC texture compression and all of the other features 3dfx has been talking about for the past few months. For more information on those technologies read our in depth coverage of the T-Buffer here.
The VSA-100 supports anywhere from 4MB to 64MB of memory per chip, whose clock is synchronized with the core clock, just like the Voodoo3. The memory bus is 128-bits wide and will offer 2.7GB/s of memory bandwidth per chip. The excellent 350MHz RAMDAC of the Voodoo3 is carried over to the VSA-100, so 2D image quality is up there with the best.
From the above description the VSA-100 doesn’t appear to be much more than a Voodoo3 with support for a few new visual features and 32-bit color rendering support, but the chip’s support for up to 32-way SLI scalability (hence the name Voodoo Scalable Architecture) is what truly defines it and sets it apart from the Voodoo3.
The Card
The only VSA-100 based product currently available is the Voodoo5 5500 AGP, featuring two VSA-100 chips and a total of 64MB of SDRAM. The core and memory are both clocked at 166MHz. Because the two chips are working together in SLI mode, the 64MB of memory is split evenly between the two, and since they are essentially independent of one another, the textures in any scene must be duplicated in each set of 32MB of SDRAM. This means that if you have a scene with 10MB of textures, it occupies a total of 20MB of memory out of the 64MB on board since each chip requires those 10MB of textures to be available to it locally. Frame buffer memory is not duplicated between both banks, however, so the Voodoo5 effectively has more than 32MB of memory to work with but less than 64MB. The exact amount varies with the resolution and color depth in use.
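The duplication arithmetic above can be sketched in a few lines. The function names are hypothetical, and the frame buffer estimate assumes three equally sized buffers (front, back and depth), which is a simplification rather than a description of what the 3dfx drivers actually allocate.

```python
def framebuffer_size_mb(width, height, bytes_per_pixel, buffers=3):
    """Approximate frame buffer footprint in MB.

    Assumption: three equally sized buffers (front, back, depth).
    Real driver allocations, especially with FSAA, may differ.
    """
    return width * height * bytes_per_pixel * buffers / (1024 * 1024)


def memory_used_mb(texture_mb, fb_mb, num_chips=2):
    """Total on-board memory consumed when every chip keeps its own
    copy of the textures and the frame buffer is stored only once."""
    return texture_mb * num_chips + fb_mb


def unique_texture_budget_mb(total_mb, fb_mb, num_chips=2):
    """Unique texture data that fits after the frame buffer is set aside."""
    return (total_mb - fb_mb) / num_chips


if __name__ == "__main__":
    fb = framebuffer_size_mb(1024, 768, 4)      # 32-bit color at 1024x768
    budget = unique_texture_budget_mb(64, fb)   # 64MB card, two chips
    print(f"frame buffer:        {fb:.1f} MB")
    print(f"unique textures:     {budget:.1f} MB")
    print(f"effective capacity:  {fb + budget:.1f} MB")
    print(f"10MB of textures use {memory_used_mb(10, 0):.0f} MB on board")
```

Under these assumptions, 1024x768 in 32-bit color leaves an effective capacity of 36.5MB, which falls between 32MB and 64MB as the text describes.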
Each chip can render two single textured pixels per clock or one dual textured pixel per clock. This gives a single VSA-100 chip a fill rate of 333 megapixels per second when dealing with a single textured game, or 166 megapixels per second when running a dual textured game. For the Voodoo5 5500 AGP, this results in a fill rate of 667 megapixels per second for a single textured game or 333 megapixels per second for a dual textured game. Seemingly ages ago, when single textured games were the only things available, this sort of a fill rate made the most sense but as most of today’s games are dual textured, this sort of flexibility is not as useful as it once was. In fact, we once again see the role reversal of 3dfx and NVIDIA - 3dfx always had the cards with the texel fill rate double the pixel fill rate, while NVIDIA always kept them equal. With the launch of the VSA-100 and the GeForce 2 GTS, those roles are now reversed.
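The fill rate figures above follow directly from the clock speed and the number of pixels each chip outputs per clock. The sketch below reproduces that arithmetic; the function name is hypothetical, and it assumes the nominal "166MHz" clock is really 166.67MHz (500/3), which is what makes the rounded results line up with the quoted 333 and 667 megapixels per second.

```python
# Assumption: the nominal "166MHz" core clock is taken as 500/3 MHz.
CORE_CLOCK_MHZ = 500 / 3


def fill_rate_mpixels(textures_per_pixel, num_chips=1, clock_mhz=CORE_CLOCK_MHZ):
    """Theoretical pixel fill rate in megapixels per second.

    Each chip outputs two single textured pixels per clock,
    or one dual textured pixel per clock.
    """
    if textures_per_pixel not in (1, 2):
        raise ValueError("this model only covers one or two textures per pixel")
    pixels_per_clock = 2 // textures_per_pixel
    return clock_mhz * pixels_per_clock * num_chips


if __name__ == "__main__":
    for chips, name in ((1, "single VSA-100"), (2, "Voodoo5 5500")):
        single = fill_rate_mpixels(1, chips)
        dual = fill_rate_mpixels(2, chips)
        print(f"{name}: {single:.0f} MP/s single textured, {dual:.0f} MP/s dual textured")
```

These are theoretical peaks only; they say nothing about whether memory bandwidth can sustain them.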
Each chip has its own 128-bit pathway to its 32MB of SDRAM, meaning that each chip has the bandwidth of an SDR GeForce 256. When put together, the card theoretically has about 5.3GB/s of available memory bandwidth, exactly as much bandwidth as a GeForce 2 GTS. While some of that bandwidth is wasted during texture upload, there is no real waste when rendering since each chip will only access the textures it needs.
So while 3dfx falls considerably behind in terms of raw (theoretical) fillrate, 3dfx still has a fighting chance as the NVIDIA cards are severely limited by memory bandwidth. Coincidentally, the Voodoo5 has the same memory bandwidth as the GeForce 2 GTS, 5.3GB/s, although they go about achieving it in different ways. Both manufacturers use a 128-bit data path running at 166 MHz to memory. This would result in 2.66GB/s of memory bandwidth, but both companies double this - 3dfx by using two independent paths to memory on two separate chips to transfer twice as much data at once, while NVIDIA uses DDR memory to transfer data twice per clock.
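To make the comparison concrete, here is a minimal sketch of the bandwidth arithmetic for both approaches. The function and parameter names are hypothetical, and the nominal "166 MHz" clock is again assumed to be 166.67MHz (500/3).

```python
import math

# Assumption: the nominal "166 MHz" memory clock is taken as 500/3 MHz.
MEMORY_CLOCK_MHZ = 500 / 3


def bandwidth_gb_s(bus_bits, clock_mhz, transfers_per_clock=1, num_paths=1):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock * num_paths / 1e9


if __name__ == "__main__":
    # One 128-bit SDR path: a single VSA-100 or an SDR GeForce 256.
    single_path = bandwidth_gb_s(128, MEMORY_CLOCK_MHZ)
    # Voodoo5 5500: two independent 128-bit SDR paths, one per chip.
    voodoo5 = bandwidth_gb_s(128, MEMORY_CLOCK_MHZ, num_paths=2)
    # GeForce 2 GTS: one 128-bit path, DDR transfers twice per clock.
    geforce2 = bandwidth_gb_s(128, MEMORY_CLOCK_MHZ, transfers_per_clock=2)

    print(f"single path:   {single_path:.2f} GB/s")
    print(f"Voodoo5 5500:  {voodoo5:.2f} GB/s")
    print(f"GeForce 2 GTS: {geforce2:.2f} GB/s")
    print("equal peak bandwidth:", math.isclose(voodoo5, geforce2))
```

The two totals match on paper, but the Voodoo5 spends part of its total on duplicated texture uploads, so equal peak numbers do not guarantee equal usable bandwidth.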
If NVIDIA is limited by this 5.3GB/s of bandwidth, it stands to reason that 3dfx could be limited similarly, and if the bottleneck is the same on both cards, it's possible the performance would be the same. Of course, all this theoretical talk is worthless if the results aren't borne out in the real world. We'll take a look at the real world performance in just a bit.
Our evaluation board featured eight 8MB 6ns SDRAM chips manufactured by Toshiba. The 6ns rating means that these chips should be able to work at 166MHz (which is what they’re clocked at) and not much higher. However, SDRAM chips are generally rated pretty liberally, meaning that a chip rated at 166MHz might be able to hit 183MHz. We'll take a more detailed look at overclocking in a moment as well.
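The link between a chip's nanosecond rating and its rated clock speed is a simple reciprocal, sketched below. The function names are hypothetical, and the calculation treats the rating as the full cycle time, which is the usual reading of SDRAM speed grades.

```python
def rated_clock_mhz(cycle_time_ns):
    """Clock frequency in MHz implied by a cycle time in nanoseconds."""
    return 1000.0 / cycle_time_ns


def cycle_time_ns(clock_mhz):
    """Cycle time in nanoseconds required to run at a given clock in MHz."""
    return 1000.0 / clock_mhz


if __name__ == "__main__":
    print(f"6ns chips are rated for about {rated_clock_mhz(6):.1f} MHz")
    print(f"183MHz would need a cycle time of about {cycle_time_ns(183):.2f} ns")
```

In other words, reaching 183MHz means running the chips at a cycle time roughly half a nanosecond tighter than their rating.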
Each VSA-100 chip is covered by a high quality AAVID heatsink and fan. Although attached with a bit too much heatsink glue, the cooling on the Voodoo5 5500 is still quite good. To ensure that the dual VSA-100 chips get enough power, the card draws its power from the +5V rail of your power supply courtesy of the 4-pin power connector present on the board. The board’s incredible length is due to the fact that all the components required to regulate the power supplied to the board must be present on it, instead of relying on the AGP slot to provide the power and the motherboard to regulate it.
Despite claimed support for AGP 4x in the VSA-100 chip, our board apparently does not support it. As we've shown in the past, AGP 4x offers virtually no performance advantage over AGP 2x, so it's not really a big deal, but still worth noting.
The beauty of the Voodoo5 5500 AGP as an evaluation sample is that, by disabling one of the chips, we essentially have a 32MB Voodoo4 4500 AGP card that we can also use to illustrate the performance we can expect out of that solution. As mentioned previously, the Voodoo4 is not yet available thanks to a component shortage.
Drivers
When we first previewed a preproduction Voodoo5 5500, we proclaimed that:
This is where the biggest performance improvement will lie, in the drivers. It is quite obvious that the drivers we were provided with weren’t optimized for performance across the board, since the performance of the Voodoo5 at lower resolutions such as 640 x 480 was sub-par.
While we have definitely seen a slight improvement with the final drivers, the improvement is not as great as we had expected or hoped for. Overall we saw about a 1-3 fps increase at most resolutions in Quake 3 and Unreal Tournament - less than 5% - and that can be partially attributed to final silicon. We did occasionally see drops in performance, but this was mainly at lower resolutions where it's not even noticeable.
We'd also like to address the much publicized "correct method" of installing the Voodoo5 drivers, which is claimed to increase performance by up to three times. It was originally posted on 3dfxgamers.com's messageboards and discussed quite a bit in the AnandTech Forums as well. The problem appears when the card is identified as a "3dfx Voodoo Series," usually due to a previous 3dfx adapter in the system that was not uninstalled properly. If you did properly uninstall your old 3dfx card, or you never had one, this isn't an issue. When we originally tested the Voodoo5 for our preview, we used a cleanly formatted hard drive as we always do in our testing, so those performance numbers were not affected by this problem with the 3dfx installer. Needless to say, we've done the same thing in this final review.
There's also been some talk of refresh rates having a significant impact on performance. As with most recent cards we've tested, this is not the result that we have found in our testing. All our gaming benchmarks are performed at 60Hz with v-sync disabled.
Other than that, 3dfx Tools remains relatively unchanged from the prerelease version. In order to obtain WHQL (Windows Hardware Quality Labs) certification, Microsoft does not allow vendors to include overclocking utilities or the ability to disable vsync. 3dfx's solution? Include a link in the drivers to download their overclocking utility that also adds check boxes to disable vsync. This way, 3dfx gets their WHQL certification to keep OEMs happy, while end-users get their tweaking utilities - that way everybody ends up happy.
Kudos to 3dfx here for not requiring registry hacks or 3rd party utilities.
The Test
Quake 3 Arena Performance - AMD Athlon 750
The GeForce line absolutely dominates the benchmark standings at 640x480, most likely due to their hardware T&L engine. Note the minimal performance drop when moving to 32-bit for the Voodoo5. At least at this res, it has plenty of memory bandwidth to go around. Interestingly, even the venerable TNT2 Ultra is able to beat out the Voodoo5 5500 at this low resolution, showing that the Voodoo5 drivers are probably still immature.
At 800x600, the Voodoo5's memory bandwidth begins to flex its muscle as it's able to beat out the GeForce SDR and GeForce 2 MX in 32-bit color. The GeForce DDR is still faster, however, thanks to the combination of T&L and better drivers.
At 1024x768x16, the Voodoo5 5500 beats out everything but the DDR based GeForce models. It actually steps slightly ahead of the GeForce DDR in 32-bit color, once again thanks to better memory bandwidth.
At 1280x1024x32, we see the Voodoo5 approaching the performance of the GeForce 2 GTS. The similar performance levels we see here are likely due to the fact that these two cards have equal memory bandwidth. The GeForce 2 GTS still dominates in 16-bit color though thanks to good old raw fillrate.
Nothing much changes at 1600x1200 as the Voodoo5 still can't overtake the GeForce 2 GTS. The GeForce DDR is not too far behind.
Quake 3 Arena Performance - Intel Pentium III 550
The results with the Pentium III 550E are similar to those of the Athlon 750. The biggest change is actually the fall of the TNT2 Ultra, although it's not exactly clear why.
Once again, similar results to the Athlon 750 test bed - the Voodoo5 overtakes the low end GeForce models in 32-bit color as memory bandwidth begins to become a significant factor.
At 1024x768, the Voodoo5 5500 finally steps ahead of the two low end GeForce models in 16 and 32-bit color, while getting dangerously close to the performance of a GeForce DDR. Thank raw fillrate for the lead in 16-bit color and memory bandwidth for the lead in 32-bit.
Quake III Performance - Pentium III 550E (HiRes)
Just like in the Athlon test bed, the Voodoo5 5500 overtakes the GeForce DDR by a healthy margin at 1280x1024 in 16 and 32-bit color. Only the GeForce 2 GTS can beat out the top of the line 3dfx card, and only by a measly 2.8 fps in 32-bit color. Raw fillrate keeps the GTS out in front by a large margin in 16-bit mode.
Nothing much changes at 1600x1200. The GTS's raw fillrate maintains the large lead in 16-bit, while memory bandwidth limitations keep them neck and neck in 32-bit color.