3Dfx Voodoo 5 5500 - Anandtech - 11/07/2000

6 years 3 months ago - 6 years 3 months ago #1 by Martin
On 11 July 2000, AnandTech published this review of the 3dfx Voodoo 5 5500 by Mike Andrawes.



Introduction
Last year's Fall Comdex in Las Vegas was the big debut of 3dfx's VSA-100 architecture used in the Voodoo 4/5 series of cards. 3dfx had kept everything under tight wraps before the announcement was finally made at the wax museum in the Venetian. After the launch, we published the following thoughts in our 3dfx Voodoo4 / Voodoo5 Comdex 99 Preview:
We all expected a fill rate monster out of 3dfx and the products they announced based on the VSA-100 managed to fulfill every last one of our expectations.
Of course, that was almost nine months ago and things have changed quite a bit since then. The "monster fill rate" of 1.33 gigapixels/s promised by the top of the line Voodoo5 6000 AGP has yet to actually appear anywhere - in the hands of consumers or reviewers alike. The Voodoo5 6000, originally scheduled to be released shortly after the 5500 model, has been delayed yet again with a new release date of summer 2000. 3dfx blames the situation on "component shortages," which is pretty vague. However, our sources tell us that the components that are in short supply are the VSA-100 chips themselves, which makes sense given the situation at hand. Similarly, the Voodoo4 4500 is not being launched at this time because 3dfx would rather divert the available VSA-100 chips to the higher margin Voodoo5 5500.

The Voodoo5 5500 AGP model is, however, available now and has delivered the promised 667 megapixels/s fillrate originally announced at Comdex. At the time of the announcement, 667 megapixels/s seemed massive with NVIDIA's just released GeForce pumping out 480 megapixels/s. But now, NVIDIA's top of the line GeForce 2 GTS cranks out a solid 800 megapixels/s. More importantly, however, the GeForce 2 GTS' texel fillrate is 1.6 gigatexels/s, which is more than double that of the Voodoo5 5500 and higher than even that of the unreleased Voodoo5 6000, since pixel and texel fillrates are identical on the 3dfx cards. Times have certainly changed - there's no doubt about that.
But 3dfx still believes they have better features on their side, despite the lack of hardware T&L on their cards. The key is 3dfx's T-Buffer, named for its creator Gary Tarolli, which provides a number of special effects, including FSAA, motion blur, depth of field, and soft shadows. NVIDIA has jumped on the bandwagon with their own FSAA implementation but 3dfx argues their solution provides better image quality. We'll examine the FSAA image quality debate in an upcoming article focused specifically on that, but for now we'll briefly say that our initial impressions give 3dfx the edge here.
Funny how times have changed, with NVIDIA becoming the speed king and 3dfx the biggest proponent of image quality. It wasn't too long ago that NVIDIA was proclaiming 32-bit color as the next key feature for 3D accelerators, while 3dfx kept insisting that speed was king.
Had 3dfx hit their original target release of Fall 1999, the Voodoo5 would be competing with the GeForce SDR and GeForce DDR. In such a situation, the Voodoo5 would have the clear advantage from the standpoint of raw power. In terms of features, 3dfx wouldn't be hurting nearly as badly, as they would be the only one with a reasonable FSAA implementation since the GeForce 256 is simply not powerful enough to perform FSAA at playable frame rates in most games. Although the Voodoo5 doesn't have T&L, titles taking advantage of T&L are just now appearing in the market - so while it would have made little difference in the Fall, it is a feature that is becoming increasingly desirable now.
Could have, would have, should have doesn't matter for today. What we have today is the Voodoo5 5500 that 3dfx needs to do reasonably well to keep them afloat until their next generation product. We previewed the Voodoo5 5500 back in April, but final silicon and drivers have finally arrived in the AnandTech lab and boards are now available after one last compatibility related delay. Let's take a look and see if the Voodoo5 5500 has a chance.

VSA-100

The VSA-100 is the first in a line of "Voodoo Scalable Architecture" chips and is the heart and soul of 3dfx’s Voodoo4 and Voodoo5 products. The word "Scalable" refers to the fact that multiple chips can be run in parallel, signifying the long awaited return of SLI to 3dfx's product line. There is support for up to 32-way SLI, which Quantum 3D will likely produce for extremely high-end applications. Each chip has its own 128-bit wide interface to memory, meaning that, unlike NVIDIA’s GeForce, the VSA-100 should not be limited by memory bandwidth. Although the memory is set up to be completely shared, texture data has to be repeated for each VSA-100 chip. This is typical of all current multi-chip solutions, such as 3dfx’s own Voodoo2 and ATI’s Rage Fury MAXX. However, the VSA-100 does have the advantage of FXT1/DXTC texture compression in hardware, which will minimize the amount of wasted space.
The SLI on the VSA-100 is actually a little bit different from the setup used on the Voodoo2 and is in fact quite a bit improved. The first change is that the number of scan lines that each chip renders can be adjusted from 1 to 128 lines depending on a number of factors, including number of chips, resolution, fillrate required, etc. This is something 3dfx will have to tweak for optimal performance. They have yet to determine whether this will be something fixed in the driver or if it will dynamically adjust depending on the load or the application involved. Since it is definitely software programmable, chances are that it will be a variable that end users will be able to tweak through the registry or any number of tweaking utilities.
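To make the adjustable scan line split concrete, here is a minimal Python sketch. It assumes bands of consecutive lines are handed out to the chips round-robin; the review only states that the band size is programmable from 1 to 128 lines, so the assignment scheme and the function names below are illustrative assumptions, not 3dfx's documented behavior.

```python
def chip_for_scanline(y: int, num_chips: int, band_height: int) -> int:
    """Return the index of the chip that would render scan line y.

    Assumption: bands of `band_height` consecutive lines are assigned
    round-robin. The real VSA-100 scheme is not described in the review.
    """
    if not 1 <= band_height <= 128:
        raise ValueError("band_height must be between 1 and 128 lines")
    if num_chips < 1:
        raise ValueError("num_chips must be at least 1")
    return (y // band_height) % num_chips


def lines_per_chip(height: int, num_chips: int, band_height: int) -> list[int]:
    """Count how many scan lines each chip ends up rendering."""
    counts = [0] * num_chips
    for y in range(height):
        counts[chip_for_scanline(y, num_chips, band_height)] += 1
    return counts


if __name__ == "__main__":
    # Two chips (as on the Voodoo5 5500) at 800x600.
    print(lines_per_chip(600, 2, 1))    # single-line interleave
    print(lines_per_chip(600, 2, 128))  # largest band size
```

Under this assumed scheme, a 1-line band splits 600 lines evenly, while a 128-line band leaves one chip with more lines than the other. That is one reason the band size would need tuning per resolution.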

The other big change is the SLI interface. On the Voodoo2, the interface was a multi-board, analog solution that required very strict timing control and identical boards. Furthermore, the analog signal was prone to degradation as it passed over a short cable inside the PC. VSA-100 features a 30-bit digital interface between the chips, which are all on one board this time. There are 28 data bits, a data valid bit, and a clock.
One advantage of the Voodoo2’s multi board SLI setup is that you could buy one Voodoo2 board to get started and add a second one at a later date to nearly double performance. Of course, this time that’s not possible as all chips must be on one board.



Compared to the Voodoo3, the VSA-100 adds support for 32-bit color rendering, 32-bit textures, 32/24-bit Z & W, and an 8-bit stencil buffer. Furthermore, the VSA-100 can also render two single-textured pixels per clock or one dual-textured pixel per clock. Support for 2048 x 2048 textures has now been implemented into the VSA-100, thus the VSA-100 offers essentially everything the Voodoo3 lacked and was criticized for tremendously.
The chip is an AGP 4X part, with support for AGP 2X, AGP 1X and PCI operating modes. In spite of this the VSA-100 does not support AGP texturing. 3dfx still feels that AGP texturing is not truly beneficial and thus there is no reason to pursue support for it with their products. The chip itself is composed of 14 million transistors, a little more than half the count of the original GeForce, and is manufactured on an enhanced 0.25-micron, 6-layer metal process. The "enhanced" 0.25-micron process just means that it takes advantage of shorter gate lengths, which allow for faster switching, thus allowing for higher frequencies and greater yields at those frequencies.
At the launch, 3dfx claimed that they would get better yields out of the tried and true 0.25 micron process than they would by moving to a 0.22 or 0.18 micron process like their competitors. Thanks to the long delays in getting the VSA-100 products to market, this strategy has more or less backfired on 3dfx, leaving them with a slower, hotter, more expensive chip. And to compound things, they're apparently not able to get enough chips out of their fab plant, TSMC in Taiwan, even though that's the same fab that NVIDIA uses.
The VSA-100 supports all T-Buffer effects, Full Screen Antialiasing, FXT1/DXTC texture compression and all of the other features 3dfx has been talking about for the past few months. For more information on those technologies read our in depth coverage of the T-Buffer here.
The VSA-100 supports anywhere from 4MB to 64MB of memory per chip, whose clock is synchronized with the core clock, just like the Voodoo3. The memory bus is 128-bits wide and will offer 2.7GB/s of memory bandwidth per chip. The excellent 350MHz RAMDAC of the Voodoo3 is carried over to the VSA-100, so 2D image quality is up there with the best.
From the above description the VSA-100 doesn’t appear to be much more than a Voodoo3 with support for a few new visual features and 32-bit color rendering support, but the chip’s support for up to 32-way SLI scalability (hence the name Voodoo Scalable Architecture) is what truly defines it and sets it apart from the Voodoo3.

The Card
The only VSA-100 based product currently available is the Voodoo5 5500 AGP, featuring two VSA-100 chips and a total of 64MB of SDRAM. The core and memory are both clocked at 166MHz. Because the two chips are working together in SLI mode, the 64MB of memory is split evenly between the two, and since they are essentially independent of one another, the textures in any scene must be duplicated in each set of 32MB of SDRAM. This means that if you have a scene with 10MB of textures, it occupies a total of 20MB of memory out of the 64MB on board since each chip requires those 10MB of textures to be available to it locally. Frame buffer memory is not duplicated between both banks, however, so the Voodoo5 effectively has more than 32MB of memory to work with but less than 64MB. The exact amount varies with the resolution and color depth in use.
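The texture duplication arithmetic can be sketched in a few lines of Python. The function names are made up for illustration, and the model only covers what the paragraph states: textures are stored once per chip, everything else once.

```python
def texture_footprint_mb(texture_mb: float, num_chips: int = 2) -> float:
    """Memory actually occupied by textures when every chip keeps its own copy."""
    return texture_mb * num_chips


def effective_capacity_mb(total_mb: float, texture_mb: float, num_chips: int = 2) -> float:
    """Memory available for unique data once duplicate texture copies are discounted.

    Only the extra (num_chips - 1) copies of the textures count as waste;
    frame buffer data is assumed not to be duplicated, as the review states.
    """
    return total_mb - texture_mb * (num_chips - 1)


if __name__ == "__main__":
    # The review's example: 10MB of textures on the 64MB Voodoo5 5500.
    print(texture_footprint_mb(10))       # 20
    print(effective_capacity_mb(64, 10))  # 54
```

With 10MB of textures the card behaves like a 54MB board, which sits between the 32MB and 64MB bounds given above.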


Each chip can render two single textured pixels per clock or one dual textured pixel per clock. This gives a single VSA-100 chip a fill rate of 333 megapixels per second when dealing with a single textured game, or 166 megapixels per second when running a dual textured game. For the Voodoo5 5500 AGP, this results in a fill rate of 667 megapixels per second for a single textured game or 333 megapixels per second for a dual textured game. Seemingly ages ago, when single textured games were the only things available, this sort of a fill rate made the most sense but as most of today’s games are dual textured, this sort of flexibility is not as useful as it once was. In fact, we once again see the role reversal of 3dfx and NVIDIA - 3dfx always had the cards with the texel fill rate double the pixel fill rate, while NVIDIA always kept them equal. With the launch of the VSA-100 and the GeForce 2 GTS, those roles are now reversed.
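The fill rate figures above follow from clock speed, pixels per clock and chip count. The Python sketch below reproduces them; it assumes the nominal "166MHz" clock is really 500/3 MHz (about 166.67), which is what the quoted 333 and 667 megapixel figures imply, and the function name is illustrative.

```python
# Assumption: the nominal 166MHz clock is 500/3 MHz, which is what the
# quoted 333 and 667 megapixels/s figures imply.
CLOCK_MHZ = 500 / 3

# Pixels each VSA-100 produces per clock, keyed by textures applied per pixel.
PIXELS_PER_CLOCK = {1: 2, 2: 1}


def fill_rate_mpixels(chips: int, textures_per_pixel: int,
                      clock_mhz: float = CLOCK_MHZ) -> float:
    """Theoretical fill rate in megapixels per second."""
    return clock_mhz * PIXELS_PER_CLOCK[textures_per_pixel] * chips


if __name__ == "__main__":
    for chips in (1, 2):
        for textures in (1, 2):
            rate = fill_rate_mpixels(chips, textures)
            print(f"{chips} chip(s), {textures} texture(s)/pixel: {rate:.0f} MP/s")
```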
Each chip has its own 128-bit pathway to its 32MB of SDRAM, meaning that each chip has the bandwidth of an SDR GeForce 256. When put together, the card theoretically has about 5.3GB/s of available memory bandwidth, exactly as much bandwidth as a GeForce 2 GTS. While some of that bandwidth is wasted during texture upload, there is no real waste when rendering since each chip will only access the textures it needs.


So while 3dfx falls considerably behind in terms of raw (theoretical) fillrate, 3dfx still has a fighting chance as the NVIDIA cards are severely limited by memory bandwidth. Coincidentally, the Voodoo5 has the same memory bandwidth as the GeForce 2 GTS, 5.3GB/s, although they go about achieving it in different ways. Both manufacturers use a 128-bit data path running at 166 MHz to memory. This would result in 2.66GB/s of memory bandwidth, but both companies double this - 3dfx by using two independent paths to memory on two separate chips to transfer twice as much data at once, while NVIDIA uses DDR memory to transfer data twice per clock.
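The bandwidth comparison can be checked with a short calculation. This is a sketch using only the figures quoted above (128-bit path, 166 MHz, two paths versus two transfers per clock); the function and parameter names are illustrative.

```python
def bandwidth_gb_s(bus_bits: int, clock_mhz: float,
                   transfers_per_clock: int = 1, paths: int = 1) -> float:
    """Theoretical memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock * paths / 1e9


if __name__ == "__main__":
    single_path = bandwidth_gb_s(128, 166)
    voodoo5_5500 = bandwidth_gb_s(128, 166, paths=2)               # two chips, SDR
    geforce2_gts = bandwidth_gb_s(128, 166, transfers_per_clock=2)  # one chip, DDR
    print(f"one 128-bit SDR path: {single_path:.2f} GB/s")
    print(f"Voodoo5 5500:         {voodoo5_5500:.2f} GB/s")
    print(f"GeForce 2 GTS:        {geforce2_gts:.2f} GB/s")
```

Both cards land on the same theoretical total. The difference is that the Voodoo5's total is split across two chips that each see only half of it.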
If NVIDIA is limited by this 5.3GB/s of bandwidth, it stands to reason that 3dfx could be limited similarly, and if the bottleneck is the same on both cards, it's possible the performance would be the same. Of course, all this theoretical talk is worthless if the results aren't borne out in the real world. We'll take a look at the real world performance in just a bit.
Our evaluation board featured eight 8MB 6ns SDRAM chips manufactured by Toshiba. The 6ns rating means that these chips should be able to work at 166MHz (which is what they’re clocked at) and not much higher. However, SDRAM chips are generally rated pretty liberally, meaning that a chip rated at 166MHz might be able to hit 183MHz. We'll take a more detailed look at overclocking in a moment as well.
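The link between a 6ns rating and 166MHz is simply the reciprocal of the cycle time. A small Python sketch, with illustrative function names:

```python
def rated_clock_mhz(cycle_time_ns: float) -> float:
    """Clock frequency in MHz for a given cycle time in nanoseconds."""
    return 1000.0 / cycle_time_ns


def percent_over_rating(rated_mhz: float, target_mhz: float) -> float:
    """How far a target clock sits above the rated clock, in percent."""
    return (target_mhz - rated_mhz) / rated_mhz * 100.0


if __name__ == "__main__":
    rated = rated_clock_mhz(6)  # the Toshiba 6ns chips on the review board
    print(f"6ns rating: {rated:.1f} MHz")
    print(f"183 MHz is {percent_over_rating(rated, 183):.1f}% over the rating")
```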
Each VSA-100 chip is covered by a high quality AAVID heatsink and fan. Although they are attached with a bit too much heatsink glue, the cooling on the Voodoo5 5500 is still quite good. To ensure that the dual VSA-100 chips get enough power, the board draws its power from the +5V power rail of your power supply courtesy of the 4-pin power connector present on the board. The reason for the board’s incredible length is that all the components required to regulate the power supplied to the board must be present, instead of relying on the AGP slot to provide the power and the motherboard to regulate the power supplied.
Despite claimed support for AGP 4x in the VSA-100 chip, our board apparently does not support it. As we've shown in the past, AGP 4x offers virtually no performance advantage over AGP 2x, so it's not really a big deal, but still worth noting.


The beauty of the Voodoo5 5500 AGP as an evaluation sample is that, by disabling one of the chips, we essentially have a 32MB Voodoo4 4500 AGP card that we can also use to illustrate the performance we can expect out of that solution. As mentioned previously, the Voodoo4 is not yet available thanks to a component shortage.


Drivers
When we first previewed a preproduction Voodoo5 5500, we proclaimed that:
This is where the biggest performance improvement will lie, in the drivers. It is quite obvious that the drivers we were provided with weren’t optimized for performance across the board, since the performance of the Voodoo5 at lower resolutions such as 640 x 480 was sub-par.
While we have definitely seen a slight improvement with the final drivers, the improvement is not as great as we had expected or hoped for. Overall we saw about a 1-3 fps increase at most resolutions in Quake 3 and Unreal Tournament - less than 5% - and that can be partially attributed to final silicon. We did occasionally see drops in performance, but this was mainly at lower resolutions where it's not even noticeable.
We'd also like to address the much publicized "correct method" to install the Voodoo5 drivers that is claimed to increase performance by up to three times. It was originally posted on 3dfxgamers.com's messageboards and discussed quite a bit in the AnandTech Forums as well. The problem appears when the card is identified as a "3dfx Voodoo Series," usually due to a previous 3dfx adapter in the system that was not uninstalled properly. If you did properly uninstall your old 3dfx card, or you never had one, this isn't an issue. When we originally tested the Voodoo5 for our preview, we used a cleanly formatted hard drive as we always do in our testing, so those performance numbers were not affected by this problem with the 3dfx installer. Needless to say, we've done the same thing in this final review.
There's also been some talk of refresh rates having a significant impact on performance. As with most recent cards we've tested, this is not the result that we have found in our testing. All our gaming benchmarks are performed at 60Hz with v-sync disabled.
Other than that, 3dfx Tools remains relatively unchanged from the prerelease version. In order to obtain WHQL (Windows Hardware Quality Labs) certification, Microsoft does not allow vendors to include overclocking utilities or the ability to disable vsync. 3dfx's solution? Include a link in the drivers to download their overclocking utility that also adds check boxes to disable vsync. This way, 3dfx gets their WHQL certification to keep OEMs happy, while end-users get their tweaking utilities - that way everybody ends up happy :) Kudos to 3dfx here for not requiring registry hacks or 3rd party utilities.





The Test


Quake 3 Arena Performance - AMD Athlon 750


The GeForce line absolutely dominates the benchmark standings at 640x480, most likely due to their hardware T&L engine. Note the minimal performance drop when moving to 32-bit for the Voodoo5. At least at this res, it has plenty of memory bandwidth to go around. Interestingly, even the venerable TNT2 Ultra is able to beat out the Voodoo5 5500 at this low resolution, showing that the Voodoo5 drivers are probably still immature.



At 800x600, the Voodoo5's memory bandwidth begins to flex its muscle as it's able to beat out the GeForce SDR and GeForce 2 MX in 32-bit color. The GeForce DDR is still faster, however, thanks to the combination of T&L and better drivers.


At 1024x768x16, the Voodoo5 5500 beats out everything but the DDR based GeForce models. It actually steps slightly ahead of the GeForce DDR in 32-bit color, once again thanks to better memory bandwidth.

At 1280x1024x32, we see the Voodoo5 approaching the performance of the GeForce 2 GTS. The similar performance levels we see here are likely due to the fact that these two cards have equal memory bandwidth. The GeForce 2 GTS still dominates in 16-bit color though thanks to good old raw fillrate.

Nothing much changes at 1600x1200 as the Voodoo5 still can't overtake the GeForce 2 GTS. The GeForce DDR is not too far behind.

Quake 3 Arena Performance - Intel Pentium III 550

The results with the Pentium III 550E are similar to those of the Athlon 750. The biggest change is actually the fall of the TNT2 Ultra, although it's not exactly clear why.


Once again, similar results to the Athlon 750 test bed - the Voodoo5 overtakes the low end GeForce models in 32-bit color as memory bandwidth begins to become a significant factor.

At 1024x768, the Voodoo5 5500 finally steps ahead of the two low end GeForce models in 16 and 32-bit color, while getting dangerously close to the performance of a GeForce DDR. Thank raw fillrate for the lead in 16-bit color and memory bandwidth for the lead in 32-bit.

Quake III Performance - Pentium III 550E (HiRes)

Just like in the Athlon test bed, the Voodoo5 5500 overtakes the GeForce DDR by a healthy margin at 1280x1024 in 16 and 32-bit color. Only the GeForce 2 GTS can beat out the top of the line 3dfx card, and only by a measly 2.8 fps in 32-bit color. Raw fillrate keeps the GTS out in front by a large margin in 16-bit mode.

Nothing much changes at 1600x1200. The GTS's raw fillrate maintains the large lead in 16-bit, while memory bandwidth limitations keep them neck and neck in 32-bit color.

Martin,
Hardware Hackers Team
Last edit: 6 years 3 months ago by Martin.

6 years 3 months ago #2 by Martin
Replied by Martin on topic 3Dfx Voodoo 5 5500 - Anandtech - 11/07/2000
Quake 3 Arena quaver.dem Performance - Athlon 750

The quaver demo is designed to show how a card handles a heavy texture load. Notice that even at 640x480x32, the cards without texture compression drop rapidly in performance.

The ATI cards are a perfect example of this. Apparently NVIDIA's texture management on the TNT2 is quite good, at least good enough to keep it performing better than most cards in 16 and 32-bit color. The Voodoo3 also does surprisingly well, but this is probably because it is limited to 256x256 textures, while the other cards have to deal with the larger textures found in the quaver demo.
The Voodoo5 is pretty far down the list, but its 32-bit performance is still decent.



At 800x600, we see standings that look a bit more normal. All the cards without texture compression continue to fall off rapidly. The Voodoo5 continues on strong, slotting right between the GeForce 2 MX and GeForce SDR in 32-bit color.



The Voodoo5 5500 follows its previous pattern of moving up the standings as resolution goes up. At 1024x768, it's a fair amount slower than the GeForce 2 MX in 16-bit color but also quite a bit faster in 32-bit color, which is where quaver really stresses the cards. It's actually noticeably faster than a GeForce DDR in 32-bit color as well.



The trend continues at 1280x1024, with the Voodoo5 now able to top the GeForce DDR and approaching the GeForce 2 GTS in 32-bit color. Cards without texture compression really hit the wall in 32-bit color at this higher resolution as the space left for textures is quite small.



UnrealTournament Performance
Quake III Arena is still the best gaming benchmark because it scales properly with CPU speed as well as the resolution it is run at. It also implements most of the features that upcoming games (first person shooters) will be using and thus provides an excellent metric for card performance under Quake III Arena, as well as the performance of the card in general.
Unfortunately, there is no Direct3D equivalent of Quake III Arena in terms of a good benchmark, as UnrealTournament, while it is a great game, is a horrible benchmark. Results in UnrealTournament vary greatly and the game does not scale very well with CPU speed or with resolution. We included benchmarks using our own UnrealTournament benchmark, but the results aren’t nearly as reliable as those from Quake III Arena.

In general, the performance of UnrealTournament on a system is just fine with a TNT2/Voodoo3 at resolutions of 1024 x 768 x 16 and below; once you get above that mark, you begin to hit the fill rate limitations of the TNT2/Voodoo3.
In the end, the benchmarks you should pay the most attention to are the Quake III Arena benchmarks, because those say the most about the performance of the card. If you’re a big UT fan, you should be fine with something that’s around TNT2 speed as long as you’re going to keep the resolution below 1024 x 768. If you go above that, you’ll need something that has a higher fill rate than a TNT2 (i.e. GeForce or Voodoo4/5). If you’re going to draw any conclusions from the UnrealTournament benchmarks, be sure to pay the most attention to the scores above 800 x 600 because the game is limited by more than one factor at lower resolutions.
It should be noted that the UnrealTournament scores are not only useful for looking at UT performance, but also for games based on the UT engine, such as the recently released Deus Ex.
While UnrealTournament does offer native support for Glide, we refrained from testing the Voodoo5 in Glide. Why? Have a look at the performance numbers taken from a Voodoo5 running UnrealTournament in Glide vs Direct3D:



The first thing to notice is that there is no performance difference between 16 and 32-bit color when running in Glide. This is most likely due to UnrealTournament not allowing 32-bit color/textures when running in Glide, especially since the scores were perfectly identical between 16 and 32-bit color modes.
Secondly, as the resolution increases, the performance of the Voodoo5 running in Glide mode drops below that of Direct3D. One possible explanation is that Glide mode wasn't meant to be run at such high resolutions.
Regardless, the UnrealTournament scores were already difficult to explain because of the number of limitations acting on the UnrealTournament engine (look back at our reasons that UnrealTournament isn't a good benchmark) and adding Glide scores wouldn't do much good other than adding two more lines to the graphs.


While all the cards perform pretty close to each other, only the Voodoo5 5500 is really able to stand out from the pack. In 32-bit color, it's faster than everything else out there in 16-bit. The results are more or less the same at resolutions below 1024x768, although the older cards do take a larger performance hit due to fillrate limitations.

Once again, the Voodoo5 5500 is out on top, with its 1280x1024x16 score faster than most cards at 1024x768x16. In 32-bit color, the Voodoo5 is head and shoulders above the rest, over 30% ahead of the nearest competitor, the GeForce 2 GTS.

At 1600x1200 the Voodoo5 5500 takes a sudden dive in performance. It's actually running at the same speed as the Voodoo4, despite having twice the fillrate and twice the memory bandwidth. It's not clear exactly what's going on here, but it could be driver immaturity cropping up again.

UnrealTournament Performance - Intel Pentium III 550E





The results for the Pentium III 550E pretty much mirror those under the Athlon 750.

Overclocking
Overclocking the Voodoo5 5500 is a bit different than other cards for two reasons. First, you're trying to overclock two chips at once. With a dual chip solution, such as the 5500, the core on both chips must be able to achieve the desired speed for a successful overclock to occur. Thus, your chance of getting a poor overclocking chip is doubled. Granted, your chance of getting a chip good at overclocking is also doubled, but you're limited by the weaker chip. Further, you have interactions between two chips now, which can complicate things further. It's just like trying to overclock a dual CPU system - it's always trickier when you introduce more factors.
The next big problem is that the core and memory clock speeds are synchronous, meaning they run at the same frequency. Once again, this makes the weak link the limiting factor. For example - say your core is capable of hitting 183 MHz, but your memory only runs at 170 MHz, you'll be stuck at 170 MHz for both core and memory clocks. The same goes if the memory is able to overclock further than the core, you'll be limited by the core clock.
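Put together, the two constraints mean the card's shared clock is capped by its weakest component. A minimal Python sketch follows; the per-chip limits in the example are hypothetical numbers in the spirit of the 183/170 MHz illustration above, not measurements.

```python
def max_shared_clock_mhz(core_limits_mhz: list[float], memory_limit_mhz: float) -> float:
    """Highest clock a synchronous multi-chip card can run.

    Every core and the memory share one clock, so the slowest
    component sets the ceiling.
    """
    if not core_limits_mhz:
        raise ValueError("at least one core limit is required")
    return min(min(core_limits_mhz), memory_limit_mhz)


if __name__ == "__main__":
    # Hypothetical card: one core good for 183 MHz, the other for 190 MHz,
    # memory stable only up to 170 MHz.
    print(max_shared_clock_mhz([183, 190], 170))  # 170
    # Same cores with better memory: the weaker core is now the limit.
    print(max_shared_clock_mhz([183, 190], 200))  # 183
```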


By no means were these potential limitations going to stop us from pushing our evaluation sample to the max. As mentioned previously, 3dfx was kind enough to include in their drivers a download link to their overclocking utility. Once installed, a new tab in 3dfx tools appeared, not surprisingly labeled "3dfx Overclock." From there you just adjust the slider to the desired speed and press OK, at which point it's necessary to reboot the system. After rebooting, a dialog box pops up letting you know that you need to click a "confirm" button on the 3dfx Overclock driver tab if you want the settings to be applied every time you boot. This is designed as a safety measure to ensure that you are always able to get your system up and running, even if you overclock too far. For that reason, make sure you thoroughly test your settings before clicking that "confirm" button.
Whatever speed you select, a warning pops up stating that overclocking could damage your card and 3dfx recommends not overclocking more than 10% - basically the standard warnings you get from everyone about overclocking. Interestingly enough, we were only able to push our card almost exactly 10% above the default 166 MHz clock to 183 MHz. Pushing beyond 183 MHz caused screen corruption in 2D mode immediately after booting up, suggesting that heat is probably not the problem. Of course, results will vary from card to card, as well as by RAM type (we had Toshiba 6ns chips on our board).

Martin,
Hardware Hackers Team

6 years 3 months ago #3 by Martin
Replied by Martin on topic 3Dfx Voodoo 5 5500 - Anandtech - 11/07/2000
Overclocked Performance - Quake III Arena


Interestingly enough, even at low resolutions overclocking the Voodoo5 results in a noticeable increase in performance, albeit a small one. Generally at these low resolutions, cards are CPU or driver limited, so we don't usually expect any performance improvements from overclocking.


At 800x600 and above, we see healthy improvements of up to 5-6 fps. Not a ton, but not bad for a 10% overclock. At 1024x768x32, the overclocked Voodoo5 5500 is able to edge out the stock GeForce DDR by a noticeable margin.


At 1600x1200, we see that the Voodoo5 5500 is entirely fillrate bound since a 10% increase in clock speed results in a direct 10% improvement in performance.


Overclocked Performance - Unreal Tournament




UnrealTournament offers too many bottlenecks on a system and generally prevents overclocking from having a significant effect on performance. At resolutions below 1024x768, we observed no difference in performance at all between our 183 MHz overclocked settings and the stock settings. At 1024x768 and above, we see the effects of fillrate limitations kicking in, especially in 32-bit color. This is where overclocking pays off the most of course, although it's still only a few fps.

Full Scene Anti-Aliasing (FSAA)
Before we dive into the FSAA support of the Voodoo5 we tested, let's take a look at what FSAA is and how it is accomplished on the Voodoo5 courtesy of its "T-Buffer."
We've all probably seen aliasing rear its ugly head, even if you don't use any 3D graphics, and that's because it's a problem even in the world of 2D computer graphics. This can be seen in the "jaggies" found in computer graphics around diagonal lines and round edges as shown below. This is usually what comes to mind when thinking about aliasing issues, and it's just as much of a problem in the 3D world, if not more. To get technical, this is known as spatial aliasing, where, as the name implies, the problem occurs in space.



Anti-aliasing is a technique that removes these "jaggies" by filling in with intermediate shades to smooth things out. This is relatively easy to implement in 2D and is even available from Windows 98 display properties for screen fonts. But in 3D, things become exponentially more complicated and no consumer solution can implement true anti-aliasing in hardware. Further, in 3D there's the additional problem that certain distant objects end up being less than a pixel wide on screen and are sometimes shown, but other times not. This is known as pixel "popping," and is a potentially larger problem than just "jaggies."




Many cards claim support for anti-aliasing by implementing "edge" anti-aliasing or anti-aliasing through "oversampling." Edge anti-aliasing is accomplished by tagging which polygons are an edge and then going back and letting the CPU perform anti-aliasing on these edges after the scene is rendered. In order for a game to support this, it has to be designed with this in mind as the edges have to be tagged. The extra steps cause serious latency issues and suck up all the CPU power.
Oversampling is simply rendering a scene at a higher resolution than the final output and then scaling it down. This technique is implemented by the PowerVR architecture and NVIDIA's GeForce line (using 5.xx Detonator Drivers). Only the GeForce 2 GTS really has enough power to take advantage of oversampled FSAA at reasonable resolutions and frame rates. If you listen to 3dfx, oversampling does not provide the same level of image quality that the T-Buffer does, and we tend to agree. The T-Buffer provides true full scene anti-aliasing that solves both pixel popping and jaggies. Perhaps the best thing about the T-Buffer is that it is simply turned on in the driver and is then automatically applied to any game ever written for any API. As a complete hardware solution, there is no software or driver overhead.
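As a rough illustration of oversampling, the Python sketch below renders nothing itself; it takes an already "rendered" high resolution grayscale image and scales it down by averaging each block of pixels. The box filter is an assumption made for simplicity, since the review does not say which filter the PowerVR or NVIDIA drivers use, and the function name is made up.

```python
def downsample(image: list[list[float]], factor: int) -> list[list[float]]:
    """Scale an image down by `factor` in each direction.

    Each output pixel is the average of a factor x factor block of
    input pixels (a simple box filter, assumed here for illustration).
    """
    out_height = len(image) // factor
    out_width = len(image[0]) // factor
    result = []
    for oy in range(out_height):
        row = []
        for ox in range(out_width):
            total = 0.0
            for dy in range(factor):
                for dx in range(factor):
                    total += image[oy * factor + dy][ox * factor + dx]
            row.append(total / (factor * factor))
        result.append(row)
    return result


if __name__ == "__main__":
    # A hard diagonal edge "rendered" at 4x4: white (255) above, black (0) below.
    hi_res = [[255 if x > y else 0 for x in range(4)] for y in range(4)]
    for row in downsample(hi_res, 2):
        print(row)
```

The pixels straddling the edge come out as intermediate shades rather than pure black or white, which is the smoothing effect anti-aliasing is after.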

FSAA Performance
The one thing that 3dfx failed to mention when they first started talking about FSAA was the performance hit. The Voodoo5 offers two forms of FSAA, 2 sample and 4 sample (the Voodoo4 will only offer 2 sample FSAA for performance reasons). The 2 sample FSAA essentially renders the scene twice and blends the two scenes in order to remove some of the "jaggies" while 4 sample FSAA renders the scene four times in order to remove most of the "jaggies" present in that particular scene.
Because it is simply re-rendering a scene x-number of times, 2 sample FSAA will reduce the fill rate to 1/2 of what it was without FSAA enabled and 4 sample FSAA will reduce the fill rate to 1/4 of what it was without FSAA. Now if you're running a game at a resolution that isn't hitting the peak fill rate of the Voodoo5, then the performance hit caused by moving to 2 or 4 sample FSAA should be noticeable but will still allow your game to play smoothly.

Racing and Flight Simulator games would fall into this category since they generally aren't fill rate limited and aren't that demanding on the fill rate of a video card.
However, if you have a game that is beginning to expose the fill rate limitations of your card, such as Quake III Arena, then enabling 2 or 4 sample FSAA could be deadly to your frame rate and render your game virtually unplayable.
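The fill rate arithmetic above can be written out as a rough sketch. The 667 megapixel/s peak figure is the one quoted for the Voodoo5 5500; the function names, the example resolution and the overdraw factor of 3 are assumptions picked purely for illustration, and the result is a theoretical ceiling, not a measured frame rate.

```python
def effective_fill_rate(peak_mpixels, samples):
    """Fill rate left over when every pixel is rendered `samples` times."""
    return peak_mpixels / samples


def fps_ceiling(peak_mpixels, samples, width, height, overdraw):
    """Upper bound on frame rate if fill rate were the only limit.

    `overdraw` (average number of times each screen pixel is drawn per
    frame) is a hypothetical parameter for this sketch.
    """
    pixels_per_frame = width * height * overdraw
    return effective_fill_rate(peak_mpixels, samples) * 1_000_000 / pixels_per_frame


PEAK = 667  # megapixels/s, Voodoo5 5500 figure from the review

for samples in (1, 2, 4):
    rate = effective_fill_rate(PEAK, samples)
    ceiling = fps_ceiling(PEAK, samples, 1024, 768, overdraw=3)
    print(f"{samples} sample(s): {rate} MP/s, ceiling about {ceiling:.0f} fps")
```

A game that needs far fewer pixels per second than the ceiling barely notices the division by 2 or 4. A game already close to it loses frame rate almost in proportion.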
We chose Quake III Arena's demo001 benchmark in order to illustrate the worst case scenario effects of enabling 2 and 4 sample FSAA and what kind of performance hit you'll be taking because of it. Keep in mind that a first person shooter isn't the best place for FSAA, but because a first person shooter is also the most fill rate demanding type of game it represents the largest performance hit you can expect when enabling 2/4 sample FSAA.



As you can see, the performance hit is incredible in Quake III Arena, which makes FSAA an unrealistic option for Quake III or any other first person shooter. At the same time, FSAA doesn't really make that big of a difference in first person shooters because the action is so fast paced that you don't really have the time to notice whether or not the bloody steps you're walking on are jagged.
Switching to 32-bit color only worsened the performance hit caused by 2/4 sample FSAA. The tests wouldn't even complete in 4 sample FSAA mode at some of the higher resolutions in 32-bit color mode.
The games in which FSAA truly shines don't need a video card with a 667MP/s fill rate, or even one with a 480MP/s fill rate. They run just fine on something like a Voodoo3 or a TNT2, which makes the performance hit caused by 2 or 4 sample FSAA much easier to bear.
We played Need for Speed 5: Porsche Unleashed at 800 x 600 with 4 sample FSAA enabled at very reasonable frame rates. We estimated the performance at 60 - 80fps at 800 x 600 x 16 with 4 sample FSAA enabled. Unfortunately, moving over to 32-bit color dropped the performance noticeably; we estimated it at around 25 - 45fps at 800 x 600 x 32.
Is it worth it? We think so, at least in games like NFS5, but because the final decision is up to you, we put together some screenshot comparisons for you all to have a look at.

At 16-bit color the GeForce2 GTS's 1.5 x 1.5 setting is faster than the Voodoo5 5500 with FSAA off; however, the move to 32-bit color hurts the GeForce2 GTS quite a bit. The Voodoo5 5500's 4X FSAA is the slowest setting out of the bunch but depending on your perspective it is potentially the best looking. 3dfx would like to claim that their 2X FSAA is better than NVIDIA's 2x2 FSAA in terms of image quality, and these numbers show it's faster as well. We'll have a more in depth comparison of the GeForce 2 GTS FSAA and the Voodoo5's FSAA in an upcoming FSAA comparison article.

FSAA Image Quality in OpenGL
The most important metric with which to gauge FSAA "performance" is the resulting image quality. For that, let's head to the screen shots.
All screenshots were taken using Hypersnap-DX available at www.hyperionics.com

4 Sample FSAA - Voodoo5 5500


FSAA Setting 2 (highest quality) - GeForce2 GTS


While looking a bit more washed out than the GeForce2 GTS (because of the default MIPMap LOD settings that we'll discuss later), the Voodoo5 does seem to have an advantage when it comes to FSAA, but it is just a slight one under OpenGL, and some will argue that at the middle and highest quality settings (1 & 2), the GeForce2 GTS looks just as good as the Voodoo5 5500 with FSAA enabled.
FSAA in OpenGL doesn't matter nearly as much as FSAA in Direct3D, which we will get to shortly.

Level of Detail (LOD)
A number of readers and websites have noticed how 3dfx cards have a tendency to look a bit blurry and/or washed out, regardless of resolution and now color depth. In the Voodoo3 days, this was partially blamed on the Voodoo3's 256x256 texture size limitation, its 16-bit rendering, and the post frame buffer filter that 3dfx applies to all their 16-bit rendering to get their "22-bit equivalent" color. The Voodoo5 eliminated all of these limitations, but we still had blurry / washed out image quality. Thus, some investigation was necessary.
The Reverend's Pulpit, the site that brought us the Quake 3 Quaver demo / benchmark, was one of the first to scoop the info on the 3dfx Level of Detail (LOD) settings. The secret is LOD Bias, which is apparently set differently by default on other manufacturers' cards. LOD Bias essentially compensates for textures that would otherwise be under or over sampled. Mipmapped textures can be made sharper with a negative bias or blurrier with a positive bias. Unfortunately, by increasing the LOD (with a negative value), aliasing artifacts and texture shimmering in the distance are exacerbated to the point of being virtually unbearable at the most detailed settings. The effect is not illustrated too well with screenshots, but it is very obvious as you are moving through the game. Of course, FSAA is the solution to that little problem.
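To make the sharper / blurrier behaviour of LOD Bias concrete, here is a minimal sketch of the usual mipmap selection rule: the level is the base-2 logarithm of how many texels cover one screen pixel, plus the bias, clamped to the levels that exist. The function name mip_level and the idea that one unit of bias equals one mipmap level are assumptions for this example. How 3dfx's driver setting maps onto actual mipmap levels is not something we know.

```python
import math


def mip_level(texels_per_pixel, lod_bias, max_level):
    """Pick a mipmap level (0 = full detail) for one pixel.

    texels_per_pixel: how many texels of the full-size texture span one
        screen pixel along an axis (must be > 0).
    lod_bias: negative values select a more detailed (sharper) level,
        positive values a less detailed (blurrier) one. Units of
        "one bias step = one mip level" are assumed here.
    max_level: index of the smallest mipmap available.
    """
    level = math.log2(texels_per_pixel) + lod_bias
    return min(max(level, 0.0), float(max_level))


# A distant surface where 4 texels land on each pixel:
print(mip_level(4.0, 0.0, 8))   # 2.0, the neutral choice
print(mip_level(4.0, -1.0, 8))  # 1.0, sharper but more prone to shimmer
print(mip_level(4.0, 1.0, 8))   # 3.0, blurrier
```

With a negative bias the card samples a texture that is too detailed for the pixel it lands on, which is exactly the under-sampling that shows up as shimmering in motion.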



The Reverend was able to get a hold of a "special .inf install file" from 3dfx that enables a LOD slider in the 3dfx Tools driver utility. Supposedly, the LOD slider will be included by default in future 3dfx driver releases, but we have no word on when exactly that will happen. Voodoo-Now has released the appropriate registry patch to enable the slider in the shipping driver for those that can't wait. There are a total of 17 settings, -8 through +8 with 0 the default. Of course, the -8 setting is the slowest and 3dfx believes that 0 is a good balance between performance and quality. At a LOD of -8, you almost have to run at 4X FSAA to eliminate all the texture shimmering. The secret is to achieve a good balance between LOD Bias and FSAA samples while maintaining an acceptable level of performance and image quality. As usual, it's easier said than done.
Just what type of performance hit are we talking about and where is that sweet spot? Let's take a look.

The performance hit that results from a higher quality LOD is pretty small, regardless of color depth or FSAA settings. So for those already running in 4X FSAA mode, we highly recommend enabling and using either the -8 or -4 LOD settings. Users running in 2X FSAA mode should probably stick to the -4 or -2 LOD settings. Without FSAA enabled at all, the only really usable setting is -2, which helps image quality a bit. By the same token, the performance gained by reducing the image quality (positive LOD bias) is also minimal and simply not worth the excessively blurry image quality that results.

Conclusion
In most cases, the Voodoo5 5500 and GeForce DDR are neck and neck, while the Voodoo4 4500 is having quite a bit of difficulty keeping up with a GeForce SDR or GeForce2 MX. At higher resolutions, we do see the Voodoo5's raw fillrate allow it to surpass the GeForce DDR. Of course the raw power of the GeForce 2 GTS keeps NVIDIA in the lead in the vast majority of the benchmarks. To add insult to injury, the GeForce 2 GTS is actually cheaper than the Voodoo5 5500 with a bit of looking. Along the same lines, the GeForce 2 MX beats out the Voodoo4 4500 in most benchmarks and is cheaper as well. The exception to all this is Unreal Tournament and games based on its engine, where 3dfx dominates the standings, even with the old Voodoo3.
FSAA is where the Voodoo5 5500 really shines - its performance is close to that of the GeForce 2 GTS, but the image quality is noticeably better, at least in our opinion. Whether you need the FSAA effects or not is entirely up to the individual gamer to decide. If the focus of your game play is first person shooters where high frame rate is critical and there's little time to notice eye candy, then you probably won't get any real benefit from the FSAA support of the Voodoo5. On the other hand, if you're really into racing games or flight simulators where frame rate is less critical and there's more time to take in the visuals, then FSAA definitely comes in very useful.

The argument that running a game at 1280 x 1024 or 1600 x 1200 looks just as good as 2 or 4 sample FSAA is not entirely true. While a game running at 800 x 600 with 2 sample FSAA may remove just as many jagged edges as running it at 1600 x 1200, there is a clear difference between running at 1600 x 1200 without FSAA enabled and at 800 x 600 with 4 sample FSAA enabled. It is really a subjective question, but the difference is noticeable. Whether it is worth the performance penalty is a different question altogether.
It basically comes down to this - if you want the fastest frame rates possible, go with the GeForce 2 GTS. For the best price / performance ratio, the current leader appears to be the GeForce 2 MX. Once again, the trump card that 3dfx currently holds is their higher quality FSAA implementation. Let's hope that is enough to allow the Voodoo5 5500 to carry them over until the release of the Voodoo5 6000 and/or their next generation product. That next generation product will be the key to 3dfx's future in the 3D accelerator market - they must have it out in time to compete with the NVIDIA NV20, rumored to be coming this fall, and it must match the NV20's performance.

Martin,
Hardware Hackers Team
