Jump to content United States-English
HP.com Home Products and Services Support and Drivers Solutions How to Buy
» Contact HP
HP.com home

Scalable interactive ray-casting of volumes using off-the-shelf components

White paper: Advanced visualization collaboration
» 

Collaboration & Competency Network

Site information

» Join this collaboration
» Contact us

Collaborations

» Accelerator
» Advanced visualization
» Computational & data grids
» Global File System (SFS)
» HPC system management
» Related links

 

HP CCN network

Content starts here
   
Authors:

Santiago Lombeyda1 Mark Shand3
Laurent Moll 2 David Breen 1 Alan Heirich 2

1 California Institute of Technology
2 Formerly with Compaq Computer Corporation
3 Hewlett-Packard Corporation

   
 

Abstract

   
 

This paper describes an application of the second generation Sepia architecture implementation (Sepia-2) to support interactive visualization of scalar fields represented as very large 3D rectilinear grids. By employing pipelined sort-last associative blending operators a demonstration system yields scalable interactivity at 30 frames per second. We believe these results can be extended to support other types of structured and unstructured grids and a variety of GL rendering techniques.

We show how to extend our single-stage demonstration system to larger multi-stage networks. This requires solving a dynamic mapping problem for blending operators that are similar to Porter-Duff compositing operators. We conclude by discussing technical and fundamental issues to address in future work.

   
Skip links for this subsection
Advanced visualization
» Overview
» Member benefits
» Engagement model
» Resources
» Members
» HP contacts
» Join this collaboration
» Contact this collaboration
News
  » HPCwire's 2007 Readers' and Editors' Choice Awards for HP Scalable Visualization Array and The Parallel Compositing Library
  » Parallel Compositing Library Overview from SC07
  » HP Expands Open Source Offerings
  » Parallel Compositing Library @ SourceForge
  » Presentations from SC06 “Visualization Using Linux Clusters” BOF
  » Learn more about HP's Parallel Compositing Sample Implementation and specification
  » Budapest University of Technology and Economics
 
HP CCN annual meeting
Subsection links finish here
     

1. Introduction

In Heirich and Moll [hei99] a commodity-based architecture was presented for scalable visualization in distributed systems. This architecture allows large numbers of rendering servers to post images on rectilinear tiles of a large display, and to apply compositing, blending, and other pixel-level operators to those displayed images. This paper reports on the application of a second generation implementation (Sepia-2) of this architecture to structured volume visualization. The goal of this current work is to demonstrate that large volume datasets (3D rectilinear grids of scalars) can be interactively rendered with a networked configuration of off-the-shelf components. Volume rendering is performed by a proprietary ray-casting engine, the VolumePro 500 [pfi99]. This engine implements object-order ray casting using a shear-warp factorization of the viewing matrix [lac94].

We present an algorithm for concurrent subvolume rendering followed by pipelined view-dependent tiling and blending (Figure 1). This algorithm requires dynamic view-dependent remapping of the blending pipeline on each frame, similar to requirements posed by Porter-Duff compositing in XFree86 [por84]. We demonstrate how to map the algorithm onto a large multi-stage network configuration, consisting of PCs with blending/compositing (Sepia-2) boards and ray-casting engines, connected by a high speed network, and utilizing a GL-based graphics board for display (Figure 2). Our test configuration demonstrates interactive rendering at 30 frames per second which is the maximum frame rate of the ray-casting engine. Extrapolations from a small demonstration cluster suggest that interactivity will be sustainable in clusters of 1024 servers and larger.

The Sepia architecture is platform neutral beyond the unavoidable obligation to support device drivers and programming infrastructure. It is being prototyped for commercial use in various kinds of computing platforms. These range from clusters of workstations or PCs containing Intel CPUs, to clusters of Alpha servers, and large NUMA multiprocessor systems with hundreds of AGP graphics accelerators and Alpha CPUs. All of these environments support Linux and/or Unix. The configuration presented in this paper uses Pentium-III graphics workstations running Microsoft Windows NT. By clustering next generation ray casting engines we expect to visualize 3D gigavoxel data sets interactively using as few as 8 of these nodes.

2. Related work

A number of projects are exploring hardware, software, and algorithms for scalable rendering using commodity components [sto01,bla00,eld00,sam00,hum00,hei99]. The Sepia architecture is probably most similar to the Lightning-2 project [sto01]. Both architectures distribute image pixels across a network with switching between source and display. Sepia switches at the level of tiles, Lightning-2 at the level of scanline fragments. Unlike Lightning-2 Sepia’s approach does not produce visible artifacts and can support non-commutative operators, of which Porter-Duff operators are an example. The Metabuffer [bla00] is a more feature-rich approach to building a scalable display subsystem, with support for independent viewports that span tile boundaries and have independent multi-resolution scaling. Like Lightning-2 the Metabuffer is built on a half duplex mesh topology. In this paper we demonstrate how to implement pipelined blending operations in the Sepia architecture using a full-duplex multi-stage network. All three of these projects (Lightning-2, Metabuffer, Sepia) transfer pixels over a network and are complementary to coder-decoder approaches like WireGL [hum00] that use the network to distribute graphical primitives. Both of these architectural approaches are bandwidth limited in current generation components but will become practical in high bandwidth commodity technologies like HyperTransport and InfiniBand.

According to the taxonomy introduced by Molnar [mol94] the Sepia architecture is compatible with sort-first or sort-last approaches to building graphics systems. Our main interest has been in sort-last approaches due to their scaling advantages. The state of the art in algorithm development for both approaches is represented by companion papers in these symposium proceedings.

Figure 1
Figure 1. Shear-warp volume rendering (top), parallel clustering (bottom, left), Sepia-2 block diagram (bottom, right). Object-order ray casting provides optimal data throughput but results in a distorted image when the view vector is not axis-aligned with the data volume (top). In standard shear-warp rendering the distorted Base Plane (BP) is corrected by a 2D warp on an OpenGL accelerator. This approach can be applied in parallel with concurrent subvolume rendering, followed by blending and tiling (bottom, left) and then a final 2D warp. The result of tiling a set of BPs is a Superimage Base Plane (SBP). Blending is performed by the Sepia-2 card (bottom right) which merges a 30 Hz series of locally computed SBPs with SBPs arriving and departing through the network interface.

3. TeraVoxel rendering with shear-warp viewing factorization

Our scientific interest in this problem is driven by two projects in large data visualization funded by the National Science Foundation. The TeraVoxel project is constructing an electronic flow visualization system to capture snapshots of physical fluid experiments in real time followed by interactive visualization and exploration. The Kilo-Frame per Second (KFS) camera collects regularly gridded 1024 2 cross sectional samples of a fluid pressure field, corresponding to data a rate of one gigavoxel per second. Our current goal is to visualize one second (1024 samples) of this data interactively. Another scientific interest, and a focus of the Large Data Visualization initiative, is interactive rendering for full-resolution MRI data sets. Current MRI data sets of one quarter to one half gigavoxel are typically rendered as 256 3 reduced data sets. Both MRI systems and the KFS camera produce rectilinear data sets and can be supported by this demonstration.

Object-order ray casting makes efficient use of general purpose CPUs by organizing the ray casting computation for optimal data access. The same principle applies to a dedicated hardware ray casting engine like the VolumePro. In both cases a shear-warp factorization of the viewing matrix allows efficient image generation regardless of viewpoint [lac94]. Leaving aside issues of scaling and axis reordering, the shear matrix S defines a hexagonal projection of the volumetric data onto a volumetric axis-aligned Base Plane (BP). The warp matrix W corrects distortion in the Base Plane image that results from misalignment with the image plane, and is defined in terms of the standard modelview matrix M as W = M x S –1 . This warp is a 2D operation and is usually computed by mapping the BP image onto a polygon as an OpenGL texture.

Parallelizing this algorithm is straightforward. Data is mapped statically onto rendering servers (computers equipped with VolumePro cards) which all generate images simultaneously. When the images are ready they are merged in a view-dependent order using associative blending operators. The final tiled and blended result is corrected by the 2D warp just as in the serial algorithm.

3.1 Blending arithmetic

The ray casting computation blends the successive contributions of an ordered set of voxels that intersect a line of sight. If V(acc, s) corresponds to a blending operator that combines a new voxel sample s with a total accumulated result accT then the result of a front to back ray casting computation is:

Equation 1

where s1... sn are successive voxel samples and 0 represents the contribution from a transparent voxel. VolumePro [pfi99] implements the following front-to-back blending operator V for all color channels C (where α is the opacity channel, Cacc the accumulated color, αs the opacity of sample s, etc.)

Equation 2

Parallelization breaks (1) into two or more concurrent subcomputations, for example: acc1 = V(V(...V(V(0,s1),s2),...),sn/2) and acc2 = V(V(...V(0,sn/2+1),sn/2+2),...),sn). The blending computation in the Sepia-2 firmware must compute a function F such that F(acc1, acc2) = accT. In addition this F must be associative (in order to scale above two) with an identity element 0 that corresponds to a transparent pixel. We use the standard approach of pre-multiplication by α and define a function P on pixels s

Equation 3

We can easily verify that the following F is associative with identity element 0 (a transparent black voxel) and further that V(acc, s) = F(acc, P(s)).

Equation 4

We conclude that F(acc1, acc2) gives the same arithmetic result as the original ray casting computation.

Equation 4

We note that our computations use the 8 bit pixel values from the BP, but that VolumePro uses internal 12 bit blending. As a result our F will produce more rounding error than the original accT. This will affect the accuracy of the result at large scales.

3.2 Dynamic mapping for pipelines in full duplex multi-stage networks

Any distributed algorithm for volume visualization must solve a problem of routing pixel data according to a dynamically changing viewpoint. A less stringent form of this problem also occurs in supporting Porter-Duff compositing for XFree86. Blending operators are non-commutative and must be computed along a line of sight. When data is mapped statically onto computers, as it is in our system, the blending order will change as the viewing direction changes. In the Sepia architecture blending requires solving a Hamiltonian circuit problem on each frame: given the current viewing direction, find a contention-free mapping of the blending pipeline onto the network that visits every Sepia-2 card once and only once.

The network technologies that we work with (VIA and Infiniband) implement full duplex destination-based routing. Network switches are programmed with routing tables, lookup functions that provide the outgoing local port address for each network packet as a deterministic function of the destination address contained in the packet header. This enforces a condition that only a single path may exist between any source and destination pair, although aliasing relaxes this condition to a finite number of paths. A single routing table must be defined to support every permutation of node orderings. This contrasts with other technologies (e.g. Myrinet) that implement source routing, where every packet is prefixed with instructions for routing it through the network.

In a single-stage network the mapping problem is solved trivially. Any full bisection crossbar allows the full set of permutations of nodes 1...n such that each node i connects to node i+1 by a contention-free path, with node n identified arbitrarily with the display server. The resulting connectivity graph is a star, in which the Hamiltonian circuit is trivially embeddable.

We will solve the mapping problem on three-stage symmetric Clos networks [clo53] which are isomorphic to two-stage fully interconnected full duplex networks with identified inputs and outputs. The Slepian-Duguid theorem [sle90] defines conditions for contention-free mapping in a Clos network. A set of contention-free routing tables exists if the number of switches in stage two is greater than or equal to the number of links connecting each switch in stage two to switches in stage one. The theorem provides a constructive algorithm for generating a set of contention-free routing tables under these conditions.

In order to solve our Hamiltonian circuit problem in a full duplex network we note that the solution is equivalent to solutions of two concatenated Clos mapping problems, one in the forward direction and another in the reverse direction, if the outputs of the first problem are identified with the inputs of the second problem. The union of the routing table solutions for these two problems is a set of routing tables (possibly after aliasing) that allow any Hamiltonian circuit to be embedded without contention. This provides a constructive solution to mapping in two-stage fully interconnected full duplex networks using switches with any number of ports. This result can be extended inductively to as many stages as may be required by any size configuration.

4. Demonstration system

The algorithm for using the VolumePro in parallel, and a Sepia-2 block diagram, are illustrated in Figure 1. The data volume is distributed to a set of rendering servers which all generate images of their local subvolumes. Each of these images is acquired by a Sepia-2 card, which blends it in pipeline order with an intermediate blended result from the preceding Sepia-2 card. The fully blended result is delivered by a Sepia-2 card into the memory of a display server which applies the final warp and display (Figures 2 and 3).

The VolumePro ray casting engine generates a pre-warped image in the form of a Base Plane (BP). The BP is a smaller version of a tiled image known as the Superimage Base Plane (BP). Application software controls the VolumePro, converts each resulting BP into a larger SBP, and then controls the blending computation in the Sepia-2 card. Figure 1 illustrates the distorted BP (top, right) and it’s place within a larger tiled SBP (bottom, left). The VolumePro delivers a 512 x 512 BP containing RGBA pixels into memory, within which exists the hexagonal sheared image of a 256 3 subvolume. Software translates this BP by a view dependent vector within a larger tiled 1024 x 1024 SBP and also cleans up the BP by removing certain artifacts. This view dependent vector takes one of twenty-four possible values which represent all possible viewing orientations. The viewing orientations are modeled with respect to six cube faces, and four possible planar orientations (left, right, down, up) with respect to each face, resulting in twenty-four combinations (six faces times four orientations).

Figure 2 Figure 2. Demonstration cluster configuration, 8 rendering servers plus 1 display server. Every rendering server contains a VolumePro ray casting engine and a Sepia-2 card. BPs are delivered at 30 frames per second by the VolumePro card. These are converted in host memory to SBPs and then acquired by the Sepia-2 card. The pipelined result of blending a series of SBPs is delivered by another Sepia-2 card into the host memory of the display server. From here the final SBP is displayed by an OpenGL accelerator.

Our demonstration cluster is depicted schematically in Figure 2 and photographically in Figure 4. We used off-the-shelf graphics workstations (Compaq SP750) containing Intel i840 motherboard chipsets and Pentium-III Xeon CPUs. The i840 supports dual PCI busses and multiple banks of Rambus memory with peak aggregate bandwidth of 3.2 GB/s, in addition to an AGP-4x graphics interface. The VolumePro occupies a slot in a PCI bus (which may be either 33 or 66 MHz and 32 or 64 bits). To minimize interference the Sepia-2 card is in a 66 MHz 64 bit PCI bus, and the VolumePro is in a separate 33 MHz 32 bit bus.

Once the SBP is prepared software calculates the position of the rendering server within the pipelined blending sequence using a left-right, top-bottom, front-back ordering. This calculation is performed locally based on the current view and information about volume and subvolume partitioning. This gives the pipeline ordering, which is converted into a (statically defined) network address of a Sepia-2 card. This conversion from pipeline position to network address implements the dynamic pipeline remapping required by the blending operators. Software then initiates the SBP blending by the Sepia-2 card. Blending is pipelined, with the first pixels of the final blended result written to display server memory before the last pixels of the local unblended SBP have been acquired from the host memory on the rendering server. The final blended result is an SBP that is delivered into the memory of a display server where it is warped and displayed by an OpenGL accelerator (Figure 2).

In order to render at 30 frames per second it is necessary to overlap the VolumePro BP generation with the subsequent steps of SBP construction, blending, and display (Figure 3). Each of these processing stages must be no longer than 33 ms in order to sustain a continuous stream of BPs coming from the VolumePro. The result is up to four frames of interactive latency, from the time a viewpoint change is communicated to the VolumePros, until the time the result is seen on the display.

Figure 3 illustrates the flow of data through the system on each frame and the associated stages of processing. The VolumePro writes a new BP to host memory 30 times per second. A BP consists of one megabye of data (512 x 512 x RGBA). This stage is dominated by the computational time through the VolumePro and may require a full 1/30 second. When the BP is available it is converted in memory to an SBP, blended by the Sepia-2 cards, and delivered to the display server. These four stages occur sequentially with successive frames overlapped in time as shown in Figure 3 (right). The latency of the display stage can vary depending on AGP performance and monitor refresh rate, from a few milliseconds to a full frame.

Figure 2 Figure 3. Data flow and latency analysis for a single rendering server and single display server. The analysis for larger configurations is based on this simplified model. Left, the VolumePro writes a BP to host memory where it is converted by software to an SBP. The Sepia-2 card reads the SBP into the network where it is blended in transit. At the display server a Sepia-2 card writes the blended result SBP to host memory. Host software on the display server loads the SBP into GL texture memory for final warp and display. Right, a timing figure shows the overlapped stages of image generation, SBP construction, acquisition and blending, and final display. The interactive latency is dominated by the time through these overlapped stages regardless of node count, and is approximately four frames prior to optimizations.

5. Results

At the time this paper is written our demonstration cluster demonstrates 30 frames per second operation but with errors. The cluster runs without errors at 20 frames per second. We have identified the sources of the errors in the application software that converts the BP into the SBP. We are currently rewriting the critical software stages and believe that our final paper will report error-free interactive operation at 30 frames per second. We have not fully optimized the display server software, nor taken other steps that we could to reduce interactive latency, since the current latency is tolerable.

Figure 4 shows a series of images that were generated on the cluster at 512 3 voxel resolution, using 8 subvolumes of 256 3 voxels each. Upgrading the VolumePro cards will increase the capacity of this cluster to 1024 3 voxels.

6. Scalability

We have identified the steps required to scale this configuration both in object space and in image space. In object space the size of data volumes being visualized may be increased arbitrarily (in principle). In image space the size of the tiled image may be increased until it exceeds the pixel resolution of an individual display, leading to display partitioning. Both types of scaling can be supported by software-only methods, and also by firmware or hardware upgrades that increase the system performance.

In addition to scaling the cluster configuration the VolumePro ray casting engines are themselves increasing in performance, and both object and image space may be scaled just by upgrading these cards. Our demonstration cluster has been configured with first generation cards that support subvolumes of 256 3 each and render an aggregate volume of 512 3 voxels using eight servers. A second generation card that supports 512 3 subvolumes would render a gigavoxel using eight servers.

Object space scaling involves increasing the number of ray casting engines, and correspondingly, the size of the aggregate volume being visualized. The Sepia-2 architecture has been designed to maintain interactivity at scales of 1024 servers and more [hei99]. As a result increases in server count should only slightly increase the latency of the blending stage.

It is easy to scale in object space if the image size does not increase. The image size is determined by the software that converts each BP to an SBP. This software may rescale the BP into a smaller size in order to keep the SBP small enough for a single display. This rescaling operation would increase the time spent in the software conversion stage, and this would manifest an increased interactive latency. But it would allow arbitrary object space scaling for a fixed pixel resolution image.

The more interesting case involves simultaneous scaling in both object space and image space, growing the size of the aggregate data volume while simultaneously growing the pixel resolution of the displayed image. This introduces a new consideration because SBPs are no longer sent to a common display server. If images are mapped to arbitrary positions on a tiled display, images will often cross tile boundaries. If we assume that all tiles are equal sized and rectilinear then no individual image can ever appear on more than four tiles. This produces a requirement that the SBP must be segmentable in up to four contiguous portions with individual routing for each portion.

The most efficient way to support this would be for the Sepia-2 card to acquire the BP in up to four sections and merge these sections directly into their correct position in the SBPs as they flow through the network. This would require maintaining state for four different SBP streams, which might be distinguished by the use of different network aliases. Because of the near-stateless nature of our pixel operators it may be possible to support this very efficiently. An easier but slightly less efficient approach would be for software to partition the BP into four parts, move these four parts to separate SBPs in memory, then direct the Sepia-2 to acquire only the necessary portions of each of the four SBPs. The least efficient but easiest approach would be for software to replicate the full SBP four times and acquire all four copies. This increases the bandwidth requirements fourfold and would result in a significant drop in frame rate in our current hardware.

7. Discussion

From a scientific standpoint we are on course to provide a gigavoxel visualization system in a small footprint. In principle these results should be representative of a variety of structured and unstructured volume visualization techniques that make use of GL acceleration. Our larger scientific goals depend on supporting OpenGL in Linux environments, and this requires efficient DVI image acquisition. We believe our PCI design will make it easy to synchronize the DVI acquisition intervals with the software operations required for auxiliary (e.g. depth) buffer acquisition. We have conducted distributed programming experiments using glReadPixels() acquisition and this is clearly not adequate for a production system. We have seen the best Linux pixel path performance using Matrox G400Max accelerators with 32 MB frame buffers. In these cards the readback performance is over 300 MB/s and writeback (glDrawPixels) over 900 MB/s for color, depth, alpha, and stencil buffers. We continue to search for an ideal graphics engine, which would be a Linux-based OpenGL accelerator with full double buffering and binary frame buffer access through DVI.

From an architectural standpoint we continue to gain fundamental insights as we address application exercises and observe other projects. We have designed an approach to supporting general Xinerama-like capabilities to allow panning and clipping in tiled displays. This approach requires different physical packaging so that the pixel operations can be applied to two image streams in a display server rather than in a rendering server. The topology and capabilities of these operators are similar to the scalable mesh used in the Metabuffer and Lightning-2.

The most unexpected insight we have gained has been an appreciation for the value of planar operations on image fragments. We experimented with implementing a limited set of such operations in Sepia-1.

Examples included various forms of image tiling, fragment displacement, and image reassembly. The need for such operations has appeared in many guises, for example in the Xinerama-like capabilities described above. In the application presented in this paper they manifest twice, in the initial translation of the BP within the SBP, and in partitioning the SBP for multi-panel displays. We suspect that fragment operations represent a fundamental requirement for Sepia and related architectures. In Sepia this requirements might be addressed by an image prefix string that specifies a scissoring operation and routing for the scissored fragments. These fragments would then be processed by planar pixel operators, and could be routed across tile boundaries and later reassembled. Note that although Sepia can perform planar translations the pipelined nature of the architecture does not allow rotations. We are currently studying multi-scale issues. There are a number of performance enhancements we could pursue if we felt the effort were justified. Increasing the effective rate of SBP acquisition to 180 MB/s would allow the blending computation to occur at full network rates and thus reduce latency (see Figure 1, bottom right). The most efficient way to do this is for the Sepia-2 card to acquire the BP from memory and translate it appropriately into the SBP that is flowing through the network. This makes no change in the effective frame rate, because the full SBP still travels through the network. But it allows each BP to be acquired faster, or to be acquired as four segments (for multi-panel display) without increasing network bandwidth requirements. We believe the interactive latency could be cut approximately in half with limited effort.

8. Acknowledgements

We have had helpful discussions about VolumePro with Hugh Lauer, Andy Vesper, and Larry Seiler of RTViz. This work was supported by National Science Foundation grant #ACI-9982273, by the U.S. Department of Energy’s ASCI Center for Simulation of Dynamic Response of Materials, by Caltech’s Center for Advanced Computing Research, and by Compaq Computer Corporation’s Tandem Laboratories and Systems Research Center.

Bibliography

[pfi99] H. Pfister, J. Hardenbergh, J. Knittel, H. Lauer and L. Seiler. The VolumePro real-time ray-casting system. Proceedings of ACM SIGGRAPH (1999).

[lac94] P. Lacroute and M. Levoy. Fast volume rendering using a shear-warp factorization of the viewing transform. Proceedings of ACM SIGGRAPH (1994), pp. 451-457.

[hei99] A. Heirich and L. Moll. Scalable distributed visualization using off-the-shelf components. Proceedings of the IEEE symposium on parallel visualization and graphics (PVG’99) (1999) pp. 55-59.

[por84] T. Porter and T. Duff. Compositing digital images. Proceedings of ACM SIGGRAPH (1984).

[sle90] D. Slepian and A. M. Duguid, in Joseph Y. Hui, Switching and traffic theory for integrated broadband networks, pp. 53-84, Kluwer Academic Publishers (1990).

[bla00] Blanke, W. J., Bajaj, C., Fussel, D. and Zhang, X. The metabuffer: a scalable multiresolution multidisplay 3D graphics system using commodity rendering engines. Technical report TR2000-16, University of Texas at Austin, February 2000.

[sto01] Stoll, G. et al. Lightning-2, a high performance display subsystem for PC clusters. Proceedings of ACM SIGGRAPH (2001).

[hum00] Humphreys, G., Buck, I., Eldridge, M. and Hanrahan, P. Distributed rendering for scalable displays. Proceedings of Supercomputing (2000).

[mol94] Molnar, S., Cox, M., Ellsworth, D. and Fuchs, H. A sorting classification of parallel rendering. IEEE Computer Graphics and Applications 14 (1994).

[clo53] Clos, C. A study of non-blocking switching networks. Bell System Technical Journal 32, pp. 406-424 (1953).

[eld00] Eldridge, M., Igehy, H. and Hanrahan, P. Pomegranate: a fully scalable graphics architecture. Proceedings of ACM SIGGRAPH (2000).

[sam00] Samanta, R., et al. Hybrid sort-first and sort-last parallel rendering with a cluster of PCs. Proceedings of SIGGRAPH/Eurographics workshop on graphics hardware (2000).

Image gallery

Image 1
Image 2

Image 5
Figure 4. Rendered images and photographs of the hardware. Top, some representative images, some of which may be at lower resolution. These will be expanded and replaced with high resolution images in the final paper. Bottom left, the rendering cluster consists of eight Compaq SP750 graphics workstations with Pentium-III Xeon CPUs configured as shown in Figure 2. A ninth workstation functions as the display server. Bottom right, the Sepia-2 card contains two Xilinx XC4044XLA packages that support the host and network PCI interfaces, an XC4085XLA for the merge operator logic, memory (SRAM and DRAM), and a ServerNet-II network interface ASIC (Colorado-2). Three connectors allow future expansion for I/O daughtercards that could be used to support DVI.

Printable version
Privacy statement Using this site means you accept its terms Feedback to webmaster
© 2004 Hewlett-Packard Development Company, L.P.