
  Date: 13/02/2024

Tachyum Introduces Prodigy ATX Platform to Democratize AI

Tachyum has unveiled a new white paper detailing its Prodigy ATX Platform. The paper explains how a single Prodigy system with 1 Terabyte (TB) of memory can run a ChatGPT4-class model with 1.7 trillion parameters, a workload that would otherwise require 52 NVIDIA H100 GPUs at significantly higher cost and power consumption.
The Prodigy ATX Platform, powered by the Prodigy universal processor, aims to democratize AI by giving a broader range of users access to cutting-edge AI models at prices starting at $5,000.

The Prodigy ATX Platform boasts a unique architecture designed to meet the demands of AI workloads efficiently. Its AI subsystem incorporates innovative features that deliver the high performance and efficiency AI environments require. Tachyum claims the platform offers significant cost savings and power efficiency compared to traditional solutions; the white paper walks through the comparison against the 52-GPU H100 configuration in detail.

Prodigy's advanced AI subsystem supports leading-edge data types such as 4-bit TAI and effective 2-bit weights with FP8 activations, reducing the memory footprint required for Large Language Models (LLMs). The Prodigy ATX Platform is equipped with key architectural components, including a single-socket 96-core Prodigy Universal Processor, 8 DDR5 memory controllers, PCIe 5.0 slots, M.2 NVMe slots, and a 1200W power supply.

Tachyum's "half die" solution allows a full 192-core device to function as two separate 96-core devices, increasing yield and lowering platform costs.
The platform supports a wide range of use cases, including language generation, code generation, virtual tutoring, content summarization, and fraud detection.

Tachyum's explanation of how its solution reduces memory use:

Since the Prodigy ATX Platform is intended to leverage pre-trained models and focus on inference, Tachyum reviews the assumptions for the memory footprint required for inference. Assuming an LLM with 1 trillion parameters running with FP8, the memory required for weights is 1 TB. There is additional overhead for inference memory that is typically 0.2x the model size, or 200 GB, that is added for runtime calculations for activations. For FP8, the total memory required for a 1 trillion parameter model is approximately 1.2 TB.
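The rule of thumb above is: weight memory equals parameter count times bytes per weight, plus a runtime activation overhead that the paper fixes at 0.2 bytes per parameter (i.e., 0.2x the FP8 model size, independent of weight precision). A minimal sketch of that arithmetic, with the function name and the decimal-GB convention being my own assumptions:

```python
def inference_footprint_gb(params_trillions: float, bits_per_weight: float) -> float:
    """Estimate inference memory in decimal GB: weights plus runtime activations."""
    # Weight storage: parameters x bytes per weight.
    weights_gb = params_trillions * 1e12 * bits_per_weight / 8 / 1e9
    # Runtime activations: 0.2 bytes per parameter (the article's 0.2x-of-FP8 rule).
    overhead_gb = params_trillions * 1e12 * 0.2 / 1e9
    return weights_gb + overhead_gb

print(inference_footprint_gb(1.0, 8))  # FP8, 1T params -> 1200.0 GB (1 TB + 200 GB)
```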

Considering Tachyum’s 4-bit TAI with 4-bit weights, the memory needed for weights is reduced to 500 GB and the runtime inference memory is fixed at 200 GB for a total requirement of 700 GB. Now, considering running TAI sparse with 2-bit weights the memory required for weights is further reduced to 250 GB. With the 200 GB runtime inference memory, the total requirement is 450 GB.

If we repeat these steps for the 1.7 trillion parameter ChatGPT4 LLM: the inference memory is 0.2 x 1.7 TB = 340 GB, and the total memory needed for FP8 is 1.7 TB for weights + 340 GB = 2.04 TB. Going from FP8 to 4-bit TAI the weights require 2x less memory, so the memory consumed by the weights is 1.7/2 = 850 GB, and the total memory requirement is 850 GB + 340 GB = 1.19 TB.

If we go to TAI sparse with 2-bit weights, the memory requirement for weights is reduced to 425 GB. Adding the 340 GB required for inference, 425 GB + 340 GB gives a total memory footprint of 765 GB, which fits comfortably within the 1 TB of commodity system memory the Prodigy ATX Platform offers, leaving enough headroom that even larger LLMs can be supported.
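The three 1.7-trillion-parameter scenarios above can be reproduced with a short sweep. This is a sketch of the article's arithmetic only, not Tachyum's methodology; the 0.2-bytes-per-parameter activation overhead is the assumption stated earlier:

```python
# Reproduce the white paper's memory arithmetic for a 1.7T-parameter model.
PARAMS = 1.7e12  # ChatGPT4-class parameter count from the article

def footprint_gb(bits_per_weight: float) -> tuple[float, float]:
    """Return (weight memory, total memory) in decimal GB."""
    weights = PARAMS * bits_per_weight / 8 / 1e9   # weight storage
    overhead = PARAMS * 0.2 / 1e9                  # activations: 340 GB here
    return weights, weights + overhead

for name, bits in [("FP8", 8), ("4-bit TAI", 4), ("2-bit TAI sparse", 2)]:
    w, total = footprint_gb(bits)
    print(f"{name}: weights {w:.0f} GB, total {total:.0f} GB")
```

The sweep yields 2040 GB, 1190 GB, and 765 GB respectively, matching the article's figures, with only the 2-bit sparse configuration fitting inside the platform's 1 TB.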

Key architectural components of the Prodigy ATX Platform outlined in the white paper include:

Single-socket 96-core Prodigy Universal Processor running up to 5.7 GHz with 8 DDR5 memory controllers
16 x 64 GB commodity DIMMs (2 DIMMs/channel) running up to DDR5-6400, for a total memory capacity of 1 TB
3 PCIe 5.0 slots supporting up to full-height, full-length form factors:
    1 x16 slot with 16 lanes
    2 x16 slots with 8 lanes
3 M.2 NVMe slots supporting the 22x80mm form factor
1200W power supply

Tachyum says its Prodigy ATX Platform addresses an extensive range of use cases, such as language generation, language translation, code generation, virtual tutoring, content summarization, sentiment analysis, fraud or cyber-attack detection, and content filtering. The platform benefits from the many pre-trained LLMs available today, with support for both proprietary and open-source models.

A universal processor that handles all kinds of workloads, spanning cloud and HPC/AI, enables Prodigy-powered servers to seamlessly and dynamically switch between computational domains. Tachyum claims Prodigy can deliver up to 4x the performance of the highest-performing x86 processors for cloud workloads, up to 3x that of the highest-performing GPU for HPC, and up to 6x for AI applications.

Those interested in reading the “Tachyum’s Prodigy ATX Platform Democratizing AI for Everyone” white paper can download a copy from the company’s website.

“Generative AI will be widely used far faster than anyone originally anticipated,” said Dr. Radoslav Danilak, founder and CEO of Tachyum. “In a year or two, AI will be a required component on websites, chatbots and other critical productivity components to ensure a good user experience. Prodigy’s powerful AI capabilities enable LLMs to run much easier and more cost-effectively than existing CPU+GPGPU-based systems, empowering organizations of all sizes to compete in AI initiatives that otherwise would be dominated by the largest players in their industry.”

Source: Tachyum
