Zyphra debuts Zyda, a 1.3T-token language modeling dataset it claims outperforms Pile, C4, and arXiv

Zyphra's Zyda is an open dataset of 1.3T tokens that combines RefinedWeb, StarCoder, C4, Pile, SlimPajama, peS2o, and arXiv to help train large language models.


https://bit.ly/4aUDRe2
