Over its 18 years of existence, Reddit has accumulated a vast archive of human interactions and conversations. That volume of data is ideal raw material for training the large language models (LLMs) that power AI chatbots. Now Reddit wants to capitalize on this archive and will charge companies for access to the API they need to collect LLM training data.
Big tech companies such as Google and OpenAI already use Reddit data to feed their AI services. To monetize that asset, Reddit is introducing “a new premium access point for third parties.”
It is not yet clear how much companies will have to pay for access to the data. All that is known is that there will be several access tiers, probably aimed at companies of different sizes. The tiers will differ in how restrictive or broad their usage rights are.
“Reddit’s data collection is really valuable,” said Steve Huffman, co-founder and CEO of Reddit. “And we don’t need to give away all that value to some of the biggest companies in the world for free.”
Reddit is far from the only online repository of information used for LLM training. Web archives such as Common Crawl are also often used to train chatbots. However, Common Crawl and similar services provide raw data: large pools of information gathered from across the Internet. Reddit’s data, by contrast, consists of conversations between people. A comprehensive AI system needs access to both types of data to improve its factual accuracy and to match human behavior.
Source: Engadget