Meta's Use of Pirated Books for Training AI
Meta, the tech giant, is embroiled in a lawsuit concerning its use of millions of pirated books to train its large language model, Llama (Large Language Model Meta AI). Plaintiffs, including the author Richard Kadrey, accuse Meta of using their copyrighted works without consent because the company did not want to pay for them.
Meta's Defense
- Meta has admitted to using these books but contends that this constitutes "fair use" of material that had already been pirated and circulated online.
- It claims that Llama is "highly transformative" and that using copyrighted material is essential to developing open-source AI models.
Historical Context of Fair Use
Historically, the concept of "fair use" has roots in older traditions. In India, the Vedas were transmitted orally, regarded as impersonal, and not attributed to any individual author. This persisted until a few centuries ago, when authorship gained importance for authenticity and monetary value.
Controversy and Implications
- The Association of American Publishers argues that systematically copying textual works into an LLM, without adding new functionality, cannot be deemed transformative.
- Meta aims to monetize these works by using them to train AI that generates new outputs, including unlicensed sequels and derivative literature, potentially trivializing individual authorship.
Meta's Justification for Non-payment
- Meta initially considered licensing agreements but found them costly and resource-intensive.
- The company claims that no individual work significantly enhances LLM performance, estimating each contributes only about 0.06%, and argues this renders payments to individual authors unnecessary.
Broader Implications and Historical Parallels
This situation mirrors historical shifts, such as the Vedic tradition's transition into a caste-based hierarchy in which knowledge was hoarded. Meta's actions could similarly create a new techno-hierarchy that undermines authentic creative work.
The legal outcome is pivotal for setting precedents on intellectual property use in AI development.