Human Data


“Human Data”: a cartoon illustrating how today’s language models create huge value by accessing human-generated data, often without our knowledge.

The social media age struck a Faustian bargain: consumers gained better ways to connect with friends, while companies like Meta built trillion-dollar advertising businesses.

Today, AI companies building language models are trawling through public data and striking paid partnerships (e.g., OpenAI’s reported $250M deal with News Corp), with little value flowing back to individual consumers.

As an example of the scale of this trawling, the repair guide site iFixit said that one of Anthropic’s crawlers accessed 10 TB of files in a single day and 73 TB in total in May.

Yet human-generated language data is set to become an even larger bottleneck to continued progress than compute, on which technology companies are spending billions of dollars in NVIDIA chips today.

According to Epoch AI, the total available stock of human-generated text data on the internet will run out by 2028. Attempts to train models on AI-generated synthetic data have so far provided little value outside niche domains like math and coding.

One way to address this is for consumers to receive micro-payments whenever their data is used (a model that San Francisco-based Coil tried).

What does not benefit consumers is being obliged to pay (e.g., OpenAI caps usage on the free version of ChatGPT) or having their data gathered surreptitiously.

For example, Elon Musk’s company xAI recently launched Grok, an open-source LLM whose fundamental advantage, according to xAI, is its “real-time knowledge of the world via the X platform.”

As a result, with no opt-in required, all your posts on X automatically contribute to this new model. If you want to opt out, here is a link.
