The Download: how your data is being used to train AI, and why chatbots aren’t doctors

Millions of images of passports, credit cards, birth certificates, and other documents containing personally identifiable information are likely included in one of the biggest open-source AI training sets, new research has found.

Thousands of images—including identifiable faces—were found in a small subset of DataComp CommonPool, a major AI training set for image generation scraped from the web. Because the researchers audited just 0.1% of CommonPool’s data, they estimate that the real number of images containing personally identifiable information, including faces and identity documents, is in the hundreds of millions.

The bottom line? Anything you put online can be and probably has been scraped. Read the full story.

—Eileen Guo

AI companies have stopped warning you that their chatbots aren’t doctors

AI companies have now mostly abandoned the once-standard practice of including medical disclaimers and warnings in response to health questions, new research has found. In fact, many leading AI models will now not only answer health questions but even ask follow-ups and attempt a diagnosis.

Such disclaimers serve as an important reminder to people asking AI about everything from eating disorders to cancer diagnoses, the authors say, and their absence means that users of AI are more likely to trust unsafe medical advice. Read the full story.

—James O’Donnell
