Researchers sharply criticize Meta's "Galactica"


Update, November 18, 2022:

Meta AI and Papers with Code have responded to the criticism of Galactica: the demo remains offline. The models are still available for researchers interested in working with them and replicating the results of the paper.

Meta's Chief AI Scientist Yann LeCun defended the project on Twitter, saying Galactica is meant to be a demo, not a finished product, and not a replacement for doing scientific work and thinking for yourself, but a convenience – a bit like a driving assistant in a car.

“Real papers will contain new and interesting scientific results. That will include papers whose authors used Galactica to help write them,” writes LeCun. According to LeCun, the project is now “paused”.


Debate on human interaction with AI

Ultimately, the debate is less about Galactica’s inability to deliver accurate results at all times. Rather, it is about the risk of misuse when humans adopt Galactica’s output unquestioningly, for convenience for example, and thereby knowingly or unknowingly increase the volume and apparent credibility of misinformation in the scientific process.

There were similar debates about the risk of misuse when GPT-3 was first introduced, for example in the context of a possible glut of fake news. As a result, OpenAI only released GPT-3 gradually and today uses many methods to reduce the risk of misuse.

However, equally powerful large language models are now available as open source. The feared flood of AI-generated fake news has yet to materialize.

Opponents of Galactica might object that the language model is used in an academic context where precision is particularly important. In the future, however, researchers may use general-purpose language models to support their work, and these may be even less accurate than Galactica. Halting work on Galactica therefore doesn’t seem like a sensible, let alone definitive, solution to the problem described.


Gary Marcus calls Galactica a danger to science. If the language model is not stopped, it could be the “tipping point of a gigantic increase in the flow of misinformation,” Marcus writes, calling it an “epochal event.”

A Wikipedia-style article about Marcus generated by Galactica was 85 percent incorrect, according to the researcher, but plausibly worded. A “decent AI system” could verify such information online, but Galactica offers no such functionality, Marcus said.

“It’s not a joke. Galactica is funny, but the uses it will be put to are not.”

Fake citations for fake papers

Michael Black, director of the Max Planck Institute for Intelligent Systems in Tübingen, Germany, conducted his own tests in which Galactica cited nonexistent papers. Galactica, he said, was an interesting research project, but not useful for scientific work and dangerous to boot.

“Galactica generates text that is grammatical and feels real. This text will slip into real scientific submissions. It will be realistic but false or biased. It will be difficult to detect. It will influence the way people think,” Black writes.

This could lead to a new era of “deep fake science,” he says, in which researchers receive citations for papers they never wrote. These fake citations would then be carried forward in other papers. “What a waste it will be,” Black wrote.

A warning about possible AI hallucinations isn’t enough, he says: “Pandora’s box is open and we won’t be able to put the text back in.”

Galactica isn’t an accelerator for science and isn’t even useful as a writing aid, Black said. On the contrary, it distorts the research and constitutes a danger.

If we are going to have fake scientific papers, we may as well have fake reviews of fake papers. And then we can also have fake reference letters for fake academics who are promoted to tenure at fake universities. I can then retire because I will have nothing left to do.

Michael Black

Fundamental critique of language model research

University of Washington linguist Emily Bender uses particularly strong words. She calls the Galactica paper trash and pseudoscience.

“Language models do not have access to ‘truth’, or any other type of ‘information’ beyond information about the distribution of word forms in their training data. And yet, here we are. Again,” Bender writes.

Bender and her colleague Chirag Shah previously criticized the use of large language models as search engines, particularly Google’s plans in this area, in a March 2022 scholarly paper.

Search based on language models could further proliferate fake news and increase polarization, they argue, because a search system must be able to do “more than match or generate a response”.

It should offer users different ways to interact with and make sense of information, rather than “simply retrieving it based on programmed notions of relevance and usefulness”, write the researchers.

According to them, information retrieval is “a socially and contextually situated activity with a diverse set of goals and support needs that should not be reduced to a combination of text matching and text generation algorithms”.

Similar criticism of Galactica is currently piling up on Twitter. Meta AI and Papers with Code have yet to comment, but they have taken the demo off Galactica’s website.
