Google to pause Gemini AI image generation after refusing to show White people.
(www.foxbusiness.com)
I think the interesting thing about this is that these LLMs are essentially like children: they don't have the benefit of years and years of social training to learn our complex set of unspoken rules and exceptions.
Race consciousness is such an ever-present element of our social interactions, and many of us have been habituated not to really notice it. So it's totally understandable to me that LLMs reproduce our highly contradictory set of rules imperfectly.
To be honest, I think that if we can set aside our understandable tendency to avoid these discussions because they're usually instigated by racist trolls, there are some weird and often unexamined social tendencies we can interrogate.
I think it's helpful to remind ourselves frequently that race is real like gender, but not like sex. Race exists because when people encountered new cultures, they invented a pseudoscience to create the concept of whiteness.
Whiteness makes no sense. Who is white is highly subjective, and it's always been associated with the dominant mainstream culture to which whiteness claims ownership. This means that you either buy into the racist falsehood that white culture is interchangeable with the default culture or it has no culture at all. Whiteness really exists only in opposition to perceived racial inferiority. Fundamentally, that's all "white" means. It's a weird anachronistic euphemism for, "Not racially inferior".
There are plenty of issues with our racial construction of blackness and the quality of being Asian and east Asian and Desi and Indigenous and Latin, but none are quite as fucked up, imo, as the fact that we as a culture attempt to continue to use the concept of "Whiteness" as a non-racist construction. In my thinking, it can be a useful tool for studying the past and studying an unhealthy set of attitudes we're still learning to unlearn. But it's not possible to reform the concept, because it's fundamentally constructed upon beliefs we're trying to discard. If you replace every use of "white" with "not one of the lesser races", then I think you get a better understanding of why it's never going to stop causing problems as long as we try to use it in a non-racist way.
Today, people who were told growing up to view themselves as "white" now feel a frankly understandable sense of grievance and cultural alienation. Because we've begun acting more consistently and recognizing that there's really no benign version of white pride, but we never bothered to teach people to stop thinking of anyone as "white" or taught the people who identify as white to find pride in an actual culture. Midwestern is a culture. Irish is a culture. New Englander is a culture. White has never been a culture. But if we don't ever acknowledge that the entire concept's only value is as a tool to understand racism, it's inevitable that a computer repeating back to us our own attitudes is going to look dumb, inconsistent and either racially biased for or against white people.
Naw, dog. LLMs are nothing like children. A child has an inaccurate model of the world in their head. I can explain things to them and they'll update their beliefs and understandings.
LLMs don't understand. Period.
I think this rigid thinking is unhelpful.
I think this presentation -- which at 10 months old is already quite dated! -- does a good job examining these questions in a credible and credulous manner:
Sparks of AGI: Early Experiments with GPT4 (presentation) (text)
I fully recognize that there is a great deal of pseudomystical chicanery that a lot of people are applying to LLMs' ability to perform cognition. But I think there is also a great deal of pseudomystical chicanery underlying the mainstream attitudes towards human cognition.
People point to these and say, 'They're not thinking! They're just making up words, and they're good enough at relating words to symbolic concepts that they credibly imitate understanding concepts! It's just a trick.' And I wonder: why are they so sure that we're not just doing the same trick?
I can't take that guy seriously. 16 minutes in he's saying the model is learning while also saying it's entirely frozen.
It's not learning, it's outputting different data that was always encoded in the model because of different inputs.
If you taught a human how to make a cake, and they recited it back to you and then went and made a cake, that human demonstrably learned how to make a cake.
If the LLM recited it back to you it's because it either contained enough context in its window to still have the entire recipe and then ran it through the equivalent of "summarize this - layers" OR it had the entire cake recipe encoded already.
No learning, no growth, no understanding.
The argument for reasoning is also absurd. LLMs have not been shown to have any emergent properties. Capabilities progress linearly with parameter count. This is great in the sense that scaling model size means scaling functionality, but it is also directly indicative that "reason" is nothing more than having sufficient coverage of concepts to create models.
Which of course LLMs have models: the entire point of an LLM is to be an encoding of language. Pattern matching the inputs to the correct model improves as model coverage improves: that's not unexpected, novel or even interesting.
What happens as an LLM grows in size is that decreasingly credulous humans are taken in by anthropomorphic bias and fooled by very elaborate statistics.
I want to point out that the entire talk there is self-described as non-quantitative. Quantitative analysis of GPT4 shows it abjectly failing at comparatively simple abstract reasoning tests, one of the things he claims it does well. Getting a 33% on a test that the average human gets above 90% on is a damn bad showing, barely above random chance.
LLMs are not intelligent, they're complex.
But even in their greatest complexity they entirely fail to come within striking distance of even animal intelligence, much less human.
Do you comprehend how complex your mind is?
There are hundreds of neurotransmitters in your brain. 20 billion neocortical neurons and an average 7 thousand connections per neuron. A naive complexity of 2.8e16 combinations. Each thought tweaks those ~7000 connections as it passes from neuron to neuron. The same thought can bounce between neurons, and each time the signal reaches the same neuron it gets changed by the previous path, by how long it has been since that neuron last fired, and by the connections strengthened or weakened by other firings.
If you compare parameter complexity to neural complexity, that puts the average, humdrum human mind at 20,000x the complexity of a model that cost billions to train and make... Which is also static. Only changed manually when they get into trouble or find better optimizations.
And it's still deeply flawed and incapable of most tasks. It's just very good at convincing you with generalizations.
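For anyone who wants to check the back-of-envelope numbers above, here is a rough sketch of the arithmetic. The neuron and connection counts are the ones quoted in the comment. Two values are assumptions, not something the comment states: reading "hundreds" of neurotransmitters as 200 (which is what makes the product come out to 2.8e16), and a model size of 1.4e12 parameters (the value implied by the 20,000x ratio, not a confirmed figure for any real model).

```python
# Figures quoted in the comment above.
NEOCORTICAL_NEURONS = 20_000_000_000   # "20 billion neocortical neurons"
CONNECTIONS_PER_NEURON = 7_000         # "an average 7 thousand connections per neuron"

# Assumption: "hundreds" of neurotransmitters read as 200, chosen only because
# it reproduces the quoted 2.8e16 figure.
NEUROTRANSMITTER_TYPES = 200

# Assumption: hypothetical parameter count implied by the 20,000x ratio;
# not a published number for any specific model.
MODEL_PARAMETERS = 1_400_000_000_000

# Total neuron-to-neuron connections, before accounting for transmitter types.
connections = NEOCORTICAL_NEURONS * CONNECTIONS_PER_NEURON

# The "naive complexity": connections multiplied by transmitter types.
naive_complexity = connections * NEUROTRANSMITTER_TYPES

# How many times larger the naive brain figure is than the assumed model size.
ratio = naive_complexity // MODEL_PARAMETERS

print(f"connections:      {connections:.1e}")
print(f"naive complexity: {naive_complexity:.1e}")
print(f"brain / model:    {ratio:,}x")
```

Change either assumed constant and the ratio moves with it, so treat the 20,000x as an order-of-magnitude illustration rather than a measurement.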
I agree with your factual assessments.
The points on which I think it makes sense to remain open minded are these:
1. The question we're examining is not whether current LLMs or any LLM by itself is sentient, but whether they're a step towards it. I think we need to be humble because the end point of AGI is not something we can claim to understand at this stage. We can make very reasonable assessments like the ones you're making about what these specifically can't do by themselves. But could an LLM constitute a potential module within an AGI, for instance? If a future system combined an LLM with a mechanism for self-examination and self-guided retraining, what might be the product? I think these are reasonable ideas to consider.
2. I really think we need to recognize the subjectivity at play here and formulate our inquiry around what functions it can perform without getting sidetracked into its internal state. We can never know if any machine can experience love. But we can assess whether a machine can convince a human that it loves them. If a machine were to create a work of art that humans found beautiful and innovative, we can't know if the machine is able to appreciate beauty, but we can infer that it's achieved a certain level of capability which we associate with artistry when demonstrated by humans. This is an issue that arises when discussing art made by elephants. Are elephant painters truly creative, or just experimenting with the tools? I think that's an unproductive question to ask. I think we need to benchmark primarily based on overall performance regardless of internal states, because of point three:
3. I think we're comparing these systems to humans based on misconceptions of how sentient humans really are. Humans do many things which appear more intentional or motivated than we know them truly to be based on cognitive neuroscience. What we know about humans is based on our individual experiences within our own minds and observations of the performance of others. And this is remarkably biased toward overestimating the depth of our own faculties. We grossly overestimate how much we think before we talk, for instance. And we cannot measure or prove a human's ability to feel love any more than we can for a machine. We know these things exist because we can experience them, and others have the persuasive ability to convince us that they experience them as well. But epistemologically, how do we define our experience of pain as essentially different from a machine which reports a diagnostic that it is damaged?
Ultimately, I agree with you on the broad strokes. I agree about the state of the current technology. I disagree with some of your certainty of the future of this technology, and the ways in which we assess it.
Working through a response on mobile so it's a bit chunked. I'll answer each point in series but it may take a bit.
Can that model be tweaked and tuned and updated? Sure. But there's no reason to think that it demonstrates any capability out of the ordinary for "queryable encoded data", and plenty of questions as to why natural language would be the queryable encoding of choice for an artificial intelligence. Your brain doesn't encode your thoughts in English, or whatever language your internal thoughts use if you're ESL+. Language is a specific function of the brain. That's why damage to language centers in the brain can render people illiterate or mute without affecting any other capacities.
I firmly believe that LLMs as a component of broader AGI are certainly worth exploring, just like any of the other hundreds of forms of genetic models or specialized "AI" tools. But that's not the language used to talk about it. The overwhelming majority of online discourse is AI maximalist: delusional claims about the impending singularity or endless claims of job loss and full replacement of customer support with ChatGPT.
Having professionally worked with GitHub Copilot for months now, I can confidently say that it's useful for the tasks that any competent programmer can do, as long as you babysit it. Beyond that, any programmer who can do the more complex work that an LLM can't will need to understand the basics that an LLM generates in order to grasp the advanced. Generally it's faster for me to just write things myself than it is for Copilot to generate responses. The use cases I've found where it actually saves any time are:
Generating documentation (has at least 1 error in every javadoc comment that you have to fix but is mostly correct). Trying documentation first and code generated from it never worked well enough to be worth doing.
Filling out else cases or other branches of unit test code. Once you've written a pattern for one test it stamps out the permutations fairly well. Still usually has issues.
Inserting logging statements. I basically never have to tweak these, except prompting for more detail by writing a
,
This all is expected behavior for a model that has been trained on all examples of code patterns that have ever been uploaded online. It has general patterns and does a good job taking the input and adapting it to look like the training data.
But that's all it does. Fed more training data it does a better job of distinguishing patterns, but it doesn't change its core role or competencies: it takes an input and tries to make its pattern match other examples of similar text.