Hackworth

joined 6 months ago
Hackworth@lemmy.world 6 points 1 week ago* (last edited 1 week ago)

I think it's more likely a compound sigmoid (don't Google that). LLMs are composed of distinct technologies working together. Whenever one of them has hit the inflection point of its scaling curve, we've pivoted to a different implementation to get back on track. Notably, context windows are no longer an issue. But the most recent pivot came just this week, allowing for a huge jump in performance. There are more promising stepping stones coming into view. Is the exponential curve just a series of sigmoids stacked too close together? In any case, the article's correct: just adding more compute to the same exact implementation hasn't produced exponential scaling.
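One way to picture the "series of sigmoids" idea is to add up staggered logistic curves, one per technique, each starting later and topping out higher than the last. This is a minimal sketch; the stage parameters in `STAGES` are invented for illustration and are not fitted to any real model or benchmark.

```python
import math

def logistic(t: float, midpoint: float, rate: float, ceiling: float) -> float:
    """Logistic curve rising from ~0 to `ceiling`, steepest at `midpoint`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))

def compound_sigmoid(t: float, stages) -> float:
    """Sum of staggered logistic curves, one per (midpoint, rate, ceiling) stage."""
    return sum(logistic(t, m, r, c) for m, r, c in stages)

# Hypothetical stages: evenly spaced midpoints, each ceiling double the previous one.
STAGES = [(2.0, 2.0, 1.0), (4.0, 2.0, 2.0), (6.0, 2.0, 4.0), (8.0, 2.0, 8.0)]

for t in range(0, 13, 2):
    print(f"t={t:2d}  capability={compound_sigmoid(t, STAGES):6.3f}")
```

With these made-up numbers the total more than doubles every 2 time units through the middle of the range, which looks exponential up close, then flattens near the sum of the ceilings (15) once the last stage saturates and no new stage is added.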

Hackworth@lemmy.world 2 points 1 week ago

There used to be very real hardware reasons that upload had much lower bandwidth. I have no idea if there still are.

Hackworth@lemmy.world 1 points 2 weeks ago

Ditto, I was about to start waxing poetic about my bard.

Hackworth@lemmy.world 1 points 3 weeks ago

Yeah, but they encourage confining it to a virtual machine with limited access.

Hackworth@lemmy.world 49 points 3 weeks ago

Huh. Grandpa Simpson was right. It did happen to me too.

Hackworth@lemmy.world 5 points 3 weeks ago

Logic and Path-finding?

Hackworth@lemmy.world 29 points 3 weeks ago

Shithole country.

Hackworth@lemmy.world 4 points 4 weeks ago* (last edited 4 weeks ago)

Yeah, using image recognition on a screenshot of the desktop and directing a mouse around the screen with coordinates is definitely an intermediate implementation. Open Interpreter, Shell-GPT, LLM-Shell, and DemandGen make a little more sense to me for anything that can currently be done from a CLI, but I've never actually tested 'em.
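For contrast with the screenshot-and-coordinates approach, here is a minimal sketch of the CLI pattern: a model proposes a shell command, a person approves it, and only then does it run. `suggest_command` is a hypothetical stub standing in for the LLM call (it just returns canned commands); this is not code from Open Interpreter, Shell-GPT, or the other tools, and it assumes a Unix-like system where `echo` and `date` are available.

```python
import shlex
import subprocess
from typing import Callable, Optional

def suggest_command(task: str) -> str:
    """Hypothetical stand-in for an LLM call that turns a request into a shell command."""
    canned = {"show today's date": "date", "say hello": "echo hello"}
    return canned.get(task, "echo 'no suggestion'")

def run_with_confirmation(task: str, confirm: Callable[[str], bool],
                          timeout: float = 10.0) -> Optional[str]:
    """Show the suggested command to `confirm`; run it only if approved, returning stdout."""
    cmd = suggest_command(task)
    if not confirm(cmd):
        return None  # the human declined, so nothing is executed
    # Split into argv and skip shell=True so the string is never interpreted by a shell.
    result = subprocess.run(shlex.split(cmd), capture_output=True, text=True, timeout=timeout)
    return result.stdout
```

Interactively, `confirm` could be something like `lambda c: input(f"Run {c!r}? [y/N] ").strip().lower() == "y"`. Keeping the approval step outside the model is the point of the design: the model only ever proposes text, and a person decides what actually executes.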

Hackworth@lemmy.world 7 points 4 weeks ago* (last edited 4 weeks ago)

I was watching users test this out and am generally impressed. At one point, Claude tried to open Firefox, but it was not responding, so it killed the process from the console and restarted it. A small thing, but not something I would have expected it to overcome this early. It's not ready for prime time (as their repeated warnings make clear), but I'm happy to see these capabilities finally making it to a foundation model's API. It'll be interesting to see how much remains of GUIs (or high-level programming languages, for that matter) if/when AI can reliably translate natural language into hardware behavior.

Hackworth@lemmy.world 1 points 4 weeks ago

Can I blame Trump on 9/11 or something?

Hackworth@lemmy.world 1 points 1 month ago

Aren't they in Macy's now? Wait, is Macy's still a thing?


“An intriguing open question is whether the LLM is actually using its internal model of reality to reason about that reality as it solves the robot navigation problem,” says Rinard. “While our results are consistent with the LLM using the model in this way, our experiments are not designed to answer this next question.”

The paper, "Emergent Representations of Program Semantics in Language Models Trained on Programs", can be found here.

Abstract

We present evidence that language models (LMs) of code can learn to represent the formal semantics of programs, despite being trained only to perform next-token prediction. Specifically, we train a Transformer model on a synthetic corpus of programs written in a domain-specific language for navigating 2D grid world environments. Each program in the corpus is preceded by a (partial) specification in the form of several input-output grid world states. Despite providing no further inductive biases, we find that a probing classifier is able to extract increasingly accurate representations of the unobserved, intermediate grid world states from the LM hidden states over the course of training, suggesting the LM acquires an emergent ability to interpret programs in the formal sense. We also develop a novel interventional baseline that enables us to disambiguate what is represented by the LM as opposed to learned by the probe. We anticipate that this technique may be generally applicable to a broad range of semantic probing experiments. In summary, this paper does not propose any new techniques for training LMs of code, but develops an experimental framework for and provides insights into the acquisition and representation of formal semantics in statistical models of code.
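The probing step described in the abstract can be illustrated with a generic linear (logistic-regression) probe. This is a minimal sketch under loud assumptions: the "hidden states" are synthetic Gaussian vectors with a hypothetical binary grid-world fact (say, whether the robot faces north) planted along one dimension, standing in for real LM activations. It is not the paper's model, probe architecture, or interventional baseline.

```python
import math
import random

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_probe(xs, ys, epochs=200, lr=0.5):
    """Fit a logistic-regression probe (weights w, bias b) by full-batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_accuracy(w, b, xs, ys):
    """Fraction of examples where the thresholded probe output matches the label."""
    hits = sum((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1)
               for x, y in zip(xs, ys))
    return hits / len(xs)

def synthetic_states(rng, n, dim=8, signal=2.0):
    """Hypothetical stand-in for LM hidden states: noise plus a planted binary fact on dim 0."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += signal if y == 1 else -signal
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(0)
train_x, train_y = synthetic_states(rng, 400)
test_x, test_y = synthetic_states(rng, 200)
w, b = fit_probe(train_x, train_y)
held_out_acc = probe_accuracy(w, b, test_x, test_y)
print(f"held-out probe accuracy: {held_out_acc:.3f}")
```

High held-out accuracy only shows the fact is linearly decodable from the vectors. As the abstract points out, a separate baseline is needed to tell what the LM itself represents apart from what the probe learns.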

| Attribute | Unconscious Mind | Generative AI |
|---|---|---|
| Information Processing | Processes vast amounts of information rapidly and automatically, often without conscious awareness (From the first studies of the unconscious mind to consumer neuroscience: A systematic literature review, 2023) | Processes large datasets quickly, extracting patterns and generating outputs without explicit programming for each task (Deep Learning, 2015) |
| Pattern Recognition | Recognizes complex patterns in sensory input and past experiences, influencing behavior and decision-making (Analysis of Sources about the Unconscious Hypothesis of Freud, 2017) | Excels at identifying patterns in training data, forming the basis for generating new content or making predictions (A Survey on Deep Learning in Medical Image Analysis, 2017) |
| Creativity | Contributes to creative insights and problem-solving through unconscious incubation and associative processes (The Study of Cognitive Psychology in Conjunction with Artificial Intelligence, 2023) | Generates novel combinations and ideas by recombining elements from training data in unexpected ways (e.g., GANs in art generation) (Generative Adversarial Networks, 2014) |
| Emotional Processing | Processes emotional information rapidly, influencing mood and behavior before conscious awareness (Unconscious Branding: How Neuroscience Can Empower (and Inspire) Marketing, 2012) | Can generate text or images with emotional content based on patterns in training data, but lacks genuine emotions (Language Models are Few-Shot Learners, 2020) |
| Memory Consolidation | Plays a crucial role in memory consolidation during sleep, strengthening neural connections (The Role of Sleep in Memory Consolidation, 2001) | Analogous processes in some AI systems involve memory consolidation and performance improvement (In search of dispersed memories: Generative diffusion models are associative memory networks, 2024) |
| Implicit Learning | Acquires complex information without conscious awareness, as in procedural learning (Implicit Learning and Tacit Knowledge, 1994) | Learns complex patterns and rules from data without explicit programming, similar to implicit learning in humans (Deep Learning for Natural Language Processing, 2018) |
| Bias and Heuristics | Employs cognitive shortcuts and biases that can lead to systematic errors in judgment (Thinking, Fast and Slow, 2011) | Can amplify biases present in training data, leading to skewed outputs or decision-making (Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in Large Language Models, 2023) |
| Associative Networks | Forms complex networks of associations between concepts, influencing thought and behavior (The associative basis of the creative process, 2010) | Creates dense networks of associations between elements in training data, enabling complex pattern completion and generation tasks (Attention Is All You Need, 2017) |
| Parallel Processing | Processes multiple streams of information simultaneously (Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1986) | Utilizes parallel processing architecture (e.g., neural networks) to handle multiple inputs and generate outputs (Next Generation of Neural Networks, 2021) |
| Intuition | Generates rapid, automatic judgments based on unconscious processing of past experiences (Blink: The Power of Thinking Without Thinking, 2005) | Produces quick outputs based on learned patterns, which can appear intuitive but lack genuine understanding (BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019) |
| Priming Effects | Unconscious exposure to stimuli influences subsequent behavior and cognition (Attention and Implicit Memory: Priming-Induced Benefits and Costs, 2016) | Training on specific datasets can "prime" generative AI to produce biased or contextually influenced outputs (AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias, 2018) |
| Symbol Grounding | Grounds abstract symbols in sensorimotor experiences and emotions (The Symbol Grounding Problem, 1990) | Struggles with true symbol grounding, relying instead on statistical correlations in text or other data (Symbol Grounding Through Cumulative Learning, 2006) |
| Metaphorical Thinking | Uses embodied metaphors to understand and reason about abstract concepts (Metaphors We Live By, 1980) | Can generate and use metaphors based on learned patterns but lacks deep understanding of their embodied nature (Deep Learning-Based Knowledge Injection for Metaphor Detection, 2023) |
| Dream Generation | Produces vivid, often bizarre narratives and imagery during REM sleep (The Interpretation of Dreams, 1900) | Some generative models can produce dream-like, surreal content (Video generation models as world simulators, 2024) |
| Cognitive Dissonance | Automatically attempts to reduce inconsistencies between beliefs and behaviors (A Theory of Cognitive Dissonance, 1957) | Mixture-of-experts (MoE) architectures can handle a wider range of inputs without ballooning model size, suggesting potential for resolving conflicts between different AI components by synthesizing expert opinions into a coherent whole (Optimizing Generative AI Networking, 2024) |

Also see: Worldwide Federated Training of Language Models

Claude's Summary:

The two papers, "Worldwide Federated Training of Language Models" by Iacob et al. and "The Future of Large Language Model Pre-training is Federated" by Sani et al., both propose using federated learning (FL) as a new paradigm for pre-training large language models (LLMs). The main ideas are:

  1. FL allows leveraging more data and compute resources from multiple organizations around the world, while keeping the data decentralized and private. This can enable training larger LLMs on more diverse data compared to centralized training.

  2. FL relaxes synchronization requirements and reduces communication overheads compared to data-parallel distributed training, making it feasible for geographically distributed participants with varying hardware and connectivity.

  3. The papers present systems and algorithms for enabling efficient federated pre-training of LLMs at billion-parameter scales. Key techniques include allowing participants to modulate their amount of local training based on resource constraints, and partially personalizing models to clusters of participants with related data.

  4. Experimental results show federated LLM pre-training can match or exceed centralized training performance, with the performance gap narrowing as model size increases to billions of parameters. Larger federated models also converge faster and are more robust.

  5. Challenges include data and hardware heterogeneity across participants. The papers propose techniques like adaptive aggregation and load balancing to mitigate these issues.
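To make the aggregation idea concrete, here is a minimal sketch of plain federated averaging (FedAvg, McMahan et al., 2017), not the systems from either paper. Each hypothetical client runs a different number of local SGD passes on its own data (loosely mirroring point 3's participants modulating their amount of local training), and the server averages the returned parameters weighted by client dataset size. The model is a toy 1-D linear regression y ≈ w·x + b, and all client data is synthetic.

```python
import random

def local_train(params, data, steps, lr=0.05):
    """Run `steps` passes of plain SGD over one client's (x, y) pairs for y ≈ w*x + b."""
    w, b = params
    for _ in range(steps):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return [w, b]

def fedavg_round(global_params, clients):
    """One round: every client trains locally, then the server takes a size-weighted average."""
    total = sum(len(c["data"]) for c in clients)
    new = [0.0, 0.0]
    for c in clients:
        local = local_train(list(global_params), c["data"], c["steps"])
        weight = len(c["data"]) / total
        new = [n + weight * p for n, p in zip(new, local)]
    return new

def make_client(rng, n, steps, x_lo, x_hi):
    """Hypothetical client: n noisy samples of y = 2x + 1 over its own slice of x."""
    data = []
    for _ in range(n):
        x = rng.uniform(x_lo, x_hi)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return {"data": data, "steps": steps}

rng = random.Random(42)
clients = [
    make_client(rng, 50, steps=1, x_lo=-1.0, x_hi=0.0),
    make_client(rng, 20, steps=3, x_lo=0.0, x_hi=1.0),
    make_client(rng, 80, steps=2, x_lo=-0.5, x_hi=0.5),
]
params = [0.0, 0.0]
for _ in range(30):
    params = fedavg_round(params, clients)
print(f"global model: w={params[0]:.3f}, b={params[1]:.3f}")
```

Only parameters cross the network; each client's `data` never leaves `local_train`. Letting clients choose `steps` independently is the simplest version of relaxed synchronization, though real systems must also handle clients whose data disagrees, which this toy example deliberately avoids.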

In summary, the papers argue that federated learning is a promising new direction for democratizing LLM pre-training by allowing many more organizations to collaboratively train large models on their combined data and compute resources.
