Tanmay Bakshi

What do language models really learn

Are language models truly creative, or just powerful mathematical optimizers? This talk reveals what LLMs actually learn beyond the hype.

#1 · about 7 minutes

The fundamental challenge of modeling natural language

Language models aim to create intuitive human-computer interfaces, but this is difficult because language syntax doesn't fully capture semantic meaning.

#2 · about 3 minutes

How deep learning models learn by transforming data

Deep learning works by performing a series of transformations on input data to warp its vector space until it becomes linearly separable.
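A minimal Python sketch of that idea, with hand-picked values rather than learned ones: XOR labels cannot be separated by a straight line in the raw 2-D space, but adding one nonlinear feature (the product x1·x2, standing in for a learned hidden layer) warps the space enough for a single linear boundary to work.

```python
def feature_map(x1, x2):
    # Hypothetical nonlinear transformation standing in for a learned layer:
    # lift the 2-D input into 3-D by appending the product feature.
    return (x1, x2, x1 * x2)

def linear_classifier(features):
    # Hand-picked weights and bias; a trained model would learn these.
    w = (1.0, 1.0, -2.0)
    b = -0.5
    score = sum(wi * fi for wi, fi in zip(w, features)) + b
    return 1 if score > 0 else 0

# XOR: not linearly separable in 2-D, separable after the transformation.
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
predictions = [linear_classifier(feature_map(*x)) for x, _ in xor_data]
```

Here `predictions` matches the XOR labels exactly, which is the point: the transformation, not the linear classifier, does the hard work.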

#3 · about 3 minutes

Why the training objective is key to model behavior

The training objective, or incentive, dictates exactly what a model learns and can lead to unintended outcomes if not designed carefully.
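For language models the usual incentive is cross-entropy: the model is penalised by the negative log-probability it assigned to the correct next token. A small sketch (vocabulary and probabilities are illustrative) shows how the objective rewards confidence in the right answer and punishes confidence in the wrong one:

```python
import math

def cross_entropy(predicted_probs, target_index):
    # The objective: negative log-probability assigned to the correct token.
    return -math.log(predicted_probs[target_index])

# Two predicted distributions over a 3-token vocabulary; token 1 is correct.
confident_right = [0.1, 0.8, 0.1]
confident_wrong = [0.8, 0.1, 0.1]
target = 1

loss_good = cross_entropy(confident_right, target)  # low loss
loss_bad = cross_entropy(confident_wrong, target)   # high loss
```

Whatever behaviour minimises this number is what the model learns, which is why a carelessly chosen objective can produce unintended behaviour.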

#4 · about 8 minutes

From Word2Vec and LSTMs to modern transformers

The evolution from non-contextual embeddings like Word2Vec and slow, sequential models like LSTMs to the parallel, deeply contextual transformer architecture solved major NLP challenges.
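The operation that makes this possible is scaled dot-product attention: every position scores every other position at once, so context flows in parallel rather than step by step as in an LSTM. A pure-Python sketch (toy dimensions, no learned projection matrices):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # Scaled dot-product attention: each query attends to ALL keys at once,
    # producing a context-weighted mixture of the values.
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

With a zero query every score ties, the weights go uniform, and the output is simply the average of the values; real transformers learn queries and keys that sharpen those weights toward relevant context.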

#5 · about 7 minutes

A practical demo of a character-level BERT model

A scaled-down, character-level transformer model demonstrates the 'fill in the blank' pre-training task by predicting masked characters in artist names.
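The same fill-in-the-blank task can be mimicked without a neural network at all. This toy stand-in (the name list and helper are hypothetical, not the talk's model) counts which character appears between each pair of neighbours and uses those counts to fill a masked position:

```python
from collections import Counter, defaultdict

# Hypothetical corpus of artist names standing in for the talk's dataset.
names = ["drake", "adele", "prince", "madonna", "rihanna"]

# Learn simple context statistics: (left_char, right_char) -> middle_char.
counts = defaultdict(Counter)
for name in names:
    for i in range(1, len(name) - 1):
        counts[(name[i - 1], name[i + 1])][name[i]] += 1

def fill_mask(text, mask="_"):
    # Predict the masked character from its left and right neighbours.
    i = text.index(mask)
    context = (text[i - 1], text[i + 1])
    best = counts[context].most_common(1)
    return text[:i] + (best[0][0] if best else "?") + text[i + 1:]
```

For example, `fill_mask("dr_ke")` recovers `"drake"`. A real character-level BERT does the same job with learned representations instead of raw counts, which is what lets it generalise beyond its training strings.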

#6 · about 2 minutes

What language models implicitly learn about language structure

By analyzing a model's internal weights, we can see it learns phonetic relationships and syntactic structures without ever being explicitly trained on them.
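One common way to probe this is to measure cosine similarity between a model's learned embedding vectors. The numbers below are made up for illustration, but they show the shape of the analysis: phonetically similar characters end up with nearby vectors.

```python
import math

# Hypothetical 3-d character embeddings standing in for learned weights;
# in a real analysis these rows come from the trained model.
embeddings = {
    "c": [0.90, 0.10, 0.20],   # hard consonant
    "k": [0.85, 0.15, 0.25],   # phonetically close to "c"
    "a": [0.10, 0.90, 0.30],   # vowel
}

def cosine(u, v):
    # Cosine similarity: 1.0 for parallel vectors, ~0 for unrelated ones.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

In this toy setup `cosine` ranks "k" far closer to "c" than "a" is, the kind of implicit phonetic clustering the chapter describes.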

#7 · about 7 minutes

Why current generative models don't truly 'write'

Generative models like GPT are excellent at predicting the next word based on statistical patterns but lack the underlying thought process required for true, creative writing.
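A stripped-down illustration of that claim, using a bigram table instead of a neural network (corpus and helper are hypothetical): the predictor simply emits the statistically most frequent continuation, with no plan for the sentence as a whole.

```python
from collections import Counter, defaultdict

# Tiny toy corpus; GPT-style models do the same thing with vastly more
# data and a learned model instead of raw counts.
corpus = "the cat sat on the mat and the cat slept".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    # Pick the most frequent continuation seen in training: pure
    # statistics, no underlying "thought" about what to say.
    return bigrams[word].most_common(1)[0][0]
```

Here `predict_next("the")` returns `"cat"` because that pairing is most frequent, which captures the chapter's point: fluent continuation is not the same thing as writing.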

#8 · about 4 minutes

Exploring the future with Blank Language Models

Blank Language Models (BLMs) offer a new training approach: by filling in text in any order, they force the model to consider both past and future context.
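The contrast with left-to-right prediction can be sketched in a few lines (toy corpus, count-based stand-in for the trained network): a blank is predicted from both its left and right neighbours, so blanks can be filled in any order.

```python
from collections import Counter, defaultdict

# Hypothetical corpus; a real BLM is a trained neural network, but the
# key idea survives: predictions use BOTH past and future context.
corpus = ["the cat sat on the mat", "the cat sat on the rug"]

ctx = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(1, len(words) - 1):
        ctx[(words[i - 1], words[i + 1])][words[i]] += 1

def fill_blank(words, i):
    # Condition on the left AND right neighbour of position i; a
    # left-to-right model could only see words[:i].
    options = ctx[(words[i - 1], words[i + 1])]
    return options.most_common(1)[0][0] if options else None
```

For instance, `fill_blank(["the", "_", "sat"], 1)` yields `"cat"` from the two-sided context, something a purely left-to-right predictor could not use.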

#9 · about 3 minutes

The need for better tooling to accelerate ML research

The complexity of implementing novel architectures like BLMs highlights the need for better infrastructure, including compiled-language ML stacks such as Swift for TensorFlow, to speed up innovation.
