Keeping It Weird with Small Networks
Large language models can’t write. This has been discussed recently — Jasmine Sun’s piece in the Atlantic makes a convincing case that it’s the reinforcement learning that does it, the post-training that files down every rough edge until the output reads like a nervous candidate at a job interview. Sam Altman thinks otherwise, but writers who looked at what he was praising found plagiarized Nabokov and metaphors that contradict themselves within a sentence. The problem is real, and it’s structural: the more you optimize a model for safety and helpfulness, the less interesting its language gets.
One way around this is to not use a large language model for the writing part. Use something smaller and dumber — something with more entropy and less taste. Then let the large model do what it’s actually good at: selecting and arranging.
In 2016, Andrej Karpathy had just published his blog post about character-level recurrent neural networks, and I was getting into this stuff. I found a GPU — a GTX 950 — on Finn, which is Norwegian Craigslist. Met a guy at Oslo central station, got the card, and firewired it to my Intel Mac. Then spent a while getting CUDA and TensorFlow to work with kexts that would let the system see an off-board graphics card. It was not a smooth process.
Once it was running I scraped about 67,000 names of funded startups off Crunchbase and trained a character-level LSTM to generate new ones. Two layers, 128 hidden units. You could do the same thing with a Markov chain, but an LSTM carries context across the whole name rather than a fixed-size window, and it shows. The names come out hilarious and occasionally great:
Allo Pharms · Endara · Novical · Salution · VU Medion · Crork Grame
“Novical” sounds like it should be a SaaS company. “Crork Grame” does not. At higher temperatures the model gets drunk and starts producing things like “Flals Infynraftcoro,” which is not making it past any branding agency.
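Temperature, for what it's worth, is just a divisor on the logits before the softmax: below 1 it sharpens the distribution toward the model's favorite next character, above 1 it flattens everything toward noise. A stdlib sketch, with made-up logits over a tiny vocabulary:

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Divide logits by T, softmax, then draw an index from the result."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    cum = 0.0
    for i, e in enumerate(exps):
        cum += e / total
        if r < cum:
            return i
    return len(logits) - 1  # guard against floating-point shortfall

# Made-up next-character logits; none of this is from the actual model
vocab = ["a", "b", "c"]
print(vocab[sample_with_temperature([2.0, 1.0, 0.1], temperature=1.2)])
```

Crank the temperature up and the tail characters get picked often enough to produce the Infynraftcoros.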
Soon I was sitting on several megabytes of generated names, of which maybe one in five feels like something a real company could be called. More than that: a good one sort of evokes what the company is. You can imagine the company that would have each of these names. The model doesn’t know what words mean, but it knows what startup names feel like — the truncated words, the vaguely Latin suffixes, the “-ify” and “-io” endings. It’s learned the genre so well that the output comes with phantom context attached. “Quantestria” is obviously a horse betting odds prediction service.
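For comparison, the Markov-chain baseline fits in a few lines of stdlib Python. This sketch trains, purely for illustration, on the handful of generated names above; the real training set was the 67,000 scraped names.

```python
import random
from collections import defaultdict

def train_markov(names, order=3):
    """Map each `order`-character context to the characters seen after it."""
    model = defaultdict(list)
    for name in names:
        padded = "^" * order + name + "$"  # start padding and end marker
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def sample(model, order=3, max_len=20, rng=random):
    out = "^" * order
    while len(out) < max_len + order:
        nxt = rng.choice(model[out[-order:]])
        if nxt == "$":  # the chain decided the name is finished
            break
        out += nxt
    return out[order:]

# Toy corpus: the six names quoted above, standing in for the real 67,000
corpus = ["Allo Pharms", "Endara", "Novical", "Salution", "VU Medion", "Crork Grame"]
model = train_markov(corpus)
print(sample(model))
```

With a three-character window the chain can only ever stitch together overlapping fragments of names it has seen, which is exactly the limitation the LSTM escapes.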
We used the names for Panda, an installation we did with OMA for the Oslo Architecture Triennale — a fiction about an autonomous AI that spawns companies as a form of resistance. We left a book of 30,000 generated company names in the installation. Fun thing to have.
A digression. In 2021 I pointed a transformer at a prompt asking it to write marketing copy for a content management system. It did, sort of, and then it overshot the end-of-text token. The generator wasn’t set to stop, so it kept going — no longer conditioned on the prompt, free-associating. First it hallucinated some documentation about subreddit comment limits and Docker containers, then it hit another end-of-text token and pivoted into a completely different genre: a food blog announcing that Jasper Street Bakery was opening a second branch in Elmwood Village, spanning two floors of the former Pizza Bakehouse, joining The Urban Poor and Falcata’s-themed Ray’s Italian Seafood Sandwich House and the impending Vortex (“whose name pretty much says it all, tbh”). Chef Ravi Mathur and business partner Anil Gupta would head the place. “A Meat Cannon will compete for the hearts and minds of those who walk by,” tweeted prominent food critic Craig Claiborne. It goes on — Alfredo Laue, founder of Rotary Young Professionals, tweets about smoked meat pizza in Montreal, and then the model hits yet another end-of-text token and pivots into a Polish translator’s memoir about forgetting English gestures.
None of this is real. Craig Claiborne died in 2000. The model is hallucinating with genre — it knows what a restaurant opening press release sounds like, what a food blog sounds like, what a quote from a “diehard fan” sounds like. It’s a fantastic piece of writing, and you could not get this out of a current model. The RLHF that makes modern models helpful and safe is exactly what kills this kind of thing.
So anyway. I’d had this pile of names sitting in a file for years, and then the other day we actually needed one. Simen has been building an agent coordination framework — kind of like Slack for agents — and it needed a name.
I reached for the file again.
You can stuff 30,000 LSTM-generated names into the context window of a modern model and have it find the ones that match a set of themes. The LSTM makes shit up and the LLM decides which shit is good. This is the trick: use the controllable, tasteful part of large language models — the part that can evaluate and select — but feed it entropy from something that hasn’t been optimized into blandness. The misspellings are the best part — unique, with available domains, and occasionally genuinely great. Spelling ceramics with an X. That kind of thing.
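The selection step is mostly prompt plumbing. A hypothetical sketch, where the batch size and prompt wording are mine and the model call itself is stubbed out, since any chat-style client would do:

```python
def batch_names(names, batch_size=2000):
    """Split the generated names into chunks that fit comfortably in context."""
    for i in range(0, len(names), batch_size):
        yield names[i:i + batch_size]

def build_prompt(batch, themes):
    return (
        "Below is a list of machine-generated startup names. "
        f"Shortlist the ten that best fit these themes: {', '.join(themes)}. "
        "Misspellings are a feature, not a bug.\n\n" + "\n".join(batch)
    )

# Hypothetical themes for the agent-coordination project
themes = ["agent coordination", "messaging"]
for batch in batch_names(["Novical", "Endara", "Crork Grame"], batch_size=2000):
    prompt = build_prompt(batch, themes)
    # shortlist = some_llm_client.complete(prompt)  # hypothetical call
```

Run the shortlists through a final round and the LLM has done the only part it's good at: taste.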
Miriad was one of the names from the original set.
The code is here: evenwestvang/lstm-startup-names.
Postscript. This post was written with the help of Miriad agents. After the LSTM finished generating names, the channel had this exchange:
“Not bad at all. And somewhere in the other 37,999 names, ‘Flals Infynraftcoro’ is still waiting for its Series A.”
“Flals Infynraftcoro — disrupting the infynraft space since temperature 1.2. I’d invest.”