Explore Ai like Robert Frank Explored America

Imagining text and image ai systems as geographic realms

Apr 09, 2023

This Sunday, Easter morning, I woke up thinking about LLMs as landscapes, like memory palaces. I pondered this partly because of The Creativity Gap, who explored Written Language as Hyper Landscape yesterday. Additional inspiration: “Image Space,” explained here in a brilliant visual way around minute 7:50, which I watched last November.

I will now share my thoughts on Ai as sensual realms via these meaning-drenched screen squiggles known as words.

“A castle made of cables in the style of vaporwave.” Generated by Dall-E

The Vocabulary

First and quickly, some vocabulary in case anyone is unfamiliar. If you’re familiar with these terms, skip to the next section.

Memory Palace: Also known as the “Method of Loci,” a memory palace is a mnemonic device used for memorization and learning, helpful for acquiring new languages. One mentally visualizes a space familiar to them or a newly created space, for example, a palace. Then one associates the mental space with concepts they’re trying to memorize. For example—you’re learning Spanish. You close your eyes and go to the Spanish Memory Palace. You imagine a gold throne. King Oro sits on the gold throne because in Spanish the word for gold is Oro. You remember this new Spanish word better because you’ve seen King Oro sitting on his gold throne in the Spanish Memory Palace.
Machine Learning: Machine learning is a subfield of artificial intelligence (AI) that develops algorithms and models to make predictions or decisions based on data. Machine learning teaches computers to learn and improve from experience without being explicitly programmed. Examples: image and speech recognition, natural language processing, recommendation systems, and predictive analytics.
Neural Networks: A neural network is a computer system loosely modeled on the human brain. Tiny parts (neurons) pass messages back and forth. It learns to recognize patterns by practicing with lots of examples. For example, it learns to recognize kings on thrones by looking at many different pictures of kings on thrones and figuring out what they all have in common. Like a child’s brain, it can identify objects, understand words, and eventually generate complex combinations and conclusions.
Topography: Topography refers to the physical and geographical features of a place, such as hills, valleys, and rivers, or the arrangement of complex systems or structures, including the human body or a neural network.
LLMs/Large Language Models: Computer programs that use machine learning algorithms on super large, mega bodies of text (like the entire internet) in order to converse with humans in human languages. These models memorize patterns and relationships within language. They are colloquially known as Ai/artificial intelligence. An example: ChatGPT.
Ai Audio Generators: For music and voice, Ai audio generators use neural networks to create voices, sounds, and music. Generative AI audio systems typically work by training a neural network on a large dataset of audio samples. Examples: AIVA and others.
AIGs/Artificial Image Generators: AIGs or AiArt programs are computer image generator software similar to LLMs, but for pixels and images. AIGs can create images based on human text prompts. Example: MidJourney. There are several types of AIGs:
1. GANs (Generative Adversarial Networks): A program that pits two neural networks against each other, one generating images and the other judging whether they look real, in order to create original images after learning from pre-existing images.
2. VAEs (Variational Autoencoders): A program that does unsupervised learning of complex data, including image data, in order to create new images by sampling from the learned distribution of image features.
3. Deep Dream: A program that uses neural networks to visualize new images based on an input image. The technique involves feeding the input image into a pre-trained neural network and then iteratively adjusting the image to maximize the activation of certain elements in the network.
Observer Effect: In quantum mechanics, the "observer effect," often discussed alongside the "measurement problem," is the principle that the act of observing or measuring a system changes the state of that system. The act of observation alters the outcome.
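
For anyone who wants to see "learning by practicing with lots of examples" in miniature, here is a rough sketch of a single artificial neuron (a perceptron). Everything in it is an assumption made up for illustration: the two features (`has_crown`, `sits_on_throne`), the tiny dataset, and the function names. Real image models are vastly larger and work on pixels, not hand-picked features.

```python
def train_perceptron(examples, epochs=20, lr=1.0):
    """Learn weights for one artificial neuron from labeled examples."""
    n_inputs = len(examples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1 if activation > 0 else 0
            error = label - prediction  # 0 when the guess was right
            # Nudge the weights toward the correct answer after each mistake.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, inputs):
    """Return 1 if the neuron 'fires' for these inputs, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


# Hypothetical toy data: (has_crown, sits_on_throne) -> 1 means "king on a throne".
EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    weights, bias = train_perceptron(EXAMPLES)
    for inputs, label in EXAMPLES:
        print(inputs, "->", predict(weights, bias, inputs), "(expected", label, ")")
```

The neuron starts out knowing nothing and only changes when it guesses wrong, which is the "practice" part of the definition above.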

Human Memory and the Landscape

The human brain evolved to navigate landscapes, to remember geographical features, and to find food and safe places. The hippocampus, a part of the brain involved in the spatial processing required to navigate landscapes, also plays a central role in memory. This spatial ability allows us to make maps, visualize surroundings, remember directions, design buildings, and even construct and navigate metaverses. Our brains create, visualize, process, and navigate landscapes using hormones and chemicals, our own algorithms and messengers.

Without this ability, human children might get lost in the woods and eaten by witches in candy houses, in which case they won’t be able to reproduce as adults. Evolution wants survival and reproduction, so it needs geographical processing. Navigating landscapes will be equally important for the future of ai, for robots, and for robocars.

Neural Networks as Topographies

Castles made of electronics generated by Dall-E

Some reproductive drive within humans seeks also to replicate itself in any way possible, from the sexual reproduction of babies to the production of computerized silicon brains; so as we make Ai in our own image, there is a pull to represent neural networks as the topography of our imagination, our brains, the most complex system within us.

However, just as we don’t fully understand the human brain, we don’t fully understand how LLMs, or even sprawling algorithms like Twitter’s, actually work. We can’t always predict their behavior. I can’t predict what the Ai image generator will spit out even if I wrote the prompt. We can build them and let them run unsupervised, but since the human memory palace is limited to a set terrain and capacity, once the computer goes beyond our city limits, we lose full understanding of where it is. Hence we must train it to convert its doings into a story, a visualization, a single image, a snapshot of its landscape, or some language we can make sense of, with a usable map to its possibly-infinite-possibilities memory palace.

We can think of LLMs as calculators for words, or as places.

In this vein, where might we adventurous, reproduction-hungry humans venture? What realms and maps will allow us to communicate with and document the better-memory-than-humans-machine(s)?

The Camera in the Machine

A castle made of computers in the style of Robert Frank, generated by Dall-E

Earlier this month, I wrote about documentary photographer Robert Frank, a classic example of an “outsider looking in.” Robert Frank was a Swiss-born street photographer and filmmaker who roamed America taking photos in the 1950s. Influenced by Jack Kerouac and Allen Ginsberg, he photographed whatever attracted his eye on road trips, becoming the imaging arm of the Beat Generation. He is best known for his documentary photography book, The Americans.

Now, imagine we are Robert Frank, dropped into the neural network of the Ai image realm. This is a world similar to Jorge Luis Borges's concept of the Library of Babel, an infinite and searchable world. It is not just a memory palace but a memoryverse, a visualverse: everything seen and possible to be seen in infinity.

An illustration of one of the Library of Babel's hexagonal reading rooms by Antonio Toca Fernandez. It is embedded with links to the four shelves of books each hexagon contains. — Visual depiction of a wall in the Library of Babel from https://libraryofbabel.info/browse.cgi

Frank can roam and take snaps of whatever he wants. Instead of a car, he uses text commands to jump from spot to spot in the realm, the way characters move in video games or in the Minecraft realm. Using whatever words he can imagine, and by combining prompts, he can jump to new places, like the lily pad of the three-eyed-owl frog and cat, and he can take documentary photographs of what he sees. Without prompts he can also just roam, capturing whatever images exist in the infinite Seeworld, except maybe it isn’t infinite, though it may as well be. The realms are like eternal returns, where even if exhausted, things can repeat, regenerate, vary, and breed. He can photograph them all and save them to his clipboard, endless variations for the eye and camera lens to imbibe and capture.

As with the "observer effect" in quantum mechanics, maybe in these ai realms, you only see what you look at. Maybe they always exist or maybe they are generated like video game scenes, on the fly, based on a set of rules and parameters, as you explore and request them.
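
Here is a toy sketch of the "generated on the fly" idea, for the curious. It assumes a made-up world where a text prompt is hashed into a seed, and the scene at that spot is only computed when someone asks for it. The word lists and the `visit` function are hypothetical, and no real image generator works this simply; many of them also mix in randomness, so the same prompt can land you somewhere new each time.

```python
import hashlib
import random

# Hypothetical vocabulary for the toy realm; not taken from any real model.
SUBJECTS = ["castle", "owl", "frog", "library", "throne"]
MATERIALS = ["cables", "computers", "lily pads", "books", "gold"]
STYLES = ["vaporwave", "documentary photo", "Escher print"]


def visit(prompt: str) -> dict:
    """Build the scene at a prompt's 'location' only when it is requested."""
    # Hash the prompt so the same words always lead to the same spot.
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)  # private generator, seeded by the prompt
    return {
        "prompt": prompt,
        "subject": rng.choice(SUBJECTS),
        "material": rng.choice(MATERIALS),
        "style": rng.choice(STYLES),
    }


if __name__ == "__main__":
    print(visit("a castle made of cables"))
    print(visit("a castle made of cables"))  # same prompt, same scene
    print(visit("the lily pad of the three-eyed owl"))
```

Nothing is stored anywhere in this sketch. The scene does not exist until Frank types the words, yet it is the same scene every time he comes back, which is one way a place can feel permanent while only ever being computed on request.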

“A Castle of Computers in the Style of Robert Frank,” generated by Dall-E

How do we make these worlds? With math equations.

And the computer does math the best, better than us.

More Worlds

The Library of Babel in the style of MC Escher generated by Dall-E

Likewise, Frank can drop into WordRealm, the Letterverse, and he can explore, document, and share any combination of words that ever existed or will exist.

Frank can travel to SoundWorld, the audioverse. Inside SoundWorld, Frank can hear every combination of every pitch, timbre, volume, and vibration. He can listen to any voice possible for a human to hear. In SoundWorld every musical combination possible exists and he can use human language to navigate to their studios and listening booths. He can say, take me to Beyoncé at age 21 singing the words to the first chapter of Cujo set to Beethoven’s Fifth. In SoundWorld she is out there somewhere, vibrating like this and in every variation possible. Again, maybe it is not infinite, but for humans constrained by the lifespan of human time, it may as well be. Frank can’t possibly record everything, so he just visits the soundrooms that catch his ear, that tickle his breasts.

You get the idea.

Are there other realms to capture on his cameras? Other sensual topographies? Will the smellverse or touchverse exist one day?

Want to Keep Going Down the Bunny Hole?

Play with this: The Library of Babel.

The Tower of Babel, 1594 by Lucas Van Valckenborgh via the Louvre.fr

Beware of the Trillion-Dimensional Space:

Or get really nerdy and read about mapping neurochemicals to create neural network lookup tables: Transitions between cognitive topographies: contributions of network structure, neuromodulation, and disease | bioRxiv

Now You

Which senseworld do you wish to roam? Where would you point your camera?

Can the universe be compressed into a database?

Why do we want to replicate the entire universe and can there be two infinities? A copy of infinity? What do we call infinite copies of infinity?

Will we soon have an Ai metaverse which is the universe of everything ever possible? An infinite topography you can drop in and explore on foot or with text or voice porting? A universe replica containing infinite soundscapes, imagescapes, and storyscapes?

Is ai an infinite memory palace of math’s imagination?

Charlotte Dune's Lagoon
