I have been thinking lately about the texts that have gone into the different publicly available LLMs. What a model has trained on is very important, not only because it might be copyrighted, but also because it affects the output.
I don’t purposely use AI in my personal life, but I do use it in a limited way at work. I don’t use it to generate the texts that it is my job to write, but I do sometimes use it to do research for those texts. I don’t trust the output until I have independently verified it; often it’s wrong. (Recently I googled “Are we in EST or EDT” and the AI overview gave me the wrong answer. This is a simple query. Yes, I could have structured it differently to get the right answer, but I thought the whole point was natural language processing?)
Because I’m using these tools not to write an email for me but to learn, I care a lot about what’s gone into them in the first place. I don’t want a tool that’s been trained on the entire Internet because I’ve seen the Internet — it’s not the best we can do. The most useful AI tool I have come across so far is Google’s NotebookLM, which allows you to upload a bunch of your own research documents and query them as a collection. I trust the output more when I have chosen the input.
This has me thinking about my own brain, and all the texts and images it’s trained on in my nearly four decades on this planet. Twenty years ago, it was easier to control what went in. You chose books to read, films to watch, friends with whom to have conversations. Algorithms and scrollable feeds changed the way we do input.
It’s not a matter of “garbage in, garbage out.” It’s more like an LLM. I have trained on a vast number of texts. On any given day my brain returns ideas shaped by texts I consumed as a child or as a teenager. I generate new texts that contain traces of existing texts, some of them “garbage.” My writing is influenced by the poems of Frank O’Hara and trashy teen horror novels, by the films of Kogonada and reruns of Saved by the Bell.
Of course, I am not a machine and my brain is not engaging in machine learning. I have the benefit of human memory and the way it favors texts that made an impression, but I also have the liability of human memory and the way it constantly falters or leads us astray. (In a world of AI, perhaps this liability will more often be a benefit. I have a feeling we are about to see imperfect art, art that has the mark of a human, increase in value.)
But as with an LLM, it matters what my brain has trained on, and I’ve been thinking lately about how to exert more influence. Giving up the algorithms and the scrollable feeds is an obvious answer, but some of the most nourishing texts of this last decade have come from there. Tip the balance toward long, rich texts. Give the eyes over to art.
I’m back in the novel. I printed all 300ish pages of the current manuscript, read it, took notes on missing scenes, cut some stuff, reorganized a bunch. I have a better idea of the structure now. There are two timelines, and I had been trying to balance them in every chapter. Now I realize I need to separate them, make them more distinct, and alternate chapters to avoid totally confusing the reader.
I have been reading (slowly, because I love it and so I am dragging it out) Us Fools by Nora Lange, and she does this. The chapter titles are the year in which the chapter is set. It’s grounding, in a text that jumps back and forth across time.
1000 Words of Summer begins May 31. I want to go into this year’s sprints with a list of at least 30 scenes to write/finish, so I have plenty of options to choose from each day and can follow the energy. If you are also doing 1000 Words, let me know so we can cheer each other on.
Love to you all.
Love,
Shayne