Humans are the biggest babies on the planet. I don’t mean the size of our fat heads, because elephants and blue whales are the clear winners there. Humans come in last across the animal kingdom in how long it takes us to develop basic life skills and learn how to communicate - years and years and years! Many animal parents hide away thousands of eggs and just skedaddle. Sure, the survival rate is low, but once a baby turtle scurries off the beach and finds somewhere safe, it is born with all the skills it needs to carry on.
Child prodigies fascinate us because somehow their big brains let them accelerate the development and learning process and write symphonies when they’re 8 or mathematical proofs at 16. It’d be OK to have a few more Mozarts and Pascals around.
The mountain of human knowledge grows ever higher, and while we stand on the shoulders of giants, you still need to understand what those giants were on about before you can climb up and build the mountain higher. Learning is the key to discovering and creating new things. What if we could learn faster?
Sci-fi movies love the alien who happens upon Earth and learns all of human history by watching movies at 10X speed or plugging their brain into the internet1. Sci-fi books tend to be a bit more cranial: you meet the alien race who cannot believe it takes our big thinkers around 25 years before they’re ready to contribute new knowledge. Then there is the story where the aliens offer us a magical device to dramatically increase our ability to learn.2
We’ve got that device now and it’s AI. Companies are all over it. You know that drowning/drinking-from-the-fire-hose feeling when you start a new job? Sure, you get a mentor or a work buddy, but you can’t sit on their lap all day. What if that buddy was always around, knew all the company’s acronyms and systems and processes, and you could just check in with the context of what you were working on and it would help you out? Corporate AI systems are doing this today.3
Large Language Models, or LLMs, are the heart of these new learning systems, and they learn by being trained on huge sets of text. They find patterns across all that data and build a statistical model of language, which is what lets them generate responses to questions.
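If you’re curious what “finding patterns” actually means, here’s a deliberately tiny sketch in Python - my own toy example, nothing like the real thing - that just counts which word tends to follow which and uses those counts to guess the next word. Real LLMs do this with billions of neural-network parameters instead of a counting table, but the training signal is the same: predict what comes next.

```python
from collections import Counter, defaultdict

# Toy illustration of "learning patterns from text": count which word tends
# to follow which, then use those counts to predict the next word.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Build a table of next-word counts for each word (a simple bigram model).
next_word_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_word_counts[current][nxt] += 1

def predict_next(word):
    """Return the most frequently seen next word from training, if any."""
    counts = next_word_counts.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # -> "cat" (seen twice after "the" in the corpus)
```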
There’s been a ruckus ever since ChatGPT became the first consumer-facing LLM last year - what data was it trained on!? The New York Times filed a lawsuit4 against OpenAI for copyright violation over this exact thing. But you have to understand that LLMs don’t store NYT articles to look up later; they voraciously consume them all to build models of how humans communicate. There is no database of NYT articles inside for reference; that’s not how it works.
Now AI companies are making deals with content providers to get unfettered access to their data to train and build their models. The latest one in the news is Reddit, which is about to go public, striking a deal with Google5. Google gets to use Reddit content to train its AI; Reddit gets to use Google AI to improve user services like search6 - oh yeah, and a cool $60 million a year.
I saw the craziest commercial the other day. A super-suave world-traveler dude emerges from the arrivals building at the airport in Bangkok and calls his Uber driver to tell him where to pick him up. The phone translates his English into Thai in real time, and the driver’s Thai response back into English. That’s some crazy sci-fi stuff.7
Deep in the bowels of Substack’s site settings there is a toggle to stop AI from training on your writing:
Go for it, GPTBot8 - train on The Wirepine Weekly!9
It makes me happy to know I can make a smol contribution to the grand corpus of human knowledge to train AI and keep compressing the learning curve so we can do better things faster for our people and our planet 🌎
Liu Cixin, my current sci-fi crush, has written a couple of stories about this, including The Village Teacher.
Another great and real example is helping developers learn and write code. Say you want to learn Python but would rather do it in weeks, not months. Visual Studio Code is a development environment from Microsoft, and it has an extension that’s effectively an LLM trained on GitHub, the world’s largest code repository. GitHub has more than 100 million developers and over 400 million code repositories - that’s some rich training data! Of course, it’s called GitHub Copilot, and it will generate code for you based on what you’re trying to do and then revise it based on your specific requirements.
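To give a flavor of what that looks like: you write a comment or a function signature describing what you want, and Copilot offers to fill in the body, which you can accept, edit, or reject. Something along these lines - a made-up illustration of the kind of suggestion it might make, not an actual Copilot transcript:

```python
# You type the comment and the function signature...
def fahrenheit_to_celsius(temp_f: float) -> float:
    # ...and Copilot suggests a body like this for you to accept or tweak.
    return (temp_f - 32) * 5 / 9

print(fahrenheit_to_celsius(212))  # 100.0
```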
Ironically, often the best way to make Google cough up something actually useful is to add ‘reddit’ to the end of your search string. Reddit’s organization by topic (subreddits) and its up/down voting system that surfaces the most relevant content make for perfect training data for LLMs. More on the deal if there are any lawyers out there. Do I have any lawyer friends/readers? Might be handy - let me know ;)
In an odd twist to this story, the Reddit IPO filing also revealed that Sam Altman, OpenAI’s CEO, is one of Reddit’s biggest shareholders - #3 behind Condé Nast and Tencent (yep, the Chinese multimedia conglomerate). OpenAI recently struck a deal with the German media company Axel Springer to train on its many publications, including the US news site Politico.
Here’s the Samsung commercial. About 10 years ago my group at Microsoft added real-time text translation to Skype - at least for a big on-stage demo with my CVP and CEO. It didn’t make it into the product until just a few years ago, though; we couldn’t scale it at the time. What Samsung is showing is super impressive because you’re doing it on phone hardware, not a computer, and the performance has to be fast - like milliseconds fast - to sustain a voice conversation rather than text.
This explanation of how OpenAI’s web crawler GPTBot works is what’s behind that Learn More hyperlink: GPTBot - OpenAI API
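As far as I can tell, the mechanics behind that toggle are the good old robots.txt file: OpenAI says GPTBot respects it, so a site that wants to opt out of training publishes something like the snippet below, and a site that’s happy to be crawled simply leaves it out. (Whatever Substack actually writes into its robots.txt is their business; this is just the pattern from OpenAI’s docs.)

```
# robots.txt: tell OpenAI's crawler it may not fetch anything on this site
User-agent: GPTBot
Disallow: /
```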
It’s the default, of course, and I’m certain Substack has figured out, or will figure out, how to monetize it, but that’s OK. Also, AI search is starting to show its sources, so Wirepine shows up in results, which is also bueno.
Check out last week’s article, where I dig into the joy of playtime and wonder why all the computers are so intent on beating us?!
I’m happy so far with my AI collaborators. I view the relationship sort of like a songwriting team: one individual writes the music; the other, the lyrics. On a more practical level, AI is my instant consulting librarian.
love it; nice analogy. Instant research for writing and I couldn't illustrate without it