After Reboot, A Redux

Since it’s been a while, I will summarize key takeaways from the previous posts. This will set the stage for future posts, and, for the new readers, it will give a taste of what’s coming.
In the Typology of Hype, I deconstructed the conversations around GPT-3 when it was first released to help contextualize the buzz. While much of the discussion, including OpenAI’s PR efforts, was hype-y in nature, I cautioned readers not to throw the baby out with the bathwater. I wrote, “Self-supervision will change all of Artificial Intelligence in the future” (Aug 2020). Two years later, this is increasingly true.
GPT-3 marks the beginning of a Cambrian explosion of few-shot learning products, but that era will not be limited to or dominated by GPT-3 alone. We will see few-shot learning capabilities beyond the written text. Imagine the possibilities of doing few-shot learning from images, videos, speech, time series, and multi-modal data. All this will happen in the early part of this decade, resulting in a proliferation of machine learning in more aspects of our lives. This proliferation will raise the urgency of working on bias, fairness, explainability, and transparency of ML models. So will the importance of working on fighting adversarial applications of ML models. (Rao, 2020)
In a recent invited panel discussion, I described self-supervision as a “drug we cannot get enough of.” Further, I am betting that building purely supervised models will become a thing of the past. I will have a lot more to say about this, but if this position rattles you, I want to hear from you!
While discussing the hype, capabilities, and mysticism surrounding early few- and zero-shot models, in the Question of Automation post I examined how meditating on the nature of very large models like GPT-3 forces us to reconsider what we mean by automation itself. AI-driven automation is unlike factory automation. I wrote:
AI automation is not a dualistic experience. One of the dangers of the hype overattributing capabilities of a system is that we lose sight of the fact that automation is a continuum instead of a discrete state. In addition to stoking irrational fears about automation, this kind of thinking also throws out of the window any exciting partial-automation possibilities (and products) that lie on the spectrum. (Rao, 2020)
In addition to an automation taxonomy, I introduced the ideas of “capability surfaces” and “task entropy” for these Very Large Parameter models.
To avoid the promise of large AI models devolving into hype, talking about their capability surfaces is extremely helpful. In simple terms, the capability surface of a model is the simplex of automation levels afforded by the model across a wide range of application domains.
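To make the idea concrete, here is a minimal sketch of a capability surface as a mapping from application domains to automation levels on a continuum. Every domain name and number below is invented purely for illustration; nothing here comes from an actual model evaluation.

```python
# Hypothetical capability surface: for one model, map each application
# domain to an automation level on a continuum from 0.0 (no automation)
# to 1.0 (full automation). All values are invented for illustration.
capability_surface = {
    "text summarization": 0.7,
    "code completion": 0.5,
    "medical diagnosis": 0.2,
    "legal drafting": 0.3,
}

def partial_automation_candidates(surface, low=0.3, high=0.8):
    """Domains where the model helps as an assistant but cannot replace a human."""
    return sorted(d for d, level in surface.items() if low <= level < high)

print(partial_automation_candidates(capability_surface))
# → ['code completion', 'legal drafting', 'text summarization']
```

Framing capabilities this way keeps the continuum in view: the interesting products often live in the middle of the surface, not at full automation.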
In Riding The Hardware Lottery, I discussed the general nature of large-scale systems. Like other large-scale systems, the capabilities of Very Large Parameter (VLP) models are emergent.
Thanks for reading AI Research & Strategy! Subscribe for free to receive new posts and support my work.
We already see some of this in VLP models like GPT-3, where the model can “solve” several unseen problems in natural language or other domains after seeing only a few examples (so-called “zero-shot” / “few-shot” generalization). But we don’t understand how or why that happens. Studying this emergent reality should be the foremost preoccupation for anyone working on VLP models. (Rao, 2020)
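The mechanics of few-shot prompting are simple even if the underlying generalization is not: a handful of worked examples is concatenated with an unseen query, and the model is asked to continue the pattern. The sketch below shows a typical prompt assembly for a sentiment task; the task and examples are invented for illustration, and no particular model API is assumed.

```python
# A minimal sketch of assembling a few-shot prompt for a VLP model:
# a few (input, label) demonstrations followed by an unlabeled query.
# The examples and task are hypothetical.
examples = [
    ("I loved this film", "positive"),
    ("Utterly boring", "negative"),
    ("A masterpiece of suspense", "positive"),
]

def build_few_shot_prompt(examples, query):
    """Format demonstrations and the query; the model completes the last label."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

print(build_few_shot_prompt(examples, "I fell asleep halfway through"))
```

The striking part is not the prompt format but that a model trained only on next-token prediction completes it correctly for tasks it was never explicitly trained on.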
Finally, in what appears to be the only contrarian viewpoint to the Hardware Lottery paper to date, I suggested:
[R]esource lotteries are inevitable, and we are better served by focusing on answering interesting questions posed by current realities than an imagined future. In trying to create a uniform exploration of idea spaces divorced from economic/practical realities (to “avoid the hardware lottery”), we would be missing out on exciting research opportunities by shunning works simply because they don’t fit our current understanding of how the human brain works or what it is capable of.
In particular, one has to keep in mind that not all big models are alike. Very Large Parameter models are uniquely attractive because they add more capabilities to the model in ways we don’t understand today. (Rao, 2020)
Something truly remarkable is happening in the “Zone of Interesting” that we will never see in small-parameter regimes.
The next post will dive deeper into one of the critical aspects of the Zone of Interesting: EMERGENCE.
