AI chatbots see what's on the public Web, not what's on social media
In this post, I will explain how AI chatbots find content online, and why it is important for all businesses today to have a presence on the traditional Web, not just on social media.
The short version of this post is this: AI chatbots are “trained” on data that is publicly available on the Web. Information that requires a login to see or that is presented in complex ways (such as dynamic pages) is generally either invisible or only partly visible to a chatbot. In other words, much of the content on social media is not visible to AI chatbots.
If you want an AI chatbot to be able to find your business and provide detailed information about your business to its users, you need to have a presence on the Web, outside of social media networks.
Note: This post makes no judgment on whether AI is good or bad. This post simply aims to inform about how AI works and how that impacts businesses. In the end, knowledge is your best tool to handle whatever this and other technologies might bring.
In the beginning, there was the Web
What is “the Web,” anyway?
First, a distinction. The Internet and the Web are not the same thing.
The Internet is the physical computer network which all devices that are “online” connect to in order to communicate with each other all over the world.
The Web (or the World Wide Web) is an information framework that is made possible by the Internet. In short, the Internet is the computer servers and the cables and wireless connections that link everything together. The Web is the content we can see on the Internet.
The content we see on the Web is largely built using HTML, a markup language for structuring content so that it can be displayed in a web browser, the software that you use to view web sites and web pages.
Then social networks closed off the Web
In the early 2000s, as huge advances were made in Web technology, user-generated content became possible.
One of the pioneers of user-generated content was YouTube, founded in 2005.
Though we take uploading videos for granted today, YouTube was an instant hit back in 2005 because it was one of the first platforms to allow users to upload their own videos. The videos didn’t have to be educational or “important.” You could just upload anything you wanted (within YouTube’s guidelines).
Around the same time as the rise of YouTube, another site focused on user-generated content was gaining huge popularity as well: Myspace, founded in 2003.
Myspace was one of the first so-called “social networks” to become a big hit. Other social networks that were starting to gain popularity back then were Facebook (2004) and Twitter (2006), now known as X.
Social networks were among the first web sites to start the trend of requiring users to log in just to view their content.1 Many sites already had login features, but a login was mostly required only to upload content or perform some kind of operation on the site.
Generally, before social networks, most data on the Web was publicly visible to anyone whether they had an account on a specific site or not.
Because some of these social networks required a login just to view their content, these sites effectively closed off parts of the Web to people who weren’t registered with them.
1 To be fair, Twitter’s content remained largely public for many years. It was Facebook in particular that really brought the closed-off dynamic to social networks, since it started as a site open only to college students.
The closed-off, segmented Web grew and prospered until...
The closed-off Web created by social media continued to grow, especially after the introduction of the first iPhone in 2007.
Now, web sites had to adapt to mobile screens and to a new form of content delivery: smartphone apps.
No longer was the Web exclusively accessible through web browsers. Now, smartphone apps also provided access to the Web, often to a specific segment of the Web that they controlled.
This brought along developments in Web technology that replaced the classic way of delivering web content with a new one that largely obscured the straightforward, highly readable format of HTML.
This dynamic Web is largely driven by JavaScript, a programming language that allows dynamically fetching data and updating parts of a web page on demand instead of having to refresh an entire page for every small change.
Because of the way JavaScript is used, web pages are often not stored as web pages at all. Many dynamic web sites are made up of scripts (programming code that does specific actions) that serve HTML in pieces and chunks to build and rebuild pages and parts of pages in response to user input.
As a result, automated tools, such as Google’s algorithms that index the Web to create its search results, find the dynamic Web much harder to read than the old, classic Web was. Without clear indicators of a page’s metadata (information that describes a page’s content), automated tools often can’t get an accurate idea of the content they find on the Web.
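To make the difference concrete, here is a minimal Python sketch of what a very simple automated reader might extract from two pages. Both sample pages and the names `VisibleTextParser` and `visible_text` are invented for this illustration. Real crawlers are far more sophisticated, and some of them do run JavaScript, so treat this as a simplified picture rather than a description of how any particular tool works.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects the text found in the HTML itself, ignoring <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the text a reader that does not run JavaScript would find."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# A classic page: the content is written directly in the HTML.
STATIC_PAGE = """
<html><head><title>Cakes 1001</title></head>
<body><h1>Cakes 1001</h1><p>A bakery in San Diego, California.</p></body></html>
"""

# A dynamic page: the HTML is an empty shell that a script fills in later.
DYNAMIC_PAGE = """
<html><head><title>Loading</title></head>
<body><div id="app"></div>
<script>document.getElementById("app").innerHTML = "<h1>Cakes 1001</h1>";</script>
</body></html>
"""

print(visible_text(STATIC_PAGE))   # includes the bakery's name and location
print(visible_text(DYNAMIC_PAGE))  # only the placeholder title, no business info
```

In this sketch, the business details on the dynamic page exist only inside the script, so a reader that does not execute JavaScript never sees them.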
...AI chatbots changed everything
In 2022, OpenAI unleashed ChatGPT upon the world, ushering in a new era in how the Web is seen and consumed, the era of generative artificial intelligence.
Artificial intelligence (AI) is a field of computer science that deals with the problem of how to get computers to think like humans do. Classical AI aspires to develop algorithms that truly replicate human-like thinking in computers.
Generative AI has not achieved that classical goal of creating computers that think. Instead, gen AI uses a technique called machine learning to “train” computers on incredible amounts of data to create “models.” These models can then be loaded into other computers where users can ask them to perform tasks.
When asked to perform a task, these models are able to “infer” what might be the correct answer. Gen AI algorithms guess the answers to whatever they are asked based on their training data and statistical probability.
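The idea of guessing from training data and statistical probability can be illustrated with a toy word predictor. This is only a sketch: real LLMs use neural networks trained on vastly more data, not simple word counts, and the training text and the function names `train_bigram_model` and `predict_next` are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word in the training text, which words follow it."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(model, word):
    """Guess the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training data, invented for this illustration.
TRAINING_TEXT = (
    "the bakery sells cakes . the bakery sells bread . "
    "the bakery offers catering ."
)

model = train_bigram_model(TRAINING_TEXT)
print(predict_next(model, "bakery"))   # "sells" follows "bakery" more often than "offers"
print(predict_next(model, "avocado"))  # the word never appeared in training, so no guess
```

The second call shows the limitation that matters for the rest of this post: a model has nothing to say about things that never appeared in its training data.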
Large language models (LLMs) are an application of gen AI algorithms to human language. LLMs have been trained on incredible amounts of data with the specific purpose of creating machines that can speak like humans do. These LLMs are often used to power AI chatbots like ChatGPT, which are tools that give users the ability to interact with an LLM by chatting with it in a natural way.
And the incredible amount of data that is used to train these LLMs is… the Web.
But the new Web that social media helped create is a closed-off and segmented environment. During the training of LLMs, a lot of the content locked behind social media is not visible to them. As a result, the chatbots powered by these LLMs have almost no knowledge of all the user-generated content inside these social networks.
As adoption of chatbots continues to grow, many users today are opting to ask chatbots questions instead of visiting web sites, scrolling on social networks, or searching on Google.
But these chatbots don’t know about all the content locked away behind social networks, so they can only present their users with a version of the Web based on the training they received.
If your business's entire web presence is on social media, AI can't see you
As social networks grew in popularity, many businesses that traditionally would have had a web site or at least a public blog started moving their web presence to social networks.
But AI chatbots can’t fully access these social networks.
This means that, if a user asks a chatbot about something that might be relevant to your business, and the AI’s training never picked up information about your business’s social media presence, your business doesn’t exist in the chatbot’s LLM.
Many chatbots are now also equipped with the ability to search the Web on the spot. This is meant to allow chatbots to access new information that has become available after their training. However, even with these web searches, chatbots generally can’t see deep into social networks, so your business still can’t be found if it only lives on social media.
To be seen by AI chatbots and the people who use them, you need a presence on the public Web: a public page with clear information about your business in a simple format (not dynamically assembled by JavaScript).
In other words, chatbots work well with pages that resemble the classic Web, with neat HTML and simple, straightforward information. The chatbot era is promoting a return to a simpler public Web.
AI chatbots certainly create their own version of the Web (and that has its negative implications), but that version of the Web is built on publicly available data, not closed-off data.
So if you want AI chatbots to serve your business information to their users when they ask something relevant to your business, you have to make sure that your business details make it into the version of the Web served by these chatbots.
If you're not on the public Web, you need to get your business info out there
To get your business seen by AI chatbots, you need to get your business info out into the public Web.
That does not mean that you should abandon your social media strategy. It means that you need to get information on the public Web that leads potential customers to your social network accounts.
You can get a web site, start a public blog, or get featured in articles on other web sites. Anything about your business that you get out into the public Web has a chance of eventually being picked up by AI chatbots.
Be sure to craft a simple, straightforward description of your business that is displayed plainly on screen. Remember, you’re not selling your product or service to the AI chatbot. You’re informing the chatbot about your product or service so the chatbot can in turn inform its user.
Here’s an example for an imaginary bakery called “Cakes 1001”:
“Cakes 1001 is a premium bakery located in San Diego, California. It specializes in reimagining classic American bakes with a hipster twist. Its signature product is the avocado baked Alaska, which has been featured in numerous local food blogs after becoming a viral hit. Aside from its store lineup, Cakes 1001 also offers catering services, including to corporate clients. You can find out more about Cakes 1001 on their Instagram account: @[handle].”
As you can see in the example, the information provided is simple and straightforward. No hard selling or promotions. Just a plain description that is very close to the way an AI chatbot is likely to talk about your business to its user.
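A description like the one above could be published as a plain, static HTML page. The Python sketch below shows one possible way to generate such a page. The function name `build_business_page` and the page layout are assumptions made for this illustration, not a required format, and nothing here guarantees that a given chatbot will pick the page up.

```python
from html import escape


def build_business_page(name, description):
    """Return a complete static HTML page with the description in plain view."""
    # Escape the inputs so characters like & or quotes don't break the HTML.
    safe_name = escape(name)
    safe_description = escape(description)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{safe_name}</title>\n"
        # Metadata that describes the page's content to automated tools.
        f'  <meta name="description" content="{safe_description}">\n'
        "</head>\n"
        "<body>\n"
        f"  <h1>{safe_name}</h1>\n"
        f"  <p>{safe_description}</p>\n"
        "</body>\n"
        "</html>\n"
    )


page = build_business_page(
    "Cakes 1001",
    "Cakes 1001 is a premium bakery located in San Diego, California.",
)
print(page)
```

The whole page is written out ahead of time, with no JavaScript involved, so the description is present in the HTML itself for any tool that reads it.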
Get AI chatbots to work for you
By getting your business information out into the public Web, you give chatbots the chance to pick it up and to present your business the way you want it to be presented.
Don’t become invisible in the era of AI. Get your business information into the public Web and ride the wave of AI chatbot marketing.
I wish you much success out there!