What if ChatGPT were much more than a chatbot? What if LLM-as-a-service were a search engine?

The Digital Services Act (DSA) is certainly one of the most interesting pieces of legislation adopted by the European Union in the last few years. Just like the old e-Commerce Directive (ECD), the DSA targets providers of intermediary services (i.e., the mere conduit, caching and hosting providers of this world), who continue to enjoy conditional liability exemptions. But the DSA goes further than the ECD: it includes a set of rules expressly targeting online platforms and search engines.

Could ChatGPT ever be governed by the DSA?

With ChatGPT’s release, some have argued that the DSA has a loophole, even if it would appear better suited than the Artificial Intelligence Act to regulate large language models (LLMs).

Can we be creative with the DSA though? Would there be a way to impose obligations on providers of services that make LLMs such as ChatGPT available to the public, i.e., LLM-as-a-service (LLMaaS)?

LLMaaS is hard to frame as an intermediary service.

Intermediary services are defined in Article 3(g) of the DSA. One big difference between the ECD and the DSA is that the DSA defines intermediary services up front, before the liability exemptions. This could mean that the category of intermediary services is broader than the category of intermediary services benefitting from the conditional liability exemptions set forth in Articles 4, 5 and 6.

Still, the definition of Article 3(g) seems to comprise an exhaustive list. What is interesting, however, is that contrary to the definition of online platforms, which expressly mentions that they are hosting providers, the definition of search engines is less prolix:

“‘online search engine’ means an intermediary service that allows users to input queries in order to perform searches of, in principle, all websites, or all websites in a particular language, on the basis of a query on any subject in the form of a keyword, voice request, phrase or other input, and returns results in any format in which information related to the requested content can be found”

Does it matter that search engines do not perfectly fit within the definition of caching or hosting?

Probably not, as these definitions are malleable, as demonstrated by the ECD case law [which could explain the silence of the drafters].

Although the category of hosting services is probably the closest one to LLMaaS, it is not a perfect fit either. With LLMaaS, the stored information, i.e., the model output, is not [strictly speaking] provided [could provided also mean triggered?] by the recipient of the service, although its storage is performed at the request of the recipient of the service.

That said, the model input is provided by the recipient of the service and stored at the request of the recipient of the service. Would considering the model input be sufficient to make LLMaaS a hosting service? It’s probably not the best argument when the primary concern relates to the handling of the model output, but is it good enough?

Assuming LLMaaS providers were to be considered hosting providers, they would not be able to benefit from the liability exemption set forth in Article 6, as “the recipient of the service is acting under the authority or the control of the provider.” [Interestingly, the obligations related to notice and action and statement of reasons are not expressly made conditional upon the benefit of liability exemptions.]

But does it matter? Not necessarily.

Reading Recital 28, one could come up with an argument that search engines are not necessarily, or not always, caching or hosting providers, although they are intermediary services. Recital 28 states that “services establishing and facilitating the underlying logical architecture and proper functioning of the internet can also benefit from the exemptions from liability set out in this Regulation, to the extent that their services qualify as ‘mere conduit’, ‘caching’ or ‘hosting’ services.” They “include, as the case may be, wireless local area networks, domain name system (DNS) services, top-level domain name registries, registrars, certificate authorities that issue digital certificates, virtual private networks, online search engines, cloud infrastructure services, or content delivery networks, that enable, locate or improve the functions of other providers of intermediary services.” Recital 29, when giving generic examples of ‘caching’ intermediary services, does not expressly mention search engines but refers to “the sole provision of content delivery networks, reverse proxies or content adaptation proxies.” There is thus potentially a whole family of search engines.

Now what is the difference between LLMaaS and search engines? Is there any meaningful difference? What if the difference were not significant? What if the analogy were legitimate, even if LLMaaS could perform other sub-services or alternative services? Is the distinction between business-to-consumer services and business-to-business services relevant here?

Just like very large search engines, very large LLMaaS may “cause societal risks, different in scope and impact from those caused by smaller platforms.” Providers of very large LLMaaS “should therefore bear the highest standard of due diligence obligations, proportionate to their societal impact. Once the number of active recipients of an online platform or of active recipients of an online search engine, calculated as an average over a period of six months, reaches a significant share of the Union population, the systemic risks the online platform or online search engine poses may have a disproportionate impact in the Union.”

Very large LLMaaS “can be used in a way that strongly influences safety online, the shaping of public opinion and discourse, as well as online trade. The way they design their services is generally optimised to benefit their often advertising-driven business models and can cause societal concerns. Effective regulation and enforcement is necessary in order to effectively identify and mitigate the risks and the societal and economic harm that may arise.”

Very large LLMaaS “should therefore assess the systemic risks stemming from the design, functioning and use of their services, as well as from potential misuses by the recipients of the service, and should take appropriate mitigating measures in observance of fundamental rights. In determining the significance of potential negative effects and impacts, providers should consider the severity of the potential impact and the probability of all such systemic risks. For example, they could assess whether the potential negative impact can affect a large number of persons, its potential irreversibility, or how difficult it is to remedy and restore the situation prevailing prior to the potential impact.”

Drawing an analogy between LLMaaS and search engines would, to start with, subject very large LLMaaS to risk assessment, risk mitigation, and data access and scrutiny requirements (see Chapter III, Section 5 of the DSA).

And what about ChatGPT?

As mentioned here and here:

13 million individual active users visited ChatGPT per day as of January 2023.
ChatGPT crossed the 100 million users milestone in January 2023.
In the first month of its launch, ChatGPT had more than 57 million monthly users.
ChatGPT reached an estimated 123 million monthly active users less than three months after launch.

DSA Article 33 states that “This Section shall apply to online platforms and online search engines which have a number of average monthly active recipients of the service in the Union equal to or higher than 45 million.” At this pace, ChatGPT will certainly reach this threshold by 2024 [unless Data Protection Supervisory Authorities shoot faster than their shadows, as Joe Dalton would say].
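
For readers who like to see the arithmetic, here is a minimal sketch of the threshold test, assuming the six-month averaging mentioned in the recital quoted earlier. The function names and the monthly figures are hypothetical, made up purely for illustration. They are not ChatGPT's actual numbers, and the usage statistics listed above are not Union-only counts anyway.

```python
# Threshold quoted from DSA Article 33: 45 million average monthly
# active recipients of the service in the Union.
DSA_THRESHOLD = 45_000_000


def average_monthly_active_recipients(monthly_counts, window=6):
    """Average the most recent `window` monthly counts (six months by assumption)."""
    if len(monthly_counts) < window:
        raise ValueError("not enough monthly data points for the averaging window")
    recent = monthly_counts[-window:]
    return sum(recent) / len(recent)


def meets_article_33_threshold(monthly_counts, window=6):
    """True when the average is equal to or higher than the threshold."""
    return average_monthly_active_recipients(monthly_counts, window) >= DSA_THRESHOLD


# Hypothetical Union-only monthly counts, invented for this example.
hypothetical_eu_counts = [
    20_000_000, 30_000_000, 40_000_000,
    50_000_000, 60_000_000, 70_000_000,
]

if __name__ == "__main__":
    print(average_monthly_active_recipients(hypothetical_eu_counts))
    print(meets_article_33_threshold(hypothetical_eu_counts))
```

With these invented figures the six-month average lands exactly on 45 million, which still counts because the wording is “equal to or higher than”. A service growing this fast would cross the line well before every single month exceeds it.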

What next?