The recent advances in generative artificial intelligence models, like Midjourney, and large language models, like ChatGPT, are going to change the way we view information almost as comprehensively as the advent of the internet did.

These artificial intelligence models allow users to create emails, essays, code and art by giving them short textual prompts. However, there is a fundamental difference between these two technological leaps. While the early internet was a collaboration-oriented open space, the next few years of generative artificial intelligence are going to spell the end of free sharing of information and the community that it fosters.

When artists and writers put out work on the internet for free, they do it for a variety of reasons including publicity, portfolio building, community building, brand exposure and establishing their expertise.

For example, Brandon Sanderson, arguably one of the most successful fantasy writers today, has published an entire novel under a Creative Commons (CC 3) licence so that fellow writers could provide feedback and aspiring authors could learn from his process, all the while strengthening the community around his writing and sharing process tips.

The Creative Commons licences have enabled a host of educators, researchers, nonprofits and think tanks to share knowledge freely, publish and distribute materials while removing the monetary barriers to access. The Creative Commons licence allows users to share their work and lets other artists and audiences interact with it in ways that a creator defines.

Depending on the Creative Commons licence stated, creators can signal to their audience that the work can be used, reproduced or repurposed according to simple terms, enabling access to and sharing of resources which might previously have been tied up in complicated copyright regimes and corporate ownership.

A visual of the ChatGPT logo. Credit: via Pixabay.

In fact, one of the most important consequences of the proliferation of the internet has been the quick and easy access to quality information that has been brought on by people coming together to create a corpus of knowledge. Want to learn about a person, place or scientific phenomenon? Wikipedia can probably get you started. Have a coding question? Stack Overflow has an answer. Want travel tips? Have obscure interests? Want to fix a car? Reddit and YouTube are your friends.

There is a long list of platforms today that cater to every informational need. They are available for use with nothing more than an email address, and sometimes, not even that. The best places on the internet are often ones focused on niche interests, bringing together experts and enthusiasts alike in the spirit of exploration and sharing. We live in a world where most of humanity’s knowledge is a click away.

There have, of course, been drawbacks to such easy access to tools of creation and dissemination of media: disinformation is rampant, polarisation of opinion online is obvious, and it is easier than it has ever been to lie to a large number of people. However, it is important to remember that these co-exist with the benefits that an open internet provides.

The creative commons was, and still is, a radical idea. No longer does an artist need to wait for the death of their idols and their creations to be out of copyright protection to be able to play with their works and publish their own takes on them. Derivative works, after all, are an important part of how culture progresses. We adapt cultural touchstones of the past to the current moment, ensuring their longevity and keeping a thriving connection to history.

If artists decide that they do want to share their creations publicly and they would like to interact with other artists, they should be allowed to do that without fear of “being scooped” on their own work.

That is a real danger in the age of artificial intelligence. For these artificial intelligence models to work and output realistic-looking art and text, they need to be fed a large amount of pre-existing text and visual data. By “training” on datasets of millions of images and text samples, these models learn to predict the output that a user-entered prompt is most likely asking for.

The fact that living artists must use third-party tools to figure out if their art is in a dataset they did not consent to be added to in the first place is a gross violation of their rights as creators. As a researcher and writer, I want people to be able to access and share research without restrictions. But that does not mean I want my hard work to be used to train a model to confidently sound like me.

Living artists, writers, and other creators need to have legal and regulatory protections on how their work can be used, especially when they use Creative Commons licences to share their work and join a community that believes in the unfettered flow of ideas. The fact that people are working out in the open is not an invitation for corporations to hoover up content for the purposes of training a model.

Artwork displayed on the Midjourney website's showcase. Credit: Midjourney official website.

We urgently need interventions to protect the commons because it would not be surprising to see artists, writers and knowledge creators move back to a situation where all information is tightly locked down to protect it from being used indiscriminately for model training.

Companies (and the researchers who create these models and datasets) need to be forced to comply with existing licences. Most Creative Commons licences require attribution, meaning that the work under them can be used and shared only by citing the people who published it. Most large language models, including ChatGPT, and generative art models like Midjourney or Stable Diffusion do not cite any of the works they train on. They also fail to cite any annotators or dataset curators, who are an essential part of creating and updating the training data.

Currently, these generative models and the companies that create them transfer to users the responsibility of making sure that the outputs are not copyrighted and do not “violate any laws”. But the process of creating outputs seems collaborative at best (and plagiarised at worst), so the process of identifying protected rights should also occur at the model level.

Unsurprisingly, the first steps of legal action against these generative models are now being taken: there is a class action lawsuit by artists in the US against the use of their art to train Stable Diffusion, a generative artificial intelligence model used to convert text to images. The same legal team is also suing GitHub, a platform for storing code, over copyright concerns about its coding assistant, Copilot.

Stock photo company Getty Images is also suing the company behind the Stable Diffusion artificial intelligence art model over copyright violations. The coming years will end up defining how rights are divided and to what extent artists can exert control over their creations.

As these battles play out in courts worldwide, it is important that the benefits of an open commons that shares information and builds community are not lost, whether to gatekeepers or to understandable protectionist action by creators. After all, why would users continue to contribute to open communities if their contributions are only food for model training rather than community building? An open commons needs to be a community-first, creator-focused space, or else further developments could just end up hurting creative endeavours in the long run.

Divyansha Sehgal is a technology researcher at The Centre for Internet and Society, and a Young Leaders in Technology Policy Fellow with the University of Chicago.