Generative artificial intelligence is fundamentally changing how visual content is created, edited, and perceived, making it crucial for users to develop new skills in visual literacy. A recent study highlights the need for critical thinking to navigate a world where photorealistic images can be generated from simple text prompts, often depicting events or objects that never existed.
As these technologies become more accessible, understanding their mechanics and limitations is essential for effective communication and for discerning the authenticity of visual information. This shift requires an evolution in education, moving beyond traditional writing skills to encompass a more comprehensive, or multimodal, literacy.
Key Takeaways
- Generative AI is disrupting long-held beliefs about the authenticity of images by creating photorealistic but entirely fictional content.
- Modern literacy must be "multimodal," combining skills in text, visuals, and digital interaction to effectively use and critique AI outputs.
- New research identifies key competencies needed at each stage of AI image generation, from selecting a tool to refining the final product.
- Effective use of AI image tools requires specific prompts and an understanding of the technology's limitations, such as rendering text or specific cultural contexts.
 
The Evolving Nature of Literacy
For generations, education has focused on teaching children to express ideas through writing and drawing. These foundational skills allow individuals to build complex arguments and communicate abstract concepts. However, the rise of generative AI, which can produce novel content from user commands, is reshaping how these fundamental abilities are learned and used.
The definition of literacy itself is expanding. While once limited to reading and writing, modern standards, such as those in the Australian Curriculum, define it as the ability to use language for learning and communication in various contexts. The European Union further broadens this to include navigating visual, audio, and digital materials.
From Text Commands to Multimodal Interfaces
Early computer interaction in the 1960s relied on typed commands. Graphical user interfaces with icons and menus, developed in the 1970s and widely adopted over the following decade, made computers more visual. Today's generative AI platforms often blend these two approaches. Tools like ChatGPT primarily rely on text prompts, while others, such as Adobe Firefly, combine text commands with button-based controls for a more integrated experience.
This evolution demands what experts call multimodal literacies: a set of skills that spans different modes of communication. Just as a person might use a different tone in a text message to a friend than in an email to an official, interacting with AI requires adapting communication to the specific tool and desired outcome.
AI's Challenge to Visual Authenticity
Historically, photographs were often perceived as a direct reflection of reality. While this view has been challenged over time, generative AI accelerates the disruption by making it simple to create highly realistic images of things that are not real. This capability fundamentally alters how we must approach visual information.
According to new research published in the Journal of Visual Literacy, understanding the AI image generation process is crucial for critically assessing its outputs. The study outlines essential literacies required at every step, from choosing a platform to creating and refining an image.
The Power of Prompts
AI systems often produce stereotypical or generic images when given vague prompts, such as a single word or an emoji. This is because the AI falls back on patterns in its vast training data. To achieve a more specific and envisioned result, users must provide detailed, descriptive prompts that guide the AI more precisely.
This dynamic underscores the need for users to not just be passive consumers but active co-creators with the technology. Knowing how to shape the AI's output is becoming as important as knowing how to write a clear sentence.
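To make the contrast concrete, here is a minimal sketch of how prompt detail changes a request made through an image-generation API. It uses OpenAI's Python client and the DALL-E 3 model purely as an illustrative assumption; the prompts and the comparison are hypothetical, and the same principle of specifying subject, setting, style, and mood applies to any text-to-image tool.

```python
# Minimal sketch: vague vs. descriptive prompts (assumes the `openai`
# Python package and an API key in the OPENAI_API_KEY environment variable).
from openai import OpenAI

client = OpenAI()

# A vague prompt leaves most decisions to patterns in the training data,
# so the result tends to be generic or stereotypical.
vague = "a teacher"

# A descriptive prompt specifies subject, setting, style, and mood,
# steering the model toward the image the author actually envisions.
detailed = (
    "a primary school teacher reading a picture book aloud to a small "
    "group of children in a sunlit classroom, warm colors, candid "
    "documentary-photo style"
)

for prompt in (vague, detailed):
    result = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
    print(prompt, "->", result.data[0].url)  # URL of the generated image
```

The point is not the particular library: whatever the interface, the second prompt gives the system far more to work with than the first.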
Key Competencies for the AI Era
Developing proficiency with generative AI involves several practical skills. These literacies go beyond simply typing a request and accepting the first result.
1. Selecting the Right Tool
One of the first decisions is choosing an AI image generator. The options vary significantly:
- Cost: Some systems are free, while others require a subscription.
- Ethics: The datasets used to train AI models can raise ethical concerns. Some tools use ethically sourced data, while others may have been trained on copyrighted or problematic content.
- Capabilities: Different platforms support different inputs. Some accept only text, while more advanced systems can process images, documents, and other file types.
 
2. Mastering Technical Specifications
Once a tool is selected, users must be able to articulate their needs. Many AI systems default to producing square images, which is suitable for platforms like Instagram. However, if a horizontal or vertical image is needed for a different purpose, the user must know how to specify that orientation through prompts or settings.
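As a hedged illustration of that point, the sketch below requests landscape and portrait images instead of accepting the square default. It again assumes OpenAI's Python client and DALL-E 3, which exposes orientation through a size parameter; other tools may instead use aspect-ratio settings or prompt keywords such as "wide banner."

```python
# Minimal sketch: overriding the square default (assumes the `openai`
# package and DALL-E 3, which accepts these three size values).
from openai import OpenAI

client = OpenAI()
prompt = "a quiet coastal town at dawn, soft pastel light"

sizes = {
    "square (e.g. Instagram post)": "1024x1024",
    "landscape (e.g. blog header)": "1792x1024",
    "portrait (e.g. poster or story)": "1024x1792",
}

for use_case, size in sizes.items():
    result = client.images.generate(model="dall-e-3", prompt=prompt, size=size, n=1)
    print(f"{use_case}: {result.data[0].url}")
```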
"Approaching visual generative AI with curiosity, but also critical thinking is the first step toward having the skills to use these technologies intentionally and effectively. Doing so can help us tell visual stories that carry human rather than machine values."
3. Understanding AI Limitations
Despite rapid advancements, AI still has notable weaknesses. For example, many image generators struggle to render legible text within an image, much as early systems had difficulty with human hands and ears. In such cases, a user may need to turn to separate software, such as Canva or Adobe InDesign, to add text after the image is generated.
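As a rough sketch of that workaround, the snippet below overlays a caption on an already-generated image using the Pillow library rather than a design tool like Canva or InDesign; the file names and caption text are placeholders.

```python
# Minimal sketch: adding legible text after generation (assumes Pillow is
# installed and that generated_image.png is a previously saved AI image).
from PIL import Image, ImageDraw, ImageFont

img = Image.open("generated_image.png")
draw = ImageDraw.Draw(img)

# The default bitmap font keeps the example dependency-free; a real layout
# would load a TrueType font with ImageFont.truetype().
font = ImageFont.load_default()

# Place the caption near the bottom-left corner of the image.
draw.text((20, img.height - 40), "Community art fair, Saturday 10am",
          fill="white", font=font)

img.save("generated_image_with_text.png")
```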
Furthermore, AI-generated images often lack specific cultural context, making them appear generic or inauthentic to certain audiences. This can reduce their emotional impact and engagement.
Keeping Pace with a Rapidly Evolving Field
The landscape of generative AI is changing at an unprecedented speed. New products are launched regularly, and existing platforms are constantly updated. Earlier this year, OpenAI integrated its DALL-E image generator directly into ChatGPT, and TikTok released a tool to animate still photos.
Meanwhile, Google's Veo model is making cinematic video generation more accessible, and Midjourney has also introduced video output capabilities. The trend is moving toward integrated platforms where users can create and edit text, images, audio, and video within a single environment.
To build the necessary literacies for this future, researchers suggest starting with simple but critical questions:
- What is the primary message I want my audience to understand?
- Is generative AI the most appropriate tool for creating this content?
- What is the AI producing, and how can I actively shape its output?
 
By developing the skills to adapt, evaluate, and co-create with these powerful tools, individuals can ensure that technology is guided by human intention and values, rather than the other way around.