Artificial intelligence code-generation tools are now commonplace for developers across the tech industry, streamlining the work of producing large amounts of code. Experts warn, however, that this convenience introduces new security risks and underscores the need for human oversight.
While many developers appreciate how AI takes over the heavy lifting of coding, experienced professionals are flagging flaws in its output at an alarming rate. In July, the security testing firm Veracode released research covering more than 100 large language models (LLMs), finding that although AI produces functional code at impressive speed, it also opens the door to cyberattacks. The report indicated that 45% of code samples failed security tests, introducing weaknesses tracked by the Open Worldwide Application Security Project (OWASP).
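The report does not tie the failures to a single bug class, but injection, a long-standing entry in OWASP's taxonomy, illustrates the kind of weakness such security tests catch. The sketch below is hypothetical, not drawn from the Veracode samples; the table and function names are invented. It contrasts a naive query builder, typical of quickly generated code, with the parameterized version a security review would require.

    import sqlite3

    # Hypothetical sketch -- not code from the Veracode study. The "users"
    # table and both functions are invented for illustration.

    def find_user_unsafe(conn, username):
        # Flaw: user input is pasted straight into the SQL string, so an
        # attacker can supply  ' OR '1'='1  and dump every row. This is the
        # classic injection weakness automated security tests flag.
        query = "SELECT id, email FROM users WHERE username = '" + username + "'"
        return conn.execute(query).fetchall()

    def find_user_safe(conn, username):
        # Fix: a parameterized query lets the database driver handle
        # escaping, which is the standard remediation.
        return conn.execute(
            "SELECT id, email FROM users WHERE username = ?", (username,)
        ).fetchall()

    if __name__ == "__main__":
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE users (id INTEGER, username TEXT, email TEXT)")
        conn.execute("INSERT INTO users VALUES (1, 'alice', 'alice@example.com')")
        # The crafted input returns Alice's row despite the bogus username.
        print(find_user_unsafe(conn, "x' OR '1'='1"))
        # The parameterized version correctly returns nothing.
        print(find_user_safe(conn, "x' OR '1'='1"))

Both functions "work" in the everyday sense, which is exactly the trap: functional code can still fail a security test.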
Veracode’s findings serve as a “wake-up call” for developers, security professionals, and anyone relying on AI for speed. Some experts aren’t surprised by the high incidence of security flaws, given AI’s current limitations in coding. Kirk Sigmon, a programmer and partner at the intellectual property law firm Banner Witcoff, remarked, “I’m surprised the percentage isn’t higher. Even when AI-generated code functions, it often contains logical flaws due to a lack of context.”
Cybersecurity researcher Harshvardhan Chunawala, who previously worked on the Iris Lunar Rover, likened AI code creation to house construction. “It’s like having AI draft a quick blueprint for a house, but it might include doors that don’t lock or unsafe wiring. With AI now involved in critical digital infrastructure, it’s not just drafting blueprints but also ordering materials and starting construction before inspections.”
“A human architect must verify every detail before the ‘house’ is safe,” Chunawala added.

Sigmon, who has extensive experience with AI and machine learning, shared a recent example illustrating AI’s limitations. “I was assisting a friend with a space-themed website and asked an LLM for code to create CSS3-friendly panoramic stars. The result was underwhelming; the stars ended up clustered in one corner and strobing rather than twinkling.”
He noted that AI-generated code fosters poor habits that could affect the industry’s future. “Overall code quality has declined significantly. On the academic side, if students use AI to generate their homework, they miss out on learning good coding practices.”
Hallucinating Code
Sigmon learned coding through trial and error, a method he believes is being undermined by the ease of AI-generated code. “New graduates are entering the workforce and creating unreliable code, leading to a decline in the quality of many programs,” he said. This has resulted in modern codebases that are often incomprehensible. “I used to understand a fellow coder’s intent; now it often gives me a headache.”
James, a coder and former web content manager, echoed Sigmon’s concerns. “You need to be cautious with edits; you can’t fully trust AI code. The complexity of a project amplifies AI’s ‘hallucinations.’” When an AI model perceives patterns that aren’t there, it can produce output that sounds plausible but is illogical or simply wrong, a phenomenon James describes as infuriating.
“You can make significant progress before realizing there’s a mistake caused by AI’s hallucination,” he noted. A 2024 study from Stanford RegLab and the Institute for Human-Centered AI found that LLMs hallucinated at rates ranging from 69% to 88% in response to specific legal queries, and that their performance declines on more complex tasks requiring nuanced understanding.
In a comparison of major LLM products (Claude, Gemini, and ChatGPT), Claude exhibited the lowest hallucination rate, at about 17%. Compounding the problem, James noted that AI sometimes doubles down on its errors. “While developing a role-playing combat app, I wanted to extract a name from the first file, but the AI kept losing it and trying to pull in data from elsewhere. Even when I pointed out the error, the AI refused to acknowledge it.”
It is a pattern of behavior James says he has encountered across multiple LLM-based tools.
