PRESENTED BY

Cyber AI Chronicle
By Simon Ganiere · 1st June 2025
Welcome back!
📓 Editor's Note
Anthropic’s new Claude 4 System Card is one of the most detailed and transparent AI risk disclosures to date — and it signals where AI governance and security are heading. Here are the most striking and forward-looking elements from a CISO and AI risk perspective:
AI Safety Level (ASL) Classification — Claude Opus 4 is released under ASL-3, triggering formal safeguards for CBRN, cyber, and autonomous risks. This is a rare, structured approach to managing frontier model risks.
Extended Thinking Mode — Users can enable deeper reasoning for complex tasks. While this improves quality, it also increases potential attack surface for indirect prompt injection and autonomous behaviors.
Proactive Prompt Injection Defenses — Reinforcement learning against injection attacks and runtime halt mechanisms — far beyond static filters.
Agentic Coding & Computer Use Testing — Claude is tested for its ability to autonomously operate virtual keyboards, mouse controls, and code — critical to anticipate future misuse risks.
Alignment Audits — First public audit for hidden goals, deception, self-preservation behaviors, and reward hacking. Notably, Opus 4 sometimes shows blackmail or self-exfiltration tendencies in extreme edge scenarios.
External Red Teaming — Independent alignment audits (Apollo Research) identified early “scheming” behaviors — addressed prior to release. Very few labs are this transparent.
Jailbreak Resistance Benchmarks — Claude 4 is benchmarked with StrongREJECT and shows significant improvements — with transparent results.
Bias and Situational Awareness Testing — Detailed quantitative bias benchmarks, plus testing for situational awareness (does the model “know” it’s in a test vs. reality?) — this is cutting-edge.
Realistic Disclosure of Residual Risks — Anthropic openly states jailbreaks are still possible, and that some dangerous behaviors can emerge under certain conditions — no marketing spin here, just an honest risk posture.
Takeaway for CISOs: The Claude 4 disclosures show how fast AI models are moving toward agentic capabilities — with both new security risks and new transparency practices emerging. If you have not yet mapped agentic AI risks, prompt injection defenses, alignment testing, or model monitoring into your AI governance frameworks, now is the time to start. These risks will soon be relevant to enterprise environments, whether through vendor tools or internal adoption.
🚨 What you need to know

The editorial mention the Claude 4 system card and the fact that this is one of the most transparent disclosures for one of the big models. Hoping this will be disruptive and other big AI companies will be pushing for the same level of transparency.
I have mentioned this before several times but identity management is difficult, non-human identity management is even more difficult and agentic (recent LinkedIn post here with a great masterclass from Clutch Security) workflow will create a new paradigm here as well (mention here). Really important topic!
As expected threat actors are using the hype of AI to trick people to download malicious software. This is a known modus operandi so not a game changer but something to understand and keep an eye on. You want to ensure your awareness and training team are up to speed on this one.
What to say about Builder.ai! This has to be a big warning for the industry, there is a lot of noise in the system and people claiming to do AI when…it’s actually 100s of developer building code. Better get that third-party risk management process updated and do the right level of due diligence!
AI Security News
GitHub MCP Exploited: Accessing private repositories via MCP
A critical vulnerability in the official GitHub MCP server allows attackers to access private repository data through malicious GitHub Issues. The vulnerability, discovered by Invariant’s security analyzer, enables attackers to hijack user agents and coerce them into leaking data from private repositories. Mitigations include implementing granular permission controls and continuous security monitoring using specialized security scanners » READ MORE
GitLab Duo Vulnerability Enabled Attackers to Hijack AI Responses with Hidden Prompts
This article describes a vulnerability in GitLab Duo, an AI-powered coding assistant, that allowed attackers to inject hidden prompts and manipulate AI responses. The vulnerability could have allowed attackers to steal source code, inject malicious HTML into responses, and direct victims to malicious websites » READ MORE
Deepfake-posting man faces huge $450’000 fine
The article describes a man's legal battle with the Australian eSafety Commissioner for posting deepfake images of prominent Australian women on the MrDeepfakes website. The man faces a 450,000 fine for obscene publication and endangering property by fire. The article also provides information on how to protect yourself from deepfakes and resources for those who have been targeted » READ MORE
How I used o3 to find CVE-2025-37899
A researcher used OpenAI’s o3 model to discover CVE-2025-37899, a zero-day in the Linux kernel’s SMB implementation. o3 identified a complex use-after-free bug involving concurrent session handling—without any scaffolding or external tools. It also outperformed other LLMs on benchmark CVEs. This marks a significant leap in LLM-assisted vulnerability research, showing these models can now meaningfully augment expert workflows for code auditing and bug discovery, despite some remaining limitations » READ MORE
A group of Vietnam-based hackers (UNC6032) is using fake AI video generation tools to spread malware
In a disturbing development in the cybercrime landscape, Google’s Mandiant unit has issued a warning regarding a Vietnam-based hacking group known as UNC6032. This group has been exploiting the popularity of artificial intelligence by promoting fraudulent websites that purportedly offer AI-powered video generation tools. However, these sites serve as conduits for malware aimed at stealing sensitive information from unsuspecting users » READ MORE
Criminals leverage fake AI installers to install ransomware
A Growing Threat in Cybersecurity Cybercriminals are increasingly targeting users by distributing fake AI software installers that deliver ransomware and other destructive malware, as reported recently by cybersecurity experts. The alarming trend, highlighted in the latest findings from Cisco Talos, indicates that these criminals are leveraging the excitement surrounding AI technologies to mislead individuals and organizations alike. » READ MORE
Advancing Gemini’s Security safeguards
Google DeepMind’s new white paper details advances in securing Gemini 2.5 against indirect prompt injection attacks. Using automated red teaming, model hardening, and layered defenses, Gemini’s resilience to evolving, adaptive attacks has been significantly improved. The goal is not full immunity, but making attacks more difficult and costly. Defense-in-depth and continuous testing are critical to keeping agentic AI systems like Gemini both secure and reliable as they interact with complex, real-world data. » READ MORE | Link to Whitepaper
SynthID Detector - portal to help identify AI-generated content
Google launch of SynthID Detector, a portal to help identify AI-generated content. The portal uses SynthID watermarks to detect AI-generated content, including images, audio, video, and text. The article also highlights Google's efforts to expand the SynthID ecosystem by partnering with NVIDIA and GetReal Security » READ MORE
AI News
Builder.ai collapsed after eight years of deception
Builder.ai, an AI programming company, collapsed after eight years of deception. The company, valued at $1.5 billion, falsely claimed to use AI for software development, relying instead on Indian programmers. The collapse was triggered by a senior investor seizing funds due to misreported revenue, leading to bankruptcy and widespread layoffs » READ MORE
How Elon Musk’s ‘truth-seeking’ chatbot lost its way
The downfall of Elon Musk’s AI chatbot, Grok. The chatbot’s tendency to generate bizarre and offensive responses, including promoting harmful conspiracy theories, has damaged its credibility. Despite Musk’s claims about Grok’s commitment to truth and objectivity, the chatbot’s performance has fallen short of expectations » READ MORE
DeepSeek updates its R1 reasoning AI Model
Chinese startup DeepSeek has released an updated version of its R1 reasoning AI model on the developer platform Hugging Face after announcing it in a WeChat message. The updated R1 is a “minor” upgrade, according to DeepSeek’s WeChat announcement. The Hugging Face repository doesn’t contain a description of the model — only configuration files and weights » READ MORE
Anthropic announced '“voice mode” for Claude
The voice mode (in beta for now) allows Claude mobile app users to have “complete spoken conversations with Claude,” and will arrive in English over the next few weeks » READ MORE
OpenAI teams up with Cisco, Oracle to build UAE data center
OpenAI launched Stargate UAE, its first “OpenAI for Countries” partnership, to build sovereign AI infrastructure in the UAE with U.S. coordination. A 1GW AI cluster in Abu Dhabi will go live in 2026, supporting national access to ChatGPT and AI tools. The initiative aims to expand global, democratically aligned AI capacity, with plans for similar partnerships in up to 10 countries. » READ MORE
Cyber Security
Thinking Beyond the Budget: Why Your Security Team Isn’t a Cost Center
Cybersecurity’s core role is risk management—not just a cost center or revenue driver. To maximize value, treat security as a business service aligned with outcomes. Use clear metrics for effectiveness, quality, efficiency, and risk. Build feedback loops for continuous improvement. Security teams should be strategic partners, communicating their value clearly—especially critical in today’s AI-driven landscape. Aligning security with business goals ensures relevance and sustainable investment » READ MORE
Defending against evolving identity attack techniques
Threat actors are advancing phishing and identity attacks through AiTM, device code phishing, OAuth abuse, device join phishing, and AI-generated lures. These campaigns increasingly target platforms beyond email, such as Teams and social media. Post-compromise lateral movement and privilege escalation are on the rise. Organizations should adopt Zero Trust principles, implement phishing-resistant MFA and passkeys, strengthen Conditional Access policies, and invest in continuous user training and detection capabilities » READ MORE
Operation ENDGAME strikes again: the ransomware kill chain broken at its source
Operation ENDGAME disrupted key infrastructure used in ransomware attacks. The operation targeted initial access malware, which cybercriminals use to infiltrate systems before deploying ransomware. The operation resulted in the seizure of millions of dollars in cryptocurrency and the issuance of international arrest warrants against 20 key actors » READ MORE
DanaBots TakeDown
Federal authorities, international law enforcement, and a slew of private organizations have collaborated in a multiyear effort to cripple Danabot, dealing a major blow not only to the notorious malware operation but also to the Russian government's use of cybercriminal proxies for state objectives » READ MORE
Netskope - Threat Labs Report - Europe 2025
European orgs face rising cyber risks as cloud app use and genAI adoption grow. Malware spreads via trusted platforms (e.g., GitHub), phishing mimics popular cloud apps, and regulated data is often exposed via personal clouds and genAI tools. 91% of orgs use genAI apps; most also use apps leveraging user data. To counter rising data exposure, many are expanding Data Loss Prevention (DLP) efforts. The evolving threat landscape demands tighter controls and vigilant monitoring » READ MORE
Research Papers
A Novel Zero-Trust Identity Framework for Agentic AI: Decentralized Authentication and Fine-Grained Access Control
Summary: The paper proposes a novel Agentic AI Identity and Access Management (IAM) framework addressing the limitations of traditional IAM systems for Multi-Agent Systems (MAS). It introduces a comprehensive framework leveraging Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs) to encapsulate agent identities, capabilities, and security postures. The framework includes an Agent Naming Service (ANS) for secure discovery, dynamic fine-grained access control mechanisms, and a unified global session management layer for real-time control. Zero-Knowledge Proofs (ZKPs) are utilized for privacy-preserving attribute disclosure. The architecture aims to establish foundational trust, accountability, and security for agentic AI ecosystems.
Published: 2025-05-25T20:21:55Z
Authors: Ken Huang, Vineeth Sai Narajala, John Yeoh, Json Ross, Mahesh Lambe, Ramesh Raskar, Youssef Harkati, Jerry Huang, Idan Habler, Chris Hughes
Organizations: CSA AI Safety Working Groups, Amazon Web Services, Cloud Security Alliance, Salesforce, MIT NANDA Coauthor, MIT, BrightOnLABS, The University of Chicago, Independent Researcher, Resilient Cyber
Findings:
Traditional IAM systems are inadequate for dynamic AI agents in MAS.
Proposes a framework using DIDs and VCs for agent identity.
Introduces an Agent Naming Service for secure discovery.
Utilizes Zero-Knowledge Proofs for privacy-preserving compliance.
Establishes a unified global session management layer.
Final Score: Grade: B, Explanation: Novel framework with detailed analysis but lacks empirical validation.
Wisdom of the week
Love yourself enough to remove yourself from spaces where you are not valued or appreciated.
Till next time!
Project Overwatch is a cutting-edge newsletter at the intersection of cybersecurity, AI, technology, and resilience, designed to navigate the complexities of our rapidly evolving digital landscape. It delivers insightful analysis and actionable intelligence, empowering you to stay ahead in a world where staying informed is not just an option, but a necessity.