Cyber AI Chronicle
By Simon Ganiere · 8th September 2024
Welcome back!
Project Overwatch is a cutting-edge newsletter at the intersection of cybersecurity, AI, technology, and resilience, designed to navigate the complexities of our rapidly evolving digital landscape. It delivers insightful analysis and actionable intelligence, empowering you to stay ahead in a world where staying informed is not just an option, but a necessity.
What I learned this week
TL;DR
AI model collapse threatens the integrity and effectiveness of AI systems, including those that support cybersecurity. Equip yourself with the right knowledge about this new challenge and what can be done about it » READ MORE
The threat landscape is as busy as ever. To highlight a few key items:
The U.S. Department of Justice (DoJ) seized 32 domains used by Russian-linked threat actor Doppelgänger to spread disinformation campaigns targeting the upcoming U.S. elections and international support for Ukraine; two Russian nationals were indicted for funding and spreading propaganda through a Tennessee-based company.
A critical Remote Code Execution (RCE) vulnerability (CVE-2024-40711) was found in Veeam’s Backup & Replication software, prompting immediate security updates; threat groups such as FIN7 and ransomware gangs such as REvil have targeted Veeam vulnerabilities before to access and compromise enterprise data.
North Korean-backed threat actors exploited a Chromium zero-day vulnerability (CVE-2024-7971) to target cryptocurrency firms, using a rootkit for persistent kernel-level access and malware deployment; organizations are advised to patch vulnerabilities and enhance detection capabilities.
I briefly mentioned it last week, but I decided to play a little bit with Cursor and oh boy, I wasn’t disappointed! As I mentioned before, I’m a developer by background: I did some coding back at university and some scripting over the years, so I can read code, but it’s not like I can write a full application from scratch. The augmentation AI brings is just a game changer. You can ask questions, chase down bugs, request new features, etc. in a superfast manner. I built a new agent workflow that pulls the latest vulnerabilities from the CISA KEV list and asks a couple of agents to gather more details about each CVE (a minimal sketch of the fetching step is below). You can find the results here. Let me know what you think. If there is interest, I might run this on a daily basis and send an email with the details.
Would you like to receive an email with context and references when a vulnerability is added to the CISA KEV list?
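For those curious, here is a minimal sketch of the fetching step, assuming CISA’s public KEV JSON feed; the agent orchestration and email delivery around it are left out:

```python
import requests

# Public CISA Known Exploited Vulnerabilities (KEV) catalog, JSON feed.
KEV_URL = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"

def fetch_kev_entries(limit=5):
    """Fetch the KEV catalog and return the most recently added entries."""
    catalog = requests.get(KEV_URL, timeout=30).json()
    vulns = catalog.get("vulnerabilities", [])
    # Sort newest first; dateAdded is an ISO date, so string sort works.
    vulns.sort(key=lambda v: v.get("dateAdded", ""), reverse=True)
    return vulns[:limit]

if __name__ == "__main__":
    for v in fetch_kev_entries():
        print(f'{v["dateAdded"]}  {v["cveID"]}  {v["vulnerabilityName"]}')
```

Sorting on dateAdded is enough to spot new entries; a scheduled run can simply diff against the previous day’s list before handing CVEs to the enrichment agents.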
AI Model Collapse: A Growing Threat to Cybersecurity
As artificial intelligence becomes more widespread, a new problem called model collapse is emerging. This issue could weaken AI systems across all use cases. In the cybersecurity context, as we put more AI into our digital defenses, understanding and preventing model collapse is essential to keep our security measures strong.
What is AI Model Collapse?
AI model collapse happens when AI systems are repeatedly trained using data that includes their own outputs. This creates a feedback loop where the AI's understanding of the world becomes more distorted over time, leading to a narrowing of its abilities and a shift away from reality.
Think of it like a game of telephone, where each player whispers a message to the next. With each round, the message becomes more jumbled. Similarly, as AI models learn from their own outputs, they amplify small errors and biases. Eventually, they produce results that don't resemble the original, human-generated data they were meant to copy.
A recent study in Nature showed this problem clearly: when an AI language model was repeatedly trained on its own output, it was producing nonsense by the ninth round. For example, when asked about cooking a turkey, the AI’s responses degraded from sensible instructions into a jumbled list of unrelated words.
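The feedback loop is easy to reproduce in miniature. The toy sketch below is my own illustration, not the Nature experiment: it repeatedly fits a Gaussian to samples drawn from its previous fit, and on a typical run the estimated spread shrinks over the generations — the statistical analogue of a model gradually forgetting the tails of its original data.

```python
import random
import statistics

def collapse_demo(generations=60, sample_size=20, seed=42):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Each generation trains only on the previous generation's output,
    mimicking a model trained on its own synthetic data; the fitted
    spread tends to drift toward zero as the tails are forgotten.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    for gen in range(1, generations + 1):
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        sigma = statistics.stdev(samples)
        if gen % 10 == 0:
            print(f"generation {gen:2d}: sigma = {sigma:.3f}")

collapse_demo()
```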
Why is Model Collapse Happening?
Several factors are increasing the risk of model collapse:
More AI-generated content: AI systems are creating a lot of content that ends up on the internet. Future AI models are likely to unknowingly train on this synthetic data. Sam Altman, CEO of OpenAI, says their models alone generate about 100 billion words per day – as much as a million novels.
Less new human-generated data: High-quality, diverse data created by humans is becoming harder to find. AI companies are running out of new, original content to train their models on.
Cost and speed: Using AI-generated data for training can be cheaper and faster than carefully selecting human-generated datasets. This tempts companies to take shortcuts.
Difficulty telling AI and human content apart: It's often hard to distinguish between content created by humans and by AI. This makes it challenging to filter out AI-generated data during training. A study by Amazon Web Services researchers found that about 57% of all web-based text has been generated or translated by AI.
Preventing AI Model Collapse
Addressing the challenge of model collapse requires several approaches:
Careful data selection: Prioritize using high-quality, diverse, and verifiable human-generated data for training AI models.
Better content tracking: Develop improved methods for distinguishing between human-generated and AI-generated content.
Regular retraining: Periodically retrain models on fresh, human-generated data to counteract drift. Experiments with language models found that retaining even 10% of the original training data for periodic retraining can significantly slow model degradation (a toy illustration follows this list).
Preserving diversity: Use techniques to ensure that rare but important data points are not lost during the training process. This is crucial as studies have shown that model collapse disproportionately affects minority data and rare events.
Openness and collaboration: Encourage sharing of methods and datasets within the AI community to prevent accidental use of synthetic data.
Ethical data collection: Establish clear guidelines for the ethical collection and use of human-generated data, respecting privacy and intellectual property rights.
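To make the retraining point above concrete, here is the same toy Gaussian model with the suggested fix: keep a fixed anchor of roughly 10% “real” data in every generation’s training mix. Under these simplified assumptions (my illustration, not the paper’s setup), the anchor pulls the fitted distribution back toward reality each round instead of letting it drift.

```python
import random
import statistics

def retrain_with_anchor(generations=60, sample_size=200,
                        real_fraction=0.1, seed=7):
    """Toy version of 'keep ~10% original data each retraining round'.

    Each generation trains on fresh samples from the previous fit plus
    a fixed anchor of samples from the true distribution, which keeps
    the estimated spread from collapsing toward zero.
    """
    rng = random.Random(seed)
    true_mu, true_sigma = 0.0, 1.0
    anchor = [rng.gauss(true_mu, true_sigma)
              for _ in range(int(sample_size * real_fraction))]
    mu, sigma = true_mu, true_sigma
    for gen in range(1, generations + 1):
        synthetic = [rng.gauss(mu, sigma)
                     for _ in range(sample_size - len(anchor))]
        training = synthetic + anchor  # the anchor re-grounds every round
        mu = statistics.fmean(training)
        sigma = statistics.stdev(training)
        if gen % 20 == 0:
            print(f"generation {gen}: sigma = {sigma:.3f}")

retrain_with_anchor()
```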
What This Means for Cybersecurity
Model collapse could have serious effects on cybersecurity:
Weaker threat detection: AI systems used to identify cyber threats might become less effective at spotting new attack patterns. They might focus too much on a narrow set of known threats.
New vulnerabilities: As AI models used in cybersecurity tools get worse, they might introduce new weaknesses that attackers can exploit.
Less ability to adapt: Collapsed models might struggle to keep up with new types of cyber threats, leaving systems open to new attacks.
Biased risk assessments: AI used to analyze cybersecurity risks might overlook important vulnerabilities in less common systems or unusual network setups.
Easier to trick AI systems: Attackers could potentially use model collapse to manipulate AI-driven security systems, making them blind to certain types of attacks.
The Importance of Human Involvement in AI-Based Cybersecurity
Given the potential for model collapse and other AI limitations, human involvement remains crucial in cybersecurity. The idea of "human-in-the-loop" (HITL) is becoming more important in AI-based security tools for several reasons:
Oversight and checking: Humans can review and verify AI-generated alerts, reducing false alarms and making sure important threats aren't missed.
Understanding context: Human analysts can interpret AI outputs within broader contexts that machines might miss, such as current events or industry-specific trends.
Making ethical decisions: In sensitive situations, human judgment is essential for making ethical decisions that AI might struggle with.
Ongoing improvement: Human experts can provide feedback to improve AI models and help them adapt to new types of threats.
Creative problem-solving: Humans can come up with new solutions to unprecedented cybersecurity challenges that AI models might not think of.
Implementing HITL in cybersecurity means creating workflows where AI handles high-volume, repetitive tasks while escalating unusual or high-stakes situations to human experts. This combination of AI and human analysts can help reduce the risks of model collapse while using the strengths of both.
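As a concrete, deliberately simplified illustration, a hypothetical triage rule along those lines might look like the following; the thresholds, field names, and action labels are my assumptions, not an established standard:

```python
from dataclasses import dataclass

@dataclass
class Alert:
    # Illustrative fields only; real alerts carry far more context.
    description: str
    model_confidence: float  # 0.0-1.0 score from the detection model
    asset_criticality: str   # "low" | "medium" | "high"

def triage(alert: Alert) -> str:
    """Route an alert: AI handles the clear-cut bulk, humans get the rest."""
    if alert.asset_criticality == "high":
        return "escalate_to_human"   # high stakes: always a human call
    if alert.model_confidence >= 0.95:
        return "auto_contain"        # clear-cut: let automation act
    if alert.model_confidence <= 0.10:
        return "auto_close"          # clear noise: suppress
    return "escalate_to_human"       # ambiguous middle: human review

print(triage(Alert("suspicious login", 0.55, "medium")))  # escalate_to_human
```

The key design choice is that the ambiguous middle band, where a collapsing model is most likely to be wrong, defaults to a human rather than to automation.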
Measuring How Well Cybersecurity AI Systems Perform
To ensure that AI-based cybersecurity tools stay effective and to spot early signs of model collapse, it's important to regularly measure their performance:
Accuracy: Track false positive and false negative rates, precision, recall, and F1 scores for threat detection tasks (see the sketch after this list).
Speed and efficiency: Measure how quickly the system detects and responds to threats, and how much data it can process.
Adaptability: Regularly test the system with new, unseen types of threats to assess how well it can generalize and adapt.
Diversity: Evaluate the model's performance across a wide range of scenarios, including unusual cases and rare events.
Agreement with humans: Compare AI decisions with those of human experts to check alignment and identify differences.
Bias checks: Regularly assess the model for biases in its decision-making processes.
Simulated attacks: Conduct mock attacks and penetration testing to evaluate the AI system's real-world effectiveness.
Long-term tracking: Monitor performance metrics over time to identify any decline that might indicate model collapse.
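Tying the accuracy and long-term tracking items together, here is a minimal sketch in plain Python: compute precision, recall, and F1 from labeled alert outcomes per time window, then watch the trend. No specific tooling is assumed.

```python
def detection_metrics(outcomes):
    """outcomes: list of (predicted_malicious, actually_malicious) booleans."""
    tp = sum(p and a for p, a in outcomes)
    fp = sum(p and not a for p, a in outcomes)
    fn = sum(a and not p for p, a in outcomes)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Track per-week scores; a sustained downward trend in F1 on fresh,
# labeled traffic is an early warning sign worth investigating.
weekly = {
    "2024-W34": [(True, True), (True, False), (False, True), (False, False)],
    "2024-W35": [(True, True), (True, True), (False, False), (True, False)],
}
for week, outcomes in weekly.items():
    p, r, f1 = detection_metrics(outcomes)
    print(f"{week}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```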
Conclusion
As AI plays a bigger role in cybersecurity, dealing with model collapse is becoming more important. By understanding this problem, using human-in-the-loop processes, and continuously measuring AI system performance, we can ensure that AI remains a useful tool in our digital defenses rather than a potential weakness.
The cybersecurity community needs to stay alert as threats continue to evolve. By constantly evaluating the performance of AI-driven security tools and adjusting strategies, we can address the potential impacts of model collapse. Combining the strengths of AI with human expertise, ongoing research, and careful performance monitoring will allow us to build stronger, more adaptable cybersecurity systems. This approach will help us make the most of AI's capabilities while protecting against its limitations, ensuring our digital systems and sensitive data remain secure in the face of tomorrow's challenges.
In this landscape, high-quality, human-generated data has become extremely valuable - it's the new gold. This shift puts organizations that can provide such data in a unique position. Data brokers, who collect, aggregate, and sell high-quality data, are becoming increasingly important in the AI ecosystem. However, this also makes them potential targets for cyber threats. We can expect to see more cyberattacks aimed at stealing or manipulating this precious resource. Data brokers will need to implement robust security measures to protect their valuable data assets. At the same time, companies using AI will need to carefully vet their data sources and perhaps invest more in generating their own high-quality data.
This new reality underscores the interconnected nature of AI development and cybersecurity. As we work to prevent AI model collapse, we must also strengthen our defenses around the data that feeds these models. The future of AI-driven cybersecurity doesn't just depend on better algorithms - it relies on protecting and ethically sourcing the human-generated data that keeps these systems grounded in reality.
Worth a full read
Under Meredith Whittaker, Signal is out to prove surveillance capitalism wrong
Key Takeaways
Signal’s growth reflects a growing demand for privacy-focused alternatives to Big Tech.
Surveillance capitalism is not the only viable model for technology's future.
Signal's nonprofit status is essential for maintaining its privacy-first approach.
AI's dependence on surveillance data links it intrinsically to privacy concerns.
Shifting political environments necessitate jurisdictional flexibility for privacy tech.
Regulation and structural separation can mitigate Big Tech’s dominance.
Signal’s success demonstrates the feasibility of privacy-centric tech without corporate backing.
Public demand for alternatives to Big Tech is increasing due to privacy concerns.
Building independent, privacy-focused tech requires significant capital and support.
A heterogeneous tech landscape with multiple privacy-preserving options is crucial.
The rise of the machines. Potential application of AI agents in offensive and defensive cybersecurity
Key Takeaways
Multi-agent systems enhance cybersecurity by automating complex, collaborative tasks.
AI agents' lack of legal recognition complicates accountability for their actions.
AI integration may shift cybersecurity workforce dynamics over time.
Differentiating language models per agent can reduce operational costs.
Multi-agent AI systems promise cost-effective, automated cybersecurity measures.
AI’s rapid advancement necessitates adaptive strategies for workforce transitions.
Ethical considerations and rigorous testing are crucial for AI deployment.
Combining GPT-4 for user interaction and GPT-3.5 for agent interaction can be cost-efficient.
Poorly constructed systems can lead to expensive agent discussion loops.
The next five years will see dramatic changes in cybersecurity and IT.
Business Email Compromise (BEC) Guide
Key Takeaways
Suspicious logins and permission changes indicate BEC actor persistence.
Unified Audit Log (UAL) is critical for centralized Office 365 event tracking.
Threat intelligence supports understanding tradecraft and identifying phishing emails.
Simple mitigations like MFA and blocking mail forwarding can prevent BEC.
Phishing emails often create urgency using generic subject lines.
BEC threat actors frequently spoof well-known brands and services.
Forwarding rules are a common tactic for maintaining persistence (a hunting sketch follows this list).
Brute force and credential stuffing are typical methods for account access.
OAuth abuse allows access without user credentials.
Evasion techniques include purging emails and disabling audit logs.
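On the forwarding-rules point, here is a hedged sketch of what hunting for them can look like against Office 365 via the Microsoft Graph API. The inbox-rules endpoint is documented, but the token acquisition, the required permissions (e.g. MailboxSettings.Read), and the exact red flags to alert on are assumptions to adapt to your tenant.

```python
import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def suspicious_forwarding_rules(user_id: str, token: str):
    """List inbox rules that forward or redirect mail out of the mailbox."""
    resp = requests.get(
        f"{GRAPH}/users/{user_id}/mailFolders/inbox/messageRules",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    flagged = []
    for rule in resp.json().get("value", []):
        actions = rule.get("actions") or {}
        # forwardTo / redirectTo actions are the classic persistence tell.
        if actions.get("forwardTo") or actions.get("redirectTo"):
            flagged.append(rule.get("displayName", "<unnamed rule>"))
    return flagged
```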
Wisdom of the week
Success lies in relentless execution of the basics.
Contact
Let me know if you have any feedback or any topics you want me to cover. You can ping me on LinkedIn or on Twitter/X. I’ll do my best to reply promptly!
Thanks! See you next week! Simon

