
Cyber AI Chronicle

By Simon Ganiere · 19th May 2024

Welcome back!

Project Overwatch is a cutting-edge newsletter at the intersection of cybersecurity, AI, technology, and resilience, designed to navigate the complexities of our rapidly evolving digital landscape. It delivers insightful analysis and actionable intelligence, empowering you to stay ahead in a world where staying informed is not just an option, but a necessity.


What I learned this week

TL;DR

  • I spent this week playing with some code and building a basic level of automation with AI. I was not trying to be super innovative or create brand-new things; this is about learning and understanding how the tools can be used. The objective? Solve part of one of my biggest problems: how to triage content and focus on reading what matters versus noise. I read a few research papers from arXiv and that gave me some ideas. Step 1: download research papers based on keywords. Step 2: analyse and rate each paper. Step 3: save the results to a database. Step 4: build a web interface to review and select the articles to read. This one is going to read a little bit like a tutorial, but it will also show you the power of AI tools and how quickly you can achieve results. You’ll also get the key prompts I used! If you are not a coder, this is 100% for you!

  • Significant announcements in the AI world this week. OpenAI announced GPT-4o, a multi-modal model covering text, voice, and vision, provided for free to everyone. Check the videos on their YouTube channel for some impressive examples - the realtime translation one is a bit mind-blowing 🤯. They also released a desktop app for macOS (Windows to follow). Right after this, one of the co-founders of OpenAI, Ilya Sutskever, announced he was leaving.
    Google I/O 2024 ran the day after OpenAI's event. A summary from Google can be found here, and this one is also good. Gemini is being integrated everywhere, and the context window for the new Gemini 1.5 models (Flash and Pro) now reaches up to 2 million tokens! That is a lot of input data; the prompts and context you can fit in will be insane!

  • I did mention this before, but the cyber world is going through consolidation. This is not unexpected at all given the current cycle. The fact that this week saw two announcements related to the SIEM market is not a surprise either. Palo Alto Networks announced a strategic partnership with IBM and acquired its QRadar SaaS assets. LogRhythm and Exabeam decided to merge. This comes on the back of the $28B acquisition of Splunk by Cisco, which closed in March 2024. This trend will continue for sure.

Transform Content Overload with AI: Your Step-by-Step Guide

As I mentioned in the introduction, one of my biggest problems is getting through the whole list of content from tweets, RSS feeds, news articles, podcasts, YouTube, etc. I have a Feedly account; one week equals more than 1,000 articles, and my list of feeds is not even huge. It’s nearly impossible to triage all of this. I decided to take a stab at the problem. My objective was the following:

  • Step 1: Search for and download research papers on arXiv based on keywords

  • Step 2: Analyse each research paper with a specific prompt to produce a summary, study details, quality, findings, conflict of interest, a summary statement, and a final score (a sketch of the resulting JSON shape follows this list).

  • Step 3: Save the output of the prompt in a simple database.

  • Step 4: Build a basic web front end to query the data and quickly identify the research papers I should read in full.

  • Bonus (not done yet): get a summary in a format that lets me add three research papers to this newsletter with minimal work 😄
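To make Step 2 concrete, the JSON returned per paper looks roughly like this; the field names come from my customised prompt (more on that below) and are illustrative:

{
  "summary": "...",
  "authors": "...",
  "study_details": "...",
  "quality": "...",
  "findings": "...",
  "conflict_of_interest": "...",
  "summary_statement": "...",
  "score": 8
}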

I used to do a lot more coding many years ago; I have not coded a proper project in a while, but I can still read and understand code without too much trouble. So, not going to hide it: I leveraged ChatGPT a lot to get this working! Also drawing on some very good insights from https://learnprompting.org, I decided to break the problem down into pieces and go step by step.

Again, nothing really new here, and I’m sure you can find something equivalent on the internet without writing a single line of code, but that’s not the point. The point is to explore and learn.

Step 1: Use the API to get the research papers

First stop, let’s get the documentation for the arXiv API and ask the following question to ChatGPT:

I have the following script:

import urllib, urllib.request

url = 'http://export.arxiv.org/api/query?search_query=all:AI+AND+all:cyber&start=0&max_results=10'
data = urllib.request.urlopen(url)
print(data.read().decode('utf-8'))

the output of the api include a field with a PDF document. It looks like this:

    <link title="pdf" href="http://arxiv.org/pdf/2402.11082v1" rel="related" type="application/pdf"/>

I need to add to my script the ability to go through all of the results, find that field and download the PDF file in a folder name research_report

That gave me a basic script to work with. What I realised then was that not all research papers have a PDF file attached, so I asked ChatGPT to add error handling:

can you add error handling in case the PDF file is not available or if there is an error message from the server

I then needed the ability to change the search keywords. As I had provided the full URL in the first prompt, ChatGPT understood what I wanted to do. I used the following prompt - note that I’m telling it not to change everything, as this is a new feature, not a “let’s redo the entire script” exercise, and a bit of positive feedback never hurts:

keep the overall script the exact same as its working very well. I just want to add search parameter in the command line. I must be able to add as many keywords as possible in the command line and then update the url to the arXiv API
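For illustration, here is a minimal sketch of what the finished Step 1 script could look like. This is my reconstruction, not ChatGPT’s exact output: I’m assuming the Atom feed is parsed with Python’s built-in xml.etree, keywords come from the command line, and entries without a PDF link are skipped.

import os
import sys
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def search_arxiv(keywords, max_results=10):
    # Build the search_query string, e.g. all:AI+AND+all:cyber
    query = "+AND+".join(f"all:{urllib.parse.quote(k)}" for k in keywords)
    url = (f"http://export.arxiv.org/api/query?"
           f"search_query={query}&start=0&max_results={max_results}")
    with urllib.request.urlopen(url) as resp:
        return ET.fromstring(resp.read())

def download_pdfs(feed, folder="research_report"):
    os.makedirs(folder, exist_ok=True)
    for entry in feed.findall(f"{ATOM}entry"):
        # Find the link element flagged as the PDF
        pdf_link = entry.find(f"{ATOM}link[@title='pdf']")
        if pdf_link is None:
            continue  # not every paper has a PDF attached
        href = pdf_link.get("href")
        filename = os.path.join(folder, href.rsplit("/", 1)[-1] + ".pdf")
        try:
            urllib.request.urlretrieve(href, filename)
            print(f"Downloaded {filename}")
        except urllib.error.URLError as e:
            print(f"Failed to download {href}: {e}")

if __name__ == "__main__":
    download_pdfs(search_arxiv(sys.argv[1:] or ["AI", "cyber"]))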

Step 2: Analyse the content of the research papers

Next, I needed to get the content of the PDF files in order to send it to ChatGPT, run it through the prompt, and get the output. Same approach: go step by step, starting with reading the content of the PDF files:

ok perfect, now I need to create a second script that will go read the PDF files in the research_report folder and output the content on screen

I had to tweak a few things to make this work, but nothing major. I now have the full PDF content in a single variable, and I want to send that content to ChatGPT with a specific prompt. Nothing better than Daniel Miessler’s Fabric project for that. I literally gave ChatGPT a piece of Fabric’s code and asked for it to be integrated with the code that reads the PDF:

so now I want to integrate the below code. it's a copy paste from an API source code. I don't want to use the full API but basically integrate the functionality in the PDF reader script. I need to pass the content of the pdf (variable pdf_text) into the code below and get the output
@app.route("/extwis", methods=["POST"])
@auth_required  # Require authentication
def extwis():
    data = request.get_json()

    # Warn if there's no input
    if "input" not in data:
        return jsonify({"error": "Missing input parameter"}), 400

    # Get data from client
    input_data = data["input"]

    # Set the system and user URLs
    system_url = "https://raw.githubusercontent.com/danielmiessler/fabric/main/patterns/extract_wisdom/system.md"
    user_url = "https://raw.githubusercontent.com/danielmiessler/fabric/main/patterns/extract_wisdom/user.md"

    # Fetch the prompt content
    system_content = fetch_content_from_url(system_url)
    user_file_content = fetch_content_from_url(user_url)

    # Build the API call
    system_message = {"role": "system", "content": system_content}
    user_message = {"role": "user", "content": user_file_content + "\n" + input_data}
    messages = [system_message, user_message]
    try:
        response = openai.chat.completions.create(
            model="gpt-4-1106-preview",
            messages=messages,
            temperature=0.0,
            top_p=1,
            frequency_penalty=0.1,
            presence_penalty=0.1,
        )
        assistant_message = response.choices[0].message.content
        return jsonify({"response": assistant_message})
    except Exception as e:
        return jsonify({"error": str(e)}), 500

I got what I needed, but I had to change a few things. As I wanted the output to be saved into a database, I rewrote the prompt from Fabric so that the only change is the output format: JSON rather than markdown. Do note I asked ChatGPT to do that for me, to keep it simple. You can find the prompt here; keeping it external also allows me to change the prompt without touching the script’s code.
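To make the end result concrete, here is a rough sketch of how the standalone analysis step could look once the Flask plumbing is stripped out. Assumptions on my side: the pypdf library for text extraction, the OpenAI v1 client instead of the module-level calls in the Fabric snippet, JSON mode enabled on the model, and a placeholder URL standing in for my rewritten prompt.

import json
import urllib.request
from openai import OpenAI
from pypdf import PdfReader

# Placeholder URL for the customised extract_wisdom prompt (swap in your own)
SYSTEM_PROMPT_URL = "https://example.com/extract_wisdom_json/system.md"

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def fetch_content_from_url(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

def analyse_pdf(path):
    # Concatenate the text of every page into a single variable
    pdf_text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    messages = [
        {"role": "system", "content": fetch_content_from_url(SYSTEM_PROMPT_URL)},
        {"role": "user", "content": pdf_text},
    ]
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=messages,
        temperature=0.0,
        response_format={"type": "json_object"},  # return JSON, not markdown
    )
    return json.loads(response.choices[0].message.content)

print(analyse_pdf("research_report/2402.11082v1.pdf"))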

Step 3: Database support

I then asked ChatGPT to add support for saving the JSON output to a SQLite database:

Can you add to the script the ability to save the json file into a sqlite database

Initially the script just dumped the entire JSON message into the database, so I had to ask for the fields to be “unpacked”:

can you map the json file fields into the sqlite database directly. so not dumping the json file in one go but creating database fields like SUMMARY, AUTHORS, FINDINGS, etc.

I’ll spare you the additional back and forth needed to get the JSON format right and the right values into the database, but overall it worked pretty well.

Now that I can keep track of what was downloaded, I also added a few checks, the obvious one being not to over-consume the OpenAI API. So I asked ChatGPT for a check to avoid downloading and analysing the same research paper twice:

I now want to add a new feature. The feature is that before downloading the file, i want to check the sqlite database which is in the same folder and named research_results.db if there is a file name the same way. If yes then skip the file, if not then download the file. In the database the table is named: analysis_results and there is a field name filename
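Put together, a condensed sketch of the Step 3 code might look like the following; the database, table, and filename field match the prompt above, while the remaining columns are illustrative.

import sqlite3

DB = "research_results.db"

def init_db():
    with sqlite3.connect(DB) as con:
        con.execute("""CREATE TABLE IF NOT EXISTS analysis_results (
            filename TEXT PRIMARY KEY,
            summary TEXT, authors TEXT, study_details TEXT,
            quality TEXT, findings TEXT, conflict_of_interest TEXT,
            summary_statement TEXT, score INTEGER)""")

def already_analysed(filename):
    # Check before downloading so we do not re-run the OpenAI analysis
    with sqlite3.connect(DB) as con:
        row = con.execute("SELECT 1 FROM analysis_results WHERE filename = ?",
                          (filename,)).fetchone()
    return row is not None

def save_result(filename, result):
    # Map each JSON field to its own column instead of dumping one blob
    with sqlite3.connect(DB) as con:
        con.execute("""INSERT OR REPLACE INTO analysis_results
            (filename, summary, authors, study_details, quality, findings,
             conflict_of_interest, summary_statement, score)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
            (filename, result.get("summary"), result.get("authors"),
             result.get("study_details"), result.get("quality"),
             result.get("findings"), result.get("conflict_of_interest"),
             result.get("summary_statement"), result.get("score")))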

Step 4: Build a Web GUI

Last step: get a web interface, which turned out to be a Flask application. I had to add a second prompt to bring in Bootstrap for a slightly better-looking web GUI.

can you build a web front end to display the content of the database and query it with basic search field?

can you add a bootstrap template to the page looks a bit more design
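For reference, a bare-bones version of what those two prompts produce might look like this; Bootstrap is pulled from a CDN and the template is inlined for brevity, so treat it as a sketch rather than the exact generated code.

import sqlite3
from flask import Flask, request, render_template_string

app = Flask(__name__)

PAGE = """<!doctype html>
<link rel="stylesheet"
 href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css">
<div class="container py-4">
  <form method="get" class="mb-3">
    <input class="form-control" name="q" value="{{ q }}" placeholder="Search...">
  </form>
  <table class="table table-striped">
    <tr><th>Paper</th><th>Summary</th><th>Score</th></tr>
    {% for row in rows %}
    <tr><td>{{ row[0] }}</td><td>{{ row[1] }}</td><td>{{ row[2] }}</td></tr>
    {% endfor %}
  </table>
</div>"""

@app.route("/")
def index():
    # Basic search across the summary field, highest-rated papers first
    q = request.args.get("q", "")
    with sqlite3.connect("research_results.db") as con:
        rows = con.execute(
            "SELECT filename, summary, score FROM analysis_results "
            "WHERE summary LIKE ? ORDER BY score DESC", (f"%{q}%",)).fetchall()
    return render_template_string(PAGE, rows=rows, q=q)

if __name__ == "__main__":
    app.run(debug=True)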

Conclusion

The resulting GUI is not amazing, but it is effective enough to get the job done. From there I can filter by keyword and rating and identify the top articles.

A few lessons learned from this exercise:

  • Establish your plan and break the work down step by step

  • Give the prompts the right context and examples when building the app. Do not hesitate to provide sample code or output. One thing I have not done here is, for example, build a mock-up of the web interface, upload the image, and ask ChatGPT to build the interface from the image.

  • The prompt for the content analysis can be fully customised. That approach from Fabric is really powerful.

You can also imagine moving this to an agentic architecture and splitting up the tasks. As a bonus, I want to be able to select articles and then build a summary section that can be pasted into the newsletter with minimal effort.

Hope you liked the approach, and if you want me to develop something similar for another use case, please let me know!

Worth a full read

Podcast: Investigation Into Zero-Day Exploitation of the Ivanti Connect Secure Appliances

Key Takeaway

  • Volt Typhoon's probing of NetScaler systems highlights the importance of monitoring critical infrastructure.

  • Zero-day vulnerabilities are prime targets for cyber espionage groups like Volt Typhoon.

  • Collaboration among cybersecurity experts enhances threat tracking and mitigation efforts.

  • Cyber espionage poses significant risks to national security and critical infrastructure sectors.

  • Continuous monitoring and sharing information are key to improving cybersecurity defenses.

  • Understanding cyber espionage tactics helps develop more effective defense strategies.

  • Cybersecurity podcasts offer valuable insights into current threats and defense mechanisms.

  • Public awareness and collaboration are crucial for enhancing overall cybersecurity measures.

  • Sophisticated techniques used by cyber espionage groups require adaptive defense strategies.

  • Cybersecurity research and analysis must be ongoing to stay ahead of evolving threats.

Southeast Asian scam syndicates, stealing $64 billion annually

Key Takeaway

  • Scam syndicates' annual theft equals 40% of Cambodia, Laos, and Myanmar's combined GDP.

  • "Pig butchering" scams exploit emotional manipulation to secure fraudulent investments.

  • Political corruption universally enables organized crime in Southeast Asia.

  • Scam compounds operate like penal institutions with high security and forced labor.

  • Criminal operations adapt quickly to law enforcement pressure by shifting locations.

  • International coordination is crucial to combat the global spread of scam syndicates.

  • Social media platforms play a significant role in facilitating scam operations.

  • Advanced investigative techniques are essential to dismantle sophisticated scam networks.

  • Partnering with local elites provides scam syndicates with protection and impunity.

  • Economic impact of scams extends beyond Southeast Asia, affecting global victims.

Some more reading

YARA is dead, long live YARA-X » READ

3 North Koreans infiltrated US companies in 'staggering' alleged telework fraud: DOJ » READ

A Third of CISOs Have Been Dismissed “Out of Hand” by the Board » READ

This repository centralizes and summarizes practical and proposed defenses against prompt injection » READ

CISA and NIST have announced a new plan to address security flaws in software, aiming to enhance cybersecurity resilience » READ & GitHub Repo

LLMjacking: Stolen Cloud Credentials Used in New AI Attack » READ

User outcry as Slack scrapes customer data for AI model training » READ

Wisdom of the week

The day the soldiers stop bringing you their problems is the day you stopped leading them.

They have either lost confidence that you can help them or concluded that you do not care.

Either case is a failure of leadership.

Colin Powell

Contact

Let me know if you have any feedback or any topics you want me to cover. You can ping me on LinkedIn or on Twitter/X. I’ll do my best to reply promptly!

Thanks! See you next week! Simon
