Rob's playground

In this post I would like to discuss the interaction, and future co-dependency, between information security and large AI models whether language or multimodal. Due to the ability of large models to quickly change the economics of useful and nefarious content creation, they possess the ability to drastically alter the status quo even prior to reaching human-level AI. The solutions discussed in industry fall under the categories of AI safety and alignment research. We’ll discuss why these categories are in fact applications of information security that broadly fit in two groups, large models applied to security, and the security of large models.

Where does information security go from here?

The idea of using large models to augment existing security product categories is to be expected. Information security industry has long adopted automation for pentesting, source code analysis, endpoint detection and response (EDR), and many other uses. However, most automation in production is still heuristics based. Source code analysis tools primarily or very heavily rely on rule-based detections and known vulnerability signatures. Automated pentesting tools use network scanners, and pre-scripted exploits that come packaged from the tool’s vendor. While these heuristics have become very advanced, they are still nowhere close to the level of capability and creativity of a well-seasoned red team. EDR has made the most progress utilizing ML for automating anomalous event detection. However, the models are relatively lightweight when compared to generative models, as they are merely statistical anomaly detection tools.
Large models may not be the right tool for every one of these applications, however their ever-increasing capabilities at decreasing costs is likely to have far reaching effects. For the time being, the outputs of large models can be thought of as good on average, but much cheaper and faster. “Good on average” can be used in production whenever errors are inexpensive or when the steps necessary to validate an error are significantly lower cost than the previous cost of creation.
In the case of red teaming and source code analysis, new tools can use large models to draw inspiration from vast bodies of technical literature such as CVEs, textbooks, blog posts, and RFCs to have a high level background understanding of their current task. Language models can be used to rapidly generate very creative methods for testing networks and applications. Instead of just predicting the next word, fine tuned language models for action generation could identify the next step. Even if many of the generated theories for accessing, as an example, restricted instances are wrong or hallucinated, a properly trained and deployed language model could be made to find a valid vulnerability on its 1000th attempt before a human red teamer were to make their second attempt. Generally, with red teaming the cost of creativity is high, while the cost of attempting an exploit can be relatively low. Similarly, new source code analysis tools can benefit from utilizing language models to draw inspiration from existing CVEs (and other technical write ups) to apply high-level knowledge to the codebases they are analyzing. Such a top-down approach does not exist with today’s near function or class level scoped automation tools. There is an opportunity for new tools to draw on general knowledge, inspiration from security literature, embeddings of the codebase being analyzer to create low level actions plans based on high level descriptions. While the outputs of language models are mired with hallucinations, the path to reaching these capabilities now appears attainable.
As an example, imagine a LangChain powered LLM application that is able to look up vulnerability classes for services in a Kubernetes deployment. Instead of predicting the next word, an LLM could be used to predict the next steps in service discovery or vulnerability assessment. Afterwards, upon identifying a flask based microservice, the LLM is given context of the deployment and is prompted to generate thousands of payloads at a high temperature to discover what endpoints it is able to reach. In this case, the LLM’s outputs are generated based on general knowledge about flask apps, authorization and authentication issues, as well as specific context information about the Kubenertes deployment it collected previously. There is a lot of promising research in large action models and the use of Reinforcement Learning to generate prompts may be able to achieve a goal pursuit mentality similar to that of a red teamer. LLMs possess a superset of human capabilities, they are not blocked from invoking the same command lines tools humans are, in fact GPT3 is readily able to invoke Nmap and maybe with a little luck, Metasploit. The Toolformer paper, which was recently published by researchers at Facebook, describes methods in which LLMs can be made to reach out to APIs
A survey of large models applied to security would be remiss without covering recent developments in natural language frameworks like LangChain to “chat-your-data”. By using data ingestion pipelines, and frameworks like LangChain and gpt_index layered on top of GPT3 there are now quick to market solutions for deploying natural language interfaces. Recently, Orca integrated with GPT3 to provide a chatbot interface to their CSPM. With further development and additional data ingestion, it may ultimately be possible to turn language models into cloud architecture assistants that aid security and SRE teams.
Much further out in the future, advances in multimodal models are likely to bring bidirectional image to text comprehension. Endpoint security is likely to be improved by models that can “visually” watch a user’s screen and understand user interactions rather than just raw access patterns to judge risk. Additionally, since no human is involved in this monitoring, it may help to alleviate some of the privacy and safety concerns that are making this type of monitoring increasingly illegal in some jurisdictions. Some researchers believe language models such as GPT3 are already able to display theory of mind, which is likely the level of understanding that will be necessary to thwart much more sophisticated social engineering and cyber attacks in the future. The EDR of the future may be a personal assistant that is able to understand technical events as well as its users’ social circles to help it mitigate anything from memory corruption attacks to phishing attempts. However, the question becomes, is there an opportunity to create such a security endpoint assistant, or will it be an emergent feature of a much more powerful large model? I believe emergent behaviors don’t come completely free, it is likely new training technology must be developed for those capabilities to present themselves.

How does information security factor into large models?

Conversely, large models are not impervious to security afflictions. In fact, they introduce new attack surfaces which necessitate new methods for securing them.
Content moderation is essential for keeping training sets clear of unwanted content — illegal, hateful, harmful, and toxic topics and prompt injections. The scope of content moderation is likely to grow as more precedent is set through copyright case law. Content moderation may have to begin taking on the role of license moderation as well to determine whether content can be used in training. Currently no public content moderation tool exists for tagging data for the purpose of determining its eligibility for being used in training, as defined by an organization’s data governance policies.
Furthermore, it is not sufficient for AI API vendors alone to implement inference controls such as content moderation. As the underlying technology of large models is replicated and commoditized, the use of private large models will become ubiquitous. As the internet will likely begin to fill up with generative AI content like it did with spam, content moderation will become important for ordinary users, perhaps becoming as necessary and as ubiquitous as a core web browser feature. There will be opportunities for content moderation providers to offer new capabilities for determining authenticity in its many forms, is a picture unaltered, is a text message likely to have been sent by someone’s family, is a voice recording completely generated, is an highly regulated industry employee’s presentation using LLMs against company policy, etc.
In the cyber security threat space, we may even begin to see offline models created and sold specifically for nefarious purposes such as sophisticated phishing and social engineering campaigns.
While large models continue to lack agency, they will likely remain unable to discern when a prompt is being subverted either to reveal the original proprietary prompt, or to have the inference break rules and go the way of the subverter. A subservice phrase anywhere in a prompt can easily derail Zero-shot Learning (ZSL) and Few-shot Learning (FSL) inferences. The temporary fix is to perform content moderation on the prompt before it is submitted for inference. (As an unfortunate corollary, a subversive model connected to a model it is looking to exploit is almost the textbook definition of a generative adversarial network.) Furthermore, it has recently been found that just small opportunistic poisoning of the training dataset is sufficient to maliciously influence a model’s understanding of a topic. Potentially, there may even be an attack vector where sleeper prompts are injected into the training set to provide the attacker an undefeatable prompt “jailbreak”. Due to the vast training corpus requirement of large language models, getting attacker controlled strings into the dataset may be very easy. Opportunistically changing a Wikipedia page or taking over a domain in Common Crawl may be sufficient to poison a large language model’s training.
Lastly, as a more esoteric case, explainability and interpretability, sometimes referred to as “reverse engineering neural nets” is a very nascent field that is trying to answer how large language models think. There is little consensus among researchers, or even any real method to determine why models output the responses they do. As an interesting side effect of the lack of interpretability, it is currently impossible to know if large models may be able to verbatim regurgitate items from their training set. Explainability of language models might become a standard question for CISOs trying to understand the data governance of their data.

Rob's playground

Musings of a wetware LLM

Redrawing Infosec Boundaries: The Impact of Large AI Models