Language-agnostic Injection Detection

Lars Hermerschmidt, Andreas Straub, Goran Piskachev

injections grow on trees

Shotgun Unparser

if (recursive || print_dir_name)
  {
    if (!first)
      DIRED_PUTCHAR ('\n');
    first = false;
    DIRED_INDENT ();
    PUSH_CURRENT_DIRED_POS (&subdired_obstack);
    dired_pos += quote_name (stdout, realname ? realname : name,
                             dirname_quoting_options, NULL);
    PUSH_CURRENT_DIRED_POS (&subdired_obstack);
    DIRED_FPUTS_LITERAL (":\n", stdout);
  }

https://github.com/wertarbyte/coreutils/blob/master/src/ls.c

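mkdir "1
1"
mkdir 2
ls | wc -l    # in an otherwise empty directory: 2 entries, but 3 lines, because the newline inside "1<LF>1" is written raw when ls output goes to a pipe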

Why do injections exist?

Shotgun Unparsers cause Injection Vulnerabilities

But why?

Correct Unparser Generators are not used

But why?

IO is "soo simple", let's just use the core libs

But why?

Core libs don't provide secure input handling

But why?

Lack of awareness of the problem

But why?

Core libs don't provide secure input handling

Related Work

Language-specific static and dynamic analysis:
SQLi, XSS, ... are well known
Language-agnostic dynamic analysis, aka fuzzing:
Parsers are known to be broken
AUTOGRAM, using dynamic taint tracking:
Reconstructs the grammar of a given parser

Our contribution: language-agnostic detection of injections in textual languages
Awareness

Detection is never complete; use a constructive approach such as McHammerCoder to solve the injection problem.

The Solution

Show, don't tell

Problem space

Detecting unparsers
Identifying injections in a given unparser
Generate attacks
Extract full grammar

Approach Overview

Guided fuzzing using language keyword information
Keywords are extracted from unparse trees (UPTs)
UPTs are inferred automatically using dynamic program analysis

UPT Inference

UPTs and Keywords

Keywords have no origin in any input
They are created by the unparser
Their location in the UPT shows where (structurally) they are valid in the language

Fuzzing

generate targeted injection candidates based on keywords

example: "break out" of string-enclosing quotation marks

evaluate injection success by comparing parse trees

run both the original and the modified input through the unparser-parser round trip
compare the structures of the resulting parse trees

if the parse tree changed, an injection was found

Results

Promising results in case studies

very accurate UPTs
found (implanted) injection vulnerabilities
structural keyword information can significantly improve fuzzing
caveat: not a quantitative evaluation

Fuzzing automatically yields PoC exploits

Key Observations

"Recursive descent unparsers" exist

common in ad-hoc implementations

Difference to Taint Tracking:

leveraging structural information to identify keywords and their scope

Requires structural variability in unparser outputs

"template-based" unparsers yield poor UPTs
the approach then reduces to ordinary taint tracking
better: use a sample output for mutation fuzzing

Conclusion

Language-agnostic Injection Detection

works for recursive descent unparsers
use keywords from UPTs in fuzzing

Awareness

Creating output is not just writing an array of bytes
Injections might exist in all your unparsers

Call to Action

Every programming language's core library deserves an (un)parser

Questions?

Lars: @bob5ec on Twitter

Andreas: andy@strb.org

MARGOTUA code on GitHub