Parsing C++ Functions Without a Compiler: Declarations, Definitions, Stubs
How an Unreal editor locates a function's declaration in a header and its definition in a cpp with no AST: a filter pipeline that narrows candidates to exactly one, bracket matching that ignores strings and comments, overload disambiguation by parameter matching, and generating a definition stub from a declaration.
This is the part of an in-editor code tool that has no shortcut: to read the C++ behind a Blueprint node, or to generate a function’s body from its declaration, the editor has to find that function in a file. It does this with no compiler, no AST, no preprocessor, just raw strings and bracket counting. That sounds fragile, and the interesting thing is how it is made reliable anyway. This article walks through it, using the AI Node Code Editor (Quick Code Editor on FAB) as the example. It is one of six articles in Building an AI Code Editor Inside Unreal Engine.
The core idea: match, do not understand
The key reframing: the editor never tries to understand C++. It has a trusted source of
truth already, the UFunction from Unreal’s reflection system, which gives it the exact
name and parameter types. The text in the file is the untrusted thing being matched
against that oracle. So “find a function” becomes “find the one piece of text that matches
this known-good signature.”
With no symbol table, the only way to do that is to find every string that looks like the name and then disprove the impostors. The whole locator is a progressive filter pipeline: each stage takes an array of candidate positions and returns the survivors, and the lookup succeeds only if exactly one remains:
TArray<int32> Candidates = FilterPositionsByName(File, Name);        // every occurrence
Candidates = FilterCommentedPositions(File, Candidates);             // not in a comment
Candidates = FilterNativeFunctionPositions(File, Candidates);        // has UFUNCTION above
Candidates = FilterPositionsByMatchingParams(File, Candidates, Sig); // params match reflection
if (Candidates.Num() != 1) return INDEX_NONE;                        // exactly one, or refuse to act
That final check is a design stance, not laziness: faced with genuine ambiguity, the tool refuses to act rather than edit the wrong function. Safety over cleverness.
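The same shape can be sketched outside Unreal with standard C++. This is a minimal illustration, not the plugin's code: `Positions`, `Filter`, `FindAllOccurrences` and `LocateUnique` are hypothetical names, `std::string` and `std::vector` stand in for `FString` and `TArray<int32>`, and `std::string::npos` plays the role of `INDEX_NONE`.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the editor's types.
using Positions = std::vector<std::size_t>;
using Filter = std::function<Positions(const std::string& File, const Positions& In)>;

// Stage zero: every raw occurrence of Name, with no validation at all.
Positions FindAllOccurrences(const std::string& File, const std::string& Name)
{
    Positions Out;
    if (Name.empty()) return Out;
    for (std::size_t Pos = File.find(Name); Pos != std::string::npos; Pos = File.find(Name, Pos + 1))
    {
        Out.push_back(Pos);
    }
    return Out;
}

// Runs the filters in order. Returns the single surviving offset,
// or npos when zero or several candidates remain (the "refuse" case).
std::size_t LocateUnique(const std::string& File, const std::string& Name,
                         const std::vector<Filter>& Filters)
{
    Positions Candidates = FindAllOccurrences(File, Name);
    for (const Filter& Stage : Filters)
    {
        if (Candidates.empty()) break; // nothing left to disprove
        Candidates = Stage(File, Candidates);
    }
    return Candidates.size() == 1 ? Candidates.front() : std::string::npos;
}
```

Each filter only ever removes candidates, so the stages can be reordered or swapped (for example, a scope filter instead of a UFUNCTION filter) without touching the final uniqueness rule.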
Finding the declaration in the header
Each filter is a small, honest heuristic.
Name matching with word boundaries. A plain Find loop, but each hit is validated by
the characters on both sides so Foo does not match inside FooBar or DoFoo. The
character before must be whitespace, : (for ClassName::Foo) or a newline; the character
after must be whitespace, ( or a newline. It is a hand-rolled \b.
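A rough sketch of that boundary check, again in standard C++ rather than the editor's `FString` code. The function names are hypothetical, and treating the start of the file as a valid left boundary is an assumption of this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Left boundary: whitespace (which includes newlines) or ':' as in ClassName::Foo.
// Assumption: a name at offset 0 is also accepted.
static bool IsValidBefore(const std::string& File, std::size_t Pos)
{
    if (Pos == 0) return true;
    const unsigned char C = static_cast<unsigned char>(File[Pos - 1]);
    return std::isspace(C) || C == ':';
}

// Right boundary: whitespace (which includes newlines) or '('.
static bool IsValidAfter(const std::string& File, std::size_t End)
{
    if (End >= File.size()) return false; // nothing follows, so no parameter list either
    const unsigned char C = static_cast<unsigned char>(File[End]);
    return std::isspace(C) || C == '(';
}

// Every occurrence of Name that passes both boundary checks.
std::vector<std::size_t> FindNamePositions(const std::string& File, const std::string& Name)
{
    std::vector<std::size_t> Hits;
    if (Name.empty()) return Hits;
    for (std::size_t Pos = File.find(Name); Pos != std::string::npos; Pos = File.find(Name, Pos + 1))
    {
        if (IsValidBefore(File, Pos) && IsValidAfter(File, Pos + Name.size()))
        {
            Hits.push_back(Pos);
        }
    }
    return Hits;
}
```

Searching for `Foo` in a file that contains `FooBar()`, `DoFoo()`, `AActor::Foo(int32 X)` and `Foo ()` keeps only the last two.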
Comment detection. Cheap heuristics rather than a lexer: scan back to the line start and
see if a // precedes the position, and check whether the position sits between a /* and
its */.
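Both checks fit in a few lines. The following is an illustrative sketch with hypothetical function names, using `std::string` in place of `FString`; it deliberately stays as naive as the description, so it does not know about string literals.

```cpp
#include <cstddef>
#include <string>

// True if a "//" appears on the same line before Pos.
bool IsInLineComment(const std::string& File, std::size_t Pos)
{
    const std::size_t Nl = (Pos == 0) ? std::string::npos : File.rfind('\n', Pos - 1);
    const std::size_t LineStart = (Nl == std::string::npos) ? 0 : Nl + 1;
    const std::size_t Slashes = File.find("//", LineStart);
    return Slashes != std::string::npos && Slashes < Pos;
}

// True if the nearest "/*" before Pos has not been closed by a "*/" before Pos.
bool IsInBlockComment(const std::string& File, std::size_t Pos)
{
    if (Pos == 0) return false;
    const std::size_t Open = File.rfind("/*", Pos - 1);
    if (Open == std::string::npos) return false;
    const std::size_t Close = File.find("*/", Open + 2);
    return Close == std::string::npos || Close >= Pos;
}
```

Because the line check looks only for the two slashes, a line such as `Url = "http://x"; Foo();` reports `Foo` as commented out. That is the price of skipping a lexer at this stage.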
UFUNCTION detection. Search backwards line by line, capped at 20 lines for performance,
for a UFUNCTION macro. The clever stop condition: if a line ending in ; or } is hit
first, bail, because that terminator means we have crossed out of this declaration into
the previous member, so any UFUNCTION beyond it belongs to someone else.
for (int32 i = 0; i < MaxSearchLines; ++i)
{
    // PreviousLine is assumed to step Pos back one line on each call.
    const FString Line = PreviousLine(File, Pos).TrimStartAndEnd();
    if (Line.IsEmpty() || Line.StartsWith(TEXT("//"))) continue;
    if (Line.EndsWith(TEXT(";")) || Line.EndsWith(TEXT("}"))) return false; // crossed a boundary
    if (Line.Contains(TEXT("UFUNCTION"))) return true;
}
return false; // no UFUNCTION within MaxSearchLines
Once a single declaration survives, the editor records its character range (it stores linear offsets, not line and column, which makes later surgical edits trivial) and pulls out the return type and parameters.

One scanner to rule them all
Almost everything rests on a single primitive: a bracket matcher that counts depth while
ignoring strings and comments. It is the substitute for a real tokenizer, and it is
reused for parameter extraction, body extraction, and capturing multi-line UFUNCTION(...)
specifiers:
int32 FindMatchingBracket(const FString& S, int32 Open, TCHAR L, TCHAR R)
{
    int32 Depth = 0;
    bool bInString = false, bInLineComment = false, bInBlockComment = false;
    for (int32 i = Open; i < S.Len(); ++i)
    {
        const TCHAR C = S[i];
        // ... toggle bInString / bInLineComment / bInBlockComment with escape handling ...
        if (bInString || bInLineComment || bInBlockComment) continue;
        if (C == L) ++Depth;
        else if (C == R && --Depth == 0) return i; // matching close
    }
    return INDEX_NONE;
}
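The elided state toggling is where most of the care goes. One way to fill it in is a small state machine, shown here as a standard C++ sketch under a few assumptions: `S[Open]` is the opening bracket, `Open` is not itself inside a string or comment, and raw string literals and digit separators such as `1'000` are not handled. The enum and its names are illustrative, not the plugin's.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the bracket closing the one at Open, or npos if unbalanced.
std::size_t FindMatchingBracket(const std::string& S, std::size_t Open, char L, char R)
{
    enum class EState { Code, String, Char, LineComment, BlockComment };
    EState State = EState::Code;
    int Depth = 0;
    for (std::size_t i = Open; i < S.size(); ++i)
    {
        const char C = S[i];
        const char Next = (i + 1 < S.size()) ? S[i + 1] : '\0';
        switch (State)
        {
        case EState::String:
            if (C == '\\') ++i;                        // skip the escaped character
            else if (C == '"') State = EState::Code;
            break;
        case EState::Char:
            if (C == '\\') ++i;
            else if (C == '\'') State = EState::Code;
            break;
        case EState::LineComment:
            if (C == '\n') State = EState::Code;
            break;
        case EState::BlockComment:
            if (C == '*' && Next == '/') { State = EState::Code; ++i; }
            break;
        case EState::Code:
            if (C == '"') State = EState::String;
            else if (C == '\'') State = EState::Char;
            else if (C == '/' && Next == '/') { State = EState::LineComment; ++i; }
            else if (C == '/' && Next == '*') { State = EState::BlockComment; ++i; }
            else if (C == L) ++Depth;
            else if (C == R && --Depth == 0) return i; // matching close
            break;
        }
    }
    return std::string::npos;
}
```

For the input `Foo(TEXT(")"), /* ) */ Bar(1)) // )` with `Open` at the first parenthesis, the closing parentheses inside the string literal and the two comments are skipped, and the match is the one that directly follows `Bar(1)`.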
Finding the definition: overload disambiguation
The cpp side runs the same pipeline with different filters: name, not-in-comment, then
scoped (ClassName::Name) instead of UFUNCTION, then a parameter match. The scope check
falls back gracefully to non-comment matches if nothing is scoped, which handles free
functions.
The genuinely interesting filter is parameter matching, because that is how overloads are told apart. The expected signature comes from reflection; each candidate’s text parameters are normalized and compared type by type. The hard problem buried in there is splitting a parameter list on top-level commas only. Templates and braced initializers contain commas that must not split:
// TMap<int32, FString> Map, const TArray<int>& Values = {1, 2}
//           ^ not a split ^ split here                    ^ not a split
So the splitter tracks paren depth, angle-bracket depth and brace depth simultaneously,
plus string and comment state, and only splits on a comma when all depths are zero. And
because < is ambiguous in C++ (template bracket or less-than operator), it uses a
look-behind: < only opens a template if the previous character is an identifier
character, >, ) or :. That one heuristic is a clean illustration of why C++ cannot be
parsed without context: whether < opens a template depends on what the name before it refers to.
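A compact sketch of such a splitter, in standard C++ with hypothetical names (`OpensTemplate`, `SplitTopLevel`). It tracks the three depths and string state as described; comment state is left out here to keep the example short, which is a simplification relative to the splitter described above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Look-behind: '<' opens a template only directly after an identifier
// character, '>', ')' or ':'. Anything else is treated as less-than.
static bool OpensTemplate(const std::string& S, std::size_t i)
{
    if (i == 0) return false;
    const unsigned char P = static_cast<unsigned char>(S[i - 1]);
    return std::isalnum(P) || P == '_' || P == '>' || P == ')' || P == ':';
}

// Splits a parameter list on commas that sit at depth zero in (), <> and {}.
std::vector<std::string> SplitTopLevel(const std::string& Params)
{
    std::vector<std::string> Out;
    int Paren = 0, Angle = 0, Brace = 0;
    bool bInString = false;
    std::size_t Start = 0;
    for (std::size_t i = 0; i < Params.size(); ++i)
    {
        const char C = Params[i];
        if (bInString)
        {
            if (C == '\\') ++i;                 // skip the escaped character
            else if (C == '"') bInString = false;
            continue;
        }
        switch (C)
        {
        case '"': bInString = true; break;
        case '(': ++Paren; break;
        case ')': --Paren; break;
        case '{': ++Brace; break;
        case '}': --Brace; break;
        case '<': if (OpensTemplate(Params, i)) ++Angle; break;
        case '>': if (Angle > 0) --Angle; break;
        case ',':
            if (Paren == 0 && Angle == 0 && Brace == 0)
            {
                Out.push_back(Params.substr(Start, i - Start));
                Start = i + 1;
            }
            break;
        default: break;
        }
    }
    if (Start < Params.size()) Out.push_back(Params.substr(Start));
    return Out;
}
```

The look-behind is what keeps `bool bLess = 1 < 2, int32 B` splitting into two parameters: the `<` follows a space, so it never opens an angle bracket that would swallow the comma. The same rule misreads an unspaced `1<2`, which is the kind of limit any context-free heuristic has.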
Each parameter is then normalized before comparison: strip the default value (the first
top-level =), collapse whitespace (and remove it around *, &, <, > so spacing
differences vanish), and remove the parameter name, leaving just the type. Reflection and
hand-written source also disagree on const and & for by-ref parameters, so the matcher
does a strict comparison first and then retries ignoring const-ness, a deliberate leniency
so a single const mismatch does not lose the only candidate.
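The normalization and the two-pass comparison might look roughly like this. It is a sketch with hypothetical names, and it makes two simplifying assumptions that are labeled in the comments: the first `=` in a parameter is top-level, and every parameter has a name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

static bool IsIdent(char C) { return std::isalnum(static_cast<unsigned char>(C)) || C == '_'; }

// Reduces "const TArray<int32> & Values = {}" to "const TArray<int32>&".
std::string NormalizeParamType(std::string P)
{
    // 1. Strip the default value. Assumption: the first '=' is top-level.
    const std::size_t Eq = P.find('=');
    if (Eq != std::string::npos) P.erase(Eq);

    // 2. Drop the trailing identifier. Assumption: the parameter is named.
    while (!P.empty() && std::isspace(static_cast<unsigned char>(P.back()))) P.pop_back();
    std::size_t NameStart = P.size();
    while (NameStart > 0 && IsIdent(P[NameStart - 1])) --NameStart;
    if (NameStart > 0 && P.find_first_not_of(" \t\r\n") < NameStart) P.erase(NameStart);

    // 3. Collapse whitespace: keep one space only between two identifier characters.
    std::string Out;
    bool bPendingSpace = false;
    for (char C : P)
    {
        if (std::isspace(static_cast<unsigned char>(C))) { bPendingSpace = true; continue; }
        if (bPendingSpace && !Out.empty() && IsIdent(Out.back()) && IsIdent(C)) Out += ' ';
        bPendingSpace = false;
        Out += C;
    }
    return Out;
}

// Removes every standalone "const" token from an already normalized type.
std::string StripConst(const std::string& Type)
{
    std::string Out;
    for (std::size_t i = 0; i < Type.size();)
    {
        const bool bAtWordStart = (i == 0) || !IsIdent(Type[i - 1]);
        const bool bIsConst = bAtWordStart && Type.compare(i, 5, "const") == 0
                              && (i + 5 == Type.size() || !IsIdent(Type[i + 5]));
        if (!bIsConst) { Out += Type[i++]; continue; }
        i += 5;
        if (i < Type.size() && Type[i] == ' ') ++i;                 // swallow the space after it
        else if (!Out.empty() && Out.back() == ' ') Out.pop_back(); // or the one before it
    }
    return Out;
}

// Both arguments are normalized types: strict comparison first, then ignoring const.
bool TypesMatch(const std::string& Expected, const std::string& Candidate)
{
    return Expected == Candidate || StripConst(Expected) == StripConst(Candidate);
}
```

With this, an expected type of `const FString&` still matches a source parameter written as `FString & Name`, while `int32` against `float` fails both passes.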
Generating a definition stub
Generate Definition turns a header declaration into a matching cpp body. It parses the declaration, then emits the out-of-line signature with exactly the right transformations:
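// From: UFUNCTION(BlueprintCallable) FString GetName(int32 Count = 4) const;
// To:
FString AMyActor::GetName(int32 Count) const
{
}
// A static declaration is handled the same way, minus the static keyword
// (a static member function cannot also be const).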
The rules encoded in that string surgery are real C++ rules:
- Qualify with ClassName::.
- Strip static (illegal on an out-of-line definition) and the UFUNCTION macro.
- Keep parameter names but drop default values (defaults live only in the declaration).
- Preserve const if the declaration had it.
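Those four rules translate almost line for line into code. The sketch below assumes the declaration has already been parsed into a small struct; `FParsedDeclaration`, its fields and the helper names are hypothetical, and the UFUNCTION macro is assumed to have been removed during parsing.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result of parsing a header declaration.
struct FParsedDeclaration
{
    std::string ReturnType;          // as written, e.g. "FString" or "static int32"
    std::string Name;                // e.g. "GetName"
    std::vector<std::string> Params; // e.g. {"int32 Count = 4"}
    bool bIsConst = false;           // trailing const on the declaration
};

static std::string Trim(const std::string& S)
{
    const std::size_t B = S.find_first_not_of(" \t\r\n");
    if (B == std::string::npos) return "";
    const std::size_t E = S.find_last_not_of(" \t\r\n");
    return S.substr(B, E - B + 1);
}

// Rule: static may not appear on an out-of-line definition.
static std::string StripStatic(const std::string& Type)
{
    std::string T = Trim(Type);
    if (T.compare(0, 7, "static ") == 0) T.erase(0, 7);
    return Trim(T);
}

// Rule: keep type and name, drop the default. Assumes the first '=' is top-level.
static std::string StripDefault(const std::string& Param)
{
    return Trim(Param.substr(0, Param.find('=')));
}

std::string MakeDefinitionStub(const std::string& ClassName, const FParsedDeclaration& D)
{
    // Rule: qualify the name with ClassName::.
    std::string Sig = StripStatic(D.ReturnType) + " " + ClassName + "::" + D.Name + "(";
    for (std::size_t i = 0; i < D.Params.size(); ++i)
    {
        if (i > 0) Sig += ", ";
        Sig += StripDefault(D.Params[i]);
    }
    Sig += ")";
    if (D.bIsConst) Sig += " const"; // Rule: preserve const.
    return Sig + "\n{\n}\n";
}
```

A declaration with return type `static int32`, name `Add` and parameters `int32 A` and `int32 B = 0` in class `UMyLib` comes out as `int32 UMyLib::Add(int32 A, int32 B)` followed by an empty body.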

The insertion point is chosen to keep the cpp in the same order as the header: the editor finds the declarations immediately before and after the target in the header, locates their definitions in the cpp, and inserts the new stub between them (skipping back over any doc comments so it lands above the next function’s comment). If neither neighbour can be placed, it appends at the end of the file. Matching member order is a small thing that keeps a generated cpp readable instead of scrambled.
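The neighbour lookup can be sketched as follows. This is one possible reading of the rule, not the plugin's implementation: the names are hypothetical, the previous neighbour is preferred over the next one, and each definition's range is assumed to already start at its doc comment, so inserting at `Begin` lands above that comment.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Half-open character range [Begin, End) of a definition in the cpp text.
struct FRange { std::size_t Begin; std::size_t End; };

// HeaderOrder: function names in header declaration order.
// CppDefinitions: the definitions that were successfully located in the cpp.
std::size_t ChooseInsertOffset(const std::vector<std::string>& HeaderOrder,
                               const std::string& Target,
                               const std::map<std::string, FRange>& CppDefinitions,
                               std::size_t CppLength)
{
    std::size_t Index = 0;
    while (Index < HeaderOrder.size() && HeaderOrder[Index] != Target) ++Index;
    if (Index == HeaderOrder.size()) return CppLength; // unknown target: append

    if (Index > 0)
    {
        const auto Prev = CppDefinitions.find(HeaderOrder[Index - 1]);
        if (Prev != CppDefinitions.end()) return Prev->second.End;   // right after the previous one
    }
    if (Index + 1 < HeaderOrder.size())
    {
        const auto Next = CppDefinitions.find(HeaderOrder[Index + 1]);
        if (Next != CppDefinitions.end()) return Next->second.Begin; // right before the next one
    }
    return CppLength; // neither neighbour could be placed: append at the end
}
```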

Where it breaks (honestly)
Text matching without an AST has real edges, and it is worth naming them rather than pretending otherwise:
- Multi-word return types (const FVector, unsigned int) can be partially truncated by the single-token assumption when walking back to a definition’s start.
- The simple comment check does not track string literals, so a // inside a string on the same line can be misread as a comment start.
- The class-name extractor takes the first class line in a header, which is wrong for headers with multiple or nested classes.
- The “exactly one match” rule means identical signatures across namespaces, or macro-generated members, make it abort. That is the safety tradeoff: it would rather do nothing than edit the wrong function.
None of these are fatal for the common workflow (one function, one class, ordinary signatures), and the abort-on-ambiguity stance means the failure mode is “it declines”, not “it corrupts your file”. When you write back, the editor also re-reads the file and compares a checksum before splicing, so an out-of-band edit aborts the save instead of clobbering it.
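The write-back guard combines naturally with the linear offsets recorded earlier. A minimal sketch, assuming the checksum is computed over the whole file text: the article does not say which checksum the plugin uses, so FNV-1a appears here only as a stand-in, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// FNV-1a over the file text. A stand-in; any stable checksum would do.
std::uint64_t Checksum(const std::string& Text)
{
    std::uint64_t H = 14695981039346656037ull;
    for (unsigned char C : Text)
    {
        H ^= C;
        H *= 1099511628211ull;
    }
    return H;
}

// Replaces [Begin, End) of the freshly re-read file, but only if the file
// still matches the checksum taken when the editor loaded it.
bool SpliceIfUnchanged(const std::string& OnDiskNow, std::uint64_t ChecksumAtLoad,
                       std::size_t Begin, std::size_t End,
                       const std::string& Replacement, std::string& OutText)
{
    if (Checksum(OnDiskNow) != ChecksumAtLoad) return false;  // edited elsewhere: abort
    if (Begin > End || End > OnDiskNow.size()) return false;  // stale or invalid range
    OutText = OnDiskNow.substr(0, Begin) + Replacement + OnDiskNow.substr(End);
    return true;
}
```

Because the range is a pair of character offsets, the splice itself is three substring operations; the checksum is what makes those offsets safe to trust.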
What to take away
- Without a compiler, match, do not parse: list every occurrence of the name, then disprove false positives until exactly one survives. Refuse on ambiguity.
- Build everything on one context-aware bracket matcher that ignores strings and comments; it is your stand-in for a lexer.
- Disambiguate overloads by normalized parameter comparison. Respect that C++ < is ambiguous, and split parameter lists only on top-level commas.
- Generating a stub is string surgery that encodes real rules: qualify the name, strip static and defaults, keep names and const.
That parsing is what lets the editor load the C++ behind a node and feed it to the AI assistant and inline completion. The full series is Building an AI Code Editor Inside Unreal Engine, and the finished plugin is AI Node Code Editor on FAB.
Frequently asked questions
- Can you find a C++ function reliably without parsing the whole file into an AST?
- Yes, for the common cases. List every textual occurrence of the name with word-boundary checks, then filter: drop ones inside comments, keep ones with a UFUNCTION macro above, and keep ones whose parameters match the known signature. If exactly one survives you have your match; if more than one does, refuse rather than guess.
- How do you match braces and parentheses while ignoring strings and comments?
- One context-aware scanner does it: increment and decrement a depth counter on brackets, but suppress counting while inside a string, a char literal, a line comment or a block comment, with backslash-escape handling. This single primitive underpins parameter extraction, body extraction and macro capture.
- How do you tell overloads apart?
- By comparing parameter lists. The expected signature comes from Unreal reflection; each text candidate's parameters are normalized (defaults stripped, whitespace canonicalized, names removed) and compared type by type. The hard part is splitting the list on top-level commas only, since templates and braced initializers contain commas too.
- How is a definition stub generated from a declaration?
- Parse the declaration, then emit ReturnType ClassName::Name(params) with the static specifier and default argument values stripped (static is illegal on an out-of-line definition, and a default already given in the declaration cannot be repeated there) but parameter names kept, followed by an empty body. The insertion point is chosen to preserve the header's member order in the cpp.