Parsing C++ Functions Without a Compiler: Declarations, Definitions, Stubs

How an Unreal editor locates a function's declaration in a header and its definition in a cpp with no AST: a filter pipeline that narrows candidates to exactly one, bracket matching that ignores strings and comments, overload disambiguation by parameter matching, and generating a definition stub from a declaration.

This is the part of an in-editor code tool that has no shortcut: to read the C++ behind a Blueprint node, or to generate a function’s body from its declaration, the editor has to find that function in a file. It does this with no compiler, no AST, no preprocessor, just raw strings and bracket counting. That sounds fragile, and the interesting thing is how it is made reliable anyway. This article walks through it, using the AI Node Code Editor (Quick Code Editor on FAB) as the example. It is one of six articles in Building an AI Code Editor Inside Unreal Engine.

The core idea: match, do not understand

The key reframing: the editor never tries to understand C++. It has a trusted source of truth already, the UFunction from Unreal’s reflection system, which gives it the exact name and parameter types. The text in the file is the untrusted thing being matched against that oracle. So “find a function” becomes “find the one piece of text that matches this known-good signature.”

With no symbol table, the only way to do that is to find every string that looks like the name and then disprove the impostors. The whole locator is a progressive filter pipeline: each stage takes an array of candidate positions and returns the ones it could not rule out, and the lookup succeeds only if exactly one survives:

TArray<int32> Candidates = FilterPositionsByName(File, Name);        // every occurrence
Candidates = FilterCommentedPositions(File, Candidates);             // not in a comment
Candidates = FilterNativeFunctionPositions(File, Candidates);        // has UFUNCTION above
Candidates = FilterPositionsByMatchingParams(File, Candidates, Sig); // params match reflection

if (Candidates.Num() != 1) return false;   // zero or several matches: refuse to act

That final check is a design stance, not laziness: faced with genuine ambiguity, the tool refuses to act rather than risk editing the wrong function. Safety over cleverness.
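As a rough standard-C++ sketch of that shape, assuming candidates are byte offsets into a std::string and each filter is a plain callable (Positions, Filter and LocateUnique are hypothetical names, not the plugin's API), the pipeline reduces to a loop that returns nothing unless exactly one position survives:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types: candidates are byte offsets into the file text, and a
// filter returns the subset of its input that it could not rule out.
using Positions = std::vector<std::size_t>;
using Filter = std::function<Positions(const std::string& file, const Positions& in)>;

// Applies the filters in order. Succeeds only if exactly one candidate survives;
// zero or several survivors both mean "refuse to act".
std::optional<std::size_t> LocateUnique(const std::string& file, Positions candidates,
                                        const std::vector<Filter>& filters)
{
    for (const Filter& filter : filters)
    {
        candidates = filter(file, candidates);
        if (candidates.empty()) return std::nullopt;   // everything was disproved
    }
    if (candidates.size() != 1) return std::nullopt;   // still ambiguous: decline
    return candidates.front();
}
```

Returning an empty optional keeps the refusal a normal outcome the caller can report to the user, rather than a crash.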

Finding the declaration in the header

Each filter is a small, honest heuristic.

Name matching with word boundaries. A plain Find loop, but each hit is validated by the characters on both sides so Foo does not match inside FooBar or DoFoo. The character before must be whitespace, : (for ClassName::Foo) or a newline; the character after must be whitespace, ( or a newline. It is a hand-rolled \b.
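A minimal sketch of that boundary check in standard C++, with std::string standing in for FString. FindNamePositions is a hypothetical name, and treating the start and end of the file as boundaries is an assumption this sketch adds:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Returns every offset where `name` appears as a whole word: the character
// before must be whitespace or ':' (as in ClassName::Foo) and the character
// after must be whitespace or '('. std::isspace also covers newlines.
// Start and end of the text count as boundaries (assumption of this sketch).
std::vector<std::size_t> FindNamePositions(const std::string& file, const std::string& name)
{
    std::vector<std::size_t> hits;
    if (name.empty()) return hits;
    for (std::size_t pos = file.find(name); pos != std::string::npos; pos = file.find(name, pos + 1))
    {
        const std::size_t after = pos + name.size();
        const bool okBefore = pos == 0 ||
            std::isspace(static_cast<unsigned char>(file[pos - 1])) || file[pos - 1] == ':';
        const bool okAfter = after == file.size() ||
            std::isspace(static_cast<unsigned char>(file[after])) || file[after] == '(';
        if (okBefore && okAfter) hits.push_back(pos);
    }
    return hits;
}
```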

Comment detection. Cheap heuristics rather than a lexer: scan back to the line start and see if a // precedes the position, and check whether the position sits between a /* and its */.
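The two heuristics can be sketched like this, assuming candidate positions are byte offsets into the file text (IsInComment is a hypothetical name). As written, it shares the string-literal blind spot listed under the limitations: a // inside a quoted string still reads as a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Heuristic comment test, no lexer: true if a "//" appears earlier on the same
// line, or if the nearest "/*" before pos has not been closed by "*/" before pos.
// String literals are not tracked, so "//" inside quotes is a false positive.
bool IsInComment(const std::string& file, std::size_t pos)
{
    // Line comment: look for "//" between the start of pos's line and pos.
    const std::size_t nl = pos == 0 ? std::string::npos : file.rfind('\n', pos - 1);
    const std::size_t lineStart = nl == std::string::npos ? 0 : nl + 1;
    if (file.substr(lineStart, pos - lineStart).find("//") != std::string::npos) return true;

    // Block comment: inside if the last "/*" before pos is unclosed or closes after pos.
    if (pos == 0) return false;
    const std::size_t open = file.rfind("/*", pos - 1);
    if (open == std::string::npos) return false;
    const std::size_t close = file.find("*/", open + 2);
    return close == std::string::npos || close + 2 > pos;
}
```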

UFUNCTION detection. Search backwards line by line, capped at 20 lines for performance, for a UFUNCTION macro. The clever stop condition: if a line ending in ; or } is hit first, bail, because that terminator means we have crossed out of this declaration into the previous member, so any UFUNCTION beyond it belongs to someone else.

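int32 Cursor = Pos;   // PreviousLine returns the line above Cursor and moves Cursor to its start
for (int32 i = 0; i < MaxSearchLines && Cursor > 0; ++i)   // MaxSearchLines = 20
{
    const FString Line = PreviousLine(File, Cursor).TrimStartAndEnd();
    if (Line.IsEmpty() || Line.StartsWith(TEXT("//"))) continue;
    if (Line.EndsWith(TEXT(";")) || Line.EndsWith(TEXT("}"))) return false;  // crossed a boundary
    if (Line.Contains(TEXT("UFUNCTION"))) return true;
}
return false;   // no UFUNCTION within the search window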
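For experimenting outside Unreal, here is a self-contained version of the same upward search over std::string. HasUFunctionAbove and its helpers are hypothetical names. Also checking the text before the name on the declaration's own line, so single-line forms like UFUNCTION() void Foo(); are found, is an assumption this sketch adds rather than something described above:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace {
// Offset of the first character of the line containing pos.
std::size_t LineStartOf(const std::string& s, std::size_t pos)
{
    if (pos == 0) return 0;
    const std::size_t nl = s.rfind('\n', pos - 1);
    return nl == std::string::npos ? 0 : nl + 1;
}

std::string Trim(const std::string& s)
{
    const std::size_t b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    return s.substr(b, s.find_last_not_of(" \t\r") - b + 1);
}
}  // namespace

// True if a UFUNCTION macro governs the name at pos. Walks up at most maxLines
// lines, skipping blank and // lines, and gives up at a line ending in ';' or
// '}', because that line belongs to the previous member.
bool HasUFunctionAbove(const std::string& file, std::size_t pos, int maxLines = 20)
{
    std::size_t lineStart = LineStartOf(file, pos);
    // Assumption of this sketch: also accept UFUNCTION earlier on the same line.
    if (file.substr(lineStart, pos - lineStart).find("UFUNCTION") != std::string::npos) return true;
    for (int i = 0; i < maxLines && lineStart > 0; ++i)
    {
        const std::size_t prevEnd = lineStart - 1;              // the '\n' ending the line above
        const std::size_t prevStart = LineStartOf(file, prevEnd);
        const std::string line = Trim(file.substr(prevStart, prevEnd - prevStart));
        lineStart = prevStart;
        if (line.empty() || line.compare(0, 2, "//") == 0) continue;
        if (line.back() == ';' || line.back() == '}') return false;   // crossed a boundary
        if (line.find("UFUNCTION") != std::string::npos) return true;
    }
    return false;
}
```

Multi-line specifiers work without special handling: a continuation line such as Category = "Demo") ends in neither ; nor }, so the walk simply continues to the line holding UFUNCTION(.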

Once a single declaration survives, the editor records its character range (it stores linear offsets, not line and column, which makes later surgical edits trivial) and pulls out the return type and parameters.

The declaration view of the code editor showing the full header context for a class, the source from which a declaration is located and parsed

One scanner to rule them all

Almost everything rests on a single primitive: a bracket matcher that counts depth while ignoring strings and comments. It is the substitute for a real tokenizer, and it is reused for parameter extraction, body extraction, and capturing multi-line UFUNCTION(...) specifiers:

int32 FindMatchingBracket(const FString& S, int32 Open, TCHAR L, TCHAR R)
{
    int32 Depth = 0;
    bool bInString = false, bInChar = false, bInLineComment = false, bInBlockComment = false;
    for (int32 i = Open; i < S.Len(); ++i)
    {
        const TCHAR C = S[i];
        // ... toggle the string, char-literal and comment flags, with escape handling ...
        if (bInString || bInChar || bInLineComment || bInBlockComment) continue;
        if (C == L) ++Depth;
        else if (C == R && --Depth == 0) return i;   // matching close
    }
    return INDEX_NONE;
}
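A complete, compilable version of the scanner in standard C++, with std::string instead of FString and std::string::npos instead of INDEX_NONE. The flag toggling that the excerpt elides is filled in here as one plausible implementation, not the plugin's code; it also tracks char literals so that '(' is not counted:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the index of the bracket that closes the one at `open`, or npos.
// Brackets inside "strings", 'c' char literals, // line comments and
// /* block comments */ are ignored; backslash escapes are honoured in literals.
std::size_t FindMatchingBracket(const std::string& s, std::size_t open, char l, char r)
{
    int depth = 0;
    bool inString = false, inChar = false, inLineComment = false, inBlockComment = false;
    for (std::size_t i = open; i < s.size(); ++i)
    {
        const char c = s[i];
        const char next = i + 1 < s.size() ? s[i + 1] : '\0';
        if (inLineComment) { if (c == '\n') inLineComment = false; continue; }
        if (inBlockComment) { if (c == '*' && next == '/') { inBlockComment = false; ++i; } continue; }
        if (inString || inChar)
        {
            if (c == '\\') { ++i; continue; }                     // skip the escaped character
            if ((inString && c == '"') || (inChar && c == '\'')) inString = inChar = false;
            continue;
        }
        if (c == '/' && next == '/') { inLineComment = true; ++i; continue; }
        if (c == '/' && next == '*') { inBlockComment = true; ++i; continue; }
        if (c == '"') { inString = true; continue; }
        if (c == '\'') { inChar = true; continue; }
        if (c == l) ++depth;
        else if (c == r && --depth == 0) return i;               // matching close
    }
    return std::string::npos;
}
```

C++14 digit separators such as 1'000 would be misread as the start of a char literal; that is a known limit of this sketch.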

Finding the definition: overload disambiguation

The cpp side runs the same pipeline with different filters: name, not-in-comment, then a scope filter (ClassName::Name) in place of the UFUNCTION check, then a parameter match. If no candidate is scoped, the scope filter falls back gracefully to the non-comment matches, which handles free functions.
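A sketch of the scope filter with its fallback, over byte offsets of the function name. FilterScopedPositions is a hypothetical name; allowing whitespace around :: and requiring the class name to be a whole word are assumptions of this sketch:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Keeps candidates written as ClassName::Name. If none are scoped, returns the
// input unchanged: the fallback that lets free functions through.
std::vector<std::size_t> FilterScopedPositions(const std::string& file,
                                               const std::vector<std::size_t>& candidates,
                                               const std::string& className)
{
    std::vector<std::size_t> scoped;
    for (std::size_t pos : candidates)
    {
        std::size_t i = pos;
        auto skipSpaceBack = [&] { while (i > 0 && std::isspace(static_cast<unsigned char>(file[i - 1]))) --i; };
        skipSpaceBack();
        if (i < 2 || file.compare(i - 2, 2, "::") != 0) continue;   // not qualified
        i -= 2;
        skipSpaceBack();
        if (i < className.size() || file.compare(i - className.size(), className.size(), className) != 0) continue;
        // Whole-word check so that "MyActor::" does not match inside "AMyActor::".
        const std::size_t start = i - className.size();
        if (start > 0 && (std::isalnum(static_cast<unsigned char>(file[start - 1])) || file[start - 1] == '_')) continue;
        scoped.push_back(pos);
    }
    return scoped.empty() ? candidates : scoped;
}
```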

The genuinely interesting filter is parameter matching, because that is how overloads are told apart. The expected signature comes from reflection; each candidate’s text parameters are normalized and compared type by type. The hard problem buried in there is splitting a parameter list on top-level commas only. Templates and braced initializers contain commas that must not split:

// TMap<int32, FString> Map, const TArray<int>& Values = {1, 2}
//           ^ no split    ^ split                         ^ no split

So the splitter tracks paren depth, angle-bracket depth and brace depth simultaneously, plus string and comment state, and only splits on a comma when all depths are zero. And because < is ambiguous in C++ (template bracket or less-than operator), it uses a look-behind: < only opens a template if the previous character is an identifier character, >, ) or :. That one heuristic is a clean illustration of why C++ cannot be tokenized without context.
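The splitter can be sketched as a single pass with three depth counters and the look-behind rule for <. This is an illustration in standard C++ under assumptions, not the plugin's code: SplitTopLevel is a hypothetical name, and comment handling is omitted for brevity:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits a parameter list on commas outside (), <> and {} and outside
// string/char literals. '<' opens a template only if the previous character is
// an identifier character, '>', ')' or ':'; otherwise it is read as less-than.
std::vector<std::string> SplitTopLevel(const std::string& params)
{
    std::vector<std::string> parts;
    std::string current;
    int paren = 0, angle = 0, brace = 0;
    bool inString = false, inChar = false;
    for (std::size_t i = 0; i < params.size(); ++i)
    {
        const char c = params[i];
        if (inString || inChar)
        {
            current += c;
            if (c == '\\' && i + 1 < params.size()) current += params[++i];   // keep escapes intact
            else if ((inString && c == '"') || (inChar && c == '\'')) inString = inChar = false;
            continue;
        }
        const char prev = i > 0 ? params[i - 1] : '\0';
        switch (c)
        {
        case '"': inString = true; break;
        case '\'': inChar = true; break;
        case '(': ++paren; break;
        case ')': --paren; break;
        case '{': ++brace; break;
        case '}': --brace; break;
        case '<':
            if (std::isalnum(static_cast<unsigned char>(prev)) || prev == '_' || prev == '>' || prev == ')' || prev == ':') ++angle;
            break;
        case '>':
            if (angle > 0) --angle;
            break;
        case ',':
            if (paren == 0 && angle == 0 && brace == 0) { parts.push_back(current); current.clear(); continue; }
            break;
        }
        current += c;
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}
```

With this rule, in bool B = X < Y, int32 C the < follows a space, so it is read as less-than and the comma still splits; written as X<Y it would be taken for a template bracket, which is the kind of edge case the heuristic accepts.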

Each parameter is then normalized before comparison: strip the default value (the first top-level =), collapse whitespace (and remove it around *, &, <, > so spacing differences vanish), and remove the parameter name, leaving just the type. Reflection and hand-written source also disagree on const and & for by-ref parameters, so the matcher does a strict comparison first and then retries ignoring const-ness, a deliberate leniency so a single const mismatch does not lose the only candidate.
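One way to sketch the normalization and the two-pass comparison in standard C++. All names are hypothetical. The sketch also removes spaces around commas, so TMap<int32, FString> and TMap<int32,FString> compare equal, and it assumes every parameter is named, since it drops the last identifier as the name. The expected types stand in for whatever type strings are built from reflection:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

bool IsIdentChar(char c) { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; }

// Cuts "= default" at the first '=' not nested inside (), <> or {}.
std::string StripDefault(const std::string& p)
{
    int depth = 0;
    for (std::size_t i = 0; i < p.size(); ++i)
    {
        const char c = p[i];
        if (c == '(' || c == '<' || c == '{') ++depth;
        else if (c == ')' || c == '>' || c == '}') --depth;
        else if (c == '=' && depth == 0) return p.substr(0, i);
    }
    return p;
}

// Collapses whitespace runs to one space, trims, then drops spaces touching * & < > ,
std::string CanonicalSpacing(const std::string& s)
{
    std::string collapsed;
    for (char c : s)
    {
        if (std::isspace(static_cast<unsigned char>(c)))
        {
            if (!collapsed.empty() && collapsed.back() != ' ') collapsed += ' ';
        }
        else collapsed += c;
    }
    if (!collapsed.empty() && collapsed.back() == ' ') collapsed.pop_back();
    auto isTight = [](char c) { return c == '*' || c == '&' || c == '<' || c == '>' || c == ','; };
    std::string out;
    for (std::size_t i = 0; i < collapsed.size(); ++i)
    {
        const bool touchesTight = (i > 0 && isTight(collapsed[i - 1])) ||
                                  (i + 1 < collapsed.size() && isTight(collapsed[i + 1]));
        if (collapsed[i] == ' ' && touchesTight) continue;
        out += collapsed[i];
    }
    return out;
}

// Removes the trailing parameter name (assumes every parameter is named).
std::string DropName(const std::string& t)
{
    std::size_t start = t.size();
    while (start > 0 && IsIdentChar(t[start - 1])) --start;
    if (start == 0 || start == t.size()) return t;   // a lone word, or no trailing name
    std::string type = t.substr(0, start);
    if (type.back() == ' ') type.pop_back();
    return type;
}

// Removes every whole-word "const", then re-canonicalizes spacing.
std::string StripConst(const std::string& t)
{
    std::string out;
    for (std::size_t i = 0; i < t.size();)
    {
        const bool word = t.compare(i, 5, "const") == 0 && (i == 0 || !IsIdentChar(t[i - 1])) &&
                          (i + 5 >= t.size() || !IsIdentChar(t[i + 5]));
        if (word) { i += 5; continue; }
        out += t[i++];
    }
    return CanonicalSpacing(out);
}

std::string NormalizeParam(const std::string& p) { return DropName(CanonicalSpacing(StripDefault(p))); }

// Strict comparison first, then a retry that ignores const-ness.
bool ParamsMatch(const std::vector<std::string>& textParams, const std::vector<std::string>& expectedTypes)
{
    if (textParams.size() != expectedTypes.size()) return false;
    auto allEqual = [&](bool ignoreConst) {
        for (std::size_t i = 0; i < textParams.size(); ++i)
        {
            std::string a = NormalizeParam(textParams[i]);
            std::string b = CanonicalSpacing(expectedTypes[i]);
            if (ignoreConst) { a = StripConst(a); b = StripConst(b); }
            if (a != b) return false;
        }
        return true;
    };
    return allEqual(false) || allEqual(true);
}
```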

Generating a definition stub

Generate Definition turns a header declaration into a matching cpp body. It parses the declaration, then emits the out-of-line signature with exactly the right transformations:

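// From:  UFUNCTION(BlueprintCallable) FString GetName(int32 Count = 4) const;
// To:
FString AMyActor::GetName(int32 Count) const
{
}

// From:  UFUNCTION(BlueprintCallable) static FString MakeLabel(int32 Index = 0);
// To:
FString AMyActor::MakeLabel(int32 Index)
{
}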

The rules encoded in that string surgery are real C++ rules:

  • Qualify with ClassName::.
  • Strip static (illegal on an out-of-line definition) and the UFUNCTION macro.
  • Keep parameter names but drop default values (defaults live only in the declaration).
  • Preserve const if the declaration had it.
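A compact sketch of those four rules as string surgery in standard C++. GenerateDefinitionStub and its helpers are hypothetical names; for brevity the paren matcher ignores strings and comments, and a leading virtual is not stripped:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace {
std::string Trim(const std::string& s)
{
    const std::size_t b = s.find_first_not_of(" \t\r\n");
    if (b == std::string::npos) return "";
    return s.substr(b, s.find_last_not_of(" \t\r\n") - b + 1);
}

// Index of the ')' closing the '(' at `open`, or npos (simplified: no string/comment skipping).
std::size_t MatchParen(const std::string& s, std::size_t open)
{
    int depth = 0;
    for (std::size_t i = open; i < s.size(); ++i)
    {
        if (s[i] == '(') ++depth;
        else if (s[i] == ')' && --depth == 0) return i;
    }
    return std::string::npos;
}

// Rebuilds a parameter list without default values; parameter names are kept.
std::string StripDefaults(const std::string& params)
{
    std::string out, current;
    int depth = 0;
    bool inDefault = false;
    for (char c : params + ",")   // trailing ',' flushes the last parameter
    {
        if (c == '(' || c == '<' || c == '{') ++depth;
        else if (c == ')' || c == '>' || c == '}') --depth;
        if (depth == 0 && c == ',')
        {
            const std::string param = Trim(current);
            if (!param.empty()) out += (out.empty() ? "" : ", ") + param;
            current.clear();
            inDefault = false;
            continue;
        }
        if (depth == 0 && c == '=') inDefault = true;
        if (!inDefault) current += c;
    }
    return out;
}
}  // namespace

// Turns one header declaration into an empty out-of-line definition for
// className, or returns "" if the declaration does not parse.
std::string GenerateDefinitionStub(std::string decl, const std::string& className)
{
    decl = Trim(decl);
    if (!decl.empty() && decl.back() == ';') decl.pop_back();
    if (decl.compare(0, 9, "UFUNCTION") == 0)   // drop the macro, even if multi-line
    {
        const std::size_t macroClose = MatchParen(decl, decl.find('('));
        if (macroClose == std::string::npos) return "";
        decl = Trim(decl.substr(macroClose + 1));
    }
    if (decl.compare(0, 6, "static") == 0 && decl.size() > 6 &&
        std::isspace(static_cast<unsigned char>(decl[6])))
        decl = Trim(decl.substr(6));            // static is illegal out of line

    const std::size_t open = decl.find('(');
    const std::size_t close = MatchParen(decl, open);
    if (close == std::string::npos) return "";
    const std::string head = Trim(decl.substr(0, open));
    std::size_t nameStart = head.size();
    while (nameStart > 0 &&
           (std::isalnum(static_cast<unsigned char>(head[nameStart - 1])) || head[nameStart - 1] == '_'))
        --nameStart;
    const std::string name = head.substr(nameStart);
    const std::string returnType = Trim(head.substr(0, nameStart));
    if (name.empty() || returnType.empty()) return "";

    const std::string params = StripDefaults(decl.substr(open + 1, close - open - 1));
    const bool isConst = Trim(decl.substr(close + 1)).compare(0, 5, "const") == 0;
    return returnType + " " + className + "::" + name + "(" + params + ")" +
           (isConst ? " const" : "") + "\n{\n}\n";
}
```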

The Generate Definition feature creating the matching function body in the implementation file from a header declaration

The insertion point is chosen to keep the cpp in the same order as the header: the editor finds the declarations immediately before and after the target in the header, locates their definitions in the cpp, and inserts the new stub between them (skipping back over any doc comments so it lands above the next function’s comment). If neither neighbour can be placed, it appends at the end of the file. Matching member order is a small thing that keeps a generated cpp readable instead of scrambled.
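A sketch of the placement rule, assuming the locator has already produced byte offsets for the neighbours' definitions. ChooseInsertionOffset is a hypothetical name, and preferring the next neighbour over the previous one is an assumption, since the order of preference is not specified above:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Picks where a new stub goes so the cpp keeps the header's member order.
// If the next neighbour's definition is known (nextDefStart must be the start
// of its first line), insert in front of it after moving up over comment lines
// directly above it; else insert after the previous neighbour; else append.
std::size_t ChooseInsertionOffset(const std::string& cpp,
                                  std::optional<std::size_t> prevDefEnd,
                                  std::optional<std::size_t> nextDefStart)
{
    if (nextDefStart)
    {
        std::size_t at = *nextDefStart;
        while (at > 0)
        {
            // Inspect the line just above `at`; prevLineEnd is its '\n'.
            const std::size_t prevLineEnd = at - 1;
            const std::size_t nl = prevLineEnd == 0 ? std::string::npos : cpp.rfind('\n', prevLineEnd - 1);
            const std::size_t prevLineStart = nl == std::string::npos ? 0 : nl + 1;
            const std::string line = cpp.substr(prevLineStart, prevLineEnd - prevLineStart);
            const std::size_t first = line.find_first_not_of(" \t\r");
            const bool isComment = first != std::string::npos &&
                (line.compare(first, 2, "//") == 0 || line.compare(first, 2, "/*") == 0 || line[first] == '*');
            if (!isComment) break;   // blank line or code: stop here
            at = prevLineStart;
        }
        return at;
    }
    if (prevDefEnd) return *prevDefEnd;
    return cpp.size();
}
```

Stopping at the first blank or code line keeps the stub from splitting a doc comment from the function it documents.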

The AI-generated declaration in the header, the kind of declaration Generate Definition then turns into a stub

Where it breaks (honestly)

Text matching without an AST has real edges, and it is worth naming them rather than pretending otherwise:

  • Multi-word return types (const FVector, unsigned int) can be partially truncated by the single-token assumption when walking back to a definition’s start.
  • The simple comment check does not track string literals, so a // inside a string on the same line can be misread as a comment start.
  • The class-name extractor takes the first class line in a header, which is wrong for headers with multiple or nested classes.
  • The “exactly one match” rule means identical signatures across namespaces, or macro-generated members, make it abort. That is the safety tradeoff: it would rather do nothing than edit the wrong function.

None of these are fatal for the common workflow (one function, one class, ordinary signatures), and the abort-on-ambiguity stance means the failure mode is “it declines”, not “it corrupts your file”. When you write back, the editor also re-reads the file and compares a checksum before splicing, so an out-of-band edit aborts the save instead of clobbering it.
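Because ranges are stored as linear offsets, the write-back reduces to one replace call guarded by a checksum. A minimal sketch, with std::hash standing in for the editor's actual checksum and LocatedRange, Locate and SpliceIfUnchanged as hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// A located function: linear offsets into the file text as it was read, plus a
// hash of that whole text (std::hash is a stand-in for the real checksum).
struct LocatedRange
{
    std::size_t begin = 0;     // first character of the function
    std::size_t end = 0;       // one past its last character
    std::size_t checksum = 0;
};

LocatedRange Locate(const std::string& file, std::size_t begin, std::size_t end)
{
    return {begin, end, std::hash<std::string>{}(file)};
}

// Splices `replacement` into the file as it is now, but only if it still hashes
// the same as when the range was located; otherwise the offsets may have
// shifted, so the save is aborted. Returns false when nothing was written.
bool SpliceIfUnchanged(std::string& currentFile, const LocatedRange& range, const std::string& replacement)
{
    if (std::hash<std::string>{}(currentFile) != range.checksum) return false;   // out-of-band edit
    if (range.begin > range.end || range.end > currentFile.size()) return false;
    currentFile.replace(range.begin, range.end - range.begin, replacement);
    return true;
}
```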

What to take away

  • Without a compiler, match, do not parse: list every occurrence of the name, then disprove false positives until exactly one survives. Refuse on ambiguity.
  • Build everything on one context-aware bracket matcher that ignores strings and comments; it is your stand-in for a lexer.
  • Disambiguate overloads by normalized parameter comparison, split parameter lists only on top-level commas, and remember that < in C++ is ambiguous.
  • Generating a stub is string surgery that encodes real rules: qualify the name, strip static and defaults, keep names and const.

That parsing is what lets the editor load the C++ behind a node and feed it to the AI assistant and inline completion. The full series is Building an AI Code Editor Inside Unreal Engine, and the finished plugin is AI Node Code Editor on FAB.

Frequently asked questions

Can you find a C++ function reliably without parsing the whole file into an AST?
Yes, for the common cases. List every textual occurrence of the name with word-boundary checks, then filter: drop ones inside comments, keep ones with a UFUNCTION macro above, and keep ones whose parameters match the known signature. If exactly one survives you have your match; if more than one does, refuse rather than guess.
How do you match braces and parentheses while ignoring strings and comments?
One context-aware scanner does it: increment and decrement a depth counter on brackets, but suppress counting while inside a string, a char literal, a line comment or a block comment, with backslash-escape handling. This single primitive underpins parameter extraction, body extraction and macro capture.
How do you tell overloads apart?
By comparing parameter lists. The expected signature comes from Unreal reflection; each text candidate's parameters are normalized (defaults stripped, whitespace canonicalized, names removed) and compared type by type. The hard part is splitting the list on top-level commas only, since templates and braced initializers contain commas too.
How is a definition stub generated from a declaration?
Parse the declaration, then emit ReturnType ClassName::Name(params) with the static specifier and default argument values stripped (both are illegal on an out-of-line definition) but parameter names kept, followed by an empty body. The insertion point is chosen to preserve the header's member order in the cpp.