Transform contexts

A transform context annotates an instruction with the compiler transformations that produced it. The value is a list of short identifiers, one per applied transformation. An identifier may repeat when the same transformation was applied more than once; for example, doubly-inlined code carries transform: ["inline", "inline"].

ethdebug/format/program/context/transform
$schema: "https://json-schema.org/draft/2020-12/schema"
$id: "schema:ethdebug/format/program/context/transform"

title: ethdebug/format/program/context/transform
description: |
  Annotates an instruction with compiler transformations that
  produced it. The value is a list of short identifiers naming
  each transformation; the list may repeat an identifier when
  the same transformation has been applied more than once (e.g.,
  `["inline", "inline"]` for doubly-inlined code).

  A transform context is *additional* annotation — it does not
  replace semantic contexts. When the compiler inlines a
  function, the invoke/return contexts for the logical call
  should still be emitted at the call boundary so the debugger's
  source-level call stack remains coherent. The transform
  context tells debuggers **how** the call was realized.

  Combine a transform with other discriminator keys (`invoke`,
  `return`, `code`, etc.) by placing them side-by-side on the
  same context object — `gather` is only needed when two
  contexts would collide on the same key.

  Consumers that ignore transform contexts still get a sound
  source-level view from the invoke/return contexts alone.
  Consumers that understand transform contexts can offer
  optimization-aware presentations — e.g., rendering inlined
  code as a collapsible block, or reconciling tail-call-optimized
  back-edges with the logical call stack.

  The identifier set is extensible. v1 defines:

  - `"inline"` — the marked instruction is part of an inlined
    function body. Surrounding invoke/return contexts name the
    inlined callee.
  - `"tailcall"` — the marked instruction is a
    tail-call-optimized back-edge JUMP or continuation, where
    the call was realized as a direct jump (or reuse of the
    caller's frame) rather than a standard call/return sequence.
  - `"fold"` — the marked instruction carries the result of a
    compile-time constant fold. Typically a PUSH of the folded
    value, replacing a compute sequence that appeared in source.
  - `"coalesce"` — the marked instruction is part of a
    read-write merging sequence (e.g., SHL/OR sequences packing
    narrower fields into a wider word) that the user did not
    explicitly write; the compiler introduced it to combine
    adjacent source-level reads or writes.

  Debuggers unfamiliar with a given identifier should preserve
  it as an opaque label.

  Order in the array is not semantically significant — only the
  multiset of identifiers matters.

type: object
properties:
  transform:
    title: Applied transformations
    description: |
      List of transformation identifiers. Identifiers may
      repeat; order is not semantically significant.
    type: array
    items:
      type: string
      minLength: 1
    minItems: 1

required:
  - transform

examples:
  - transform: ["inline"]
  - transform: ["tailcall"]
  - transform: ["fold"]
  - transform: ["coalesce"]
  - transform: ["inline", "inline"]
  - transform: ["inline", "tailcall"]
  - transform: ["inline", "fold"]
  - transform: ["coalesce", "coalesce"]
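The schema's constraints are small enough to check by hand. The sketch below is a hypothetical TypeScript type guard, not part of any ethdebug library. It mirrors the schema: an object with a required `transform` array holding at least one non-empty string. Like the schema, it does not reject sibling keys such as `invoke` or `return`.

```typescript
// Hypothetical TypeScript shape mirroring the transform schema.
interface TransformContext {
  transform: string[];
}

// True when `value` meets the schema's constraints: a (non-array) object whose
// required `transform` key is an array of at least one non-empty string.
// Other keys are allowed, since the schema does not restrict additional properties.
function isTransformContext(value: unknown): value is TransformContext {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const transform = (value as Record<string, unknown>).transform;
  return (
    Array.isArray(transform) &&
    transform.length >= 1 && // minItems: 1
    transform.every((id) => typeof id === "string" && id.length >= 1) // minLength: 1
  );
}

console.log(isTransformContext({ transform: ["inline", "inline"] })); // true
console.log(isTransformContext({ transform: [] })); // false
console.log(isTransformContext({})); // false
```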

Role: additional annotation

A transform context does not replace semantic contexts. When the compiler inlines a function, the caller's debug info should still carry invoke/return contexts naming the inlined callee at the call boundary, so that the debugger's logical call stack reflects the source-level structure. The transform context is additional information telling the debugger how the call was physically realized.

Consumers are free to ignore transform contexts entirely; the invoke/return contexts alone always give a sound source-level view. Consumers that understand transform contexts can offer optimization-aware presentations:

- Render inlined code as a collapsible block tied to the original callee's source location.
- Show which call sites were tail-call-optimized and which were realized as full call/return sequences.
- Explain apparent anomalies in the trace (e.g., that a JUMP carrying an invoke context is a TCO back-edge rather than an ordinary jump).
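As a rough illustration of the first presentation, a trace viewer could fold each run of consecutive inlined steps into one row. The `Step` and `Row` types below are hypothetical simplifications, not ethdebug types. This sketch looks only at the `transform` list, so two back-to-back inlined bodies from different callees would merge into one row. A fuller implementation would split runs at the surrounding invoke/return contexts.

```typescript
// Hypothetical, simplified trace step: a program counter plus whatever
// context the compiler attached to that instruction.
interface Step {
  pc: number;
  context?: { transform?: string[] };
}

type Row =
  | { kind: "step"; step: Step }
  | { kind: "inlined"; steps: Step[] }; // shown collapsed by default

// True when the step's transform list mentions "inline" at least once.
function isInlined(step: Step): boolean {
  return step.context?.transform?.includes("inline") ?? false;
}

// Groups each maximal run of consecutive inlined steps into one collapsible
// row; every other step stays a row of its own.
function groupInlined(steps: Step[]): Row[] {
  const rows: Row[] = [];
  for (const step of steps) {
    const last = rows[rows.length - 1];
    if (!isInlined(step)) {
      rows.push({ kind: "step", step });
    } else if (last !== undefined && last.kind === "inlined") {
      last.steps.push(step);
    } else {
      rows.push({ kind: "inlined", steps: [step] });
    }
  }
  return rows;
}

const rows = groupInlined([
  { pc: 0 },
  { pc: 1, context: { transform: ["inline"] } },
  { pc: 2, context: { transform: ["inline", "fold"] } },
  { pc: 3 },
]);
console.log(rows.map((r) => r.kind)); // [ 'step', 'inlined', 'step' ]
```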

v1 identifiers

Four identifiers are recognized in v1:

- "inline" — the marked instruction is part of an inlined function body. Surrounding invoke/return contexts name the inlined callee; this marker tells the debugger the physical code does not correspond to a separate activation record.
- "tailcall" — the marked instruction is a tail-call-optimized back-edge JUMP or continuation, where the call was realized without pushing/popping a full activation. A JUMP carrying a tailcall transform typically sits on a context that also carries both a return (from the previous iteration) and an invoke (of the new iteration).
- "fold" — the marked instruction carries the result of a compile-time constant fold. Typically a PUSH of the folded value replacing a compute sequence (e.g., ADD over two known constants) that appeared in source. The instruction's surrounding code context, if present, points to the original expression.
- "coalesce" — the marked instruction is part of a read-write merging sequence the compiler introduced to combine adjacent source-level reads or writes. Common examples include SHL/OR sequences that pack narrower fields into a single storage slot, or wider loads split into narrower field extractions. The user did not write these instructions directly; the coalesce marker lets a debugger present the sequence as one source-level operation rather than stepping through each byte-shuffling opcode.

The identifier set is extensible. Compilers may emit additional identifiers for optimizations not yet standardized; debuggers should preserve unfamiliar identifiers as opaque labels rather than rejecting them.
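One way a debugger might honor this rule is to split each transform list into the v1 identifiers it knows and opaque labels it simply displays. The function names below are hypothetical, and `"licm"` is a made-up identifier standing in for some non-standardized optimization.

```typescript
// The four identifiers v1 defines; anything else is an opaque label.
const V1_IDENTIFIERS = ["inline", "tailcall", "fold", "coalesce"] as const;
type V1Identifier = (typeof V1_IDENTIFIERS)[number];

function isV1Identifier(id: string): id is V1Identifier {
  return (V1_IDENTIFIERS as readonly string[]).includes(id);
}

// Splits a transform list into recognized v1 identifiers and opaque labels.
// Unknown identifiers are kept (never dropped or rejected) so a UI can still show them.
function classifyTransforms(transform: string[]): { known: V1Identifier[]; opaque: string[] } {
  const known: V1Identifier[] = [];
  const opaque: string[] = [];
  for (const id of transform) {
    if (isV1Identifier(id)) known.push(id);
    else opaque.push(id);
  }
  return { known, opaque };
}

// "licm" is hypothetical, used only to show an unfamiliar identifier.
console.log(classifyTransforms(["inline", "licm", "fold"]));
// { known: [ 'inline', 'fold' ], opaque: [ 'licm' ] }
```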

Repetition and composition

Identifiers may repeat. A function inlined into another inlined function produces transform: ["inline", "inline"]. A coalesce sequence nested inside another coalesced region produces transform: ["coalesce", "coalesce"].
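Because repetition carries meaning, a consumer might count occurrences rather than test for mere presence. This sketch assumes, as one plausible reading of the nested-inlining example and not as spec text, that the number of "inline" entries equals the instruction's inlining depth. The helper name `countTransform` is hypothetical.

```typescript
// Counts how many times identifier `id` occurs in a transform list.
function countTransform(transform: string[], id: string): number {
  return transform.filter((t) => t === id).length;
}

// Assumed reading: each "inline" entry is one level of inlining,
// so the count gives the instruction's inlining depth.
console.log(countTransform(["inline", "inline"], "inline")); // 2
console.log(countTransform(["coalesce", "coalesce"], "inline")); // 0
```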

Different transformations compose: transform: ["inline", "tailcall"] marks an instruction inside an inlined body that was itself a TCO back-edge in the callee; transform: ["inline", "fold"] marks a constant-folded PUSH sitting inside an inlined body.

Order in the array is not semantically significant—only the multiset of identifiers matters.
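Since only the multiset matters, two transform lists should compare equal whenever each identifier occurs the same number of times in both, regardless of order. A minimal sketch, with hypothetical helper names:

```typescript
// Builds a multiset (identifier -> occurrence count), discarding order.
function toMultiset(transform: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const id of transform) {
    counts.set(id, (counts.get(id) ?? 0) + 1);
  }
  return counts;
}

// Two transform lists are equivalent when every identifier occurs
// the same number of times in each.
function sameTransforms(a: string[], b: string[]): boolean {
  const ma = toMultiset(a);
  const mb = toMultiset(b);
  return ma.size === mb.size && Array.from(ma.entries()).every(([id, n]) => mb.get(id) === n);
}

console.log(sameTransforms(["inline", "tailcall"], ["tailcall", "inline"])); // true
console.log(sameTransforms(["inline", "inline"], ["inline"])); // false
```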

Composing with other contexts

A context object can carry several discriminator keys at once — code, variables, invoke, return, transform, and so on all live in the same object. A TCO back-edge JUMP, for example, typically combines three facts as sibling keys on a single context:

return:
  identifier: "fact"
  declaration: { ... }
invoke:
  jump: true
  identifier: "fact"
  target: { pointer: { location: code, offset: ... } }
transform: ["tailcall"]

The return and invoke state the source-level facts (iteration N returned, iteration N+1 was invoked); the transform explains how the compiler realized that pair as a single JUMP.
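A consumer can recognize this shape by checking the three sibling keys together. The `Context` interface below is a deliberately loose, hypothetical model covering only the keys this sketch inspects. Other fields, such as `declaration` and `target`, are left out. It does not require the returning and invoked identifiers to match, since a tail call need not be self-recursive.

```typescript
// Hypothetical, loosely-typed view of a context object; only the keys this
// sketch inspects are modeled, and everything else is treated as opaque.
interface Context {
  invoke?: { jump?: boolean; identifier?: string; [key: string]: unknown };
  return?: { identifier?: string; [key: string]: unknown };
  transform?: string[];
  [key: string]: unknown;
}

// Matches the back-edge shape described above: a return and an invoke as
// sibling keys, the invoke realized as an internal jump, and "tailcall"
// listed among the transforms.
function isTailCallBackEdge(ctx: Context): boolean {
  const ret = ctx.return;
  const invoke = ctx.invoke;
  const transform = ctx.transform ?? [];
  return ret !== undefined && invoke?.jump === true && transform.includes("tailcall");
}

const backEdge: Context = {
  return: { identifier: "fact" },
  invoke: { jump: true, identifier: "fact" },
  transform: ["tailcall"],
};
console.log(isTailCallBackEdge(backEdge)); // true

// The same source-level facts without the transform are still sound,
// but say nothing about how the pair was realized.
console.log(isTailCallBackEdge({ return: { identifier: "fact" }, invoke: { jump: true, identifier: "fact" } })); // false
```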

Reach for gather only when two contexts would collide on the same key — e.g., two independent variables blocks or two frames from different pipeline stages. When keys don't collide, the flat form is preferred.
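The flat-versus-gather rule can be expressed as a small merge step that a compiler backend might run when two passes annotate the same instruction. This is a sketch under stated assumptions. `combine` is a hypothetical helper, contexts are modeled as plain records, and `gather` is assumed to hold an array of complete contexts. The `variables` values are empty placeholders, not real variable entries.

```typescript
// Loosely-typed context object: discriminator key -> value.
type Ctx = Record<string, unknown>;

// Combines two contexts attached to the same instruction. Disjoint keys are
// merged into one flat object with sibling keys (the preferred form); if any
// key collides, both contexts are kept whole under `gather`, which this
// sketch assumes takes an array of contexts.
function combine(a: Ctx, b: Ctx): Ctx {
  const collides = Object.keys(a).some((key) => Object.prototype.hasOwnProperty.call(b, key));
  return collides ? { gather: [a, b] } : { ...a, ...b };
}

const flat = combine({ invoke: { jump: true } }, { transform: ["inline"] });
console.log(Object.keys(flat)); // [ 'invoke', 'transform' ]

// Two independent variables blocks collide on the same key (placeholder contents).
const gathered = combine({ variables: [] }, { variables: [] });
console.log(Object.keys(gathered)); // [ 'gather' ]
```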
