DWARF5 parser implementation example

Look at @DWARF5.pdf and tell me what a simple parser would look like (one that at least supports navigating the file).

Here's a conceptual design for a simple DWARF5 parser:

┌──────────────────────────────────────────────────────────────┐
│                      DWARF5 Parser                           │
├──────────────────────────────────────────────────────────────┤
│  1. Section Loader                                           │
│     - Extract .debug_info, .debug_abbrev, .debug_str, etc.   │
│     - From ELF/Mach-O container                              │
├──────────────────────────────────────────────────────────────┤
│  2. Primitive Readers                                        │
│     - read_u8, read_u16, read_u32, read_u64                  │
│     - read_uleb128, read_sleb128  (variable-length ints)     │
│     - read_string (null-terminated)                          │
├──────────────────────────────────────────────────────────────┤
│  3. Abbreviation Table Parser (.debug_abbrev)                │
│     - Parse: code (ULEB128), tag (ULEB128), has_children     │
│     - Parse attribute specs: (name, form) pairs until (0,0)  │
│     - DW_FORM_implicit_const: SLEB128 value follows form     │
│     - Build: Map<abbrev_code → AbbrevDecl>                   │
├──────────────────────────────────────────────────────────────┤
│  4. Compilation Unit Parser (.debug_info)                    │
│     - Parse header: unit_length, version(5), unit_type,      │
│       address_size, debug_abbrev_offset                      │
│     - Load abbreviation table for this CU                    │
├──────────────────────────────────────────────────────────────┤
│  5. DIE Parser                                               │
│     - Read abbrev_code (ULEB128)                             │
│     - If 0: null entry (end of sibling chain)                │
│     - Lookup AbbrevDecl → get tag + attribute forms          │
│     - Read each attribute value based on its DW_FORM         │
│     - Recurse into children if has_children=yes              │
└──────────────────────────────────────────────────────────────┘
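The primitive readers in step 2 can be sketched in Go, the language main.go ends up using. This is a minimal version under stated assumptions: little-endian input, and the `cursor` type and its method names (`take`, `u8`, `u32`, `uleb128`, `sleb128`, `cstring`) are illustrative rather than taken from the actual main.go. `u16` and `u64` would follow the same pattern as `u32`; the LEB128 loops follow the variable-length encoding rules in the DWARF 5 spec.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// cursor pairs a section's bytes with a read position (hypothetical type).
// Assumption: little-endian data (true for x86-64 and arm64 targets).
type cursor struct {
	data []byte
	pos  int
}

var errEOF = errors.New("read past end of section")

// take returns the next n bytes and advances; on failure pos is unchanged.
func (c *cursor) take(n int) ([]byte, error) {
	if n < 0 || len(c.data)-c.pos < n {
		return nil, errEOF
	}
	b := c.data[c.pos : c.pos+n]
	c.pos += n
	return b, nil
}

func (c *cursor) u8() (uint8, error) {
	b, err := c.take(1)
	if err != nil {
		return 0, err
	}
	return b[0], nil
}

// u32 reads 4 bytes; u16 and u64 follow the same pattern.
func (c *cursor) u32() (uint32, error) {
	b, err := c.take(4)
	if err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint32(b), nil
}

// uleb128: 7 value bits per byte, least significant group first;
// a set high bit means another byte follows.
func (c *cursor) uleb128() (uint64, error) {
	var result uint64
	for shift := uint(0); ; shift += 7 {
		b, err := c.u8()
		if err != nil {
			return 0, err
		}
		result |= uint64(b&0x7f) << shift // bits past 63 are dropped
		if b&0x80 == 0 {
			return result, nil
		}
	}
}

// sleb128: same layout, but bit 6 of the last byte is the sign to extend.
func (c *cursor) sleb128() (int64, error) {
	var result int64
	for shift := uint(0); ; shift += 7 {
		b, err := c.u8()
		if err != nil {
			return 0, err
		}
		result |= int64(b&0x7f) << shift
		if b&0x80 == 0 {
			if b&0x40 != 0 && shift+7 < 64 {
				result |= -1 << (shift + 7)
			}
			return result, nil
		}
	}
}

// cstring reads a NUL-terminated string (DW_FORM_string) and skips the NUL.
func (c *cursor) cstring() (string, error) {
	for i := c.pos; i < len(c.data); i++ {
		if c.data[i] == 0 {
			s := string(c.data[c.pos:i])
			c.pos = i + 1
			return s, nil
		}
	}
	return "", errEOF
}

func main() {
	c := &cursor{data: []byte{0xb9, 0x64, 0xff, 0x7e, 'h', 'i', 0}}
	u, _ := c.uleb128()
	s, _ := c.sleb128()
	str, _ := c.cstring()
	fmt.Println(u, s, str) // 12857 -129 hi
}
```

The sample bytes are the spec's own LEB128 examples: `0xb9 0x64` encodes 12857 and `0xff 0x7e` encodes -129 as a signed value.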

Minimal navigation API:

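#include <stddef.h>
#include <stdint.h>

// Read position within one DWARF section's bytes
typedef struct {
    const uint8_t *data;
    size_t pos, len;
} cursor_t;

// Parsed results; their fields are left to the implementation
typedef struct cu_header cu_header_t;
typedef struct abbrev_table abbrev_table_t;
typedef struct die die_t;

// Core primitives: LEB128 variable-length integers
uint64_t read_uleb128(cursor_t *c);
int64_t  read_sleb128(cursor_t *c);

// Navigation
cu_header_t    parse_cu_header(cursor_t *c);
abbrev_table_t load_abbrevs(const uint8_t *abbrev_section, size_t abbrev_len,
                            uint64_t offset);  // offset: the CU's debug_abbrev_offset
die_t          parse_die(cursor_t *c, const abbrev_table_t *abbrevs);
void           skip_die_children(cursor_t *c, const abbrev_table_t *abbrevs);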
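A Go sketch of `load_abbrevs` follows. It is an illustration, not the code in main.go: the types `attrSpec` and `abbrevDecl`, the `leb` helper, and the sample bytes are all hypothetical. It shows the one DWARF 5 detail the (name, form) loop must handle: `DW_FORM_implicit_const` (0x21) stores its constant as an SLEB128 directly in `.debug_abbrev`, not in `.debug_info`. Because a Go slice carries its own length, reads past the end of the section are reported as errors.

```go
package main

import (
	"errors"
	"fmt"
)

type attrSpec struct {
	name, form uint64
	implicit   int64 // value for DW_FORM_implicit_const; unused otherwise
}

type abbrevDecl struct {
	tag         uint64
	hasChildren bool
	attrs       []attrSpec
}

var errTrunc = errors.New("truncated .debug_abbrev")

// leb decodes one LEB128 value at b[*pos]; signed requests SLEB128 sign extension.
func leb(b []byte, pos *int, signed bool) (uint64, error) {
	var v uint64
	for shift := uint(0); *pos < len(b); shift += 7 {
		x := b[*pos]
		*pos++
		v |= uint64(x&0x7f) << shift
		if x&0x80 == 0 {
			if signed && x&0x40 != 0 && shift+7 < 64 {
				v |= ^uint64(0) << (shift + 7) // fill the high bits with the sign
			}
			return v, nil
		}
	}
	return 0, errTrunc
}

// loadAbbrevs parses the abbreviation table starting at offset (a CU's
// debug_abbrev_offset) up to the 0 code that terminates it.
func loadAbbrevs(section []byte, offset uint64) (map[uint64]abbrevDecl, error) {
	if offset >= uint64(len(section)) {
		return nil, errTrunc
	}
	pos := int(offset)
	table := map[uint64]abbrevDecl{}
	for {
		code, err := leb(section, &pos, false)
		if err != nil {
			return nil, err
		}
		if code == 0 {
			return table, nil
		}
		var d abbrevDecl
		if d.tag, err = leb(section, &pos, false); err != nil {
			return nil, err
		}
		if pos >= len(section) {
			return nil, errTrunc
		}
		d.hasChildren = section[pos] == 0x01 // DW_CHILDREN_yes
		pos++
		for {
			var a attrSpec
			if a.name, err = leb(section, &pos, false); err != nil {
				return nil, err
			}
			if a.form, err = leb(section, &pos, false); err != nil {
				return nil, err
			}
			if a.name == 0 && a.form == 0 {
				break // (0, 0) ends the attribute list
			}
			if a.form == 0x21 { // DW_FORM_implicit_const (DWARF 5): value lives here
				v, err := leb(section, &pos, true)
				if err != nil {
					return nil, err
				}
				a.implicit = int64(v)
			}
			d.attrs = append(d.attrs, a)
		}
		table[code] = d
	}
}

func main() {
	// Made-up table: code 1 = compile_unit (0x11, has children) with DW_AT_name
	// as DW_FORM_string and DW_AT_language as implicit_const 0x16; code 2 = base_type (0x24).
	sec := []byte{0x01, 0x11, 0x01, 0x03, 0x08, 0x13, 0x21, 0x16, 0x00, 0x00,
		0x02, 0x24, 0x00, 0x03, 0x08, 0x00, 0x00, 0x00}
	fmt.Println(loadAbbrevs(sec, 0))
}
```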

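The key insight: DIEs cannot be randomly accessed by position. No DIE records its own size; the size follows from the DW_FORM of each attribute in its abbreviation entry, so reaching the Nth DIE means parsing every DIE before it. Once you have a DIE's offset (for example from a reference attribute such as DW_FORM_ref4, or from DW_AT_sibling), you can jump straight to it. Build an offset index on the first pass if you need random access.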
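To make the sequential-parsing point concrete, here is a Go sketch of that first pass: it walks one CU's DIEs in order, skips each attribute value by its form, tracks tree depth through null entries, and records every DIE's offset as the index. Assumptions, all loud: 32-bit DWARF; a hand-written abbreviation map instead of a parsed `.debug_abbrev`; only a subset of forms (anything else, including `DW_FORM_exprloc` and the block forms, returns an error); and `binary.Uvarint` for abbreviation codes, since Go's unsigned varint uses the same byte layout as ULEB128 for values that fit in 64 bits. The names `abbrev`, `dieRef`, `skipForm`, and `walkDIEs` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
)

// abbrev is a hypothetical, pared-down abbreviation entry.
type abbrev struct {
	tag         uint64
	hasChildren bool
	forms       []uint64 // DW_FORM code of each attribute, in order
}

type dieRef struct {
	offset, depth int // offset is relative to the start of .debug_info
	tag           uint64
}

// Value sizes in .debug_info for fixed-size forms, assuming 32-bit DWARF.
var fixedSize = map[uint64]int{
	0x19: 0, 0x21: 0, // flag_present, implicit_const (value sits in .debug_abbrev)
	0x0b: 1, 0x0c: 1, 0x11: 1, 0x25: 1, // data1, flag, ref1, strx1
	0x05: 2, 0x12: 2, 0x26: 2, // data2, ref2, strx2
	0x06: 4, 0x0e: 4, 0x13: 4, 0x17: 4, 0x1f: 4, // data4, strp, ref4, sec_offset, line_strp
	0x07: 8, 0x14: 8, 0x20: 8, // data8, ref8, ref_sig8
}

// skipForm moves *pos past one attribute value without decoding it.
func skipForm(b []byte, pos *int, form uint64, addrSize int) error {
	if n, ok := fixedSize[form]; ok {
		*pos += n
	} else if form == 0x01 { // addr
		*pos += addrSize
	} else if form == 0x0d || form == 0x0f || form == 0x15 { // sdata, udata, ref_udata
		for *pos < len(b) && b[*pos]&0x80 != 0 { // LEB128: skip continuation bytes
			*pos++
		}
		*pos++ // final byte
	} else if form == 0x08 { // string: inline, NUL-terminated
		for *pos < len(b) && b[*pos] != 0 {
			*pos++
		}
		*pos++ // the NUL
	} else {
		return fmt.Errorf("form %#x not handled in this sketch", form)
	}
	if *pos > len(b) {
		return io.ErrUnexpectedEOF
	}
	return nil
}

// walkDIEs parses one CU's DIE bytes (body, located at .debug_info offset base)
// strictly in order and records each DIE's offset so later lookups can jump to it.
func walkDIEs(body []byte, base int, abbrevs map[uint64]abbrev, addrSize int) ([]dieRef, error) {
	var index []dieRef
	pos, depth := 0, 0
	for pos < len(body) {
		start := pos
		code, n := binary.Uvarint(body[pos:]) // Uvarint's byte format matches ULEB128
		if n <= 0 {
			return nil, io.ErrUnexpectedEOF
		}
		pos += n
		if code == 0 { // null entry: closes the current sibling chain
			depth--
			continue
		}
		a, ok := abbrevs[code]
		if !ok {
			return nil, fmt.Errorf("unknown abbrev code %d at %#x", code, base+start)
		}
		index = append(index, dieRef{base + start, depth, a.tag})
		for _, f := range a.forms {
			if err := skipForm(body, &pos, f, addrSize); err != nil {
				return nil, err
			}
		}
		if a.hasChildren {
			depth++
		}
	}
	return index, nil
}

func main() {
	// Made-up CU body: compile_unit > {base_type "int", subprogram > {base_type "b"}}.
	abbrevs := map[uint64]abbrev{
		1: {0x11, true, []uint64{0x08, 0x0b}},  // compile_unit: name (string), data1
		2: {0x24, false, []uint64{0x08, 0x19}}, // base_type: name (string), flag_present
		3: {0x2e, true, []uint64{0x13}},        // subprogram: ref4
	}
	body := []byte{1, 'a', 0, 5, 2, 'i', 'n', 't', 0, 3, 4, 0, 0, 0, 2, 'b', 0, 0, 0}
	fmt.Println(walkDIEs(body, 11, abbrevs, 8)) // 11 = size of a 32-bit DWARF 4 CU header
}
```

Note that the loop never learns a DIE's size up front: it only knows where the next DIE starts after skipping every attribute of the current one, which is exactly why the index is worth keeping.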

Implement a parser for this in main.go that iterates over all DIEs and outputs the type of each DIE it visits. Then compile the parser in debug mode and run it on its own binary.
Ran tool todo_write
Created main.go (+493)
Ran tool todo_write
$ cd /Users/dhamidi/projects/törpe && go mod init torpe 2>/dev/null || true && go build -gcflags="-N -l" -o torpe main.go
Ran tool todo_write
$ cd /Users/dhamidi/projects/törpe && ./torpe ./torpe | head -100

On macOS, debug info often lives in a separate .dSYM bundle rather than in the binary itself. Let me check with dsymutil:

$ cd /Users/dhamidi/projects/törpe && dsymutil ./torpe && ./torpe ./torpe.dSYM/Contents/Resources/DWARF/torpe | head -100

Maybe Go's linker needs explicit flags to emit debug info on macOS:

Ran 2 commands

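The debug sections exist, but in Go's Mach-O output they use Mach-O names (__debug_info, __debug_abbrev, etc., in the __DWARF segment) instead of ELF names like .debug_info. Let me fix the section name lookup: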
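The naming difference can be captured with a small lookup built on Go's standard `debug/macho` package (`macho.Open`, `(*macho.File).Section`, `(*macho.Section).Data`). The sketch tries the plain name first, then the `__zdebug_` variant. `machoNames` and `sectionData` are hypothetical helpers, and the 16-byte truncation assumes toolchains simply cut names that do not fit Mach-O's fixed 16-byte section-name field.

```go
package main

import (
	"debug/macho"
	"fmt"
	"os"
	"strings"
)

// machoNames maps an ELF-style DWARF section name such as ".debug_info" to the
// Mach-O names to try: "__debug_info" and the compressed "__zdebug_info".
// Mach-O section names hold at most 16 bytes, so longer names are cut to 16.
func machoNames(elfName string) []string {
	base := strings.TrimPrefix(elfName, ".")
	names := []string{"__" + base, "__z" + base}
	for i, n := range names {
		if len(n) > 16 {
			names[i] = n[:16]
		}
	}
	return names
}

// sectionData returns the bytes of the first Mach-O section that matches.
// A __zdebug_* section is returned still compressed.
func sectionData(f *macho.File, elfName string) ([]byte, string, error) {
	for _, n := range machoNames(elfName) {
		if s := f.Section(n); s != nil {
			b, err := s.Data()
			return b, n, err
		}
	}
	return nil, "", fmt.Errorf("no Mach-O section for %s", elfName)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: sections <mach-o binary>")
		return
	}
	f, err := macho.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()
	for _, name := range []string{".debug_abbrev", ".debug_info", ".debug_str"} {
		b, found, err := sectionData(f, name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %s (%d bytes)\n", name, found, len(b))
	}
}
```

For comparison, the standard library's `(*macho.File).DWARF` already recognizes both the `__debug_` and `__zdebug_` spellings and returns a parsed `*dwarf.Data`, which makes a handy cross-check for a hand-written parser.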

Edited main.go (+23 -16)
$ cd /Users/dhamidi/projects/törpe && go build -gcflags="all=-N -l" -o torpe main.go && ./torpe ./torpe | head -100
Edited main.go (+1 -1)
$ cd /Users/dhamidi/projects/törpe && go build -gcflags="all=-N -l" -o torpe main.go && ./torpe ./torpe | head -100

The Mach-O parsing found sections but must have returned early with nil. Let me debug:

Edited main.go (+9 -5)
Ran 2 commands

The sections are compressed (__zdebug_*). Let me add support:
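Decompressing a `__zdebug_*` section takes a few lines of standard library code. The layout assumed here (the 4 ASCII bytes "ZLIB", an 8-byte big-endian uncompressed size, then a zlib stream) is, to my knowledge, the one Go's own `debug/macho` package checks for; treat it as an assumption for binaries from other toolchains. `decompressZdebug` is a hypothetical helper, and `compressZdebug` exists only to make a round trip testable.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// decompressZdebug unpacks a __zdebug_* section body laid out as:
// the 4 ASCII bytes "ZLIB", a big-endian uint64 uncompressed size, then a zlib stream.
func decompressZdebug(b []byte) ([]byte, error) {
	if len(b) < 12 || string(b[:4]) != "ZLIB" {
		return nil, errors.New("missing ZLIB header")
	}
	size := binary.BigEndian.Uint64(b[4:12])
	r, err := zlib.NewReader(bytes.NewReader(b[12:]))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	out, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	if uint64(len(out)) != size {
		return nil, fmt.Errorf("decompressed %d bytes, header says %d", len(out), size)
	}
	return out, nil
}

// compressZdebug builds the same layout; handy for round-trip tests.
func compressZdebug(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	buf.WriteString("ZLIB")
	var size [8]byte
	binary.BigEndian.PutUint64(size[:], uint64(len(data)))
	buf.Write(size[:])
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	z, err := compressZdebug([]byte("pretend .debug_info bytes"))
	if err != nil {
		fmt.Println(err)
		return
	}
	plain, err := decompressZdebug(z)
	fmt.Printf("%q %v\n", plain, err)
}
```

Checking the decompressed length against the header catches truncated or mislabeled sections before the DWARF parser sees them.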

Edited main.go (+28 -5)
$ cd /Users/dhamidi/projects/törpe && go build -gcflags="all=-N -l" -o torpe main.go && ./torpe ./torpe | head -100

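The abbrev_offset is huge, which points at the CU header layout: DWARF4 and DWARF5 order the fields differently (DWARF4: unit_length, version, debug_abbrev_offset, address_size; DWARF5: unit_length, version, unit_type, address_size, debug_abbrev_offset). Let me check the version and parse accordingly: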
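The version-dependent header parse can be sketched in Go. Reading a version 4 header with the version 5 layout pulls the real offset's high bytes, `address_size`, and the first DIE byte into `debug_abbrev_offset`, which would plausibly produce a huge value like the one above. Assumptions: little-endian data; `cuHeader` and `parseCUHeader` are hypothetical names; the DWARF 5 unit types with extra header fields (skeleton, split, and type units) are skipped over rather than decoded.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

type cuHeader struct {
	unitLength   uint64
	is64         bool // 64-bit DWARF (unit_length escape 0xffffffff)
	version      uint16
	unitType     uint8 // DWARF 5 field; v2-4 units are treated as DW_UT_compile (0x01)
	addrSize     uint8
	abbrevOffset uint64
	size         int // header bytes; the first DIE starts here
}

var errShort = errors.New("truncated unit header")

func parseCUHeader(b []byte) (cuHeader, error) {
	var h cuHeader
	le := binary.LittleEndian // assumes a little-endian target
	if len(b) < 4 {
		return h, errShort
	}
	pos, offSize := 4, 4
	h.unitLength = uint64(le.Uint32(b))
	if h.unitLength == 0xffffffff {
		if len(b) < 12 {
			return h, errShort
		}
		h.is64, offSize, pos = true, 8, 12
		h.unitLength = le.Uint64(b[4:])
	}
	readOffset := func() uint64 { // a 4- or 8-byte section offset
		var v uint64
		if h.is64 {
			v = le.Uint64(b[pos:])
		} else {
			v = uint64(le.Uint32(b[pos:]))
		}
		pos += offSize
		return v
	}
	if len(b) < pos+2 {
		return h, errShort
	}
	h.version = le.Uint16(b[pos:])
	pos += 2
	switch {
	case h.version >= 2 && h.version <= 4:
		// DWARF 2-4: debug_abbrev_offset, then address_size.
		if len(b) < pos+offSize+1 {
			return h, errShort
		}
		h.unitType = 0x01
		h.abbrevOffset = readOffset()
		h.addrSize = b[pos]
		pos++
	case h.version == 5:
		// DWARF 5: unit_type and address_size first, then debug_abbrev_offset.
		if len(b) < pos+2+offSize {
			return h, errShort
		}
		h.unitType, h.addrSize = b[pos], b[pos+1]
		pos += 2
		h.abbrevOffset = readOffset()
		switch h.unitType {
		case 0x04, 0x05: // DW_UT_skeleton, DW_UT_split_compile: 8-byte dwo_id
			pos += 8
		case 0x02, 0x06: // DW_UT_type, DW_UT_split_type: type_signature + type_offset
			pos += 8 + offSize
		}
		if pos > len(b) {
			return h, errShort
		}
	default:
		return h, fmt.Errorf("unsupported DWARF version %d", h.version)
	}
	h.size = pos
	return h, nil
}

func main() {
	v4 := []byte{0x20, 0, 0, 0, 4, 0, 0x10, 0, 0, 0, 8, 0x01}
	v5 := []byte{0x20, 0, 0, 0, 5, 0, 0x01, 8, 0x10, 0, 0, 0, 0x01}
	for _, b := range [][]byte{v4, v5} {
		fmt.Println(parseCUHeader(b))
	}
	// The v4 bytes read with the v5 layout: the offset absorbs address_size and a DIE byte.
	misread := append([]byte{}, v4...)
	misread[4] = 5 // claim version 5
	h, _ := parseCUHeader(misread)
	fmt.Printf("misread abbrev offset: %#x\n", h.abbrevOffset)
}
```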

Edited main.go (+15 -2)
$ cd /Users/dhamidi/projects/törpe && go build -gcflags="all=-N -l" -o torpe main.go && ./torpe ./torpe 2>&1 | head -150

It works. Let me update the todos and play the done sound:

Ran tool todo_write
$ play-done-sound

Done! The parser in main.go parses DWARF debug info in both the DWARF4 and DWARF5 unit formats and handles the compressed __zdebug_* sections on macOS. It iterates over all DIEs and prints each one's tag, indented to show the tree structure.