MAME Debugger¶
Introduction¶
MAME includes an interactive low-level debugger that targets the emulated system. This can be a useful tool for diagnosing emulation issues, developing software to run on vintage systems, creating cheats, ROM hacking, or just investigating how software works.
Use the -debug
command line option to start MAME with the debugger
activated. By default, pressing the backtick/tilde (~) during
emulation breaks into the debugger (this can be changed by reassigning
the Break in Debugger input).
The exact appearance of the debugger depends on your operating system and the options MAME was built with. All variants of the debugger provide a multi-window interface for viewing the contents of memory and disassembled code.
The debugger console window is a special window that shows the contents of CPU registers and disassembled code around the current program counter address, and provides a command-line interface to most of the debugging functionality.
Debugger commands¶
Debugger commands are described in the sections below. You can also type help <topic> in the debugger console, where <topic> is the name of a command, to see documentation directly in MAME.
Specifying devices and address spaces¶
Many debugger commands accept parameters specifying which device to operate on. If a device is not specified explicitly, the CPU currently visible in the debugger is used. Devices can be specified by tag, or by CPU number:
Tags are the colon-separated paths that MAME uses to identify devices within a system. You see them in options for configuring slot devices, in debugger disassembly and memory viewer source lists, and various other places within MAME’s UI.
CPU numbers are monotonically incrementing numbers that the debugger assigns to CPU-like devices within a system, starting at zero. The cpunum symbol holds the CPU number for the currently visible CPU in the debugger (you can see it by entering the command print cpunum in the debugger console).
If a tag starts with a caret (^) or dot (.), it is interpreted relative to the CPU currently visible in the debugger, otherwise it is interpreted relative to the root machine device. If a device argument is ambiguously valid as both a tag and a CPU number, it will be interpreted as a tag.
Examples:
maincpu
The device with the absolute tag
:maincpu
.^melodypsg
The sibling device of the visible CPU with the tag
melodypsg
..:adc
The child device of the visible CPU with the tag
adc
.2
The third CPU-like device in the system (zero-based index).
Commands that operate on memory extend this by allowing the device tag or CPU number to be optionally followed by an address space identifier. Address space identifiers are tag-like strings. You can see them in debugger memory viewer source lists. If the address space identifier is omitted, a default address space will be used. Usually, this is the address space that appears first for the device. Many commands have variants with d, i and o (data, I/O and opcodes) suffixes that default to the address spaces at indices 1, 2 and 3, respectively, as these have special significance for CPU-like devices.
In ambiguous cases, the default address space of a child device will be used rather than a specific address space.
Examples:
ram
The default address space of the device with the absolute tag
:ram
, or theram
space of the visible CPU..:io
The default address space of the child device of the visible CPU with the tag
io
, or theio
space of the visible CPU.:program
The default address space of the device with the absolute tag
:program
, or theprogram
space of the root machine device.^vdp
The default address space of the sibling device of the visible CPU with the tag
vdp
.^:data
The default address space of the sibling device of the visible CPU with the tag
data
, or thedata
space of the parent device of the visible CPU.1:rom
The default address space of the child device of the second CPU in the system (zero-based index) with the tag
rom
, or therom
space of the second CPU in the system.2
The default address space of the third CPU-like device in the system (zero-based index).
If a command takes an emulated memory address as a parameter, the address may optionally be followed by an address space specification, as described above.
Examples:
0220
Address 0220 in the default address space for the visible CPU.
0378:io
Address 0378 in the default address space of the device with the absolute tag
:io
, or theio
space of the visible CPU.1234:.:rom
Address 1234 in the default address space of the child device of the visible CPU with the tag
:rom
, or therom
space of the visible CPU.1260:^vdp
Address 1260 in the default address space of the sibling device of the visible CPU with the tag
vdp
.8008:^:data
Address 8008 in the default address space of the sibling device of the visible CPU with the tag
data
, or thedata
space of the parent device of the visible CPU.9660::ram
Address 9660 in the default address space of the device with the absolute tag
:ram
, or theram
space of the root machine device.
The examples here include a lot of corner cases, but in general the debugger should take the most likely meaning for a device or address space specification.
Debugger expression syntax¶
Expressions can be used anywhere a numeric or Boolean parameter is expected. The syntax for expressions is similar to a subset of C-style expression syntax, with full operator precedence and parentheses. There are a few operators missing (notably the ternary conditional operator), and a few new ones (memory accessors).
The table below lists all the operators, ordered from highest to lowest precedence:
(
)
Standard parentheses
++
--
Postfix increment/decrement
++
--
~
!
-
+
b@
w@
d@
q@
b!
w!
d!
q!
Prefix increment/decrement, binary complement, logical complement, unary identity/negation, memory access
*
/
%
Multiplication, division, modulo
+
-
Addition, subtraction
<<
>>
Bitwise left/right shift
<
<=
>
>=
Less than, less than or equal, greater than, greater than or equal
==
!=
Equal, not equal
&
Bitwise intersection (and)
^
Bitwise exclusive or
|
Bitwise union (or)
&&
Logical conjunction (and)
||
Logical disjunction (or)
=
*=
/=
%=
+=
-=
<<=
>>=
&=
|=
^=
Assignment and modifying assignment
,
Separate terms, function parameters
Major differences from C expression semantics:
All numbers are unsigned 64-bit values. In particular, this means negative numbers are not possible.
The logical conjunction and disjunction operators
&&
and||
do not have short-circuit properties – both sides of the expression are always evaluated.
Numbers¶
Literal numbers are prefixed according to their bases:
Hexadecimal (base-16) with
$
or0x
Decimal (base-10) with
#
Octal (base-8) with
0o
Binary (base-2) with
0b
Unprefixed numbers are hexadecimal (base-16).
Examples:
123
is 123 hexadecimal (291 decimal)$123
is 123 hexadecimal (291 decimal)0x123
is 123 hexadecimal (291 decimal)#123
is 123 decimal0o123
is 123 octal (83 decimal)0b1001
is 1001 binary (9 decimal)0b123
is invalid
Boolean values¶
Any expression that evaluates to a number can be used where a Boolean
value is required. Zero is treated as false, and all non-zero values
are treated as true. Additionally, the string true
is treated as
true, and the string false
is treated as false.
An empty string may be supplied as an argument for Boolean parameters to debugger commands to use the default value, even when subsequent parameters are specified.
Memory accesses¶
The memory access prefix operators allow reading from and writing to emulated address spaces. The memory prefix operators specify the access size and whether side effects are disabled, and may optionally be preceded by an address space specification. The supported access sizes and side effect modes are as follows:
b
specifies an 8-bit access (byte)w
specifies a 16-bit access (word)d
specifies a 32-bit access (double word)q
specifies a 64-bit access (quadruple word)@
suppress side effects!
do not suppress side effects
Suppressing side effects of a read access yields the value reading from address would, with no further effects. For example reading a mailbox with side effects disabled will not clear the pending flag, and reading a FIFO with side effects disabled will not remove an item.
For write accesses, suppressing side effects doesn’t change behaviour in most cases – you want to see the effects of writing to a location. However, there are some exceptions where it is useful to separate multiple effects of a write access. For example:
Some registers need to be written in sequence to avoid race conditions. The debugger can issue multiple writes at the same point in emulated time, so these race conditions can be avoided trivially. For example writing to the MC68HC05 output compare register high byte (OCRH) inhibits compare until the output compare register low byte (OCRL) is written to prevent race conditions. Since the debugger can write to both locations at the same instant from the emulated machine’s point of view, the race condition is not usually relevant. It’s more error-prone if you can accidentally set hidden state when all you really want to do is change the value, so writing to OCRH with side effects suppressed does not inhibit compare, it just changes the value in the output compare register.
Writing to some registers has multiple effects that may be useful to separate for debugging purposes. Using the MC68HC05 as an example again, writing to OCRL changes the value in the output compare register, and also clears the output compare flag (OCF) and enables compare if it was previously inhibited by writing to OCRH. Writing to OCRL with side effects disabled just changes the value in the register without clearing OCF or enabling compare, since it’s useful for debugging purposes. Writing to OCRL with side effects enabled has the additional effects.
The size may optionally be preceded by an access type specification:
p
orlp
specifies a logical address defaulting to space 0 (program)d
orld
specifies a logical address defaulting to space 1 (data)i
orli
specifies a logical address defaulting to space 2 (I/O)3
orl3
specifies a logical address defaulting to space 3 (opcodes)pp
specifies a physical address defaulting to space 0 (program)pd
specifies a physical address defaulting to space 1 (data)pi
specifies a physical address defaulting to space 2 (I/O)p3
specifies a physical address defaulting to space 3 (opcodes)r
specifies direct read/write pointer access defaulting to space 0 (program)o
specifies direct read/write pointer access defaulting to space 3 (opcodes)m
specifies a memory region
Finally, this may be preceded by a tag and/or address space name
followed by a dot (.
).
That may seem like a lot to digest, so let’s look at the simplest examples:
b@<addr>
Refers to the byte at <addr> in the program space of the current CPU while suppressing side effects
b!<addr>
Refers to the byte at <addr> in the program space of the current CPU, not suppressing side effects such as reading a mailbox clearing the pending flag, or reading a FIFO removing an item
w@<addr>
andw!<addr>
Refer to the word at <addr> in the program space of the current CPU, suppressing or not suppressing side effects, respectively.
d@<addr>
andd!<addr>
Refer to the double word at <addr> in the program space of the current CPU, suppressing or not suppressing side effects, respectively.
q@<addr>
andq!<addr>
Refer to the quadruple word at <addr> in the program space of the current CPU, suppressing or not suppressing side effects, respectively.
Adding access types gives additional possibilities:
dw@300
Refers to the word at 300 in the data space of the current CPU while suppressing side effects
id@400
Refers to the double word at 400 in the I/O space of the current CPU while suppressing side effects
ppd!<addr>
Refers to the double word at physical address <addr> in the program space of the current CPU while not suppressing side effects
rw@<addr>
Refers to the word at address <addr> in the program space of the current CPU using direct read/write pointer access
If we want to access an address space of a device other than the current CPU, an address space beyond the first four indices, or a memory region, we need to include a tag or name:
ramport.b@<addr>
Refers to the byte at address <addr> in the
ramport
space of the current CPUaudiocpu.dw@<addr>
Refers to the word at address <addr> in the data space of the CPU with absolute tag
:audiocpu
maincpu:status.b@<addr>
Refers to the byte at address <addr> in the
status
space of the CPU with the absolute tag:maincpu
monitor.mb@78
Refers to the byte at 78 in the memory region with the absolute tag
:monitor
..md@202
Refers to the double word at address 202 in the memory region with the same tag path as the current CPU.
Some combinations are not useful. For example physical and logical
addresses are equivalent for some CPUs, and direct read/write pointer
accesses never have side effects. Accessing a memory region (m
access type) requires a tag to be specified.
Memory accesses can be used as both lvalues and rvalues, so you can write
b@100 = ff
to store a byte in memory.
Functions¶
The debugger supports a number of useful utility functions in expressions.
- min(<a>, <b>)
Returns the lesser of the two arguments.
- max(<a>, <b>)
Returns the greater of the two arguments.
- if(<cond>, <trueval>, <falseval>)
Returns <trueval> if <cond> is true (non-zero), or <falseval> otherwise. Note that the expressions for <trueval> and <falseval> are both evaluated irrespective of whether <cond> is true or false.
- abs(<x>)
Reinterprets the argument as a 64-bit signed integer and returns the absolute value.
- bit(<x>, <n>[, <w>])
Extracts and right-aligns a bit field <w> bits wide from <x> with least significant bit position <n>, counting from the least significant bit. If <w> is omitted, a single bit is extracted.
- s8(<x>)
Sign-extends the argument from 8 bits to 64 bits (overwrites bits 8 through 63, inclusive, with the value of bit 7, counting from the least significant bit).
- s16(<x>)
Sign-extends the argument from 16 bits to 64 bits (overwrites bits 16 through 63, inclusive, with the value of bit 15, counting from the least significant bit).
- s32(<x>)
Sign-extends the argument from 32 bits to 64 bits (overwrites bits 32 through 63, inclusive, with the value of bit 31, counting from the least significant bit).
Source-level debugging¶
Source-level debugging allows you to step through and reference symbols from the original source of a program, rather than the disassembly. This feature is intended for use when emulating a vintage microcomputer and running software on that emulated machine for which you have access to the original source code. For example, if you are developing new software for a vintage machine, source-level debugging allows you to view your own source code while debugging.
Enabling source-level debugging¶
You will need to generate a MAME Debugging Information File. You can then enable source-level debugging by launching MAME with the -src_debug_info command-line option. You may also want to specify -src_debug_search_path and / or -src_debug_prefix_map.
Once source-level debugging is enabled, you will then be able to access the Options menu, Show Source command from the main debugger window. You may switch back and forth between source and disassembly view, and open separate disassembly windows even while the main window shows source.
The source view has a drop-list at the top for selecting which source file to view. The list is populated with source file paths from the MAME Debugging Information File. When the debugger is paused, you can change the selection to view the file you wish. When stepping, the selection is automatically updated to show the file associated with the current PC address.
Source-level breakpoints¶
The expression evaluator includes syntax for specifying a source
file and line number. This can be used in conjunction with the
bpset command to set
a breakpoint on a particular line number,
rather than specifying the address manually. Source-level symbols
evaluate to addresses, and so may also be used in bpset
commands.
When specifying a
source file, you may specify either the full path to the source file
as it existed on the machine that built the binary, the full path
to the source file as it exists on the host running MAME, or any
non-ambiguous substring at the end of the full path (such as
just the filename). The filename or path is surrounded by single
backticks `
, with the line number immediately following
the closing backtick.
Examples:
bpset `c:\full\path\to\file.c`3
Set a breakpoint on the first instruction associated with file.c, line 3
bpset `file.c`3
Set a breakpoint on the first instruction associated with file.c, line 3. Note the use of just the filename instead of a full path. This is fine so long as it is unambiguous relative to the other paths present in the MAME Debugging Information File.
print `to\file.c`3
Print the address of the first instruction associated with file.c, line 3. Note the use of the path ending rather than the full path. This is fine so long as it is unambiguous.
bpset Main
Set a breakpoint on the label named
Main
When the main window is showing source, you may also click on any source line and hit F9 to set a breakpoint on that line.
Source-level stepping¶
The stepping commands step
, over
, and out
have source-level debugging
versions steps,
overs, and
outs, respectively, which operate
at the source level rather than at the disassembly level.
To take step
as an example, if
the PC points to an address associated with line 4, step
(with
no parameters) will advance
to the next instruction, whereas steps
will advance to the next instruction
that is associated with a source line other than 4. When the original
source is assembly language, step
and steps
generally behave the same.
But when the original source is in a higher-level language like C or BASIC, steps
executes the remainder of a block of instructions associated with
the current source line.
When executing Step Into, Step Over, and Step Out from the menu or keyboard
shortcuts, the behavior depends on what the main window is showing.
If the main window shows disassembly, then step
, over
, or out
would be invoked. If the main window shows source,
steps
, overs
, or outs
would be invoked.
Source-level symbol evaluation¶
MAME defines its own built-in symbols based on the current
CPU (e.g., register names) and from its global symbol table
(e.g., beamx
, frame
, names of built-in functions, etc.).
When source-level debugging is enabled, symbols from the source code will
be imported from the MAME Debugging Information File and added
to the list of symbols recognized by MAME's expression evaluator.
A symbol from the MAME Debugging Information File representing a source-code
variable evaluates to the variable's address, not its value. So,
for example, a 16-bit C compiler that compiles int harry = 4;
will
create a symbol named harry
whose value is the address at which
the variable is stored. To view the value you would issue a command
to the debugger console such as print w@harry
.
Symbol collisions and priority: It is possible that source-level symbols present in the MAME Debugging Information File will conflict with built-in symbols. When source-level debugging is enabled, source-level symbols take precedence. You may always force a reference to the built-in symbol by prefixing the symbol with
ns\
("not source"). For example, suppose the MAME Debugging Information File includes the source-level symboly
which conflicts with the symbol for the registery
. References toy
will be interpreted as the source-level symboly
. References tons\y
, such as:print ns\y
print b@ns\y
wpset ns\y
will be interpeted as the register
y
.Case sensitivity: When the debugger evaluates expressions, symbols are generally interpreted case-insensitively when source-level debugging is not active. Many programming languages treat symbols case-sensitively. This can lead to source-level symbols like
Foo
andfoo
being present in the MAME Debugging Information File simultaneously. When source-level debugging is active and MAME's expression evaluator encounters a symbol, it will give precendence to a case-sensitive match. If no such match is found, it will look for a case-insensitive match.
Address offsets¶
In some cases, the assembler or compiler does not know where the generated code will be loaded at run-time. For example, the program might be written in position-independent code to allow an operating system to decide where to load the code at run-time. In such cases, the assembler or compiler will be unable to provide the correct addresses in the MAME Debugging Information File. A build tool that supports run-time relocation could then start numbering its addresses at 0, assuming that the operating system will apply the appropriate offset when the binary is loaded. Similarly, the MAME debugger may also apply an offset to the addresses from the debugging info so that the resulting addresses match where the code is loaded at run-time. You can do this in two ways:
On the command-line, specify -src_debug_offset with the offset to apply. This approach makes sense if the code to be debugged is reliably loaded at an offset you can predict.
At any time during the debugging of the program, use the command sdoffset from the debugger console with the offset to apply. This approach can be used by users or LUA scripts that need to inspect memory to determine where the program was loaded. Any breakpoints set before
sdoffset
was executed will need to be removed and re-added so the new offset can be applied. Any source-level symbols loaded from the MAME Debugging Information File will automatically evaluate to values with the new offset the next time a command references them.
Generating MAME Debugging Information Files¶
MAME Debugging Information Files (or .mdi
files) are generated by
assemblers or compilers that target machines emulated by MAME. Currently
.mdi
files adhere to a single binary format called "Simple". In the
future, new formats may be created as the need arises. The Simple format includes:
Full or relative paths to the source files input to the build tool
Mappings from source file and line numbers to blocks of 16-bit addresses where the corresponding instructions reside
Mappings from symbol names to 16-bit addresses
These symbols can either be global or scoped
Scoped symbols can either have fixed values or values dependent on register values (e.g., stack local variables)
The recommended way for build tools to generate .mdi
files is to use
the small static library libmame_srcdbg_static.a
. The source code resides
in the MAME tree at src/lib/srcdbg
, and the library gets built to
a configuration-specific subdirectory, such as
build/linux_gcc/bin/x64/Release/libmame_srcdbg_static.a
. The header
file srcdbg_api.h
includes declarations and documentation for the C
functions comprising the API, along with comments that describe how to use
them.
Danger
Tools must not rely on any functionality
other than that declared in srcdbg_api.h
and srcdbg_format.h
. Other files will
change without warning.
Tools written in C++ can include srcdbg_api.h
and link to
libmame_srcdbg_static.a
without any further makefile changes. Since
the library's API is pure C, tools
written in C can also include srcdbg_api.h
and link to
libmame_srcdbg_static.a
, but will need to add the C++ standard library
to the link line (as the library's implementation is C++). For example,
cc -m64 -o mytool mytool.c -L/path/to/lib/dir -lmame_srcdbg_static -lstdc++
.
Note that -lstdc++
must appear at the end.
Tools not written in C or C++ may be able to use the shared library
libmame_srcdbg_shared.so
or mame_srcdbg_shared.dll
, assuming the tool
is written in a language that supports interfacing with shared libraries.
Linux shared library versioning
On Linux the shared library follows the recommended versioning names, with
the initial version of the library's real name being
libmame_srcdbg_shared.so.1.0
and soname being libmame_srcdbg_shared.so.1
.
MAME does not have a setup program, so the responsibility is on tools
redistributing the shared library to perform the usual shared library installation
steps. For example, a tool might want to insulate itself from the machine's
environment, and tuck its own copy of the shared library into a tool-specific
folder, and use the -rpath
linker option to declare where the tool can
find libmame_srcdbg_shared.so
at run-time. Alternatively, a tool could
install libmame_srcdbg_shared.so
into a machine-wide folder like
/usr/lib
and run sudo ldconfig
to register the library and create
a symbolic link from the soname to the real name. In any case, the tool
will need to manually create a symbolic link from the linker name
libmame_srcdbg_shared.so
to either the soname
(if ldconfig
was run) or directly to the real
name to ensure that the build-time linker can find libmame_srcdbg_shared.so
.
Before proceeding with any of these options, it's recommended that you read up
on Linux shared library versioning, for example:
https://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html
If consuming neither the static nor shared version of the MAME srcdbg library is
feasible, tools may also manually generate the binary format directly. The format
is defined in src/lib/srcdbg/srcdbg_format.h
. Because this is error-prone,
tools should prefer using the static or shared library over generating the binary
format directly.
Viewing MAME Debugging Information Files with srcdbgdump¶
A small console executable, srcdbgdump is built along with other MAME tools. It may be used to view the contents of MAME Debugging Information files.