Dreammaster (talk | contribs) m (Spelling, grammar, and wording fixes) |
Dreammaster (talk | contribs) m (Mention IDA Free 5.0 must be run in Administrator mode.) |
(22 intermediate revisions by 9 users not shown) | |||
Line 3: | Line 3: | ||
== Resources == | == Resources == | ||
[ | [https://downloads.scummvm.org/frs/extras/IDA/idafree50.exe IDA Freeware Version 5.0] - IDA is the preferred tool for disassembling old games from scratch. The most recent freeware version no longer supports disassembling DOS games, but this earlier version still supports it. Note: In more recent Windows versions, you will need to run it in Administrator mode, or it will error out on startup. | ||
IDA is | |||
[https://ghidra-sre.org Ghidra] Ghidra is an open source alternative to IDA that can be used for disassembling old games. It is not as mature as IDA and is missing some features, but it has a nice decompiler. | |||
[http://vogons.zetafleet.com/viewtopic.php?t=7323 DosBox Debugger] | [http://vogons.zetafleet.com/viewtopic.php?t=7323 DosBox Debugger] | ||
The DosBox Debugger is an invaluable tool for running old DOS games, to monitor how the program executes, and what values are generated by the executing code. | The DosBox Debugger is an invaluable tool for running old DOS games, to monitor how the program executes, and what values are generated by the executing code. | ||
[https://imhex.werwolv.net/ ImHex Open Source Hex Editor] | |||
A powerful and flexible hex editor for all OSes and the Web. | |||
[http://www.chmaas.handshake.de/delphi/freeware/xvi32/xvi32.htm XVI32 Hex File Viewer] | [http://www.chmaas.handshake.de/delphi/freeware/xvi32/xvi32.htm XVI32 Hex File Viewer] | ||
A very primitive hex editor compared to ImHex, but may be useful if you just need a quick and simple way to view a file's contents and make changes. | |||
[http://www.ctyme.com/rbrown.htm Ralf Brown's Interrupt List] | [http://www.ctyme.com/rbrown.htm Ralf Brown's Interrupt List] | ||
Line 17: | Line 21: | ||
[http://en.wikipedia.org/wiki/X86_assembly_language 8086 Assembly Language] | [http://en.wikipedia.org/wiki/X86_assembly_language 8086 Assembly Language] | ||
For those new to 8086 assembly language, you'll need a handy reference to learn the syntax. The Wikipedia article is a good starting point, but you can also simply Google for an introduction. | For those new to 8086 assembly language, you'll need a handy reference to learn the syntax. The Wikipedia article is a good starting point, but you can also simply Google for an introduction. | ||
[http://beginners.re/ "Reverse Engineering for Beginners" free book] | |||
This site has a free eBook that may be useful as a gentle introduction to reverse engineering techniques in general. | |||
[http://godbolt.org/ Compiler Explorer] | |||
A pretty cool online tool that lets you paste in C code and shows you the compiled assembly under various compilers. Useful if you're familiar with C, and want to see what kinds of assembly are produced for different code fragments. | |||
[https://www.frida.re/ FRIDA - Dynamic Instrumentation Framework] | |||
A nice tool for easily writing and injecting hooks into game binaries. Useful for watching how code executes, checking when internal functions are called, dumping structures from the memory of a target process, and changing data in memory on the fly. Works for newer game binaries (32-bit and Windows XP). | |||
[http://www.bttr-software.de/products/insight/ Insight - real-mode DOS debugger] | |||
May prove useful as an alternative to the DosBox debugger. | |||
== Using the DosBox Debugger == | == Using the DosBox Debugger == | ||
It's up to the individual if you want to use a debugger when reverse engineering a program. Some prefer a more cerebral challenge of only figuring out code execution using a decompiler tool, whereas others may find using a debugger useful for figuring out what values are passed to functions. I would recommend using a debugger particularly when reversing a game for the purpose of adding ScummVM support. When you start implementing code to implement game functionality, once you've got portions of the game disassembled, it can be immensely useful for tracking down bugs. Particularly if you initially | It's up to the individual if you want to use a debugger when reverse engineering a program. Some prefer a more cerebral challenge of only figuring out code execution using a decompiler tool, whereas others may find using a debugger useful for figuring out what values are passed to functions. I would recommend using a debugger particularly when reversing a game for the purpose of adding ScummVM support. When you start implementing code to implement game functionality, once you've got portions of the game disassembled, it can be immensely useful for tracking down bugs. Particularly if you initially write your code with names that closely match the names you give the methods in the disassembly. | ||
For debugging purposes, if the game is a DOS game, the DosBox Debugger is the best tool I've found for executing and debugging DOS programs. The default distribution of DosBox doesn't have it enabled, but you can either compile DosBox with it enabled, or download a previously compiled executable. See the [http://vogons.zetafleet.com/viewtopic.php?t=7323 DosBox Debugger Thread] for more information. | For debugging purposes, if the game is a DOS game, the DosBox Debugger is the best tool I've found for executing and debugging DOS programs. The default distribution of DosBox doesn't have it enabled, but you can either compile DosBox with it enabled, or download a previously compiled executable. See the [http://vogons.zetafleet.com/viewtopic.php?t=7323 DosBox Debugger Thread] for more information. | ||
Line 53: | Line 69: | ||
=== Naming Methods === | === Naming Methods === | ||
Methods can be renamed using the general 'N' hotkey (as well as via the menus), and the 'Y' hotkey can be used to specify a C-like prototype for a method. This is particularly useful when some of the parameters for a method are passed using registers. Explicitly documenting what the method expects makes it easier to remember later on when you're reversing methods that call it. Standard methods where parameters are passed via the stack are easy, since IDA can automatically set up the function prototype for you. If a method does have parameters passed in registers, prototypes like the one below can be used: | Methods can be renamed using the general 'N' hotkey (as well as via the menus), and the 'Y' hotkey can be used to specify a C-like prototype for a method. This is particularly useful when some of the parameters for a method are passed using registers. Explicitly documenting what the method expects makes it easier to remember later on when you're reversing methods that call it. Standard methods where parameters are passed via the stack are easy, since IDA can automatically set up the function prototype for you. If a method does have parameters passed in registers, prototypes like the one below can be used: | ||
< | <syntaxhighlight lang="c"> | ||
int __usercall sub_100FB<ax>(__int8 param1<al>, int param2<bx>) | int __usercall sub_100FB<ax>(__int8 param1<al>, int param2<bx>) | ||
</ | </syntaxhighlight> | ||
In this case, the method takes an 8-bit parameter in the al register, and another 16-bit value in bx, then returns a result in ax. | In this case, the method takes an 8-bit parameter in the al register, and another 16-bit value in bx, then returns a result in ax. | ||
Line 63: | Line 79: | ||
When dealing with data, you'll frequently see cases like | When dealing with data, you'll frequently see cases like | ||
< | <syntaxhighlight lang="asm"> | ||
mov bx, 30h | mov bx, 30h | ||
mul bx | mul bx | ||
mov bx, ax | mov bx, ax | ||
mov ax, [bx+2D00h] | mov ax, [bx+2D00h] | ||
</ | </syntaxhighlight> | ||
In this case, an initial index in the ax register is multiplied by 30h (30 hexadecimal = 48 decimal). So from this we can determine that the given structure is 48 bytes in size, and can create a new structure accordingly. For smaller sized structures, you may want to create as many 2 byte word fields as needed to make up the correct size for the structure. For larger sizes, the easiest way is to simply declare an array of the needed structure size - 1, and follow it with a single byte field. You can then delete/undefine the array. The remaining byte will keep the structure at the correct size, and you can then later fill in the fields as you find references to them. | In this case, an initial index in the ax register is multiplied by 30h (30 hexadecimal = 48 decimal). So from this we can determine that the given structure is 48 bytes in size, and can create a new structure accordingly. For smaller sized structures, you may want to create as many 2 byte word fields as needed to make up the correct size for the structure. For larger sizes, the easiest way is to simply declare an array of the needed structure size - 1, and follow it with a single byte field. You can then delete/undefine the array. The remaining byte will keep the structure at the correct size, and you can then later fill in the fields as you find references to them. | ||
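As a rough sketch of how the same idea might carry over to a reimplementation, the 48-byte record can be declared with placeholder fields and the offset calculation mirrored in code. The names `Record`, `kTableOffset` and `recordOffset` are made up for illustration, and the field layout is purely an assumption until real references to the fields are found.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 48-byte record matching the 30h multiplier seen in the
// disassembly. Field names are placeholders until their use is known.
#pragma pack(push, 1)
struct Record {
    uint16_t field_0;      // the word read by "mov ax, [bx+2D00h]"
    uint8_t  unknown[45];  // not yet identified
    uint8_t  lastByte;     // final byte, keeps the structure at full size
};
#pragma pack(pop)

static_assert(sizeof(Record) == 0x30, "Record must be 48 bytes");

// Start of the array within the data segment (taken from the example above)
const uint16_t kTableOffset = 0x2D00;

// Offset within the data segment of the record at the given index,
// mirroring the "index * 30h + 2D00h" calculation in the assembly.
uint16_t recordOffset(uint16_t index) {
    return static_cast<uint16_t>(kTableOffset + index * sizeof(Record));
}
```

With this, `recordOffset(1)` gives 2D30h, the start of the second record, which is handy for cross-checking addresses seen in the debugger.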
Line 83: | Line 99: | ||
=== File Access === | === File Access === | ||
One of the easiest places to start a disassembly is generally by identifying file accesses. Using IDA, you can, for example, do a text search for 'open | One of the easiest places to start a disassembly is generally by identifying file accesses. Using IDA, you can, for example, do a text search for 'open', 'read', 'close', etc. to find occurrences of file opening. IDA provides standard comments for many operating system calls, so even in a new disassembly you should be able to locate such calls by their comment text. Likewise for file reading, writing, and closing. Normally, a program will encapsulate these calls into a method of its own, so your first disassembly step can be identifying the methods and naming them appropriately with names like 'File_open', 'File_read', and so on, and likewise giving the passed parameters appropriate names. In IDA, the 'Y' command can be used to set up an appropriate method signature for methods. Properly naming the method and its parameters will help you in all the methods that call those methods. | ||
For example, if a read method has a 'size' parameter and a 'buffer' parameter, then if a method that calls it passes '200' for the size, and a reference from a location on the stack, you can be confident that the stack entry can be called something like 'readBuffer', and use the '*' (array size) key when looking at the Stack View (Ctrl-K) to set the size of the array to 200 bytes. | For example, if a read method has a 'size' parameter and a 'buffer' parameter, then if a method that calls it passes '200' for the size, and a reference from a location on the stack, you can be confident that the stack entry can be called something like 'readBuffer', and use the '*' (array size) key when looking at the Stack View (Ctrl-K) to set the size of the array to 200 bytes. | ||
Line 101: | Line 117: | ||
=== Graphics access === | === Graphics access === | ||
Another place to get started on the disassembly is the graphic draw routines, those responsible for copying raw pixels to the screen surface. Graphic display | Another place to get started on the disassembly is the graphic draw routines, those responsible for copying raw pixels to the screen surface. | ||
be used to represent different parts of pixels - | |||
Graphic display was complicated in the early PC days by different modes for the different graphics cards writing to memory in different ways. In the Monochrome/Hercules mode, for example, 8 pixels are stored per 8-bit byte. In EGA, the addressing can be complicated by how the display is configured - the same areas of memory may be used to represent different parts of pixels - with the part of a pixel being updated depending on specific values sent to hardware ports. Finally, of them all, the most common 320x200x256 color mode is the easiest to deal with, with each pixel taking up a single byte. | |||
For most of the graphics modes, you can look at them in a similar manner - as a block of data in memory starting at offset A000h:0. Only the number of bytes per line will vary, depending on what the graphics mode is. Assembly routines that deal with the graphics screen will typically have code to figure out screen offsets based on provided x and y parameters, so it will frequently be easy to identify the parameters and figure out how the screen offsets work. For example, in 320x200x256 MCGA mode, an offset on the screen will be calculated using the formula (y * 320) + x. | |||
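The offset calculation can be sketched in code as follows. The 256 color formula is the one given above; the 8-pixels-per-byte variant is only an illustration that assumes the leftmost pixel sits in the most significant bit and that lines are stored one after another with no interleaving, which won't hold for every mode. The function and type names are hypothetical.

```cpp
#include <cstdint>

// Byte offset of pixel (x, y) in 320x200x256 mode: one byte per pixel,
// 320 bytes per line.
uint16_t offset256(int x, int y) {
    return static_cast<uint16_t>(y * 320 + x);
}

// Location of a pixel in a layout that packs 8 pixels into each byte.
struct PixelAddress {
    uint16_t offset;  // byte offset from the start of screen memory
    uint8_t mask;     // bit within that byte that belongs to the pixel
};

// Assumes the leftmost pixel is the most significant bit and that lines
// are stored consecutively (no interleaving).
PixelAddress address8PerByte(int x, int y, int bytesPerLine) {
    PixelAddress result;
    result.offset = static_cast<uint16_t>(y * bytesPerLine + x / 8);
    result.mask = static_cast<uint8_t>(0x80 >> (x % 8));
    return result;
}
```

For example, `offset256(10, 5)` is 1610, and with 40 bytes per line `address8PerByte(9, 2, 40)` lands on byte 81 with mask 40h.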
For | For finding the graphic routines you have two options: | ||
The first is to entirely use IDA, and simply search for immediate values of 'A000h'. Since this is the area of memory that graphics are commonly displayed in, it can be a quick way to locate graphic routines. | |||
The other alternative is to use the DosBox Debugger. It has a useful command called 'bpm' that allows you to set a memory breakpoint, which then gets triggered if the given memory address changes. So you could do 'bpm A000:0' to set a breakpoint on the first byte of the screen memory (i.e. the top left hand corner of the screen). Then whichever routine modifies it first will trigger the breakpoint. Using the | The other alternative is to use the DosBox Debugger. It has a useful command called 'bpm' that allows you to set a memory breakpoint, which then gets triggered if the given memory address changes. So you could do 'bpm A000:0' to set a breakpoint on the first byte of the screen memory (i.e. the top left hand corner of the screen). Then whichever routine modifies it first will trigger the breakpoint. Using the previously discussed techniques, you can find the same place in your IDA disassembly, and look into reversing that method first. | ||
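If you'd rather break on a specific pixel than on the top left corner, a small helper can work out the address for you. This is only a sketch assuming the 320x200x256 mode and the 'bpm segment:offset' form shown above with a hexadecimal offset; `linearAddress` and `bpmCommandForPixel` are hypothetical names.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Real-mode address translation: segment * 16 + offset.
uint32_t linearAddress(uint16_t segment, uint16_t offset) {
    return (static_cast<uint32_t>(segment) << 4) + offset;
}

// Builds a "bpm" command for pixel (x, y), assuming 320x200x256 mode
// with screen memory at segment A000h and a hexadecimal offset.
std::string bpmCommandForPixel(int x, int y) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "bpm A000:%X",
                  static_cast<unsigned>(y * 320 + x));
    return buf;
}
```

For pixel (10, 5) this produces `bpm A000:64A`, and `linearAddress(0xA000, 0)` is A0000h, the physical start of the graphics memory.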
It's likely that related functions will be next to each other, so once you've looked into the given identified function, you may also be able to review previous or following functions to see if they have identifiable graphic routines. | It's likely that related functions will be next to each other, so once you've looked into the given identified function, you may also be able to review previous or following functions to see if they have identifiable graphic routines. | ||
Line 137: | Line 156: | ||
Particularly for cases like that, identifying and naming the graphic/screen methods may be helpful, since you could work the disassembly from both the front end reading the animation, and from the low level drawing of the graphics of the animation. | Particularly for cases like that, identifying and naming the graphic/screen methods may be helpful, since you could work the disassembly from both the front end reading the animation, and from the low level drawing of the graphics of the animation. | ||
=== Practical Examples === | |||
Following are a few other examples of specific strategies I used to make initial progress in figuring out adventure games I've worked on in the past. They may help give you inspiration for your own work: | |||
==== The font system ==== | |||
I frequently like to tackle the font system for a game early on, since it's usually fairly straightforward to figure out, and it's a nice milestone for your own re-implementation of the game to be able to display visual text. | |||
* I started by taking a screenshot of the game that showed some text, and used Paint to determine the screen co-ordinates of one of the pixels in a character that was displayed on-screen. | |||
* I then calculated what the memory location of that pixel was in the screen. Presuming the fairly standard 320x200 8-bit graphics, the segment would be A000h, and the offset would be y * 320 + x. | |||
* Using the DosBox Debugger to run the game up to a point just before the text was displayed, I set a memory breakpoint for the pixel, and then allowed the game to start to display the text. | |||
* The point where the pixel was set was identified as the routine for drawing a character. It's fairly standard for it to take in an x and y position, and the specified character to display. I gave it an appropriate name like 'font_write_char'. | |||
* By examining how the character is used to look up the pixel data to copy to the screen, I was able to identify where the font was stored, and what structure it had. | |||
* By looking at other places that referenced the same memory I was able to identify other font related methods, such as for loading the font, or determining the width of a given character. And then some of the callers of these methods could then be identified. For example, one caller called the method in a loop for every character of a string, so I was able to call it 'font_string_width'. Further, since that method was now known to take in printable strings, I was able to see what called it and identify string handling in other parts of the game. | |||
* Similarly, I was able to identify a method that called font_write_char in a loop, and could name it 'font_write_string'. As with the identified font_string_width, I could then look at callers to font_write_string to identify how other parts of the system pass strings. | |||
* Because the display of information is so important for adventure games, the font_write_string method becomes an excellent starting point for all sorts of further investigations. For example, suppose that in your adventure game, trying to combine two arbitrary items prints "That doesn't work". You can set a breakpoint in the DosBox debugger at the end of the font_write_string method, and when it's hit by trying to combine the items, trace backwards to find out which method called it. | |||
* Based on a similar test, I was able to identify methods that worked with the inventory, and the lists in memory of all the available items as well as the list of items in the player's inventory. I then set up an appropriate structure for items. I didn't know all the fields they held initially, but some fields, such as a pointer to the textual name, were easy to figure out, and I could always come back to the structure later and add proper definition for other fields when I figured them out. I then went on to identify other methods that also access the inventory list and identified the methods that handle adding and removing items from the inventory. I was also able to see the method that gets the details of the inventory for displaying the inventory on-screen. This was extremely useful, since it used a standard "sprite draw" method to draw both the inventory area background as well as the glyphs for each item in the player's inventory. And with the sprite drawing method identified, there would be the potential for looking at all its callers and identifying methods that handle drawing in-game screens and dialogs. | |||
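To give an idea of where the font steps above can lead, here's a minimal sketch of the three font methods as they might look in a reimplementation. The `Font` layout (glyphs up to 8 pixels wide, one byte per row, leftmost pixel in the most significant bit) is purely an assumption for illustration; the real format will be whatever the disassembly reveals. The method names follow those used in the list.

```cpp
#include <cstdint>

const int kScreenWidth = 320;
const int kScreenHeight = 200;

// Hypothetical font layout: one byte per glyph row, leftmost pixel in the
// most significant bit. A real game's format will differ.
struct Font {
    uint8_t height;         // rows per glyph
    uint8_t widths[256];    // width in pixels of each character (max 8)
    const uint8_t *glyphs;  // 'height' bytes for each of the 256 characters
};

// Draws one character onto an 8-bit 320x200 surface, returns its width.
int font_write_char(uint8_t *screen, const Font &font, int x, int y,
                    unsigned char ch, uint8_t color) {
    const uint8_t *rows = font.glyphs + ch * font.height;
    for (int row = 0; row < font.height; ++row) {
        for (int col = 0; col < font.widths[ch] && col < 8; ++col) {
            int px = x + col, py = y + row;
            bool inside = px >= 0 && px < kScreenWidth &&
                          py >= 0 && py < kScreenHeight;
            if (inside && (rows[row] & (0x80 >> col)))
                screen[py * kScreenWidth + px] = color;
        }
    }
    return font.widths[ch];
}

// Sums the widths of every character in the string.
int font_string_width(const Font &font, const char *str) {
    int width = 0;
    for (; *str; ++str)
        width += font.widths[static_cast<unsigned char>(*str)];
    return width;
}

// Calls font_write_char in a loop, advancing x by each character's width.
void font_write_string(uint8_t *screen, const Font &font, int x, int y,
                       const char *str, uint8_t color) {
    for (; *str; ++str)
        x += font_write_char(screen, font, x, y,
                             static_cast<unsigned char>(*str), color);
}
```

Keeping the names the same as in the disassembly makes it easy to compare the two side by side when something doesn't draw correctly.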
In fact, even other inventory methods like adding and removing proved to be important too, as I was able to check their callers, and identify them as methods that were part of a script engine. These methods were indexed by a table of all possible script command handler methods, so I was able to see which method was using that list to identify the main script execution method. By setting breakpoints in the method and doing various actions in-game, I was able to start figuring out what the various script methods did, and based on that, identifying and naming the methods that they called. A pretty good result from a process that started with a single pixel on the screen. | |||
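The script execution method described above usually boils down to a table of handler methods indexed by opcode. The following is a minimal sketch of that shape; the opcodes, their operands and all of the names (`ScriptState`, `cmd_add_item`, `script_execute`, and so on) are invented for illustration and don't come from any particular game.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical interpreter state
struct ScriptState {
    const uint8_t *code;
    size_t size;
    size_t pos;
    int inventoryCount;
    bool finished;
};

typedef void (*ScriptHandler)(ScriptState &);

// Reads the next operand byte, ending the script if the data runs out.
static uint8_t script_read_byte(ScriptState &s) {
    if (s.pos >= s.size) {
        s.finished = true;
        return 0;
    }
    return s.code[s.pos++];
}

// Made-up commands: each one consumes its own operands.
static void cmd_end(ScriptState &s) { s.finished = true; }

static void cmd_add_item(ScriptState &s) {
    script_read_byte(s);  // item id, unused in this sketch
    if (!s.finished)
        ++s.inventoryCount;
}

static void cmd_remove_item(ScriptState &s) {
    script_read_byte(s);  // item id, unused in this sketch
    if (!s.finished && s.inventoryCount > 0)
        --s.inventoryCount;
}

// Table of all script command handlers, indexed by opcode.
static const ScriptHandler kHandlers[] = {cmd_end, cmd_add_item, cmd_remove_item};

// Main script execution method: fetch an opcode, dispatch via the table.
void script_execute(ScriptState &s) {
    const size_t count = sizeof(kHandlers) / sizeof(kHandlers[0]);
    while (!s.finished && s.pos < s.size) {
        uint8_t opcode = s.code[s.pos++];
        if (opcode >= count) {  // unknown opcode, stop rather than guess
            s.finished = true;
            break;
        }
        kHandlers[opcode](s);
    }
}
```

In the original executable the equivalent of `kHandlers` is typically a table of offsets in the data segment, which is what makes the main execution method easy to spot once a single handler is known.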
==== The Hotspot list ==== | |||
In adventure games, hotspots are areas of the screen that have an interactable item. Moving the mouse over the area causes a description of the hotspot to be displayed. Using a breakpoint in the write string method, I was able to see what the caller was. Then examining the caller to find out how the description came to be passed, I was able to figure out the structure of how the list of hotspots was stored in memory. This allowed me to create a Hotspot structure, and set up the array of hotspots in memory. From there I was able to go in two directions. Firstly, by identifying other methods that access the same hotspot list, then finding out which one set up the values, I was able to locate the hotspot loading code, which was part of the overall scene loading code. With scene loading identified, I could look at the other files being accessed, and the structures being loaded, and get further ideas of what information each scene contains. | |||
The other direction I was able to go in was hotspot interaction. By setting breakpoints in other methods that accessed the hotspot list and then trying to interact with a hotspot, I was able to identify the general method that handles actually doing item interactions. In this case, it turned out to use other fields of the Hotspot record, and then load a script and call it to be executed. Since I'd already identified the script execution method, it made it easier to realize that script data was being loaded, since it was immediately passed on to be executed. | |||
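A Hotspot structure along these lines, plus the lookup done when the mouse moves, might be sketched as below. The fields shown (a bounding rectangle, a description and a script number) are assumptions about what such a record commonly holds, not the layout of any specific game, and `hotspot_find` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical hotspot record; real games will have more (and other) fields.
struct Hotspot {
    int16_t left, top, right, bottom;  // screen area, right/bottom exclusive
    const char *description;           // text shown when the mouse is over it
    uint16_t scriptId;                 // script to run on interaction
};

// Returns the first hotspot containing the point, or nullptr if none does.
const Hotspot *hotspot_find(const Hotspot *list, size_t count, int x, int y) {
    for (size_t i = 0; i < count; ++i) {
        const Hotspot &h = list[i];
        if (x >= h.left && x < h.right && y >= h.top && y < h.bottom)
            return &list[i];
    }
    return nullptr;
}
```

The result of the lookup would then feed both paths described above: the description goes to the string drawing method, and the script number goes to the script execution method when the player interacts.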
==== Scene sprites ==== | |||
Scenes would be kind of boring if they didn't have any animation or action within them. As such, adventure games tend to have a main scene drawing method to draw in the background and then add in any sprites for extra items within the scene. I was able to take an item that's obviously animating, and then set a memory breakpoint on a pixel within it just before the new scene was loaded, and see what changes it. The first change was clearing the screen. Next was rendering of the scene background, which in itself was a useful diversion to figure how the scene specified the background to load, and what format backgrounds are stored in. Then the next change came from rendering the sprite onto the scene. In this, I confirmed that the scene items were using the already identified sprite drawing method. So by tracing out of the sprite drawing method (by setting a breakpoint on its 'retf' opcode and then running DosBox debugger until it's hit, you can then single step out of the method), I was able to identify the code looping to draw items within the scene, and the overall method that draws the contents of the scene for each frame. | |||
I was also able to start figuring out how the structures for the scene sprites were being stored in memory, with fields such as the position for where to draw the sprite, which set of sprites to use, and what frame number. The frame number is particularly useful: sprites in games generally collect multiple "frames" in a single sprite, representing all the possible ways the given sprite can be drawn. By seeing what method was called to get a specific frame to pass to the sprite drawing code, I was able to locate the methods that handle managing sprites, and flesh out both those and other nearby methods in the disassembly that handled loading a sprite from files. | |||
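Put together, the scene sprite handling described above might look something like the following in a reimplementation. It's only a sketch: the structure fields match the ones mentioned (position, sprite set, frame number), but the exact layout, the use of color 0 as transparent, and all of the names are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

const int kScreenWidth = 320;
const int kScreenHeight = 200;

// One way a sprite can be drawn; 0 is assumed to mean "transparent".
struct SpriteFrame {
    int width, height;
    std::vector<uint8_t> pixels;  // width * height bytes
};

typedef std::vector<SpriteFrame> SpriteSet;

// Hypothetical per-scene record for an item to be drawn.
struct SceneSprite {
    int16_t x, y;        // where to draw
    uint16_t spriteSet;  // which set of sprites to use
    uint16_t frame;      // which frame within the set
};

// Looks up the frame for a scene item, or nullptr if the indexes are bad.
const SpriteFrame *sprite_get_frame(const std::vector<SpriteSet> &sets,
                                    const SceneSprite &item) {
    if (item.spriteSet >= sets.size())
        return nullptr;
    const SpriteSet &set = sets[item.spriteSet];
    return item.frame < set.size() ? &set[item.frame] : nullptr;
}

// Copies non-transparent pixels to the screen, clipping at the edges.
void sprite_draw(uint8_t *screen, const SpriteFrame &frame, int x, int y) {
    for (int row = 0; row < frame.height; ++row) {
        for (int col = 0; col < frame.width; ++col) {
            int px = x + col, py = y + row;
            if (px < 0 || px >= kScreenWidth || py < 0 || py >= kScreenHeight)
                continue;
            uint8_t c = frame.pixels[row * frame.width + col];
            if (c)
                screen[py * kScreenWidth + px] = c;
        }
    }
}

// The per-frame loop that draws every item within the scene.
void scene_draw_sprites(uint8_t *screen, const std::vector<SpriteSet> &sets,
                        const std::vector<SceneSprite> &items) {
    for (size_t i = 0; i < items.size(); ++i) {
        const SpriteFrame *frame = sprite_get_frame(sets, items[i]);
        if (frame)
            sprite_draw(screen, *frame, items[i].x, items[i].y);
    }
}
```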
==== What Followed ==== | |||
In fact, at this point, the only thing I was missing was loading in an appropriate palette to be able to implement the basics of loading a game scene, drawing meaningful sprites within the scene, and having hotspots work. Presuming any other adventure game you look into has similar kinds of structures, this could lead to you having a concrete start at reimplementing the game, and enough methods figured out in the disassembly to pick any area of the game you want to work on. At this point you'd have the basics of scene loading, graphics rendering and sprite display, scene interaction, and the basics of actually using items. Maybe you could: | |||
* Figure out which sprite controls the player, and put memory breakpoints on the sprite's position and/or frame. From that, you could identify methods that control moving the character, and from there investigate the pathfinding logic the game uses, leading to you being able to have your re-implementation's character walking around the screen. | |||
* Flesh out more of the script system. Maybe concentrate on some of the first actions done in the game, so your reimplementation can likewise do them. | |||
* Look into the music/sound system. From strings in the data segment, or the filenames of specially loaded music drivers, you may be able to figure out where the sound code is loaded, and start looking into what commands the sound and/or music drivers have, and what sound formats are used. | |||
There are lots of areas you can delve into. Hopefully the suggestions above will help get you over the initial barrier of not being sure how to start, and assist you in getting started in your efforts. | |||
== Extending/Logging DosBox == | |||
Sometimes you'll want to be able to compare the operation of the original in DosBox against your implementation in ScummVM. One of the easiest ways to do this is to set up debug(..) calls in ScummVM to output relevant information, and hack the source code of DosBox to output similar information in the equivalent method in the original. This way, you can compare the output of the two in any file comparison tool, such as TortoiseGitMerge. To do this, the first step is for you to install and compile the DosBox source. You can find relevant guides on the DosBox website, and any help as needed in the DosBox forums. | |||
Once you're able to compile the source, the next step is to add in relevant debugging information when certain CS:IP locations are hit in the executable; if you've been reversing the game executable in IDA, you should be able to use it to figure out specific addresses for whatever method you're interested in. The appropriate place to add in logging information is in the CPU_Core_Normal_Run method's loop in core_normal.cpp. Here you can check for specific memory addresses having been reached, and output values from various registers. For example, consider the following example code fragment I once added: | |||
<pre> | |||
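// Only log when executing within the code segment of interest | |||
Bit32u segCS = SegValue(cs); | |||
if (segCS == 0x8a80) { | |||
    Bit32u segIP = reg_eip; | |||
    if (segIP == 0x2E9 || segIP == 0x307) | |||
        // Print a separator line at these locations | |||
        fprintf(stderr, "-----------------------------\n"); | |||
    else if (segIP == 0x3d0) { | |||
        // Print the low and high bytes of AX as a pair of hex values | |||
        Bit16u ax = reg_eax; | |||
        fprintf(stderr, "%.2x %.2x\n", ax & 0xff, ax >> 8); | |||
    } | |||
} | |||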
</pre> | |||
In this case, I was implementing the sound player from World of Xeen, and I needed to output the register/value pairs passed to a "write to adlib" method. So I added this fragment to print their values when the method was called, as well as an extra separator printed out when the "playSound" method was called, to separate the writes that initialize the Adlib card from those made after a sound effect call is made. | |||
When the changed DosBox was compiled, I went to a command prompt. Presuming the current directory on the C drive was the DosBox build directory, and the current directory (on the D drive) held the game, I used the command: | |||
<pre>c:dosbox 2> d:\temp\dosbox.txt</pre> | |||
The 2> causes a redirection of the stderr output to a file. Afterwards, I was able to compare the result with a similar output from ScummVM. With differences identified, I could set up a conditional breakpoint in ScummVM in the engine's adlib write method for the last call before wrong values started being written, and step through the code until I figured out what the engine was doing incorrectly. | |||
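On the ScummVM side, the main thing is to output lines in exactly the same format as the DosBox hack, so the two logs can be compared line by line. The sketch below shows a formatter matching the "%.2x %.2x" output used above, plus a small helper for finding the first line where two logs diverge if you don't have a comparison tool handy. Both `formatBytePair` and `firstDifference` are hypothetical helpers, not ScummVM or DosBox functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Formats two bytes the same way as the DosBox fragment: low byte of AX
// first, then the high byte, each as two lowercase hex digits.
std::string formatBytePair(uint8_t low, uint8_t high) {
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%.2x %.2x",
                  static_cast<unsigned>(low), static_cast<unsigned>(high));
    return buf;
}

// Returns the index of the first line that differs between two logs,
// or -1 if they're identical. A log that ends early counts as differing
// at the point where it ends.
int firstDifference(const std::vector<std::string> &a,
                    const std::vector<std::string> &b) {
    size_t common = a.size() < b.size() ? a.size() : b.size();
    for (size_t i = 0; i < common; ++i) {
        if (a[i] != b[i])
            return static_cast<int>(i);
    }
    return a.size() == b.size() ? -1 : static_cast<int>(common);
}
```

The index returned tells you how many writes matched, which is a convenient hit count to use for the conditional breakpoint.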
== Miscellaneous == | |||
This is a small section of miscellaneous tips and musings to help you as you reverse engineer games. | |||
* Don't get too bogged down in the minutiae of what methods do. It's sometimes best to just get a high level understanding and then implement your own code from scratch. For example, several games I've reversed had complicated logic as part of loading resources to allocate memory handles and reserve space in extended/expanded memory. Obviously this isn't something you need to worry about in your reimplementation anyway. It was enough to know the resource method returned a pointer to the data, and callers were responsible for calling a free method when they were done with it. So all the extra logic could be ignored in favor of simple C++ new byte[..] and delete[] calls. | |||
* In a similar vein, drawing code in EGA or earlier based games can be particularly labyrinthine. Since you'd be more likely to set up an 8-bit per pixel graphics surface when reimplementing such games, it may be simpler for you to experiment with your own implementation of drawing methods based on examining the raw data. EGA in particular had a nasty graphic layout, with single bytes representing parts of multiple pixels, and port usage to control what portion of the pixels' values were modified. This can make it complex to understand what the original drawing code is doing. And frequently data will be passed to such drawing methods using one pixel per byte, or two pixels per byte in any case. So taking a stab at it yourself first may end up saving time. | |||
* Visual Studio (but alas not IDA) has a nice feature where if you hold down Alt whilst dragging the cursor, you can select a rectangular area of text to copy. I've found this very helpful when copying blocks of static data to create arrays. If you've copied a raw block of memory from DosBox, you can paste it into Visual Studio and then Alt-drag the cursor down along the start of all the lines. Then you can start typing sequences of ', 0x' between the byte values and have them applied on all the lines. This saves a lot of time converting the data to a properly formatted C array. | |||
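As an illustration of the EGA tip above, if you do come across image data stored in the planar form, converting it to one pixel per byte yourself is fairly short. This sketch assumes four separate bit planes, with plane 0 supplying bit 0 of each color and the leftmost pixel in the most significant bit of each byte; a given game may well store its data differently, and `planarToChunky` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts four bit planes into one byte per pixel (values 0 to 15).
// planes[p] holds bit p of every pixel's color, 8 pixels per byte, with
// the leftmost pixel in the most significant bit (an assumption).
std::vector<uint8_t> planarToChunky(const uint8_t *const planes[4],
                                    size_t bytesPerPlane) {
    std::vector<uint8_t> out(bytesPerPlane * 8);
    for (size_t i = 0; i < bytesPerPlane; ++i) {
        for (int bit = 0; bit < 8; ++bit) {
            uint8_t color = 0;
            for (int p = 0; p < 4; ++p) {
                if (planes[p][i] & (0x80 >> bit))
                    color |= static_cast<uint8_t>(1 << p);
            }
            out[i * 8 + bit] = color;
        }
    }
    return out;
}
```

The resulting buffer can be copied straight onto an 8-bit per pixel surface, which sidesteps the port-based plane selection the original code had to deal with.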
== Final Words == | == Final Words == |