OpenTasks/Tools/Game script decompiler

From ScummVM :: Wiki
< OpenTasks
Revision as of 13:20, 8 March 2012 by Strangerke (talk | contribs) (Move decompiler to closed tasks)
Jump to navigation Jump to search
Closed Task
Task Name Game script (bytecode) decompiler
Technical Contact(s) Johannes Schickel
Subsystem Tools
Status N/A

TODO: This task was partially implemented during GSoC 2010. For the first time, it seems we have something that we really can use. However, for now only stack-based bytecodes are "officially" supported, and more work is still needed, plus this has not been merged into trunk yet. Thus, we should keep this task's description around for the time being.

Background:

Most adventure engines ScummVM supports are driven by bytecode scripts, which control the game behavior. For various reasons (in particular, debugging), it's very useful to be able to take such a script and decompile it into something resembling regular code (with variables, loops, if statements etc.). We have a tool for doing this for the bytecode used by SCUMM games. This tool is called descumm. We also have similar (but less advanced) tools for various other engines, in our tools module in git.

Currently, descumm uses a relatively crude heuristic to detect "if" statements, loops and so on. In particular, it only supports loops with the condition at the start (think "while(...) { ... }" ), not such with the condition at the end (as in "do { ... } while(...)" and also suffers from various other limitations (like not being able to correctly recognize "continue" and "break" statements.

The whole thing is somewhat complicated by the fact that there are two main versions of this bytecode: V5 and older (register-based bytecode) and V6 and newer (stack-based). We handle the register-based code quite well, but not so the stack based one. And in fact most non-SCUMM engines use a stack-based virtual machine.

The Task:

Write a generic bytecode decompiler with the goal of ultimately being able to decompile bytecode of all games which use a stack-based bytecode virtual machine. You can look at existing work from GSoC 2007, our descumm compiler (only SCUMM v6 and newer is stack-based) and some other de*.cpp files in our tools git module. Your goal should be to support at least two different bytecode systems (i.e., from two different ScummVM engines). But everything should be designed and implemented with the ultimate goal of being able to support all such bytecode systems (e.g. by providing a suitable API or subclassing facilities that make it possible to hook in parsers for arbitrary bytecode variants). Engines which use a stack-based virtual machine include SCUMM v6+, Kyra, Sword1, Sword2, ... For many of these, free game demos are available, too. We would help you by providing dumps of scripts etc.

A good starting point for this might be the Jode Java bytecode decompiler which does a rather good job at decompiling. Another useful site might be [1], and this thesis. The technical contact has several ideas on how to approach this project, too -- as usual, you are expected to talk to us and ask for help and ideas!

The tool should run on at least Linux, Mac OS X and Windows. Acceptable languages include C/C++, Python, Perl -- other languages might be OK, but please consult with us first -- after all, other developers also will have to use and maintain your code. For the same reason it is obviously mandatory that your code be well documented.

Actually, we probably would prefer if it was written in C++ and was written as kind of a library, with the command line tool just a frontend to the library. This "library" approach would make it possible to include the decompiler into ScummVM itself, making it possible to decompile scripts on the fly from ScummVM's built-in debugger console.


Note:

This task was being worked on as part of the Google Summer of Code 2007 and again 2009, but unfortunately in both cases the result sadly is incomplete and never reached a satisfactory status. It is probably best to start from scratch, but you can find the code from 2007 here and the code from 2009 here.