Open main menu

Difference between revisions of "OpenTasks/Tools/Game script decompiler"

m
(→‎The Task:: Correct link to accessible target)
 
(5 intermediate revisions by 4 users not shown)
Line 1: Line 1:
{{Infobox_OpenTasks|
{{Infobox_ClosedTasks|
taskname=Game script (bytecode) decompiler|
taskname=Game script (bytecode) decompiler|
gsocworkload=100%|
gsocworkload=100%|
techcontact=[[User:Fingolfin|Max Horn]], [[User:LordHoto|Johannes Schickel]]|
techcontact=[[User:LordHoto|Johannes Schickel]]|
subsystem=Tools|
subsystem=Tools|
taskstatus=GSOC 2007, 2009, 2010, Partially completed
}}
}}


__TOC__
__TOC__


<span style="color:red">TODO: This task was partially implemented during GSoC 2010. For the first time, it seems we have something that we really can use. However, for now only stack based bytecodes are "officially" supported, and more work is still needed, plus this has not been merged into trunk yet. Thus, we should keep this task's description around for the time being.</span>
<span style="color:red">TODO: This task was partially implemented during GSoC 2010. For the first time, it seems we have something that we really can use. However, for now only stack-based bytecodes are "officially" supported, and more work is still needed, plus this has not been merged into trunk yet. Thus, we should keep this task's description around for the time being.</span>


===Background:===
===Background:===


Most adventure engines ScummVM supports are driven by bytecode scripts, which control the game behavior. For various reasons (in particular, debugging), it's very useful to be able to take such a script and decompile it into something resembling regular code (with variables, loops, if statements etc.). We have a tool for doing this for the bytecode used by SCUMM games. This tool is called descumm. We also have similar (but less advanced) tools for various other engines, in our [http://scummvm.svn.sourceforge.net/viewvc/scummvm/tools/trunk tools module in SVN].
Most adventure engines ScummVM supports are driven by bytecode scripts, which control the game behavior. For various reasons (in particular, debugging), it's very useful to be able to take such a script and decompile it into something resembling regular code (with variables, loops, if statements etc.). We have a tool for doing this for the bytecode used by SCUMM games. This tool is called descumm. We also have similar (but less advanced) tools for various other engines, in our [https://github.com/scummvm/scummvm-tools tools module in git].


Currently, descumm uses a relatively crude heuristic to detect "if" statements, loops and so on. In particular, it only supports loops with the condition at the start (think "while(...) { ... }" ), not such with the condition at the end (as in "do { ... } while(...)" and also suffers from various other limitations (like not being able to correctly recognize "continue" and "break" statements.
Currently, descumm uses a relatively crude heuristic to detect "if" statements, loops and so on. In particular, it only supports loops with the condition at the start (think "while(...) { ... }" ), not such with the condition at the end (as in "do { ... } while(...)" and also suffers from various other limitations (like not being able to correctly recognize "continue" and "break" statements.


The whole thing is somewhat complicated by the fact that there are two main versions of this bytecode: V5 and older (register based bytecode), and V6 and newer (stack based). We handle the registed based code quite well, but not so the stack based one. And in fact most non-SCUMM engines use a stack based virtual machine.
The whole thing is somewhat complicated by the fact that there are two main versions of this bytecode: V5 and older (register-based bytecode) and V6 and newer (stack-based). We handle the register-based code quite well, but not so the stack based one. And in fact most non-SCUMM engines use a stack-based virtual machine.


===The Task:===
===The Task:===


Write a ''generic'' bytecode decompiler with the goal of ultimately being able to decompile bytecode of all games which use a stack based bytecode virtual machine. You can look at existing work from GSoC 2007, our descumm compiler (only SCUMM v6 and newer is stack based) and some other de*.cpp files in our [http://scummvm.svn.sourceforge.net/viewvc/scummvm/tools/trunk/ tools SVN module]. Your goal should be to support at least two different bytecode systems (i.e., from two different ScummVM engines). But everything should be designed and implemented with the ultimate goal of being able to support all such bytecode systems (e.g. by providing a suitable API or subclassing facilities that make it possible to hook in parsers for arbitrary bytecode variants). Engines which use a stack based virtual machine include SCUMM v6+, KYRA, SWORD1, SWORD2, ... For many of these, free game demos are available, too. We would help you by providing dumps of scripts etc.
Write a ''generic'' bytecode decompiler with the goal of ultimately being able to decompile bytecode of all games which use a stack-based bytecode virtual machine. You can look at existing work from GSoC 2007, our descumm compiler (only SCUMM v6 and newer is stack-based) and some other de*.cpp files in our [https://github.com/scummvm/scummvm-tools tools git module]. Your goal should be to support at least two different bytecode systems (i.e., from two different ScummVM engines). But everything should be designed and implemented with the ultimate goal of being able to support all such bytecode systems (e.g. by providing a suitable API or subclassing facilities that make it possible to hook in parsers for arbitrary bytecode variants). Engines which use a stack-based virtual machine include SCUMM v6+, Kyra, Sword1, Sword2, ... For many of these, free game demos are available, too. We would help you by providing dumps of scripts etc.


A good starting point for this might be the [http://jode.sourceforge.net/ Jode] Java bytecode decompiler which does a rather good job at decompiling. Another useful site might be [http://www.program-transformation.org/Transform/DeCompilation], and [http://vanemmerikfamily.com/mike/master.pdf this thesis]. The technical contact has several ideas on how to approach this project, too -- as usual, you are expected to talk to us and ask for help and ideas!
A good starting point for this might be the [http://jode.sourceforge.net/ Jode] Java bytecode decompiler which does a rather good job at decompiling. Another useful site might be [http://www.program-transformation.org/Transform/DeCompilation], and [http://www.backerstreet.com/decompiler/vanEmmerik_ssa.pdf this thesis]. The technical contact has several ideas on how to approach this project, too -- as usual, you are expected to talk to us and ask for help and ideas!


The tool should run on at least Linux, Mac OS X and Windows. Acceptable languages include C/C++, Python, Perl -- other languages might be OK, but please consult with us first -- after all, other developers also will have to use and maintain your code. For the same reason it is obviously mandatory that your code be well documented.
The tool should run on at least Linux, Mac OS X and Windows. Acceptable languages include C/C++, Python, Perl -- other languages might be OK, but please consult with us first -- after all, other developers also will have to use and maintain your code. For the same reason it is obviously mandatory that your code be well documented.


Actually, we probably would prefer if it was written in C++ and was written as kind of a library, with the command line tool just a frontend to the library. This "library" approach would make it possible to include the decompiler into ScummVM itself, making it possible to decompile scripts on the fly from ScummVM's built-in debugger console.
Actually, we probably would prefer if it was written in C++ and was written as kind of a library, with the command line tool just a frontend to the library. This "library" approach would make it possible to include the decompiler into ScummVM itself, making it possible to decompile scripts on the fly from ScummVM's built-in debugger console.


===Note:===
===Note:===


This task was being worked on as part of the Google Summer of Code 2007 and again 2009, but unfortunately in both cases the result sadly is incomplete and never reached a satisfactory status. It is probably best to start from scratch, but you can find the [http://scummvm.svn.sourceforge.net/viewvc/scummvm/tools/branches/gsoc2007-decompiler/ code from 2007 here] and the [http://scummvm.svn.sourceforge.net/viewvc/scummvm/tools/branches/gsoc2009-decompiler/decompiler/ code from 2009 here].
This task was being worked on as part of the Google Summer of Code 2007 and again 2009, but unfortunately in both cases the result sadly is incomplete and never reached a satisfactory status. It is probably best to start from scratch, but you can find the [http://scummvm.svn.sourceforge.net/viewvc/scummvm/tools/branches/gsoc2007-decompiler/ code from 2007 here] and the [http://scummvm.svn.sourceforge.net/viewvc/scummvm/tools/branches/gsoc2009-decompiler/decompiler/ code from 2009 here].

Latest revision as of 14:19, 16 December 2014

Closed Task
Task Name Game script (bytecode) decompiler
Technical Contact(s) Johannes Schickel
Subsystem Tools
Status GSOC 2007, 2009, 2010, Partially completed

TODO: This task was partially implemented during GSoC 2010. For the first time, it seems we have something that we really can use. However, for now only stack-based bytecodes are "officially" supported, and more work is still needed, plus this has not been merged into trunk yet. Thus, we should keep this task's description around for the time being.

Background:

Most adventure engines ScummVM supports are driven by bytecode scripts, which control the game behavior. For various reasons (in particular, debugging), it's very useful to be able to take such a script and decompile it into something resembling regular code (with variables, loops, if statements etc.). We have a tool for doing this for the bytecode used by SCUMM games. This tool is called descumm. We also have similar (but less advanced) tools for various other engines, in our tools module in git.

Currently, descumm uses a relatively crude heuristic to detect "if" statements, loops and so on. In particular, it only supports loops with the condition at the start (think "while(...) { ... }" ), not such with the condition at the end (as in "do { ... } while(...)" and also suffers from various other limitations (like not being able to correctly recognize "continue" and "break" statements.

The whole thing is somewhat complicated by the fact that there are two main versions of this bytecode: V5 and older (register-based bytecode) and V6 and newer (stack-based). We handle the register-based code quite well, but not so the stack based one. And in fact most non-SCUMM engines use a stack-based virtual machine.

The Task:

Write a generic bytecode decompiler with the goal of ultimately being able to decompile bytecode of all games which use a stack-based bytecode virtual machine. You can look at existing work from GSoC 2007, our descumm compiler (only SCUMM v6 and newer is stack-based) and some other de*.cpp files in our tools git module. Your goal should be to support at least two different bytecode systems (i.e., from two different ScummVM engines). But everything should be designed and implemented with the ultimate goal of being able to support all such bytecode systems (e.g. by providing a suitable API or subclassing facilities that make it possible to hook in parsers for arbitrary bytecode variants). Engines which use a stack-based virtual machine include SCUMM v6+, Kyra, Sword1, Sword2, ... For many of these, free game demos are available, too. We would help you by providing dumps of scripts etc.

A good starting point for this might be the Jode Java bytecode decompiler which does a rather good job at decompiling. Another useful site might be [1], and this thesis. The technical contact has several ideas on how to approach this project, too -- as usual, you are expected to talk to us and ask for help and ideas!

The tool should run on at least Linux, Mac OS X and Windows. Acceptable languages include C/C++, Python, Perl -- other languages might be OK, but please consult with us first -- after all, other developers also will have to use and maintain your code. For the same reason it is obviously mandatory that your code be well documented.

Actually, we probably would prefer if it was written in C++ and was written as kind of a library, with the command line tool just a frontend to the library. This "library" approach would make it possible to include the decompiler into ScummVM itself, making it possible to decompile scripts on the fly from ScummVM's built-in debugger console.

Note:

This task was being worked on as part of the Google Summer of Code 2007 and again 2009, but unfortunately in both cases the result sadly is incomplete and never reached a satisfactory status. It is probably best to start from scratch, but you can find the code from 2007 here and the code from 2009 here.