Difference between revisions of "Auto detection"

Latest revision as of 21:05, 8 April 2006

Introduction

This page discusses auto detection of games in ScummVM. It is mostly written from the perspective of the SCUMM engine, but much of what is said here should hold true for arbitrary engines.


Detection vs. Instantiation

Insight: 'detect' and 'create' are extremely similar. The main difference is in their precise input and output; what happens internally is actually very similar.

'create' receives some hints (gameid, platform, language), and using those plus MD5 data, has to figure out for a given path which (unique) game settings are appropriate for that path. In other words, it returns either a single game setting, or none at all.

'detect' is pointed at a directory, but given no further hints. It has to identify all (potential) games in that directory and return a list of hints usable by 'create' (i.e. gameid, platform, language).

Seeing that, we should be able to combine the core parts of the two algorithms. For a 'create', we essentially run a 'detect' and compare the given game hint with the game hint(s) returned by 'detect', and if a unique match exists, proceed with that.

As a further consequence, gameids actually become optional: In some cases (in particular, in case of an MD5 match), ScummVM is able to determine the precise game settings given only the path to the game data and nothing else!


Detection in the SCUMM engine

What was said in the previous section can be used to improve the SCUMM detection code. To this end, we have to restructure and unify some of our current data sets. Right now, we use MD5 checksums in the SCUMM engine for two different purposes:

  • Mapping an MD5 to a triple of (gameid, platform, language)
  • Mapping an MD5 to a specific game instance, i.e.
     (game desc, SCUMM/HE version, feature flags, platform, language, etc.)

The SCUMM engine is also troubled by the fact that it currently uses the gameid to get an initial guess for the game settings. However, that guess can be quite inaccurate; e.g. for Monkey Island, it defaults to SCUMM version 4, even though some versions of MI are version 5 games. This leads to some annoying problems.


So, instead of having a table that maps a gameid to a single game setting struct, we could use a one-to-many relation instead.

More specifically: For each gameid, we would allow multiple different game settings. To the end user, everything remains transparent, because each game id still maps to a unique (basic) game description. The "multiple settings" would be internal only.

We would then extend the MD5Table struct by (1) a way to specify which of the multiple variants for the game settings is meant, and (2) a field for an "extra description" string, like "CD", "Floppy", "VGA", etc. (see also the section on "Standardized descriptions" below).

With this, it would be possible to get rid of multiple_versions_md5_settings again. (Well, actually, my explanations here are probably a bit sketchy and incomplete. Feel free to ask me, Fingolfin, for clarifications).
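
To make the one-to-many idea a little more concrete, here is a minimal sketch of how the two tables could relate. All names in it (GameSettingsSketch, MD5EntrySketch, findSettingsByMD5, the "variant" keys) are made up for illustration, the checksums are placeholders rather than real MD5 values, and the real settings record has many more fields.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, heavily trimmed settings record (assumption, not the real struct).
struct GameSettingsSketch {
	const char *gameid;
	const char *variant;   // internal key only, never shown to the user
	int version;           // SCUMM version
};

// Several records may share one gameid: this is the one-to-many relation.
static const GameSettingsSketch kSettingsSketch[] = {
	{ "monkey", "v4",  4 },
	{ "monkey", "v5",  5 },
	{ "indy3",  "ega", 3 },
	{ "indy3",  "vga", 3 }
};

// Hypothetical MD5 table entry: (1) selects a variant, (2) carries the extra text.
struct MD5EntrySketch {
	const char *md5;       // placeholder strings below, not real checksums
	const char *gameid;
	const char *variant;
	const char *extra;     // e.g. "CD", "Floppy", "VGA"; may be empty
};

static const MD5EntrySketch kMD5TableSketch[] = {
	{ "placeholder-md5-monkey-cd", "monkey", "v5",  "CD"  },
	{ "placeholder-md5-indy3-ega", "indy3",  "ega", "EGA" }
};

// Resolves an MD5 to exactly one settings record, or returns nullptr.
// If 'extra' is non-null, it receives the extra description of the match.
const GameSettingsSketch *findSettingsByMD5(const char *md5, const char **extra) {
	const size_t numMD5 = sizeof(kMD5TableSketch) / sizeof(kMD5TableSketch[0]);
	const size_t numSettings = sizeof(kSettingsSketch) / sizeof(kSettingsSketch[0]);
	for (size_t i = 0; i < numMD5; ++i) {
		const MD5EntrySketch &e = kMD5TableSketch[i];
		if (std::strcmp(e.md5, md5) != 0)
			continue;
		for (size_t j = 0; j < numSettings; ++j) {
			const GameSettingsSketch &s = kSettingsSketch[j];
			if (std::strcmp(s.gameid, e.gameid) == 0 && std::strcmp(s.variant, e.variant) == 0) {
				if (extra)
					*extra = e.extra;
				return &s;
			}
		}
	}
	return nullptr;
}
```

With a layout along these lines, the user still only ever sees the gameid and its base description; the variant key stays internal.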


Detection done without MD5

MD5 detection is a very powerful tool. However, it's an all-or-nothing approach: Either we know a game variant and its MD5, or we don't. If we relied on MD5-based detection exclusively, that would shut out users of e.g. fan-made translations or game variants that are so far unknown.

Hence, other means are necessary. Some thoughts on that follow.

Files can be distinguished by their content as follows:

  • For 00.LFL files, to distinguish OLD_BUNDLE vs. SMALL_HEADER: look at first two bytes of the file:
    • 0xF701 (or 0xF7010000), bytes 5+6 are '0R' -> V3 small header
    • 0xFFFE -> V2 or V3 old bundle (really is 0x0001, XOR encoded using 0xFF)
    • 0xCEF5 -> V1 game (really is 0x0A31, XOR encoded using 0xFF)
  • For 000.LFL files (V4 games: loomcd, monkeyega, monkeyvga, pass), bytes 5+6 are 'RN', not encoded (corresponds to 'RNAM' in newer games).
  • For *.000 files: they start with 0x3b272824 ('RNAM' XOR encoded using 0x69)
  • For *.la0 files: they also start with 'RNAM', not encoded this time.
  • For *.he0 files: Apparently they start with MAXS, encoded; sometimes also with RNAM.
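
The byte checks listed above could be sketched roughly as follows. This is only an illustration: the enum and function names are invented, the hex values are assumed to be given in file byte order (so 0xCEF5 means first byte 0xCE, second byte 0xF5), "bytes 5+6" is read as 1-based (indices 4 and 5), and the leading 0xF701 of small header files is deliberately not checked. The *.he0 case is left out because the list above does not say how MAXS is encoded.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical classification result; the names are not from the ScummVM sources.
enum HeaderKindSketch {
	kHeaderUnknown,
	kHeaderV1,            // 00.LFL starting with 0xCE 0xF5
	kHeaderOldBundle,     // 00.LFL starting with 0xFF 0xFE (V2 or V3 old bundle)
	kHeaderSmallHeaderV3, // 00.LFL with '0R' in bytes 5+6
	kHeaderV4Index,       // 000.LFL with 'RN' in bytes 5+6
	kHeaderRnamPlain,     // *.la0 style: plain 'RNAM'
	kHeaderRnamXor69      // *.000 style: 'RNAM' XORed with 0x69
};

// Classifies a file by its first six bytes. A real detector would combine
// this with the filename, since each signature only makes sense for
// particular file names.
HeaderKindSketch classifyHeader(const uint8_t *buf, size_t len) {
	static const uint8_t kRnamXor69[4] = { 0x3B, 0x27, 0x28, 0x24 };

	if (len < 6)
		return kHeaderUnknown;

	// Four-byte tags first, so that stray bytes after them cannot be
	// mistaken for one of the two-byte markers below.
	if (std::memcmp(buf, "RNAM", 4) == 0)
		return kHeaderRnamPlain;
	if (std::memcmp(buf, kRnamXor69, 4) == 0)
		return kHeaderRnamXor69;

	if (buf[0] == 0xCE && buf[1] == 0xF5)
		return kHeaderV1;
	if (buf[0] == 0xFF && buf[1] == 0xFE)
		return kHeaderOldBundle;

	if (buf[4] == '0' && buf[5] == 'R')
		return kHeaderSmallHeaderV3;
	if (buf[4] == 'R' && buf[5] == 'N')
		return kHeaderV4Index;

	return kHeaderUnknown;
}
```

Only six bytes need to be read per file, so a check like this is cheap compared to computing an MD5.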

In addition, the filename itself of course carries valuable information. One can narrow down the SCUMM version and, in the case of newer games, may also deduce the gameid from the filename, and sometimes even the platform and/or language. We currently do not take full advantage of this.


Building blocks

The core for the revised create/detect scheme would be a function detectGames that takes a path, and optionally a set of hints (like gameid, platform, language). (Note: I am not yet sure whether those 'hints' will actually be useful, so maybe we can do w/o them.)

It returns a set of potential games. The set entries will contain: A ScummGameSettings record, a SubstResFileNames record (or equivalent, as I plan to modify the file name substitution scheme, see below), an MD5 (computed, may be missing), and the name/path of the 'detectfile' (the file that was used to identify that game, and to which the MD5 checksum belongs).

This function can then be used both by the regular detector and by the 'create' function. The latter would in fact become quite simple:

  1. Call detectGames to obtain gameList
  2. Remove all entries from gameList which do not match the user specified gameid
  3. (Not sure if this step is good/bad/necessary) Remove entries for which the platform does not match (UNK would act like a joker here)
  4. (Not sure if this step is good/bad/necessary) Remove entries for which the language does not match (UNK would act like a joker here)
  5. If the size of gameList does not equal 1 (i.e. no unique match was found), abort with an error
  6. Finally, create a suitable Engine instance
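
As a rough sketch of steps 2 to 5, the filtering could look like the following. Everything here is hypothetical: DetectedGameSketch stands in for whatever detectGames ends up returning, an empty string plays the role of UNK, and an empty gameid hint is treated as "no gameid given". Treating an unknown value on either side as a joker is also just one possible reading of the list above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of gameList; "" means unknown (UNK).
struct DetectedGameSketch {
	std::string gameid;
	std::string platform;
	std::string language;
};

// An unknown value on either side acts as a joker.
static bool hintMatches(const std::string &hint, const std::string &value) {
	return hint.empty() || value.empty() || hint == value;
}

// Returns the index of the unique entry matching all hints, or -1 if there
// is no match or more than one (in which case 'create' would abort).
int pickUniqueGame(const std::vector<DetectedGameSketch> &gameList,
                   const std::string &gameid,
                   const std::string &platform,
                   const std::string &language) {
	int found = -1;
	for (size_t i = 0; i < gameList.size(); ++i) {
		const DetectedGameSketch &g = gameList[i];
		if (!gameid.empty() && g.gameid != gameid)
			continue;                       // step 2
		if (!hintMatches(platform, g.platform))
			continue;                       // step 3
		if (!hintMatches(language, g.language))
			continue;                       // step 4
		if (found != -1)
			return -1;                      // step 5: not unique
		found = static_cast<int>(i);
	}
	return found;
}
```

If the platform and language steps turn out to be unnecessary, they can simply be dropped by always passing empty hints.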

Note: I am not yet sure how to handle target_md5 here. Maybe we don't need it anymore, maybe it could be passed as a hint to detectGames.


File name handling (revising the SubstResFileNames system)

The current file name substitution system, built around SubstResFileNames, tries to serve multiple goals at once: (TODO: Give a list here). This makes some things more complicated than necessary. Also, it can't do all that one would like to be able to do... for example, when used during detection, it should be possible to deduce language/platform information from those filenames (useful for the detector).

Also, there are some efficiency issues: If we search through the full list of file name substitutions for each game, we essentially arrive at a quadratic run time.

Taking a closer look at substResFileNameTable, one quickly notices that almost all entries in there really map a gameid to either a full fixed file name (usually for Mac versions), or to a "pattern" filename (think of GAME.LA0, GAME.LA1, GAME.LA2 etc.). The exception to this is the code to detect MM/Zak variants for NES and C64. If we add dedicated code for the latter, the table is finally reduced to what I just described: A map from gameids to file name "patterns".

Now, take a look at ScummEngine::openRoom. It actually also encodes a lot of knowledge about SCUMM filenames. This is duplicated knowledge. Instead of first putting effort into generating a "good" filename, and then putting more effort into trying to substitute that file name, and doing that over and over again -- wouldn't it be much better to check once which kinds of filenames are needed, and then always generate the right names immediately? In particular, the 'create' function already has to determine the file substitution required for that game... why throw that away?

Hence the idea is born to unify the knowledge / data found in ScummEngine::openRoom and substResFileNameTable into a single mechanism. This is why I wrote above that detectGames should return a SubstResFileNames (or equivalent) record. We already pass that to the ScummEngine object. We simply extend this to allow it to be used directly by ScummEngine::openRoom (and of course other methods) to generate suitable filenames from given data.

How to achieve this? I am not yet fully sure how to do this best... We probably need (at least) the following three parts:

  1. A way to determine which file naming scheme is to be used (for the detector; i.e. a way to loop over all possible detect names for a given game)
  2. A way to encode the naming scheme (i.e. substResFileNameTable or a successor)
  3. A function that generates filenames from the substResFileNameTable, given some data (like room & disk number)

The first part should be good enough to also allow the detector to extract useful extra information about the game. E.g. if the filename is 00.man, we know that it's Maniac Mansion Demo; if it's toffzeit.he0, then we are dealing with the German putttime, etc.
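
A small sketch of what such a filename-to-information lookup might look like, using the two examples just mentioned. The struct, the table and the function name are invented for this illustration, "maniac" is only a placeholder gameid, and the English language tag for 00.man is taken from the pseudo code further down this page.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical record: what a known detect filename tells us about the game.
struct FilenameHintSketch {
	const char *filename;  // stored in lower case
	const char *gameid;
	const char *language;  // two letter code, "" if the name says nothing
	const char *extra;     // e.g. "Demo", "" if the name says nothing
};

static const FilenameHintSketch kFilenameHintsSketch[] = {
	{ "00.man",       "maniac",   "en", "Demo" },
	{ "toffzeit.he0", "putttime", "de", ""     }
};

// Looks up a filename case-insensitively; returns nullptr for unknown names.
const FilenameHintSketch *lookupFilenameHint(const std::string &filename) {
	std::string lower(filename);
	for (size_t i = 0; i < lower.size(); ++i)
		lower[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(lower[i])));

	const size_t count = sizeof(kFilenameHintsSketch) / sizeof(kFilenameHintsSketch[0]);
	for (size_t i = 0; i < count; ++i) {
		if (lower == kFilenameHintsSketch[i].filename)
			return &kFilenameHintsSketch[i];
	}
	return nullptr;
}
```

A fixed-name table like this cannot express wildcard names such as *.sm0; those would need a separate pattern step.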

Parts 2 and 3 go hand in hand. Unfortunately, we can't just use a printf format string, since in some cases the file names may need to encode the room number, in others the disk number, and in some cases even letters instead of numbers are used (for HE 98+ games, "(a)" and "(b)" are used as file name extensions, at least).

This is still a bit fuzzy. To decide how to proceed, it might help to write down a list of all possible file names, or even a map between filenames <-> gameid, but since that's a lot of work and may just as well be *not* useful at all, I am not doing this for now... :-)

TODO: Finish this

Incomplete code snippets

The following snippets are sketches. We may end up doing things totally differently. It's just a way for me to make visible what goes on in my head right now, and is neither complete nor necessarily the best way to handle things. You have been warned :-)

Random thought: The detector could automatically populate the 'basename' field, too. In particular, the first use of generateSubstResFileName in Sound::openSfxFile is used to translate the "basename" of the game; this would be unnecessary if the basename was already set to the right value.


For regular use (e.g. in ScummEngine::openRoom, and most places where generateSubstResFileName is currently being used), we could use a new function like this one:

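// Sketch: returns the name of the data file to open for the given room/disk.
Common::String generateFilename(int room, int diskNumber) {
	char buf[128];

	if (_game.version == 4) {
		// Room 0 and rooms >= 900 live in NNN.lfl, everything else in diskNN.lec
		if (room == 0 || room >= 900) {
			snprintf(buf, sizeof(buf), "%.3d.lfl", room);
		} else {
			snprintf(buf, sizeof(buf), "disk%.2d.lec", diskNumber);
		}
	} else if (_game.heversion >= 98) {
		// HE 98+ games: the name carries a letter ('a', 'b') or '0'
		// depending on the disk the room is stored on
		char c;
		int disk = 0;
		if (_heV7DiskOffsets)
			disk = _heV7DiskOffsets[room];

		switch (disk) {
		case 2:
			c = 'b';
			break;
		case 1:
			c = 'a';
			break;
		default:
			c = '0';
			break;
		}
		snprintf(buf, sizeof(buf), _substEntry.formatStr, c);

	} else if (_substEntry.method == kGenDiskNum) {
		// The pattern encodes the disk number
		snprintf(buf, sizeof(buf), _substEntry.formatStr, diskNumber);

	} else if (_substEntry.method == kGenRoomNum) {
		// The pattern encodes the room number
		snprintf(buf, sizeof(buf), _substEntry.formatStr, room);

	} else {
		error("generateFilename: unknown file name generation method");
	}

	return buf;
}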

For detection: We should touch every file at most once (in particular, it would be very bad to compute the MD5 of a file multiple times, since many of our targets aren't exactly fast when it comes to disk I/O speed). So instead of looping over all games, let's loop over the files and then split down into cases. At the same time, squeeze any information we can from the filenames and files.

fullCandidateList = ()
FOR filename in files DO
	bool filenameKnown = false;
	IF length of filename < 4 THEN
		NEXT
	ENDIF
	tempList = ()
	IF (filename contains '.' and is at most 8+1+3=12 chars long) THEN
		filenameKnown = true;
		SWITCH filename {
		case 00.MAN
			add to tempList:
				gameid = Maniac Mansion, platform = ?, language = en, extra = demo
				subst pattern = "%02d.MAN", kGenRoomNum
			
		case *.sm0
			add to tempList:
				gameid = Sam & Max, ...

		case 00.LFL
			look at the file to distinguish between V1, V2/V3OldBundle and V3SH
			Then add all possible game variants to tempList
			...
		case 000.LFL
			...
		case *.000
			...
		case *.la0
			...
		case *.he0
			...

		case *.d64	// Non-extracted C64 
			add to tempList:
			gameid = Maniac Mansion, platform = C64
			OR
			gameid = Zak, platform = C64
			OR
			error
			...
		default:
			filenameKnown = false;
		ENDSWITCH
	ELSE
		// Must be a mac name / long name
		IF filename = "Maniac Mansion (*).prg" THEN
			add to tempList:
			gameid = Maniac Mansion, platform = NES, language = *
		ELIF name is in mac container list THEN
			...
		ELSE
			search the generic subst list
			...
			TODO
			...
		ENDIF
	ENDIF


	IF filenameKnown THEN
		compute MD5 of the file (we only do this for known filenames to avoid
			computing the MD5 of every single file in the directory!)
		IF MD5 is found in the table THEN
			optionally: perform a sanity check, i.e. verify that the exact match
			  agrees with at least one of our guesses
			add this exact match to fullCandidateList
		ELSE
			add tempList to fullCandidateList
		ENDIF
	ENDIF

ENDFOR
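
The outer loop of the pseudo code above could be sketched in C++ roughly like this. It is only meant to show the control flow (each file is looked at once, the MD5 is computed only for known names, and an exact MD5 match replaces the guesses). All names are hypothetical, the filename analysis and the MD5 computation are passed in as callbacks rather than implemented, and unlike the pseudo code a filename counts as "known" exactly when the guess callback returns at least one candidate.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical candidate record; a real one would carry full game settings.
struct CandidateSketch {
	std::string gameid;
	std::string extra;
	bool exact;            // true if confirmed by an MD5 match
};

typedef std::function<std::vector<CandidateSketch>(const std::string &)> GuessFn;
typedef std::function<std::string(const std::string &)> MD5Fn;

std::vector<CandidateSketch> detectCandidates(
		const std::vector<std::string> &files,
		const GuessFn &guessFromName,     // the big SWITCH / ELSE part above
		const MD5Fn &computeMD5,          // expensive, so called at most once per file
		const std::map<std::string, CandidateSketch> &md5Table) {
	std::vector<CandidateSketch> fullCandidateList;

	for (size_t i = 0; i < files.size(); ++i) {
		const std::string &filename = files[i];
		if (filename.size() < 4)
			continue;

		std::vector<CandidateSketch> tempList = guessFromName(filename);
		if (tempList.empty())
			continue;                     // unknown name: do not compute an MD5

		std::map<std::string, CandidateSketch>::const_iterator it =
			md5Table.find(computeMD5(filename));
		if (it != md5Table.end()) {
			CandidateSketch match = it->second;
			match.exact = true;
			fullCandidateList.push_back(match);
		} else {
			fullCandidateList.insert(fullCandidateList.end(), tempList.begin(), tempList.end());
		}
	}
	return fullCandidateList;
}
```

The optional sanity check (does the exact match agree with one of the guesses?) is omitted here; it would go right before the exact match is added.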

Standardized descriptions

The following isn't quite about auto detection, but about a related subject: How to canonically construct a game description string from the following information tuple:

 (NAME, PLATFORM, EXTRA, LANGUAGE)

The game desc then is one of the following (which one to use in the end isn't the point of this text!):

 NAME (PLATFORM EXTRA LANGUAGE)
 NAME (LANGUAGE PLATFORM EXTRA)
 NAME (EXTRA PLATFORM LANGUAGE)
 ...

I'll stick with the first version for now, but it's not set in stone (yet).

Missing/unknown information is left out. So, if none of the extra values are known, the description simply becomes:

 NAME

An alternate approach would be to display "Unknown" or "Unk" instead, but it's not clear whether that would provide any advantage to the regular user.

For the language, the two letter abbreviation as returned by Common::getLanguageCode() is used (possibly converted to upper case). For the platforms, the following short names are used:

  • Acorn
  • Amiga
  • Atari
  • C64
  • DOS
  • FM-TOWNS
  • Mac
  • NES
  • SEGA
  • Win

Currently, the platform functions in common/util.h do not provide these values. We could either modify getPlatformCode() or getPlatformDescription() accordingly, or add a new getPlatformShortDescription() function.

Typical values for EXTRA include "Demo", "VGA", "EGA", "CD", "Floppy", "Talkie".

For example:

 ("Loom", Macintosh, "", Unknown)

becomes

 Loom (Mac)

Another example:

 ("The Secret of Monkey Island", DOS, "CD", French)

becomes

 The Secret of Monkey Island (DOS CD fr)

A third example:

 ("Broken Sword 2: The Smoking Mirror", Unknown, "Demo", English)

becomes

 Broken Sword 2: The Smoking Mirror (Demo en)
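
The three examples follow the first variant, NAME (PLATFORM EXTRA LANGUAGE), with unknown parts left out. A minimal sketch of that rule is given below; buildDescription is a made-up name, and the function assumes the caller has already turned the platform into its short name and the language into its two letter code, with an empty string standing for "unknown".

```cpp
#include <cstddef>
#include <string>

// Builds "NAME (PLATFORM EXTRA LANGUAGE)", skipping empty (unknown) parts.
// If all three parts are empty, the plain NAME is returned.
std::string buildDescription(const std::string &name,
                             const std::string &platform,
                             const std::string &extra,
                             const std::string &language) {
	const std::string *parts[] = { &platform, &extra, &language };
	std::string inner;

	for (size_t i = 0; i < sizeof(parts) / sizeof(parts[0]); ++i) {
		if (parts[i]->empty())
			continue;
		if (!inner.empty())
			inner += ' ';
		inner += *parts[i];
	}

	if (inner.empty())
		return name;
	return name + " (" + inner + ")";
}
```

Switching to one of the other orderings would only mean reordering the entries of the parts array.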