Auto detection

From ScummVM :: Wiki
Revision as of 17:21, 3 March 2006 by Fingolfin (talk | contribs) (→‎Building blocks: Explain how 'detect' would be implemented)

Introduction

This page discusses auto detection of games in ScummVM. It is mostly written from the perspective of the SCUMM engine, but much of what is said here should hold true for other engines as well.


Detection vs. Instantiation

Insight: 'detect' and 'create' are extremely similar. The main difference lies in their precise input and output; what happens internally is actually very similar.

'create' receives some hints (gameid, platform, language), and using those plus MD5 data, has to figure out for a given path which (unique) game settings are appropriate for that path. In other words, it returns either a single game setting, or none at all.

'detect' is pointed at a directory, but given no further hints. It has to identify all (potential) games in that directory and returns a list of hints usable by 'create' (i.e. gameid, platform, language).

Seeing that, we should be able to combine the core parts of the two algorithms. For a 'create', we essentially run a 'detect' and compare the given game hint with the game hint(s) returned by 'detect'; if a unique match exists, we proceed with that.

As a further consequence, gameids actually become optional: In some cases (in particular, in case of an MD5 match), ScummVM is able to determine the precise game settings given only the path to the game data and nothing else!


Detection in the SCUMM engine

What was said in the previous section can be used to improve the SCUMM detection code. To this end, we have to restructure and unify some of our current data sets. Right now, we use MD5 checksums in the SCUMM engine for two different purposes:

  • Mapping an MD5 to a triple of (gameid, platform, language)
  • Mapping an MD5 to a specific game instance, i.e.
     (game desc, SCUMM/HE version, feature flags, platform, language, etc.)

The SCUMM engine is also troubled by the fact that it currently uses the gameid to get an initial guess for the gamesettings. However, that guess can be quite inaccurate; e.g. for Monkey Island, it defaults to SCUMM version 4, even though some versions of MI are version 5 games. This leads to some annoying problems.


So, instead of having a table that maps a gameid to a single game setting struct, we could use a one-to-many relation instead.

More specifically: For each gameid, we would allow multiple different game settings. To the end user, everything remains transparent, because each game id still maps to a unique (basic) game description. The "multiple settings" would be internal only.

We would then extend the MD5Table struct by (1) a way to specify which of the multiple variants of the game settings is meant, and (2) a field for an "extra description" string, like "CD", "Floppy", "VGA", etc. (see also the section on "Standardized descriptions" below).
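To make the one-to-many idea a bit more concrete, here is a rough sketch. It does not use the real ScummVM structs: GameSettingsVariant, MD5Entry, settingsFor and resolveByMD5 are hypothetical names made up for illustration, and plain standard library types stand in for the Common classes.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a game settings record.
struct GameSettingsVariant {
    std::string gameid;   // e.g. "monkey"
    int variant;          // distinguishes settings that share one gameid
    int scummVersion;     // e.g. 4 or 5
};

// Hypothetical MD5 table entry with the two proposed additions.
struct MD5Entry {
    std::string md5;      // checksum of the detection file
    std::string gameid;
    int variant;          // (1) which of the multiple variants is meant
    std::string extra;    // (2) extra description: "CD", "Floppy", "VGA", ...
};

// One gameid may map to several settings (the one-to-many relation).
std::vector<GameSettingsVariant> settingsFor(
        const std::vector<GameSettingsVariant> &table,
        const std::string &gameid) {
    std::vector<GameSettingsVariant> out;
    for (const auto &s : table)
        if (s.gameid == gameid)
            out.push_back(s);
    return out;
}

// An MD5 match pins down exactly one variant; nullptr if the MD5 is unknown.
const GameSettingsVariant *resolveByMD5(
        const std::vector<GameSettingsVariant> &table,
        const std::vector<MD5Entry> &md5Table,
        const std::string &md5) {
    for (const auto &e : md5Table) {
        if (e.md5 != md5)
            continue;
        for (const auto &s : table)
            if (s.gameid == e.gameid && s.variant == e.variant)
                return &s;
    }
    return nullptr;
}
```

With a table holding both a version 4 and a version 5 entry for the same gameid, the gameid alone yields two candidates, while a known MD5 selects one of them directly.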

With this, it would be possible to get rid of multiple_versions_md5_settings again. (Well, actually, my explanations here are probably a bit sketchy and incomplete. Feel free to ask me, Fingolfin, for clarifications).


Detection done without MD5

MD5 detection is a very powerful tool. However, it's an all-or-nothing approach: Either we know a game variant and its MD5, or we don't. If we relied exclusively on MD5-based detection, that would shut out users of e.g. fan-made translations or game variants that are not yet known.

Hence, other means are necessary. Some thoughts on that follow.

Files can be distinguished by their content as follows:

  • For 00.LFL files, to distinguish OLD_BUNDLE vs. SMALL_HEADER: look at first two bytes of the file:
    • 0xF701 (or 0xF7010000), bytes 5+6 are '0R' -> V3 small header
    • 0xFFFE -> V2 or V3 old bundle (really is 0x0001, XOR encoded using 0xFF)
    • 0xCEF5 -> V1 game (really is 0x0A31, XOR encoded using 0xFF)
  • For 000.LFL files (V4 games: loomcd, monkeyega, monkeyvga, pass), bytes 5+6 are 'RN', not encoded (corresponds to 'RNAM' in newer games).
  • For *.000 files: they start with 0x3b272824 ('RNAM' XOR encoded using 0x69)
  • For *.la0 files: they also start with 'RNAM', not encoded this time.
  • For *.he0 files: Apparently they start with MAXS, encoded; sometimes also with RNAM.
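The checks listed above could be sketched roughly as follows. This is only an illustration under assumptions: the hex values are taken to be the bytes in file order (which matches 0x3b272824 being 'RNAM' XOR 0x69), "bytes 5+6" is read as 1-based (offsets 4 and 5), and the names hasTagAt, LflKind, classify00Lfl and looksLikeEncodedIndex are invented here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// True if `tag`, XOR-encoded with `key`, appears at `offset` in `data`.
// Use key 0x00 for tags that are stored unencoded.
bool hasTagAt(const std::vector<uint8_t> &data, std::size_t offset,
              const std::string &tag, uint8_t key) {
    if (data.size() < offset + tag.size())
        return false;
    for (std::size_t i = 0; i < tag.size(); ++i)
        if ((data[offset + i] ^ key) != static_cast<uint8_t>(tag[i]))
            return false;
    return true;
}

enum class LflKind { Unknown, V1, V2OrV3OldBundle, V3SmallHeader };

// Classify a 00.LFL file by its leading bytes (assumed to be in file order).
LflKind classify00Lfl(const std::vector<uint8_t> &d) {
    if (d.size() < 2)
        return LflKind::Unknown;
    if (d[0] == 0xCE && d[1] == 0xF5)
        return LflKind::V1;
    if (d[0] == 0xFF && d[1] == 0xFE)
        return LflKind::V2OrV3OldBundle;
    // Small header: 0xF701 followed by '0R' at bytes 5+6 (offsets 4 and 5).
    if (d[0] == 0xF7 && d[1] == 0x01 && hasTagAt(d, 4, "0R", 0x00))
        return LflKind::V3SmallHeader;
    return LflKind::Unknown;
}

// *.000 files: 'RNAM' XOR-encoded with 0x69 at the very start.
bool looksLikeEncodedIndex(const std::vector<uint8_t> &d) {
    return hasTagAt(d, 0, "RNAM", 0x69);
}
```

The same hasTagAt helper would cover the 000.LFL case ('RN' at offset 4, key 0x00) and the *.la0 case ('RNAM' at offset 0, key 0x00).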

In addition, the filename itself of course carries valuable information. One can narrow down the SCUMM version from it and, in the case of newer games, may also deduce the gameid, and sometimes even the platform and/or language. We currently do not take full advantage of this.


Building blocks

The core of the revised create/detect scheme would be a function detectGames that takes a path and, optionally, a set of hints (like gameid, platform, language). (Note: I am not yet sure whether those 'hints' will actually be useful, so maybe we can do w/o them).

It returns a set of potential games. The set entries will contain: A ScummGameSettings record, a SubstResFileNames record (or equivalent, as I plan to modify the file name substitution scheme, see below), an MD5 (computed, may be missing), and the name/path of the 'detectfile' (the file that was used to identify that game, and to which the MD5 checksum belongs).

This function can then both be used by the regular detector, as well as the 'create' function. The latter would in fact become quite simple:

  1. Call detectGames to obtain gameList
  2. Remove all entries from gameList which do not match the user specified gameid
  3. (Not sure if this step is good/bad/necessary) Remove entries for which the platform does not match (UNK would act like a joker here)
  4. (Not sure if this step is good/bad/necessary) Remove entries for which the language does not match (UNK would act like a joker here)
  5. If the size of gameList does not equal 1 (i.e. no unique match was found), abort with an error
  6. Finally, create a suitable Engine instance
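Steps 2 to 5 could look roughly like the sketch below. It is not actual ScummVM code: DetectedGame, hintMatches and selectUniqueGame are hypothetical names, an empty string stands in for UNK, and the joker rule is assumed to apply when either the hint or the detected value is unknown. Step 1 (calling detectGames) and step 6 (creating the Engine) are left out; the candidate list is simply passed in.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, pared-down candidate record as detectGames might return it.
struct DetectedGame {
    std::string gameid;
    std::string platform;  // "" stands in for UNK
    std::string language;  // "" stands in for UNK
};

// A hint matches if either side is unknown (joker) or both are equal.
bool hintMatches(const std::string &hint, const std::string &value) {
    return hint.empty() || value.empty() || hint == value;
}

// Steps 2-5: filter the candidates, then insist on exactly one survivor.
DetectedGame selectUniqueGame(const std::vector<DetectedGame> &gameList,
                              const std::string &gameid,
                              const std::string &platform,
                              const std::string &language) {
    std::vector<DetectedGame> matches;
    for (const auto &g : gameList) {
        if (g.gameid == gameid &&
            hintMatches(platform, g.platform) &&
            hintMatches(language, g.language))
            matches.push_back(g);
    }
    if (matches.size() != 1)
        throw std::runtime_error("no unique match for the requested game");
    return matches.front();
}
```

Whether the platform and language filters should stay is exactly the open question noted in steps 3 and 4; dropping them here only means removing the two hintMatches calls.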

Note: I am not yet sure how to handle target_md5 here. Maybe we don't need it anymore, maybe it could be passed as a hint to detectGames.


The 'detect' part should be pretty obvious now:

  1. Invoke detectGames to obtain gameList
  2. Compute a new list from that, keeping only the gameid, platform, language values
  3. Remove any duplicates from that new list (since we drop some information, some entries may suddenly become identical)
  4. Return this list to the caller
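A minimal sketch of steps 2 to 4, again with invented names (Candidate, GameHint, toHints) and standard library containers instead of the real ScummVM types. The result of detectGames is assumed to be available already.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical candidate as returned by detectGames (fields trimmed down).
struct Candidate {
    std::string gameid;
    std::string platform;
    std::string language;
    std::string md5;  // dropped when building the hint list
};

// The (gameid, platform, language) triple handed back to the caller.
using GameHint = std::tuple<std::string, std::string, std::string>;

// Steps 2-4: project each candidate onto its hint triple and drop
// duplicates, keeping the order in which the hints were first seen.
std::vector<GameHint> toHints(const std::vector<Candidate> &gameList) {
    std::set<GameHint> seen;
    std::vector<GameHint> hints;
    for (const auto &c : gameList) {
        GameHint h(c.gameid, c.platform, c.language);
        if (seen.insert(h).second)
            hints.push_back(h);
    }
    return hints;
}
```

Two candidates that differ only in their MD5 collapse into a single hint, which is the duplicate case described in step 3.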

File name handling (revising the SubstResFileNames system)

TODO

Standardized descriptions

The following isn't quite about auto detection, but about a related subject: How to canonically construct a game description string from the following information tuple:

 (NAME, PLATFORM, EXTRA, LANGUAGE)

The game desc then is:

 NAME (PLATFORM EXTRA LANGUAGE)

Missing/unknown information is left out. So, if none of the extra values are known, the description simply becomes:

 NAME

An alternate approach would be to display "Unknown" or "Unk" instead, but it's not clear whether that would provide any advantage to the regular user.

For the language, the two-letter abbreviation as returned by Common::getLanguageCode() is used (possibly converted to upper case). For the platforms, the following short names are used:

  • Acorn
  • Amiga
  • Atari
  • C64
  • DOS
  • FM-TOWNS
  • Mac
  • NES
  • SEGA
  • Win

Currently, the platform functions in common/util.h do not provide these values. We could either modify getPlatformCode() or getPlatformDescription() accordingly, or add a new getPlatformShortDescription() function.
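If we went for a new function, it might look something like this. The Platform enum below is a made-up stand-in for illustration (it is not the real enum from common/util.h), and std::string is used only to keep the sketch self-contained.

```cpp
#include <string>

// Hypothetical platform enum; the real one in common/util.h differs.
enum class Platform {
    Acorn, Amiga, Atari, C64, DOS, FMTowns, Macintosh, NES, Sega, Windows, Unknown
};

// Returns the short name from the list above, or "" for unknown platforms.
std::string getPlatformShortDescription(Platform p) {
    switch (p) {
    case Platform::Acorn:     return "Acorn";
    case Platform::Amiga:     return "Amiga";
    case Platform::Atari:     return "Atari";
    case Platform::C64:       return "C64";
    case Platform::DOS:       return "DOS";
    case Platform::FMTowns:   return "FM-TOWNS";
    case Platform::Macintosh: return "Mac";
    case Platform::NES:       return "NES";
    case Platform::Sega:      return "SEGA";
    case Platform::Windows:   return "Win";
    case Platform::Unknown:   break;
    }
    return "";
}
```

Returning an empty string for the unknown case fits the rule that missing information is simply left out of the description.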

Typical values for EXTRA include "Demo", "VGA", "EGA", "CD", "Floppy", "Talkie".

For example:

 ("Loom", Macintosh, "", Unknown)

becomes

 Loom (Mac)

Another example:

 ("The Secret of Monkey Island", DOS, "CD", French)

becomes

 The Secret of Monkey Island (DOS CD fr)

A third example:

 ("Broken Sword 2: The Smoking Mirror", Unknown, "Demo", English)

becomes

 Broken Sword 2: The Smoking Mirror (Demo en)
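
The construction rule can be sketched as a small helper. makeDescription is a hypothetical name, and the sketch assumes that platform and language have already been converted to their short strings, with an empty string meaning "unknown".

```cpp
#include <initializer_list>
#include <string>

// Builds "NAME (PLATFORM EXTRA LANGUAGE)", skipping unknown (empty) parts.
// If all three details are unknown, only NAME is returned.
std::string makeDescription(const std::string &name,
                            const std::string &platform,
                            const std::string &extra,
                            const std::string &language) {
    std::string details;
    for (const std::string *part : {&platform, &extra, &language}) {
        if (part->empty())
            continue;
        if (!details.empty())
            details += ' ';
        details += *part;
    }
    return details.empty() ? name : name + " (" + details + ")";
}
```

Called with the three example tuples above, this produces the three descriptions shown; with all details unknown it returns just the name.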