Auto detection

From ScummVM :: Wiki
Revision as of 13:26, 10 April 2005 by Fingolfin (Auto detection ideas; not finished)

The problem

Right now, the SCUMM engine suffers from an abundance of different game targets: many games are split across several targets. Of course there are very good and valid technical reasons for this, but from the users' perspective, it is a nuisance.

Furthermore, we have a very good MD5-based auto detection system, but if the MD5 of a game is not known, we fall back to an extremely simplistic detection scheme based on file names. This approach is very limited; e.g. "00.LFL" is a file name which implies a multitude of different possible games (Maniac Mansion, Zak, Indy3EGA, Indy3VGA, Indy3Towns, Loom, Loom Towns).

In order to detect games based on file names, we have to generate all possible file names. Some code duplication exists here between the detector and the openRoom() method, which can lead to problems if the two don't match perfectly: a game might be detectable but later not usable, or the other way around (see the problems with "monkey" vs. "monkey1").

Also, some of the games are available in multiple versions, each of which uses different file names; most notably, the Mac versions of many games use long, verbose file names. We currently deal with these using the generateSubstResFileName() method. This works, but it means yet another translation table is needed and has to be maintained.

Then, the MD5 system is not fully unified: for some games, we already distinguish multiple variants based on the MD5, so we have two separate MD5 tables. This is not a major problem, but it is not nice either, and it is certainly somewhat arbitrary and awkward.

To sum it up:

  • Too many targets, confusing our users
  • MD5 system is split: only one part of it is generated from a common source file, the rest is maintained manually
  • Limited detection capabilities if no MD5 is present
  • File names are generated in at least two different places in the code, resulting in potential inconsistencies


Upgrading config

Upgrading should be smooth and easy. Existing config files should continue to work, and be updated automatically. So we need to put into place a system which translates old target names into new ones. That shouldn't be too hard. We could even automatically update the config file, although that would mean making it unusable for older ScummVM versions, so I'd rather not do it unless we have to. It's fine to add additional keys, though, e.g. platform/language settings.

A simple way to provide such backward compatibility: a table which maps old targets to the new ones, possibly containing some extra hinting info.
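A minimal C++ sketch of such a table follows. The struct, the function name and the two entries are hypothetical and only illustrate the shape; the actual list of old and new target names would still have to be worked out.

```cpp
#include <cstring>

// Hypothetical compatibility table entry: maps an obsolete target name to
// the new unified target, plus an optional hint for the detector.
struct ObsoleteTarget {
    const char *oldTarget;  // target name as found in existing config files
    const char *newTarget;  // unified target it now maps to
    const char *platform;   // optional extra hint, or nullptr
};

// Illustrative entries only, terminated by an all-null marker.
static const ObsoleteTarget obsoleteTargets[] = {
    {"indy3ega",   "indy3", nullptr},
    {"indy3towns", "indy3", "fmtowns"},
    {nullptr, nullptr, nullptr}
};

// Returns the matching entry, or nullptr if the target is not obsolete.
const ObsoleteTarget *findObsoleteTarget(const char *target) {
    for (const ObsoleteTarget *t = obsoleteTargets; t->oldTarget; ++t) {
        if (std::strcmp(t->oldTarget, target) == 0)
            return t;
    }
    return nullptr;
}
```

The config loader could perform this lookup for each target and use newTarget (plus any hint) internally, leaving the config file itself untouched so that older ScummVM versions can still read it.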


Target descriptions

If we reduce the number of targets, we'd also lose some variety in the game descriptions; not a terrible problem, but annoying. However, it would be easy to allow a system where game variants can extend the base game description. In fact, multiple_versions_md5_settings already allows for this. But instead of specifying a full new desc string for each variant, we'd just specify an (optional) addition. Example: all indy3 variants share the base description "Indiana Jones and the Last Crusade". The EGA version could add "(EGA)", the VGA version "(VGA)", the FM-TOWNS version... you get the idea.

So, we would specify the description in three ways:

  1. The base target ("indy3") carries the base desc
  2. Every MD5 table entry can optionally augment the desc with a string; that string is put into parentheses and appended to the base desc whenever the game detector assembles a GameSettings record for that game
  3. For non-MD5 detection, we can add a table which allows differentiating based on whether the file is OLD_BUNDLE vs. SMALL_HEADER vs. MODERN, and of course based on the actual file names.
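Rule 2 amounts to simple string concatenation. A minimal sketch, using std::string and a hypothetical function name rather than ScummVM's own types:

```cpp
#include <string>

// Hypothetical helper: appends an optional variant addition, in
// parentheses, to the base description carried by the base target.
std::string buildDescription(const std::string &baseDesc,
                             const std::string &addition) {
    if (addition.empty())
        return baseDesc;  // no addition: use the base desc unchanged
    return baseDesc + " (" + addition + ")";
}
```

For example, buildDescription("Indiana Jones and the Last Crusade", "EGA") yields "Indiana Jones and the Last Crusade (EGA)".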


Old bundle, USE_KEYS

Some games exist in both old bundle and small header variants. Distinguishing them isn't hard; we just have to look at the first few bytes of 00.LFL.

Also, some game variants use XOR encrypted data files and some don't; in particular, indy3ega does, while the VGA version does not use encryption. We could either try to automatically detect this


File names

Another note: we already support a "basename" config file entry and base "_gameName" on it. In addition, we also have generateSubstResFileName(). These should be unified somehow.

Now, for the mapping of filenames:

For the detection part, we could use a table which maps a file name string to a struct like this:

 struct FilenameEntry {     // name chosen for illustration
    String target;          // e.g. "indy3"
    String additionalDesc;  // e.g. "EGA" or "Demo" or empty
    int heVersion;          // if not equal to -1: alternative HE version
 };

Of course, multiple entries may be present for a single file name; "00.LFL", for example, would have several entries.
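A sketch of such a table in standard C++, assuming std::multimap is used to hold several entries per file name. The fields mirror the struct above (with std::string in place of ScummVM's String class so the example compiles on its own); the table contents are illustrative, not a verified list.

```cpp
#include <map>
#include <string>
#include <vector>

struct FilenameEntry {
    std::string target;          // e.g. "indy3"
    std::string additionalDesc;  // e.g. "EGA" or "Demo" or empty
    int heVersion;               // if not equal to -1: alternative HE version
};

// A multimap allows several entries per file name, as needed for "00.LFL".
static const std::multimap<std::string, FilenameEntry> filenameTable = {
    {"00.LFL", {"maniac", "", -1}},
    {"00.LFL", {"zak", "", -1}},
    {"00.LFL", {"indy3", "EGA", -1}},
    {"00.LFL", {"indy3", "VGA", -1}},
    {"00.LFL", {"loom", "", -1}},
    {"samnmax.sm0", {"samnmax", "", -1}},
};

// Collects every candidate entry registered for the given file name.
std::vector<FilenameEntry> candidatesFor(const std::string &filename) {
    std::vector<FilenameEntry> result;
    auto range = filenameTable.equal_range(filename);
    for (auto it = range.first; it != range.second; ++it)
        result.push_back(it->second);
    return result;
}
```

A lookup for "00.LFL" returns all five candidates; the detector then has to narrow them down, e.g. by inspecting the file header as described under Detection.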


TODO: For detection, we only need one name, e.g. "samnmax.sm0"; but to actually use the game, we often need a pattern (in this case, "samnmax.sm%d"). How should this best be handled?
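One possible approach, sketched under the assumption that the file index is always the last character of the detected name (true for "samnmax.sm0", but not checked for other games). Both helper names are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: turns the name found during detection into a
// pattern, e.g. "samnmax.sm0" -> "samnmax.sm%d". Names that do not end
// in a digit are returned unchanged.
std::string patternFromDetectedName(const std::string &detected) {
    if (detected.empty() || detected.back() < '0' || detected.back() > '9')
        return detected;
    return detected.substr(0, detected.size() - 1) + "%d";
}

// Hypothetical helper: expands such a pattern for a given file index,
// e.g. ("samnmax.sm%d", 3) -> "samnmax.sm3". The pattern must not contain
// any conversion specifier other than a single %d.
std::string fileNameForIndex(const std::string &pattern, int index) {
    char buf[256];
    std::snprintf(buf, sizeof(buf), pattern.c_str(), index);
    return std::string(buf);
}
```

If both the detector and openRoom() went through helpers like these, file names would be generated in a single place, avoiding the detector/openRoom() duplication described in "The problem".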

TODO: Instead of forgetting the file name we found during detection, it would be clever to remember that name somehow, maybe via a "basename" setting like the one in the config. openRoom() should take advantage of this, as should the code in sound.cpp (and anything else which currently uses generateSubstResFileName).


Detection

TODO: Detection should be able to take "hints" from the user; e.g. if "indy3" is specified, then we'll assume any 00.LFL we encounter to be part of "indy3".
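A sketch of how such a hint could narrow down the candidates produced by a file name lookup. The Candidate type and applyHint() are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical candidate produced by file name based detection.
struct Candidate {
    std::string target;          // e.g. "indy3"
    std::string additionalDesc;  // e.g. "EGA"
};

// If the user supplied a target hint, keep only candidates for that
// target; with an empty hint, all candidates are kept.
std::vector<Candidate> applyHint(const std::vector<Candidate> &candidates,
                                 const std::string &hint) {
    if (hint.empty())
        return candidates;
    std::vector<Candidate> result;
    for (const Candidate &c : candidates) {
        if (c.target == hint)
            result.push_back(c);
    }
    return result;
}
```

With the hint "indy3", a 00.LFL lookup would be reduced to the indy3 variants, which can then be told apart by their file headers.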

For 00.LFL files, to distinguish OLD_BUNDLE vs. SMALL_HEADER, look at the first two bytes of the file:

  • 0xF701 (or 0xF7010000), and bytes 5+6 are '0R' -> V3 small header
  • 0xFFFE -> V2 or V3 old bundle (really is 0x0001, XOR encoded using 0xFF)
  • 0xCEF5 -> V1 game (really is 0x0A31, XOR encoded using 0xFF)
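The checks above can be sketched as follows. This assumes the hex values denote bytes in file order (0xFFFE meaning 0xFF followed by 0xFE) and that "bytes 5+6" are 1-based, i.e. offsets 4 and 5. The enum and function names are hypothetical, and for the small header case only the '0R' tag is tested, not the leading 0xF7 0x01.

```cpp
#include <cstddef>
#include <cstdint>

// Possible results of inspecting the start of a 00.LFL file.
enum class LflKind { V1, OldBundle, SmallHeaderV3, Unknown };

// Classifies a 00.LFL file from its first bytes.
LflKind classify00Lfl(const std::uint8_t *buf, std::size_t len) {
    if (len >= 2 && buf[0] == 0xCE && buf[1] == 0xF5)
        return LflKind::V1;             // XOR encoded using 0xFF
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return LflKind::OldBundle;      // V2 or V3, XOR encoded using 0xFF
    if (len >= 6 && buf[4] == '0' && buf[5] == 'R')
        return LflKind::SmallHeaderV3;  // unencoded '0R' tag
    return LflKind::Unknown;
}
```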

For 000.LFL files (V4 games: loomcd, monkeyega, monkeyvga, pass), bytes 5+6 are 'RN', not encoded (this corresponds to 'RNAM' in newer games).

For *.000 files: they start with 0x3b272824 ('RNAM' XOR encoded using 0x69)

For *.la0 files: they also start with 'RNAM', not encoded this time.

For *.he0 files: apparently they start with 'MAXS', encoded; sometimes also with 'RNAM'.
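The 'RN'/'RNAM' checks for 000.LFL, *.000 and *.la0 files can be sketched the same way; a single helper covers both the encoded and the unencoded case by taking the XOR key as a parameter. The function names are hypothetical, and *.he0 files are left out because the encoding of 'MAXS' is not specified above.

```cpp
#include <cstddef>
#include <cstdint>

// True if the first four bytes spell 'RNAM' after XOR-decoding each byte
// with the given key (a key of 0 means the data is not encoded).
bool startsWithRnam(const std::uint8_t *buf, std::size_t len, std::uint8_t key) {
    static const char tag[4] = {'R', 'N', 'A', 'M'};
    if (len < 4)
        return false;
    for (std::size_t i = 0; i < 4; ++i) {
        if ((buf[i] ^ key) != static_cast<std::uint8_t>(tag[i]))
            return false;
    }
    return true;
}

// *.000 files: 'RNAM' XOR encoded using 0x69 (bytes 0x3B 0x27 0x28 0x24).
bool looksLike000File(const std::uint8_t *buf, std::size_t len) {
    return startsWithRnam(buf, len, 0x69);
}

// *.la0 files: 'RNAM', not encoded.
bool looksLikeLa0File(const std::uint8_t *buf, std::size_t len) {
    return startsWithRnam(buf, len, 0x00);
}

// 000.LFL (V4) files: 1-based bytes 5+6 are 'RN', not encoded.
bool looksLike000Lfl(const std::uint8_t *buf, std::size_t len) {
    return len >= 6 && buf[4] == 'R' && buf[5] == 'N';
}
```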


Proposals

I have no complete proposal yet, only various ideas for parts of the problem. I am still working on putting them all together into one coherent solution, so bear with me; this is all relatively new.