Advanced Detector

If your engine supports a large number of games (or variants of games) then detecting them can be tricky.

Some game variants have files with the same name but different contents, so detection by filename alone is not sufficient. A fair number of variants differ only in small sections of data within a file, so opening the file and looking for a "magic header" is not reliable either.

So instead, most engines take a checksum (or even better, a hash) of the file to detect the exact version. To do that, you would need to write code in your custom MetaEngine that opens the files and runs this check...

Sounds like a lot of work?
Well, to avoid every engine author having to do this themselves (and the codebase ending up with the maintenance headache of 20+ implementations of this which are almost, but not exactly the same!), the ScummVM Infrastructure Team have provided the Advanced Detector!

This provides a standard framework for filename and MD5 based game detection.

The code for this can be found in engines/advancedDetector.*

To use this, you will have to follow the instructions here within your engine's detection.h and detection.cpp.

All you will have to provide is a standard data table of ADGameDescription entries describing each game variant, which is usually placed in a separate detection_tables.h header, which is included in detection.cpp for use there.

This table, plus other parameters, is passed to the AdvancedMetaEngineDetection constructor. The constructor can also override the default detection parameters, e.g. _md5Bytes, the number of bytes of each file used for the MD5 hash.

It is suggested you consult the code and header comments in engines/advancedDetector.* and look at current engines for more complete examples.

Game detection entry in ScummVM config file

When you look into your .scummvmrc or scummvm.ini (depending on the platform), you will find that it generally has the following structure:

[scummvm]
globalkey1=foo
globalkey2=bar
versioninfo=1.5.0git2516-g30a372d

[monkey2-vga]
description=Monkey Island 2: LeChuck's Revenge (DOS/English)
path=/Users/sev/games/scumm/monkey2
engineid=scumm
gameid=monkey2
language=en
platform=pc

What you see here are several sections, designated by identifiers in square brackets, and a set of key/value pairs belonging to each section.

The main section, with the predefined name 'scummvm', contains global options, which are mainly editable in the Options dialog in the GUI. Then there are sections for each separate game. Additionally, some ports define their own service sections.

The name of each game section is what we call the target. The target, which in the sample above is monkey2-vga, is a user-editable identifier unique to the specific user, and can be used for launching the game from the command line.

Each entry then has a description, which is also user-editable, the path to the game, an engineid and a gameid. The engineid is a service name which identifies the engine within the whole of ScummVM, so there should be no clashes between engines. Each engine knows which gameids it supports, so a gameid only has to be unique within its engine.

The platform and language keys are used to narrow down the possible game candidates, but are fully optional.
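To make the section and key/value layout concrete, here is a minimal sketch of a reader for this format. It is not ScummVM's own configuration code; the names ConfigSections and parseConfig are hypothetical, and comments or quoting rules are not handled.

```cpp
#include <map>
#include <sstream>
#include <string>

// Section name -> (key -> value). Hypothetical type, not ScummVM's config manager.
typedef std::map<std::string, std::map<std::string, std::string> > ConfigSections;

// Parse "[section]" headers and "key=value" lines; any other line is skipped.
ConfigSections parseConfig(const std::string &text) {
	ConfigSections sections;
	std::string current;
	std::istringstream in(text);
	std::string line;
	while (std::getline(in, line)) {
		if (!line.empty() && line[line.size() - 1] == '\r')
			line.erase(line.size() - 1); // tolerate CRLF line endings
		if (line.size() >= 2 && line[0] == '[' && line[line.size() - 1] == ']') {
			current = line.substr(1, line.size() - 2);
			sections[current]; // create the section even if it has no keys
			continue;
		}
		std::string::size_type eq = line.find('=');
		if (eq == std::string::npos || current.empty())
			continue; // not a key/value pair, or no section opened yet
		sections[current][line.substr(0, eq)] = line.substr(eq + 1);
	}
	return sections;
}
```

With the sample above as input, the result would hold one entry for the global 'scummvm' section and one entry per target, each mapping keys such as gameid to their values.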

How Advanced Detector works

Advanced Detector tries to match the files in the probed directory against the lists of file characteristics provided in an array of ADGameDescription structures. It can take into account the MD5 sum of a part of the file (by default, its first 5000 bytes), its size and its name. It then creates a list of candidates, which it tries to narrow down to a single ADGameDescription instance unless it is told to do otherwise. In case of ambiguous matches, it returns a list of games.

It is important to know that currently there are in fact two modes of the Advanced Detector.

The first one is used during game detection, when the user tries to add a game (detection mode), and the second one when the user launches an already detected game (running mode). Both modes call the same method, findGames(), which can potentially return a list of games.

In detection mode, the user is then presented with a list of games to choose from. In running mode, if findGames() returns more than one game, only the first one in the list is used. This may lead to a situation where the game gets detected but doesn't run, so it is important to test detection and avoid any ambiguous situations.

This is also the main reason for some of the features in Advanced Detector which are geared towards resolving such conflicts.

In the running mode Advanced Detector tries to match as much information stored in the config game entry as possible. The typical keys it matches against are gameid, platform and language, but it may also use extra when instructed to do so.
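The narrowing step in running mode can be pictured with the following sketch. It is a simplified illustration, not the real findGames() logic: Candidate, ConfigHints and narrowCandidates are hypothetical names, and languages and platforms are plain strings rather than Common::Language and Common::Platform values.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a detected candidate (not the real ADGameDescription).
struct Candidate {
	std::string gameid;
	std::string language; // e.g. "en"
	std::string platform; // e.g. "pc"
};

// Keys read from the game's config section; an empty string means "not set".
struct ConfigHints {
	std::string gameid;
	std::string language;
	std::string platform;
};

// Keep only the candidates that agree with every hint that is set.
std::vector<Candidate> narrowCandidates(const std::vector<Candidate> &all, const ConfigHints &hints) {
	std::vector<Candidate> kept;
	for (std::size_t i = 0; i < all.size(); ++i) {
		const Candidate &c = all[i];
		if (!hints.gameid.empty() && c.gameid != hints.gameid)
			continue;
		if (!hints.language.empty() && c.language != hints.language)
			continue;
		if (!hints.platform.empty() && c.platform != hints.platform)
			continue;
		kept.push_back(c);
	}
	return kept;
}
```

If more than one candidate survives this filter, the situation is still ambiguous, which is the case the conflict-resolution features of Advanced Detector are meant to avoid.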

If there are no matches in the ADGameDescription list, there are two additional fallback detection modes. One is file-based detection, which matches just the file names; the second is a hook which gets called and can contain code of any complexity. The most prominent example of advanced fallback detection is the SCI engine. Note that by default the file-based fallback detection does not allow games to be added to ScummVM or started, and is only used to report to the user that a possible unknown variant was found. This behaviour can be changed by reimplementing bool canPlayUnknownVariants() const to return true in the derived class.

Generated targets

Targets generated by Advanced Detector have the following structure:

 GAMEID-DEMO-CD-PLATFORM-LANG

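Target generation is affected by AD flags. The flags that influence it include ADGF_CD, ADGF_DEMO and ADGF_DROPLANGUAGE; ADGF_DROPPLATFORM, ADGF_DVD, ADGF_REMASTERED and ADGF_AUTOGENTARGET, described under ADGameFlags, also change the generated target.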
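As a rough illustration of the GAMEID-DEMO-CD-PLATFORM-LANG pattern, the sketch below assembles a target string. The enum values are placeholders and not the real ADGF_* constants, generateTarget is a hypothetical name, and the caller is assumed to pass an empty code for any platform or language part that should be left out.

```cpp
#include <string>

// Placeholder bit values for illustration only; the real constants are
// defined in engines/advancedDetector.h and may differ.
enum SketchFlags {
	kSketchDemo = 1 << 0,         // stands in for ADGF_DEMO
	kSketchCD = 1 << 1,           // stands in for ADGF_CD
	kSketchDropLanguage = 1 << 2  // stands in for ADGF_DROPLANGUAGE
};

// Build "GAMEID-DEMO-CD-PLATFORM-LANG", skipping the parts that do not apply.
std::string generateTarget(const std::string &gameid, unsigned flags,
                           const std::string &platformCode,
                           const std::string &languageCode) {
	std::string target = gameid;
	if (flags & kSketchDemo)
		target += "-demo";
	if (flags & kSketchCD)
		target += "-cd";
	if (!platformCode.empty())
		target += "-" + platformCode;
	if (!(flags & kSketchDropLanguage) && !languageCode.empty())
		target += "-" + languageCode;
	return target;
}
```

For example, a German Windows CD demo of monkey2 would come out as monkey2-demo-cd-win-de in this sketch, while the same call with the drop-language flag set would end at -win.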

PlainGameDescriptor table

struct PlainGameDescriptor {
	const char *gameid;
	const char *description;
};

This table contains all gameids which are known by the engine. Each gameid also comes with a full human-readable description, which is used to provide the description field in the ScummVM configuration file.

Only gameids which are present in this table can be used in the ADGameDescription table.

Typical PlainGameDescriptor table:

static const PlainGameDescriptor cineGames[] = {
	{"cine", "Cinematique evo.1 engine game"},
	{"fw", "Future Wars"},
	{"os", "Operation Stealth"},
	{0, 0}
};

Please note that the table is terminated by a {0, 0} entry, and that it also contains the generic gameid cine, which is used by fallback detection.
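The terminator entry is what lets code walk the table without knowing its length. The following sketch shows such a walk; lookupGameId is a hypothetical helper written for this page, it compares case-sensitively, and the structure is repeated only so that the snippet compiles on its own.

```cpp
#include <cstring>

struct PlainGameDescriptor {
	const char *gameid;
	const char *description;
};

static const PlainGameDescriptor sketchGames[] = {
	{"cine", "Cinematique evo.1 engine game"},
	{"fw", "Future Wars"},
	{"os", "Operation Stealth"},
	{0, 0}
};

// Walk the table until the {0, 0} terminator; return 0 for an unknown gameid.
const PlainGameDescriptor *lookupGameId(const char *gameid, const PlainGameDescriptor *list) {
	for (const PlainGameDescriptor *g = list; g->gameid; ++g) {
		if (std::strcmp(gameid, g->gameid) == 0)
			return g;
	}
	return 0;
}
```

Forgetting the terminator would make the loop read past the end of the array, which is why every table on this page ends with an explicit end entry.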

ADGameDescription table

ADGameDescription table has the following structure:

struct ADGameDescription {
	const char *gameid;
	const char *extra;
	ADGameFileDescription filesDescriptions[14];
	Common::Language language;
	Common::Platform platform;
	uint32 flags;
	const char *guioptions;
};

gameid -- This is the gameid. Mainly it is used for taking the game description from the PlainGameDescriptor table.

extra -- This is used to distinguish between different variants of a game. The content of this field is inserted in the generated description for the config file game entry. If the kADFlagUseExtraAsHint ADFlag is set, the contents of this field are stored in the config file and are used to additionally distinguish between game variants. Also, if the ADGF_USEEXTRAASTITLE game flag is set, the contents of this field will be put into the description rather than the one extracted from the PlainGameDescriptor table.

filesDescriptions -- a list of individual file entries used for detection. 13 files (last is zero terminated) is the maximum number of files currently used in ScummVM. We are forced to specify a hardcoded number, due to a C++ limitation for defining const arrays.

language -- language of the game variant.

platform -- platform of the game variant.

flags -- game feature flags. Contains both engine-specific ones and global ones (see ADGameFlags).

guioptions -- game features which are user controllable. Basically, this list reflects which GUI features should be turned on or off in order to minimize user confusion. For instance, there is no point in changing the game language in single-language games, or in having MIDI controls for a game which supports only digital music. (See GUI Options)


Typical ADGameDescription table will look as follows:

static const ADGameDescription gameDescriptions[] = {
	{
		"fw",
		"",
		AD_ENTRY1("part01", "61d003202d301c29dd399acfb1354310"),
		Common::EN_ANY,
		Common::kPlatformDOS,
		ADGF_NO_FLAGS,
		GUIO0()
	},
	AD_TABLE_END_MARKER
};

ADGameFileDescription structure

struct ADGameFileDescription {
	const char *fileName;	///< Name of described file.
	uint16 fileType; ///< Optional. Not used during detection, only by engines.
	const char *md5; ///< MD5 of (the beginning of) the described file. Optional. Set to NULL to ignore.
	int32 fileSize;  ///< Size of the described file. Set to -1 to ignore.
};

fileName -- name of the file. It is case insensitive, but historically we use lowercase names.

fileType -- a rarely used field, for cases where the ADGameFileDescription structure is also read by the engine itself. It may mark a music file, a script file, etc.

md5 -- MD5 of the file. Most often it is MD5 of the beginning of the file for performance reasons. See _md5Bytes setting of AdvancedMetaEngineDetection. If set to NULL, the md5 is not used in detection and the entry matches against any content.

fileSize -- file size in bytes. Optional too, set to -1 in order to match against any file size.
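The "ignore" values for md5 and fileSize can be illustrated with a small matching sketch. This is not the detector's actual comparison code: FileDescriptionSketch mirrors the structure above with plain C++ types, and ProbedFile and fileMatches are hypothetical names. The file name is assumed to have been matched already.

```cpp
#include <cstring>

// Mirror of ADGameFileDescription using plain C++ types, so the sketch
// compiles without ScummVM headers.
struct FileDescriptionSketch {
	const char *fileName;
	unsigned short fileType;
	const char *md5; // 0 means "ignore the content"
	long fileSize;   // -1 means "ignore the size"
};

// What was measured for one file found in the probed directory.
struct ProbedFile {
	const char *md5;
	long size;
};

// True when every property that is not set to "ignore" agrees.
bool fileMatches(const FileDescriptionSketch &wanted, const ProbedFile &found) {
	if (wanted.md5 && std::strcmp(wanted.md5, found.md5) != 0)
		return false;
	if (wanted.fileSize != -1 && wanted.fileSize != found.size)
		return false;
	return true;
}
```

An entry with md5 set to 0 and fileSize set to -1 therefore matches any file of that name, which is rarely what you want outside of fallback-style entries.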

Special prefixes

The structure can be used to request some additional detection features by adding the following prefixes to the md5 field:

  • t: The specified MD5 checksum is counted from the tail of the file. Equivalent to the ADGF_TAILMD5 ADGameFlag.
  • r: The provided file is in MacBinary format. The provided md5 sum is of the file's Resource fork.
  • d: The provided file is in MacBinary format. The provided md5 sum is of the file's Data fork.
  • A: The provided checksum is for a file embedded inside an archive. See details below.
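A sketch of how such a prefix could be separated from the checksum is shown below. It assumes a single prefix letter followed by a colon, as in the list above; whether the real parser accepts other forms is not covered here. ParsedMd5 and parseMd5Field are hypothetical names.

```cpp
#include <string>

struct ParsedMd5 {
	char prefix;      // 't', 'r', 'd', 'A', or 0 when there is no prefix
	std::string hash; // the checksum with the prefix removed
};

// Split "X:<md5>" into the prefix letter and the bare checksum.
ParsedMd5 parseMd5Field(const std::string &field) {
	ParsedMd5 out;
	out.prefix = 0;
	out.hash = field;
	if (field.size() > 2 && field[1] == ':') {
		char p = field[0];
		if (p == 't' || p == 'r' || p == 'd' || p == 'A') {
			out.prefix = p;
			out.hash = field.substr(2);
		}
	}
	return out;
}
```

Because an MD5 string consists only of hexadecimal digits, a colon in the second position cannot be confused with part of the checksum.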

Scanning inside archives

In cases where a game was distributed in a compressed format and all its archive files have commonly encountered names (e.g. "data1.cab"), we want to avoid adding the archive files themselves to detection entries, since they can generate false positives when detecting. Thus, it becomes useful for the Advanced Detector to scan for files inside these archives. This is supported by doing the following:

  • Adding an A: prefix to the md5 field.
  • Using the <archive type ID>:<archive name>:<file name> syntax in the fileName field.

As an example, the following structure will match with an InstallShield cabinet named "data1.cab", which contains a file named "file.dat". Advanced Detector will extract that file, calculate its checksum, and compare it against the one provided in the md5 field:

{ "is:data1.cab:file.dat", 0, "A:0ea0755ce0254cbce1cd21841ab1b83a", 4480956 }

This is currently only supported for InstallShield v3 cabinets (using the is3: prefix), and InstallShield v5-v13 cabinets (using the is: prefix).
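The three-part fileName syntax can be taken apart as in the following sketch. It only illustrates the <archive type ID>:<archive name>:<file name> layout; ArchivePath and splitArchivePath are hypothetical names, and the real detector may validate the archive type differently.

```cpp
#include <string>

struct ArchivePath {
	bool valid;
	std::string archiveType; // e.g. "is" or "is3"
	std::string archiveName; // e.g. "data1.cab"
	std::string fileName;    // e.g. "file.dat"
};

// Split "<type>:<archive>:<file>" at the first two colons.
ArchivePath splitArchivePath(const std::string &field) {
	ArchivePath out;
	out.valid = false;
	std::string::size_type first = field.find(':');
	if (first == std::string::npos)
		return out;
	std::string::size_type second = field.find(':', first + 1);
	if (second == std::string::npos)
		return out;
	out.archiveType = field.substr(0, first);
	out.archiveName = field.substr(first + 1, second - first - 1);
	out.fileName = field.substr(second + 1);
	out.valid = !out.archiveType.empty() && !out.archiveName.empty() && !out.fileName.empty();
	return out;
}
```

For the example entry above, this yields the type "is", the archive "data1.cab" and the inner file "file.dat"; a plain name such as "part01" is reported as not being an archive path.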

Game Entry flags ADGameFlags

Game flags are used to tell the engine which features a particular game has. There are both engine-specific and Advanced Detector-specific game flags. The latter, besides being more or less universal, also affect the detection behaviour.

ADGF_ADDENGLISH -- Used for dual language games. In this case, the user will be presented with a selection between the localised and English versions of the game. Affects GUIOs.

ADGF_AUTOGENTARGET -- Used for games without distinct gameid, usually fanmade. In this case, the extra field will contain the full game name, and it will be used for generating the target name instead of a generic engine-based one. See the WAGE engine for a good example.

ADGF_CD -- Specifies a CD version. Generated target will get '-cd' suffix.

ADGF_DEMO -- Specifies a game demo. Generated target will get '-demo' suffix.

ADGF_DROPLANGUAGE -- the generated target will not have a language specified. Used mainly for multilanguage games which have language selector internally. Thus the language will be selected within the game and the setting stored in the config file, but the game entry will stay intact.

ADGF_DROPPLATFORM -- the generated target will not have a platform specified. Use this when the game was released only for one platform, so suffixes like '-win' make no sense.

ADGF_DVD -- Specifies a DVD version. Generated target will get '-dvd' suffix.

ADGF_MACRESFORK -- The provided file is in MacBinary (or another Mac container format). Thus, only the Mac resource fork is extracted and its MD5 calculated. This flag is deprecated; please use one of the special prefixes in the ADGameFileDescription structure instead.

ADGF_NO_FLAGS -- No flags are set.

ADGF_PIRATED -- Specifies a blacklisted game. The game will be detected but refuse to run.

There are known hacked variants for some of the games that exist in the wild. We used to ignore user reports on them, but with the number of engines growing, it became tough to remember that some particular game is really a hack. When it was widespread enough, we were getting recurrent reports that the game is not detected. To avoid this situation, we now accept md5s of such games but mark them accordingly.

ADGF_REMASTERED -- Specifies a remastered version. Generated target will get '-remastered' suffix.

ADGF_TAILMD5 -- The specified MD5 checksum is counted from the tail of the file. Used in cases when the game data is concatenated at the end of the engine executable. A good example is the Director titles, which contain the start movie in the main executable after projector code that spans hundreds of kilobytes.

ADGF_TESTING -- Specifies a game which was announced for public testing. The user will get a relevant warning when launching the game. These are added once the public testing is announced and removed right before the release.

ADGF_UNSTABLE -- Specifies a game which is not publicly supported and is in heavy development. The user will get a relevant warning when launching the game. This warning can be suppressed completely by setting enable_unsupported_game_warning=false in the global section of the ScummVM config.

ADGF_UNSUPPORTED -- The detected game will not launch. Instead, a message from the extra field will be presented to the end-user. Used for versions that are known to be broken, or that require additional code which is not yet implemented, when we still want to document their presence.

ADGF_USEEXTRAASTITLE -- Instead of the description specified in the PlainGameDescriptor table, the extra field will be used as the game description. A good example is AGI fan games where the game title is known, but it is not feasible to add it to the PlainGameDescriptor table, or minor composer engine demos with games combined for the same reason.

ADGF_WARNING -- Specifies a game that produces a specified warning on launch. The warning is provided (usually in a translatable form) in the extra field. The game will proceed to be launched, in contrast to the ADGF_UNSUPPORTED flag.

Advanced Detector flags ADFlags

kADFlagUseExtraAsHint -- Specify this flag when more than one game is stored in the same directory with the same gameid, i.e. there is no way to know which game the user wants to run without asking them. The typical example is the VGA version of Lure of the Temptress, which contained both EGA and VGA datafiles in the game directory.

Upgrading obsolete gameids

static const Engines::ObsoleteGameID obsoleteGameIDsTable[] = {
        {"simon1acorn", "simon1", Common::kPlatformAcorn},
        {"simon1amiga", "simon1", Common::kPlatformAmiga},
        {"simon2talkie", "simon2", Common::kPlatformDOS},
        {"simon2mac", "simon2", Common::kPlatformMacintosh},
        {"simon2win", "simon2", Common::kPlatformWindows},
        {0, 0, Common::kPlatformUnknown}
};
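The page gives only the table, so the following is one plausible reading rather than a description of the real upgrade code: each row maps an old gameid to its replacement, plus the platform that the old gameid implied. The sketch uses stand-in types (PlatformSketch, ObsoleteIdSketch) and a hypothetical helper findUpgrade so that it compiles without ScummVM headers.

```cpp
#include <cstring>

// Stand-ins for Common::Platform and Engines::ObsoleteGameID.
enum PlatformSketch {
	kSketchPlatformUnknown,
	kSketchPlatformAcorn,
	kSketchPlatformWindows
};

struct ObsoleteIdSketch {
	const char *from; // obsolete gameid found in an old config file
	const char *to;   // gameid that replaces it
	PlatformSketch platform;
};

static const ObsoleteIdSketch obsoleteSketch[] = {
	{"simon1acorn", "simon1", kSketchPlatformAcorn},
	{"simon2win", "simon2", kSketchPlatformWindows},
	{0, 0, kSketchPlatformUnknown}
};

// Return the row for an obsolete gameid, or 0 when the gameid is current.
const ObsoleteIdSketch *findUpgrade(const char *gameid, const ObsoleteIdSketch *table) {
	for (const ObsoleteIdSketch *o = table; o->from; ++o) {
		if (std::strcmp(gameid, o->from) == 0)
			return o;
	}
	return 0;
}
```

Under this reading, a config entry with gameid simon2win would be rewritten to gameid simon2 with the Windows platform, so no information is lost when the old gameid is retired.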

AdvancedMetaEngineDetection

AdvancedMetaEngineDetection is a generic MetaEngine wrapper which is aware of the Advanced Detector. It should be used whenever AD is used.

Engine constructor

AdvancedMetaEngineDetection(const void *descs, uint descItemSize, const PlainGameDescriptor *gameIds);

descs must point to a list of ADGameDescription structures, or their supersets.

descItemSize is the sizeof of one descs element. It is used as the step size when iterating over the list.

gameIds must point to a list of PlainGameDescriptor structures defining supported gameids.
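The reason descs is a const void * with a separate item size is that engines may pass a table of larger structures that begin with an ADGameDescription. The sketch below shows the idea with reduced stand-in types; BaseDescSketch, EngineDescSketch and countDescriptions are hypothetical, and the end marker is assumed to be an entry with a null gameid.

```cpp
#include <cstddef>

// Reduced stand-in for ADGameDescription; only the field the loop needs.
struct BaseDescSketch {
	const char *gameid;
};

// An engine-specific superset: the base structure must be the first member.
struct EngineDescSketch {
	BaseDescSketch desc;
	int gameType;
	unsigned features;
};

static const EngineDescSketch sketchTable[] = {
	{{"fw"}, 1, 0},
	{{"os"}, 2, 0},
	{{0}, 0, 0} // end marker: null gameid
};

// Count entries by stepping descItemSize bytes at a time until the end marker.
std::size_t countDescriptions(const void *descs, std::size_t descItemSize) {
	const unsigned char *p = static_cast<const unsigned char *>(descs);
	std::size_t n = 0;
	while (reinterpret_cast<const BaseDescSketch *>(p)->gameid) {
		++n;
		p += descItemSize;
	}
	return n;
}
```

Calling countDescriptions(sketchTable, sizeof(EngineDescSketch)) visits every entry correctly, whereas stepping by sizeof(BaseDescSketch) would land in the middle of the engine-specific fields.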

Additional Advanced MetaEngine parameters

_md5Bytes -- number of bytes used to compute md5. If set to 0 then whole file will be used. Beware of doing this if your detection might encounter large files, since that can dramatically slow down the detection. Typically a sane value is 5000 bytes (the default), but often experimentation is required for many game variants with subtle differences.
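To show how _md5Bytes interacts with file size and with tail-based checksums, the sketch below computes which byte range of a file would be hashed. It is an illustration under stated assumptions, not the detector's code: ByteRange and md5Range are hypothetical names, files shorter than the limit are assumed to be hashed whole, and a tail checksum is assumed to cover the last md5Bytes bytes.

```cpp
#include <cstddef>

struct ByteRange {
	std::size_t offset; // first byte that goes into the hash
	std::size_t length; // number of bytes hashed
};

// md5Bytes == 0 means "hash the whole file"; fromTail selects the end of the file.
ByteRange md5Range(std::size_t fileSize, std::size_t md5Bytes, bool fromTail) {
	ByteRange r;
	r.length = (md5Bytes == 0 || md5Bytes > fileSize) ? fileSize : md5Bytes;
	r.offset = fromTail ? fileSize - r.length : 0;
	return r;
}
```

With the default of 5000 bytes, a 20000-byte file is hashed over bytes 0 to 4999, or over bytes 15000 to 19999 when the checksum is taken from the tail. Two variants that differ only outside that window produce the same checksum, which is why experimentation with the value is sometimes needed.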

_flags -- same as individual game flags, but used for engine-wide settings. For instance, if we know for sure that all games in the engine are unstable, we can specify that here instead of modifying every game entry.

_guiOptions -- same as individual GUI options, but applied engine-wide. For example, when none of the games have speech, we may specify it in this spot.

_maxScanDepth -- Maximum traversal depth for directories. Default is 1, that is, do not descend into subdirectories for detection.

_directoryGlobs -- Case-insensitive list of nested directory globs in which AD will search for games. Null-terminated. Must be set if detection should go into subdirectories.

AdvancedMetaEngine usage

TODO