Cloud Storages API comparison

Revision as of 17:46, 2 May 2016 by Tkachov (talk | contribs) (Added Comparison Table section)

Overview

As part of the cloud integration project, ScummVM will use the open REST APIs of cloud storage providers such as Dropbox, Google Drive and OneDrive. These APIs let a user authenticate in order to receive an access token, which is passed to the application (i.e. ScummVM). With this token the application can make API calls and manage files in the user's cloud storage.
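As a rough illustration of how the token is used: all three services accept the access token as an OAuth 2.0 bearer token in the `Authorization` header. The sketch below only builds a request and does not send it; the URL and the token value are placeholders, and `authorized_request` is a hypothetical helper name, not part of any of these APIs.

```python
from urllib.request import Request


def authorized_request(url: str, access_token: str, method: str = "GET") -> Request:
    """Build (but do not send) an HTTP request carrying an OAuth 2.0 bearer token."""
    req = Request(url, method=method)
    # The token obtained during user authorization goes into this header.
    req.add_header("Authorization", "Bearer " + access_token)
    return req


# Placeholder URL and token, for illustration only.
req = authorized_request("https://api.example.com/files", "TOKEN123")
```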

REST APIs define scopes (permissions) that an application can request. These are shown to users when they authorize the application, and the application is limited to the methods covered by the requested scopes.

Dropbox has only two scopes: one grants access to the user's whole storage, the other to the application's special folder only.

Google Drive and OneDrive, on the other hand, have more fine-grained scopes.

Each file in these services has an id, which does not change while the file exists. In Dropbox and OneDrive files can also be accessed by file path. Google Drive offers ids only, which allows several files to share a name within one directory and a file to have multiple "parents" (the same file can appear in different directories at once while being stored only once, like a hard link).

Dropbox and Google Drive also keep revisions with their own ids. Any file can be restored to one of its saved revisions, though I'm not sure ScummVM would use this feature.

REST APIs are HTTP-based, so they use HTTP methods to specify which operation is applied to the specified resource. The server uses HTTP status codes in its response to indicate whether the request was processed correctly. The response body is a JSON object (some REST APIs offer XML as an option, but I haven't seen that in the Dropbox, Google Drive or OneDrive APIs).

These storages use the ISO 8601 date format ("2015-05-12T15:50:38Z"). Google Drive specifies RFC 3339, which is a profile of ISO 8601 with a few modifications, and its timestamps may include decimal seconds.
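A minimal sketch of parsing such timestamps so they can be compared across services, using only the Python standard library. The helper name `parse_timestamp` is made up for this example, and it assumes the timestamps are in UTC with a trailing "Z", as in the sample above.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 / RFC 3339 timestamp, with or without decimal seconds."""
    # Older Python versions of fromisoformat() reject a trailing "Z",
    # so replace it with an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value).astimezone(timezone.utc)


plain = parse_timestamp("2015-05-12T15:50:38Z")
fractional = parse_timestamp("2015-05-12T15:50:38.123Z")
```

Once parsed, the two values compare like any other `datetime` objects, so "which copy is newer" does not depend on which service produced the string.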

Application Folder

All three storages offer an application folder option. In this mode the application can only manage files within that folder and cannot access any other file in the user's storage. In Dropbox and OneDrive this folder is visible to users, so they can navigate there and manage their files manually. In Google Drive the application folder is hidden, and the user can only see its quota usage in the application settings.

Asking the user for access to the application folder only looks preferable. In the Google Drive case we can either ask for non-application-folder access or use the hidden folder, which the user cannot reach. If we don't use the application folder, we can ask the user to specify a folder where ScummVM would keep its files; ScummVM would then have access to the whole storage, so the user could also download games from any other directory.

I'd like to keep save files in a "Saves" directory within the application's root folder. ScummVM would then always know where to look for them, and only app folder access would be required. We could make the Saves folder path customizable, but I don't see why users would want one folder for ScummVM and another for saves. Special application folders cannot be moved, though: they are created in a fixed place in the user's storage. Users may find that inconvenient, but it also makes it hard for them to forget where the folder is, and the F.A.Q. could easily explain how to find it, because the location would be the same for everyone. For these reasons we might want a visible "ScummVM" folder in Google Drive without asking users where they want to put it (it's kind of evil though).

File IDs vs File Paths

Local storage doesn't have ids for files; it has unique file paths instead. To download a file or sync files between storages, we have to know which path corresponds to which cloud storage file id.

The first problem is technical: only Dropbox's file metadata contains the full path (in lowercase and "display" representations). For Google Drive and OneDrive we would have to reconstruct full paths by listing ScummVM's folder, then listing all its subfolders, and so on. The path prefix for a file is its parent's prefix plus the parent's name (the full path is the parent's prefix "ScummVM/Games/" + the parent's name "Sam & Max/" + the file name "file.ext").
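The prefix rule above can be sketched as follows. This is only an illustration: it assumes the folder listings have already been fetched and flattened into records with hypothetical keys `id`, `name`, `parent` and `is_folder` (the real services use different field names), and `build_paths` is a made-up helper.

```python
def build_paths(items, root_id, root_prefix="ScummVM/"):
    """Map every item id to its full path, given flat listing records."""
    # Group items by the id of the folder that contains them.
    children = {}
    for item in items:
        children.setdefault(item["parent"], []).append(item)

    paths = {}
    stack = [(root_id, root_prefix)]  # (folder id, path prefix of its children)
    while stack:
        folder_id, prefix = stack.pop()
        for item in children.get(folder_id, []):
            if item["is_folder"]:
                # A folder's path becomes the prefix of everything inside it.
                paths[item["id"]] = prefix + item["name"] + "/"
                stack.append((item["id"], paths[item["id"]]))
            else:
                paths[item["id"]] = prefix + item["name"]
    return paths


listing = [
    {"id": "a", "name": "Games", "parent": "root", "is_folder": True},
    {"id": "b", "name": "Sam & Max", "parent": "a", "is_folder": True},
    {"id": "c", "name": "file.ext", "parent": "b", "is_folder": False},
]
paths = build_paths(listing, "root")
```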

There is also a case-sensitivity problem: "file.txt" and "FiLe.txt" are the same file name on Windows file systems and different names on Unix. Names can contain uppercase characters, but since Dropbox operates on lowercase paths it would, like Windows, treat these files as the same. OneDrive paths are not case-sensitive either. Google Drive doesn't have paths at all, and it allows files to have the same name within one directory.

The solution is to avoid naming files in a way that leads to such ambiguity. That means ScummVM should be case-insensitive too, and every file should have a unique lowercase path. If there are files with the same name within one folder (in Google Drive), we should either ask the user to select the one ScummVM should operate on or choose one ourselves: for example, the newest one or the one whose id we know.
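One possible way to "choose one ourselves" is to keep, for every lowercase path, only the most recently modified file. The sketch below assumes hypothetical records with `path`, `id` and `modified` keys, where `modified` is a UTC timestamp string in one fixed format (so plain string comparison orders them correctly); it is not meant as the final policy.

```python
def pick_unique_by_lowercase_path(files):
    """Keep one file per lowercased path, preferring the newest one."""
    chosen = {}
    for f in files:
        key = f["path"].lower()
        # Same-format UTC timestamps sort correctly as plain strings.
        if key not in chosen or f["modified"] > chosen[key]["modified"]:
            chosen[key] = f
    return chosen


files = [
    {"id": "1", "path": "Saves/file.txt", "modified": "2015-05-12T15:50:38Z"},
    {"id": "2", "path": "Saves/FiLe.txt", "modified": "2016-01-01T00:00:00Z"},
    {"id": "3", "path": "Saves/other.txt", "modified": "2014-01-01T00:00:00Z"},
]
unique = pick_unique_by_lowercase_path(files)
```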

I believe that paths (even lowercase ones) are clearer than ids for both developers and users. Suppose we stored a special metadata file with the ids of all downloaded files, and the user then moved or removed a file with a known id. ScummVM would make an API call using the saved id. If the file was moved, this could have strange effects: the user assumes that a file moved elsewhere is no longer under ScummVM's control, but it still is, so ScummVM might overwrite save files the user thought would not be touched anymore. If the file was removed, ScummVM would receive an error from the cloud storage. Even if there is a file at that path (a new one with a new id), the application would believe that there is no file (because the file with the old id is inaccessible).

So ScummVM should work in terms of file paths and use ids only when they are actually needed (to download/update/sync files), without caching them in any metadata file.

HTTP errors

HTTP status codes indicate whether the application's request was successful.

The services return 200 (OK) and 201 (Created) to indicate success, for example that a file was created.

Status 429 means our application is making too many requests and should retry after the specified number of seconds.

Google recommends using an exponential backoff (https://developers.google.com/drive/v3/web/manage-uploads#exp-backoff) error handling strategy when the application receives HTTP 503 or similar status codes.
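A minimal sketch of combining the two retry rules (wait the number of seconds the server asks for, otherwise back off exponentially with some random jitter). `send` is a hypothetical callable that performs one request and returns `(status, retry_after_seconds_or_None, body)`; the set of retryable status codes, the base delay and the attempt limit are assumptions, not values taken from any of the services.

```python
import random
import time

# Assumed set of status codes worth retrying; adjust per service.
RETRYABLE = {429, 500, 502, 503, 504}


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, retry_after, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        if retry_after is not None:
            # The server told us how long to wait (e.g. with a 429 response).
            delay = retry_after
        else:
            # Exponential backoff: 1s, 2s, 4s, ... plus up to 1s of jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
        sleep(delay)
```

Passing `sleep` as a parameter keeps the function testable without actually waiting.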

File Metadata Representation

Each service returns metadata about files. The fields we are probably interested in are listed below.

Dropbox

	name - the last piece of display path
	path_lower, path_display - paths
	client_modified, server_modified - in ISO 8601
	size - in bytes

	//might be useful:
	id - inner Dropbox file id
	rev - inner Dropbox file revision id

Google Drive

	name - might not be unique within a folder
	id - inner Google Drive file id
	modifiedTime - RFC 3339 date-time	
	size - in bytes
	webContentLink - where to download

	//might be useful:
	originalFilename - might be used to determine a unique filename
	properties, appProperties - key/value maps to keep whatever we want
	version - incremented automatically every time the file changes
	parents[] - list of folders which contain this file
	headRevisionId - inner Google Drive file revision id

OneDrive

	name
	id
	lastModifiedDateTime
	size

	//might be useful:
	File.mimeType
	File.hashes
	Folder.childCount
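Since the three services name the same information differently, a common record could hide the differences. The sketch below maps the fields listed above onto a hypothetical `FileInfo` type; the type and the `from_*` function names are made up, and the functions handle plain files only (folders carry no size).

```python
from typing import NamedTuple, Optional


class FileInfo(NamedTuple):
    name: str
    size: int                # in bytes
    modified: str            # timestamp string as sent by the service
    file_id: str
    path: Optional[str]      # only Dropbox metadata includes the full path


def from_dropbox(meta: dict) -> FileInfo:
    return FileInfo(meta["name"], int(meta["size"]), meta["server_modified"],
                    meta["id"], meta["path_lower"])


def from_google_drive(meta: dict) -> FileInfo:
    # int() also copes with a size that arrives as a JSON string.
    return FileInfo(meta["name"], int(meta["size"]), meta["modifiedTime"],
                    meta["id"], None)


def from_onedrive(meta: dict) -> FileInfo:
    return FileInfo(meta["name"], int(meta["size"]), meta["lastModifiedDateTime"],
                    meta["id"], None)
```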

Custom Metadata

Google Drive and OneDrive can also store our own metadata with the files. In Google Drive there are two key/value maps for that: properties (a public one, accessible to all applications) and appProperties (a private one, accessible to our application only). OneDrive requires metadata to follow a specified format: developers must register their "facet" with schema and property definitions. If we wanted to change those, we would have to do it "only in ways that can't break old apps", meaning we can't completely remove a field or change a field's boundaries.

As this feature is not really common in these services and is not available in Dropbox, I believe we should not use it. I'm not sure we even need any custom metadata on files anyway.

Required API Methods

This is a list of API methods we would have to use.

Dropbox

	/create_folder
	/delete
	/download
		accepts "rev:abcdef" in order to download a specific revision of a file
	/list_folder
		supports "recursive", returns "has_more" for the /list_folder/continue call
	/upload
		for files smaller than 150 MB
	/upload_session/start
	/upload_session/append
	/upload_session/finish
		for files larger than 150 MB (less than 150 MB per request)
	/get_current_account
		user id, name, email, photo url
	/get_space_usage
		used and allocated in bytes
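The `has_more` flag means a directory listing may arrive in several pages. A sketch of collecting all of them is shown below; `list_folder` and `list_folder_continue` are hypothetical wrappers around the /list_folder and /list_folder/continue calls that return the decoded JSON response with its `entries`, `cursor` and `has_more` fields.

```python
def list_all_entries(list_folder, list_folder_continue, path):
    """Collect every entry of a folder, following has_more/cursor paging."""
    result = list_folder(path)
    entries = list(result["entries"])
    while result["has_more"]:
        # The cursor from the previous page identifies where to continue.
        result = list_folder_continue(result["cursor"])
        entries.extend(result["entries"])
    return entries
```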

Google Drive

	/about
		user (id, name, photo link, email)
		quota (usage, limit)
		maxUploadSize, appInstalled

	/files/<f>
		/create
			supports simple ("media"), multipart ("multipart") and resumable ("resumable") uploading
		/delete
		/get
			get metadata
		/list
			search (there are no paths to use as a prefix, so folder contents are listed by querying on the parent folder's id: "'<folder id>' in parents")
		/update
			change metadata or file contents

OneDrive

	/drive
		id, quota, owner
	/drive/special/approot
		folder's Item resource
	/drive/special/approot/children
	/drive/items/{id}/children
	/drive/special/approot:/{path}/children
		list children (directory contents)
		@odata.nextLink is the request url for next page
	/drive/special/approot/children
	/drive/items/{parent-id}/children/{name}
	/drive/root:/{parent-path}/{name}
		create item
	/drive/special/approot:/{fileName}:/content
	/drive/items/{parent-id}/children/{name}/content
	/drive/root:/{parent-path}/{name}:/content
		upload contents
		(less than 100 MB in one piece)
		(supports multipart)
	/drive/root:/{path_to_item}:/upload.createSession
	/drive/items/{parent_item_id}:/{filename}:/upload.createSession
		resumable upload
		(less than 60 MB in one fragment, 10 MB is the recommended fragment size)
		(this method is recommended for any file of 10 MB or more)
	/drive/items/{id}
	/drive/root:/{path}
		update item metadata (HTTP method PATCH)
	/drive/items/{id}
	/drive/root:/{path}
		delete item (HTTP method DELETE)
	/drive/items/{id}/content
	/drive/root:/{path}:/content
		download
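For the resumable upload, each fragment is sent with a `Content-Range` header of the form `bytes start-end/total`. The sketch below only computes these header values for a given file size, using 10 MiB fragments as an interpretation of the recommended 10 MB; `fragment_ranges` is a made-up helper and no request is sent.

```python
def fragment_ranges(total_size, fragment_size=10 * 1024 * 1024):
    """Yield Content-Range header values covering total_size bytes."""
    start = 0
    while start < total_size:
        # The range end is inclusive, hence the "- 1".
        end = min(start + fragment_size, total_size) - 1
        yield "bytes {}-{}/{}".format(start, end, total_size)
        start = end + 1


# A tiny example with 10-byte fragments to make the ranges easy to read.
ranges = list(fragment_ranges(25, fragment_size=10))
```

Here `ranges` holds three values: two full 10-byte fragments and a final 5-byte one.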

Other API Methods

Some API methods which might be useful.

Dropbox

	/get_temporary_link
		link to stream contents
	/list_folder/get_latest_cursor
		get a cursor for the folder's current state, to list what changed since this call later
	/restore
		rollback file to specified revision
	/copy
	/copy_reference
		to another user's folder
	/get_metadata
	/get_preview
	/get_thumbnail
	/list_folder/longpoll
		wait for changes
	/list_revisions
	/move
	/search
		indexed search, might not know about latest changes
	<lots of sharing methods we won't use>

Google Drive

	/files/<f>
		/copy
		/watch
		/emptyTrash
		/generateIds
	/changes/*
		list/watch changes (what was removed/added and when)
	/channels/stop
		stop watching file
	/files/<f>/comments/*
		work with users' comments (totally not used by ScummVM)
	/files/<f>/replies/*
		work with replies to comments
	/files/<f>/permissions/*
		work with permissions (why would we do this? =)

OneDrive

	/drive/root/view.search?q=
	/drive/items/{item-id}/view.search?q=
		search
	/drive/items/{item-id}/view.delta
	/drive/root:/{item-path}:/view.delta
		see changes since the previous call
	<move, copy, view thumbnail, share>

Comparison Table

API calls \ Services                  | Dropbox                | Google Drive          | OneDrive
list directory                        | OK                     | ID needed *           | OK
upload (in one API call)              | 150 MB max             | ID needed *, no limit | 100 MB max
upload (session, multipart/resumable) | 150 MB max per request | ID needed *, no limit | 60 MB max per request
download                              | OK                     | ID needed *           | OK
delete                                | OK                     | ID needed *           | OK
create directory                      | OK                     | ID needed *           | OK
touch                                 | use /move **           | ID needed *           | OK
get service info                      | OK                     | OK                    | OK

Legend: Supported; Supported, but requires some workaround; Not supported

* Google Drive has no path access to files, so in order to address a file within some subdirectory we would have to list the directories along its path and determine its ID (this should be done recursively for the whole directory before syncing or downloading it).

** Dropbox has no method for changing a file's metadata. The modification date might perhaps be changed without reuploading the file by using a simple /move call.
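The ID lookup from footnote * could be sketched as a walk over the path components. `list_children` is a hypothetical callable that returns the `id` and `name` of every item in a folder (in Google Drive this would presumably be a /files/list call filtered by parent id); names are compared case-insensitively, and when several items share a name the first one is taken, which is a simplification.

```python
def resolve_path(list_children, root_id, path):
    """Return the id of the item at path below root_id, or None if it is missing."""
    current = root_id
    for part in [p for p in path.split("/") if p]:
        matches = [c for c in list_children(current)
                   if c["name"].lower() == part.lower()]
        if not matches:
            return None
        # Duplicate names are possible; this sketch simply takes the first.
        current = matches[0]["id"]
    return current
```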