CVS2SVN
Latest revision as of 09:47, 7 September 2007
Intro
Here I'll describe how we transformed our ScummVM CVS to SVN. It may be helpful for other big projects facing the same challenge.
Of course, SF.net has a nice "import CVS repository" feature, but it simply runs the cvs2svn script automatically, and we wanted more:
- Restore connections between moved files
- This includes merging the scummvm-old and scummvm modules. The scummvm module was created when we performed a major restructuring of the project directory tree
- Keep a subtree with tags and branches for each of our subprojects
Overall idea
Generally, the connection between moved files in SVN can be restored as shown in the following chunk of a real ScummVM repository dump:
    Revision-number: 89
    ...
    Node-path: trunk/scummvm-old/wince/missing/dirent.h
    Node-kind: file
    Node-action: add
    +Node-copyfrom-rev: 53
    +Node-copyfrom-path: trunk/scummvm-old/PocketSCUMM/missing/dirent.h
    ...
    Revision-number: 93
    Node-path: trunk/scummvm-old/PocketSCUMM/PocketSCUMM.vco
    Node-action: delete
    Node-path: trunk/scummvm-old/PocketSCUMM/missing/dirent.h
    Node-action: delete
Note the lines marked with '+': those were added, and they restore the SVN history link. 53 is the last revision in which the old dirent.h file was altered.
The process
To fulfill the task I wrote several simple scripts. I had to do it with scripts rather than manually, because restoring the connections is a pretty long process and I wanted to minimize the repository freeze time. So first I modified the dump file locally, and later reapplied my changes to a fresh dump. The scripts are intentionally kept simple and no error checking is performed.
The whole dump takes over 1.4GB, so there is no way to edit the file directly. Hence I came up with the simple idea of extracting just the Revision-number and Node-* lines, editing those, and later merging the changes back into the original dump.
automation-pass1.sh
    wget http://.../scummvm-cvsroot.tar.bz2
    rm -rf scummvm scummvm.cvs
    tar xjf scummvm-cvsroot.tar.bz2
    mv scummvm scummvm.cvs
    mkdir scummvm

    # We need to combine scummvm-old and scummvm modules
    mv scummvm.cvs/scummvm scummvm.cvs/scummvm-old scummvm

    # Due to bug in our CVS repository branch-0-5-0 is both tag and branch, so
    # we have to force it here
    cvs2svn --dump-only --force-branch=branch-0-5-0 --dumpfile scummvm.dump scummvm
    grep -E "^Node-|^Revision-number" scummvm.dump > scummvm.dump.nodes.in

    cvs2svn --dump-only --dumpfile tools.dump scummvm.cvs/tools
    grep -E "^Node-|^Revision-number" tools.dump > tools.dump.nodes.in

    # No efforts were performed with restoring links in following modules
    cvs2svn --dump-only --dumpfile web.dump scummvm.cvs/web
    cvs2svn --dump-only --dumpfile scummex.dump scummvm.cvs/scummex
    cvs2svn --dump-only --dumpfile residual.dump scummvm.cvs/residual
    cvs2svn --dump-only --dumpfile docs.dump scummvm.cvs/docs
    cvs2svn --dump-only --dumpfile engine-data.dump scummvm.cvs/engine-data

    # I performed all work offsite, so I had to transfer these dumps over slow dial-up line
    bzip2 -9 scummvm.dump.nodes.in tools.dump.nodes.in
This stage took about 1.5 hours. The bottleneck is the disk subsystem. The overall size of the produced data is over 1.4GB.
Manual editing
Now I opened the dump.nodes.in files in XEmacs and started to add those links. First, I searched them for the word 'delete' and studied each case. I had to consult the file layout and the CVS log messages to see whether those files were simply removed or actually moved or renamed. Because some files really were renamed, and there are name clashes between files in different directories, it was not possible to fully automate the task, although a big chunk of it could be scripted.
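The search for 'delete' entries could also be done with a small helper instead of an editor search. The following is only a sketch under the assumption that the file has the layout produced by the grep in automation-pass1.sh; the function name list_deletes is made up for illustration and is not one of the original scripts.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original scripts.
# Prints "<revision><TAB><path>" for every deleted node in a .nodes.in file.
list_deletes() {
    awk '
    /^Revision-number: /    { rev = $2 }
    /^Node-path: /          { path = substr($0, 12) }   # strip "Node-path: "
    /^Node-action: delete$/ { print rev "\t" path }
    ' "$1"
}

# Example use: list_deletes scummvm.dump.nodes.in | less
```

Each printed line is only a candidate: whether the file was moved, renamed or really removed still has to be decided by looking at the layout and the CVS log.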
So what I did was specify the Node-copyfrom-path manually and leave Node-copyfrom-rev blank. Then I recorded a simple macro in XEmacs, as that was quicker than writing yet another script. The macro did something like this:
- Start on the Node-copyfrom-path line
- Put the path into the yank buffer
- Kill other windows
- Split the window (thus we have 2 views of the buffer at the same place)
- Switch to the other window
- Search backwards for the contents of the yank buffer
With this I saw the revision number in the other window. I double-checked that this was the correct place and put that revision number into the Node-copyfrom-rev field. I didn't see any inconsistencies, though, so I guess those numbers could be inserted fully automatically.
The resulting .nodes file had the inserted lines marked with a leading '+', as in the example at the top of this document.
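As a rough illustration of how the revision lookup might be automated, here is a sketch in awk. It is not what was actually used (that was the XEmacs macro), and it rests on several assumptions: the name fill_copyfrom_rev is invented, each blank "+Node-copyfrom-rev:" line is immediately followed by its "+Node-copyfrom-path:" line, and the wanted revision is the last earlier revision in which the source path appears as a Node-path.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original scripts.
# Fills blank "+Node-copyfrom-rev:" lines with the last earlier revision
# in which the copy source appears as a Node-path.
fill_copyfrom_rev() {
    awk '
    /^Revision-number: / {
        # Paths touched in the revision that just ended become valid copy sources.
        for (p in pending) last[p] = pending[p]
        split("", pending)          # portable way to empty an array
        rev = $2
    }
    /^Node-path: / { pending[substr($0, 12)] = rev }
    /^\+Node-copyfrom-rev: *$/ { blank = 1; next }   # hold back the blank line
    /^\+Node-copyfrom-path: / {
        if (blank) {
            src = substr($0, 22)    # strip "+Node-copyfrom-path: "
            sub(/^\//, "", src)     # Node-path entries have no leading slash
            if (src in last) print "+Node-copyfrom-rev: " last[src]
            else             print "+Node-copyfrom-rev:"   # unresolved, fix by hand
            blank = 0
        }
    }
    { print }
    ' "$1"
}

# Example use: fill_copyfrom_rev scummvm.dump.nodes.in > scummvm.dump.nodes
```

Lines that cannot be resolved are left blank on purpose, so they still have to be checked by hand, just as with the macro.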
automation-pass2.sh
This one is simple. It merges back those inserted lines and modifies all internal paths, so it will put all modules into separate directories of the SVN repository:
    perl merge-dump.pl scummvm.dump.nodes scummvm <scummvm.dump >scummvm.dump.new
    perl merge-dump.pl tools.dump.nodes tools <tools.dump >tools.dump.new

    for i in web scummex residual docs engine-data
    do
        perl prepare-dump.pl $i <$i.dump >$i.dump.new
    done
merge-dump.pl
    $logfile = shift;
    $module = shift;

    open(LOG, $logfile) or die "Can't open file $logfile";

    $logline = <LOG>;

    while(<>) {
        $line = $lineorig = $_;

        # Prefix every path with the module directory
        $line =~ s/^Node-path: /Node-path: $module\//;
        $line =~ s#^Node-copyfrom-path: (/?)#Node-copyfrom-path: $1$module/#;

        print $line;

        # When this dump line matches the current line of the edited .nodes
        # file, emit the '+' lines that follow it there (without the '+')
        if ($lineorig eq $logline) {
            $logline = <LOG>;
            while ($logline =~ /^\+(.*)/) {
                print "$1\n";
                $logline = <LOG>;
            }
        }
    }

    close LOG;
prepare-dump.pl
    $module = shift;

    while(<>) {
        $line = $_;

        $line =~ s/^Node-path: /Node-path: $module\//;
        $line =~ s#^Node-copyfrom-path: (/?)#Node-copyfrom-path: $1$module/#;

        print $line;
    }
The second regexp here is tricky, since Node-copyfrom-path could contain either /trunk/scummvm/blah or trunk/scummvm/blah and we have to keep that leading slash if it is present.
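To see what the two substitutions do to each form of the path, here is a small sed equivalent that can be tried on a few sample lines. It is a sketch for experimenting only, not a replacement for the Perl scripts; the function name prefix_paths is invented, and the module name is assumed not to contain '#'.

```shell
#!/bin/sh
# Hypothetical stand-in for the two Perl substitutions, for experimenting.
# $1 is the module directory to prepend; input is read from stdin.
prefix_paths() {
    sed -E \
        -e "s#^Node-path: #Node-path: $1/#" \
        -e "s#^Node-copyfrom-path: (/?)#Node-copyfrom-path: \\1$1/#"
}

printf '%s\n' \
    'Node-path: trunk/scummvm/blah' \
    'Node-copyfrom-path: /trunk/scummvm/blah' \
    'Node-copyfrom-path: trunk/scummvm/blah' | prefix_paths scummvm
# Prints:
#   Node-path: scummvm/trunk/scummvm/blah
#   Node-copyfrom-path: /scummvm/trunk/scummvm/blah
#   Node-copyfrom-path: scummvm/trunk/scummvm/blah
```

The optional capture group `(/?)` is what lets the module name be inserted after the leading slash when there is one, and at the very start of the path when there is not.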
After this stage the amount of data on the disk doubles, since we have both merged and non-merged dumps. I kept the non-merged dumps intact, so pass2 could be redone without repeating the lengthy pass1.
automation-pass3.sh
At this straightforward stage I create a local SVN repository:
    rm -rf svn
    svnadmin create svn
    svnadmin load svn < init.dump

    for i in scummvm tools web scummex residual docs engine-data
    do
        svnadmin load svn < $i.dump.new
    done
It takes another hour, and then I can dump the resulting repository with:
    svnadmin dump svn >scummvmrepo.dump
The init.dump file contains the skeleton of our new repository layout:
init.dump
    SVN-fs-dump-format-version: 2

    Revision-number: 1
    Prop-content-length: 112
    Content-length: 112

    K 8
    svn:date
    V 27
    2001-10-09T14:30:12.000000Z
    K 7
    svn:log
    V 38
    New repository initialized by cvs2svn.
    PROPS-END

    Node-path: scummvm
    Node-kind: dir
    Node-action: add

    Node-path: tools
    Node-kind: dir
    Node-action: add

    Node-path: web
    Node-kind: dir
    Node-action: add

    Node-path: docs
    Node-kind: dir
    Node-action: add

    Node-path: scummex
    Node-kind: dir
    Node-action: add

    Node-path: engine-data
    Node-kind: dir
    Node-action: add

    Node-path: residual
    Node-kind: dir
    Node-action: add
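If you write such a skeleton by hand and change the date or the log message, the K/V lengths and Prop-content-length have to be changed to match, since they are byte counts. The sketch below shows one way to regenerate and check the property block; the function name make_revprops is invented, and the values are assumed to be plain ASCII so that the shell's character count equals the byte count.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original scripts.
# Emits the revision property block for a given svn:date ($1) and svn:log ($2).
# Assumes ASCII-only values, so ${#1} and ${#2} are also the byte lengths.
make_revprops() {
    printf 'K 8\nsvn:date\nV %d\n%s\nK 7\nsvn:log\nV %d\n%s\nPROPS-END\n' \
        "${#1}" "$1" "${#2}" "$2"
}

# The byte count of the whole block is the Prop-content-length value.
make_revprops '2001-10-09T14:30:12.000000Z' \
              'New repository initialized by cvs2svn.' | wc -c
# Prints 112 for the values used in init.dump
```

Since this revision has no file content, Content-length is the same number as Prop-content-length.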
Final step
The last step is to dump the repository, bzip it and upload it to SF.net. Later SF.net added the possibility to import an existing dump automagically, but at the time we performed the move we had to submit a PR.