Difference between revisions of "CVS2SVN"

109 bytes added ,  03:01, 14 February 2006
m (Spelling)
m (improved spelling)
Line 1: Line 1:
== Intro ==

Here I'll describe how we converted our ScummVM CVS repository to SVN. It may be helpful for other big projects facing the same challenge.

Of course, SF.net has this nice "import CVS repository" feature, but it only runs the automatic [http://cvs2svn.tigris.org/ cvs2svn] script, and we wanted more:

* Restore connections between moved files
* That includes the merge of the scummvm-old and scummvm modules. The scummvm module was created when we performed a major restructuring of the project directory tree
* Keep a subtree with tags and branches for each of our subprojects


== Overall idea ==

Generally, the connection between moved files in SVN can be restored as shown in the following chunk of a real ScummVM repository dump:

   Revision-number: 89
Line 27: Line 27:
   Node-action: delete

Note the lines marked by '+'. Those were added. This successfully restores the SVN history link. 53 is the last revision number in which the dirent.h file was altered.
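
To make the edit concrete, here is a minimal Python sketch (an illustration only; the conversion itself was done with Perl scripts and XEmacs, not with this code). It adds the two copy-source headers to the header lines of an "add" node record. The function name and both paths are hypothetical placeholders; only the header names and the revision number 53 come from the example above.

```python
def add_copyfrom(headers, copyfrom_path, copyfrom_rev):
    """Return a copy of a node record's header lines with copy-source headers added.

    `headers` is a list of "Name: value" strings for one node record.
    The two new headers are placed right after the "Node-action: add" line.
    """
    result = []
    for line in headers:
        result.append(line)
        if line == "Node-action: add":
            result.append(f"Node-copyfrom-rev: {copyfrom_rev}")
            result.append(f"Node-copyfrom-path: {copyfrom_path}")
    return result


# Hypothetical record: the real paths are not shown in this write-up.
record = [
    "Node-path: trunk/new-location/dirent.h",
    "Node-kind: file",
    "Node-action: add",
]
linked = add_copyfrom(record, "trunk/old-location/dirent.h", 53)
print("\n".join(linked))
```

Records with any other action, such as the delete of the old path, pass through unchanged.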


== The process ==

To accomplish the task I wrote several simple scripts. I had to use scripts rather than do it manually, because restoring the connections is a pretty lengthy process and I wanted to minimize the repository freeze time. So I first modified the dump file locally, and later reapplied my changes to a fresh dump. The scripts are intentionally kept simple, and no error checking is performed.

The whole dump takes over 1.4 GB, so there is no way to edit the file directly. Hence I came up with the simple idea of extracting just the Revision numbers and Node-paths, so that they could later be replaced in the original dump.
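
The extraction idea can be sketched in a few lines of Python (again an illustration, not the original Perl script). The function name, the exact set of headers kept, and the sample path are assumptions. The sketch is a plain line filter, so file contents that happen to start with one of these prefixes would be matched too; a more careful tool would skip bodies using the Content-length headers.

```python
import io

# Header prefixes worth keeping (an assumed selection); everything else
# (properties, file contents) is dropped.
KEEP_PREFIXES = (b"Revision-number:", b"Node-path:", b"Node-action:")


def extract_headers(src, dst):
    """Copy only the revision and node header lines from `src` to `dst`.

    Both arguments are binary file objects. The input is read line by line,
    so the whole dump is never held in memory.
    """
    for line in src:
        if line.startswith(KEEP_PREFIXES):
            dst.write(line)


# Tiny in-memory stand-in for a dump file; the path is hypothetical.
sample = io.BytesIO(
    b"Revision-number: 89\n"
    b"Prop-content-length: 10\n"
    b"\n"
    b"Node-path: trunk/foo.c\n"
    b"Node-action: delete\n"
    b"\n"
)
out = io.BytesIO()
extract_headers(sample, out)
print(out.getvalue().decode())
```

For a real dump, the same function can be given two files opened in binary mode ("rb" and "wb") in place of the in-memory buffers.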


=== automation-pass1.pl ===
Line 64: Line 64:
   bzip2 -9 scummvm.dump.nodes.in tools.dump.nodes.in

This stage took about 1.5 hours. The bottleneck is the disk subsystem. The overall size of the produced data is over 1.4 GB.


=== Manual editing ===

Now I opened the dump.nodes.in files in XEmacs and started to add those links. First, I searched for every occurrence of the word 'delete' and studied each case. I had to consult the file layout and the CVS log messages to see whether those files were simply removed or were really moved or renamed. Because some files were really renamed, and there are name clashes between files across directories, it was not possible to fully automate the task; however, a big chunk of it could be scripted.

So what I did was specify Node-copyfrom-path manually and leave Node-copyfrom-rev blank. Then I recorded a simple macro in XEmacs, as that was quicker than writing yet another script. The macro was something like this:

* It starts on the Node-copyfrom-path line.
Line 79: Line 79:
* Search backwards for the contents of the yank buffer

With this I saw the revision number in another window. I double-checked that this was the correct place and put that revision number into the Node-copyfrom-rev field. However, I didn't see any inconsistencies here, so I guess those numbers could be inserted fully automatically.
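
A rough Python sketch of how that lookup might be automated follows (this was not part of the actual conversion, which used the XEmacs macro). It assumes a simplified listing in which a manually added Node-copyfrom-path line has no revision line yet, and it picks the latest earlier revision in which the source path was added or changed. The function name, the input layout, and the paths are all hypothetical.

```python
def fill_copyfrom_revs(lines):
    """Insert a Node-copyfrom-rev line before every Node-copyfrom-path line.

    `lines` is a header listing in dump order. The revision chosen is the
    latest one, strictly before the current revision, in which the source
    path was added or changed (deletes are ignored).
    """
    last_change = {}   # path -> last earlier revision that added or changed it
    pending = []       # changes seen in the current revision, applied when it ends
    current_rev = None
    node_path = None
    out = []
    for line in lines:
        key, _, value = line.partition(": ")
        if key == "Revision-number":
            last_change.update(pending)
            pending = []
            current_rev = int(value)
        elif key == "Node-path":
            node_path = value
        elif key == "Node-action" and value != "delete":
            pending.append((node_path, current_rev))
        elif key == "Node-copyfrom-path":
            # A KeyError here means the source path was never seen before;
            # failing loudly is safer than guessing a revision.
            out.append(f"Node-copyfrom-rev: {last_change[value.lstrip('/')]}")
        out.append(line)
    return out


listing = [
    "Revision-number: 53",
    "Node-path: trunk/old-location/dirent.h",
    "Node-action: change",
    "Revision-number: 89",
    "Node-path: trunk/new-location/dirent.h",
    "Node-action: add",
    "Node-copyfrom-path: trunk/old-location/dirent.h",
    "Node-path: trunk/old-location/dirent.h",
    "Node-action: delete",
]
print("\n".join(fill_copyfrom_revs(listing)))
```

Changes are only recorded once a revision has ended, so a delete or re-add of the source path inside the same revision cannot be picked up as the copy source.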


The resulting .nodes file has the inserted lines marked with a leading '+', as in the example at the top of this page.
Line 85: Line 85:
=== automation-pass2.pl ===

This one is simple. It merges those inserted lines back in and modifies all internal paths, so that every module ends up in a separate directory in the SVN repository:

   perl merge-dump.pl scummvm.dump.nodes scummvm <scummvm.dump >scummvm.dump.new
Line 136: Line 136:
   }

The second regexp here is tricky, since Node-copyfrom-path could contain either /trunk/scummvm/blah or trunk/scummvm/blah, and we have to keep the leading slash if it is present.
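
The optional leading slash can be handled with a capture group, as in this small Python sketch. It is not the regexp from merge-dump.pl. In particular, the target layout (moving trunk/<module>/... to <module>/trunk/...) is only an assumption made for the example, and the function name is hypothetical.

```python
import re


def relocate(path, module):
    """Rewrite `trunk/<module>/rest` as `<module>/trunk/rest`.

    A leading slash, if present, is captured and written back unchanged.
    Paths belonging to other modules are returned as they are.
    """
    pattern = re.compile(r"^(/?)trunk/" + re.escape(module) + r"(/|$)")
    return pattern.sub(
        lambda m: f"{m.group(1)}{module}/trunk{m.group(2)}", path
    )


print(relocate("/trunk/scummvm/blah", "scummvm"))
print(relocate("trunk/scummvm/blah", "scummvm"))
print(relocate("trunk/scummvm-old/blah", "scummvm"))
```

The trailing `(/|$)` group keeps a module name such as scummvm from also matching scummvm-old.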


So after this stage the amount of data on disk doubles, since we have both merged and non-merged dumps. I kept the non-merged dumps, so that pass2 could be redone without performing the lengthy pass1 over again.

=== automation-pass3.sh ===
Line 215: Line 215:
== Final step ==

Then the repository was dumped, bzipped and uploaded to sf.net. Now it is possible to import an existing dump, but at the time we did it, we had to submit a PR.