Version Control and Version Differentiation
I stated last week that a key reason I don’t care for a number of software development tools is that they do not integrate with existing version control products in a way that lets you see, in human-readable form, what changed from one version to the next.
Most diff tools available today compare one text file to another. Even some low-end database comparison tools simply script the data definitions into text files, and then use a text diff tool to show the differences. I have used a number of tools over the years that fall into this category.
For example, when using ERwin a number of years ago, I would script the entire model into separate files, save the model, and then check everything into version control. Using this technique, I could roll the model back to a specific version. I could also see the individual objects that had been modified in the model, view a diff, or even retrieve a single object and revert it in the model.
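As a rough sketch of the one-file-per-object approach (not ERwin's own export, which I am not reproducing here), something like the following would do. The `model` dictionary, the object names, and the `script_model` helper are all hypothetical, invented for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical model: object name -> definition script (illustrative only).
model = {
    "dbo.Customer": "CREATE TABLE dbo.Customer (Id INT NOT NULL, Name NVARCHAR(100));",
    "dbo.Order": "CREATE TABLE dbo.[Order] (Id INT NOT NULL, CustomerId INT NOT NULL);",
}


def script_model(model, out_dir):
    """Write each object's definition to its own .sql file and return the file names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort the names so the output is stable from one run to the next.
    for name in sorted(model):
        path = out_dir / f"{name}.sql"
        path.write_text(model[name] + "\n", encoding="utf-8")
        written.append(path.name)
    return written


out_dir = tempfile.mkdtemp()
files = script_model(model, out_dir)
print(files)
```

With one file per object, the version control history shows exactly which objects changed, and a single file can be retrieved to revert a single object.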
David writes in with a different symptom of the limitations of our current diff capabilities:
You touch on a lot of subjects, programming languages, IDEs, source control, development practices, and so on.
I’d like to touch on two related subjects: issues with version control and certain “source” (not always human generated) files, and version control impacting development practices (and vice versa).
Most version control tools use line-by-line diffs because they are simplest and match most human-written source files well (they can determine just the few places where code was added or deleted and store only that difference). They generally don’t work well for swapped code, such as reordered functions: one of the two swapped pieces will be marked as deleted in one place and added in another, even though no real change occurred. Similarly, adding a new indent level generally deletes the old version entirely and adds a new version (while leaving the closing curly brace unchanged). These are faults of the diff tool not understanding the syntax of the language, but they aren’t major nuisances and are largely ignored.
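The swapped-code case is easy to reproduce. Here is a minimal sketch using Python's standard `difflib` module; the two function bodies and the file names `old.py` and `new.py` are made up for illustration, and a real version control tool may use a different diff algorithm.

```python
import difflib

# Two versions of a hypothetical source file: same functions, swapped order.
old = [
    "def alpha():",
    "    return 1",
    "",
    "def beta():",
    "    return 2",
]
new = [
    "def beta():",
    "    return 2",
    "",
    "def alpha():",
    "    return 1",
]

diff = list(difflib.unified_diff(old, new, "old.py", "new.py", lineterm=""))

# Collect the removed and added lines, skipping the "---" / "+++" file headers.
removed = [line[1:] for line in diff if line.startswith("-") and not line.startswith("---")]
added = [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]

print("\n".join(diff))
```

The lines reported as removed are the same lines reported as added elsewhere, so the diff shows a deletion and an insertion where a human would see one move.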
XML can completely break this simplistic heuristic because of how it is interpreted by programs. Technically by default whitespace is significant, and so is tag order, but not attribute order, but most programs do not care about whitespace significance in XML, and may or may not care about tag order, so each run through by a program auto-generating XML can produce wildly different text that represents the exact same data structure.
This can wreak havoc on actually determining what has changed per version and merging the changes of two users who used the same tool to update the XML file. (The same is true of any binary “source” like the Access ADP file.)
I don’t really see a solution for XML, since what is significant in the XML can vary from XML format to XML format, and even if you make a diff tool that understands DTD (or RelaxNG, or whatever validating format you use), how many people writing custom XML formats for their software actually mark insignificant whitespace as insignificant? And how do you even ignore insignificant tag ordering? (I don’t think DTD lets you declare it insignificant.) It can only get worse for each binary format.
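A partial mitigation, not a solution, is to normalize the XML before diffing it. The sketch below uses `xml.etree.ElementTree.canonicalize` from the Python standard library (3.8 or later); the sample documents are invented, and passing `strip_text=True` is an assumption that whitespace is insignificant in this particular format. It evens out attribute order and formatting, but it does nothing about tag order, which is the point made above.

```python
import difflib
from xml.etree.ElementTree import canonicalize

# Hypothetical tool output: the same data serialized three different ways.
a = '<config><item id="1" name="x"/><item id="2" name="y"/></config>'
b = """<config>
  <item name="x" id="1" />
  <item name="y" id="2"></item>
</config>"""
# Same items as `a`, but in the opposite order.
c = '<config><item id="2" name="y"/><item id="1" name="x"/></config>'


def normalize(xml_text):
    """Return the C14N 2.0 form, dropping whitespace around text.

    strip_text=True encodes the ASSUMPTION that whitespace carries no meaning here.
    """
    return canonicalize(xml_text, strip_text=True)


raw_diff = list(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))
norm_diff = list(
    difflib.unified_diff(normalize(a).splitlines(), normalize(b).splitlines(), lineterm="")
)

print("raw diff lines:", len(raw_diff))
print("normalized diff lines:", len(norm_diff))
print("reordered tags still differ:", normalize(a) != normalize(c))
```

Whether reordered tags should count as a change depends on the format, so a generic tool cannot decide that for you.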
The only “solution” is a meta-language that describes the changes you make to such file formats, with the actual file kept outside of version control and reconstructed from these change sets (similar to managing a database schema with version control). That can also get ugly quickly, and some of these changes (such as DROP DATABASE…) may not be reversible the way everything inside the version control system is.
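To make the change-set idea concrete, here is a toy sketch. The tuple-based change format, the operation names, and the `rebuild` function are all hypothetical; a real schema migration tool would be far more involved.

```python
# Hypothetical change-set format: (operation, table, column).
changes = [
    ("add_table", "customer", None),
    ("add_column", "customer", "id"),
    ("add_column", "customer", "name"),
    ("drop_column", "customer", "name"),
]


def rebuild(changes, upto=None):
    """Replay the first `upto` changes (all of them if None) to reconstruct the schema."""
    schema = {}
    for op, table, column in changes[:upto]:
        if op == "add_table":
            schema[table] = []
        elif op == "add_column":
            schema[table].append(column)
        elif op == "drop_column":
            schema[table].remove(column)
        else:
            raise ValueError(f"unknown operation: {op}")
    return schema


latest = rebuild(changes)
previous = rebuild(changes, upto=3)
print(latest)
print(previous)
```

Replaying fewer changes recovers an earlier structure, but only the structure: any data that lived in a dropped column is gone, which is the irreversibility problem noted above.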
Like I said, I don’t know of a solution to this problem (or I’d be looking for VC funding :-).
As for version control and development practices, the relationship is fairly symbiotic, but the most exciting development (to me) is the recent introduction of Git and GitHub. It takes the open source software development paradigm (many disparate programmers across the world collaborating simultaneously on a single project, of which Agile is, in my opinion, very much a microcosm) and “supercharges” it. It reduces the barrier to contribution (both in getting the source onto your computer to modify and in getting the changes into a form “upstream” can accept) to just the coding ability of the developer interested in the project, while not undermining the authority of the originating project (unless it goes unmaintained, in which case the forks fight for supremacy, as usual).
I could see corporations following a similar model, where each department has a separate, central repository that its developers contribute to directly, but any developer anywhere can fork a copy and offer a contribution if they desire (encouraging developers to do so for the benefit of the whole company would probably require some sort of incentive, though).
Do you have any tips or processes you use to work around managing code? Share them with us by sending your comments to btaylor@sswug.org.
Cheers,
Ben