What is SCM?
Source code management systems are a common feature of large software development environments. They are used by both commercial and open source projects. It is far less common, however, to see SCM used in Web development, although larger development firms and projects do use SCM to manage their code.
SCM solutions are based on a simple principle: the authoritative copies of your source code, and other project files, are kept in a central repository. Developers will check out copies of files from the repository, work on those copies, and then check them back in to the repository. This is where SCM becomes an important tool; SCM manages and tracks revisions by multiple developers against a single master repository and provides:
Locking and concurrency management
Versioning and revision history
Locking and concurrency management
If you have worked in a team-based development environment that didn’t use an SCM solution, you have probably encountered the concurrency problem and its implications. Concurrency refers to the simultaneous editing of a file by more than one developer. This creates a contention problem which can lead to loss of revisions by one or more developers, especially if they are editing a single master copy of a file.
Consider a simple example: developers A and B both need to make changes in a file at the same time:
1. Developer A opens the file.
2. Developer B opens the file.
3. Developer A changes the file and saves it.
4. Developer B changes the file and saves it, overwriting A’s changes.
Clearly this has the potential for serious loss of work. Even if individual developers work on their own copies of files instead of a master set of files, after developers A and B make their changes, those independent changes to the same file must, somehow, be reconciled and then distributed out to all developers.
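The four steps above can be played out in a few lines of Python. This is only a toy model: the SharedFile class and its open and save methods are made up for illustration and simply stand in for a master file on a shared drive with no SCM in front of it.

```python
# Toy model of a shared master file with no SCM: the last save wins.
class SharedFile:
    def __init__(self, text):
        self.text = text

    def open(self):
        # Each developer gets a private snapshot of the current contents.
        return self.text

    def save(self, new_text):
        # Saving blindly replaces whatever is in the master copy.
        self.text = new_text


master = SharedFile("line 1\n")

copy_a = master.open()                  # 1. Developer A opens the file.
copy_b = master.open()                  # 2. Developer B opens the file.
master.save(copy_a + "A's change\n")    # 3. A changes the file and saves it.
master.save(copy_b + "B's change\n")    # 4. B saves, overwriting A's changes.

# B's snapshot was taken before A saved, so A's line never reaches the
# final version of the master copy.
a_change_survived = "A's change" in master.text
```

Nothing in this model warns Developer B that the file changed after it was opened, which is exactly the gap that SCM locking is meant to close.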
SCM systems manage the concurrency problem with file locking, which makes it possible for files to be flagged as “in use” when a developer is editing them. There are two main approaches to file locking: exclusive locks and unreserved locks.
With exclusive locking, the SCM prevents more than one developer from ever checking out a file to edit it. If a developer checks out a file for editing, all other developers are prevented from checking out the file; they will be able to view the file or get a copy (as opposed to checking it out) but they can’t edit the master repository copy until the current developer checks it back in and, in the process, releases their exclusive lock on the file.
This solution can provide a foolproof way of preventing simultaneous editing, but it comes with its own problem: what happens when Developer A checks a file out, forgets that it is checked out, and leaves the office? If Developer B then has an urgent change to make to the file, they can’t make it and have to wait for Developer A to return and check the file back in. In a large development environment this becomes a challenging problem of human management and communication, particularly in the distributed environments, spanning multiple time zones, that are common in web development.
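A minimal sketch of exclusive locking follows. It does not reflect the API of any real SCM product; the ExclusiveLockRepo class, its method names, and the LockedError exception are all assumptions made for illustration.

```python
class LockedError(Exception):
    """Raised when a file is checked out by another developer."""


class ExclusiveLockRepo:
    def __init__(self):
        self.files = {}   # path -> contents of the master copy
        self.locks = {}   # path -> developer currently holding the lock

    def get_copy(self, path):
        # Viewing or copying a file never requires a lock.
        return self.files[path]

    def check_out(self, path, developer):
        holder = self.locks.get(path)
        if holder is not None and holder != developer:
            raise LockedError(f"{path} is checked out by {holder}")
        self.locks[path] = developer
        return self.files[path]

    def check_in(self, path, developer, contents):
        if self.locks.get(path) != developer:
            raise LockedError(f"{developer} does not hold the lock on {path}")
        self.files[path] = contents
        del self.locks[path]   # checking in releases the exclusive lock


repo = ExclusiveLockRepo()
repo.files["index.html"] = "v1"

repo.check_out("index.html", "A")
try:
    repo.check_out("index.html", "B")   # B is refused while A holds the lock
    b_blocked = False
except LockedError:
    b_blocked = True

repo.check_in("index.html", "A", "v2")
b_copy = repo.check_out("index.html", "B")   # succeeds once A has checked in
```

If Developer A never calls check_in, Developer B stays blocked indefinitely, which is the forgotten check-out problem described above.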
Because of the problems described with exclusive locking, most major SCM systems in widespread use adopt a different type of locking: unreserved locking. In this model, multiple developers can check a file out and obtain a non-exclusive lock. Multiple developers then edit the file as needed.
The SCM system then implements mechanisms and algorithms to manage the merging of changes as files are checked back in to the repository. These algorithms range from the simple (inform developers of conflicting changes and ask the developers to resolve the changes) to advanced (attempt to determine and combine changes intelligently and ask for developer intervention or confirmation only when needed).
At first glance, it may seem like this does not offer much more than not using an SCM at all, especially for working on a shared set of files. But this isn’t the case. The SCM system knows who has checked out copies of files and prevents file overwriting by ensuring some type of manual or automatic merging of changes occurs. Combined with other SCM features discussed in the following sections, this makes an unreserved SCM system a powerful development management tool.
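The idea of unreserved locking with merging can be sketched as follows. This is a deliberately naive illustration, not the algorithm of any particular SCM system: the UnreservedRepo class and the merge_lines function are hypothetical, and the merge only copes with edits that leave the number of lines unchanged. Real systems use far more capable merge algorithms.

```python
class MergeConflict(Exception):
    """Raised when changes cannot be combined automatically."""


def merge_lines(base, mine, theirs):
    # Naive three-way merge; only handles edits that keep the line count.
    if not (len(base) == len(mine) == len(theirs)):
        raise MergeConflict("line counts differ; manual merge needed")
    merged = []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:
            merged.append(m)      # both sides agree
        elif m == b:
            merged.append(t)      # only the other developer changed it
        elif t == b:
            merged.append(m)      # only this developer changed it
        else:
            raise MergeConflict(f"line {number} changed by both developers")
    return merged


class UnreservedRepo:
    def __init__(self, lines):
        self.revisions = [list(lines)]   # revision 0 is the initial content

    def check_out(self):
        # Anyone may check out; the revision number travels with the copy.
        rev = len(self.revisions) - 1
        return rev, list(self.revisions[rev])

    def check_in(self, base_rev, lines):
        head = self.revisions[-1]
        if base_rev != len(self.revisions) - 1:
            # Someone else checked in first: merge instead of overwriting.
            lines = merge_lines(self.revisions[base_rev], lines, head)
        self.revisions.append(list(lines))
        return len(self.revisions) - 1


repo = UnreservedRepo(["title", "body", "footer"])
rev_a, copy_a = repo.check_out()
rev_b, copy_b = repo.check_out()

copy_a[0] = "Title (edited by A)"
copy_b[2] = "Footer (edited by B)"

repo.check_in(rev_a, copy_a)
repo.check_in(rev_b, copy_b)     # merged with A's change, not overwritten
merged = repo.revisions[-1]
```

Because the repository remembers which revision each developer started from, it can tell that Developer B’s copy is out of date and combine the two sets of changes. When both developers edit the same line, merge_lines raises MergeConflict and the developers have to resolve the change by hand.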
Versioning and Revision History
SCM systems not only handle editing by multiple developers and merging of changes when conflicts arise, they also implement versioning. Under versioning, a complete history of revisions of files in the repository is maintained. Every time a version of a file is checked back in to the repository, a copy of that version is archived. At any time, it is possible to pull back a previous version of a file, or roll back the current version to any earlier revision.
Versioning systems also generate log reports of who checked in changes and when, as well as storing comments from developers about the changes they are committing back to the repository. Some systems can even show the specific changes made for each new version of a file that is checked in.
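As a rough sketch of these ideas, the following Python class archives every checked-in version of a single file, keeps a log of who checked in what and when, supports rolling back, and shows the changes between two revisions. The VersionedFile class and its method names are hypothetical; only difflib and datetime, from the Python standard library, are real.

```python
import difflib
from datetime import datetime, timezone


class VersionedFile:
    def __init__(self):
        self.history = []   # one entry per check-in, oldest first

    def check_in(self, author, comment, text):
        self.history.append({
            "author": author,
            "when": datetime.now(timezone.utc),
            "comment": comment,
            "text": text,
        })
        return len(self.history)          # revision numbers start at 1

    def get(self, revision=None):
        # With no revision given, return the latest version.
        entry = self.history[-1] if revision is None else self.history[revision - 1]
        return entry["text"]

    def roll_back(self, revision, author):
        # A rollback is stored as a new revision, so no history is lost.
        return self.check_in(author, f"Roll back to r{revision}", self.get(revision))

    def log(self):
        return [
            f"r{n} {e['author']} {e['when']:%Y-%m-%d %H:%M}: {e['comment']}"
            for n, e in enumerate(self.history, start=1)
        ]

    def diff(self, old, new):
        return list(difflib.unified_diff(
            self.get(old).splitlines(), self.get(new).splitlines(),
            fromfile=f"r{old}", tofile=f"r{new}", lineterm="",
        ))


page = VersionedFile()
page.check_in("A", "Initial version", "hello\n")
page.check_in("B", "Break the greeting", "helo\n")
page.roll_back(1, "A")       # restores the text of revision 1 as revision 3
changes = page.diff(1, 2)    # unified diff between revisions 1 and 2
```

Storing a full copy of every version keeps the sketch short; how a real SCM system stores its history internally is outside the scope of this example.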
In some SCM models, individual files are checked in and out of the repository. In other SCM systems, a synchronization system is essentially built-in. Developers check out their own, complete, copy of the repository and work on files as they need, committing their changes back to the master repository. Developers can periodically update their personal copies of the repository to obtain new changes submitted by other developers.
This way, online access to the repository is not necessary for development to continue. Instead, developers can work off-line if needed, connecting to the repository only periodically to commit their changes and to bring new changes from the repository into their own local working copies.
Sometimes it is necessary to separate a project into two separate development streams during the course of the development cycle. These streams of development may reflect multiple versions of an application or project, or completely separate projects, that share the same base (the code developed before the separation occurs). This separation is known as forking, and most SCM systems provide the ability to fork a repository and establish separate versioning, history and locking for the two forks of the project. Changes in one fork have no impact on the other fork.
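The independence of two forks can be illustrated with a small sketch. The Repository class below is hypothetical and models a repository as nothing more than a list of committed snapshots; it is meant only to show that a fork shares the base history and then diverges.

```python
import copy


class Repository:
    def __init__(self, name):
        self.name = name
        self.revisions = []    # list of (comment, files) snapshots

    def commit(self, comment, files):
        self.revisions.append((comment, dict(files)))

    def fork(self, name):
        # The fork starts from a deep copy of the shared base history;
        # from here on, each repository records its own revisions.
        forked = Repository(name)
        forked.revisions = copy.deepcopy(self.revisions)
        return forked


main = Repository("app")
main.commit("Shared base", {"app.py": "print('v1')"})

lite = main.fork("app-lite")
lite.commit("Strip optional features", {"app.py": "print('lite')"})
main.commit("Add reporting", {"app.py": "print('v2')"})

main_comments = [comment for comment, _ in main.revisions]
lite_comments = [comment for comment, _ in lite.revisions]
```

After the fork, main_comments and lite_comments share only the "Shared base" entry: commits to one fork never appear in the history of the other.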
Why does a team need SCM?
If you look at the main features of SCM described above, it is quickly evident that even small teams of two or three people benefit from a well-implemented SCM system.
At first consideration, though, it is not immediately obvious how individual developers working alone benefit from SCM. They do, however.
There are several benefits:
Versioning: If you have ever been debugging a nasty bug and found that the changes you are making are only making the problem worse, then you can appreciate the benefits of SCM’s versioning. By being able to roll back changes you can back out of problematic changes at any time.
Backups: Keeping the repository separate from your working copy creates an effective backup mechanism: while you work on your checked-out copy, the most recently checked-in version remains safe in the repository.
Multiple Computers: You can work on multiple computers without worrying about transferring changes between the systems. If you make sure you finish a session on one computer by checking your changes into the repository, you can move to another computer and just synchronize to get the latest updates from the repository before you continue working. You won’t need to manually manage synchronization of changes across multiple development computers, as this is handled by the SCM automatically.
Several SCM systems are in common use:
Visual SourceSafe (VSS): This is Microsoft’s solution for SCM. Most large-scale commercial development environments that build on Microsoft technologies will use SourceSafe since it integrates well with Microsoft’s development tools.
Concurrent Versions System (CVS): This is the leading SCM platform in the open source development community, and widely used in commercial environments as well. An open source project itself, it is widely deployed in Linux and UNIX environments, but is cross-platform and available for Windows as well.
Subversion (SVN): Subversion is a popular emerging alternative to CVS. Another open-source project, it addresses some of the problems with the design of CVS, and adds features lacking in CVS. SVN also allows CVS-based development environments to keep their same workflow practices after switching to SVN.
Team Foundation Server (TFS): This is a Microsoft product offering source control, data collection, reporting, and project tracking, and is intended for collaborative software development projects.
In addition, there are other tools which, while not full-blown SCM systems, can provide some of the benefits of SCM for small teams or individual developers:
ColdFusion’s Remote Development Services (RDS): RDS provides basic locking capabilities and integrates well with Macromedia’s development tools, even allowing direct editing of files on a ColdFusion server. It lacks the versioning and change merging features at the core of most SCM systems but can be the basis of a small shared development environment.
WebDAV: This is an emerging open standard that provides remote editing and management of files on a Web server through its native HTTP protocol. The full specification of DAV (Distributed Authoring and Versioning) would provide locking, versioning, and forking, but most present implementations do not offer all those features.
In my experience, we have used VSS, CVS and SVN. Each has its share of features and drawbacks. Looking at it from the commercial angle, if SCM is kept separate from a broader application lifecycle tool like TFS, then SVN, an open source solution, looks to be the best choice, not only because it’s open source, but because it has a lot of advantages, is simple to use, and integrates easily with other tools.
If you use TFS correctly, you can use it to manage every aspect of your application life-cycle, including requirements, code development, testing, and SDLC reporting. And the great thing is, all aspects of the life-cycle can be linked from one part of the process to another. It becomes really difficult to do that if you’re using SVN for your repository and FogBugz for your bug tracking, and spreadsheets for your requirements (etc.).
However, if you already have an existing system for tracking other things and are just looking for a pure SCM solution, SVN is the guy for you.