Category: VCS

Automating clearcase

One of the biggest pain points we have encountered on the current project I am on is the use of ClearCase by the dev team.  I won’t go into depth on the time we have wasted wrestling with this most frustrating of source control systems but will describe some of the things we have gotten to work that automate the tasks we previously had to do manually.

There are two pieces that we have automated that have made our lives easier:

Checking development branch is in-sync with main branch – This was a simple implementation but a fantastic means of notification. Main branch check ins happen infrequently within the development environment here (1-3 times a month) and there is supposed to be email notifications that get sent out each time but, as with anything, they can sometimes not get sent or just get lost in the daily email delude. As such, we setup a simple check that compares our branch to the main branch and checks to see if anything on the main branch has not been merged into our branch:

    <taskdef name="cc-findmerge" classname="org.apache.tools.ant.taskdefs.optional.clearcase.CCFindmerge">
        <classpath path="${customAnt.dir}/ccfindmergeanttask.jar" />
        <classpath path="${log4j.dir}/log4j-1.2.14.jar" />
    </taskdef>

    <target name="check-insync-with-clearcase-main" description="Checks that the view is up-to-date with the main branch">
        <cc-findmerge viewpath="${view.dir}" version="/main/LATEST" />
    </target>

Here we are using the CCFindMerge task which compares the viewpath to the version and determines if a merge is required. If it is, it will cause the build to fail. We run this task on a nightly basis as part of a validation and metrics build and it gives us an instant heads up that something new has been checked into the main branch and we need to merge it.

Automating merge from dev branch into integration branch – The branching strategy at the client required that we do all development work in our development branch and then merge those changes into a shared integration branch. Since we are following an Agile methodology (and therefore commit code is small changes that are tested to work), we basically are able to merge into the integration branch on every succesful build of our development branch. Unfortunately, merging manually is a long process and was not worth the effort but we wondered if it would be possible to do an automated code merge on a nightly basis (is a fairly heavy process so don’t want it running on every checkin). Here is the solution we came up with:

  • Update the local copy of the integration branch with the latest codebase
  • Attempts to do an automated clearcase merge from our dev branch into the local copy of the integration branch
  • Find all files that have been checked out as a result of the automated merge attempt
  • If the automated merge was succesful, checkin all the files found
  • If the automated merge failed, undo all checkouts and fail the build

Rather than anything to fancy, most of these calls are done using the available command line interface from ClearCase – cleartool.exe. To do the auto merge attempt, we run the following ant target which puts the result of the command into a property called merge.outcome:

    <target name="merge-from-view" description="Attempts to merge 2 views - if merge is non-trivial, fails and undoes checkout" if="view.name">
        <exec executable="cleartool.exe" failonerror="false" resultproperty="merge.outcome">
            <arg line="findmerge" />
            <arg line="${btweb.view.dir}" />
            <arg line="-fversion /main/${view.name}/LATEST" />
            <arg line="-log NUL" />
            <arg line="-c 'Merging latest from ${view.name}'" />
            <arg line="-unreserved" />
            <arg line="-merge" />
        </exec>
    </target>

We then run a target to find all checkouts in the directory and put them into a property called checkout.files:

    <target name="find-checkouts" description="Finds all local checkouts">
        <exec executable="cleartool.exe" failonerror="true" outputproperty="checkout.files">
            <arg line="lscheckout" />
            <arg line="-cview" />
            <arg line="-recurse" />
            <arg line="-fmt '%n;'" />
            <arg line="${btweb.view.dir}" />
        </exec>
        <echo message="${checkout.files}" file="${btweb.view.dir}/mergeResult.log" />
    </target>

Finally we use the property merge.outcome to determine if we should checkin or undo checkout of all the files in the checkout.files property:

    <target name="decide-merge" depends="decide-merge-outcome, checkin-files, undo-checkouts" />

    <target name="decide-merge-outcome">
        <if>
            <equals arg1="${merge.outcome}" arg2="0" />
            <then />
            <else>
                <property name="merge.failure" value="true"/>
            </else>
        </if>
    </target>
    
    <target name="checkin-files" unless="merge.failure">
        <foreach target="checkin-file" param="file" inheritall="true" list="${checkout.files}" delimiter=";" />
    </target>
    
    <target name="checkin-file">
        <echo message="Checking in changes to ${file}" />
        <cc-checkin viewpath="${file}" comment="ANT - checking in from merge of ${view.name}" nowarn="true" keepcopy="false" identical="true" />
    </target>
        
    <target name="undo-checkouts" if="merge.failure">
        <foreach target="undo-checkout" param="file" inheritall="true" list="${checkout.files}" delimiter=";" />
    </target>

    <target name="undo-checkout">
        <echo message="trying to uncheckout ${file}" />
        <cc-uncheckout viewpath="${file}" keepcopy="false" failonerr="true" />
    </target>

As well, we use the merge.outcome to determine if we should fail the build due to the inability to automatically merge the files:

    <target name="fail-if-merge-failed" if="merge.failure">
        <fail message="Merging of changes from ${view.name} failed, all checkouts were undone - please attempt a manual merge" />
    </target>

The process isn’t perfect and could do with a bit of improvement. It succeeds about 95% of the time to automatically merge the changes but when it fails, the undo-checkout command can often fail, especially when dealing with files that have been newly added or deleted. The other major issue is when the branch is being used in some way while the merge is attempted. If a user has a Reserved Checkout or if the branch is in use by another process (i.e. being checked out or backed up), the automatic merge attempt fails.

Branch Development Fallacies

I have now been on 2 major projects within Australia that have used the branch development philosophy. The idea behind this is:

  1. Project is defined by business
  2. Copy of main branch is created and made available to new Project team
  3. All development for that project is confined to the branch
  4. When new changes are committed to the main branch – maintenance fixes or other projects going live – each branch merges the changes from main into their branch and deals with any conflicts
  5. As the project nears completion, it is assigned a production deployment date alongside one or more other projects
  6. Projects assigned to the same production deployment date merge together into an integration branch which is used to create deployment artefacts for testing that release
  7. Development for each project continues in isolated branches – code is merged into integration branch when teams ready to commit
  8. Once project goes live, the integration branch then gets merged back into the Main branch and all other outstanding development branches pull the changes into their codebases

In the ideal situation as outlined above, it makes perfect sense to the business. They argue that this type of development reduces risk by:

  • Making it simple to cancel a project and not have to worry about removing any completed code from the codebase
  • The main branch will always be stable
  • Projects that miss their deadlines or are parked in order to develop higher priority work do not adversely impact the main branch and the ability to push new releases

There are 2 problems with this type of thinking. The first is that very few long running projects that have cost millions of dollars are ever going to get cancelled. More likely they will get pushed to the next release but that means that all the testing that has taken place against the integrated code base must now be re-executed, likely leading to the other projects in that integration branch also missing the deploy window. The second is that while the approach may decrease business risk in that projects can be more easily managed, it actually increases technical risk. The problem with keeping each team isolated on its own branch until the last moment are:

  1. late integration – Since we are waiting until almost the end of a project to start merging our code with the other development branches going into Production with us, it gives us much less time to effectively test the codebase and resolve and merge issues that should arise. Regression defects are much more likely to be introduced as functionality that previously worked in one branch in isolation now breaks against the merged codebase. The teams involved now need to get together and hash through code that was potentially written months ago to determine why the functionality broke, who is responsible and how to get it fixed while preserving the changes both teams made.
  2. code divergence – When branches are used for very long projects, each branch will make changes that take it further and further from other branches. Though each will still remain in sync with Main, when one of the projects goes into main, the merge pain encountered by the other projects will be significant because of the large amounts of change between the old main and the new main.
  3. stale branches – If a branch gets parked in favor of other branches and remains so for any significant length of time, not only will the code on it become stale but also the knowledge of the developers who created that code will also become stale. After a short amount of time, the work will become effectively lost as the developed code no longer has any owner and is no longer in sync with the main branch.

I’m not completely against branch development but I think it should take one of 2 flavours:

  1. Discrete Functionality Branches – Branches that only exist for a very short amount of time (1-2 weeks) which are used to implement a discrete bit of functionality that can then be pushed back into the main branch and be deployed if necessary. The only issue with this type of setup is that the business must be reconfigured so that they are able to break down large projects into smaller components that can be deployed independently. This approach lends itself well to continuous deployment since each new piece of functionality can be pushed out to production as soon as it has been effectively tested.
  2. Early Integration – Ideally, a business would identify projects they want to be deployed into Production together as soon as they are approved to begin development. In that way those teams could either all share a single branch for development or, alternatively, each have a separate branch but create the integration branch immediately and merge to the integration branch as often as possible. In this way any integration pain would be encountered immediately and not at the tail end of a project

There is no magic bullet here but I think that Project managers have to take a longer view than just considering business risk when they insist on developing functionality in isolation. There is a lot of technical risk that will likely land up effecting them that is not taken into account.