Tuesday, February 01, 2011

Overriding PDE/Build with the Ant Import task

PDE/Build has a number of places where you can run custom ant scripts during the build.

The general pattern is that you copy one of the customization templates from org.eclipse.pde.build/templates/headless-build into your builder directory and then modify the file as required. For simple builds it is often unnecessary to copy these files at all, and PDE/Build will just use its original copy.

If you only have a small change to make to the customization scripts, then it can be cleaner to not copy the template file and instead use Ant's Import task.

Using Import to make minor changes

The ant Import task allows for overriding a target from the imported file. As an example, the Eclipse SDK includes the Build Id in its about box. The About Box contents come from "about.mappings" files inside plug-ins.

After getting all our source from CVS, we want to do a quick replace in all the about.mappings files to update them with the build id.

Instead of copying the customTargets.xml file into our builder, we create our own that contains just the following:
builder/customTargets.xml:
<project name="customTargets overrides" >
    <import file="${eclipse.pdebuild.templates}/headless-build/customTargets.xml"/>

    <target name="postFetch">
        <replace dir="${buildDirectory}/plugins" value="${buildLabel}" token="@build@">
            <include name="**/about.mappings" />
        </replace>
    </target>
</project>

${eclipse.pdebuild.templates} is a property that is automatically set by PDE/Build, and it points to the folder containing the template files. This small snippet is much cleaner than copying the entire customTargets.xml just to add a few lines.
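Overriding a target does not make the imported version unreachable. Ant also registers each imported target under a name prefixed with the imported project's name attribute, so an override can still call the original. A minimal sketch, using two hypothetical files (base.xml and override.xml, neither of which is part of PDE/Build):

```xml
<!-- base.xml (hypothetical): stands in for an imported template -->
<project name="base">
    <target name="postFetch">
        <echo message="original postFetch" />
    </target>
</project>
```

```xml
<!-- override.xml (hypothetical): sits next to base.xml -->
<project name="override" default="postFetch">
    <import file="base.xml" />

    <!-- Replaces the imported postFetch; base.postFetch still names the original -->
    <target name="postFetch" depends="base.postFetch">
        <echo message="extra work after the original" />
    </target>
</project>
```

Running override.xml should execute the original target first and then the extra step. Before relying on this with the PDE/Build templates, check the name attribute on the template's project element, since that is what forms the prefix.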

This pattern can make for smaller and neater build scripts, but it turns out that this can also be a very powerful tool for modifying PDE/Build's behaviour.

Here is the Magic


During M5 milestone week, the Orion builds began failing about 90% of the time. The failure was "Unable to delete directory" in the middle of packaging ant scripts that are generated by PDE/Build. This seems to have been caused by an overloaded/lagged NFS server (or disk array).

When building a product for multiple platforms with p2, PDE/Build installs the product into a temporary directory, zips it up, deletes that directory, and then repeats for the next platform using the same temporary directory. If there is a problem deleting the directory then we are in trouble because even if we could ignore the problem, the next platform will be contaminated with contents from the previous one.

In order to work around the problem, we need to modify a target named cleanup.assembly, which simply performs an ant <delete/> on the temporary directory. The problem is that this target is in the middle of the configuration specific packaging scripts generated by PDE/Build. A deeper understanding of how these scripts work is required.

Package Script Overview


When running a product build, the generated package scripts are organized per platform that we are building for. As an example, if we are building for windows, mac and linux, then we would have the following scripts:
package.org.eclipse.pde.build.container.feature.all.xml
package.org.eclipse.pde.build.container.feature.linux.gtk.x86.xml
package.org.eclipse.pde.build.container.feature.macosx.cocoa.x86.xml
package.org.eclipse.pde.build.container.feature.win32.win32.x86.xml
The "org.eclipse.pde.build.container" portion of the file name comes from this being a product build. In a feature build this would be the name of the top level feature being built. The first (*.all.xml) script is the main entry point for the packaging process. Each of the other scripts does the packaging for one platform. Every one of those platform specific scripts contains a cleanup.assembly target that needs to be modified.

Script Delegation

The top level packaging script does not call all the others directly; rather, it uses a kind of delegation through the allElements.xml file. This file can be copied to your builder and modified to change the archive name or perform pre or post processing on the archive.

For each platform, the top level packaging script will call allElements.xml/defaultAssemble (or a platform specific assemble.*.[config] target if one is defined) passing it the name of the platform specific packaging script to invoke.
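As an illustration, a platform specific target in a copied allElements.xml might look like the fragment below. The target name is an assumption, built from "assemble." plus the element id and configuration taken from the win32 script name listed earlier; the echo is only placeholder post-processing.

```xml
<!-- Fragment for builder/allElements.xml; the target name is assumed, see text -->
<target name="assemble.org.eclipse.pde.build.container.feature.win32.win32.x86">
    <!-- Run the generated script that was passed in for this platform -->
    <ant antfile="${assembleScriptName}" dir="${buildDirectory}" />
    <!-- Hypothetical post-processing step for this platform only -->
    <echo message="Finished ${assembleScriptName}" />
</target>
```

Platforms without a matching target would still fall through to defaultAssemble.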

This is where we can insert our change in order to override the platform specific packaging scripts.

The Modified allElements.xml file

We copy the allElements.xml file from org.eclipse.pde.build/templates/headless-build/allElements.xml into our builder and change the "defaultAssemble" target to look like this:
<target name="defaultAssemble">
    <ant antfile="${builder}/packageOverride.xml" dir="${buildDirectory}">
        <property name="assembleScriptName" value="${assembleScriptName}" />
        <property name="archiveName" value="${archiveNamePrefix}-${config}.zip"/>
    </ant>
</target>

The name of the platform specific packaging script is specified by the ${assembleScriptName} property. Instead of calling this script directly, we call a script of our own, "packageOverride.xml", and pass it the script name. Product builds normally use their own allElements.xml provided by PDE/Build, which also sets the archive name based on the configuration being built. Since we will be using our own allElements.xml file, we also set the archive name here.

Product Builds (using org.eclipse.pde.build/scripts/productBuild/productBuild.xml) are hardcoded to use their own copy of the allElements.xml file. In order to change this we must set a property allElementsFile which points to our copy. This property must be set before invoking productBuild.xml, which means setting it on the command line, or in a wrapping ant script. This is not necessary when doing a feature build.
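A wrapping ant script for this could look roughly like the sketch below. The file and project names are hypothetical, and ${eclipse.pdebuild.scripts} is assumed to point at org.eclipse.pde.build/scripts in the same way that ${eclipse.pdebuild.templates} points at the templates folder; ${builder} is assumed to be supplied by whoever invokes the wrapper.

```xml
<!-- build.xml (hypothetical wrapper), run through the Eclipse antRunner -->
<project name="product.build.wrapper" default="build">
    <target name="build">
        <!-- builder is assumed to be passed in, e.g. -Dbuilder=/path/to/builder -->
        <ant antfile="${eclipse.pdebuild.scripts}/productBuild/productBuild.xml">
            <!-- Point the product build at our copy instead of its built-in one -->
            <property name="allElementsFile" value="${builder}/allElements.xml" />
        </ant>
    </target>
</project>
```

Passing the property as a nested element of the ant task means it is already set when productBuild.xml starts, which is the requirement described above.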

The new packageOverride.xml script

The allElements.xml delegation script has now been modified to invoke our own packageOverride.xml script. Our script looks something like this:
packageOverride.xml:
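<project name="package.override" default="main" >
    <!-- Pull in whichever generated script allElements.xml passed to us -->
    <import file="${buildDirectory}/${assembleScriptName}" />

    <!-- Overrides the cleanup.assembly target of the imported script -->
    <target name="cleanup.assembly">
        <!-- Clean up if no packaging will follow, or if this is a package script -->
        <condition property="doAssemblyCleanup" >
            <or>
                <not><isset property="runPackager" /></not>
                <contains string="${assembleScriptName}" substring="package." />
            </or>
        </condition>
        <antcall target="perform.cleanup.assembly" />
    </target>

    <target name="perform.cleanup.assembly" if="doAssemblyCleanup" >
        <!-- Move the temporary directory out of the way first -->
        <exec executable="mv" dir="${buildDirectory}" >
            <arg value="${assemblyTempDir}" />
            <arg value="${buildDirectory}/tmp.${os}.${ws}.${arch}" />
        </exec>
        <!-- Then try to delete it; a failure here no longer blocks the next platform -->
        <exec executable="rm" dir="${buildDirectory}" >
            <arg line="-rf ${buildDirectory}/tmp.${os}.${ws}.${arch}" />
        </exec>
    </target>
</project>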


Here, we import the package script that was passed to us from allElements.xml. Each time the top level packaging script calls us, we will be importing a different script. We inherit all the generated targets and override the cleanup.assembly target. Our modified version moves the temporary folder to a different location before trying to delete it. If the delete fails, that is ok because the folder is no longer in the way of the next platform. I used the native 'mv' and 'rm' hoping that they would behave better with a slow NFS server.
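On a build machine without 'mv' and 'rm', the same move-then-delete idea could be sketched with Ant's built-in tasks. This is an untested alternative, not what the Orion build used, and it gives up whatever benefit the native commands had on the slow NFS server.

```xml
<!-- Alternative body for the override, using only built-in Ant tasks -->
<target name="perform.cleanup.assembly" if="doAssemblyCleanup">
    <!-- Rename the directory out of the way first -->
    <move file="${assemblyTempDir}" tofile="${buildDirectory}/tmp.${os}.${ws}.${arch}" />
    <!-- Best effort delete: a failure here does not fail the build -->
    <delete dir="${buildDirectory}/tmp.${os}.${ws}.${arch}" failonerror="false" />
</target>
```

Unlike the exec version, a failed move stops the build here, which is arguably the safer outcome since the next platform would otherwise be contaminated. If the rename itself fails, Ant may fall back to copying and deleting, which could run into the original problem again.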

The packageOverride.xml script must specify default="main" as that setting is not inherited.

It is important to note that this override also affects the generated assemble.* scripts, which are very similar to the package scripts. The assemble scripts also have a cleanup.assembly target, which is getting overridden here. However, that target is only supposed to run during assembly if we are not going to be doing packaging. This is why we need a condition here to make sure the temporary folder is only deleted when it should be. The condition I used here would be wrong for feature builds where the top level feature name contains "package.", because the generated scripts in a feature build contain the top level feature id.
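One possible way to tighten the condition for such feature builds is to test only the start of the script name, using Ant's matches condition (available since Ant 1.7). This sketch is untested and assumes ${assembleScriptName} is a bare file name with no directory prefix.

```xml
<!-- Variant of the condition: only script names that start with "package." -->
<condition property="doAssemblyCleanup">
    <or>
        <not><isset property="runPackager" /></not>
        <matches string="${assembleScriptName}" pattern="^package\." />
    </or>
</condition>
```

With this, a script such as assemble.my.package.feature.win32.win32.x86.xml (a made-up feature id) would no longer be mistaken for a package script.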

Final Notes


The change I have outlined here was made to fix a specific problem with the Orion build. Care must be taken when applying these techniques to other problems and builders.

The exact details here have been modified from the changes I actually made, so I have not actually tested the scripts as they are written above. Specifically, the condition on the overridden target has been added.

This specific problem can also be fixed in pde.build itself; this is tracked by bug 336020.

