Defence of Britain Dataset: About the data we are using part 3
One of the datasets we are including in the tools is the Defence of Britain dataset, collected by the Council for British Archaeology and stored in the Archaeology Data Service archive. The dataset describes the anti-invasion defences built primarily between 1940 and 1941 to defend against a German invasion.
What is the Defence of Britain Dataset?
It is a dataset that records the militarised landscape of the UK, originally collected between 1995 and 2001 and updated in 2002. The dataset records the location of anti-aircraft and anti-tank defences (and lots of other types of defence), identifies whether each site is still present today, and provides a brief description of the particular defence. A few records even include an image of the defence. It is an early example of a Volunteered Geographic Information (VGI) dataset.
An example record extracted from the Defence of Britain dataset
- Defence Type: ANTI LANDING TRENCH
- Defence Location: Fairlop Oak Recreation Ground [now Redbridge Sports Centre, London Playing Fields, and Hainault Recreation Ground], Forest Road, Fairlop.
- Description: Anti-landing trenches filled in and sown with oats [1947?]. Sub-drainage was destroyed.
- Site Condition: Infilled
Who collected the Defence of Britain Dataset?
600 volunteers made a total of 17,000 visits to collect data on sites. To gather information on sites that have since been removed, information was collected from published works, archival sources and oral testimony.
Who can use it?
- Non-commercial, educational or research purposes with attribution; for more information, see the terms and conditions
- All data is copyright © Council for British Archaeology
- DOI : http://dx.doi.org/10.5284/1000327
What did we do to the data to integrate it into our spatial database?
The data is available for download as a KMZ file for use in Google Earth, with all the available information contained in a set of HTML tags within one column. The KMZ was imported into a GIS and the projection set. The import created a separate map layer for each of the different types of defence that existed (in fact, more than 253 layers). That many layers would not be useful within our tools, as there are simply too many for users to select from and to consolidate information across. Therefore, the first step in transforming the data was to use a series of SQL union queries to create a set of layers based on the following five site groupings:
- Ancillary Sites
- Anti Aircraft Measures
- Anti Personnel Measures
- Anti Shipping Measures
- Anti Tank Measures
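To give a feel for what the union step looks like, here is a minimal sketch using Python's built-in sqlite3 module rather than the GIS we actually used. The layer names, column names, coordinates and the assignment of layers to groupings are all made up for illustration; only the idea of stacking many per-type layers into a handful of grouped layers with UNION ALL comes from the step described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-ins for a few of the 253+ per-type layers created by the
# KMZ import. Names, columns and coordinates are illustrative only.
layers = {
    "anti_tank_ditch": [("Ditch A", 51.60, 0.09)],
    "anti_tank_block": [("Block B", 50.52, -2.45), ("Block C", 50.53, -2.44)],
    "anti_aircraft_battery": [("Battery D", 52.10, 1.30)],
}
for name, rows in layers.items():
    conn.execute(f"CREATE TABLE {name} (site_name TEXT, lat REAL, lon REAL)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)

# Hypothetical mapping from grouped layer to the per-type layers it combines.
groupings = {
    "anti_tank_measures": ["anti_tank_ditch", "anti_tank_block"],
    "anti_aircraft_measures": ["anti_aircraft_battery"],
}


def build_union_sql(layer_names):
    """Build one SELECT per source layer and join them with UNION ALL.

    The source layer name is kept as a defence_type column so the original
    classification is not lost when the layers are merged.
    """
    selects = [
        f"SELECT '{name}' AS defence_type, site_name, lat, lon FROM {name}"
        for name in layer_names
    ]
    return " UNION ALL ".join(selects)


for group, members in groupings.items():
    conn.execute(f"CREATE TABLE {group} AS {build_union_sql(members)}")

anti_tank_rows = conn.execute(
    "SELECT defence_type, site_name FROM anti_tank_measures ORDER BY site_name"
).fetchall()
print(anti_tank_rows)
```

UNION ALL is used here instead of UNION because each site should appear exactly once already, so there is no need to pay for duplicate removal.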
The next step was to create a set of attributes for each of the five map layers by extracting information from the description column contained within the original KMZ. To compile a set of useful attributes, the data had to be transformed into a tabular format. This is because within the tools we want defence sites to be represented by a point (or other symbol) marking the location of the site, which can be clicked on to reveal more information about the location. An example of the format of the original data is below:
<center><h1><u>The Defence of Britain</u></h1></center><center><table width="400"><tr><td><b>Location: </b>Sturt Common, Portland Bill.<br /><br /><b>Condition: </b>Removed<br /><br /><b>Description: </b>1997/06/__ Stone boulders placed at intervals across the Common. </td><td><img src="http://ads.ahds.ac.uk/logos/dob_logo.jpg" width="85" /></td></tr><tr><td><a href="http://www.britarch.ac.uk/projects/dob/index.html"><img src="http://www.britarch.ac.uk/cbalogo.gif" height="80"></a><br />The Council for British Archaeology</td><td><a href="http://archaeologydataservice.ac.uk/archives/view/dob/ai_full_r.cfm?refno=5803"><img src="http://ads.ahds.ac.uk/logos/adslogo/logoandbar.gif" height="80" /></a><br />Click the logo above for more details on this Artifact.</td></tr></table></center><font size="-1"><font face="Verdana">The information and images in this placemark are copyright of the CBA and ADS and are reproduced here with their kind permission.</font></font><br />
The attribute information we want to extract is:
- Short name
- Site location
- Detailed description
- Data copyright
- File path for any images
- Site condition
- Link to archive record
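One way to picture the tabular format we are aiming for is one row per site with one column per attribute in the list above. The sketch below is only an illustration: the class name and field names are my own invention, and the values are taken from the example record near the top of this post, with the image path and archive link left blank because they are not known for that record.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DefenceSite:
    """One row of the target table. Field names are hypothetical."""
    short_name: str
    site_location: str
    description: str
    data_copyright: str
    image_path: str
    site_condition: str
    archive_link: str


example = DefenceSite(
    short_name="ANTI LANDING TRENCH",
    site_location=(
        "Fairlop Oak Recreation Ground [now Redbridge Sports Centre, "
        "London Playing Fields, and Hainault Recreation Ground], "
        "Forest Road, Fairlop."
    ),
    description=(
        "Anti-landing trenches filled in and sown with oats [1947?]. "
        "Sub-drainage was destroyed."
    ),
    data_copyright="Council for British Archaeology",
    image_path="",    # not known for this record
    site_condition="Infilled",
    archive_link="",  # not known for this record
)

# Write the record as CSV: a header row followed by one row per site.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(DefenceSite)])
writer.writeheader()
writer.writerow(asdict(example))
csv_text = buffer.getvalue()
print(csv_text)
```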
So, to extract the information, I have been grappling with regular expressions! A regular expression is a text string (I suppose a little like shorthand) that describes a pattern to look for in a block of text, so that the matching text can be extracted automatically. I have to say regular expressions are not intuitive (to me) and it is like learning a new language, but with a little help from a developer friend I made it... and now have the Defence of Britain dataset in a format for our tools. Here is an example of a regular expression to extract the site location from the original text block:
RegExp([Description], "(.*?)(>Location:.</b>)(.*?)(<br.*)", "$3")
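For anyone who wants to try the same idea outside a GIS, here is a rough Python equivalent using the standard re module. It is a sketch, not the code we ran: the sample text is a shortened fragment of the record shown earlier, and the extract_field helper and the second, more general pattern are my own assumptions about how the other attributes could be pulled out in the same way.

```python
import re

# Shortened fragment of the sample description from the original data.
SAMPLE = (
    '<table width="400"><tr><td><b>Location: </b>Sturt Common, Portland Bill.'
    '<br /><br /><b>Condition: </b>Removed<br /><br />'
    '<b>Description: </b>1997/06/__ Stone boulders placed at intervals '
    'across the Common. </td></tr><tr><td>'
    '<a href="http://archaeologydataservice.ac.uk/archives/view/dob/'
    'ai_full_r.cfm?refno=5803"></a></td></tr></table>'
)

# Same pattern as the expression above. Group 3 is the text between
# "Location: </b>" and the next "<br"; "$3" becomes r"\3" in Python.
LOCATION_PATTERN = r"(.*?)(>Location:.</b>)(.*?)(<br.*)"
location_via_sub = re.sub(LOCATION_PATTERN, r"\3", SAMPLE, count=1, flags=re.DOTALL)


def extract_field(label, html):
    """Return the text after '<b>label: </b>' up to the next <br or </td>.

    Hypothetical helper: returns None when the label is not present.
    """
    match = re.search(
        rf"<b>{re.escape(label)}:\s*</b>(.*?)\s*(?:<br|</td>)", html, re.DOTALL
    )
    return match.group(1) if match else None


def extract_archive_link(html):
    """Return the link to the full archive record, or None if absent."""
    match = re.search(r'href="([^"]*ai_full_r\.cfm\?refno=\d+)"', html)
    return match.group(1) if match else None


record = {
    "site_location": extract_field("Location", SAMPLE),
    "site_condition": extract_field("Condition", SAMPLE),
    "description": extract_field("Description", SAMPLE),
    "archive_link": extract_archive_link(SAMPLE),
}
print(location_via_sub)
print(record)
```

The replace-style pattern swaps the whole text block for the third capture group, which is why it needs the leading and trailing catch-all groups. Searching for the labelled field directly avoids that and returns nothing, rather than the unchanged text, when a label is missing.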
If you want to learn how to manipulate text with regular expressions I found the following resources really useful:
Example of the Dataset
…and here is an example output of the dataset that will be integrated into the tools.