The National Archives become official project partner

This is just a quick post to say that The National Archives have become an official partner of the Stepping Into Time Project: Mapping the Blitz in London.

Over the months since the project started I have been working with Andrew Janes, map archivist at The National Archives (TNA), to determine whether they would like to be an official partner in the project. Andrew kindly helped me fill in the paperwork, and earlier this month the TNA executive approved their official involvement in the project.

This is great news. Thanks Andrew for all your support and enthusiasm for the project.

 

Posted in project details | Tagged , | 3 Comments

What’s in it for me? Working on a JISC project: the benefits for an undergraduate

Hello! Firstly, by way of an introduction, I’m Felix, one of the researchers helping on this project. As the title suggests, I am still a lowly undergraduate at the University, having just completed my second year (if you can call mid-September to the end of May a year).

My task in the project is to geo-reference the weekly bomb maps and digitally capture, as point data, each bomb that fell (a somewhat humbling experience given the sheer number of points – 400 in a single week for one map sheet!).

Today I’d like to share a few of the benefits I perceive this project has given me.

Note: These are not ranked according to any particular criterion.

  1. Sense of Purpose – It is nice to know that the work you are doing actually serves some purpose in forming the base for other things and that you are working within something bigger than yourself. By necessity the vast majority of the work you do as an undergraduate (for practicals) is individual and designed to prove you can understand the question asked whilst demonstrating the skills you have been taught (to varying degrees of success). It has little use beyond this, especially given that your work will be one of countless others and once assessed it’ll, in all likelihood, never see the light of day again. Knowing this work will continue to be useful therefore makes doing it more worthwhile.
  2. Improve my skills – Before starting this project, geo-referencing was a process I could probably have described but not really known how to do. Today I type with an almost intimate familiarity with ArcMap’s geo-referencing tools (to the point that I got visibly excited upon discovering the latest version [10.1] includes some new tools!), meaning an exercise that initially took almost two days I can now do in a few hours. Perhaps more importantly, I have also become much more confident (after making copious backups!) in experimenting with different tools and processes, especially when something isn’t quite working correctly. This means I’ve gone more “off-script”, as it were, compared to the carefully constructed practicals used in teaching. These self-help skills will no doubt be useful for my dissertation next year and beyond (whatever that happens to be).
  3. …and learn new ones – Whilst the geo-referencing is performed in ArcGIS (due to its reporting of error), the point collection uses another package, Manifold. Prior to the project I had no experience of alternative packages, so it was interesting to see the differences (and similarities) between software packages and what each does best (such as Arc randomly(?) switching the direction the scroll wheel moves in). For example, adding attributes to a point is several orders of magnitude quicker in Manifold than in Arc, and Manifold bundles its project assets (which has advantages and disadvantages). Backups(!), file organisation and the differences between mapping providers (from Ordnance Survey’s omission of lighthouses to OpenStreetMap’s obsession with coffee shops) are all new considerations I have had to contend with – not alone, but it nevertheless gives very useful insight into the day-to-day concerns of this type of task and of committing to a process on a much bigger dataset than I have previously worked on. Perhaps the biggest eye-opener, for me, was simply how long it takes to get things done (including this blog post!) – such as finding where Painters Lane meets Magpie Way on both a historic map and a contemporary map – so I am learning to manage my time.
  4. A shift in approach – This may only apply to me given my work history, but in addition to this project I am employed by the University’s Information Services (IS) department as part of their front-line support for students (ranging from “I’ve forgotten my password” to “I think I just deleted the only copy of my dissertation”). Whilst in both of these jobs I sit behind a desk with a laptop, with IS I was purely reactive – a student has a problem, comes to my desk and I help them; in the intervening time I do my coursework. With this JISC project I have a large task that I have to manage over a period of days and weeks. Therefore I have had to train myself to work consistently and with attention to detail, and not to go flat out at the beginning of the week or else I’ll have no energy, willpower or attention by the end. In other words, it’s a long jog, not a short sprint.
  5. Finally this job is nicely compensated, which I won’t lie is great. Ideally future employers will find my experience and involvement in this project more than a noteworthy bullet point on my CV too!

So that’s what I’ve found so far. I’ll finish by saying that if anyone else is offered the same opportunity, definitely go for it!

Felix 🙂

Posted in knowledge sharing, project details | Tagged , | Leave a comment

Sharing my experience of setting up a sustainable technical development infrastructure compliant with university standards

This week I have been working towards obtaining a virtual server/virtual machine on which we can develop our tools. The aim is to have an infrastructure which will be technically and financially sustainable following the completion of the project, and to provide an Open Source technical infrastructure that could also be used in future as part of the GIS curriculum in the Geography department (extending the impact and benefits of the project) and expanded by future research projects.

Ideally, we needed to find a solution that matches our requirements, together with the costs of different configurations and the sustainability trade-offs of keeping things going for five years. In other words, we needed to balance the pressures of fixed-term funding with the medium-term sustainability requirements of funding bodies. This blog post charts my growing understanding of virtual machines, the University’s Information Systems infrastructure and the correct procedures, and considers the impact on project sustainability. I hope it is useful to any new non-technical Principal Investigators who need to get their heads around this topic and the associated University processes. Whilst some details will be most relevant to staff at the University of Portsmouth (UoP), others are more generic and will apply to other institutions.

There are a number of decisions that you need to work through – this blog post discusses them. At the end of the post I summarise my learning experiences as a small process that will act as a guide for anyone at UoP trying to do something similar.

What is a virtual machine?

For the uninitiated, a virtual machine (VM) emulates a physical computing environment, using a virtualisation layer to translate requests for CPU, memory, hard disk, network and other hardware resources to the underlying physical hardware. VMs are useful because:

  •     Applications and services that run within a VM cannot interfere with the host OS or other VMs.
  •     They can be moved, copied and reassigned between host servers to optimise hardware resource utilisation.
  •     Administrators can also take advantage of virtual environments to simplify backups, disaster recovery, new deployments and basic system administration tasks.

(Adapted from original source for this text: http://searchvmware.techtarget.com/definition/virtual-machine-configuration )

University Private Cloud versus Amazon Web Services

Currently, the project has two options for obtaining a virtual machine: Amazon Web Services (AWS) or the University of Portsmouth ‘Private Cloud‘. At first sight, the AWS cloud has a number of advantages over the University Private Cloud: it is more cost effective, you can set up any OS you want, and it is very flexible in terms of the storage space, amount of RAM and number of cores that can be selected – but of course this flexibility also implies open-ended costs. The project did apply for an Amazon education grant, but whilst we were shortlisted, unfortunately we were not successful. The difficulty with AWS is that the service is paid for monthly – so if you want to develop a sustainable solution with respect to the JISC funding guidelines, this proves awkward.

The University Private Cloud necessarily has less flexibility with respect to the supported operating systems, RAM, storage and processors, but it comes with a set of standard fixed costings – which are useful for managing the future sustainability of a short project with finite financial resources. The advantage of the University’s Private Cloud is that I can purchase my virtual machine and pay upfront the annual cost of running the machine for five years following the completion of the project. (The money for this was accounted for in my grant proposal under the heading ‘technical infrastructure’.) This method is more compliant with University procedures and will prove the path of least resistance.

Which Operating System to Install on the Virtual Machine?

Back in March we set up an AWS service and began development on an HTML prototype of the application, using the wireframes as a template. On the AWS service we are currently running the Ubuntu Linux OS with PostGIS, Geoserver and GeoDjango: see my blog post on the technical infrastructure. We chose Ubuntu as there is a good deal of supporting documentation for running Geoserver etc. on this operating system. When we installed Ubuntu I was not aware it would prove awkward for the University’s IS department. At the University of Portsmouth a virtual machine can currently be set up with the following standard operating systems:

If you are requesting a VM that is part of the University Private Cloud, you must be mindful of these options; installing a non-standard operating system has the following repercussions:

  •     Greater impact on Information Services’ time to assist with a non-standard install (note that in summer IS are always really busy preparing for the forthcoming teaching year). You need to account for this time in your project plan. Setting up a VM with a standard supported OS is straightforward, has less impact on IS staff resources, and can be done within a matter of days.
  •  The OS is outside the managed offering, so any future support is on a best-endeavours basis – this affects the sustainability of the project in the future, particularly if your technical developer is attached to your project and is not a member of staff at the University. If you choose a non-supported OS, how will you support it in the future?

Based on these issues we are going to run an install of the SLES operating system and see what the impact of changing the base installation would be. The benefit is a smoother transition to a production server on the University Cloud – or on a University-maintained physical server – and it ensures the machine is managed, which is beneficial for project sustainability.

Matching project server requirements to University VM facilities

Ideally, the University Private Cloud would provide as much flexibility as AWS, but this is not possible as it operates on an institution-specific scale with finite budgetary constraints. The University’s Private Cloud has mainly been designed for a large number of smaller configurations, and on request IS will provide you with a number of VM options. The entry-level configuration starts with 1 processor core, 1 GB of memory and 50 GB of NetApp storage. (The physical hardware components of the Private Cloud run high-end, enterprise-class multi-core chips.) Information Services have a guide for costing virtual infrastructure, so it is important to access this and discuss your requirements with IS to weigh up the advantages of VMs versus physical servers.

As a starting configuration, our development VM will have 2 cores, 6 GB of RAM (usage will be monitored so we can scale up or down) and upwards of 50 GB of high-performance fibre-channel storage.

My hints and tips for new non-technical PIs:

  1.     Know the sustainability requirements of your funders – and then incorporate costs (if allowed) into budget.
  2.     Recruit to your project advisory board a member of staff from your department/faculty who can mentor you through the process and provide valuable insights. For this project, Prof. Richard Healey has been supporting me through the process, providing invaluable experience.
  3.     When recruiting a technical developer, make sure they can support the server infrastructure development side of the grant – and if they are a sub-contractor, include this in their scope of work. I have set this up as a work package within the developer’s work schedule.
  4.     Identify the standard operating systems available on the Private Cloud.
  5.     Design your technical infrastructure and base it on University standard operating systems to make this easier – this can be done with the project technical lead.
  6.     Obtain the cost breakdowns of virtual machines from Information Services.
  7.     Write the server spec for supporting the technical infrastructure – working with both your technical developer and the relevant person from University Information Services Infrastructure – to make sure requirements fit with the technical capabilities that can be provided.
  8.     Liaise with Information Services Infrastructure personnel to finalise the spec of the VM and ensure they approve its set up and feasibility.
  9.     Once you have agreed on a suitable VM set up – at the University of Portsmouth you then need to raise a service desk request that details your requirements and provides a cost code for billing.
Posted in knowledge sharing, technical | Tagged , | Leave a comment

Defence of Britain Dataset: About the data we are using part 3

One of the datasets we are including in the tools is the Defence of Britain dataset, collected by the Council for British Archaeology and stored in the Archaeology Data Service archive. The dataset describes the anti-invasion defences built, primarily between 1940 and 1941, to defend against a German invasion.

What is the Defence of Britain Dataset?

It is a dataset that records the militarised landscape of the UK, originally collected between 1995 and 2001 and updated in 2002. The dataset records the location of anti-aircraft/anti-tank defences (and lots of other types of defences), identifies whether the site is still present today, and provides a brief description of the particular defence. In a few records there is even an image of the defence. It is an early example of a Volunteered Geographic Information (VGI) dataset.

An example record extracted from the Defence of Britain dataset

  • Defence Type: ANTI LANDING TRENCH
  • Defence Location: Fairlop Oak Recreation Ground [now Redbridge Sports Centre, London Playing Fields, and Hainault Recreation Ground], Forest Road, Fairlop.
  • Description: Anti-landing trenches filled in and sown with oats [1947?]. Sub-drainage was destroyed.
  • Site Condition: Infilled

Who collected the Defence of Britain Dataset?

600 volunteers made a total of 17,000 site visits to collect the data; to gather information on sites that have since been removed, information was collected from published works, archival sources and oral testimony. For more info click here.

Who can use it?

What did we do to the data to integrate it into our spatial database?

The data is available for download as a KMZ file for use in Google Earth, with all the information contained in a set of HTML tags within a single column. The KMZ was imported into a GIS and the projection set. When the KMZ was imported it created an array of map layers, one for each of the different types of defence that existed (in fact more than 253 layers). This many layers would not be useful for users of our tools, as there are simply too many to select from and to consolidate information across. Therefore, the first step in transforming the data was to use a series of SQL union queries to create a set of layers based on the following five site groupings (a small sketch of this consolidation step follows the list below):

  1. Ancillary Sites
  2. Anti Aircraft Measures
  3. Anti Personnel Measures
  4. Anti Shipping Measures
  5. Anti Tank Measures
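
As a minimal sketch of the consolidation step, the snippet below builds the SQL union queries in Python, assuming the imported layers sit as tables in a PostGIS/PostgreSQL database. The table and column names are hypothetical stand-ins for the 253 layer tables produced by the real KMZ import.

    # Sketch: consolidate many per-defence-type layers into five grouped layers
    # using SQL UNION ALL queries. Table and column names are hypothetical;
    # substitute the layer tables produced by your own KMZ import.

    # Hypothetical mapping of grouped layer name -> imported layer tables.
    SITE_GROUPS = {
        "anti_aircraft_measures": ["heavy_aa_battery", "light_aa_battery", "barrage_balloon_site"],
        "anti_tank_measures": ["anti_tank_block", "roadblock", "anti_tank_ditch"],
        # ... remaining groups (ancillary, anti-personnel, anti-shipping) ...
    }

    def build_union_query(group_name, layer_tables):
        """Build a CREATE TABLE ... AS SELECT statement that unions the layers."""
        selects = [
            f"SELECT geom, description, '{table}' AS source_layer FROM {table}"
            for table in layer_tables
        ]
        return f"CREATE TABLE {group_name} AS\n" + "\nUNION ALL\n".join(selects) + ";"

    if __name__ == "__main__":
        for group, tables in SITE_GROUPS.items():
            print(build_union_query(group, tables))
            print()

The generated statements can then be run against the database (or adapted for whichever GIS is doing the import).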

The next step was to create a set of attributes for each of the five map layers by extracting information from the description column contained within the original KMZ. To compile a set of useful attributes, the data had to be transformed into a tabular format. This is because, within the tools, we want defence sites to be represented by a point (or other symbol) marking the location of the site, which can be clicked on to reveal more information about the location. An example of the format of the original data is below:

<center><h1><u>The Defence of Britain</u></h1></center><center><table width="400"><tr><td><b>Location: </b>Sturt Common, Portland Bill.<br /><br /><b>Condition: </b>Removed<br /><br /><b>Description: </b>1997/06/__ Stone boulders placed at intervals across the Common. </td><td><img src="http://ads.ahds.ac.uk/logos/dob_logo.jpg" width="85" /></td></tr><tr><td><a href="http://www.britarch.ac.uk/projects/dob/index.html"><img src="http://www.britarch.ac.uk/cbalogo.gif" height="80"></a><br />The Council for British Archaeology</td><td><a href="http://archaeologydataservice.ac.uk/archives/view/dob/ai_full_r.cfm?refno=5803"><img src="http://ads.ahds.ac.uk/logos/adslogo/logoandbar.gif" height="80" /></a><br />Click the logo above for more details on this Artifact.</td></tr></table></center><font size="-1"><font face="Verdana">The information and images in this placemark are copyright of the CBA and ADS and are reproduced here with their kind permission.</font></font><br />

The attribute information we want to extract is:

  • Short name
  • Site location
  • Detailed description
  • Data copyright
  • File path for any images
  • Site condition
  • Link to archive record

So, to extract the information I have been grappling with regular expressions! A regular expression is a text string (a little like shorthand, I suppose) that describes a pattern in a block of text so that matching parts can be extracted automatically. I have to say regular expressions are not intuitive (to me) – it is like learning a new language – but with a little help from a developer friend I made it… and now have the Defence of Britain dataset in a format suitable for our tools. An example of a regular expression to extract the site location from the original text block:

RegExp([Description], "(.*?)(>Location:.</b>)(.*?)(<br.*)", "$3")
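
For comparison, here is a minimal Python sketch of the same idea, run against a shortened version of the description HTML shown above. The field names and patterns are illustrative rather than the exact expressions used in Manifold.

    import re

    # A shortened example of the HTML description stored against each placemark.
    description = (
        '<b>Location: </b>Sturt Common, Portland Bill.<br /><br />'
        '<b>Condition: </b>Removed<br /><br />'
        '<b>Description: </b>1997/06/__ Stone boulders placed at intervals across the Common.<br />'
    )

    # Each pattern captures the text between a field label and the following <br /> tag.
    patterns = {
        "location": r"<b>Location: </b>(.*?)<br",
        "condition": r"<b>Condition: </b>(.*?)<br",
        "description": r"<b>Description: </b>(.*?)<br",
    }

    record = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, description)
        record[field] = match.group(1).strip() if match else None

    print(record)
    # {'location': 'Sturt Common, Portland Bill.', 'condition': 'Removed', ...}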

If you want to learn how to manipulate text with regular expressions I found the following resources really useful:

Example of the Dataset

…and here is an example output of the dataset that will be integrated into the tools.

Example Map showing the Defence of Britain Dataset

Posted in data | Tagged , , , | Leave a comment

Work continues on the licence agreement

Over the last few months I have been working with a colleague, Kate Charles, here at the University of Portsmouth, along with the Licensing Team at The National Archives, to ensure that the tools we are developing can use the digital scans of the bomb damage maps without breaching any licensing laws.

The details of the licence are still being agreed, but we are working towards a non-commercial licence that will be valid for 10 years and will allow us to integrate the maps and photographs into the web-mapping tool and the mobile phone app.

Posted in data, project plan | Tagged , , , | Leave a comment

Creating digital maps from paper maps – Georeferencing

This blog post summarises how we are transforming paper maps from the Bomb Census into interactive digital maps through a process known as geo-referencing.

We start with a scanned paper map, stored as a digital image, which is transformed by adding a geographical reference system.

Example of the types of target image and reference image (Click for enlarged view)

To do this you need two types of data element: a target element, which is the image that has no coordinate system, and a reference element, which is a geographically referenced image containing spatial coordinates.

  1. Digital images of the bomb census maps (image space – target element)
  2. Geographical raster dataset with embedded spatial coordinates (coordinate space – reference element)

The bomb damage maps are matched to a contemporary map that has a set of spatial coordinates (eg OpenStreetMap, Bing Maps, Virtual Earth) by matching point locations in the historical map image to the same locations in the contemporary map. This process is known as creating control points. The control points drive the geo-referencing process and lead to the georegistration of the historical map image. As an example, the image above shows two control points which identify the same location in both the target image and the reference map.

The number of control points required will depend on the mathematical model that you use to conduct the transformation, but essentially: (1) use as many as possible; (2) distribute them evenly; and (3) make sure you place some at the edges of the target map.
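
As a rule of thumb for the polynomial transformations offered by most GIS packages, an order-n polynomial needs at least (n + 1)(n + 2) / 2 control points: 3 for a first-order (affine) fit, 6 for second order and 10 for third order. In practice you want comfortably more than the minimum so that the software can report a residual error for each point.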

Once a sufficient number of control points have been identified, the target image (the bomb census map) can be transformed to match the coordinate system of the reference map. This transformation allows the target bomb maps to be rotated, scaled and shifted in order to re-project them with a set of geographic coordinates. The initial results are shown in the sample image below, and a scripted sketch of the transformation step follows it.

Extract of the georeferenced bomb damage map layered on top of a Virtual Earth Hybrid Map. Image Source: HO193/13 map 5620 SE
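
For anyone who prefers to script this step rather than work interactively in a desktop GIS, here is a minimal sketch using the GDAL Python bindings. The file names, control-point coordinates and choice of British National Grid (EPSG:27700) are illustrative assumptions – in the project itself the geo-referencing is done in ArcMap/Manifold.

    # Sketch: geo-reference a scanned map using ground control points (GCPs)
    # with the GDAL Python bindings. File names and coordinates are illustrative.
    from osgeo import gdal

    SRC = "bomb_census_sheet_5620SE.tif"            # scanned map, no coordinate system
    GCP_FILE = "bomb_census_sheet_5620SE_gcp.tif"   # intermediate copy with GCPs attached
    DST = "bomb_census_sheet_5620SE_georef.tif"     # warped, geo-referenced output

    # Each GCP links a pixel/line position in the scan to a real-world X, Y
    # (here given in British National Grid, EPSG:27700 - an assumption).
    gcps = [
        gdal.GCP(531500.0, 181200.0, 0, 120.5, 95.2),     # x, y, z, pixel, line
        gdal.GCP(534200.0, 181150.0, 0, 2380.1, 101.7),
        gdal.GCP(534250.0, 178900.0, 0, 2391.4, 2244.9),
        gdal.GCP(531450.0, 178950.0, 0, 115.8, 2239.3),
    ]

    # Attach the control points and target spatial reference to the image...
    gdal.Translate(GCP_FILE, SRC, GCPs=gcps, outputSRS="EPSG:27700")

    # ...then warp (rotate, scale, shift) the image into the new coordinate system.
    gdal.Warp(DST, GCP_FILE, dstSRS="EPSG:27700", resampleAlg="bilinear")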

Posted in data, project details | Tagged , , | Leave a comment

OpenSource Technical Infrastructure: PostGIS, Geoserver and Django …

The technologies underpinning Stepping Into Time will take advantage of the free or low-cost computing resources and open-source software now prevalent in web and app development. The technical infrastructure is still being finalised, but this post summarises the fundamental components.

Technical Infrastructure Diagram
Stepping Into Time Project

Server Side 

The server-side of Stepping Into Time will be built using the Python GeoDjango web-app framework, supplemented by Geoserver, with the generated data hosted in a PostGIS database. The server will run in an industry-standard Linux-Apache server environment, deployable on a wide variety of server providers (eg Amazon EC2, University hosted – to be determined).

The PostGIS database provides an appropriate method for data storage, management, spatial retrieval and processing, and is the most appropriate spatial Relational Database Management System (RDBMS) for integration with the rest of the proposed technology stack, ie it is fully supported and is the recommended spatial RDBMS for GeoDjango. It will hold the geographic data in the form of points, lines or areas (ie vector data); for example, each bomb that fell is marked at street level as a point with an X, Y coordinate that denotes its location.
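
As an illustration of how such a point could be defined and queried in GeoDjango, here is a minimal sketch; the model name, fields and coordinates are hypothetical, not the project’s actual schema.

    # Sketch of a GeoDjango model for a single bomb record (hypothetical names).
    from django.contrib.gis.db import models

    class BombIncident(models.Model):
        # Week of the Bomb Census the record comes from.
        census_week = models.DateField()
        # Free-text notes captured from the map sheet.
        description = models.TextField(blank=True)
        # The bomb's location as a point geometry, stored in PostGIS.
        location = models.PointField(srid=4326)

        # GeoManager enables spatial lookups in the Django versions current
        # at the time of writing.
        objects = models.GeoManager()

    # Example spatial query: all bombs within 500 m of a point of interest.
    from django.contrib.gis.geos import Point
    from django.contrib.gis.measure import D

    nearby = BombIncident.objects.filter(
        location__distance_lte=(Point(-0.0983, 51.5138, srid=4326), D(m=500))
    )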

Once the original maps have been scanned, they are processed in the desktop GIS to add a geographic reference system – a process known as geo-referencing (we will use the low-cost GIS Manifold) – then saved in the GeoTIFF file format and stored on the server. The middleware Geoserver will then enable the geographic data to be accessed and served to the tools we are developing – the desktop GIS, mobile application or web-mapping application – in a variety of standard formats (eg KML, SHP, map tiles and WMS georeferenced images).
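
As a hedged sketch of how a geo-referenced GeoTIFF might be published to Geoserver programmatically via its REST API (the server URL, workspace, store name and credentials below are placeholder assumptions; layers can equally be published through Geoserver’s web admin interface):

    # Sketch: publish a geo-referenced GeoTIFF as a Geoserver coverage store
    # using Geoserver's REST API. URL, workspace, store name and credentials
    # are placeholders for illustration only.
    import requests

    GEOSERVER = "http://localhost:8080/geoserver"   # assumed server address
    AUTH = ("admin", "geoserver")                    # default credentials - change in practice
    WORKSPACE = "steppingintotime"                   # hypothetical workspace
    STORE = "bomb_census_5620SE"                     # hypothetical store name

    with open("bomb_census_sheet_5620SE_georef.tif", "rb") as geotiff:
        response = requests.put(
            f"{GEOSERVER}/rest/workspaces/{WORKSPACE}/coveragestores/{STORE}/file.geotiff",
            data=geotiff,
            headers={"Content-type": "image/tiff"},
            auth=AUTH,
        )

    response.raise_for_status()  # a 201 Created response indicates the layer was published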

GeoDjango is a web framework designed to facilitate the building of GIS-based web applications, and to enable the use of spatial data on the web.  It is based on the Django framework, a Python framework originally designed to handle fast-moving news websites.  Django offers an in-built administration interface, along with the ability to define database data models and queries, and a template language to separate design, content and code.

Client Side

The project’s client-side will make use of modern Ajax design standards to achieve an easy-to-use, interactive user experience. This will be achieved through a combination of HTML, CSS and JavaScript, facilitated through jQuery, with the essential web-mapping and geo-data contribution mechanism enabled via the Leaflet framework.

The Leaflet open-source JavaScript library will be used to give end clients (web and mobile) access to, and display of, various base maps as well as custom data, along with spatial interaction functionality. jQuery provides a framework of JavaScript functions that simplifies the development of rich, interactive applications.

Posted in project details, technical | Tagged , , , , | 1 Comment