This report describes the year 2 activities of the SCAPE project's Characterisation Components work package. It evaluates the suitability of format identification tools for execution in a parallelised MapReduce environment, and describes the publication of format identification data along with its implications for data curation and publication. It also presents a novel tool for mining domain-specific semantic meaning from web archives, and describes an Azure-based application that facilitates the conversion and quality assurance of large document collections. The report concludes with overall findings and a roadmap for the coming year.
Upcoming Events
- The SCAPE Project closed on 2014-09-30. See Past Events above.