
Title: The Conversion Software Registry


1
The Conversion Software Registry
  • Michal Ondrejcek, Kenton McHenry, Rob Kooper,
    Luigi Marini, and Peter Bajcsy

2
Overview
With an increasing number of file formats used
each year, preservation of electronic records has
become one of the major challenges for the
National Archives and Records Administration
(NARA). [The Strategic Plan of the National
Archives and Records Administration (NARA)
2006-2016, "Preserving the Past to Protect the
Future", 2006, URL
http://www.archives.gov/about/plans-reports/strategic-plan/]
  • Why?
  • Will there be software to load the file in the
    future?
  • If not, will the specification for the format
    still exist?
  • What would be the best file format conversion in
    terms of information preservation?
  • Was the specification ever available in the case
    of closed/proprietary formats to begin with?

This research is partially supported by a
National Archives and Records Administration
supplement to NSF PACI cooperative agreement CA
SCI-9619019.
2010 MS eScience -1
3
Conversions
  • Convert files to an open, standardized format to
    store with the original
  • How, and to which format?
  • Conversions often result in some information
    loss. Which format would lose the least?
  • If we had a universal converter, we could test
    conversions and compare files before and after
    conversion to estimate information loss
  • How do we convert?
  • MANY file formats!
  • MANY closed/proprietary formats!
  • MANY with large/complex specifications

4
Available 3D File Formats
  • Many applications to create/view/save 3D content
  • MANY of them introduce a new file format for that
    content!

5
  • Most applications support a handful of imports
    and exports
  • Conversions perform differently based not only on
    the algorithm used but also on the purpose of the
    format's domain
  • Emphasis on texture in 3D
  • Morphology vs. color histogram in 2D

6
NCSA file conversion technologies
[Diagram of the layers: Visualization (I/O Graph), Conversion (Polyglot), Software Reuse, Closed Source Software, Comparison (Versus), I/O Graph Weights Tool]
7
Input/Output Graphs
  • Software import and export options are
    visualized as a graph.
  • I/O Graph chooses the shortest path, the one
    using the minimum number of applications.
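
The idea of an I/O graph and a search for the path using the fewest applications can be sketched as follows. This is a minimal illustration, not the actual I/O Graph tool: the import/export tables are made-up assumptions, and the function names `build_io_graph` and `fewest_steps_path` are hypothetical. Each conversion step is treated as one application invocation, so a breadth-first search returns a path with the fewest steps.

```python
from collections import deque

# Hypothetical import/export tables; the formats listed per application are
# illustrative assumptions, not data taken from the registry.
APPLICATIONS = {
    "Blender": {"imports": {"obj", "3ds"}, "exports": {"obj", "stl"}},
    "Wings 3D": {"imports": {"stl", "obj"}, "exports": {"wrl"}},
    "K-3D": {"imports": {"obj"}, "exports": {"ply"}},
}


def build_io_graph(applications):
    """Return {source_format: [(target_format, application), ...]}."""
    graph = {}
    for app, io in sorted(applications.items()):
        for src in sorted(io["imports"]):
            for dst in sorted(io["exports"]):
                if src != dst:
                    graph.setdefault(src, []).append((dst, app))
    return graph


def fewest_steps_path(graph, source, target):
    """Breadth-first search: the first path found uses the fewest steps."""
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path
        for nxt, app in graph.get(fmt, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(fmt, app, nxt)]))
    return None  # no conversion path exists


graph = build_io_graph(APPLICATIONS)
path = fewest_steps_path(graph, "3ds", "wrl")
```

With these made-up tables, `path` holds two steps: one through Blender and one through Wings 3D. Breadth-first search is enough here because every edge counts as one step; weighted edges would need a different search.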

8
Input/Output Graphs
Applications shown: 3DS Max, Adobe 3D Reviewer, AutoCAD, Blender, Cinema 4D, K-3D, LightWave 3D, Maya, Wings 3D
9
Software Reuse Layer
  • Exists for the sole purpose of providing an API
    to functionality in 3rd party software
  • Controls software via wrapper scripts
  • AutoHotkey, AppleScript, various shell scripts
  • Vision-based scripts
  • Hides the details of using 3rd party software
  • Attempts to recover from errors; can throw
    exceptions

Making Use of 3rd Party Software - We define this
as the wrapping of 3rd party software, using
whatever interfaces the software vendors have
made available, in order to re-introduce an
API-like interface to embedded functionality.
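
A wrapper that hides a 3rd party program behind a function call, retries on failure, and finally throws an exception might look roughly like the sketch below. It is an illustration only: the real layer drives GUI software through AutoHotkey, AppleScript, and similar scripts, whereas this example assumes a program that can be started from the command line. The names `run_wrapped` and `ConversionError` are hypothetical.

```python
import subprocess
import sys


class ConversionError(Exception):
    """Raised when the wrapped program fails on every attempt."""


def run_wrapped(command, retries=2, timeout=60):
    """Run an external program, retrying on failure, and return its stdout."""
    last_error = ""
    for _ in range(retries + 1):
        try:
            result = subprocess.run(
                command, capture_output=True, text=True, timeout=timeout
            )
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Program missing, not executable, or hung: remember why and retry.
            last_error = str(exc)
            continue
        if result.returncode == 0:
            return result.stdout
        last_error = result.stderr.strip()
    raise ConversionError(
        f"{command[0]} failed after {retries + 1} attempts: {last_error}"
    )


# Stand-in for a real converter: the Python interpreter printing a message.
output = run_wrapped([sys.executable, "-c", "print('converted')"])
```

The caller sees only a function that returns output or raises `ConversionError`; which program ran and how it was driven stays hidden inside the wrapper.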
10
Software Reuse Layer
  • Exists as a service on the machine where the 3rd
    party software is installed
  • Clients provide the Java API
  • Many servers can exist on many machines of
    different platforms

11
Polyglot
  • The sole purpose of this layer is conversions.
  • Uses multiple software reuse servers
  • Merges available script operations into an
    I/O-Graph
  • Searches I/O-Graph for conversion paths between
    an input format and a desired output format
  • Has no knowledge of underlying 3rd party software
  • Can use redundancy in software reuse servers to
    improve performance and work around faults
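
Merging the operations offered by several software reuse servers, and using redundant servers to work around faults, can be sketched as below. This is an assumption-laden illustration rather than Polyglot's actual code: the server names, the `(application, input, output)` tuples, and the functions `merge_operations` and `convert_with_failover` are all hypothetical.

```python
def merge_operations(servers):
    """Merge per-server operation lists into one I/O graph.

    servers: {server_name: [(application, input_format, output_format), ...]}
    Returns {(input_format, output_format): [(server_name, application), ...]}.
    """
    graph = {}
    for server, operations in servers.items():
        for app, src, dst in operations:
            graph.setdefault((src, dst), []).append((server, app))
    return graph


def convert_with_failover(graph, src, dst, run):
    """Try each redundant provider of one edge until one succeeds."""
    errors = []
    for server, app in graph.get((src, dst), []):
        try:
            return run(server, app, src, dst)
        except RuntimeError as exc:
            errors.append(f"{server}/{app}: {exc}")
    raise RuntimeError(f"no provider converted {src} to {dst}: {errors}")


# Hypothetical servers; "server-a" is simulated as being down.
SERVERS = {
    "server-a": [("Blender", "obj", "stl")],
    "server-b": [("Blender", "obj", "stl"), ("K-3D", "obj", "ply")],
}


def fake_run(server, app, src, dst):
    if server == "server-a":
        raise RuntimeError("connection refused")
    return f"{src}->{dst} via {app} on {server}"


io_graph = merge_operations(SERVERS)
result = convert_with_failover(io_graph, "obj", "stl", fake_run)
```

The conversion layer only sees formats and providers; it needs no knowledge of how each server drives its software.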

12
Comparison Layer
  • The sole purpose of this layer is to compare
    files.
  • Versus is a framework for pair-wise digital
    object comparisons. The library extracts the same
    features from both objects and computes the
    similarity based on the chosen measure.
  • Uses the Polyglot layer to convert many test
    files across many of the possible paths
  • A -> B -> A
  • Compare files before and after conversion
  • I/O Graph Weights Tool - Converts a set of files
    across many paths using Polyglot and scripts.
    Adds information losses obtained from Versus as
    edge weights to the I/O Graph.
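
The round trip A -> B -> A and the resulting edge weight can be illustrated with a toy example. Everything here is a stand-in: `toy_convert` and `toy_similarity` are hypothetical placeholders for Polyglot and a Versus measure, and averaging the loss over test files is one assumed way to turn losses into a weight.

```python
def round_trip_loss(original, convert, a, b, similarity):
    """Convert A -> B -> A and return 1 - similarity(original, restored)."""
    intermediate = convert(original, a, b)
    restored = convert(intermediate, b, a)
    return 1.0 - similarity(original, restored)


def edge_weight(test_files, convert, a, b, similarity):
    """Average round-trip loss over a set of test files."""
    losses = [round_trip_loss(f, convert, a, b, similarity) for f in test_files]
    return sum(losses) / len(losses)


def toy_convert(data, src, dst):
    # Hypothetical converter: the "int" format drops fractional parts.
    if dst == "int":
        return [float(int(x)) for x in data]
    return list(data)


def toy_similarity(x, y):
    # Fraction of positions holding identical values (1.0 means no loss).
    matches = sum(1 for p, q in zip(x, y) if p == q)
    return matches / len(x)


files = [[1.0, 2.5, 3.0, 4.5], [1.0, 2.0]]
weight = edge_weight(files, toy_convert, "float", "int", toy_similarity)
```

The first file loses two of its four values on the round trip and the second loses none, so the average loss is 0.25. Note that a round trip measures the combined loss of both directions, not of a single edge.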

13
Conversion Software Registry (CSR)
14
  • http://isda.ncsa.illinois.edu/NARA/CSR
  • Complementary to format registries such as PRONOM
    and GDFR
  • No similar service that we are aware of
  • Community contributions encouraged
  • A database focused on conversion software!
  • Finding subsets of software for specific
    conversion needs
  • Finding conversion paths between pairs of formats

15
The CSR pseudo-tables block design
Parts: 1) Conversions, 2) Software, 3) Formats
and Files, 4) Scripts, 5) User login and history
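
To make the relationship between the Conversions, Software, and Formats parts concrete, here is a small sketch using an in-memory SQLite database. The table and column names, the sample rows, and the function `software_for` are all assumptions made for illustration; the actual CSR schema is not given in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE software (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE formats (id INTEGER PRIMARY KEY, extension TEXT NOT NULL UNIQUE);
    CREATE TABLE conversions (
        software_id INTEGER NOT NULL REFERENCES software(id),
        input_id INTEGER NOT NULL REFERENCES formats(id),
        output_id INTEGER NOT NULL REFERENCES formats(id)
    );
    """
)
# Made-up sample rows, not registry data.
conn.executemany(
    "INSERT INTO software VALUES (?, ?)",
    [(1, "ExampleConverter"), (2, "OtherTool")],
)
conn.executemany(
    "INSERT INTO formats VALUES (?, ?)", [(1, "obj"), (2, "stl"), (3, "ply")]
)
conn.executemany(
    "INSERT INTO conversions VALUES (?, ?, ?)", [(1, 1, 2), (2, 1, 3), (2, 1, 2)]
)


def software_for(conn, src, dst):
    """List software that converts extension src directly to extension dst."""
    rows = conn.execute(
        """
        SELECT s.name FROM conversions c
        JOIN software s ON s.id = c.software_id
        JOIN formats i ON i.id = c.input_id
        JOIN formats o ON o.id = c.output_id
        WHERE i.extension = ? AND o.extension = ?
        ORDER BY s.name
        """,
        (src, dst),
    ).fetchall()
    return [name for (name,) in rows]


obj_to_stl = software_for(conn, "obj", "stl")
```

Each row of the conversions table is one edge of the conversion graph, labeled with the software that performs it.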
16
Adding Conversions
17
Adding Conversions - scripts
Script headers are standardized, with up to four
lines giving the software name and version, the
software domain (image, 3d, document, etc.), and
the input/output formats.
  • Script types present
  • Convert - full conversion
  • Monitor - monitoring software behavior
  • Kill - terminating the software
  • Open/Save/Import/Export
18
Editing Pane
  • Software
  • Vendors
  • Software platforms
  • Interfaces
  • Formats
  • Equivalent extensions
  • Sample files

19
File format identifiers and extensions
CSR relies on the identifiers.
Canonical and derived identifiers:
  • Common usage: TIFF
  • MIME: image/tiff
  • UTI: public.tiff
  • PRONOM PUID: fmt/10
A PUID distinguishes different format versions. For
example, the tiff extension is represented as PUID
fmt/10 for version 6.0 and fmt/155 for GeoTIFF.
CSR can be searched by extension, MIME type, or PUID.
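
A lookup that accepts an extension, a MIME type, a UTI, or a PUID can be sketched as below. The TIFF identifiers come from the slide; the record layout, the extra `tif` extension, and the function `find_format` are assumptions for illustration.

```python
# One record per format; the layout is a hypothetical simplification.
FORMATS = [
    {
        "name": "TIFF",
        "extensions": {"tif", "tiff"},
        "mime": "image/tiff",
        "uti": "public.tiff",
        "puids": {"fmt/10": "TIFF 6.0", "fmt/155": "GeoTIFF"},
    },
]


def find_format(query):
    """Return the format name matching an extension, MIME type, UTI, or PUID."""
    q = query.strip().lower().lstrip(".")
    for fmt in FORMATS:
        if (
            q in fmt["extensions"]
            or q == fmt["mime"]
            or q == fmt["uti"]
            or q in fmt["puids"]
        ):
            return fmt["name"]
    return None


by_extension = find_format(".TIF")
by_puid = find_format("fmt/155")
```

Several PUIDs map to the same extension here, which is why the extension alone cannot identify a format version.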
20
Test files
Any file which can be used for conversion
accuracy and software validation. The files are
uploaded and verified through the UNIX file
command and against the file extension entry in
the CSR database. Additional file validation has
been performed semi-automatically by NARA using
the GTRI (Georgia Tech Research Institute) File
Type Identifier.
W. Underwood, Extensions of the UNIX file
command and magic file for file type
identification, Technical report ITTL/CSITD
09-02, Georgia Tech Research Institute, 2009, URL
http://perpos.gtri.gatech.edu/publications/index.htm
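
Checking an uploaded test file against its extension might look like the following sketch. It is not the CSR implementation: Python's `mimetypes` table stands in for the extension entries in the CSR database, and the function names `detect_mime` and `extension_matches` are hypothetical. `detect_mime` assumes a UNIX-like system where the `file` command is installed.

```python
import mimetypes
import subprocess


def detect_mime(path):
    """Ask the UNIX file command for the MIME type of a file."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def extension_matches(filename, detected_mime):
    """Compare the type implied by the extension with the detected type."""
    expected, _ = mimetypes.guess_type(filename)
    return expected is not None and expected == detected_mime


# Typical use on an upload (requires the file to exist on disk):
#     ok = extension_matches(path, detect_mime(path))
consistent = extension_matches("scan.tiff", "image/tiff")
mismatch = extension_matches("scan.tiff", "image/png")
```

A mismatch flags files whose content does not agree with their extension, for example a PNG image saved with a .tiff name.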
21
Searching for Software
Find a conversion path for converting a file
format A to a file format B.
24
Shortest path from file A to B
  • Dijkstra's algorithm - finds the path with the
    lowest cost (i.e. the shortest path) between one
    vertex/node and every other vertex, with edge
    costs defined by some measure
  • Subjective measure - software ranking by user
    propagates to all conversions.
  • Quantitative measures within the domain (images,
    3D, etc.).
  • Images: normalized cross-correlation measure,
    histogram distance measure
  • 3D: surface area, statistics, spin images, light
    fields
  • Document (pdf)
  • Audio

User-specified measures, for example a linear
combination of measures.
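
A lowest-cost search over weighted conversion edges, with the edge cost built as a linear combination of measures, can be sketched as follows. The graph, the application names, the costs, and the coefficients are made-up values, and `combined_weight` and `dijkstra_path` are hypothetical helper names. The sketch stops once the target format is reached rather than computing costs to every vertex.

```python
import heapq


def combined_weight(measures, coefficients):
    """Linear combination of per-measure losses for one conversion edge."""
    return sum(coefficients[name] * value for name, value in measures.items())


def dijkstra_path(graph, source, target):
    """graph: {fmt: [(next_fmt, application, cost), ...]}, costs non-negative.

    Returns (total_cost, [(from_fmt, application, to_fmt), ...]) or None.
    """
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        cost, fmt = heapq.heappop(heap)
        if fmt == target:
            break
        if cost > best.get(fmt, float("inf")):
            continue  # stale queue entry
        for nxt, app, weight in graph.get(fmt, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = (fmt, app)
                heapq.heappush(heap, (new_cost, nxt))
    if target not in best:
        return None
    path, node = [], target
    while node != source:
        prev, app = previous[node]
        path.append((prev, app, node))
        node = prev
    return best[target], path[::-1]


# Made-up weighted graph: the direct edge is lossier than the two-step path.
GRAPH = {
    "a": [("b", "App1", 0.9), ("c", "App2", 0.2)],
    "c": [("b", "App3", 0.3)],
}
total, steps = dijkstra_path(GRAPH, "a", "b")
edge_cost = combined_weight({"ncc": 0.2, "hist": 0.4}, {"ncc": 0.5, "hist": 0.5})
```

In this example the two-step route through format c costs less than the direct conversion, so the path with fewer applications is not the one chosen.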
25
Searching for Conversion Paths
31
Future Directions
  • Compiling known good data of various formats
  • Systematically measuring information loss across
    software and formats
  • Possibly distributing the task among a community
  • Ranking software based on performance
  • Integration of CSR and Polyglot

32
Summary
  • Currently contains 2,006 software packages
  • 1,682 format extensions
  • 233,810 conversions
  • No similar service that we are aware of
  • Complementary to format registries such as PRONOM
    and GDFR
  • Free
  • Community contributions encouraged

http://isda.ncsa.illinois.edu/NARA/CSR