An old paper/abstract on ambiguity analysis
"Quantifying Ambiguity and Gamifying the Writing Process: A Case Study on William Blake's 'The Sick Rose'"
Dana Milstein and I wrote this for the Digital Humanities 2015 conference.
Sharing here for reference
Emulation makes WORM media a reasonable digital preservation choice
I'm posting this on my personal blog as it's not ready for anywhere more formal. There are few citations, and it's not fully formed, but hopefully of interest.
In a paper (labeled P225, actually P226) that I recently presented at iPRES 2022 with Daniel Chadash from Twist BioScience, we discussed how Write Once Read Many (WORM) media is becoming a more viable and attractive option for ensuring the long-term preservation of digital content. In the paper we go into detail about why WORM media is now more attractive and discuss how DNA data storage is becoming a particularly practical WORM storage option (and soon part of a write-many solution as well). As part of the explanation we briefly discuss the role of emulation in making WORM media more viable. That wasn't the focus of the paper (which was primarily about DNA data storage) and therefore didn't get a great deal of exposition. So for this year's World Digital Preservation Day (#WDPD2022) I thought I'd spend a little time expanding on it.
Most organizations doing digital preservation, and most digital preservation software systems, assume and are designed around regular content migration (i.e. some information from interactive, software-dependent experiences stored in software and "data" files is selected and reimplemented in new, more modern software and data files - a process often called "format migration"). Content migration based approaches don't seem compatible with WORM media, as they seem to require stored data to be regularly changed or deleted. For example:
There seems to be an expectation of regular file deletion: If content migration is ever going to happen (it's hard to find many large scale examples of it "in the wild" so I'm going to be particularly cagey with my language) the process should result in new files being added to the organization's storage. And the old files that make up the original content are usually proposed to be deleted to save resources. Alternatively (at least in theory), sometimes the original and most recent sets of files after multiple content migrations are kept and the interim file sets deleted. So, content migration as an approach likely requires files to be regularly deleted from storage.
There may be a requirement for changing the stored file structure: There is sometimes an expectation in the digital preservation community that the files that make up preserved digital objects, along with their metadata, should be stored in a file structure that maintains them together as a findable intellectual unit. Whether or not any of us agree with that (in my opinion there are good arguments both ways in different contexts), if we assume this is a requirement, a content migration approach likely requires regularly adding new files to those file structures in order to ensure each digital object remains together on the storage media. In other words, content migration requires changing the file structure of the stored content.
This expectation of having to delete or change stored objects if following a content migration based digital preservation approach has led to an assumption that WORM media is not a great solution for long term digital preservation, as it doesn't allow for either action.
Emulation changes this. A digital preservation approach that instead assumes that we will preserve and use emulation to maintain access to the original software files that make up the digital experiences we're preserving doesn't have this same problem. With an emulation based approach, outside of unavoidable changes resulting from the right to be forgotten or take-down requests, files never need to be changed once stored on WORM media. Provided the original interaction software files are stored with the "data" files, and emulators are stored along with them, then the content should still be accessible at any point in the future.
A concern could be raised that at some point our current emulators will not work on modern computers. We have at least two options for managing this:
We can rewrite the emulators
We can create new emulators that can run the operating systems required to run the emulators we already have, then nest the old emulators inside the new ones
Both of these have pros and cons and both require writing new software, which could be considered expensive. In our iPRES 2022 paper on the Emulation as a Service Infrastructure (EaaSI) Program of work we briefly discuss the economics involved, which also seem to come out in favour of emulation: "Re-writing one emulator could ensure access to many emulated computers, which could ensure access to many legacy software applications which could ensure access to virtually unlimited digital objects."
But most importantly, once either solution has been selected, the new emulators can be added to the same type of storage and don't need to be stored in the same folder structure. Given the economy of scale of emulation (indicated in the quote above), even if we re-write the emulators occasionally and store those along with the old "obsolete" ones, that should only add a relatively insignificant amount of additional storage usage to our requirements. In other words, an emulation based approach can treat storage as something we only ever add to. By only being additive, and not destructive or requiring changes, an emulation based approach makes WORM media much more feasible.
To finish, we do seem to be living in interesting times. If there were a major event after which society one day rebuilt itself such that it could recover data from long-life WORM media, having usable software available to decode all the other digital files out there seems like an additional valuable benefit of this approach. In alignment with this, the team I manage recently deposited a collection of both software installation media and usable emulated environments with software installed on them (along with the EaaSI source code and binaries) to the Arctic World Archive, stored on Piql film (a WORM medium). And as the paper that motivated this post (P225) describes, we're also testing DNA storage as a long-life WORM storage option. With large scale use of emulation now practical through endeavours like our EaaSI program of work, I hope that we're helping to ensure some digital artifacts persist regardless of how "interesting" our times become.
Referencing my Archives New Zealand Legacy - Risks, Public Sector Datasets, and Rendering/Interaction
I fairly regularly find myself wanting to reference work I was involved in during my time at Archives New Zealand. It's become more and more difficult to do so over the years as the web pages where the work was published have been taken down. It's both gratifying (as it indicates that it was worthwhile work) and somewhat sad (that these references haven't been supplanted by newer work) that I still find myself referencing this work, but there are definitely parts of each of these projects that have value today (well, at least to me).
In this post I'll highlight three pieces of work I was involved in while working at Archives New Zealand and provide links to where you can find the outputs. I hope having a central reference to these legacy project outputs will be useful for anyone who needs to find them, and may also serve as an opportunity to highlight the work to anyone who might get some new value from it.
Rendering Interaction Matters
Firstly, I've already revisited the "Rendering Matters" report. In retrospect I wish I'd used the term "interaction" rather than "Rendering" but the outputs are still relevant. Here is the last time I revisited the report on this blog in 2014. And here is the post on the Open Preservation Foundation blog where I announced the report. And finally here are links to the report and the visual results from the Internet Archive.
I most recently referenced this report to highlight an appendix discussing how long it took to manually assess each digital object (also highlighted in the 2014 revisit). I highlighted it to provide data to support the great analysis Johan van der Knijff wrote up about the recent publication of "The Significant Properties of Spreadsheets". This was the final report of a six-year research effort by the Open Preservation Foundation's Archives Interest Group (AIG).
"For a start, the sheer number of properties would make any manual, non-automated analysis extremely cumbersome and time consuming. Interestingly, this is confirmed by the following quote from the DNA stakeholder analysis (p. 18)" 1/x https://t.co/F9U1d228fG
Specifically addressed in the Rendering Matters report as I describe here: https://t.co/Z8Gm5Ur1xp "[to test] 100 files comprehensively ([took] at least 13.5 hours). Scaling this to 0.5% of 1,000,000 files would give 675 hours or nearly 17 weeks at 40 hours per week." 2/2
Preservation of Public Sector Datasets
I undertook this research and wrote up the report before I was full-time at Archives NZ. I was brought over to work on this and some other projects while on secondment from my position at Statistics New Zealand, where I was working on the establishment of the data archive for official statistics.
The goal of this research was to evaluate the challenges involved with preserving structured data, or "datasets", across the New Zealand Public Sector. "The findings are a result of interviewing a number of public sector organisations about their dataset holdings."
While, in retrospect, I don't see it as my best work, the Preservation of Public Sector Datasets report is available in the Internet Archive here.
I reference this somewhat often when discussing using emulation and Stabilize or EaaSI to maintain access to databases. In particular I like to highlight this diagram:
and to highlight my notes regarding the complexity of databases that makes them particularly suitable for emulation-based long-term access strategies:
"Databases contain many relationships that have to be preserved to make their data usable, for example the relationships between each of the sub-tables within the databases. They are also often managed using DataBase Management Systems that are unique and can be closed and/or proprietary. Furthermore, the applications that are used to access the data are normally proprietary and built as one-off interfaces for the purpose at hand. This makes them difficult to preserve as their usability is dependent upon the operating environment, e.g. the software (such as the operating system or Microsoft's .NET framework) and hardware (such as a 64-bit architecture processor) that they were created to be used within."
"Because of the interconnected nature of most datasets there are often complicated relationships between parts of datasets (or sub-sets) which make it difficult to remove any part without destroying some of the provenance information and quantitative value contained in the remaining parts."
"It will also be important to record which parts have been removed from any dataset so that where other remaining parts refer to them or have a relationship with them any future users will be aware of this and will be able to make an informed use of the remaining data as a result. Documenting such redaction could be a major task for archivists and may lead to the Archives New Zealand coming to the same conclusion as the Dutch Digital Preservation Testbed (DPT) team who decided that for pragmatic reasons the best approach would be not to break up datasets but to preserve them as a whole."
EDIT 12th October 2021:
I also completed a small project while at Archives NZ to migrate a SQL Server 2000 database to emulated hardware. It went quite well. I shared the process documentation here: https://openpreservation.org/blogs/migrating-windows-2000-database-server-virtualized-and-emulated-hardware/
Risks to Digital Public Sector Information
I worked on this research with Monica Greenan. It "identified a set of risks which affect the long-term access to trusted digital information in public sector agencies. It also established a set of indicators that can be used to identify the risks in public sector agencies."
"[The] survey was sent out to CIOs where they existed in agencies and to equivalents or nearest-equivalents where there was no CIO role. CIOs were targeted for their strategic view-point and because of their role in controlling resources that might be used for digital information management. The survey was intended to provide a view of the key risks to digital information being faced by CIOs and was made up of 26 risks divided into four sections."
The survey is available here and the Digital Information Risk Identification Tool here. We referenced this recently when evaluating risks in our digital preservation infrastructure, and it is interesting to compare with DiAGRAM (the Digital Archiving Graphical Risk Assessment Model created by the Safeguarding the Nation's Digital Memory project).
An example risk from the tool is below:
That risk is particularly relevant in NZ as earthquakes are incredibly common there.
Conclusion
So I hope this is a useful post, and perhaps it highlights some publications that might still have value to some folks. I once asked Archives NZ to re-publish the Rendering Matters report after one of the redesigns that had removed it; at that time the website administrator was surprised to find it had been one of the most accessed pages on their site. It was re-published and has since gone again, but thanks to the Internet Archive it and all of these outputs are still accessible. Thanks Internet Archive! (Internet Archive donation link).
Editing Guymager User Interface Field Names
Yesterday on Twitter a thread started about the tools used for digital curation and how many come from a history in law enforcement, which is problematic:
Lots of GLAM folks (including me) using tools from vendors that primarily service law enforcement. Thinking about forensic bridges and disk imaging software especially. Kinda fucked up right? Are there alternatives?
– Eddy Colloton (@EddyColloton)
March 21, 2019
Elizabeth England highlighted the problems with field names/terms used in the Open Source disk imaging tool Guymager:
Yes! Even with guymager, which is open source, there's still adaptations we make to crosswalk terms like "evidence number" and "examiner" for our purposes that I'd love to not have to do.
– Elizabeth England (@elizabeengland)
March 21, 2019
I decided to try putting in a feature request with the developer, Guy, to see if we could get this changed: https://sourceforge.net/p/guymager/feature-requests/13/
He responded incredibly quickly:
Hello Euan, you currently could change the names by creating a new language file. However, I must admit that this might be too complicated for the standard user. Would you like me to create one for you? You could then tell me if it fits your request. If yes, please send me your "translations" for
case number, description, examiner, evidence number, notes
Remark: Please be aware of the fact that those text fields will keep the original labels inside the EWF files . This is due to the fact, that the EWF format does not allow for specifying own labels. Unlike it would be done nowadays (as for example in XML, JSON) the text fields in EWF are simply positional and thus there's no label that could be changed.
Guy
I decided to try this change myself and thought I should write up the process for others to try.
First you need to find one of the language files to edit. For the most recent version the English language file is here: https://sourceforge.net/p/guymager/code/HEAD/tree/tags/guymager-0.8.8/guymager_en.ts#l26
(there is a "Download this file" link at the top of the page)
Other languages are also available in the parent location. The files have the ".ts" file extension.
You can then edit that file and convert it to a .qm file. The easiest way I found to do this was to open and edit the file in Qt Linguist. It is a free program that you can install in Ubuntu/BitCurator with the following commands (as root):
apt update
apt install qt4-dev-tools qt4-linguist-tools
It is also available for Windows and Mac here: https://github.com/lelegard/qtlinguist-installers/releases
In Ubuntu you can start Linguist by typing linguist in a terminal and pressing Enter.
You can then use the file menu to open the .ts file you downloaded from the Guymager sourceforge site.
You can then select the user interface context you want to edit the translation for in the left panel, e.g. "t_DlgAcquire", which corresponds to the acquisition dialog box. That presents you with the strings in the middle box. After selecting one you can edit its translation in the bottom box:
Once you are happy with your changes you can save the .ts file (with a different name if you like), then use the "Release as" function in the file menu to generate a .qm file to be added to the Guymager configuration.
That file then needs to be given an appropriate name that follows the pattern of the original file, e.g. guymager_en-CH.qm. In that example the "CH" represents "Cultural Heritage". Everything past the first underscore "_" and before the extension can be altered, but I'd recommend not using unusual characters or spaces. Note what you have put into that location in the file name as it is used to identify the file in a subsequent step.
Next you copy the .qm file to /usr/share/guymager/
e.g. in Ubuntu as root:
cp guymager_en-CH.qm /usr/share/guymager/guymager_en-CH.qm
Then you need to edit the Guymager config to use that translation file. In Ubuntu you can do that with Gedit or another text editor (as root):
gedit /etc/guymager/guymager.cfg
You then want to look for the language variable and change it from "auto" to the name you used in the .qm file name, e.g. "en-CH".
Update from Guy:
"Please do not change /etc/guymager/guymager.cfg directly. Create a new file /etc/guymager/local.cfg and put the line Language = en_CH into it.
The idea behind: If an update is going to be installed, your change would be overwritten if it resides in guymager.cfg. As local.cfg is loaded later (see INCLUDE statement at the end of guymager.cfg) the settings residing there overwrite those from earlier configuration files.
Another remark: It's really nice to see that Guymager is getting used beyond the world of computer forensics. If there's anything else you need for your work just let me know."
Save that file and Guymager should now have a different translation which should carry into the log files for those acquisition workflows.
The files I created are available here
Recovering data from 8-inch (8″) floppy disk media - a list of resources
Today we started looking at imaging and recovering data from 8-inch floppy disks again, and I started sharing resources with folks in my team, so I thought it might be good to consolidate them all somewhere.
So here is a list of potentially useful resources for recovering data from 8″ floppy disks, somewhat ordered by usefulness/reasonable reading order:
A Case Study on Retrieval of Data from 8-inch Disks: Of the Importance of Hardware Repositories for Digital Preservation. Denise de Vries (Flinders), Dirk von Suchodoletz (University of Freiburg), Willibald Meyer. https://web.archive.org/web/20181128205423/https://ipres2017.jp/wp-content/uploads/54Denise-de-VriesA.pdf
Data Recovery and Investigation from 8-inch Floppy Disk Media: Three Use Cases. Walker Sampson (Digital Archivist, University of Colorado Boulder), Abby R. Adams (Digital Archivist, Harry Ransom Center), Austin Roche (Independent collector)  https://osf.io/6gcky/
FDADAP floppy disk adapter http://www.dbit.com/fdadap.html
FDDC DC-DC converter for 8" floppy drives http://www.dbit.com/fddc.html
Kryoflux forum threads: https://forum.kryoflux.com/viewtopic.php?f=3&t=159 and https://forum.kryoflux.com/viewtopic.php?t=56
8″ Disk Recovery: Kryoflux and Catweasel. The story so far. Denise de Vries. http://openpreservation.org/blog/2016/09/14/8-disk-recovery-kryoflux-and-catweasel/
The next two are not about 8-inch disks but are interesting for the custom stream converter that was created: http://openpreservation.org/blog/2012/01/03/digital-archaeology-and-forensics/ and http://openpreservation.org/blog/2012/07/17/conclusion-ctos-forensics/
And finally a couple of pieces of media showing success!
[YouTube video]
The scatter plot off an 8 inch floppy. #digitalpreservation pic.twitter.com/Xd6TPVcVmn
– DP PIG (@dp_pig)
October 20, 2015
Floppy Disk Format Identifier Tool
I created this tool https://github.com/euanc/DiskFormatID recently and this post is an attempt to document it a bit more thoroughly.
What is it good for?
"Automatically" identifying floppy disk formats from kryoflux stream files.
Enabling "simple" disk imaging workflows that don't include a disk format identification step during the data capture process.
What does it do?
It processes copies of floppy disk data saved in the kryoflux stream file format, creates a set of disk image files formatted according to assumptions about the disk's format (e.g. that it is an AmigaDOS disk), and allows the user to try mounting the image files as file systems. If the mounting works that is good because it:
Implies that the mountable disk image file is of the "correct" format
Allows the user to copy the files from the disk image
It requires the Kryoflux program to function and runs instances of the command line version of the kryoflux software (called "DTC") to create the various disk images. In the interface you can choose how many instances of DTC you would like to run concurrently. This can make it a lot quicker to process many disks and create many disk images from each, but it has a significant CPU overhead, so the default is one instance at a time.
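To illustrate how concurrent DTC instances can be run, here is a rough Python 3 sketch. It is not the tool's actual code: it assumes each item in commands is a complete, pre-built DTC argument list (the actual DTC arguments come from the format settings chosen in the interface and are not shown here).

    # Sketch only: run several DTC instances side by side.
    from concurrent.futures import ThreadPoolExecutor
    import subprocess

    def run_dtc(cmd):
        # Run one DTC instance and capture its output for the results box.
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return proc.stdout, proc.stderr

    def run_all(commands, max_workers=1):
        # max_workers mirrors the "concurrent DTC instances" setting;
        # one at a time matches the tool's default.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(run_dtc, commands))

The thread pool approach works here because the heavy lifting is done by the external DTC processes rather than by Python itself.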
How do you use it?
Clone or download the set of files in the GitHub Repository https://github.com/euanc/DiskFormatID
Install pyqt4-dev-tools, python27, default-jre (i.e. java)
Download the kryoflux software and extract it somewhere
Navigate to the kryoflux/dtc directory and run "java -jar kryoflux-ui.jar", then accept the license agreement (assuming you agree to the terms). This seems to be necessary to get the software to work properly. It must update a required file somewhere.
Open a terminal window, navigate to the DiskFormatID folder and run "sudo python diskIDMain.py". (Update: Kam Woods (https://twitter.com/kamwoods) packaged it! Thanks Kam! Follow the install instructions in the readme file and run it.) Then...
You should get a window like this:
[screenshot: the tool's main window]
Use the buttons to select input, output and DTC directory folders. The DTC directory is the folder that the kryoflux DTC program is in. The Input folder should contain folders of kryoflux stream files, the names of which will be used for the disk image files this tool creates. The output folder will be used to store folders of disk images created by the tool.
Click "Choose formats to create" to choose image formats to create. You should get a window like this:
[screenshot: the format selection window]
It will show the pre-configured options at the bottom. You can clear them by clicking "Reset all".
You can choose options and click "Save (and add another)" to save the option to the settings file. The configured types output box should update on click.
At any point you can close the window and the configured types will be saved as long as they appeared in the bottom output.
Back in the main window you can then select how many concurrent DTC instances you want to run and click "Create images" to start creating them.
It will first create subfolders within the output folder, named with the original folder names with spaces converted to "_", as DTC cannot accept spaces in pathnames. It will then copy the tracks to a subfolder of those called "track".
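Roughly, that preparation step amounts to the following (an illustrative sketch, not the tool's actual code):

    import os
    import shutil

    def prepare_streams(input_dir, output_dir):
        # Copy each disk's stream files to a space-free path, because DTC
        # cannot accept spaces in pathnames (see the note further below).
        for name in os.listdir(input_dir):
            safe_name = name.replace(" ", "_")
            dest = os.path.join(output_dir, safe_name, "track")
            shutil.copytree(os.path.join(input_dir, name), dest)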
It will then go on to run the instances of DTC. Note: it may hang/freeze the window/system at that point, but after some time you will start to see results in the output box. If you check your running processes (e.g. using the "top" command in another terminal instance) you should see instances of DTC starting and stopping, e.g.:
[screenshot: top showing running DTC instances]
It will eventually complete and a "Creation of images complete" message will appear in the output box.
After that you can choose whether to delete or keep the unmountable images and then click "Try mounting" to start trying to mount and/or delete unmountable files. This process should be very quick.
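The mounting step is essentially an attempt at a read-only loop mount. A minimal sketch of the idea, assuming a Linux host and root privileges (again, not the tool's actual code):

    import subprocess

    def try_mount(image_path, mount_point):
        # A successful read-only loop mount implies the assumed file system
        # (and so the assumed disk format) was probably correct.
        result = subprocess.run(["mount", "-o", "ro,loop", image_path, mount_point])
        if result.returncode == 0:
            subprocess.run(["umount", mount_point])
            return True
        return False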
At any point you can click one of the "Save Results" buttons to save the relevant output box contents to a text file.
You will end up with results looking something like this:
[screenshot: example results]
What is known to be wrong with it?
Quite a lot:
It doesn't format the DTC command line output properly in the results window. It dumps the success-output for each image first, then the error-output.
The interface layout is all weird.
The "choose formats" window doesn't automatically close.
The main settings and the format creation settings are saved in the same JSON file. It would be nice to save them separately so useful sets of image formats to test could be shared. They still could be shared but the main settings (data locations etc) would, by default, be included in the JSON.
Right now there is no license on GitHub with it. I will put up something. As far as I am concerned it is free to use, reuse and build on under a CC-BY type license like the MIT license (http://choosealicense.com/licenses/mit/), except where it has to be restricted by any other licenses it relates to via its dependencies.
I still don't know which image settings matter. E.g. many of the MFM optional settings can be used to create mountable disk images from the same source stream set.
It is not packaged. You need to install Python 2.7 and PyQt4 to make it run and you need to start it from the terminal in Linux. (Update: Kam Woods (https://twitter.com/kamwoods) packaged it! Thanks Kam!)
It only runs on Linux. The disk creation part runs on Windows (with the dependencies met - it does work with the Windows version of DTC, I believe) but there is no obvious or as-functional equivalent to "mount" on Windows (Cygwin is a can of worms), so that part does not work on Windows.
If it wasn't clear, this relies on the Kryoflux software, DTC. I can't see why you would have stream files and not rights to use the software, but it should be pointed out regardless. Please support the team at http://www.kryoflux.com, they are great!
Somewhere I got lists of RPM speeds, both basic and extended lists (you can see them in the "chooseFormats.py" file). I don't know where from, but perhaps in the Kryoflux documentation. Only the basic ones are included in the dropdown selection menu.
The disk image file names use the kryoflux parameters to distinguish between variants of the same format. These aren't really human readable; I should probably change them to the corresponding text from the drop down menus.
I think I left out one of the kryoflux/DTC parameter options. I can't remember which right now.
It has to be run as root - this is because of the kryoflux software. I think if you install it in Linux using the instructions given by the kryoflux team then this shouldn't be necessary. It results in all of the disk images (and the output results files) being created with root permissions only.
Thanks to https://twitter.com/j_w_baker and these test files at the Internet Archive I realised that DTC cannot accept filepaths with spaces in them. So I've implemented a process whereby it copies all the data into subfolders (described above) before running DTC.
What might actually be improved?
It depends on whether it needs improvement and how much interest there is.
Things I've considered doing include:
Adding an option to export disk contents to a "content" folder - the images get mounted anyway so I could include a copy step before unmounting them.
Adding an option to export mounting results into a CSV report, one row per input disk
Adding an option to parse DTC output to identify 40/80-track and single- or dual-sided disks. I don't know if this is possible, but the Kryoflux GUI shows green squares in every second column for 40-track disks when they are imaged as 80-track disks, so I assume that means it is interpreting the data to understand that somehow. The same goes for single- and dual-sided disks.
Adding an option to save DTC log files alongside images
Changing how the file names are generated so they are more human readable.
Changing permissions on outputted files so they aren't assigned to root only.
How was it made/what is the story behind it/confessions of a novice programmer?
I created a Python script some time ago (2015?) to do much of what this tool does, but with a recursive method used to set which disk image formats to create. That method was not easy to change, and if all options were configured it would have created 500,000+ images per disk. The main problems with it were that I didn't know which disk image format settings mattered and didn't have an easy way to set them using the command line.
I'd already implemented a workflow for imaging disks where the student-workers created kryoflux stream files and didn't try to identify what type of disk they had. So I was somewhat locked into this approach, and people had (very reasonably) been bugging me about whether it was sustainable or not. So recently I decided to finish the work to make it at least a practical, if not ideal, approach.
My first programming lessons were in BASIC when I was about 11/12 years old. All I remember from them was being disappointed with one of the example lessons in which we generated a USA flag and not a New Zealand flag. I now live in the States, so go figure. I also have a vague recollection of unsuccessfully trying to copy code for a text adventure game from the back of a how-to book. Since then I've done very little aside from COMP101 when I was at university, and I keep trying to make it through the Python course at http://www.codeacademy.com but I still haven't yet. What I'm getting at is that I made this through a bunch of trial and error and it comes with absolutely no warranty. Hence the extensive comments throughout the files -- they were mainly for my benefit!
I used PyQt4 and Qt Designer to make the Graphical User Interface (GUI) windows. The Qt Designer files are in GitHub (they have ".ui" extensions). From those I used pyuic4 to create the Python programs that generated the GUI windows/UIs. I was then able to include those files in the main program file(s) by importing them.
The multithreading/parallel processing works, I'm pretty sure. But I'm not really sure how/why. I copied the code from the web somewhere and adjusted it for this use.
The settings are saved in the "settings.json" file. I feel like I'm not using the most efficient methods to read and write the file (Python dictionaries confuse me) but it works, so I guess that is what matters.
I ended up having to create two dictionaries for each DTC parameter, one with the elements/items reversed. I'm sure I could have solved the reverse lookup programmatically if I'd tried harder, but that was a nice quick workaround.
I need to learn more about using GitHub.
Anything else worth knowing?
Probably but my brain is full right now.
Please let me know if it is useful, if there are any bugs, or if you have any thoughts at all. I can be contacted via Twitter: https://twitter.com/euancpa
Acquiring software alongside born digital acquisitions & automatically identifying what software should be acquired.
I wrote this post in a hurry and it is not cited. It has a terrible title. It's one of those posts that is as much notes for myself as anything, but I absolutely welcome any and all feedback.
There are good reasons to assume that when an organization acquires born-digital files for long term preservation and access it should also ensure that it has the software that is required to enable interaction with the content in the files, in full and without distortion. While I'm assuming this in this post, I do also give some justification for it.
Acquiring this software at the point of acquisition of the born-digital files is quite hard today for a number of reasons, including (but not necessarily limited to):
We don't have any tools that automate the identification of necessary interaction software (NIS)
We don't have time to try to identify the necessary interaction software manually
We often don't have enough information to identify the necessary interaction software (so even automated processes can't identify it).
Even when we can identify the NIS we often have no way to legally acquire a copy of it or preserve access to it long term
There is more to be said about all of these points, but I'm going to use this post to discuss some thoughts about the first one in particular.
Automating identification of NIS presents an interesting opportunity.
With multiple projects and processes underway to develop databases of both software and information about software, it may eventually be possible to associate the software identifiers in those databases with the format identifiers created by format identification tools like DROID from the National Archives. DROID already does this to a degree with the associations in the PRONOM database between file formats and software.
Unfortunately all the file format information databases currently suffer from the curse of generality. All, to a large degree, assume that all .docx files or .odt or .ods files are the same. This is not the case. Different software applications can create files purporting to be of a standard format differently. This means that the same interaction software could take two files of the same "format" created with two different pieces of software and be able to enable interaction with components of one but not enable interaction with components of the other. Furthermore, different software can take the same file and present different content to users. Both of these reasons mean that our databases (let's face it, I primarily mean PRONOM for now) would benefit from assuming that:
each piece of software creates files of a particular "format" differently
each piece of software enables interaction with files of a particular "format" differently
until proven otherwise (many open source tools, for example, use the same code to create and/or enable interaction with the same sets of digital objects, so we will likely find that these assumptions don't apply to them).
This in turn assumes that each software & format combination is effectively a different "Software Defined Format" (SDF) (for want of a better term).
I've discussed these points numerous times elsewhere (and, unlike this post, actually cited references to back them up). The point of this post is to highlight that if we had databases of information about software that included information about which formats that software can create and enable interaction with, i.e. databases of SDFs, and we developed signatures to identify different SDFs, then it would be fairly straightforward to develop an appraisal/acquisition tool to automate identification of NIS. This tool could then be used by practitioners acquiring or appraising born-digital collections. And further to that big "if", I believe we should have such databases and should have such a tool. Software is important; it is a first-class archival object and can (and I believe should) arguably be considered part of each born-digital object. If we are going to take those (mostly unsubstantiated in this post) claims seriously then we need to start establishing processes to enable the appropriate identification of, and long term access to, such software.
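To make that concrete, here is a deliberately simplified sketch. The SDF identifiers and the database below are entirely hypothetical; the point is only that, given SDF-level signatures and an SDF database, looking up the NIS for an identified file becomes a trivial query.

    # Hypothetical SDF database: signature ID -> software information.
    SDF_DB = {
        "sdf/docx/word-2007": {
            "creating_software": "Microsoft Word 2007",
            "interaction_software": ["Microsoft Word 2007"],
        },
        "sdf/docx/libreoffice-writer-4": {
            "creating_software": "LibreOffice Writer 4",
            "interaction_software": ["LibreOffice Writer 4"],
        },
    }

    def necessary_interaction_software(sdf_id):
        # Return the NIS recorded for this software-defined format, if any.
        entry = SDF_DB.get(sdf_id)
        return entry["interaction_software"] if entry else None

In practice such a database would need to be a shared, community-maintained resource along the lines of PRONOM rather than a hard-coded dictionary, but the lookup itself really is this simple once the SDF-level information exists.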
Writing this post raised another interesting question for me. Should we be ensuring access to the creation software as well as the NIS where they differ? E.g. should we acquire Adobe Acrobat Pro as well as Adobe Acrobat Reader for all PDF files created with Adobe Acrobat Pro? There are reasons we may want to do so, e.g. to enable future researchers to understand how the digital artifacts were created, not just how they were normally interacted with. And if economies of scale mean that ensuring access to certain software just means subscribing to a service that makes it available and checking that it is there when needed, then this could be quite trivial (and possibly inexpensive) to enable, so why not?
Validating migration via emulation
As discussed numerous times elsewhere on this blog, automated migration of content between files of different formats can often lead to content being lost or altered. Verifying the migration of content at scale is also not cost-effective, as it is currently a mostly manual process for most types of content.
A "sensible" digital preservation policy might therefore involve automatically migrating as much content as possible into formats that are usable in modern computing environments while also offering users the option of interacting with content via "original" software using emulation tools and services.
Were one to implement this policy it may then also be possible to have users "validate" migrated versions of content against the emulated versions. This process could be "automatically" managed in software, and when multiple users agree about the validity of a migrated digital object it could be marked as valid. Others would then know that they can reuse the migrated version with a degree of trust in the integrity of the content.
This process would be much like the process used for validating manual transcription of digitized content.
This could also be coupled with migration on demand (possibly via emulation): only once an on-demand-migrated digital object had been validated multiple times would it actually be separately ingested into a digital preservation system and preserved alongside the original version, possibly replacing a previously migrated version, rather than just being migrated on demand. This would cut down on storage of migrated versions by only preserving "validated" migrated versions, while also ensuring that a reusable bitstream (or bitstreams) was available and properly preserved if this became a priority (rather than relying only on migration on demand for re-usability).
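A minimal sketch of the agreement rule described above (the threshold of three agreeing users is an arbitrary assumption, and a real system would also need to track who voted):

    AGREEMENT_THRESHOLD = 3  # assumed value; a real policy would set this deliberately

    def record_validation(votes, object_id, user_agrees):
        # votes: dict mapping object_id -> count of users who judged the migrated
        # version faithful to the emulated original. Returns True when the object
        # has enough agreement to be marked valid and ingested.
        if user_agrees:
            votes[object_id] = votes.get(object_id, 0) + 1
        return votes.get(object_id, 0) >= AGREEMENT_THRESHOLD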
Digital evidence
I'm nervous about posting this idea. I worry that the main suggestion in it, if at all useful, is of limited use. But this is what blog posts are good for: getting ideas out there into the discourse regardless of their value. So here goes nothing.
I've blogged here previously about the need to be able to preserve software in order to validate digital evidence in court cases. I still think this is a very important and under-explored issue that ought to be tested in the courts. But I'm also regularly reminded of the general fragility of the authentication apparatus that we assume exists around digital content. I'm currently going through the implementation of a digital preservation system that does a very admirable (excellent, really!) job of ensuring authenticity by keeping logs of actions related to stored digital content and thoroughly securing access to the content. It would be extremely difficult to alter content stored in our digital preservation system without some record being made of that, or without it causing an error to be presented to us administrators. There is, however, a very small chance it could be done by a sufficiently motivated and resourced malicious actor. To my knowledge, this is true of all digital preservation systems currently in wide usage. Undetectable alteration would require manipulation of multiple data stores and databases, and provided the digital preservation system was configured to store copies of the data and metadata in one or more offline and nearline storage systems it would be even more difficult, as a single attack would not allow immediate access to all copies (the malicious actor would have to find a way to alter the offline copies also, a la [the admittedly ridiculous] Mr Robot). The possibility, however small, remains (I believe) that a sufficiently motivated and resourced individual may be able to alter digital content in our possession in a way that could avoid technical detection. Someone equipped with an excellent memory may notice that the content has been altered, but that would be of limited use if it can't be proven.
I've been surprised that this area hasn't blown up more in the courts and in the media. It's all too easy to fake digital evidence. A quick google will find multiple seemingly reputable organizations giving examples of how easily manipulated digital information is. Maybe there just haven't been enough sufficiently motivated and resourced individuals (or maybe they have (or had) just been too good [WARNING -- the previous link resolves to a post on the Intercept about the Snowden documents which may be off-limits to some readers]). Why do I raise all of this you might ask?  Well in thinking about this I wondered whether there was anything the digital preservation community could do to ensure the things that we look after are as able-to-be-authenticated as possible, and I have a very tentative suggestion answering that thought.
I have been following with great interest and admiration the pre-study registration model being used in the pharmaceutical industry. The idea is that in order to make sure a clinical trial or study is as unbiased as possible, researchers are required to preregister a description of the study, its parameters and the method of evaluation. In doing so they help to ensure that bias cannot be introduced at a later stage, such as after the data has been collected, when a researcher could try a different method of evaluation/analysis to get a more 'useful' outcome. This pre-registration seems to have helped a lot. Well, this got me wondering: could we do something similar? Could we pre-register our data with a reputable third party before it needs to be validated for authenticity, in order to ensure our data (like the clinical trial's methods) was not compromised along the way? I think maybe we could, and maybe we should. So that's what I'm tentatively proposing here:
What if we registered all our digital object identifiers and checksum values in one or more public databases as soon as we received the digital objects? This would at least increase the difficulty of manipulation of the data by malicious actors. Malicious actors would have to alter the public databases in addition to the private ones we administer in the digital preservation community. Such databases shouldn't cost a huge amount to host and administer. Checksum values are not large and the databases need not be hugely complex. And they could perhaps be offered by organizations that use them on a reciprocal basis. E.g. each digital archive could host a checksum database for all others and vice versa.
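As a minimal sketch of what registration could look like, assuming a hypothetical registry that accepts JSON records over HTTP (the URL and record fields below are invented for illustration):

    import hashlib
    import json
    import urllib.request

    REGISTRY_URL = "https://example.org/checksum-registry"  # hypothetical endpoint

    def register(identifier, path):
        # Compute a checksum for the stored object and post it, together with
        # its persistent identifier, to the public registry.
        sha256 = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                sha256.update(chunk)
        record = {"identifier": identifier, "algorithm": "sha256",
                  "checksum": sha256.hexdigest()}
        request = urllib.request.Request(
            REGISTRY_URL, data=json.dumps(record).encode("utf-8"),
            headers={"Content-Type": "application/json"})
        return urllib.request.urlopen(request)

The record is deliberately tiny (an identifier, an algorithm name and a hash), which is part of why hosting such registries reciprocally should be cheap.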
This doesn't necessarily link the content of the files to the checksums, but at worst the archives would only be able to say, when asked about a file registered in such a database, that they no longer have any file with that checksum value and/or ID. And it would force a malicious actor who was attempting to alter the content to change both the files and their persistent identifiers, in each system in which each is stored. It also doesn't help in situations where either a weak checksum algorithm was used that could allow for multiple files having the same checksum, or where a sufficiently powerful CPU cluster can use brute force to achieve the same result. It would also not stop someone completely deleting all evidence of an object aside from the checksum registry databases. However, it does make it harder for a malicious actor to make undetectable changes to our digital heritage and digital evidence; a situation that could be extremely important to prevent given appropriate circumstances. Someone presenting seemingly authentic (but actually manipulated) digital evidence has the potential to be more damaging than someone destroying the original and authentic evidence. Surely this is a good thing to prevent.
Is this all just more budget-increasing red tape though? Perhaps, but as with all perceived red tape, it's possible that this is actually valuable enough to be worth the effort. After all, why preserve records if you can't tell whether they are fake or not?
----------
Update/Edit
The consensus from Twitter and offline seems to be that:
a) Clearly define the threat model before spending any time or money on anything like this
b) There may be some value in this, but I should show exactly what the value is first, and either way, hashing algorithms become weaker over time so
 b1) you would need to update the values over time
 b2) you would have to trust the organization doing those updates
c) Blockchain tech sounds interesting but, as David Rosenthal regularly points out, it's not necessarily adding anything to this challenge.
I want the right to be forgotten for 100 years
Leisa commented on my earlier post about preserving data to enable time travel with a question about how we get enough user data to give the context needed to replicate the experience of living in earlier time periods (in Virtual Reality (VR)).
Another topic that was raised at the NYC Archivists Roundtable panel discussion yesterday was the European Union's Right to be Forgotten law. I responded to questions about it with the suggestion that instead of a right to be forgotten permanently, I'd like a right to be forgotten for a period of time, e.g. 100 or 150 years. On further thought I'd also like to add a caveat that I'd like to be able to restrict access to my personal data for "x" (100?) years or "y" (10?) years after I die, whichever comes first.
If we would and could all do that with our personal data, there could be a wealth of information to give context to VR time travel experiences.
Of course right now there aren't many services I'd feel comfortable handing my data over to for 100 years of safekeeping, from a preservation perspective. But that's a different issue.
Why I want everyone to preserve as much data as possible
Last night I was on a panel for an NYC Archivists Roundtable event entitled "Archives in the Electronic Age: Part II". At one point someone in the audience questioned the value of the comprehensive Twitter archive that the Library of Congress is developing. I responded with an example I came up with and used in a talk five years ago.
Star Trek and time travel
To get more specific, Star Trek, starting with The Next Generation series I believe (I'm not much of a trekkie), introduced the concept of a Holodeck. This is a space on their starships that you can physically go into that morphs around you to reproduce any experience you would like. With the rise of Virtual Reality (VR) solutions this concept is becoming more and more feasible.
In the series they often use the Holodeck to go back to earlier periods in time. This always enthralled me as I love the idea of time travel. However, I always wondered how they managed to reproduce the experience so accurately. Which brings me to my point: if we want to be able to use VR technology to experience earlier times we are going to need a lot of data about what they were like.
I'd love to be able to "go back in time" in VR and have everything be as accurate as possible (even down to the twitter feeds). This is not going to be possible if the data is not there.
So, everyone, preserve as much data as possible, so we can all time travel (kind of)!
Software dependent content
A big bugbear of mine is that people often assume an emulation based digital preservation approach is only useful for preserving the "look and feel" of digital objects.
I believe that this is not true. Examples I've posted elsewhere on this blog, mostly coming from the research I led at Archives New Zealand, show that software can change the actual meaningful content that a user is presented with. The software used to interact with a digital object can add content, remove content, or alter content. It's not just the "look and feel" that is altered; it is the content itself.
These examples have motivated me to begin using what I think may be a new term/phrase: "software dependent content".
I use this term to refer to content that requires a particular and limited range of software environments in order to be interacted with, rendered, viewed or consumed.
I've found it a useful term, especially when discussing the need to invest in emulation based digital preservation approaches, and thought I should release the term into the wild for others to use.
Consider it released.
Revisiting "Rendering Matters"
I had occasion today to look up the "Rendering Matters" report I wrote while at Archives New Zealand (I was looking for this list of questions/object attributes that were tested for and included as an appendix in the report) and got distracted re-reading the findings in the report. From the findings you can draw the conclusion that the use of emulation services is likely to be the most cost-effective way to preserve digital content over the long term. In the light of the recent progress with emulation services such as the bwFLA Emulation as a Service framework and the excellent work that has been going on with JavaScript MESS, I thought it might be worth reposting the findings here as a reminder for readers.
Summary findings from "Rendering Matters":
The choice of rendering environment (software) used to open or "render" an office file invariably has an impact on the information presented through that rendering. When files are rendered in environments that differ from the original then they will often present altered information to the user. In some cases the information presented can differ from the original in ways that may be considered significant.
The emulated environments, with minimal testing or quality assurance, provided significantly better rendering functionality than the modern office suites. 60-100% of the files rendered using the modern office suites displayed at least one change compared to 22-35% of the files rendered using the emulated hardware and original software.
In general, the Microsoft Office 2007 suite functioned significantly better as a rendering tool for older office files than either the open source LibreOffice suite or Corel's WordPerfect Office X5 suite.
Given the effectiveness of modern office applications at opening the office files, many files may not need to have content migrated from them at this stage, as current applications can render much of the content effectively (and the content's accessibility will not be improved by performing this migration, as the same proportion of the content can currently be accessed).
Users do not often include a lot of problematic attributes in their files but often include at least one. This in turn indicates a level of unpredictability and inconsistency in the occurrence of rendering issues which may make it difficult to test the results of migration actions on files like these.
Detailed findings
There were more detailed findings towards the end of the report:
"The [findings] show quantitatively that the choice of rendering environment (software) used to open or 'render' an office file invariably has an impact on the information presented through that rendering. When files are rendered in environments that differ from the original they will often present altered information to the user. In some cases the information presented can differ from the original in ways that may be considered significant. This result is useful as it gives a set of ground-truth data to refer to when discussing the impact of rendering on issues of authenticity, completeness and the evidential value of digital office files.
The results give an indication of the efficacy of modern office suites as rendering tools for older office files. Risk analysis of digital objects in current digital repositories could be informed by this research. Digital preservation risk analysts could use this research to evaluate whether having access to these modern office suites means that files that can be 'opened' by them are not at risk.
The results highlight the difficulty and expense in testing migration approaches by showing how long it took to test only ~100 files comprehensively (at least 13.5 hours). Scaling this to 0.5% of 1,000,000 files would give 675 hours or nearly 17 weeks at 40 hours per week. This level of testing may be considered excessive depending on the context, but similarly comprehensive testing of only 100 files per 1,000,000 of each format (.01%) would take at least 13.5 hours per format, per tool. More information on how long testing would take for a variety of different sample sizes and percentages of objects (e.g. 1% of 100,000 objects would take 150 hours) is available in Appendix 3.
The results also show the promise of running original software on emulated hardware to authenticate the rendering of files to ensure that all the content has been preserved. Although emulated environment renderings were not shown to be 100% accurate in this research, they were shown to have a far greater degree of accuracy in their renderings than current office suites (which are the tools currently used for migrating office files). Additionally, some of the changes introduced in the emulated environments may have been due to poor environment configuration.
The results give an indication of how prevalent certain attributes are in office files. With a greater sample size this research could help to show whether or not it is true that 'most users only use the same 10% of functionality in office applications' (the data from this small sample indicates that in fact they only use about 10% of the functionality/attributes each, but often it is a different 10%)."
Findings specific to the prevalence of rendering "errors"
Personally, I found the findings related to the prevalence of problematic attributes in the files tested to be most enlightening. The relevant findings from the report are included below:
"The likelihood that any single file has a particular attribute that does not render properly in a particular rendering environment is low,
and
The likelihood that the same file will have at least one attribute that doesn't render properly in a particular environment is quite high (~60% and above)."
In other words, the results indicate that users do not often include a lot of attributes in their files that caused rendering issues when rendered in modern environments but often include at least one. This in turn indicates a level of unpredictability and inconsistency in the occurrence of rendering issues.
A significant challenge for digital preservation practitioners is evaluating the effectiveness of digital preservation approaches. When faced with a large and ever increasing volume of digital files to be preserved, practitioners are forced to consider approaches that can be automated. The results in this report indicate that the occurrence of problematic attributes is inconsistent and they therefore may be difficult to automatically identify. Without identifying such attributes pre-migration it will not be possible to test whether the attributes exist post-migration and so the effectiveness of the migration will not be able to be evaluated. Without automatically identifying such attributes pre-migration then it is unlikely that any effective evaluation will be able to be made cost-effectively. The cost to manually identify these attributes for every object would likely be prohibitively large for most organisations given reasonably sized collections."
Time to manually validate object rendering
Also included in the appendices was a table estimating the time it would take to manually validate a set percentage of objects for a given collection size. This was based on the average of 9 minutes it took to undertake the tests as part of the Rendering Matters research. I've included this table below; it is sobering.
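For anyone who wants to redo the arithmetic behind these estimates, the calculation is straightforward (the 13.5 hours per 100 files and 9 minutes per object figures come from the report and appendix quoted above):

    def manual_validation_hours(collection_size, sample_percent, minutes_per_object):
        # Time to manually check a percentage sample of a collection.
        sampled_objects = collection_size * sample_percent / 100.0
        return sampled_objects * minutes_per_object / 60.0

    print(manual_validation_hours(1000000, 0.5, 13.5 * 60 / 100))  # 675.0 hours
    print(manual_validation_hours(100000, 1, 9))                   # 150.0 hours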
Visual examples
Also included as an appendix in the report, and on a separate web page, are screenshots illustrating some of the types of rendering issues that were identified.
Replicating the results
It has now been three and a half years since the publication of this report and, as far as I am aware, nobody has attempted to replicate the approach or the findings. Personally, I found the process as enlightening as the results, and I would welcome (and where possible, help with) the replication of this research by others.
0 notes
Text
What to call non-Google-worthy questions
Here is something I've been discussing frequently recently and would love some input on.
I believe we need a word for questions that you don't know the answer to, but which are so unimportant/insignificant/uninteresting that you can't even be bothered googling them.
Since realising that there is a need for this I've noticed that the concept/situation comes up quite frequently (though it might just be a case of the Baader-Meinhof Phenomenon).
Any ideas?
0 notes
Text
Reply to DSHR's Comment
I ran over the comment limit on David Rosenthal's blog when I tried to reply to his reply to my comment, so I've included my reply below instead.
Hi David,
The problem I see is that we fundamentally disagree on the framing of the digital preservation challenge. I meant to reply to your last "refutation" of Jeff Rothenberg's presentation at Future Perfect 2012 but hadn't gotten around to it yet. Perhaps now is a good time. I was the one who organised Jeff's visit and presentation, and I talked with him about his views both before and after, so I have a pretty good idea of what he was trying to say. I won't try to put words into his mouth though, and will instead give my (similar) views below.
The digital preservation challenge, as I see it, is to preserve digitally stored or accessed content over time. I think we can both agree that if we aren't leaving something unchanged then we aren't preserving anything. So, to me, the digital preservation challenge requires that we ensure the content is unchanged over time.
Now I'm not sure if you would agree that that is what we are trying to do. If you do, then it seems we disagree on what the content is that we are trying to preserve. If you disagree that that is what we are trying to do, then at least we might be able to make some progress on figuring out what the disagreement stems from.
So if you can at least understand my perspective, I'd also like to address your comments about format obsolescence. I'm not a proponent of the idea of format obsolescence. The idea makes little sense to me. However, I am a proponent of a weak form of the idea of software obsolescence and, more importantly, of the associated idea of content loss due to software obsolescence.
The weaker form of the idea of software obsolescence that I'm a proponent of is this: because of hardware changes, software loss, and the loss of understanding about how to use software, software becomes unusable with current technology unless there is active intervention.
The associated idea of content loss that I am a proponent of is that, to successfully preserve many types of content, you need to preserve the software that the content relies upon in order to be presented to users and interacted with. A stronger way of putting that is to say that, in many cases, the thing to be preserved is so inextricably connected to the software that the software is part of that thing.
If you take the leap of accepting (whether fully, or just in order to simplify the explanation) that the software is part of the thing to be preserved, then it becomes obvious that practitioners who are only doing migration are in many cases not doing real preservation, as they are not preserving the entirety of the objects. Hence Jeff's presentation, in which he reprimanded the community for not really making progress since the early 2000s: almost nobody is preserving the software functionality.
As it is relevant to your post and comments, I'll use a web page as an example to illustrate what I mean. The content that a traditional web page presents to users for interaction is produced from a number of digital files, including the server-hosted files (e.g. the web server and applications, the HTML/XHTML pages, scripts, images, and audio) and the locally hosted files (the browser, fonts, browser skins, extensions, etc.). The combination of these files, mediated by usually at least two computers (the server and the client), together presents content that the user can interact with. Changing any one of the files involved in this process may change the content presented to the user. To preserve such a page, it is my view that we need to start by deciding what content makes up the page, so that we can begin to preserve it and so that we can confirm, at a point in the future, that that content has been preserved and is still there in an unchanged form. In most cases it's likely that all that needs to be preserved is the basic text and images in the page and their general layout. If that is all, then migration techniques may well be appropriate if the browser ever becomes unable to render the text and images (though I agree with you that that doesn't seem necessary yet, or likely to become necessary in a hurry). However, there are two difficulties with this scenario:
(A) There will be many cases where the content includes interactive components and/or things that have software dependencies.
(B) When you don't know, or can't affordably identify, the content to be preserved, preserving as much as possible, cheaply, is your best option.
(A) means that you will require some solution that involves preserving the software's functionality, and I believe that (B) means you should use an emulation-based technique to preserve the content.
Emulation-based techniques are highly scalable (across many pieces of digital content) and so benefit from economies of scale. Emulation strategies and tools, once fully realised, will, I believe, provide a cheaper option when you factor in the cost of confirming the preservation of the content.
It's a bit like the global warming problem. Most products and services do not include the carbon cost in their price; if they did, they would likely be much more expensive. I believe digital preservation solutions are similar: if you factor in the costs of confirming/verifying the preservation of the content you are trying to preserve, then many solutions are likely to be prohibitively expensive, as they will require manual intervention at the individual object level. Emulation solutions, on the other hand, can be verified at the environment level and applied across many objects, greatly reducing costs.
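To make that cost argument concrete, here is a rough back-of-the-envelope sketch in Python. The 9-minute per-object time is the average from the Rendering Matters testing; the number of environments and the hours per environment are figures I have invented purely for illustration:

# Back-of-the-envelope comparison of verification costs. The per-object time
# comes from the Rendering Matters testing average; the environment figures
# are illustrative assumptions only.
def per_object_verification_hours(n_objects, minutes_per_object=9):
    # Manual, per-object checking: cost grows linearly with collection size.
    return n_objects * minutes_per_object / 60

def per_environment_verification_hours(n_environments, hours_per_environment=40):
    # Verify each emulated environment once, then reuse it across every
    # object that depends on that environment.
    return n_environments * hours_per_environment

print(per_object_verification_hours(1_000_000))   # 150000.0 hours
print(per_environment_verification_hours(20))     # 800 hours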
So, as I see it, it is not about format obsolescence; it is about (a weak form of) software obsolescence and the preservation of content that can't be separated from software.
In your post you seemed to be suggesting something similar: that content needed to be preserved that was heavily reliant upon browsers and server-based applications. You also discussed a number of approaches, including some that involved creating and maintaining virtual machines, and followed that with the statement that "the most important thing going forward will be to deploy a variety of approaches". I took that to mean you had softened a little in your attitude towards using emulation to preserve content over time.
Sorry, I seem to have misunderstood.
3 notes
·
View notes
Text
How could son B rely on the emulation solution?
Commenting on my last post about the importance of digital object integrity, Barbara asked:
"Nice post! But son B need to do something more if I was the judge.If in court, how could son B show that the firm that did the emulation was trustworthy? By asking 3 firms and compare the results?"
I'm going to take this to mean: how might the firm prove to son B that their emulation solution was trustworthy so that son B could prove it in court?
It's a good question, and one that I get asked a lot, so here is my answer:
I think the bar for how "authentic" an emulation solution needs to be should be set as follows: for running a particular software environment, the emulated hardware should meet the same expectations as the hardware that the environment would have run on when it was in use.
I don't think that (in most cases) it has to be the exact same hardware. The reason I believe this is that the same is not expected of digital objects today (and wasn't expected of the objects created in 2009/2010). That is, when one person opens a file another person created, it is not expected to be opened on the exact same hardware. On the other hand, it is often expected that the user should use the same software. Software is often recommended or required by organisations when they give users files to use. Websites used to (and sometimes still do) regularly list the software that should be used to interact with them.
I could speculate as to why hardware is deemed less important than software by actual users, but (to an extent) it doesn't matter why; it just matters that that is the way it is.
When hardware vendors develop new PCs, they test them to ensure they function as they should. Those same tests (which are mostly automated) should be run on emulated hardware, and the emulated hardware should pass them if it is to be accepted. The first Google hit I found for such a test suite was for EuroSoft.
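As a rough illustration of the kind of check I have in mind, here is a minimal Python sketch that boots a hardware diagnostics image inside the QEMU emulator so that the diagnostics exercise the emulated CPU, memory and disk controllers. The image name is a placeholder I have made up, not a reference to any specific vendor's product:

# Hypothetical sketch: boot a diagnostics image in QEMU so the tests run
# against the emulated hardware rather than physical hardware.
# "diagnostics.iso" is a placeholder for whichever test suite is actually used.
import subprocess

subprocess.run([
    "qemu-system-i386",           # emulated PC hardware of roughly the right era
    "-m", "1024",                 # 1 GB of RAM, plausible for a 2009-era laptop
    "-cdrom", "diagnostics.iso",  # the diagnostics boot image (placeholder name)
    "-boot", "d",                 # boot from the CD image so the tests run first
])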
So, in order to prove that their emulation solution is trustworthy, the firm providing the service needs to show that the emulated hardware stands up to the same integrity tests that any hardware would have had to pass in order to be used with Windows Vista and Office 2003 back in 2009.
2 notes
·
View notes
Text
A scenario illustrating the importance of digital object integrity
I've been commenting on some blog posts recently about the importance of being able to be sure that your preservation actions don't change the content/message(s) of a digital object, and today I thought of a scenario that seems both very plausible and, hopefully, effective at illustrating the issue.
In 2009 a mother wrote her last will and testament using Microsoft Word 2003; she never printed it and saved it in the default Word 2003 format. In her will she split her assets evenly between her two sons, Son A and Son B. In 2010 she had a falling out with Son A and updated the document to reflect that, removing him from a large portion of the inheritance he otherwise would have received. She died 15 years later without ever updating her will. It survived on the home server backups her sons had made for her over the years.
Come 2025, the two sons both retrieved the will and prepared to use it to argue their case for their portion of her assets. Son A presented the will in court as rendered using LibreOffice 2023, the only contemporary software that would open it. When opening the file, LibreOffice misinterpreted the changes that had been saved using the track changes feature of Word 2003 and presented a slightly mangled version that included portions of the 2009 version, in which Son A gets half his mother's assets.
When presented with this, Son B knew something was not right. His mother (whom he had looked after as her Alzheimer's set in) had often and repeatedly told him that she had disowned Son A and that "he's not getting anything!".
However, when the object was presented in court, Son B was surprisingly calm. Son B had come prepared! He had also tried accessing the object by rendering the file in LibreOffice 2023; however, when he saw the result he immediately set about looking for someone who could help him access the "real" object.
After allowing his thoughts about searching to interface with the BrainCloud*, he had quickly found a company specializing in old digital objects and digital access. Within seconds they had analysed his problem and, after he had provided his authorization thought pattern, the file was opened using a remotely hosted emulated environment that included Office 2003 running on Windows Vista (his mother had bought her laptop at just the wrong time and refused to upgrade it as she "liked the colour" of Windows Vista).
Back in court, Son B presented his version of the object. The version that used Word 2003 running on Vista as the interaction environment, with the change tracking presentation layer turned on, clearly showed what he was entitled to and, coupled with other evidence of his late mother's changed relationship with Son A, was the deciding factor that won him his full inheritance.
(*TM)
2 notes
·
View notes