Wednesday, July 01, 2009
Jumping the gun on HTML 5
With the release of Firefox 3.5, references to the "HTML 5 standard" are on the rise in tech news. CNET refers to "new standards such as HTML 5." The Christian Science Monitor says "this latest version [of Firefox] adds support for the HTML 5 web standard." BNET Technology has a whole article on HTML 5 without once suggesting that it isn't in final form.
But as the World Wide Web Consortium notes, HTML 5 is still a long way from being settled. The latest working draft states: "Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways." (Emphasis in the original.)
This most likely means that there will be several different "HTML 5"s from different vendors, and that it will be a long time before they all come close to agreement. HTML 5 was intended to close the gap among different implementations by specifying more aspects of the language and basing it on an object model rather than a syntactic one, but we're probably in for more of the same techno-Babel.
Thursday, June 11, 2009
DROID 4.0
The National Archives (UK) has released DROID 4.0, the latest version of its file format identification tool.
The focus of the latest major release is the inclusion of the DCS (Digital Continuity Service) and Planets, Collection Profiler. DROID now runs in two modes: the original file identification mode, and the new “profile” mode, which lets users gather file format information from large distributed sources of digital files. Profile mode provides aggregated statistical data and reports to help users make appropriate management decisions about the risk associated with such large collections of files.
Labels: DROID
Saturday, June 06, 2009
JHOVE 1.3 is out
JHOVE 1.3 is now available. This version includes fixes to some serious bugs in the PDF module. It now has a much lower rate of spurious rejections. Read the release notes for full details.
I've already made a few post-release changes to the source code, removing a debugging stack dump that I'd inadvertently left in and updating version information where I'd forgotten to. Sigh ... I need to put all the configuration information in one place, instead of having it scattered through a dozen source files, before the next release.
Labels: JHOVE, preservation
Friday, June 05, 2009
WARC is ISO standard
The WARC (Web Archive) format is now an ISO standard, available for a mere 118 Swiss francs. Or you can grab a near-final draft (PDF) for free.
Found by way of digitizationblog.
Labels: WARC
Thursday, May 21, 2009
W3C rescinds four drafts
This is unusual; W3C has rescinded four XHTML-related drafts. This means that they roll back to the previous versions. The rescinded drafts are:
- XHTML™ 1.1 - Module-based XHTML - Second Edition
- XHTML™ Basic 1.1 - Second Edition
- XHTML-Print - Second Edition
- XHTML™ 1.0
Tuesday, May 19, 2009
Survey results on JPEG2000
The results of a University of Connecticut study on how libraries are using JPEG2000 format are available online. Thanks to Digitization 101 for the link.
Labels: JPEG2000
Friday, May 08, 2009
Spoofing characters
You wouldn't think there's a security issue in the UTF-8 encoding, but there is, in an indirect way. In a note on a Sun Java blog, it's explained that in the old (20th century) definition of UTF-8, some characters could be represented by more than one byte sequence. For example, any ASCII character (0x01 through 0x7F) could also be written as a two-byte sequence beginning with 0xC0 or 0xC1; the slash (0x2F) could appear as 0xC0 0xAF. This could create problems if security filters are looking for certain characters in order to stop cross-site scripting or SQL injection; the spoofed characters could get past such filters if they don't take the alternate byte representations into account. For this reason, the current UTF-8 requirements specify that the shortest byte representation of a character is the only legitimate one.
This change was made in 2000, but not all implementations of UTF-8 have caught up. Sun has only recently fixed this in Java, with JDK 7, OpenJDK 6, JDK 6 update 11 and later, JDK 5.0u17, and Java 1.4.2_19. (If you're using Java 1.3 or earlier, you're probably stuck, but why would you do that?)
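To make the spoofing problem concrete, here is a minimal Python sketch. It is not Sun's fix or anyone's production code; the function names and the sample input are made up for illustration. It assumes a filter that scans raw bytes for a slash, a hypothetical old-style decoder that accepts non-shortest two-byte forms, and Python's built-in strict UTF-8 decoder for comparison.

```python
def contains_raw_slash(data: bytes) -> bool:
    """Hypothetical byte-level filter: looks only for the one-byte '/' (0x2F)."""
    return b"/" in data


def lenient_decode(data: bytes) -> str:
    """Hypothetical old-style decoder (illustration only).

    Handles one-byte and two-byte sequences and, like the pre-2000 rules,
    does NOT reject two-byte sequences that encode an ASCII character.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte.
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data) and (data[i + 1] & 0xC0) == 0x80:
            # Two-byte form: 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("sequence not handled by this sketch")
    return "".join(out)


def strict_decode(data: bytes) -> str:
    """Python's built-in decoder follows the current rules and rejects overlong forms."""
    return data.decode("utf-8")


# 0xC0 0xAF is a non-shortest (overlong) encoding of '/' (0x2F).
spoofed = b"..\xc0\xaf.."

print(contains_raw_slash(spoofed))   # the byte filter sees no slash
print(lenient_decode(spoofed))       # the lenient decoder produces one anyway
try:
    strict_decode(spoofed)
except UnicodeDecodeError as err:
    print("strict decoder rejected input:", err.reason)
```

The point of the sketch is the mismatch: the filter and the decoder disagree about what the bytes mean. A decoder that enforces the shortest-form rule removes the disagreement by refusing the input outright.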
