Planet Code4Lib

What Aaron understood / Alf Eaton, Alf

I didn’t know Aaron, personally, but I’d been reading his blog as he wrote it for 10 years. When it turned out that he wasn’t going to be writing any more, I spent some time trying to work out why. I didn’t find out why the writing had stopped, exactly, but I did get some insight into why it might have started.


Philip Greenspun, founder of ArsDigita, had written extensively about the school system, and Aaron felt similarly: he documented his frustrations with school, left formal education, and taught himself.

In 2000, Aaron entered the competition for the ArsDigita Prize and won, with his entry The Info Network — a publicly editable database of information about topics. (Jimmy Wales & Larry Sanger were building Nupedia at around the same time, which became Wikipedia. Later, Aaron lost an election bid to be on the Wikimedia Foundation’s Board of Directors).

Aaron’s friends and family added information on their specialist subjects to the wiki, but Aaron knew that a centralised resource could lead to censorship (he created zpedia, for alternative views that would not survive on Wikipedia). Also, some people might add high-quality information, but others might not know what they’re talking about. If everyone had their own wiki, and you could choose which trusted sources to subscribe to, you’d be able to collect just the information that you trusted, augment it yourself, and then broadcast it back out to others.

In order to pull information in from other people’s databases, you needed a standard way of subscribing to a source, and a standard way of representing information.

RSS feeds (with Aaron’s help) became a standard for subscribing to information, and RDF (with Aaron’s help) became a standard for describing objects.


I find — and have noticed others saying the same — that to thoroughly understand a topic requires access to the whole range of items that can be part of that topic — to see their commonalities, variances and range. To teach yourself about a topic, you need to be a collector, which means you need access to the objects.

Aaron created Open Library: a single page for every book. It could contain metadata for each item (allowable up to a point - Aaron was good at pushing the limits of what information was actually copyrightable), but some books remained in copyright. This was frustrating, so Aaron tried to reform copyright law.

He found that it was difficult to make political change when politicians were highly funded by interested parties, so he tried to do something about that. He also saw that this would require politicians being open about their dealings (but became sceptical about the possibility of making everything open by choice; he did, however, create a secure drop-box for people to send information anonymously to reporters).


To return to information, though: having a single page for every resource allows you to make statements about those resources, referring to each resource by its URL.

Aaron had read Tim Berners-Lee’s Weaving The Web, and said that Tim was the only other person who understood that, by themselves, the nodes and edges of a “semantic web” had no meaning. Each resource and property was only defined in terms of other nodes and properties, like a dictionary defines words in terms of other words. In other words, it’s ontologies all the way down.

To be able to understand this information, a reader would need to know which information was correct and reliable (using a trust network?).

He wanted people to be able to understand scientific research, and to base their decisions on reliable information, so he founded Science That Matters to report on scientific findings. (After this launched, I suggested that Aaron should be invited to SciFoo, which he attended; he ran a session on open access to scientific literature).

He had the same motivations as many LessWrong participants: a) trying to do as little harm as possible, and b) ensuring that information is available, correct, and in the right hands, for the sake of a “good AI”.

As Alan Turing said (even though Aaron spotted that the “Turing test” is a red herring), machines can think, and machines will think based on the information they’re given. If an AI is given misleading information it could make wrong decisions, and if an AI is not given access to the information it needs it could also make wrong decisions, and either of those could be calamitous.

Aaron chose not to work at Google because he wanted to make sure that reliable information was as available as possible in the places where it was needed, rather than being collected by a single entity, and to ensure that the AI which prevails will be as informed as possible, for everyone’s benefit.

IMLS Webinar: strengthen your executive skills / District Dispatch

How about investing a couple of hours to learn how libraries and museums can strengthen executive skills?

Mind in the Making (MITM), a program of the Families and Work Institute (FWI), and its partner, the Institute of Museum and Library Services (IMLS), will present a free webinar for museum and library professionals on executive function life skills. The webinar will feature findings from the just-released groundbreaking report, Brain-Building Powerhouses: How Museums and Libraries Can Strengthen Executive Function Life Skills.
The webinar presenters include report contributors Mimi Howard and Andrea Camp, Mind in the Making author and FWI President Ellen Galinsky, and IMLS Supervisory Grants Management Specialist Helen Wechsler. They will discuss new findings from research on brain development, the importance of executive function skills, and how museums and libraries across the country are incorporating this research into their programs and exhibits.

Some of the outstanding initiatives in museums and libraries featured in the report will be presented in the webinar by the following:

• Laurie Kleinbaum Fink, Science Museum of Minnesota
• Stephanie Terry, Children’s Museum of Evansville
• Kerry Salazar, Portland Children’s Museum
• Kimberlee Kiehl, Smithsonian Early Enrichment Center
• Holly Henley, Arizona State Library
• Anne Kilkenny, Providence Public Library
• Kathy Shahbodaghi, Columbus Metropolitan Library

Executive function skills are built on the brain processes we use to pay attention and exercise self control, to hold information in our minds so that we can use it, and to think flexibly. These skills become foundational for other skills, including delaying gratification, understanding the perspectives of others, reflection, innovation, critical thinking, problem solving, and taking on challenges.

Webinar: Brain-Building Powerhouses: How Museums and Libraries Can Strengthen Executive Function Life Skills
Date:        Tuesday, September 22, 2015
Time:        2:00 PM EDT

Link:         Join the webinar with this link to the Blackboard Collaborate system.
Phone:      1-866-299-7945, enter guest code 5680404#

Note: IMLS-hosted webinars use the Blackboard Collaborate system. If you are a first-time user of Blackboard Collaborate, click here to check your system compatibility in advance of the webinar. You will be able to confirm that your operating system and Java are up-to-date, and enter a Configuration Room that will allow you to configure your connection speed and audio settings before the IMLS webinar begins. (If you choose to enter a Configuration Room, please note that the IMLS webinar will use Blackboard version 12.6.)

      # # #

The post IMLS Webinar: strengthen your executive skills appeared first on District Dispatch.

How CareerBuilder Executes Semantic and Multilingual Strategies with Apache Lucene/Solr / SearchHub

As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Trey Grainger’s session on multilingual search at CareerBuilder.

When searching on text, choosing the right CharFilters, Tokenizer, stemmers, and other TokenFilters for each supported language is critical. Additional tools of the trade include language detection through UpdateRequestProcessors, part-of-speech analysis, entity extraction, stopword and synonym lists, relevancy differentiation for exact vs. stemmed vs. conceptual matches, and identification of statistically interesting phrases per language. For multilingual search, you also need to choose between several strategies, such as 1) searching across multiple fields, 2) using a separate collection per language combination, or 3) combining multiple languages in a single field (custom code is required for this and will be open sourced), each with their own strengths and weaknesses depending upon your use case. This talk will provide a tutorial (with code examples) on how to pull off each of these strategies. We will also compare and contrast the different kinds of stemmers, discuss the precision/recall impact of stemming vs. lemmatization, and describe some techniques for extracting meaningful relationships between terms to power a semantic search experience per language. Come learn how to build an excellent semantic and multilingual search system using the best tools and techniques Lucene/Solr has to offer!

Trey Grainger is the Director of Engineering for Search & Analytics at CareerBuilder.com and is the co-author of Solr in Action (2014, Manning Publications), the comprehensive example-driven guide to Apache Solr. His search experience includes handling multilingual content across dozens of markets/languages, machine learning, semantic search, big data analytics, customized Lucene/Solr scoring models, data mining, and recommendation systems. Trey is also the founder of Celiaccess.com, a gluten-free search engine, and is a frequent speaker at Lucene- and Solr-related conferences.
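To make the first of those strategies a little more concrete, here is a rough client-side sketch of a query fanned out across language-specific fields with Solr's eDisMax parser. The Solr URL, core name, and field names are invented for the example and are not taken from the talk; adjust them to your own schema.

```python
# A rough sketch of strategy 1 (searching across multiple language-
# specific fields) using Solr's eDisMax query parser. The Solr URL,
# core name ("jobs"), and field names are invented for this example.
import requests

SOLR_SELECT = "http://localhost:8983/solr/jobs/select"   # placeholder

def multilingual_search(query, languages=("en", "es", "de")):
    # Fan the query out across one analyzed field per language.
    qf = " ".join(f"title_{lang} description_{lang}" for lang in languages)
    params = {"q": query, "defType": "edismax", "qf": qf, "wt": "json"}
    resp = requests.get(SOLR_SELECT, params=params)
    resp.raise_for_status()
    return resp.json()["response"]["docs"]

if __name__ == "__main__":
    for doc in multilingual_search("software engineer"):
        print(doc.get("id"))
```

In practice the qf list would usually be set as a default on a Solr request handler rather than rebuilt by every client.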
Slides: Semantic & Multilingual Strategies in Lucene/Solr, presented by Trey Grainger, CareerBuilder (from Lucidworks)
Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post How CareerBuilder Executes Semantic and Multilingual Strategies with Apache Lucene/Solr appeared first on Lucidworks.

New Contract Opportunity: Request for Proposals for RightsStatements.org / DPLA

The Digital Public Library of America (http://dp.la) and Europeana (http://europeana.eu) invite interested and qualified individuals or firms to submit a proposal for development related to the infrastructure for the International Rights Statements Working Group.

Timeline

  • RFP issued: 18 September 2015
  • Deadline for proposals: 00:00 GMT, 6 October 2015
  • Work is to be performed no sooner than 8 October 2015.
  • Functional prototypes for components A and C must be completed by 24 December 2015.
  • Work for components A, B, and C below must be completed by 15 January 2016.

Overview

This document specifies the project scope and requirements for technical infrastructure supporting a framework and vocabulary of machine-readable rights statements under development by the International Rights Statements Working Group, a joint Digital Public Library of America (DPLA)–Europeana Foundation working group.

The working group shall provide and maintain RDF descriptions of the rights statements, with canonical serializations in Turtle, modeled as a vocabulary in the Simple Knowledge Organization System (SKOS). These descriptions will include multiple official translations of each statement, and support versioning of the statements and/or vocabulary scheme. Alongside the statement descriptions, the working group will produce a summary of the data model and properties used.

The contractor will provide an implementation that acts as a platform for hosting these statements. The platform consists of an application for publishing the rights statements according to linked data best practices and a content management system to be used by the working group to publish materials related to the project. These two components should provide the feel of an integrated user experience, and must be served publicly from a single domain (http://rightsstatements.org/). As part of this contract, the contractor will also provide one year of maintenance, support, and security updates for these components, their dependencies, and the operating systems for servers on which they are deployed.

Components

Component A. Rights statements application

A web application that provides both machine-readable representations of the rights statements (in RDF serializations including JSON-LD and Turtle) and human-readable representations. The working group will provide a canonical version of the rights statements in Turtle-serialized RDF as needed for launch, as well as a testing version used to implement and test specific features, including, but not limited to, versions (see 3a) and translations (see 4b and 4c).

  1. Human readable representations
    1. The application shall provide a human-readable web page representing each rights statement, with provision for versions, multiple language support, and additional request parameters as described in Requirements for the Technical Infrastructure for Standardized International Rights Statements.
    2. All human-readable representations shall be generated from the canonical Turtle-serialized RDF.
    3. Human-readable representations must be available as HTML5 with RDFa 1.1 or RDFa 1.1 Lite.
    4. Human-readable representations must provide links to the RDF representations listed below.
  2. RDF representations
    1. The application shall provide multiple RDF serializations of the individual rights statements through content negotiation on the statement URI. Minimally, it must support Turtle and JSON-LD. Additional serializations are desirable but not required. (An illustrative content-negotiation sketch follows this component’s requirements.)
    2. The application shall provide multiple RDF serializations of the entire vocabulary through content negotiation on the vocabulary version URI.  The vocabulary shall support the same RDF serializations as the individual statements.
    3. All RDF serializations must be equivalent to the canonical Turtle-serialized RDF.
  3. Versions
    1. The application shall support multiple versions of each statement. The structure of the versions is described in Requirements for the Technical Infrastructure for Standardized International Rights Statements.
    2. Otherwise valid statement URIs that omit the version number should respond with code 404.
  4. Languages and translation
    1. Human-readable representations should dynamically handle requests for translations of the statements through HTTP Accept-Language headers and through the use of parameters as specified in Requirements for the Technical Infrastructure for Standardized International Rights Statements.
    2. The working group will provide text in one or more languages for each statement as RDF language-tagged literals in compliance with IETF BCP47. All language-tagged literals will be encoded as UTF-8.
    3. The working group will provide translations for content not derived from the statement RDF, e.g., navigational elements. The application will support this through an internationalization framework, e.g., GNU gettext.
  5. Additional request parameters
    1. For specific statements, human-readable representations must accept query string parameters and generate a view of the statement enhanced by additional metadata described in Requirements for the Technical Infrastructure for Standardized International Rights Statements.
  6. Resource URIs and HTTP request patterns
    1. The HTTP behavior of the application shall follow the URI structure and interaction patterns described in Requirements for the Technical Infrastructure for Standardized International Rights Statements.
    2. Resources must follow best practices for serving both human- and machine-readable representations for linked data vocabularies.
  7. Visual identity
    1. The working group will provide static HTML templates developed by another vendor charged with implementing the site’s visual identity.
    2. These templates must be transformed to work in the context of the application to ensure that human-readable representations follow the visual identity of the site as provided by the working group.
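For orientation only (this is not part of the RFP requirements, and the statement IDs, file layout, and HTML template are invented), the sketch below shows the kind of content negotiation described in A.2 and A.6: a single canonical Turtle file backing Turtle, JSON-LD, and HTML views of one statement URI. A real implementation would also need to handle versions, Accept-Language negotiation, and the additional request parameters described above. Flask and rdflib (version 6 or later for built-in JSON-LD) are assumed.

```python
# Illustrative sketch only, not the RFP deliverable: content negotiation
# on a statement URI, serving Turtle, JSON-LD, or HTML generated from
# one canonical Turtle file. Paths and templates are placeholders.
from flask import Flask, Response, render_template_string, request
from rdflib import Graph

app = Flask(__name__)

@app.route("/vocab/<statement_id>/<version>/")
def statement(statement_id, version):
    graph = Graph()
    graph.parse(f"statements/{statement_id}_{version}.ttl", format="turtle")

    best = request.accept_mimetypes.best_match(
        ["text/html", "text/turtle", "application/ld+json"]
    )
    if best == "text/turtle":
        return Response(graph.serialize(format="turtle"),
                        mimetype="text/turtle")
    if best == "application/ld+json":
        return Response(graph.serialize(format="json-ld"),
                        mimetype="application/ld+json")
    # Default: a human-readable page built from the same canonical RDF
    return render_template_string(
        "<h1>{{ sid }} (version {{ ver }})</h1>",
        sid=statement_id, ver=version)
```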

Component B. Content management system

An implementation of an off-the-shelf, free/libre/open source content management system (CMS), and associated plugins to publish pages about the project and initiative, related publications, etc.

  1. The CMS will be separate from the rights statements application.
  2. The CMS may be a static site generator.
  3. The CMS should support multilingual versions of content, either natively or through the use of plugin modules.
  4. A theme or templates for the CMS must be provided, which follow the visual identity defined for the site.
  5. The CMS must provide export of static content (text and multimedia).
  6. All content will be edited and maintained by members of the working group.

Component C. Server configuration, deployment, and maintenance implementation

An implementation of an existing free/libre/open source configuration management and deployment automation system, and any needed templates, scripts, etc., used to install dependencies, to configure and deploy components A and B above, and to manage the servers.

  1. The implementation must be published to a version control repository under the working group’s organization on GitHub.
  2. The implementation should support a shared set of configuration with templating to allow the components above to be deployed to a staging virtual machine and a production virtual machine using a common set of procedures.
  3. An implementation of an agentless configuration and deployment management system (e.g., Ansible) is strongly preferred.
  4. The implementation must include a configuration for an HTTP proxy server (e.g., Nginx, Apache HTTPD, etc.) that will allow components A and B to be presented together through a single domain name.
    1. The proxy server configuration must allow components A and B to be served from a common domain name (http://rightsstatements.org/).
    2. The proxy server configuration should provide caching for requests that respects the HTTP interaction patterns described in Requirements for the Technical Infrastructure for Standardized International Rights Statements.
  5. The vendor will also develop, execute, and provide reports for a load testing strategy for the implemented configuration.

Other restrictions

All components must run within a shared Linux virtual machine, preferably running Debian stable. The virtual machine will be hosted on a server physically located in a Luxembourg-based data center. The working group is providing both a staging environment and a production environment.

All materials developed during this project shall be released under open source/open content licensing. Source code will be licensed under the European Union Public License, version 1.1. Documentation will be licensed under a CC0 Public Domain Dedication.

Guidelines for proposals

All proposals must adhere to the following submission guidelines and requirements.

  • Proposals are due no later than 00:00 GMT, 6 October 2015.
  • Proposals should be sent via email to rights-rfp@dp.la as a single PDF file attached to the message. Questions about the proposal can also be sent to this address.
  • Please format the subject line with the phrase “RightsStatements.org Proposal – [Name of respondent].”
  • You should receive confirmation of receipt of your proposal no later than 00:00 GMT, 8 October 2015. If you have not received confirmation of your proposal by this time, please send an email to mark@dp.la, otherwise follow the same guidelines as above.

All proposals should include the following:

  • Pricing, in US Dollars and/or Euros, as costs for each work component identified above, and as an hourly rate for any maintenance costs. The exchange rate will be set in the contract. The currency for payment will be chosen by the agent of the working group that is the party to this contract.
  • Proposed staffing plan, including qualifications of project team members (resumes/CVs and links or descriptions of previous projects such as open source contributions).
  • References, listing all clients/organizations with whom the proposer has done business similar to that required by this solicitation within the last three years.
  • Qualifications and experience, including
    • General qualifications and development expertise
      • Information about development and project management skills and philosophy
      • Examples of successful projects, delivered on time and on budget
      • Preferred tools and methodologies used for issue tracking, project management, and communication
      • Preferences for change control tools and methodologies
    • Project specific strategies
      • History of developing software in the library, archives, or museum domain
      • Information about experience with hosting and maintenance of RDF/SKOS vocabularies and linked data resources
  • Legal authority/capacity, or proof that the vendor is authorized to perform the contract under national law. Proof of the above is to be provided by (a copy of) a certificate of registration in the relevant trade or professional registers in the country of establishment/incorporation.

Contract guidelines

  • Proposals must be submitted by the due date.
  • Proposers are asked to guarantee their proposal prices for a period of at least 60 days from the date of the submission of the proposal.
  • Proposers must be fully responsible for the acts and omissions of their employees and agents.
  • The working group reserves the right to extend the deadline for proposals.
  • The working group reserves the right to include a mandatory meeting via teleconference with proposers individually before acceptance. Top scored proposals may be required to participate in an interview to support and clarify their proposal.
  • The working group reserves the right to negotiate with each contractor.
  • There is no allowance for project expenses, travel, or ancillary expenses that the contractor may incur.
  • Ownership of any intellectual property will be shared between the Digital Public Library of America and the Europeana Foundation.

Putting Pen to Paper / LITA

Back in January, The Atlantic ran an article on a new device being used at the Cooper Hewitt design museum in New York City. This device allows museum visitors to become curators of their own collections, saving information about exhibits to their own special account they can access via computer after they leave. This device is called a pen; Robinson Meyer, the article’s author, likens it to a “gray plastic crayon the size of a turkey baster”. I think it’s more like a magic wand.

How the Cooper Hewitt pen can interact with museum exhibits. Courtesy of the Cooper Hewitt Museum website.

Not only can you use the pen to save information you think is cool, you can also interact with the museum at large: in the Immersion Room, for example, you can draw a design with your pen and watch it spring to life on the walls around you. In the Process Lab, you use the pen to solve real-life design problems. As Meyer puts it, “The pen does something that countless companies, organizations, archives, and libraries are trying to do: It bridges the digital and the physical.”

The mention of libraries struck me: how could something like the Cooper Hewitt pen be used in your average public library?

The first thing that came to my mind was RFID. In my library, we use RFID to tag and label our materials. There are currently RFID “wands” that, when waved over stacks, can help staff locate books they thought were missing.

But let’s turn that around: give the patron the wand – rather, the pen – and program in a subject they’re looking for…say, do-it-yourself dog grooming. As the patron wanders, the pen is talking with the stacks via RFID asking where those materials would be. Soon the pen vibrates and a small LED light shines on the materials. Eureka!
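Purely as a thought experiment (no real pen hardware, tag format, or library API is being described here), that flow might look something like this: the pen carries a target subject, listens for shelf tags, and signals when a tag's encoded subjects match.

```python
# Purely illustrative: a hypothetical pen holds a target subject and
# checks each shelf RFID tag it passes for a match. Requires Python 3.9+
# for the set[str] annotation.
from dataclasses import dataclass

@dataclass
class ShelfTag:
    tag_id: str
    call_number: str
    subjects: set[str]          # hypothetical subject headings encoded on the tag

class Pen:
    def __init__(self, target_subject: str):
        self.target_subject = target_subject.lower()

    def on_tag_read(self, tag: ShelfTag) -> bool:
        """Called each time the pen passes a shelf RFID tag."""
        if self.target_subject in {s.lower() for s in tag.subjects}:
            print(f"Vibrate + LED: try shelf {tag.call_number}")
            return True
        return False

# Simulated walk through the stacks
pen = Pen("dog grooming")
for tag in [ShelfTag("A1", "636.7", {"Dogs", "Dog grooming"}),
            ShelfTag("A2", "641.5", {"Cooking"})]:
    pen.on_tag_read(tag)
```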

Or, just as the Cooper Hewitt allows visitors to build their own virtual collection online, we can have patrons build their own virtual libraries. Using the same RFID scanning technology as before, patrons can link items to their library card number that they’ve already borrowed or maybe want to view in the future. It could be a system similar to Goodreads (or maybe even link it to Goodreads itself) or it could be a personal website that only the user – not the library – has access to.

What are some ways you might be able to use this tech in your library system?

jobs.code4lib.org studied / Code4Lib

Creating Tomorrow’s Technologists: Contrasting Information Technology Curriculum in North American Library and Information Science Graduate Programs against Code4lib Job Listings by Monica Maceli recently appeared in the Journal of Education for Library and Information Science 56.3 (DOI:10.12783/issn.2328-2967/56/3/3). As the title states, it studies listings on jobs.code4lib.org:

This research study explores technology-related course offerings in ALA-accredited library and information science (LIS) graduate programs in North America. These data are juxtaposed against a text analysis of several thousand LIS-specific technology job listings from the Code4lib jobs website. Starting in 2003, as a popular library technology mailing list, Code4lib has since expanded to an annual conference in the United States and a job-posting website. The study found that database and web design/development topics continued to dominate course offerings with diverse sub-topics covered. Strong growth was noted in the area of user experience but a lack of related jobs for librarians was identified. Analysis of the job listings revealed common technology-centric librarian and non-librarian job titles, as well as frequently correlated requirements for technology skillsets relating to the popular foci of web design/development and metadata. Finally, this study presents a series of suggestions for LIS educators in order that they continue to keep curriculum aligned with current technology employment requirements.

Open Knowledge Founder Rufus Pollock, Ashoka Fellow / Open Knowledge Foundation

Open Knowledge Founder Rufus Pollock was recently recognized as Ashoka UK’s fellow of the month. This brief video highlights his thoughts on open knowledge, and his vision for an information age grounded in openness, collaboration, sharing, and distributed power.

Video produced and provided by Ashoka UK. Ashoka builds networks of social innovators and selects high-impact entrepreneurs, who creatively solve some of the world’s biggest social challenges, to become Ashoka Fellows. Their work also extends to the education sector, where they are creating a network of schools that teach students skills for the modern world: empathy, teamwork, creativity. Read more from Ashoka UK here.

Enhancing the LOCKSS Technology / David Rosenthal

A paper entitled Enhancing the LOCKSS Digital Preservation Technology describing work we did with funding from the Mellon Foundation has appeared in the September/October issue of D-Lib Magazine. The abstract is:
The LOCKSS Program develops and supports libraries using open source peer-to-peer digital preservation software. Although initial development and deployment was funded by grants including from NSF and the Mellon Foundation, grant funding is not a sustainable basis for long-term preservation. The LOCKSS Program runs the "Red Hat" model of free, open source software and paid support. From 2007 through 2012 the program was in the black with no grant funds at all.

The demands of the "Red Hat" model make it hard to devote development resources to enhancements that don't address immediate user demands but are targeted at longer-term issues. After discussing this issue with the Mellon Foundation, the LOCKSS Program was awarded a grant to cover a specific set of infrastructure enhancements. It made significant functional and performance improvements to the LOCKSS software in the areas of ingest, preservation and dissemination. The LOCKSS Program's experience shows that the "Red Hat" model is a viable basis for long-term digital preservation, but that it may need to be supplemented by occasional small grants targeted at longer-term issues.
Among the enhancements described in the paper are implementations of Memento (RFC7089) and Shibboleth, support for crawling sites that use AJAX, and some significant enhancements to the LOCKSS peer-to-peer polling protocol.
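As a quick illustration of what the Memento (RFC 7089) support enables, the sketch below shows a client asking a TimeGate for the copy of a URL closest to a chosen datetime via the Accept-Datetime header. The TimeGate URL is a placeholder, not the actual LOCKSS endpoint, so treat this as a sketch of the protocol rather than of the LOCKSS implementation.

```python
# Sketch of an RFC 7089 (Memento) datetime-negotiation request.
# The TimeGate base URL below is hypothetical; only the headers and
# the general request/response pattern follow the RFC.
import requests

TIMEGATE = "https://lockss.example.org/timegate/"   # placeholder endpoint
TARGET = "http://www.example.com/article/123"

resp = requests.get(
    TIMEGATE + TARGET,
    headers={"Accept-Datetime": "Thu, 01 Jan 2015 00:00:00 GMT"},
)
# A Memento-aware server redirects to (or returns) the closest capture
# and reports its archival datetime in the Memento-Datetime header.
print(resp.url)
print(resp.headers.get("Memento-Datetime"))
```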

Telling DSpace Stories at Creighton University with Richard Jizba / DuraSpace News

“Telling DSpace Stories” is a community-led initiative aimed at introducing project leaders and their ideas to one another while providing details about DSpace implementations for the community and beyond. The following interview includes personal observations that may not represent the opinions and views of Creighton University or the DSpace Project.

Jonathan Markow from DuraSpace interviewed Richard Jizba to learn about Creighton University’s DSpace Repositories.

“What’s your role with DSpace at your institution?”

He knew the past, which is sometimes much worse / William Denton

I’m rereading the Three Musketeers saga by Alexandre Dumas—one of the greatest of all works of literature—and just re-met a quote I wrote down when I first read it. It’s from chapter twenty-seven of Twenty Years After (Vingt ans après), and is about Urbain Grandier, but I’ll leave out his name to strengthen it:

“[He] was not a sorceror, but a learned man, which is quite another thing. He did not foretell the future. He knew the past, which is sometimes much worse.”

In the original, it reads:

“[Il] n'était pas un sorcier, c'était un savant, ce qui est tout autre chose. [Il] ne prédisait pas l'avenir. Il savait le passé, ce qui quelquefois est bien pis.”

Evergreen 2.9.0 released / Evergreen ILS

Thanks to the efforts of many contributors, the Evergreen community is pleased to announce the release of version 2.9.0 of the Evergreen open source integrated library system. Please visit the download page to get it!

New features and enhancements of note in Evergreen 2.9.0 include:

  • Evergreen now supports placing blanket orders, allowing staff to invoice an encumbered amount multiple times, paying off the charge over a period of time.
  • There is now better reporting of progress when a purchase order is activated.
  • The Acquisitions Administration menu in the staff client is now directly accessible from the main “Admin” menu.
  • There is now an action/trigger event definition for sending alerts to users before their accounts are scheduled to expire.
  • When registering a new user, the duplicate record search now includes inactive users.
  • Evergreen now offers more options for controlling whether and when users can carry negative balances on their account.
  • The web-based self-check interface now warns the user if their session is about to expire.
  • The “Manage Authorities” results list now displays the thesaurus associated with each authority record.
  • Item statistical categories can now be set during record import.
  • The web staff interface preview now includes cataloging functionality, including a new MARC editor, Z39.50 record import, and a new volume/copy editor.
  • The account expiration date is now displayed on the user’s “My Account” page in the public catalog.
  • Users can now sort their lists of items checked out, check out history, and holds when logged into the public catalog.
  • The bibliographic record source is now available for use by public catalog templates.
  • The public catalog can now cache Template Toolkit templates, improving its speed.
  • On the catalog’s record summary page, there is now a link to allow staff to forcibly clear the cache of added content for that record.
  • Google Analytics (if enabled at all) is now disabled in the staff client.
  • Several deprecated parts of the code have been removed, including script-based circulation policies, the open-ils.penalty service, the legacy self-check interface, and the old “JSPAC” public catalog interface.

For more information about what’s in the release, check out the release notes.

Enjoy!

Pew study affirms vital role of libraries / District Dispatch

Libraries are transforming amidst the changing information landscape and a report released this week by the Pew Research Center, Libraries at the Crossroads, affirms the evolving role of public libraries within their communities as vital resources that advance education and digital empowerment.

“Public libraries are transforming beyond their traditional roles and providing more opportunities for community engagement and new services that connect closely with patrons’ needs,” ALA President Sari Feldman said. “The Pew Research Center report shows that public libraries are far from being just ‘nice to have,’ but serve as a lifeline for their users, with more than 65 percent of those surveyed indicating that closing their local public library would have a major impact on their community.

Photo: students gathered around a computer, Francis W. Parker School (Chicago, IL)

Photo credit: Francis W. Parker School (Chicago, IL).

“Libraries are not just about what we have for people, but what we do for and with people,” Feldman said. “Today’s survey found that three-quarters of the public say libraries have been effective at helping people learn how to use new technologies. This is buttressed by the ALA’s Digital Inclusion Survey, which finds that virtually all libraries provide free public access to computers and the Internet, Wi-Fi, technology training and robust digital content that supports education, employment, e-government access and more.

“Although the report affirms the value of public libraries, the ALA recognizes the need for greater public awareness of the transformation of library services, as the report shows library visits over the past three years have slightly decreased. In response, libraries of all types are preparing for the launch of a national public awareness campaign entitled ‘Libraries Transform.’

“Libraries from across the country will participate in the campaign and will work to change the perception that ‘libraries are just quiet places to do research, find a book, and read’ to ‘libraries are centers of their communities: places to learn, create and share, with the help of library staff and the resources they provide,’” she noted.

With their accessibility to the public in virtually every community around the country, libraries offer online educational tools for students, employment resources for job-seekers, computer access for those without it and innovation centers for entrepreneurs of all ages.

Other interesting findings in the report that point to the vital role of libraries in communities nationwide include:

• 65 percent maintain that libraries contribute to helping people decide what information they can trust.

• 75 percent say libraries have been effective at helping people learn how to use new technologies.

• 78 percent believe that libraries are effective at promoting literacy and love of reading.

The post Pew study affirms vital role of libraries appeared first on District Dispatch.

Prediction: "Security will be an on-going challenge" / David Rosenthal

The Library of Congress’ Storage Architectures workshop gave a group of us 3 minutes each to respond to a set of predictions for 2015 and questions accumulated at previous instances of this fascinating workshop. Below the fold, the brief talk in which I addressed one of the predictions. At the last minute, we were given 2 minutes more, so I made one of my own.

One of the 2012 Predictions was "Security will be an on-going challenge".

It might seem that this prediction was about as risky as predicting "in three years time, water will still be wet". But I want to argue that "an on-going challenge" is not an adequate description of the problems we now understand that security poses for digital preservation. The 2012 meeting was about 9 months before Edward Snowden opened our eyes to how vulnerable everything connected to the Internet was to surveillance and subversion.

Events since have greatly reinforced this message. The US Office of Personnel Management is incapable of keeping the personal information of people with security clearances secure from leakage or tampering. Sony Pictures and Ashley Madison could not keep their most embarrassing secrets from leaking. Cisco and even computer security heavyweight Kaspersky could not keep the bad guys out of their networks. Just over two years before the meeting, Stuxnet showed that even systems air-gapped from the Internet were vulnerable. Much more sophisticated attacks have been discovered since, including malware hiding inside disk drive controllers.

Dan Kaminsky was interviewed in the wake of the compromise at the Bundestag:
No one should be surprised if a cyber attack succeeds somewhere. Everything can be hacked. ... All great technological developments have been unsafe in the beginning, just think of rail, automobiles and aircraft. The most important thing in the beginning is that they work, after that they get safer. We have been working on the security of the Internet and the computer systems for the last 15 years.
Yes, automobiles and aircraft are safer but they are not safe. Cars kill 1.3M and injure 20-50M people/year, being the 9th leading cause of death. And that is before their software starts being exploited.

For a less optimistic view, read A View From The Front Lines, the 2015 report from Mandiant, a company whose job is to clean up after compromises such as the 2013 one that meant Stanford had to build a new network from scratch and abandon the old one. The sub-head of Mandiant's report is:
For years, we have argued that there is no such thing as perfect security. The events of 2014 should put any lingering doubts to rest.
The technology for making systems secure does not exist. Even if it did it would not be feasible for organizations to deploy only secure systems. Given that the system vendors bear no liability for the security of even systems intended to create security, this situation is unlikely to change in the foreseeable future. Until it is at least possible for organizations to deploy a software and hardware stack that is secure from the BIOS to the user interface, and until there is liability on the organization for not doing so, we have to assume that our systems will be compromised, the only questions being when, and how badly.

Our digital preservation systems are very vulnerable, but we don't hear reports of them being compromised. There are two possibilities. Either they have been and we haven't noticed, or it hasn't yet been worth anyone's time to do it.

In this environment the key to avoiding loss of digital assets is diversity, so that a single attack can't take out all replicas. Copies must exist in diverse media, in diverse hardware running diverse software under diverse administration. But this diversity is very expensive. Research has shown that the resources we have to work with suffice to preserve less than half the material that should be preserved. Making the stuff we have preserved safer means preserving less stuff.

To be fair, I should make a prediction of my own. If we're currently preserving less than half of the material we should, how much will we be preserving in 2020? Two observations drive my answer. The digital objects being created now are much harder and more expensive to preserve than those created in the past. Libraries and archives are increasingly suffering budget cuts. So my prediction is:
If the experiments to measure the proportion of material being preserved are repeated in 2020, the results will not be less than a half, but less than a third.

Lucidworks Fusion 2.1 Now Available! / SearchHub

Today we’re releasing Fusion 2.1 LTS, our most recent version of Fusion offering Long-Term Support (LTS). Last month, we released version 2.0, which brought a slew of new features as well as a new user experience. With Fusion 2.1 LTS, we have polished these features and tweaked the visual appearance and the interactions. With the refinements now in place, we’ll be providing support and maintenance releases on this version for at least the next 18 months.

If you’ve already tried out Fusion 2.0, Fusion 2.1 won’t be revolutionary, but you’ll find that it works a little more smoothly and gracefully. Besides the improvements to the UI, we’ve made a few back end changes:
  • Aggregation jobs now run only using Spark. In previous versions, you could run them in Spark optionally, or natively in Fusion. We’ve found we’re happy enough with Spark to make it the only option now.
  • You can now send alerts to PagerDuty. Previously, you could send an email or a Slack message. PagerDuty was a fairly popular request.
  • Several new options for crawling websites
  • Improvements to SSL when communicating between Fusion nodes
  • A reorganization of the Fusion directory structure to better isolate your site-specific data and config from version-specific Fusion binaries, for easier upgrades and maintenance releases
  • Better logging and debuggability
  • Incremental enhancements to document parsing
  • As always, some performance, reliability, and stability improvements
Whether you’re new to Fusion, or have only seen Fusion 1.x, we think there’s a lot you’ll like, so go ahead, download and try it out today!  

The post Lucidworks Fusion 2.1 Now Available! appeared first on Lucidworks.

"The Prostate Cancer of Preservation" Re-examined / David Rosenthal

My third post to this blog, more than 8 years ago, was entitled Format Obsolescence: the Prostate Cancer of Preservation. In it I argued that format obsolescence for widely used formats, such as those on the Web, would be rare and that, if it ever happened, it would be a very slow process allowing plenty of time for preservation systems to respond.

Thus devoting a large proportion of the resources available for preservation to obsessively collecting metadata intended to ease eventual format migration was economically unjustifiable, for three reasons. First, the time value of money meant that paying the cost later would allow more content to be preserved. Second, the format might never suffer obsolescence, so the cost of preparing to migrate it would be wasted. Third, if the format ever did suffer obsolescence, the technology available to handle it when obsolescence occurred would be better than when it was ingested.

Below the fold, I ask how well these predictions have held up in the light of subsequent developments.

Research by Matt Holden at INA in 2012 showed that the vast majority of even 15-year old audio-visual content was easily rendered with current tools. The audio-visual formats used in the early days of the Web would be among the most vulnerable to obsolescence. The UK Web Archive's Interject prototype's Web site claims that these formats are obsolete and require migration:
  • image/x-bitmap and  image/x-pixmap, both rendered in my standard Linux environment via Image Viewer.
  • x-world/x-vrml, versions 1 and 2, not rendered in my standard Linux environment, but migration tools available.
  • ZX Spectrum software, not suitable for migration.
These examples support the prediction that archives will contain very little content in formats that suffer obsolescence.

The prediction that technology for access to preserved content would improve is borne out by recent developments. Two and a half years ago the team from Freiburg University presented their emulation framework bwFLA which, like those from the Olive Project at CMU and the Internet Archive, is capable of delivering an emulated environment to the reader as a part of a normal Web page. An example of this is Rhizome's art piece from 2000 by Jan Robert Leegte untitled[scrollbars]. To display the artist's original intent, it is necessary to view the piece using a contemporary Internet Explorer, which Rhizome does using bwFLA.

Increasingly, scrollbars are not permanent but pop up when needed. Viewing the piece with, for example, Safari on OS X is baffling because the scrollbars are not visible.

The prediction that if obsolescence were to happen to a widely used format it would happen very slowly is currently being validated, but not for the expected reason and not as a demonstration of the necessity of format migration. Adobe’s Flash has been a very widely used Web format. It is not obsolete in the sense that it can no longer be rendered. It is becoming obsolete in the sense that browsers are following Steve Jobs’ lead and deprecating its use, because it is regarded as too dangerous in today’s Internet threat environment:
Five years ago, 28.9% of websites used Flash in some way, according to Matthias Gelbmann, managing director at web technology metrics firm W3Techs. As of August, Flash usage had fallen to 10.3%.

But larger websites have a longer way to go. Flash persists on 15.6% of the top 1,000 sites, Gelbmann says. That’s actually the opposite situation compared to a few years ago, when Flash was used on 22.2% of the largest sites, and 25.6% of sites overall.
If browsers won't support Flash because it poses an unacceptable risk to the underlying system, much of the currently preserved Web will become unusable. It is true that some of that preserved Web is Flash malware, thus simply asking the user to enable Flash in their browser is not a good idea. But if Web archives emulated a browser with Flash, either remotely or locally, the risk would be greatly reduced.

Even if the emulation fell victim to the malware, the underlying system would be at much less risk. If the goal of the malware was to use the compromised system as part of a botnet, the emulation’s short life-cycle would render it ineffective. Users would have to be warned against entering any sensitive information that the malware might intercept, but it seems unlikely that many users would send passwords or other credentials via a historical emulation. And, because the malware was captured before the emulation was created, the malware authors would be unable to update it to target the emulator itself rather than the system it was emulating.

So, how did my predictions hold up?
  • It is clear that obsolescence of widely used Web formats is rare. Flash is the only example in two decades, and it isn't obsolete in the sense that advocates of preemptive migration meant.
  • It is clear that if it occurs, obsolescence of widely used Web formats is a very slow process. For Flash, it has taken half a decade so far, and isn't nearly complete.
  • The technology for accessing preserved content has improved considerably. I'm not aware of any migration-based solution for safely accessing preserved Flash content. It seems very likely that a hypothetical technique for migrating Flash  would migrate the malware as well, vitiating the reason for the migration.
Three out of three, not bad!

Jobs in Information Technology: September 16, 2015 / LITA

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week:

Digital Technologies, Bridgepoint Education – Ashford University, San Diego, CA

E-Learning Librarian, Olin Library – Use Job ID 31577, Washington University, St Louis, MO

Archivist/Program Manager, History Associates Incorporated, Fort Lauderdale, FL

Program Officer for Federal Documents and Collections, HathiTrust, University of Michigan, Ann Arbor, MI

Program Officer for Shared Print Initiatives, HathiTrust, University of Michigan, Ann Arbor, MI

Director of Services and Operations, HathiTrust, University of Michigan, Ann Arbor, MI

Learning Commons Coordinator, Forsyth Library, Fort Hays State University, Hays, KS

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

My Capacity: What Can I Do and What Can I Do Well? / LITA

I like to take on a lot of projects. I love seeing projects come to fruition, and I want to provide the best possible services for my campus community. I think the work we do as librarians is important work. As I’ve taken on more responsibilities in my current job, though, I’ve learned I can’t do everything. I have had to reevaluate the number of things I can accomplish and projects I can support.

Photo by Darren Tunnicliff. Published under a CC BY-NC-ND 2.0 license.

Libraries come in all different shapes and sizes. I happen to work at a small library. We are a small staff—3 professional librarians including the director, 2 full-time staff, 1 part-time staff member, and around 10 student workers. I think we do amazing things at my place of employment, but I know we can’t do everything. I would love to be able to do some of the projects I see staff at larger universities working on, but I am learning that I have to be strategic about my projects. Time is a limited resource and I need to use my time wisely to support the campus in the best way possible.

This has been especially true for tech projects. The maintenance, updating, and support needed for technology can be a challenge. Now don’t get me wrong, I love tech and my library does great things with technology, but I have also had to be more strategic as I recognize my capacity does have limits. So with new projects I’ve been asking myself:

  • How does this align with our strategic plan? (I’ve always asked this with new projects, but it is always good to remember)
  • What are top campus community needs?
  • What is the estimated time commitment for a specific project?
  • Can I support this long term?

Some projects are so important that you are going to work on the project no matter what your answers are to these questions. There are also some projects that are not even worth the little bit of capacity they would require.  Figuring out where to focus time and what will be the most beneficial for your community is challenging, but worth it.

How do you decide priorities and time commitments?

Cultural Institutions Embrace Crowdsourcing / Library of Congress: The Signal


“Children planting in Thos. Jefferson Park, N.Y.C.” Created by Bain News Service, publisher, between ca. 1910 and ca. 1915. Medium: 1 negative : glass ; 5 x 7 in. or smaller. http://www.loc.gov/pictures/resource/ggbain.09228/?loclr=blogsig

Many cultural institutions have accelerated the development of their digital collections and data sets by allowing citizen volunteers to help with the millions of crucial tasks that archivists, scientists, librarians, and curators face. One of the ways institutions are addressing these challenges is through crowdsourcing.

In this post, I’ll look at a few sample crowdsourcing projects from libraries and archives in the U.S. and around the world. This is strictly a general overview. For more detailed information, follow the linked examples or search online for crowdsourcing platforms, tools, or infrastructures.

In general, volunteers help with:

  • Analyzing images, creating tags and metadata, and subtitling videos
  • Transcribing documents and correcting OCR text
  • Identifying geographic locations, aligning/rectifying historical maps with present locations, and adding geospatial coordinates
  • Classifying data, cross-referencing data, researching historic weather, and monitoring and tracking dynamic activities.

The Library of Congress utilizes public input for its Flickr project. Visitors analyze and comment on the images in the Library’s general Flickr collection of over 20,000 images and the Library’s Flickr “Civil War Faces” collection. “We make catalog corrections and enhancements based on comments that users contribute,” said Phil Michel, digital conversion coordinator at the Library.

In another type of image analysis, Cancer Research UK’s Cellslider project invites volunteers to analyze and categorize cancer cell cores. Volunteers are not required to have a background in biology or medicine for the simple tasks. They are shown what visual elements to look for and instructed on how to categorize what they see into the webpage. Cancer Research UK states on its website that, as of the publication of this story, 2,571,751 images have been analyzed.

Three British soldiers in trench under fire during World War I. The Library of Congress: http://www.loc.gov/item/96505409/

“Three British soldiers in trench under fire during World War I,” created by Realistic Travels, c1916 Aug. 15. Medium: 1 photographic print on stereo card : stereograph. http://loc.gov/pictures/resource/cph.3b22389/?loclr=blogsig

Both of the examples above use descriptive metadata or tagging, which helps make the images more findable by means of the specific keywords associated with — and mapped to — the images.

The British National Archives runs a project, titled  “Operation War Diary,” in which volunteers  help tag and categorize diaries of World War I British soldiers. The tags are fixed in a controlled vocabulary list, a menu from which volunteers can select keywords, which helps avoid the typographical variations and errors that may occur when a crowd of individuals freely type their text in.

The New York Public Library’s “Community Oral History Project” makes oral history videos searchable by means of topic markers tagged into the slider bar by volunteers; the tags map to time codes in the video. So, for example, instead of sitting through a one-hour interview to find a specific topic, you can click on the tag — as you would select from a menu — and jump to that tagged topic in the video.

The National Archives and Records Administration offers a range of crowdsourcing projects on its Citizen Archivist Dashboard.  Volunteers can tag records and subtitle videos to be used for closed captions; they can even translate and subtitle non-English videos into English subtitles. One NARA project enables volunteers to transcribe handwritten old ship’s logs that, among other things, contain weather information for each daily entry. Such historic weather data is an invaluable addition to the growing body of data in climate-change research.

Transcription is one of the most in-demand crowdsourcing tasks. In the Smithsonian’s Transcription Center, volunteers can select transcription projects from at least ten of the Smithsonian’s 19 museums and archives. The source material consists of handwritten field notes, diaries, botanical specimen sheets, sketches with handwritten notations and more. Transcribers read the handwriting and type into the web page  what they think the handwriting says. The Smithsonian staff then runs the data through a quality control process before they finally accept it. In all, the process comprises three steps:

  1. The volunteer types the transcription into the web page
  2. Another set of registered users compares the transcriptions with the handwritten scans
  3. Smithsonian staff or trained volunteers review and have final approval over the transcription.

Notable transcription projects from other institutions are the British Library’s Card Catalogue project, Europeana’s World War I documents, the Massachusetts Historical Society’s “The Diaries of John Quincy Adams,” The University of Delaware’s, “Colored Conventions,” The University of Iowa’s “DIY History,” and the Australian Museum’s Atlas of Living Australia.


Excerpt from the Connecticut war record, May 1864, from OCLC.

Optical Character Recognition is the process of taking text that has been scanned into solid images — sort of a photograph of text — and machine-transforming that text image into text characters and words that can be searched. The process often generates incomplete or mangled text. OCR is often a “best guess” by the software and hardware. Institutions ask for help comparing the source text image with its OCR text-character results and hand-correcting the mistakes.
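As a small illustration of the OCR step itself (not of any particular institution's pipeline), the open source Tesseract engine can be run from Python; the file name below is a placeholder, and the raw output is exactly the kind of "best guess" text that volunteers are then asked to correct.

```python
# Minimal OCR sketch using the open source Tesseract engine via the
# pytesseract wrapper. "newspaper_page.png" is a placeholder scan.
from PIL import Image
import pytesseract

raw_text = pytesseract.image_to_string(Image.open("newspaper_page.png"))
print(raw_text)  # often incomplete or mangled, hence crowdsourced correction
```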

Newspapers comprise much of the source material. The Library of Virginia, The Cambridge Public Library, and the California Digital Newspaper collection are a sampling of OCR-correction sites. Examples outside of the U.S. include the National Library of Australia and the National Library of Finland.
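To make that “best guess” concrete, here is a minimal sketch of the step that produces the error-prone text volunteers are later asked to correct. It uses the open-source Tesseract engine via the pytesseract library purely for illustration; the projects above run their own OCR pipelines, and the file name below is made up.

# Minimal OCR sketch: the raw output is the machine's "best guess,"
# which crowdsourced correctors then compare against the scanned page.
from PIL import Image        # pip install pillow
import pytesseract           # pip install pytesseract (requires the tesseract binary)

page = Image.open('newspaper_scan.png')       # hypothetical scanned newspaper page
raw_text = pytesseract.image_to_string(page)  # often incomplete or mangled
print(raw_text)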

The New York Public Library was featured in the news a few years ago for the overwhelming number of people who volunteered to help with its “What’s on the Menu” crowdsourcing transcription project, where the NYPL asked volunteers to review a collection of scanned historic menus and type the menu contents into a browser form.

NYPL Labs has gotten even more creative with map-oriented projects. With “Building Inspector” (whose peppy motto is, “Kill time. Make history.”), it reaches out to citizen cartographers to review scans of very old insurance maps and identify each building — lot by lot, block by block — by its construction material, its address and its spatial footprint; in an OCR-like twist, volunteers are also asked to note the name of the then-existing business that is handwritten on the old city map (e.g. MacNeil’s Blacksmith, The Derby Emporium). Given the population density of New York, and the propensity of most of its citizens to walk almost everywhere, there’s a potential for millions of eyes to look for this information in their daily environment, and go home and record it in the NYPL databases.

Black-necked stilt. The Library of Congress.

“Black-necked stilt,” photo by Edgar Alexander Mearns, 1887. Medium: 1 photographic print on cabinet card. http://loc.gov/pictures/resource/cph.3c17874/


Volunteers can also use the NYPL Map Warper to rectify the alignment differences between contemporary maps and digitized historic maps. The British Library has a similar map-rectification crowdsourcing project called Georeferencer. Volunteers are asked to rectify maps scanned from 17th-, 18th- and 19th-century European books. In the course of the project, maps get geospatially enabled and become accessible and searchable through Old Maps Online.

Citizen Science projects range from the cellular level to the astronomical level. The Audubon Society’s Christmas Bird Count asks volunteers to go outside and report on what birds they see. The data goes toward tracking the migratory patterns of bird species.

Geo-Wiki is an international platform that crowdsources monitoring of the earth’s environment. Volunteers give feedback about spatial information overlaid on satellite imagery or they can contribute new data.

Gamification makes a game out of potentially tedious tasks. Malariaspot, from the Universidad Politécnica de Madrid, makes a game of identifying the parasites that cause malaria. Their website states, “The analysis of all the games played will allow us to learn (a) how fast and accurate is the parasite counting of non-expert microscopy players, (b) how to combine the analysis of different players to obtain accurate results as good as the ones provided by expert microscopists.”

Carnegie Mellon and Stanford collaboratively developed EteRNA, a game in which players solve puzzles to design RNA sequences that fold up into target shapes and contribute to a large-scale library of synthetic RNA designs. MIT’s “Eyewire” uses gamification to get players to help map the brain. MIT’s “NanoDoc” enables game players to design new nanoparticle strategies towards the treatment of cancer. The University of Washington’s Center for Game Science offers “Nanocrafter,” a synthetic biology game, which enables players to use pieces of DNA to create new inventions. “Purposeful Gaming,” from the Biodiversity Heritage Library, is a gamified method of cleaning up sloppy OCR. Harvard uses the data from its “Test My Brain” game to test scientific theories about the way the brain works.

Crowdsourcing enables institutions to tap vast resources of volunteer labor, to gather and process information faster than ever, despite the daunting volume of raw data and the limitations of in-house resources. Sometimes the volunteers’ work goes directly into a relational database that maps to target digital objects, and sometimes the work is held until a human can review it and accept or reject it. The process requires institutions to trust “outsiders” — average people, citizen archivists, historians, hobbyists. If a project is well structured and the user instructions are clear and simple, there is little reason for institutions not to ask the general public for help. It’s a collaborative partnership that benefits everyone.

Seminar Week 3 / Ed Summers

In this week’s class we discussed three readings: (Bates, 1999), (Dillon & Norris, 2005) and (Manzari, 2013). This was our last general set of readings about information science before diving into some of the specialized areas. I wrote about my reaction to Bates over in Red Thread.

The Manzari piece continues a series of studies that started in 1985, surveying faculty about the most prestigious journals in the field of Library and Information Science. 827 full-time faculty in ALA-accredited programs were surveyed and only 232 (27%) responded. No attempt seems to have been made to check whether non-response bias could have had an effect – but I’m not entirely sure that was needed. I couldn’t help but wonder if this series of studies could have resulted in reinforcing the very thing they are studying. If faculty read the article, won’t it influence their thinking about where they should be publishing?

We used this article more as a jumping-off point in class to discuss our advisors’ top-tier and second-tier conferences to present at and journals to publish in. There were quite a few up on the board, even with just four of us in the class. We subdivided them into four groups, distinguished by the competitiveness and level of peer review associated with them. It was pretty eye opening to hear how strong a signal these conferences were for everyone, with a great deal of perceived prestige being associated with particular venues. I was surprised at how nuanced perceptions were.

I had asked Steven Jackson for his ideas about conferences since I’m hoping to continue his line of research about repair in my own initial work (more about that in another post shortly). I won’t detail his response here (since I didn’t ask to do that) but the one conference I learned about that I didn’t know about previously was Computer-Supported Cooperative Work and Social Computing, which does look like it could be an interesting venue to tune into. Another one is the Society for Social Studies of Science.

We ended the class with the Marshmallow Challenge. We broke up into groups and then attempted to build the tallest structure we could using spaghetti, tape and string – as long as a marshmallow could be perched on top. The take home from the exercise was the importance of getting feedback from prototypes, and testing ideas in an iterative fashion. This was a bit of a teaser for the next class, which is going to be focused on Users and Usability. The resulting structure also reminded me of more than one software project I’ve contributed to over the years :-)

References

Bates, M. (1999). The invisible substrate of information science. Journal of the American Society for Information Science, 50(12), 1043–1050.

Dillon, A., & Norris, A. (2005). Crying wolf: An examination and reconsideration of the perception of crisis in LIS education. Journal of Education for Library and Information Science.

Manzari, L. (2013). Library and information science journal prestige as assessed by library and information science faculty. Library Quarterly, 83, 42–60.

ECPA overhaul way “Oprah”-due / District Dispatch

Oprah_Thumbnail_3

Remember when a new talk show called “Oprah” debuted, a fourth TV network called “Fox” started broadcasting, and every other headline was about the return of Halley’s Comet?  No?  Don’t feel badly; 1986 was so long ago that nobody else does either.  That’s also how long it’s been since Congress passed the Electronic Communications Privacy Act (ECPA): the principal law that controls when the government needs a search warrant to obtain the full content of our electronic communications.

You read right. As previously reported here in District Dispatch, the law protecting all of our email, texts, Facebook pages, Instagram accounts and Dropbox files was written when there was no real Internet, almost nobody used email, and the smallest mobile phone was the size of a brick.  Under the circumstances, it’s profoundly disturbing — but hardly surprising — that law enforcement authorities, with few exceptions, don’t need a search warrant issued by a judge to compel any company or other institution that houses your “stuff” online to hand over your stored documents, photos, files and texts once they’re more than six months old. (This ACLU infographic lays it all out very well.)

Happily, legislation to finally update ECPA for the digital age has been building steam in Congress for several years and the legislation pending now in the House (H.R. 699) and Senate (S.356) has extraordinary support (including at this writing nearly a quarter of all Senators and almost 300 of the 435 Members of the House).  On September 16, the Senate Judiciary Committee will hold a hearing on ECPA – the first in quite some time.  While no vote on an ECPA reform bill may occur for a while yet, if your Senator is a member of the Senate Judiciary Committee, now would be a great time to endorse ECPA reform by sending them one of the timely tweets below:

  • My email deserves just as much privacy protection as snail mail. Pls reform #ECPA now.
  • I vote, but #ECPA’s older than I am; let’s overhaul it while it’s still in its 20s too.
  • If we can’t rewrite #ECPA for the internet age then please correct its name to the Electronic Communication Invasion Act.
  • If #ECPA were a dog it would be 197 years old now. In an internet age it might as well be.  Please overhaul it now.

Maybe, just maybe, 2016 — ECPA’s 30th anniversary — could be the year that this dangerously anachronistic law finally gets the overhaul it’s needed for decades. . .  with your help.

The post ECPA overhaul way “Oprah”-due appeared first on District Dispatch.

Tuning Apache Solr for Log Analysis / SearchHub

As we countdown to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Radu Gheorghe’s session on tuning Solr for analyzing logs. Performance tuning is always nice for keeping your applications snappy and your costs down. This is especially the case for logs, social media and other stream-like data, that can easily grow into terabyte territory. While you can always use SolrCloud to scale out of performance issues, this talk is about optimizing. First, we’ll talk about Solr settings by answering the following questions:
  • How often should you commit and merge?
  • How can you have one collection per day/month/year/etc?
  • What are the performance trade-offs for these options?
Then, we’ll turn to hardware. We know SSDs are fast, especially on cold-cache searches, but are they worth the price? We’ll give you some numbers and let you decide what’s best for your use-case. The last part is about optimizing the infrastructure pushing logs to Solr. We’ll talk about tuning Apache Flume for handling large flows of logs and about overall design options that also apply to other shippers, like Logstash. As always, there are trade-offs, and we’ll discuss the pros and cons of each option.

Radu is a search consultant at Sematext where he works with clients on Solr and Elasticsearch-based solutions. He is also passionate about the logging ecosystem (yes, that can be a passion!), and feeds this passion by working on Logsene, a log analytics SaaS. Naturally, at conferences such as Berlin Buzzwords, Monitorama, and of course Lucene Revolution, he speaks about indexing logs. Previous presentations were about designing logging infrastructures that provide: functionality (e.g.: parsing logs), performance and scalability. This time, the objective is to take a deeper dive on performance.
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext from Lucidworks
Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post Tuning Apache Solr for Log Analysis appeared first on Lucidworks.

REGISTER for the Midwest Fedora User Group Meeting / DuraSpace News

From Stefano Cossu, Director of Application Services, Collections, The Art Institute of Chicago

Chicago, Ill  I am pleased to announce that the first Midwest Fedora User Group Meeting will be held in Chicago on October 22 and 23, 2015. Event details are posted in the Duraspace Wiki page: https://wiki.duraspace.org/x/CgM1B

Registration is still open and presentation proposals are welcome! If you are interested in participating, please register through this form:

Toward the Next DSpace User Interface: The DSpace UI Prototype Challenge / DuraSpace News

Winchester, MA  Help us discover the technology/platform for our new user interface (UI) for DSpace!  You are invited to create a prototype UI on a platform of your choice (in any programming language), with basic DSpace-like capabilities as described below. The goal of a UI prototype is to exercise a new UI technology/platform to see whether it would meet the needs of our DSpace community.

Important win for fair use and for babies who dance / District Dispatch

Baby standing on hardwood floor.

From Flickr

In Lenz v. Universal, an appeals court in San Francisco today ruled that a rights holder must consider whether a use is fair before sending a takedown notice. The “Dancing Baby Case,” you may recall, is about a takedown notice a mother received after uploading a video to YouTube showing her baby dancing to Prince’s “Let’s Go Crazy.” The court found that rights holders cannot send takedown notices without first considering whether the use of the copyrighted content is fair. This ruling not only clarifies the steps that rights holders should consider before issuing a takedown notice, it also bolsters the notion that fair use is a right, not just an affirmative defense to infringement.

“Fair use is not just excused by the law, it is wholly authorized by the law . . . The statute explains that the fair use of a copyrighted work is permissible because it is a non-infringing use.”

“Although the traditional approach is to view ‘fair use’ as an affirmative defense . . . it is better viewed as a right granted by the Copyright Act of 1976. Originally, as a judicial doctrine without any statutory basis, fair use was an infringement that was excused–this is presumably why it was treated as a defense. As a statutory doctrine, however, fair use is not an infringement. Thus, since the passage of the 1976 Act, fair use should no longer be considered an infringement to be excused; instead, it is logical to view fair use as a right. Regardless of how fair use is viewed, it is clear that the burden of proving fair use is always on the putative infringer.” Bateman v. Mnemonics, Inc., 79 F.3d 1532, 1542 n.22 (11th Cir. 1996).

The court’s ruling is one that reflects what people understand to be a fair use. The general public thinks that integrating portions of copyrighted content in non-commercial user-created videos is reasonable. Today, there are so many dancing baby videos on YouTube that people are starting to curate them!

I like when the law makes sense to regular people – after all, in today’s digital environment, copyright affects the lives of everyday people, not just the content industry. Many hope that Congress also understands this as it considers copyright review. Congratulations to the Electronic Frontier Foundation for their leadership on this litigation over the past several years.

The post Important win for fair use and for babies who dance appeared first on District Dispatch.

Searching and Querying Knowledge Graphs with Solr/SIREn: a Reference Architecture / SearchHub

As we countdown to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Renaud Delbru and Giovanni Tummarello’s session on querying knowledge graphs with Solr/SIREn: Knowledge Graphs have recently gained press coverage as information giants like Google, Facebook, Yahoo and Microsoft announced having deployed Knowledge Graphs at the core of their search and data management capabilities. Very richly structured datasets like “Freebase” or “DBPedia” can be said to be examples of these. In this talk we discuss a reference architecture for high performance structured querying and search on knowledge graphs. While graph databases, e.g., Triplestores or Graph Stores, have a role in this scenario, it is via Solr along with its schemaless structured search plugin SIREn that it is possible to deliver fast and accurate entity search with rich structured querying. During the presentation we will discuss an end to end example case study, a tourism social data use case. We will cover extraction, graph databases, SPARQL, JSON-LD and the role of Solr/SIREn both as search and as high speed structured query component. The audience will leave this session with an understanding of the Knowledge Graph idea and how graph databases, SPARQL, JSON-LD and Solr/SIREn can be combined together to implement high performance real world applications on rich and diverse structured datasets.

Renaud Delbru, Ph.D., CTO and Founder at SindiceTech, is leading the research and development of the SIREn engine and of all aspects related to large scale data retrieval and analytics. He is author of over a dozen academic works in the area of semi-structured information retrieval and big data RDF processing. Prior to SindiceTech, Renaud completed his Ph.D. on Information Retrieval for Semantic Web data at the Digital Enterprise Research Institute, Galway, where he worked on the Sindice.com semantic search engine project. Among his achievements, he led the team that won the Entity Search track of Yahoo’s Semantic Search 2011.
Searching and Querying Knowledge Graphs with Solr/SIREn – A Reference Architecture: Presented by Renaud Delbru & Giovanni Tummarello, SIREn Solutions from Lucidworks
Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post Searching and Querying Knowledge Graphs with Solr/SIREn: a Reference Architecture appeared first on Lucidworks.

Extracting information from VIAF / Thom Hickey

Occasionally I run into someone trying to extract information out of VIAF and having a difficult time. Here's a simple example of how I'd begin extracting titles for a given VIAF ID.  Far from industrial strength, but might get you started.

The problem: Have a file of VIAF IDs (one/line).  Want a file of the titles, each preceded by the VIAF ID of the record they were found in.

There are lots of ways to do this, but my inclination is to do it in Python (I ran this in version 2.7.1) and to use the raw VIAF XML record:

from __future__ import print_function
import sys, urllib
from xml.etree import cElementTree as ET

# reads in list of VIAF IDs one/line
# writes out VIAFID\tTitle one/line

# worry about the name space
ns = {'v':'http://viaf.org/viaf/terms#'}
ttlPath='v:titles/v:work/v:title'

def titlesFromVIAF(viafXML, path):
    vel = ET.fromstring(viafXML)
    for el in vel.findall(path, ns):
        yield el.text

for line in sys.stdin:
    viafid = line.strip()
    viafURL = 'https://viaf.org/viaf/%s'%viafid
    viafXML = urllib.urlopen(viafURL).read()
    for ttl in titlesFromVIAF(viafXML, ttlPath):
        print('%s\t%s'%(viafid, ttl.encode('utf-8')))

That's about as short as I could get it and have it readable in this narrow window.  We've been using the new print function (and division!) for some time now, with an eye towards Python 3.
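For anyone already on Python 3, a roughly equivalent sketch (untested here, and assuming VIAF serves the same XML at the same URL) just swaps in urllib.request and drops the manual UTF-8 encoding:

# Python 3 sketch of the same idea: read VIAF IDs from stdin,
# write VIAFID<tab>Title, one per line.
import sys
from urllib.request import urlopen
from xml.etree import ElementTree as ET

ns = {'v': 'http://viaf.org/viaf/terms#'}
ttlPath = 'v:titles/v:work/v:title'

def titlesFromVIAF(viafXML, path):
    vel = ET.fromstring(viafXML)
    for el in vel.findall(path, ns):
        yield el.text

for line in sys.stdin:
    viafid = line.strip()
    viafXML = urlopen('https://viaf.org/viaf/%s' % viafid).read()
    for ttl in titlesFromVIAF(viafXML, ttlPath):
        print('%s\t%s' % (viafid, ttl))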

--Th

Update 2015.09.16: Cleaned up how namespace is specified

Stump The Chump: Meet The Panel Keeping Austin Weird / SearchHub

As previously mentioned: On October 15th, Lucene/Solr Revolution 2015 will once again be hosting “Stump The Chump” in which I (The Chump) will be answering tough Solr questions — submitted by users like you — live, on stage, sight unseen.

Today, I’m happy to announce the Panel of experts that will be challenging me with those questions, and deciding which questions were able to Stump The Chump!

In addition to taunting me with the questions, and ridiculing all my “Um”s and “Uhh”s as I stall for time while I rack my brain to come up with a non-gibberish answer, the Panel members will be responsible for awarding prizes to the folks who submitted the questions that do the best job of “Stumping” me.

Check out the session information page for details on how to submit questions. Even if you can’t make it to Austin to attend the conference, you can still participate — and do your part to humiliate me — by submitting your questions.

To keep up with all the “Chump” related info, you can subscribe to this blog (or just the “Chump” tag).

The post Stump The Chump: Meet The Panel Keeping Austin Weird appeared first on Lucidworks.

Open Journal Systems 3.0: A User Experience Webinar / FOSS4Lib Upcoming Events

Date: 
Tuesday, October 20, 2015 - 10:00 to 11:00

Last updated September 14, 2015. Created by Peter Murray on September 14, 2015.

From the announcement:

Open Journal Systems 3.0: A User Experience Webinar
Thinking about user experience and libraries? Be sure to register for this UX webinar.

In August, Open Journal Systems 3.0 was released in beta. The new version has major upgrades, including improvements to usability.
In this webinar, Kevin Stranack of the Public Knowledge Project will provide a case study of integrating UX into a major web design project: OJS.

Islandora Community Sprint 001 - Complete! / Islandora

For the past couple of weeks, the Islandora Community has been working together on a collective experiment in community-driven code: the community sprint. The brainchild of Jared Whiklo and Nick Ruest, the sprint began with a call put out in mid-August for volunteers to tackle maintenance work: fixing bugs, doing code tasks, and updating documentation. Ten people signed up:

  • Nick Ruest
  • Jared Whiklo
  • Melissa Anez
  • Diego Pino
  • Mark Cooper
  • Brian Harrington
  • Kim Pham
  • Peter Murray
  • Lingling Jiang
  • Jacob Sanford

And on Monday, August 31st, they got to work. 118 outstanding issues from our JIRA tracker were tagged as possible tasks for the sprint, ranging from complex bug fixes that spanned multiple modules, to 'newbie' tickets updating readme files and user documentation. The sprint got off to a brisk start, knocking off 15 tickets in the first 24 hours. The pace slowed a little as we ran out of low-hanging fruit and got into the more challenging stuff, but I'll let this JIRA tracker gif speak for itself:

By the end, 38 tickets were down for the count. 

Some of the highlights include:

JIRA 1087 - Get a WSOD when using xml form with template and changing tabbed template values. A particularly tricky little error that was difficult to trigger, but pointed to deeper issues. Reported nearly a year ago, and actually affecting the code since 2012, this bug finally fell to the efforts of Diego Pino near the end of the sprint. If you're curious about the details, just check the long and involved comments thread on the ticket - this one was a doozy!

JIRA 1383 and JIRA 1394 weren't particularly complex or difficult, but they share the record for pull requests for a single sprint ticket. Both involved making updates to the readme file of every module in the release, to point to our new static wiki documentation and to the new instructions on contributing to Islandora that will ship with every module. They were also my biggest tickets, and I include them in this report not to brag, but to demonstrate that someone with no programming skills using the GitHub GUI can still be an active part of a sprint. Kim Pham and I were both 'newbie' sprinters and tackled low-level tickets accordingly, but we racked up a decent chunk of the 38 completed tickets, so I hope more 'newbies' will consider joining us on the next sprint.

JIRA 1274 was a Blocker ticket finished up during the sprint by Jared Whiklo. This one brings Ashok Modi's great Form Field Panel into Islandora coding standards so that it can be included in the 7.x-1.6 release.

Our sprinters also did a lot of testing, enabling some fixes from outside the sprint to finally be implemented, such as:

  • JIRA 1181 - newspaper view is broken when newspaper issues have XACML policies
  • JIRA 1184 - islandora_batch failing to import MARCXML
  • JIRA 1292 - Citeproc shouldn't try to render dates it doesn't have
  • JIRA 1419 - Derivative Generation on Web Archive is Broken After Adding JPG Functionality (a Release Blocker)

Many thanks to the first team of Islandora community sprinters. Your hard work and coordination have proven this approach can have great results, so we will definitely be sprinting again. Keep an eye on the listserv and here on islandora.ca for the next announcement.

Creating High-Quality Online Video Tutorials / LITA

Lately it seems all I do all day is create informational or educational video tutorials on various topics for my library.  The Herbert Wertheim College of Medicine Medical Library at Florida International University in Miami, FL has perfected a system.  First, a group of three librarians write and edit a script on a topic.  In the past we have done multiple videos on American Medical Association (AMA) and American Psychological Association (APA) citation styles, Evidence-Based Medicine research to support a course, and other titles on basic library services and resources. After a script has been finalized, we record the audio.  We have one librarian who has become “the voice of the library,” one simple method to brand the library.  After that, I go ahead and coordinate the visuals – a mixture of PowerPoint slides, visual effects and screen video shots.  We use Camtasia to edit our videos and produce and upload them to our fiumedlib YouTube Channel.  Below are some thoughts and notes to consider if starting your own collection of online video tutorials for your organization.

Zoom In
As my past photography teacher declared, rather than zoom in with a telephoto lens, walk towards your subject.  You as the photographer should reposition yourself to get the best shot.  The same holds true for screen shots.  If recording from your browser, it is good practice to use the browser’s zoom feature to get the sharpest footage.  If using Chrome, click on the customize and control (three-bar icon) on the top right of the browser window and you will see the option to zoom in or out.  Keep in mind that the look of the video also depends on the viewer’s monitor screen resolution and other factors – so sometimes you have to let it go.  The screen shots below show one recorded at 100% and another at 175%.  This small change affected the clarity of the footage.

Recorded at 100%

Recorded at 175%

Write a Script and Record Audio First – (then add Visuals)
Most people multi-task by recording the voiceover and the screen video at the same time.  I have found that this creates multiple mistakes and the need to record multiple takes.  Preparation steps help projects run smoothly.

Brand Your Library
The team brands our library by using the same opening title slide with our logo, the same closing slide with our contact email, and the same background music clip.  In addition, we try to use a common look and feel throughout the video lesson to further cement that these are from the same library. As mentioned before, we use the same narrator for our videos.

PowerPoint Slides
I cringe at the thought of seeing a PowerPoint slide with a header and a list of bullet points.  PowerPoint is not necessarily bad; I just like to promote using the software in a creative manner by approaching each slide as a canvas.  I steer clear of templates and of the usual “business” organization of a slide.


Check out the current videos our department has created, and let me know if you have any questions at jorperez@fiu.edu.

Herbert Wertheim College of Medicine Medical Library You Tube Channel: https://www.youtube.com/channel/UC_DLYn2F2Q4AACsicEh9BSA

Models of our World / Karen Coyle


This is to announce the publication of my book, FRBR, Before and After, by ALA Editions, available in November, 2015. As is often the case, the title doesn't tell the story, so I want to give a bit of an introduction before everyone goes: "Oh, another book on FRBR, yeeech." To be honest, the book does have quite a bit about FRBR, but it's also a think piece about bibliographic models, and a book entitled "Bibliographic Models" would look even more boring than one called "FRBR, Before and After."

The before part is a look at the evolution of the concept of Work, and, yes, Panizzi and Cutter are included, as are Lubetzky, Wilson, and others. Then I look at modeling and how goals and models are connected, and the effect that technology has (and has not) had on library data. The second part of the book focuses on the change that FRBR has wrought both in our thinking and in how we model the bibliographic description. I'll post more about that in the near future, but let me just say that you might be surprised at what you read there.


The text will also be available as open access in early 2016. This is thanks to the generosity of ALA Editions, who agreed to this model. I hope that enough libraries and individuals decide to purchase the hard copy that ALA Publishing puts out so that this model of print plus OA is economically viable. I can attest to the fact that the editorial work and application of design to the book has produced a final version that I could not have even approximated on my own.

Update on the Library Privacy Pledge / Eric Hellman

The Library Privacy Pledge of 2015, which I wrote about previously, has been finalized. We got a lot of good feedback, and the big changes have focused on the schedule.

Now, any library, organization or company that signs the pledge will have 6 months to implement HTTPS from the effective date of their signature. This should give everyone plenty of margin to do a good job on the implementation.

We pushed back our launch date to the first week of November. That's when we'll announce the list of "charter signatories". If you want your library, company or organization to be included in the charter signatory list, please send an e-mail to pledge@libraryfreedomproject.org.

The Let's Encrypt project will be launching soon. They are just one certificate authority that can help with HTTPS implementation.

I think this is a very important step for the library information community to take, together. Let's make it happen.

Here's the finalized pledge:

The Library Freedom Project is inviting the library community - libraries, vendors that serve libraries, and membership organizations - to sign the "Library Digital Privacy Pledge of 2015". For this first pledge, we're focusing on the use of HTTPS to deliver library services and the information resources offered by libraries. It’s just a first step: HTTPS is a privacy prerequisite, not a privacy solution. Building a culture of library digital privacy will not end with this 2015 pledge, but committing to this first modest step together will begin a process that won't turn back.  We aim to gather momentum and raise awareness with this pledge; and will develop similar pledges in the future as appropriate to advance digital privacy practices for library patrons.

We focus on HTTPS as a first step because of its timeliness. The Let's Encrypt initiative of the Electronic Frontier Foundation will soon launch a new certificate infrastructure that will remove much of the cost and technical difficulty involved in the implementation of HTTPS, with general availability scheduled for September. Due to a heightened concern about digital surveillance, many prominent internet companies, such as Google, Twitter, and Facebook, have moved their services exclusively to HTTPS rather than relying on unencrypted HTTP connections. The White House has issued a directive that all government websites must move their services to HTTPS by the end of 2016. We believe that libraries must also make this change, lest they be viewed as technology and privacy laggards, and dishonor their proud history of protecting reader privacy.

The 3rd article of the American Library Association Code of Ethics sets a broad objective:

We protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.
It's not always clear how to interpret this broad mandate, especially when everything is done on the internet. However, one principle of implementation should be clear and uncontroversial:
Library services and resources should be delivered, whenever practical, over channels that are immune to eavesdropping.

The current best practice dictated by this principle is as follows:
Libraries and vendors that serve libraries and library patrons should require HTTPS for all services and resources delivered via the web.

The Pledge for Libraries:

1. We will make every effort to ensure that web services and information resources under direct control of our library will use HTTPS within six months. [ dated______ ]

2. Starting in 2016, our library will assure that any new or renewed contracts for web services or information resources will require support for HTTPS by the end of 2016.

The Pledge for Service Providers (Publishers and Vendors):

1. We will make every effort to ensure that all web services that we (the signatories) offer to libraries will enable HTTPS within six months. [ dated______ ]

2. All web services that we (the signatories) offer to libraries will default to HTTPS by the end of 2016.

The Pledge for Membership Organizations:

1. We will make every effort to ensure that all web services that our organization directly control will use HTTPS within six months. [ dated______ ]

2. We encourage our members to support and sign the appropriate version of the pledge.

There's a FAQ available, too. The pledge is now posted on the Library Freedom Project website. (updated 9/14/2015)


JOIN "VIVO Stories": Introducing People, Projects, Ideas and Innovation / DuraSpace News

The Telling VIVO Stories Task Force is underway! The task force goal is to grow our open source community and actively engage with its members by sharing each other's stories. The first three stories are now available to inspire and answer questions about how others have implemented VIVO at their institutions:

Thinks and Burns / William Denton

Yesterday I stumbled on the Thinks and Burns with Fink and Byrne podcast. I have no idea where or when (indeed if) it was announced, but I was happy to find it. It’s John Fink (@adr) and Gillian Byrne (@redgirl13) talking about library matters. I’m acquainted with both of them and we all work near each other, so it’s of more interest to me than if it were two complete strangers rambling on, but if you know either of them, or are a Canadian librarian, or like podcasts where two librarians talk like you’re hanging out down the pub and one keeps hitting the table for emphasis, it’s worth a listen. They’ve done three episodes, and I hope they keep it going, even if irregularly.

Library of Cards / Mita Williams


 

On Thursday, September 10th, I had the honor and the pleasure to present at Access 2015. I've been to many Access Conferences over the years and each one has been a joyful experience. Thank you so much to those organizing Access YYZ for all of us.


Have you ever listened to the podcast 99% Invisible?

99% Invisible is a weekly podcast dedicated to design and architecture and the 99 percent of invisible activity that shapes our experiences in the world.

They’ve done episodes on the design of city flags, barbed wire, lawn enforcement, baseball mascot costumes, and it’s already influenced my life in pretty odd ways.

If I were able to pitch a library-related design to host Roman Mars for an episode, I would suggest the history of the humble 3 x 5 index card.




That being said, for this presentation, I am not going to present to you the history of the humble 3 x 5 index card.
 



That’s what this book is for, Markus Krajewski’s 2011 Paper Machines published by MIT Press.

Now, before I had read his book, I had believed that the index card was invented by Melvil Dewey and commercialized by his company, The Library Bureau. But Krajewski makes the case that the origin of the index card goes as far back as 1548, with Konrad Gessner, who described a new method of processing data: cut up a sheet of handwritten notes into slips of paper, with one fact or topic per slip, and arrange them as desired.

According to Krajewski, when this technique goes from provisional to permanent – when the slips that describe the contents of a library are fixed in a book, an unintended and yet consequential turn takes place: it gives rise to the first card catalog in library history in Vienna around 1780.




Most histories of the card catalog begin just slightly later in time -- in 1789 to be precise -- during the French Revolution. The situation at hand was that the French revolutionary government had just claimed ownership of all Church property, including its substantial library holdings. In order to better understand what it now owned, the French revolutionaries started to inventory all of these newly acquired books. The instructions for how this inventory would be conducted are known as the French Cataloging Code of 1791.

The code instructed that, first, all the books were to be numbered. Next, the number of each book as well as the bibliographic information of each work were to be written on the back of two playing cards - and this was possible because at that time the backs of playing cards were blank. The two sets of cards were then put into alphabetical order and fastened together. One set of cards was to be sent to Paris, while a copy remained in each library.

On the screen behind me, you can see two records for the same book.




Again, my talk isn’t about bibliographic history, but I want to return to the 16th century and to Gessner for some important context. The reason why Gessner was making all those slips in the first place was to construct this, the Bibliotheca Universalis, which consists of a bibliography of around 3,000 authors in alphabetical order, describing over 10,000 texts in terms of content and form, and offering textual excerpts. As such, Gessner is considered the first modern bibliographer.

And you can find his work on the Internet Archive.



Gessner’s Bibliotheca Universalis wasn’t just a bibliography. According to Krajewski, the book provides instructions to scholars on how to properly organize their studies by keeping excerpted material in useful order. Gessner was describing an already established practice. Scholars kept slips or cards in boxes, and when they had the need to write or give a lecture or sermon, they would take the cards that fit their theme, arrange those thoughts, and temporarily fix them in order using devices such as the one pictured. This hybrid book has guiding threads that stretch over the page so that two rows of paper slips can be inserted and supported by paper rails.





Until the Romantics came around and made everyone feel embarrassed about taking inspiration from other people, it was not uncommon for scholars to use Scholar’s Boxes. Gottfried Leibniz actively used what was known as an excerpt cabinet to store and organize quotations and references.

Leibniz's method of the scholar's box combines a classification system with a permanent storage facility, the cabinet. So in a way this is similar to the use of Zotero or other citation management systems, but it instead uses loose sheets of paper on hooks. The strips are hung on poles or placed into hybrid books.

And that’s the reason why I wanted to start my talk with a brief history lesson. To remind us that there is a common ancestor to the library catalog and the scholar’s bibliography, and that is the index card.



So as we’ve learned, from as far back as Gessner’s 16th Century, writers have been using cards and slips of paper to rearrange ideas and quotations into texts, citations into bibliographies, and bibliographic descriptions into card catalogues.



You can still buy index cards and card boxes at my local campus bookstore. That’s because there are still authors today who use index cards to piece together and re-sort parts of their paper or novel, or who rearrange digital cards inside writing software such as Scrivener to generate new works.



Now, I don’t write this way myself. But I do use Zotero as one of the tools for keeping track of citations, bookmarks, saved quotations, and excerpts of text that I have used or might use in my own work as a writer and academic librarian.

Zotero acts as an extension of your web reading experience, and it operates best as an add-on to the Firefox browser. If you use Zotero, you can usually capture the citations you find on a page quite easily, either because someone who supports Zotero has already developed a custom text scraper (called a translator) for the database or website you are looking at, or because the citation has been marked up with text that’s invisible to the human eye but can be found in the span HTML tags that surround the citation, using a microformat called COinS.
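As a rough illustration (my own sketch, not Zotero’s code), a COinS citation is just an OpenURL ContextObject packed into the title attribute of an otherwise empty span with the class Z3988; something along these lines is what Zotero’s COinS support looks for:

# Sketch: build a COinS span for a journal article.
# The class "Z3988" and the KEV keys follow the COinS convention;
# the article here is just an example.
from urllib.parse import urlencode

kev = {
    'ctx_ver': 'Z39.88-2004',
    'rft_val_fmt': 'info:ofi/fmt:kev:mtx:journal',
    'rft.atitle': 'The invisible substrate of information science',
    'rft.jtitle': 'Journal of the American Society for Information Science',
    'rft.au': 'Bates, Marcia J.',
    'rft.date': '1999',
}
coins = '<span class="Z3988" title="%s"></span>' % urlencode(kev).replace('&', '&amp;')
print(coins)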

Zotero also allows scholars to back up their citations to its server and, in doing so, share them by making their library public on Zotero.org. Alternatively, scholars can share bibliographies on their own websites using the Zotero API, which is so simple and powerful that you can embed a bibliography styled in APA with a single line of code.
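For example, here is a minimal sketch of that API call, using a made-up user ID for a public library; format=bib and style=apa ask the Zotero web API for an already-formatted APA bibliography:

# Sketch: fetch an APA-styled bibliography for a public Zotero library.
# The user ID is a hypothetical placeholder.
from urllib.request import urlopen

userid = '123456'
url = ('https://api.zotero.org/users/%s/items'
       '?format=bib&style=apa&limit=25' % userid)
print(urlopen(url).read().decode('utf-8'))  # HTML-formatted citations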





One of my favourite features of Zotero is not widely known. Zotero out of the box allows you to generate ‘cards’, called ‘reports’, from your bibliography.  When I have a stack of books that I need to locate in my library, I sometimes find it’s easier for me to select and generate a report of cards from my Zotero collection rather than to search, select and print the items using my library’s expensive ILS system.

There is a terrible irony to this situation. As I learned from the Library Journal column of Dorothea Salo, the design problem given to Henriette Avram, the inventor of the MARC record, was to have “computers print catalog cards.”

As Salo says in her piece, “Avram was not asked to design a computer-optimized data structure for information about library materials, so, naturally enough, that is not what MARC is at heart. Avram was asked solely to make computers print a record intended purely for human consumption according to the best card-construction practices of the 1960s.”




Let’s recall that one of the reasons Zotero is able to import citations easily is the invisible text of COinS and its translators.

The metadata that comes into Zotero is captured as strings of text, which is great - a name is now tagged with the code AU to designate that the text should go in the Author field. But this functionality is not enough if you want to produce linked data.
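To see what that string-only view looks like to a machine, here is a rough sketch using the RIS tagged format (one of the formats Zotero reads and writes); the record is typed in purely for illustration:

# Sketch: pull the AU (author) tags out of a RIS record.
# The author comes back as a bare string, with no identifier attached.
ris = """TY  - JOUR
AU  - Bates, Marcia J.
TI  - The invisible substrate of information science
PY  - 1999
ER  - """

authors = [line[6:].strip()
           for line in ris.splitlines()
           if line.startswith('AU  - ')]
print(authors)  # ['Bates, Marcia J.']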

Dan Scott has kindly shared the code to RIS2WEB, which lets you run it on an export of a bibliography from Zotero and, in doing so, create and serve a citation database that also generates linked data using Schema.org. Afterwards, you can add available URIs.




You can see the results of this work at http://labourstudies.ca

When I showed this to a co-worker of mine, she couldn’t understand why I was so impressed by this. I had to hit Control-U on a citation to show her that this citation database contained identifiers from VIAF: the Virtual International Authority File. I explained to her that by using these numeric identifiers - invisible to the human eye - computers will not only be able to find matching text in the author field, they will be better able to find that one particular author.
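What she saw in the page source was markup along these lines: a schema.org description in which the author carries a VIAF URI as well as a name. This is a simplified sketch, and the VIAF number is a placeholder rather than one taken from the Labour Studies site.

# Sketch: a schema.org record where the author is identified by a VIAF URI
# (sameAs), not just by a text string. The VIAF ID is made up.
import json

record = {
    "@context": "http://schema.org",
    "@type": "ScholarlyArticle",
    "name": "The invisible substrate of information science",
    "author": {
        "@type": "Person",
        "name": "Marcia J. Bates",
        "sameAs": "http://viaf.org/viaf/12345678"
    }
}
print(json.dumps(record, indent=2))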



So can we call Zotero a Scholar’s Box or Paper Machine for the digital age?

I think we can, but that being said, I think we need to recognize that the citations that we have are still, in so many ways, stuck in a box.

We can’t grab citations from a library database and drop them into a word processor without using a bibliographic manager like Zotero as an intermediary to capture the structured data that might be useful to my computer when I need to format a bibliography. Likewise, I can’t easily grab linked data from sites like the Labour Studies bibliography page.

And we still really don’t share citations in emails or social media.

Instead, we share the URLs that point to the publisher or third-party server that hosts said paper.  Or we share PDFs that should contain all the elements needed to construct a citation and yet somehow still require the manual re-keying and control c-ing and v-ing of data into fields when we want to do such necessary things as add an article to an Institutional Repository.



Common Web tools and techniques cannot easily manipulate library resources. While photo sharing, link logging, and Web logging sites make it easy to use and reuse content, barriers still exist that limit the reuse of library resources within new Web services. To support the reuse of library information in Web 2.0-style services, we need to allow many types of applications to connect with our information resources more easily. One such connection is a universal method to copy any resource of interest. Because the copy-and-paste paradigm resonates with both users and Web developers, it makes sense that users should be able to copy items they see online and paste them into desktop applications or other Web applications. Recent developments proposed in weblogs and discussed at technical conferences suggest exactly this: extending the 'clipboard' copy-and-paste paradigm onto the Web. To fit this new, extended paradigm, we need to provide a uniform, simple method for copying rich digital objects out of any Web application.

Now, those aren’t my words. That’s from this paper Introducing unAPI written by Daniel Chudnov, Peter Binkley, Jeremy Frumkin, Michael J. Giarlo, Mike Rylander, Ross Singer and Ed Summers.

This paper, I should stress, was written in 2006.

Within the paper, the authors outline the many reasons why cutting and pasting data is so infuriatingly difficult in our sphere of tools and data.

But what if there was another paradigm we could try?



In order to see how we might possibly break out of the scholar’s box, I’m going to talk about a very speculative possibility. And in order to set us up for this possibility, I first need to talk about how cards are already used on the web and on our tablets and smart phones.



https://blog.intercom.io/why-cards-are-the-future-of-the-web/

If you look around the most popular websites and pay particular attention to the design patterns used, you will quickly notice that many of the sites that we visit every day (Twitter, Facebook, Trello, Instagram, Pinterest) they all use cards as a user interface design pattern.


https://www.pinterest.com/khoi/card-user-interfaces/

The use of cards as a design pattern rose up along with the use of mobile devices largely because a single card fits nicely on a mobile screen...




...while on larger surfaces, such as tablets and desktops, cards can be arranged in a vertical feed, like in Facebook or Twitter, or arranged as a board like Pinterest, or like a like a stack, such as Google Now or Trello.

This slide is from a slidedeck of designer and technologist, Chris Tse. The rest of this section is largely an exploration of Chris’ work and ideas about cards.



https://speakerdeck.com/christse/patterns-of-card-ui-design

Case in point, Chris Tse states, the most important quality of ‘cards’ is that of movement. But by movement, he isn’t referring to the design’s apparent affordances that make swiping or scrolling intuitive.

The movement of cards that’s really important is how they feed into content creation and content sharing and how cards feed into discussions and workflow.

(How cards fit into kanban boards and shared workflow software like Trello is a whooooole other presentation)



https://twitter.com/copystar/status/622459788606210048

Social media is collectively made up of individuals sharing objects - objects of text, of photos, of video, of slideshows - and they share these objects with friends and with a larger public. Each of these objects is framed - by and large - within cards.

It’s important to realize that the cards on the web are fundamentally more than just a design hack.  If you are familiar with Twitter, you may have started to see cards that don’t just feature 140 characters - you see playable music (such as from Soundcloud), slideshows that you can read through without leaving Twitter, and you can even download 3rd party apps from Twitter advertising cards. When the business press say that Twitter is a platform, it’s not just marketing hype.



As Chris Tse says, cards are more than just glorified widgets.  “When done right”, he says, “a card looks like responsive web content, works like a focused mobile app, and feels like a saved file that you can share and reuse". As “cards” become more interactive, he believes they will go from being just concentrated bits of content and turn into mini-apps that can be embedded, can capture and manipulate data, or even process transactions.
 
But why isn’t this more obvious to people? I think the reason cards don’t really feel this way is that most cards can only move within their own self-contained apps or websites.




For example, Google Now cards work with your Google applications – such as your calendar - but they don't know about the events that you've RSVPed to on Facebook.

That being said, Google and Apple are working on ways to integrate more outside services into their platforms. In Google Now, I’m regularly offered news stories based on my recent searches, as well as stories that are popular with other readers who read similar stories using the Feedly RSS reader.



And this is a problem because it’s Google who is deciding whose services I can choose from for such card notifications.

The apps on your smart phone live in a walled garden where things are more beautiful and more cultivated, but it is a place that is cut off from the open web.

The fall of the open web and the rise of the walled garden is not a trivial problem. We should not forget that if you want your app to be available on an iPhone it must be in the Apple App Store, the content of your app will be subject to the Apple review process, and Apple will take a 30% cut of what your app sells for.  Content within apps curtails various freedoms.



To bridge this split of the open web and the walled app garden, Chris Tse founded Cardstack.io. The mission of Cardstack is to “To build a card ecosystem based on open web technologies and open source ethos that fights back against lock-in.”

CardStack wraps single-page JavaScript applications as reusable ‘cards’ that can be embedded in native apps and other web apps. According to Chris, Cardstack.io HTML5 cards will be able to move between apps, between devices, between users and between services.


CardStack itself is made up of other JavaScript libraries, most notably Conductor.js and Oasis.js, and I cannot speak any further to this other than to repeat the claim that these combined libraries create a solution that is more secure than the embedded iFrames of widgets past.

But notice the ‘Coming Soon’ statement in the top left hand corner? CardStack is still in beta with SDKs still being developed for iOS and Android systems.

Despite this, when I first stumbled upon Chris Tse’s presentations about Cardstack, I was really excited by his vision. But the crushing reality of the situation settled mere moments later.

Yes, a new system of cards to allow movement between siloed systems that could work in mobile or desktop environments, that would be wonderful - but wasn’t it all too late?

And what does it mean if we all collectively decide, that it is all just too late?


One of the challenges of promoting a new sort of container is that you can’t really show it off until you have some content to contain. You need a proof of concept.

When I checked in later to see what Chris was doing as I was drafting this presentation, I learned that he was now the CTO of a new platform - that he confirmed for me is built using Cardstack as a component.

This new platform has its origin from the art world’s Rhizome’s Seven on Seven conference. The Seven on Seven conference pairs seven visual artists with seven technologists for a 24 hour hackjam and in 2014, artist Kevin McCoy was paired up with technologist Anil Dash.






McCoy and Dash were both interested in improving the situation of digital artists, whose work can easily be copied. With copying, the provenance of a digital work can be lost, as well as the understanding of what was original and what has been modified.

They talked and worked on a proof of concept of a new service that would allow visual artists to register ownership of their digital work and transfer that ownership using blockchain technology - that's the technology behind bitcoin.

Over a year later, this idea has grown into a real platform that is in private beta and is set to be released to the public this month.



I think two things are particularly awesome about this project. First, the platform allows artists to decide for themselves whether the license for their work is in the Creative Commons or requires a royalty, and whether derivatives of their work are allowed.

The other thing I love about this project is its name. If you look at the top left-hand corner of this screen you will find that the name of the platform is spelled m-o-n-e-g-r-a-p-h. The platform is called MONOGRAPH.

And as we now know, you build a monegraph with cards.




We need to remember that the library and the bibliography are connected by the card.






As we continue to invest in crucial new endeavors in the digital realm, I think it's essential that librarians find new ways to surface our resources, allow them to be shared socially, and find the means by which scholars can save, sort, and re-use the resources that they find in our collections.

We are part of a generative process. Cards of single ideas that are arranged and stacked build theses, which in turn build papers and books, which in turn form bibliographies, which fill libraries.

I would like libraries to find a way back to Gessner’s Bibliotheca Universalis, a place where the library and the scholar were both connected.

After all...