User talk:Brian McNeil/Project INDECT/Libraries

Research points

 * Back up assumptions about libraries and put in context. Check US constitution, ask British Library, Library of Congress.
 * Actually, United States Bill of Rights - First and Fourth Amendments. --Brian McNeil / talk 16:40, 9 October 2009 (UTC)
 * Query on Wikipedia - check back regularly.
 * Responses:
 * In Norway it is illegal to retain such records. Even teachers' lists of which child has borrowed which books from the school library are discouraged and may in no case be retained longer than the current school year. - Hordaland (talk) 22:13, 9 October 2009 (UTC)

Library of Congress
"I am attempting to investigate the historical background of records maintained by libraries in general; the project this research is for treats the Library of Congress as a "model library", and looks at the potentially revolutionary changes that computerised record-keeping involves.
 * Website: http://www.loc.gov/index.html
 * Pressroom: http://www.loc.gov/pressroom/login (requires username/password; see page source)
 * Librarian of Congress James H. Billington
 * James Hadley Billington was sworn in as the Librarian of Congress on September 14, 1987. He is the 13th person to hold the position since the Library was established in 1800.
 * Ask a librarian: http://www.loc.gov/rr/askalib/ (publicly accessible)
 * Press-specific contacts: http://www.loc.gov/pressroom/page/Working-with-the-Library-of-Congress
 * Query submitted to LoC Ask a librarian service:

My questions (sorry, multiple) are:

1. When set up, and made accessible to the public, what records did the Library of Congress maintain? Did this include records of who borrowed or consulted any specific books?

2. Are the principles enshrined in the First and Fourth Amendments applied to records as set out in Q1?

3. What modern-equivalent to records as set out in Q1 are retained?

4. In modern terms ("data retention"), how long, over time, have the Q1 records been retained?

5. Now, and since opening to the public, under what circumstances would the Library of Congress divulge said records?

6. In general, it would also be extremely useful to have the same information from different geopolitical points of view."


 * Received notice that the query has been forwarded to the Office of the General Counsel --Brian McNeil / talk 21:03, 10 October 2009 (UTC)

Paula Kaufman (no Wikipedia page)

 * University Librarian and Dean of Libraries, University of Illinois at Urbana-Champaign since September 7, 1999
 * About page: http://www.library.illinois.edu/administration/librarian/aboutme.html
 * Relevance: Refused to assist the FBI's Library Awareness Program with information on library patrons in 1987, while at Columbia University (see NYT source below). Gave a presentation on the Program to the Tennessee Library Association Annual Conference, April 20, 1989.
 * Request for quotable comment sent to Dean Kaufman via email:
 * "Dean Kaufman,


 * I am a freelance journalist carrying out some background research into libraries, their record-keeping, its evolving relationship with the U.S. First Amendment, and implications arising from the use of computers and the Internet.


 * Investigating one aspect of this, the history of threats and challenges to libraries and their records, brings up your name as one of the more outspoken defenders of the First Amendment in a library context. This, of course, is in relation to your highly principled stance against cooperation with the FBI's Library Awareness Program in 1987.


 * See: http://www.nytimes.com/1987/09/18/nyregion/fbi-in-new-york-asks-librarians-aid-in-reporting-on-spies.html?pagewanted=all


 * My more general research into library history and the First Amendment turns up points where librarians actively cooperated with authorities in time of war to selectively censor collections; closer to the present, the ALA firms up its best-practice guidelines, stresses neutrality in content selection, and restricts cooperation with authorities to the "reasonable search" principle of the Fourth Amendment.


 * I am afraid the points on which I would highly value a quotable comment, or a statement, are somewhat complex to set out. Please bear with me while I put them all in context.


 * One item I am particularly interested in is the records libraries keep detailing who borrows or refers to works within their collections. I have found absolutely no information on how this was handled before computerisation but, in modern terms, I would call this a "borrowing record". I have an outstanding query with the Library of Congress regarding their handling of such records, and the British Library's Information Charter, referring to personally identifiable information, states "make sure we don’t keep it longer than necessary". Elsewhere on the British Library site a similar statement indicates that this data is only retained as long as required for the work of the library.


 * This, in principle, is what I had hoped to find. From my own IT background I would express this thus:


 * Where a "borrowing record" has been created matching a person with an item in the library collection, it must be deleted when:
 * The referenced item is returned to its correct place in the collection, or
 * The item was returned damaged, and the person has since replaced it or paid appropriate damages.
 * Until then, and if the item is never returned, the "borrowing record" is retained to pursue whoever has the borrowed item.


 * That, I would sincerely hope, has been a long-standing tradition in libraries. While I find references to Nazi and Soviet actions similar to the FBI's Library Awareness Program, I find nothing indicating that my aforementioned "borrowing records" have at any point been used for persecution or profiling.


 * Now, within a library, such records are reasonably protected. However, a variety of similar information may be available over, or from, the Internet.
 * For convenience, a person can check what items they have booked out, and their due return date
 * A person may browse and search the library collection, and carry out actions such as reserving an item in the collection, or requesting an item via an inter-library-loan


 * This information is not just available to the person using the website and to the library; it is available to anyone able to read the data travelling between the person's computer and the library's website, unless the data travels encrypted, as it does with online banking or shopping.


 * This does not become a particularly concerning issue until you consider actions such as the AT&T Warrantless Wiretapping scandal. In such a case AT&T, as a person's Internet Service Provider, would be able to provide all this information to whoever requested it, and the library could neither guarantee its users' First Amendment rights nor prevent a disclosure made without regard for them.


 * Of course, these issues are not confined to the United States. I live in Scotland, and several of the biggest Internet Service Providers here are working on a targeted advertising system (Phorm) which relies on access to a subscriber's browsing history. If the library website does not use encryption, advertisers would be able to use a person's reading preferences in selecting the advertising presented to them.


 * Given this context, I would greatly appreciate quotable content in response to the following questions:


 * Do libraries, in general, do enough to keep as little personally identifying information as possible, for the shortest period of time required for the library to function?


 * Do you think libraries, to preserve First Amendment rights, should use the same measures as are applied to online banking for remote access to their collection and services?


 * Considering that records such as books borrowed from a library, and searches made in a library's collection can be very useful to users accessing the library website, do you think that retention of such information should only be done when the user has explicitly requested it?




 * I would like to thank you in advance for your comments on this; I would also greatly appreciate it if you could suggest colleagues or other library professionals outside the United States from whom I might seek similar comment reflecting their local perspective.


 * I will, of course, furnish you with a link to the complete article when it is published.


 * Regards,


 * Brian McNeil
 * Wikinewsie"
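
The deletion rule set out in the email above (delete the "borrowing record" once the item is back in its place or damages are settled; keep it while the item is out) can be sketched as a small in-memory model. This is a minimal illustration only, assuming hypothetical names (`LoanStatus`, `BorrowingRecord`, `LoanLedger`, `must_retain`) and status values; it does not describe any real library system. It also models the opt-in retention raised in the third question: a reading history is kept only for patrons who have explicitly asked for one.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class LoanStatus(Enum):
    # Hypothetical states for a single loan.
    ON_LOAN = auto()           # item is with the borrower
    RESHELVED = auto()         # item returned to its correct place
    RETURNED_DAMAGED = auto()  # item came back damaged, not yet settled
    SETTLED = auto()           # damaged item replaced or damages paid
    OVERDUE = auto()           # item not returned; record kept to pursue it


@dataclass
class BorrowingRecord:
    patron_id: str
    item_id: str
    status: LoanStatus = LoanStatus.ON_LOAN


def must_retain(record: BorrowingRecord) -> bool:
    """True while the library still needs the person-item link."""
    return record.status in (
        LoanStatus.ON_LOAN,
        LoanStatus.RETURNED_DAMAGED,
        LoanStatus.OVERDUE,
    )


@dataclass
class LoanLedger:
    records: dict = field(default_factory=dict)       # loan_id -> BorrowingRecord
    history_opt_in: set = field(default_factory=set)  # patrons who asked for a history
    history: dict = field(default_factory=dict)       # patron_id -> list of item_ids

    def lend(self, loan_id, patron_id, item_id):
        self.records[loan_id] = BorrowingRecord(patron_id, item_id)

    def update(self, loan_id, status):
        record = self.records[loan_id]
        record.status = status
        if not must_retain(record):
            # Keep a trace only if the patron explicitly requested one.
            if record.patron_id in self.history_opt_in:
                self.history.setdefault(record.patron_id, []).append(record.item_id)
            # Otherwise the person-item link simply disappears.
            del self.records[loan_id]
```

The sketch does not validate state transitions (for example, it would accept `SETTLED` for an undamaged item); a real catalogue system would need to enforce those rules as well.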

Jay Walsh, WMF

 * Wikimedia Foundation's head of communications since January 10, 2008.
 * Previously with Canadian Broadcasting Corporation


 * Sent link to Project INDECT and the leaked video
 * Put situation/hypothetical situation in context
 * Following from emails:
 * My query (in reference to Wikimedia going https://-only):
 * With the WMF having a strong stance on user privacy, I have to ask: how much might that actually cost? What sort of indication that worst-case levels of data collection were being carried out would prompt the WMF to contemplate the switch (if it weren't such a funding nightmare)?
 * Response:
 * 'I don't think there's any serious consideration right now of us investing lots of money into a large scale, secure system beyond what's in place. As you can appreciate, there would be tens of thousands of users and right now we divert our limited financial resources into supporting the editors and of course the hundreds of millions of users who come to Wikipedia and our other projects every month.
 * 'I don't think anyone at the Foundation could speculate on the cost - we're not experts in this kind of infrastructure, but you can be sure we'd seek out open-source alternatives first (as virtually the entire software stack at WMF is open-source).
 * 'And it's also not really possible for us to determine what hypothetical worst case scenario would prompt us to think about increasing security. We would really have to examine any changes (I'm not aware of any intense examinations taking place right now) to this space and of course keep up to speed on what's happening in the industry overall.
 * 'As you know, we're opposed to collecting any more private data than necessary. We're proud of a vast, open project that epitomizes transparency and openness on the net. We only collect as necessary by law, and what we do collect we track, protect, and monitor very carefully.

The British Library

 * Website: http://www.bl.uk/
 * Press room: http://www.bl.uk/news/main.html
 * Contacts: http://www.bl.uk/news/prcontacts.html
 * Data Protection policy: http://www.bl.uk/aboutus/foi/pubsch/pubscheme5/BLData%20Protection%20Policy2002.pdf
 * Information Charter: http://www.bl.uk/aboutus/foi/infcharter/index.html
 * "The Library regularly reviews its systems and processes to ensure that when we ask you for personal information, we:

• make sure you know why we need it;

• ask for what we need and do not collect irrelevant information;

• protect it from loss, damage or unauthorised access;

• let you know if we share it with other organisations;

• make sure we don’t keep it longer than necessary; and

• will not use your personal information for marketing without your permission."


 * (Emphasis added)


 * Submitted query to press assistant from contacts page:
 * "Dear Chloé,

I am carrying out research for a possible news article I would like to pitch. As none of the specific press contacts listed appears appropriate, I assume you are the relevant contact and could direct my query accordingly.

What I am trying to establish is a general idea of record-keeping within libraries. The specific records of interest to me are details of people referring to, or borrowing, books from the library.

I have read the various principles and policies currently published on the British Library website; they refer repeatedly to UK data protection legislation, and to data on individuals only being retained for as long as is required for the functioning of the library.

For these "borrowing records", my interpretation is that, where a library user accesses a book, this should be recorded only for as long as is required to ensure the book is returned to the correct place in its collection. Of course, "book" could be replaced with any of the other media forms the library may have in its collection.

This would probably be best put into practice by building into the specification of the library catalogue management system the deletion of such "borrowing records" upon confirmation that the book has been returned to its rightful place.

That covers the present, and I believe I can generally assume this is the case for all computerised records of this nature.

However, I am at a loss to find details of how this may have been handled in the past.

Any advice or assistance you can give on this matter would be greatly appreciated.

Brian McNeil Wikinewsie"
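
The idea in the email above, building deletion into the catalogue management system itself, can be sketched with a database trigger so that no application code can forget the purge. This is a minimal illustration using Python's built-in `sqlite3` module; the `loans` table, its columns, the `reshelved` flag and the helper functions are hypothetical, and a real catalogue system would have its own schema and would also need the damaged and unreturned exceptions.

```python
import sqlite3

# Hypothetical schema: one row per active loan linking a patron to an item.
SCHEMA = """
CREATE TABLE loans (
    id         INTEGER PRIMARY KEY,
    patron_id  TEXT    NOT NULL,
    item_id    TEXT    NOT NULL,
    reshelved  INTEGER NOT NULL DEFAULT 0
);

-- Once staff confirm the item is back in its rightful place,
-- the database itself removes the patron-item link.
CREATE TRIGGER purge_on_reshelve
AFTER UPDATE OF reshelved ON loans
WHEN NEW.reshelved = 1
BEGIN
    DELETE FROM loans WHERE id = NEW.id;
END;
"""


def open_catalogue(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def lend(conn, patron_id, item_id):
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO loans (patron_id, item_id) VALUES (?, ?)",
            (patron_id, item_id),
        )
    return cur.lastrowid


def confirm_reshelved(conn, loan_id):
    with conn:
        # Setting the flag fires the trigger, which deletes the row.
        conn.execute("UPDATE loans SET reshelved = 1 WHERE id = ?", (loan_id,))
```

Putting the rule in the database rather than in the web front end means every client that records a return gets the same deletion behaviour.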

Costs
Very difficult to formulate search queries that return ball-park figures for the cost of serving https instead of http.


 * Query posted on Ubuntu forums
 * "Support for hardware-based web server encryption

I'm doing some research into the impact of running a website exclusively over https instead of unencrypted http. I am not having much luck getting easily understood answers from which to make credible estimates of the cost of this, and of how that scales with a setup more complex than a single web server.

The first, and simplest, question is:

1. If you compare a single web server running https with one running http on the same hardware, what is the impact on performance? (For example, for every 100 users/requests the http server supports, the https server might only support 85.)

2. Is there crypto hardware supported by Linux that would make question 1 a non-issue?

3. If a site is scaled and distributed using squid caches, what are the repercussions of processing requests exclusively via https? Does this change if the site usage includes a significant number of user-supplied content updates?

Any help with this would be very, very useful.

Thanks!"
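
Once a figure for question 1 is known, turning it into extra hardware is simple arithmetic: a farm of N servers that each manage only k requests over https for every 100 over http needs ceil(N × 100 / k) servers to keep the same capacity. A minimal sketch, assuming a hypothetical helper name and treating the "85 per 100" figure from the post purely as an illustrative input, not a measured value:

```python
def extra_servers_for_https(http_servers: int, https_per_100: int) -> int:
    """Additional servers needed to keep http-level capacity over https.

    https_per_100: requests one server handles over https for every 100
    it handles over http (an assumed input, e.g. 85 in the example above).
    """
    if not 0 < https_per_100 <= 100:
        raise ValueError("https_per_100 must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-http_servers * 100 // https_per_100)
    return needed - http_servers
```

With the illustrative 85-per-100 figure, a ten-server farm would need two more servers; the real ratio is exactly the unknown the post is asking about, and question 3 (caching) could change it considerably.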

Wikipedia - Infrastructure

 * Hardware and hosting report - Domas Mituzas (last updated: August 2006, only lists new purchases)
 * Servers page on Meta (Outdated warning)
 * Wikitech Main page
 * Wikimania 2009 presentation - Rob Halsell
 * Nagios
 * Ganglia
 * Torrus - Squids
 * These look very interesting and cool

IBM - Approach

 * Search for hardware crypto acceleration turns up IBM as one of a very small number of solutions.
 * Pressroom, Steve Malkiewicz - IBM Research and Technology
 * Submitted query for information:
 * "Dear Mr Malkiewicz,

In researching details for a quite complex story I am working on, I found IBM to be one of only a handful of companies that came up as a positive match for a particular piece of equipment, namely hardware-based encryption.

Judging by the page I located, IBM appears to have researched details that would be useful to my work.

The premise I am working from is: considering the AT&T Warrantless Wiretapping scandal, its First Amendment implications, and the intense global research effort into search and other data mining technology, a wide range of popular websites would best serve their users' First Amendment privacy rights by using encryption, serving up pages over https:// instead of http://

I have made a number of inquiries with the Library of Congress, and prominent University librarians, on their reaction to this.

However, this itself needs to be put in context, essentially by expressing in some way the cost of switching to encrypted web access, or the reduction in capacity it causes.

Taking Library of Congress web services as an example, what would be the contrast in costs for an IBM solution using encryption versus one not using encryption?

I had also hoped to look into cost details for highly popular sites such as Wikipedia and Facebook.

Both these example sites rely almost exclusively on commodity hardware, Linux, and Open Source Software. Neither could even begin to contemplate a from-scratch solution.

I could not confirm that crypto hardware is available that would plug into the systems they have and be supported under Linux; researching this is what, neatly, brought me to the IBM website.

This, as I understand it, leaves the choice for such sites as software encryption. If such were implemented, the capacity of the site would be reduced, or additional servers would be needed to maintain current capacity.

Would you be able to help put numbers into all of this? As mentioned earlier, I certainly have reason to believe IBM has carried out serious research into this to establish where your hardware encryption solution is most appropriate for clients.

Regards,

Brian McNeil Wikinewsie" --Brian McNeil / talk 22:09, 10 October 2009 (UTC)

Library records - Threats - Related documents

 * The Freedom to Read Statement - American Library Association (originally published and ratified in 1953, linked-to version has been updated)
 * Public Libraries and Intellectual Freedom - American Library Association, undated.
 * "User confidentiality and privacy: Failure to protect the rights of library users to utilize library materials and services privately can limit the practical exercise of First Amendment rights. Incursions of this sort—which are illegal in many jurisdictions— have been attempted by the FBI, police agencies, marketing firms, religious missionaries, the press, and others. Protecting confidentiality in an automated, networked environment provides new challenges for libraries. Security becomes an issue not only to insure the integrity of the library’s own databases and computer operating systems, but also to protect users’ privacy rights."


 * "RESOLVED, That the American Library Association urges all libraries to adopt and implement patron privacy and record retention policies that affirm that "the collection of personally identifiable information should only be a matter of routine or policy when necessary for the fulfillment of the mission of the library" (ALA Privacy: An Interpretation of the Library Bill of Rights)"


 * Columbia University’s Mathematics and Science Library, Acting University Librarian, Paula Kaufman, refused to cooperate with FBI


 * Book: Surveillance in the stacks : the FBI's library awareness program / Herbert N. Foerstel (1991) Offline (see p65).
 * Reference in National Library of Australia database - Prohibitively expensive excerpt/reproduction costs
 * Author/Library expert: Roy Tennant (2003 LITA/Library Hi Tech Award winner)
 * Journal: June Pinnell–Stevens, 2003. "Libraries, privacy, and government information after September 11," PNLA Quarterly, volume 67, number 4, pp. 6–9.
 * Periodical?: Pamela S. Richards, 2001. "Cold War librarianship: Soviet and American library activities in support of national foreign policy, 1946–1991," Libraries and Culture, volume 36, number 1, pp. 193–203.
 * Journal: Roy Tennant, 2003. "Patriotism as if our Constitution matters," Library Journal, volume 128, number 12 (July), p. 32.