Sunday, July 06, 2008

End of Science? End of Theory?

Chris Anderson has done it again, this time with an article about the end of theory. How? In short: raw data. In The End of Theory, Anderson argues that with massive data, the millennia-old scientific model of hypothesize, model, test is becoming obsolete.


Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the "beautiful story" phase of a discipline starved of data) is that we don't know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.

And according to Anderson, biology is heading in the same direction. What does this say about science and humanity? In February, the National Science Foundation announced the Cluster Exploratory, a program that funds research designed to run on a large-scale distributed computing platform developed by Google and IBM in conjunction with six pilot universities. The cluster will consist of 1,600 processors, several terabytes of memory, and hundreds of terabytes of storage, along with the software, including IBM's Tivoli and open source versions of Google File System and MapReduce.

Anderson's been right before. See The Long Tail and Free. But this one's just speculation, of course. Perhaps one commentator hit the mark when he said, "Yeah, whatever. We still can't get a smart phone with all the bells and whistles to be able to use any where in the world with over 2 hours worth of use and talk time...so get back to me when you've perfected all of that." Well said. Let's wait and see some more.

Tuesday, July 01, 2008

Catalogue 2.0

It's blogs like Web 2.0 Catalog that keep me going. Catalogues have been the crux of librarianship, from the card catalogue to the OPAC. But for libraries, the catalogue has always seemed to be a separate entity. It's as if there is a dichotomy: the Social Web and the catalogue -- and never the twain shall meet. What would a dream catalogue look like to me? I have 8 things I’d like to see, and none of them is beyond the stretch of imagination. Here they are:

(1) Wikipedia – What better way to get the most updated information for a resource than the collective intelligence of the Web? Can we integrate this into the OPAC records? We should try.

(2) Blog – “Blog-noting” as I call it. To a certain extent, some catalogues already allow users to scribble comments on records. But blog-noting allows users to actually write down reflections of what they think of the resource. The catalogue should be a “conversation” among users.

(3) Amazon.ca - Wouldn’t it be nice to have an idea of what a book costs out on the open market? And wouldn’t it make sense to throw in an idea of what a used copy would go for?

(4) Worldcat - Now that you know the price, wouldn’t it be useful to know which other libraries carry the book?

(5) Google-ability – OPAC resources are often online, but “hidden” in the deep web. Opening them up to search engines makes them that much more accessible (see the sitemap sketch after this list).

(6) Social bookmarking – If the record is opened to the Web, then it naturally makes sense to link it to Delicious, RefShare, and CiteULike (or similar bibliographic management services).

(7) Cataloguer’s paradise – Technical services staff are often hidden in the pipelines of the library system, their work often unrecognized. These brave men and women should have their profiles right on the catalogue, for everyone to see and enjoy. Makes for good outreach, too. (Photo is optional.)

(8) Application Programming Interface - APIs are sets of declarations of the functions (or procedures) that an operating system, library, or service provides to support requests made by computer programs. They're the interoperable sauce that adds taste to web services. APIs are the crux of Web 2.0, and they will be important for the Semantic Web when the Open Web finally arrives. OPACs need to explore APIs in detail, as ways to integrate different programs and provide open data for others to reuse. A sketch follows this list.
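Item (8) is less exotic than it sounds: the library world already has a protocol for it. SRU (Search/Retrieve via URL) exposes catalogue searches as plain HTTP requests that return XML. A minimal sketch, with a hypothetical host name:

  http://opac.example.org/sru?operation=searchRetrieve&version=1.1&query=dublin&maximumRecords=1

  <searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
    <version>1.1</version>
    <numberOfRecords>1</numberOfRecords>
    <records>
      <record>
        <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
        <recordData>
          <!-- the bibliographic record itself, e.g. in Dublin Core or MODS -->
        </recordData>
      </record>
    </records>
  </searchRetrieveResponse>

And item (5), Google-ability, is largely a matter of handing crawlers a list of stable record URLs, for example via the Sitemaps protocol (the URLs below are invented for illustration):

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>http://opac.example.org/record/1001</loc></url>
    <url><loc>http://opac.example.org/record/1002</loc></url>
  </urlset>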

Are these ideas out of the realm of possibility? Your thoughts?

Monday, June 23, 2008

Seth Godin at SLA in Seattle

Seth Godin is a best-selling author, entrepreneur, and agent of change. He is the author of Permission Marketing, a New York Times best seller that revolutionized the way corporations approach consumers. Fortune named it one of the Best Business Books, Promo magazine called Godin "The Prime Minister of Permission Marketing," and Business Week dubbed him the "ultimate entrepreneur for the information age."

Best known as the author of books such as Unleashing the Ideavirus, Purple Cow, and Permission Marketing, Godin writes one of the most popular blogs in the world and helped create Squidoo, a popular network of user-generated lenses -- single pages that highlight one person's point of view, recommendations, or expertise. According to Godin, the way marketing works now is not by interrupting large numbers of people; rather, it is by soliciting a small segment of rabid fans who will eagerly spread the word about one's idea. The challenge is how to engage each person to go and bring five friends. What tools do we give them so that they can reach out to colleagues? A website like Zappos is so successful not because it sells shoes, but because it connects consumers to products, and then encourages consumers to spread the word to their friends and colleagues -- and hence, more consumers.

In this new era of permission marketing, spamming no longer works. The new models of success are services such as PayPal, which connects users to products, or Sonos, which engages users as customers by recreating data into knowledge and producing a conversation with the web as its platform. "Be remarkable," Godin argues, and "tell a story to your sneezers" so that they can spread the word and "get permission" from consumers for their attention to the product. Godin concluded with a controversial assertion. "Books are souvenirs," he said, to a hushed audience. Most people now find everyday facts and information in digital documents. "When was the last time you got your information from a book?" Although Godin might have made a gross generalization, his assertion of the divergence between the digital and the physical is a reality. In the Web 2.0 world, our enemy is obscurity, not piracy.

Together, Abram and Godin's sessions at SLA 2008 in Seattle were both rewarding experiences. They ultimately propose that information professionals need to shift their mentality from one of passivity to one of actively promoting themselves, of engaging information services in new ways, and of accepting change with an open mind.

Thursday, June 19, 2008

Stephen Abram at SLA in Seattle

Day #2 of SLA was full of fascinating discussions. Stephen Abram's session, "Reality 2.0 - Transforming Ourselves and Our Associations," offered the most thought-provoking ideas - definitely the highlight of my experience at this conference.

For those who don't already know, Stephen Abram is the 2008 President of SLA and a past President of the Canadian Library Association. He is Vice President of Innovation for SirsiDynix and Chief Strategist for the SirsiDynix Institute.

Here's a flavour of the key points that gave me food for thought:

(1) What's wrong with Google and Wikipedia? - It's okay for librarians to refer to Google or Wikipedia. Britannica has a 4% error rate; Wikipedia has a 4% error rate, plus tens of thousands more entries. It's not wrong to start with Wikipedia and Google; it's only wrong when we stop there.

(2) Don't dread change - This is perhaps the whiniest generation this century. The generation that dealt with two world wars and a depression did fine learning new tools like refrigerators, televisions, radios, and typewriters. And they survived. Why can't we? Is it so hard to learn to use a wiki?

(3) Focus! - We need to focus on the social rather than the technology. Wikis, blogs, and podcasts will come and go. But connecting with users won't. We must not use technology just for the sake of catching up. There has to be a reason to use them.

(4) Don't Be Anonymous - Do we give our taxes to a nameless accountant? Our teeth to a nameless dentist? Our hearts to a surgeon with no name? If those professionals don't hide, why are information professionals hiding behind their screens? Go online! Use social networking as your tool to reach out to users!

(5) Millennials - This is perhaps the first generation in human history in which the young teach their elders. But though there is much to learn from youth about technology, there is also much mentoring and training to be done if this profession is to prosper and flourish.

(6) Change is to come! - Expect the world to be even more connected than it already is. With the switch to HDTV, more cables are freed up for telecommunications. Google's endgame is to provide wireless access through electricity. There are already laser keyboards that let you type on any surface. The world is changing. So must information professionals.

(7) Build paths, not barriers - When pedestrians wear paths across the grass, libraries commonly erect fences to prevent the walking. Why not pave the path that already exists, so that the library becomes more accessible? Librarians must go to the user, not the other way around. If patrons are using Facebook, then librarians need to use it as a channel for communication.

Stephen's PowerPoint presentation is here for your viewing pleasure as well.

Tuesday, June 17, 2008

SLA Day #1

Just when one thought that bibliographic control had finished changing, it might change some more. On Day 1 of SLA in Seattle, I went to a fascinating session given by José-Marie Griffiths called On the Record: Report of the Library of Congress Working Group on the Future of Bibliographic Control, which offered a multifaceted glimpse into the current situation of bibliographic control and cataloguing. What is intriguing about this working group is that it comprises both the library world and the private sector. Led by a tri-membership of Google, the American Library Association, and the Library of Congress, the working group created a document proposing five general recommendations: 1) increasing efficiency; 2) enhancing access; 3) positioning technology; 4) positioning the community for the future; and 5) strengthening the profession.

What is controversial about the proposal is the suspension of Resource Description and Access (RDA). Not only does the working group believe that RDA is too confusing and difficult to implement; it also believes RDA requires much more testing. The report further calls for more continuing education in bibliographic control for professionals and students alike. Only by designing an LIS curriculum and building an evidence base for LIS research can the profession be strengthened for the future.

Although the session had a fairly spare audience, I found it highly engaging and perhaps even ominous for the future of librarianship. Because the Library of Congress accepted the report with support (although unofficially), this could mean a schism in the progress of RDA, which is viewed as the successor to AACR2. And because this working group included the non-library world (i.e., Google and Microsoft), the future of bibliographic control won't be limited to librarians. Rather, it will involve input from the private sector, including publishers, search firms, and the corporate world. Is this a good thing? Time will tell. For better or for worse.

Thursday, June 12, 2008

B2B in a World of Controlled Vocabularies and Taxonomies

The e-readiness rankings have been released, and they reveal that the US and Hong Kong are the leaders in e-readiness. How do you measure it? According to the Economist Intelligence Unit, connectivity is one measure of e-readiness: digital channels that are plentiful, fast, and reliable enough for a country's people and organizations to make the most of the Internet are the basic infrastructure. But if individuals and businesses do not find the available channels useful in completing transactions, then the number of PCs or mobile phones in a country is a worthless measure.

Hence, the EIU framed its findings around the opportunities that a country provides businesses and consumers to complete transactions. Market analyst Forrester estimates that online retail sales in the US grew by 15% in 2006; US$44 billion was spent online in the third quarter, and the firm estimates that 2006 online sales in the Christmas holiday season alone reached US$27 billion. Another research firm, IDC, estimates that business-to-business (B2B) transaction volume in the US will reach US$650 billion by 2008, which amounts to two-thirds of the world's US$1 trillion B2B market by that time.

Even though there is concern that the great weight of the US in online activity takes away from the rest of the world, its online adoption also benefits other countries. China is one beneficiary of the growth of B2B volumes in the US, so much so that some sizeable and sophisticated B2B transaction service providers have sprung up there, including one of the world's largest online B2B marketplaces, Alibaba.

Over 15 million business and consumer customers in China use Alibaba's online platform. While most do not pay to use the basic services, more than 100,000 businesses do. In fact, Yahoo! bought a 40% stake in Alibaba for US$1 billion in 2005. The Chinese firm is evolving into a comprehensive supplier of online business development resources for Chinese customers, many of whom would not be doing business online at all if not for Alibaba.

What does this mean for information professionals? A great deal. Look at the financial implications of B2B on the current telecommunications infrastructure: we're essentially running the online and digital economy on the bricks and mortar of outdated networks. We're in a good position to take advantage of this upcoming economy.

Thursday, June 05, 2008

Talis on Web 2.0, Semantic Web, and Web 3.0

I was honoured to have been interviewed by Richard Wallis of Talis. I was also quite humbled by the whole experience, as I learned just how far I've come in my understanding of the SemWeb and how much further I have to go. We had a good chat about Web 2.0, the Semantic Web, and Web 3.0. Have a listen to the podcast; any comments are welcome. For those who want a synopsis of what we discussed, here is my distilled version:

1. Why librarians? - Librarians have an important role to play in the SemWeb. Information organization is a set of traits and skills librarians already have, and it is directly relevant to the SemWeb architecture. Cataloguing, classification, indexing, metadata, taxonomies & ontologies -- these are the building blocks of LIS.

2. What will the SemWeb look like? - Think HDTV. I believe the SemWeb will be a seamless transition, one that will be led by innovators - companies and individuals that will pave the way with the infrastructure for it to happen, yet at the same time will not alienate those who don't want to encode their applications and pages with SemWeb standards. But like HDTV, those who fall behind will realize that they'll eventually need to convert...

3. Is this important right now? - Not immediately. The SemWeb might have minimal effect on the day-to-day work of librarians, but the same could be said for computer programmers and software engineers. Right now, we are all waiting for the killer application that will drive home the potential of the SemWeb. Until that transpires, there will be much speculation and skepticism.

4. What do librarians need to do? - Learn XML, join the blogosphere's discussion of the SemWeb, talk with colleagues, pay attention to RDA, and keep questioning the limitations of Web 2.0. Just because we don't see it yet doesn't mean we should stay out of the discourse. Think string theory. (For a taste of what encoding pages with SemWeb standards can look like, see the sketch below.)
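One emerging option for that encoding is RDFa, currently a W3C draft, which lets you embed Dublin Core (or any RDF vocabulary) directly in XHTML. A minimal sketch; the title, author, and date values are made up for illustration:

  <div xmlns:dc="http://purl.org/dc/elements/1.1/" about="">
    <h1 property="dc:title">My Dream Catalogue</h1>
    <p>By <span property="dc:creator">A. Librarian</span>,
       posted <span property="dc:date">2008-06-05</span>.</p>
  </div>

An RDFa-aware parser can extract machine-readable statements from this ("this page has title X, creator Y, date Z") while browsers simply render the text.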

Wednesday, June 04, 2008

Easterlin Paradox of Information Overload

According to Wikipedia, the Easterlin Paradox is a key concept in happiness economics. Theorized by economist Richard Easterlin in the 1974 paper "Does Economic Growth Improve the Human Lot? Some Empirical Evidence," it proposes that, contrary to expectation, economic growth doesn't lead to greater happiness. The idea quickly caught fire: Easterlin became famous, and the paradox became a social science classic, cited in academic journals and the popular media. As the New York Times put it, the Easterlin Paradox tapped into a near-spiritual human instinct to believe that money can’t buy happiness. Although there have been attempts to debunk the Easterlin Paradox, I believe the concept applies quite well to Web 2.0 and the information overload it has brought to the current state of the Web.

As one information expert has put it, Web 2.0 is about searching; Web 3.0 will be about finding. Well said. That is exactly the problem with Web 2.0. There is a plethora of excellent, free, and very useful tools out there - blogs, wikis, RSS feeds, mashups - but at what point does it become too much? Recently, I noticed that my Google Reader has gotten out of hand. I just can't keep up anymore. I skim and I skim and I skim. I'm pulling in a lot of information, but am I really processing it? Am I really happy with the overabundance of rich content in Web 2.0? Not really. Are you?

Tuesday, June 03, 2008

Semantic Web and Librarians At Talis

I've always believed that librarians should and will play a part in the rise of the Semantic Web and Web 3.0. I've gone into the theory and conceptual components, but I haven't discussed much about the practical elements of how librarians will realize this. Meet Talis. Besides its contributions to the blogosphere, Talis has recently dipped into publishing with the inaugural issue of Nodalities: The Magazine of the Semantic Web. It's a wonderful read - take a look.

How did Talis come about? It's been in the works for quite a while, and it's worth noting how it came to be. In 1969, a number of libraries founded a small co-operative project based in Birmingham to provide services that would help the libraries become more efficient. The project was known as the Birmingham Libraries Cooperative Mechanisation Project, or BLCMP. At the time, the concept of automation was so new that the term mechanisation was often used in its place.

BLCMP began by building a co-operative catalogue of bibliographic data, a database that now contains many millions of records. It moved to microfiche and later to IBM mainframes with dedicated terminals at libraries in the mid-seventies, and it was one of the first library automation vendors to provide a GUI on top of Microsoft Windows for a better end-user interface. The integrated library system was first called Talis; Talis became the name of the company during restructuring, and the ILS became known as Alto. In 1995, Talis was the first library systems vendor to produce a web-enabled public access catalogue. Much of Talis' work now focusses on the transition of information to the web, specifically the Semantic Web, and Talis has led much of the debate about how Web 2.0 attitudes affect traditional libraries.

How does this involve librarians? This ambitious Birmingham-based software company began life as a university spin-off. For many years it was a co-operative owned by its customers (a network of libraries), but in 1996 it was restructured as a commercial entity. It has a well-established pedigree of supplying large-scale information management systems to public and academic libraries in the UK: in fact, more than 60% of UK public libraries now use the company's software, which benefits some 9 million library users. In 2002, the company embarked on Talis 2.0, a change programme to take advantage of "the next wave of technology" (Web 2.0 and the Semantic Web). In the year ending March 2004, turnover was £7.5m with profits of £226,000. Who says librarians can't make a buck, right?

Saturday, May 31, 2008

Introducing WebAppeal

There are some good Web 2.0 applications and websites. Then there is WebAppeal. The web service is based on the principle of 'Software as a Service' (SaaS), which is rapidly gaining popularity. The rise of innovative online applications is making traditional, expensive software unnecessary. Examples of successful web applications are the video service YouTube and the free music service Last.fm. To bring some structure and insight into these ever-growing technologies, http://www.appappeal.com/ informs consumers as comprehensively as possible about all the possibilities SaaS web applications have to offer.

Although we're in the age of Web 2.0, one of the main challenges remains information overload. More information does not necessarily mean more knowledge. That's why I find AppAppeal to be a convincing website: it provides insightful reviews of applications and indexes them according to utility. All applications are organized into categories such as "Blogging", "Personal Finance" and "Wiki Hosting". The website is still being developed; soon, tools will be added to create an interactive community around web-based applications.

There are already Web 2.0 review sites such as Mashable, All Things Web 2.0, and Bob Stumpel's Everything 2.0. But WebAppeal goes one step further: it analyzes the advantages and disadvantages of particular applications and provides demo videos. I really like this website. It's a good complement to a project that Rex Turgano and I are collaborating on: Library Development Camp, which not only reviews Web 2.0 applications but also offers trial accounts for users to try out different applications. Together, they pack a great punch. Stay tuned. More to come. . .

Thursday, May 29, 2008

Day 4 of TEI/XML Bootcamp

Day 4 has come and gone. What did I learn? XML is not easy. Programming is tough business, not for the faint of heart or mind. The main challenge, the one that made my head spin, was learning the complexities behind XHTML and XSLT. XHTML is a powerful tool for the construction of the Semantic Web. Most people are acquainted with the "meta" tags that can embed metadata about a document as a whole, but there are more powerful, granular techniques available too. Although largely unused by web authors, XHTML and XSLT offer numerous facilities for introducing semantic hints into markup, allowing machines to infer more about a web page's content than just the text. These tools include the "class" attribute, used most often with CSS stylesheets. A strict application of these can allow data to be extracted by a machine from a document intended for human consumption.

Although there have been several proposals for embedding RDF inside HTML pages, the technique of using XSLT transformations has a much broader appeal: not everyone is keen to learn RDF, which presents a barrier to the creation of semantically rich web pages. Using XSLT gives web developers a way to add semantic information with minimal extra effort. Dan Connolly of the W3C has conducted quite a number of experiments in this area, including HyperRDF, which extracts RDF statements from suitably marked-up XHTML pages. A minimal sketch of the idea follows.
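To be clear, this is my own toy illustration, not Connolly's actual HyperRDF code, and the class-name convention (dc-title, dc-creator) is invented. Given XHTML spans like

  <span class="dc-title">Weaving the Web</span>
  <span class="dc-creator">Tim Berners-Lee</span>

a small stylesheet can lift those values into RDF statements about the page:

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xhtml="http://www.w3.org/1999/xhtml"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
    <xsl:template match="/">
      <rdf:RDF>
        <rdf:Description rdf:about="">
          <!-- each specially classed span becomes one RDF property -->
          <xsl:for-each select="//xhtml:span[@class='dc-title']">
            <dc:title><xsl:value-of select="."/></dc:title>
          </xsl:for-each>
          <xsl:for-each select="//xhtml:span[@class='dc-creator']">
            <dc:creator><xsl:value-of select="."/></dc:creator>
          </xsl:for-each>
        </rdf:Description>
      </rdf:RDF>
    </xsl:template>
  </xsl:stylesheet>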
What can librarians do? Resource Description and Access (RDA) is just around the corner, and there is much buzz (good and bad) that it's going to change the way librarians and cataloguers think about information science and librarianship. I encourage information professionals to be aware of the changes to come. Although most won't be involved directly with the Semantic Web, they can keep abreast of developments, particularly the exciting ones in information organization and classification. Workshops and presentations about RDA are out in droves. Pay attention. Stay tuned. There could be relevancy in these new developments that spills over into the SemWeb.

Tuesday, May 27, 2008

The Digital Humanities

I am on Day 2 of the Digital Humanities Summer Institute. Prior to this workshop, I had no inkling of what the digital humanities were. Not anymore. The Digital Humanities, also known as Humanities Computing, is a field of study, research, teaching, and invention concerned with the intersection of computing and the disciplines of the humanities. It is methodological by nature and interdisciplinary in scope, involving the investigation, analysis, synthesis, and presentation of knowledge using computational media. The DHSI provides an ideal environment in which to discuss, learn about, and advance skills in the new computing technologies influencing the work of those in the Arts, Humanities, and Library communities.

I'm currently taking Text Encoding Fundamentals and their Application at the University of Victoria from May 26–30, 2008, taught by Julia Flanders and Syd Bauman, experts in the Text Encoding Initiative (TEI), an XML language whose community collectively develops and maintains a standard for the representation of texts in digital form, specifying encoding methods for machine-readable texts. (A minimal example follows.) And it has been a blast. This is the seventh year of the DHSI's existence, and already it has gained the attention of academics and librarians across the world.
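For the uninitiated, a TEI document is more approachable than its reputation suggests. Here's a minimal sketch in the TEI P5 style (the title and text are invented for illustration):

  <TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title>A sample encoded text</title>
        </titleStmt>
        <publicationStmt>
          <p>Unpublished workshop exercise.</p>
        </publicationStmt>
        <sourceDesc>
          <p>Born digital.</p>
        </sourceDesc>
      </fileDesc>
    </teiHeader>
    <text>
      <body>
        <p>The quick brown <emph>fox</emph> jumps over the lazy dog.</p>
      </body>
    </text>
  </TEI>

The header carries the metadata librarians care about; the body carries the encoded text itself.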

The DHSI takes place across a week of intensive coursework, seminar participation, and lectures. It brings together faculty, staff, and graduate-student theorists, experimentalists, technologists, and administrators from different areas of the Arts, Humanities, Library, and Archives communities and beyond to share ideas and methods, and to develop expertise in applying advanced technologies to activities that impact teaching, research, dissemination, and preservation. What have I learned so far? Lots. But most of all, just how big a part XML plays in the Semantic Web. More on that in the next posting . . . stay tuned.

Friday, May 23, 2008

One Million Dollar Semantics Challenge and API

The SemanticHacker $1 Million Innovators’ Challenge and a new open API for semantic discovery have recently been launched by TextWise, LLC. The Challenge lets developers showcase the power of TextWise’s patented Semantic Signature® technology and accelerate the development of breakthrough applications.

The Challenge provides incentives to encourage the creation of software prototypes and/or business plans that demonstrate commercial viability in specific industries. Are you up to it? Go to Semantichacker.com to experience the technology first-hand in the demo and learn more about how to enter the $1 million challenge.

But what are Semantic Signatures®? They identify concepts and assign them weights; in other words, they're the ‘DNA’ of documents, highly effective at describing what the documents are ‘about.’ Semantic Signatures® enable Web publishers and application developers to automatically embed consistent, semantically meaningful tags within their content for use in classification, organization, navigation, and search. A hypothetical illustration follows.
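TextWise doesn't publish the format of its signatures, so the following is purely my own invention to illustrate the concept-weighting idea (the element names, labels, weights, and URL are all made up):

  <!-- hypothetical concept-weight signature for one document -->
  <signature doc="http://example.org/articles/opac-apis.html">
    <concept label="library catalogues" weight="0.41"/>
    <concept label="web services" weight="0.33"/>
    <concept label="metadata" weight="0.18"/>
    <concept label="open data" weight="0.08"/>
  </signature>

The point is that a compact, weighted list of concepts can stand in for a whole document when classifying, organizing, or searching.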

In many ways, that's exactly what librarians can offer in terms of information structuring and organization. Interestingly, TextWise's technology will have a spot at the Semantic Technology Conference in San Jose on May 21, 2008. I won't be able to attend, but if you are, could you give a write-up? I would be forever in your debt.

Thursday, May 22, 2008

Dublin Core is Dead, Long Live MODS

Jeff Beall wrote an article called Dublin Core: An Obituary, in which he asserts that the Dublin Core Metadata Initiative is a failed experiment and that MODS is the way to go. And this was back in 2004! What is MODS? The Library of Congress' Network Development and MARC Standards Office, with interested experts, developed a schema for a bibliographic element set that may be used for a variety of purposes, particularly library applications. As an XML schema, it is intended to carry selected data from existing MARC 21 records as well as to enable the creation of original resource description records.

The "Metadata Object Description Schema (MODS)" includes a subset of MARC fields and uses language-based tags rather than numeric ones, in some cases regrouping elements from the MARC 21 bibliographic format. MODS is expressed using the XML Schema language of the World Wide Web Consortium, and the standard is maintained by the Network Development and MARC Standards Office of the Library of Congress with input from users.

Here's what MODS can do that the Dublin Core can't (a sample record follows the list):
1. The element set is richer than Dublin Core
2. The element set is more compatible with library data than ONIX
3. The schema is more end user oriented than the full MARCXML schema
4. The element set is simpler than the full MARC format
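To make the comparison concrete, here is a minimal MODS record for Beall's own article (a sketch only; a real record would carry more fields):

  <mods xmlns="http://www.loc.gov/mods/v3">
    <titleInfo>
      <title>Dublin Core: An Obituary</title>
    </titleInfo>
    <name type="personal">
      <namePart>Beall, Jeff</namePart>
      <role><roleTerm type="text">creator</roleTerm></role>
    </name>
    <typeOfResource>text</typeOfResource>
    <originInfo>
      <dateIssued>2004</dateIssued>
    </originInfo>
  </mods>

Notice the readable, language-based tags where MARC would use numeric fields like 245 or 100.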

In my article in the Semantic Report, I argue that the DCMI is potentially relevant to the SemWeb because implementations of Dublin Core not only use XML but are also based on the Resource Description Framework (RDF) standard. The Dublin Core is an all-encompassing project maintained by an international, cross-disciplinary group of professionals from librarianship, computer science, text encoding, the museum community, and other related fields of scholarship and practice. As part of its Metadata Element Set, the Dublin Core implements metadata tags such as title, creator, subject, access rights, and bibliographic citation, using RDF and RDF Schema. For contrast with the MODS record above, a Dublin Core description of the same article in RDF/XML might look like the sketch below.
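Again a sketch, with a made-up URI for the resource:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="http://example.org/beall-obituary">
      <dc:title>Dublin Core: An Obituary</dc:title>
      <dc:creator>Beall, Jeff</dc:creator>
      <dc:date>2004</dc:date>
      <dc:type>Text</dc:type>
    </rdf:Description>
  </rdf:RDF>

Flatter and simpler than MODS, which is precisely Beall's complaint and the DCMI's selling point.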

So will the Dublin Core’s role in knowledge representation be significant in the emergence of the SemWeb? So far, MODS hasn't done the job, even though its proponents claim it can. Is this situation similar to the ancient Chinese period of the Hundred Schools of Thought? Who will win in the end? Or which ones? Perhaps the opportunities and possibilities are far greater than a narrow search for one path to absolute knowledge. So we march on . . .

Tuesday, May 20, 2008

Post-modern business in the Free World - Open Access & Librarians

I came across this interesting article from the Vancouver Sun, Post-modern business model: It's free. Videogame company Nexon has been giving away its online games for free and making its revenue from selling digital items that gamers use for their characters. Garden says his business is as much about psychology as it is about game design: it’s no good to sell a bunch of cool designer threads to a character who is isolated in a game, because no one will see how good he looks.
Free games can have a dozen different revenue models, from Nexon’s microtransactions to advertising, product placement within a game, power and level upgrades, or downloadable songs. As for the question of videogames (or any other digital product) being offered to consumers for free, much of Nexon's approach is based on Chris Anderson's "free" concept.

“No one says you can’t make money from free." What does this mean for libraries, especially since the mandates and goals of most libraries aren't about making money? The possibilities are there. A great number of libraries are already dipping into open access initiatives, particularly at a time when database vendors and publishers are charging arms, legs, and first-borns. With Web 2.0 technologies forming an important foundation for digital and virtual outreach opportunities, and the SemWeb on the horizon, I encourage librarians and information professionals to put on their thinking caps and think together in a collaborative environment, to break down the silos of information gathering and move towards information sharing.