Monday, June 01, 2009

The Semantic Way

PricewaterhouseCoopers has just come out with an important document forecasting Semantic Web technologies. PWC has usually churned out fairly solid research on business knowledge management and best practices, and this particular publication is worthy of a close reading. Its feature article, "Spinning a Data Web," offers an in-depth yet concise look into the technologies behind the SemWeb, one of which LIS professionals should take heed, as many of the concepts are relevant to our profession. Why? Here are the main points that I find most important for us as we move ahead in the race to the Semantic Web.

(1) Linked Data Initiative - In order for the Web to move beyond a messy, siloed, and unregulated frontier, the SemWeb will require a standards-based approach, one in which data on the Web is published in interchangeable formats. By linking data together, one could find and take pieces of data sets from different places, aggregate them, and use them freely and accessibly. Because of this linking of data, the Web won't be limited to just Web-based information, but will ultimately extend to the non-Web-based world. To a certain extent, we are already experiencing this with smart technologies. Semantic technologies will help us extend this to the next version of the Web, often ambiguously dubbed Web 3.0.

(2) Resource Description Framework - RDF is key to the SemWeb as it allows for the federation of Web data and standards, one which, commonly serialized in XML, solves problems that a two-dimensional relational database world cannot. RDF provides a global and persistent way to link data together. RDF isn't a programming language, but a method (a metaphorical "container") for organizing the mass of data on the Web, while paving the way for a fluid exchange of different standards on the Web. In doing so, data are not in cubes or tables; rather, they're in triples - subject-predicate-object combinations that provide for a multidimensional representation and linking of the Web, connecting nodes in otherwise disparate, siloed networks.
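To make the triple idea concrete, here is a minimal sketch in plain Python. It is not RDF tooling and it is not taken from the PWC article: the `http://example.org/` identifiers, the predicate names, and the `objects` helper are all assumptions made up for illustration. It only shows that when two data sets use the same identifier for a thing, combining them is a simple union, and a link can then be followed from one set into the other.

```python
# Each triple is a (subject, predicate, object) tuple.
# All identifiers below are hypothetical, for illustration only.
EX = "http://example.org/"

# Data set 1: a catalogue record.
library_data = {
    (EX + "book/1", EX + "title", "Weaving the Web"),
    (EX + "book/1", EX + "author", EX + "person/tbl"),
}

# Data set 2: published separately, describing people.
people_data = {
    (EX + "person/tbl", EX + "name", "Tim Berners-Lee"),
}

# Both sets use the same identifier for the person, so merging is a set union.
graph = library_data | people_data


def objects(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# Follow a link across the two original data sets: book -> author -> name.
author = objects(graph, EX + "book/1", EX + "author")[0]
author_names = objects(graph, author, EX + "name")
print(author_names)
```

Nothing here depends on a table layout: adding a new kind of statement just means adding another tuple with a new predicate.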

(3) Ontologies and Taxonomies - LIS and cataloguing professionals are familiar with these concepts, as they often form the core of their work. The SemWeb moves from a taxonomic to an ontological world. While ontologies describe relationships in an n-dimensional manner, easily allowing information to be viewed from multiple perspectives, taxonomies are limited to hierarchical relationships. In an RDF environment, ontologies provide a capability that extends the utility of taxonomies. The beauty of an ontology is that it can be linked to another ontology to take advantage of its data in conjunction with your own. Compared with this linkability, taxonomies are clearly limited, as they are classification schemes that primarily describe part-whole relationships between terms. Ontologies are the organizing, sense-making complement to graphs and metadata, and mapping among ontologies is how domain-level data become interconnected over the data Web.
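The contrast can be sketched roughly in plain Python. Everything in this example is an assumption for illustration (the terms, the relationship names such as `writtenBy`, and the `mapping` between two vocabularies); real ontologies would be written in a language such as RDFS or OWL rather than as Python sets.

```python
# A taxonomy: each term has exactly one broader term (a strict hierarchy).
taxonomy = {
    "Novel": "Book",
    "Book": "Publication",
    "Journal": "Publication",
}


def broader_terms(term):
    """Walk up the hierarchy from `term` to the root."""
    path = []
    while term in taxonomy:
        term = taxonomy[term]
        path.append(term)
    return path


# An ontology-style description: any number of named relationships per term.
ontology = {
    ("Novel", "subClassOf", "Book"),
    ("Book", "subClassOf", "Publication"),
    ("Book", "writtenBy", "Author"),
    ("Book", "heldBy", "Library"),
}


def relations_of(term):
    """Return the (relationship, target) pairs that start at `term`."""
    return sorted((p, o) for s, p, o in ontology if s == term)


# A second, separately maintained ontology and a hypothetical term mapping.
other_ontology = {("Creator", "bornIn", "Place")}
mapping = {"Author": "Creator"}  # our "Author" corresponds to their "Creator"


def mapped_relations(term):
    """Relations that become available for `term` through the mapping."""
    target = mapping.get(term)
    return sorted((p, o) for s, p, o in other_ontology if s == target)


print(broader_terms("Novel"))
print(relations_of("Book"))
print(mapped_relations("Author"))
```

The taxonomy can only answer "what is this a kind of?", while the ontology-style set also records who wrote a book and where it is held, and the mapping lets a term borrow statements from someone else's vocabulary.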

(4) SPARQL and SQL - SPARQL overcomes the limits of SQL because graphs can receive and be converted into a number of different data formats. In contrast, SQL is limited by the rigidity of table structures. In constructing a SQL query, one has to have knowledge of the database schema; with the abstraction of SPARQL, this problem is eased, as developers can move from one resource to another. As long as the data that SPARQL reads is expressed in RDF, tapping into many data sources becomes possible. De-siloing data used to be impossible without a huge investment of time and resources; with semantic technologies, it becomes far more attainable.
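As a rough illustration of why a graph query needs no table schema, here is a toy pattern matcher in plain Python. It is not SPARQL and not a real query engine; the data, the `?variable` convention, and the `match` and `query` helpers are assumptions made up for this sketch. The patterns correspond loosely to a SPARQL query of the form `SELECT ?title ?name WHERE { ?book title ?title . ?book author ?person . ?person name ?name }`.

```python
# A tiny graph of (subject, predicate, object) triples; all values are made up.
graph = [
    ("book1", "title", "Weaving the Web"),
    ("book1", "author", "person1"),
    ("person1", "name", "Tim Berners-Lee"),
    ("book2", "title", "Untitled Report"),
]


def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`.

    Terms starting with "?" are variables; anything else must match exactly.
    Returns the extended binding, or None if the triple does not fit.
    """
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(graph, patterns):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


patterns = [
    ("?book", "title", "?title"),
    ("?book", "author", "?person"),
    ("?person", "name", "?name"),
]
rows = [(b["?title"], b["?name"]) for b in query(graph, patterns)]
print(rows)
```

The query names only the relationships it cares about. The second book, which has no author statement, simply drops out of the results, and new kinds of triples can be added to the graph without touching the query.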

(5) De-siloing the Web - This means that we would need to give up some degree of control over our own data if we wish to have a global SemWeb. This new iteration of the Web takes the page-to-page relationships of the linked document Web and augments them with linked relationships between and among individual data elements. By using ontologies, we can link to data we never included in the data set before, thus really "opening" up the Web as one large global database.