abistuckontheweb - Tumblr blog

abistuckontheweb · 8 years ago


What Does Generosity Have to Do With Linked Open Data?: A Lit Review

The following is a partial lit review of recent conversations surrounding Linked Open Data within the humanities with a focus on what generosity means in the context of the Semantic Web. I’ve experimented with form in this post by lifting key voices out of the text and styling them as block quotes in order to better visualize the range of opinions presented in this review. I have also set up a corresponding (open) annotated bibliography on Zotero which you can find here. The annotations can be found in both the Notes attached to each citation, as well as under Extra so that readers may quickly move through the citations, rather than having to open each Note separately. Where possible, I have processed the text of the essays, blog posts, and articles featured in this annotated bibliography using Voyant Tools and included up to ten tags of the most frequently used words.

The dream of the Semantic Web and the emergence of LOD

Popular buzz surrounding the Semantic Web has been going on since the early 2000s, later epitomized in a TED talk Tim Berners-Lee delivered in 2009. In his talk, Berners-Lee announced to the world that he needed some help revamping the World Wide Web: a world in which web documents and data coexist. What he asked for was a collective push towards making raw data available on the Web, by which he meant machine-readable data. The same year Berners-Lee, Christian Bizer, and Tom Heath published a paper in the International Journal on Semantic Web and Information Systems titled, “Linked Data - The Story so Far.” In it, they describe Linked Data as “a set of best practices for publishing and connecting structured data on the Web” which opens up the possibility of establishing a “global data space.” Where Open Data means data that is freely accessible on the web in non-proprietary form, Linked Open Data (or LOD) at its most basic is hyperlinked data, meaning data that references and connects to other data on the Web. Structurally, LOD is created through the use of URIs (Uniform Resource Identifiers), the vocabularies that identify and define relationships between resources (which make up web ontologies), and RDF (Resource Description Framework). If the World Wide Web is made up of documents, then the Semantic Web is a Web made up of data. For a complete technical overview on LOD, I recommend a look at linkeddata.org, the W3C, or the Berners-Lee and co. paper referenced above. There is still much confusion surrounding the relationship between the Semantic Web and Linked Open Data. Where some believe the Semantic Web and LOD to be one and the same, others understand the Semantic Web as made up of Linked (Open) Data -- this review subscribes to the latter.
The lack of consensus, however, is interesting and perhaps representative of the spirit of Linked Open Data in that it reflects both its charm and difficulty, that is, the nature of LOD’s conflicting ontologies and unregulated vocabularies.
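To make the structural description above a little more concrete, here is a minimal sketch of linked data as subject-predicate-object statements identified by URIs. It uses plain Python tuples rather than a real RDF library, and every URI, vocabulary term, and helper name in it (the example.org namespace, `authorOf`, `match`, `external_links`) is a hypothetical placeholder of my own, not something drawn from the sources reviewed here.

```python
# A minimal sketch: RDF-style statements as (subject, predicate, object) tuples.
# All URIs and vocabulary terms below are hypothetical placeholders.
EX = "http://example.org/"

triples = [
    (EX + "person/ada", EX + "vocab/authorOf", EX + "work/notes"),
    (EX + "work/notes", EX + "vocab/about", EX + "topic/analytical-engine"),
    # A statement whose object lives in someone else's dataset.
    (EX + "person/ada", EX + "vocab/sameAs", "http://other.example.net/people/42"),
]


def match(triples, s=None, p=None, o=None):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def external_links(triples, local_prefix):
    """Objects that are URIs outside the local dataset: the 'linked' in LOD."""
    return [
        o for _, _, o in triples
        if o.startswith("http") and not o.startswith(local_prefix)
    ]


about_ada = match(triples, s=EX + "person/ada")
outbound = external_links(triples, EX)
```

The point of the sketch is only that each statement is a small, self-describing relationship, and that the third statement reaches outside the local dataset, which is what distinguishes linked data from data that is merely open.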

Tempting the star-collecting achiever in us all, Berners-Lee’s Five Stars of Open Data is a LOD deployment scheme which urges users to free their information from documents that rely on proprietary software so that others may access their data. These five stages towards open data are perhaps best represented in the following graph and legend (taken from their posh site and pasted here):

At the heart of Berners-Lee’s five-star system is a desire for people to make available the data they have now and worry about refining the structure of that data later, a point made clear in his talk:

We want unadulterated data. OK, we have to ask for raw data now. And I'm going to ask you to practice that, OK? Can you say "raw"?...Can you say “data”?...Can you say “now”?

Although this approach is effective at getting data out and onto the Web, the question of how many return to refine or clean up their data, let alone work up the five-star ladder, is still up for debate (see amazing article on “metacrap”). Perhaps the most crucial moment in his talk is a reminder that “data is relationships,” where each node is connected to another and that node to another, making up a complex network of relationships. LOD, then, is a social practice that relies on shared labour for the greater good. This spirit of social responsibility fuels the collective work, a philosophy summarized in the concluding remarks of Berners-Lee’s talk:

It's about people doing their bit to produce a little bit, and it all connecting. That's how linked data works. You do your bit. Everybody else does theirs. You may not have lots of data which you have yourself to put on there but you know to demand it.

The structural politics of LOD

LOD is valuable in its ability to publish data that is interoperable and to quickly build up networks of connectivity. Between 2007, when it was first captured, and 2014, the ecosystem that supports linked and open datasets, more formally known as the LOD cloud, grew 47.5-fold, from 12 datasets to 570.

A screenshot of the LOD cloud in 2007 featuring 12 datasets.

A screenshot of the LOD cloud in 2014 featuring a total of 570 datasets (here’s a link to an explorable graph).

As these LOD cloud graphs emphasize, the structure of RDF itself represents the “data is relationships” philosophy in its subject-predicate-object statements, which describe the relationships between resources within local as well as external datasets. What’s more, LOD supports meaningful, that is context-based, connections between data from a wide range of sources, aided by the easy integration of RDF’s forgiving non-hierarchical structure. In “Zen and the Art of Linked Data,” Dominic Oldman, Martin Doerr, and Stefan Gradmann praise the use of RDF for humanities-driven research, writing

Of particular significance to humanists is that semantics can be embedded (rather than described separately) within exactly the same structure. This provides far greater potential for integrating vast repositories of data using the standard Web protocol, and provides the foundation for additional technology layers with increasingly sophisticated levels of expressivity. It also provides the type of flexibility that researchers require to quickly incorporate new information and data structures that are necessary as their research progresses, and creates the opportunity for consistent forms of knowledge representation for all research activities.

In other words, RDF serves as a kind of common language in the world of Linked Data with which to establish semantic connections across the Web. This history of a shared interest in knowledge representation is charted in James Smith’s chapter on “Working with the Semantic Web,” in which he explains

The Semantic Web and linked data are computational applications of pre-existing scholarly practices: linking to primary and secondary sources, signalling trusted vocabularies and authorities, and positioning a work in a larger conversation.

In other words, humanities scholars are uniquely qualified to participate in the creation of the Semantic Web in that the standards of Linked Data mirror the methods and practices we employ in our own scholarly writing. Beyond how we create content, John Unsworth points out the need for increased humanist inquiry in the field of LOD, writing

In some form, the semantic web is our future, and it will require formal representations of the human record. Those representations – ontologies, schemas, knowledge representations, call them what you will – should be produced by people trained in the humanities.

For Unsworth, the creation of “formal representations of the human record” needs humanities-authored ontologies that draw on humanists’ expertise in the mechanics of knowledge production and representation. Though the field is still emerging, Alan Liu reminds us that the task of the digital humanities now is to bring the values of the humanities back into computation and consider “how the digital humanities advances, channels, or resists today’s great postindustrial, neoliberal, corporate, and global flows of information-cum-capital” as a way of addressing the lack of cultural criticism that “blocks the digital humanities from becoming a full partner of the humanities.” Digital humanists, in other words, need to get into the habit of thinking critically about their metadata, about the web applications and tools they use to conduct their research, and about the culturally-bound infrastructures that support those technologies. As Tara McPherson reminds us in her essay “Why Are the Digital Humanities So White?”, as much as computation responds to culture, “we must remember that computers are themselves encoders of culture.” With this history in mind, McPherson (and others like Amy Earhart, Lisa Nakamura, Moya Bailey, and Kim Gallon) urges that attention be paid to the white epistemologies that underlie the structures of our digital world, writing

We need to privilege systemic modes of thinking that can understand relation and honor complexity, even while valuing precision and specificity. We need nimbler ways of linking the network and the node and digital form and content, and we need to understand that categories like race profoundly shape both form and content. (McPherson)

There are a handful of projects (Linked Modernisms, Linked Jazz, InPho and Huviz (in beta), to name a few) that have begun some of this work -- but much work lies ahead.

Corinna Bath takes the task of modelling the future of the Semantic Web as one that must rely on feminist ethics. She draws on Donna Haraway and Karen Barad’s concept of diffraction as a way of facing the challenges automatic reasoning poses in an environment that supports competing ontologies within the LOD cloud (3). Pointing to Barad’s term “onto-epistem-ology,” Bath calls for more attention to be paid to the misleading division between ontology and epistemology when creating LOD, especially when conceptualizing ontologies as representative of the “real world” (4). This call for more attention to be paid to feminist ethics as sources of knowledge modelling is echoed in the work of Anita Gurumurthy and Nandini Chami in “Data: The New Four-Letter Word for Feminism.” In their article, Gurumurthy and Chami argue the importance of reclaiming data from hegemonic rule, writing

Assuming that data can indeed enable a powerful reconstruction of reality, the process by which it constitutes knowledge for transformative change must be based in deeper ethical-political debates. Unhinged from the complexity of ethics and politics, a world of data – as we are witness to – can end up as an absolutism that endangers the very essence of democracy as feminism would know it.

What’s at stake, then, is a world of data without critical thinking -- a world in which the processes by which data is generated, contained, and accessed are left unchallenged. Jeni Tennison expresses similar anxieties surrounding the social processes that govern the production and dissemination of information on the web, asking

Is it the case that opening data simply increases the gap between the information haves and have-nots, and that leads to wider economic inequality, or does everyone benefit when information is more widely available? Are there tipping points of availability at which we start realising the benefits of open data? What is the role of government in encouraging data to be more widely available and more widely used? To what extent should government invest in data infrastructure as a public good? How can local or specialist cooperatives pool resources to maintain data?

Others like Ingrid Mason remain wary of the standards (or lack thereof) surrounding the representation of people in data. Put simply, “People matter and representing “people” in data and turning that into linked open data is no small feat.” For Mason, one way of tackling the complexity of representing identity on the Web and avoiding harmful representations of people in LOD -- harmful in the sense of placing people in categories that overlook the discursive categories of gender and race -- is through collaboration. The organization of post-Summit meetings (i.e., Linked Open Data in Libraries, Archives, and Museums Summit 2017) is one small step towards addressing the challenges surrounding the treatment of data about people, but crucial nonetheless.

More than a feeling: cultural challenges, social responsibility, & LOD

If we are to promote broader engagement with LOD and widen the field to include the humanities as full partners, formal standards must be established when it comes to how we publish Linked Data on the Web (i.e., context, provenance, and data integration). Despite the incredible growth of LOD within digital humanities and cultural heritage sectors (#LODLAM), and despite the recent technological advances that make it possible, the recycling of data, what Michele Barbera calls “creative reuse,” has been limited (91). What this suggests, Barbera argues, is a need to shift the social and cultural habits of digital scholars from humanities and cultural heritage backgrounds. The discomfort around sharing content and collaborating online is a feeling that continues to persist in the humanities. Where collaborative scholarship may be business as usual in the sciences, the humanities still have much work to do in establishing a culture that not only supports but encourages collaborative work. Digital collaboration -- indeed collaboration of any kind -- will likely always require an initial leap of faith. When done right, however, this kind of work, this effort to make oneself open to the possibilities of working with others, exchanging best practices, and sharing the burden of research and writing (while celebrating the pleasures too) proves powerful and worthwhile. For recent work on collaborative scholarship online see Susan Brown’s “Towards Best Practices in Collaborative Online Knowledge Production” and Natalia Mehlman Petrzela and Sarah Manekin’s “The Accountability Partnership: Writing and Surviving in the Digital Age.”

To return to Barbera, beyond the discomforts of sharing content online, cultural heritage and digital humanities researchers continue to remain caught in so-called “two-dimensional paper thinking,” that is, reproducing print technologies on the web rather than designing projects that derive from and are built for the Semantic Web (96). We cannot continue to rebuild old models with new technologies; we must, as Berners-Lee urges, encourage “thinking in the graph.” Likewise, technological innovation in the field of LOD cannot flourish if the shifting cultural demands of the Semantic Web are not first addressed. One way of bringing about the kind of cultural change required to support a rich and diverse linked open data economy, I propose, begins with what Kathleen Fitzpatrick calls “generous thinking.”

Generous scholarship: towards critical cyberinfrastructures

Fitzpatrick's work (her excellent blog can be read here) is known for advocating for scholarship that is open to displaying works in progress and honest about mistakes made along the way -- including the countless drafts (or versions, if you like) a project goes through before “completion.” Her latest project focuses on “the possibilities that might open up for scholars not just in doing more of their work in public but in doing more of that work in conversation with the public.” Drawing on the recent critiques of criticism by Bruno Latour and Rita Felski, “generous thinking” is offered as a way to encourage better practices of communication within the academy. In its most basic form, generous thinking roots the humanities in a practice of generosity, meaning, “the practices of thinking with rather than reflexively against both the people and the materials with which we work” while fostering “more productive relationships and conversations not just among scholars but between scholars and the surrounding community” (Fitzpatrick). For Fitzpatrick, now is as good a time as any to tackle our institutional problems:

We have the opportunity, if we take that care seriously, to create a kind of dialogue that might help further rather than stymie the work we want to do — and that might not simply improve the standing of the humanities in the popular imagination, but dramatically transform the relationship between the university and the broader public.

This philosophy of academic life is compelling in its emphasis on cultivating small moments that effect great change, including: “a greater disposition toward listening, toward patience, toward engaging with what is actually in front of us rather than continually pressing forward to where we want to go” (Fitzpatrick). When faced with the question of what the humanities offer universities and the general public, Fitzpatrick points to the many possibilities we open up when we think generously. For her, “generosity of mind” encourages genuine dialogue that builds rather than stifles a work, an attitude that places value on the importance of listening for the sake of understanding rather than as a means to an end (Fitzpatrick). This is the difference between paying attention to your colleague during their talk instead of focusing on what you’re going to say during Q&A (guilty). At the core of Fitzpatrick’s model is a desire to learn and build better together, to work collectively with a reminder to pay attention to fellow collaborators, to honour the subjects we study, and to “encounter the other in all its irreducible otherness.” It’s about trying to slow down the demands of the academy and focus on true engagement, whether it’s with perspectives that are not our own or making time to revisit that project you keep putting off. It’s about hard work, yes, but of a different kind: work that cultivates the ability “to listen — to the text, to our communities, to ourselves — without attaching or rejecting” (Fitzpatrick).

Other voices have been pushing for generosity too. Mitchell Whitelaw takes up the “ethos [of] generosity” in his work on “generous interfaces,” writing:

The qualities of generosity I am interested in here are “to be liberal in giving or sharing”; also to be “large, abundant, ample”. Both of these qualities seem well aligned with the aims and missions of cultural collections. Our digital collections are certainly large, abundant and ample; and the charters of our cultural institutions place a high value on sharing these riches liberally with the public. Generosity seems to be very much in line with the aims of our cultural collections. (2)

Within this context, generosity in interface design means presenting the user with the richness of a collection and empowering them to explore its contents in ways that are both intuitive and delightful. Arguing for a different kind of generosity, Miriam Posner voices her concerns regarding how data is conceptualized within the digital humanities, noting, “most of the data and data models we have inherited deal with structures of power, like gender and race, with a crudeness that would never pass muster in a peer-reviewed humanities publication.” Returning to Fitzpatrick’s definition of “generosity,” the bulk of digital humanities work has been rather ungenerous, that is, not paying attention to the white epistemologies that continue to inform the ways in which concepts like race and gender are treated in our datasets and represented on the Web. To borrow again from Posner, we must

. . . stop acting as though the data models for identity are containers to be filled in order to produce meaning and recognize instead that these structures themselves constitute data. That is where the work of DH should begin. . . [we need to be] more ambitious, to hold ourselves to much higher standards when we are claiming to develop data-based work that depicts people’s lives.

She goes on to challenge criticism that would paint calls for more engagement with race and gender theory as “a kind of philanthropic activity.” Generous thinking too can be read in a similar light -- but this is of course nonsense. Rather than scoff at attempts to rally efforts and challenge systems of oppression in all their shapes and forms, Posner reminds us that

DH needs scholarly expertise in critical race theory, feminist and queer theory, and other interrogations of structures of power in order to develop models of the world that have any relevance to people’s lived experience. Truly, it is the most complicated, challenging computing problem I can imagine, and DH hasn’t even begun yet to take it on.

What does generosity have to do with LOD?

I’d like to end with some thoughts on “this most complicated, challenging computing problem” (Posner) and imagine a Semantic Web made up of generous Linked Open Data. If the voices I’ve gathered here in this review have demonstrated anything, it's that the academic community needs to reclaim its sense of responsibility by conducting research that builds rather than fragments, remaining ever conscious of the needs of the communities it serves, and creating a kind of digital legacy worth investing in. In this sense, the hard work that lies ahead for humanities-driven LOD has more to do with Fitzpatrick and Whitelaw’s radical application of generosity than it does with technological innovation. Here, “generosity” means generous as in linked (an abundance of meaningful connections to external resources), generous as in open (free for others to access, reuse, or build on), and generous as in thoughtfully managed data (with attention paid to how data is categorized, represented, and made explorable).

Major Works Cited

Barbera, Michele. “Linked (Open) Data at Web Scale: Research, Social and Engineering Challenges in the Digital Humanities.” JLIS 4.1 (2013): 91-101.

Bath, Corinna. “Towards a Feminist Ethics of Knowledge Modeling for the Future Web 3.0.” 10th IAS-STS Annual Conference. May 2011. Graz, Austria. Abstract.

Berners-Lee, Tim. "The Next Web." TED. Feb. 2009. Lecture.

Bizer, Christian, Tom Heath, and Tim Berners-Lee. "Linked data-the story so far." Semantic Services, Interoperability and Web Applications: Emerging Concepts (2009): 205-227.

Fitzpatrick, Kathleen. “Generous Thinking: Introduction.” Planned Obsolescence. Last updated 5 October 2016.

Gurumurthy, Anita, and Nandini Chami. “Data: The New Four-Letter Word for Feminism.” GenderIT: Feminist Reflection on Internet Policies. 31 May 2016.

Mason, Ingrid. “People in Linked Open Data.” Summit2017: Linked Open Data in Libraries Archives and Museums. 29 Nov. 2016.

Oldman, Dominic, Martin Doerr, and Stefan Gradmann. “Zen and the Art of Linked Data.” A New Companion to Digital Humanities. Ed. Susan Schreibman, Ray Siemens, and John Unsworth. John Wiley & Sons, Ltd (2015): 251–273.

Posner, Miriam. “What’s Next: The Radical, Unrealized Potential of Digital Humanities.” Debates in the Digital Humanities 2016.

Smith, James. “Working with the Semantic Web.” In Compton, Lane, and Siemens (eds.) Doing Digital Humanities. Routledge, 2016.

Tennison, Jeni. “Agent-Based Model of the Information Economy: Initial Thoughts.” Jeni’s Musings. 9 Feb. 2016.

Whitelaw, Mitchell. “Generous Interfaces for Digital Cultural Collections.” Digital Humanities Quarterly 9.1 (2015).

Photo credit: Milada Vigerova via Unsplash


abistuckontheweb · 9 years ago


(Towards) A Very Merry LOD Season

This post is a product of having swapped readings with a fellow Linked Open Data classmate who comes from an Art History background. I was very glad for the chance to approach this topic from another angle. She recommended Matthew Lincoln’s “The Art Historian’s Macroscope: Museum Data and the Academy,” a blog post based on a talk he gave last year in May at the Cultural Programs of the National Academy of Sciences in Washington D.C. The following is a quick overview and some thoughts on collaborating with(in) academic institutions.

Although admittedly weary of the term “big data,” Matthew Lincoln applies Shawn Graham, Ian Milligan, and Scott Weingart’s The Historian’s Macroscope to the field of art history in his talk “Museum Data and the Academy.” He takes up their tools and methodologies for data-driven analysis and uses them “in concert with traditional historiographical methods” in art history, like the microscopic “close looking” (Lincoln). Lincoln notes that although the concept of data analysis is not necessarily “new” to art historians (see Roger de Piles’s work in 1708 on quantified style), never before has there been such an abundance of, and access to, art historical data.

This ever-growing collection of data produced by an “increasingly digitized museum world” (Lincoln) allows print historians like Lincoln to take on the sheer number of existing Dutch engravings and etchings. Beyond access, the question of how to “honour the specificity of individual artists and artworks” (Lincoln) becomes central for Lincoln. In other words, how can we pay continued attention to the macro-trends of history (what literary students know as “distant reading”), the complex web of networks these artists worked within (close-ish reading), as well as the particular lives and works of key figures (close reading)? Perhaps this is where Linked Open Data catches the eye of humanities scholars -- what if LOD could help solve this problem of scale?

Lincoln pauses to remind the reader that “museums are repositories of artwork, yes, but also of repositories of knowledge structured data” -- not a far cry from Tim Sherratt’s “We have data too!” -- a compelling point for art historians who are working in a moment where “more than a century of curatorial work describing collections’ history is finally starting to make it into publicly-accessible databases.” Lincoln’s own work draws on the British Museum’s Semantic Web Collection project that pulls from the CIDOC Conceptual Reference Model.* Access to these kinds of records is particularly exciting for print historians who know that a print is often the product of many hands (designers, etchers, inkers, printers, painters, publishers): with linked data, they can chart professional relationships within printing communities. For Lincoln, this opens up the possibility of asking new questions like, “did different regional networks experience their own patterns of centralization and decentralization?” Which central figures are remembered in our art history and which have been overlooked or forgotten? And how can we map printers’ hubs or chart the circulation of prints geographically?
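As a rough illustration of what charting professional relationships from “many hands” records could look like, here is a small sketch. It is not Lincoln’s method and does not use the British Museum data: the print records, the role-style names, and the helper functions are all invented for the example, and counting each person’s distinct collaborators is only the crudest possible stand-in for the centralization measures his question points towards.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each print lists the people who had a hand in it.
prints = {
    "print-1": ["designer-A", "etcher-B", "publisher-C"],
    "print-2": ["designer-D", "etcher-B", "publisher-C"],
    "print-3": ["designer-A", "publisher-C"],
}


def collaboration_edges(prints):
    """Count how often each pair of people appears on the same print."""
    edges = Counter()
    for people in prints.values():
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct collaborators per person."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {person: len(others) for person, others in neighbours.items()}


edges = collaboration_edges(prints)
degrees = degree(edges)
```

In this toy network the publisher and the etcher each connect to three other people while the designers connect to two, which is the kind of pattern (who sits at the hub, who sits at the edge) that the questions about centralization and overlooked figures start from.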

His talk ends with two questions: what can museums do in light of these new and exciting art history projects made possible by linked data? And how can universities help support and contribute to this burgeoning field?

Lincoln has a few things in mind, points I’ve reimagined here in the form of a LOD holiday wish list!

🌲 Museums need to expose the curatorial knowledge stowed in their content management systems and work to structure and clean that data. This echoes Tim Berners-Lee’s five stars of open data system, which urges people to first get their data out and then work to refine its structure.

🌲 Museums should not solely support digital database development (focused on user-facing tools that allow users to easily sift through data on web and mobile devices) but also work towards “bulk datasets built for complexity, not just for speed and convenience” (Lincoln).

🌲 Whenever possible, make data interoperable and avoid heavy customization — Lincoln acknowledges that this is perhaps the “hardest goal.”

🌲 Universities must “reimagine how we can describe and permute our knowledge in digital formats” (Lincoln). This will require art historians to work closely with librarians and information scientists within their institutions as well as reach out to DH scholars, borrowing tools and methodologies.

🌲 Humanities departments must be willing to support macroscopic research and “hypothesis-driven experimentation.” This requires a re-imagining of humanities scholarship that makes room for the possibility of “quasi-scientific testing” (Lincoln) in combination with the kind of interpretive work we’ve been carefully refining for centuries.

🌲 We must recognise that digital humanists share priorities, but their interests also diverge. Lincoln draws on Sheila Brennan’s piece, ‘DH Centered in Museums’ to remind us that “Museums have done DH for a long time, and they have their own priorities” -- namely, they are collection-driven (from exhibition to preservation). DH for art historians, however, as Lincoln’s project would suggest, is research-question-driven and more interested in locating (and filling) knowledge gaps.

Sadly, these “wishes” are not for some over-cookied, twinkly-eyed fellow to stuff down our chimneys. These calls to action are, of course, something we — digital humanists broadly construed — must work on together if they are to ever be delivered. Although a crucial start, it is not enough to come up with a list for museums and universities to consider. We must engage with the groups that run these institutions directly, establish fruitful relationships where possible, and collaborate where resources (like time) permit. We must continue to resist division (whether departmental, institutional, ivory towered) as we work towards building the kind of infrastructures that support “hypothesis-driven experimentation” within the humanities if the kind of scholarship we produce is to be valued and preserved.

*The British Museum Semantic Web Collection Online was down at the time I was writing this post, taking with it the digital collection, SPARQL endpoint, and HTML user interface. It is now back up.

Watch this space for a review.

Photo: Maria Mekht via Unsplash


abistuckontheweb · 9 years ago


“Generous Thinking” and the Future of Data Economies

This week’s response is two-tiered. The first takes up Michele Barbera’s call for “a lively data economy with a rich ecosystem,” one that requires a “profound cultural shift [in the ways] data is produced, managed and disseminated” (91), while the second considers the problems faced in the linked open data (LOD) community and brings them into conversation with Kathleen Fitzpatrick’s concept of “generous thinking.”

In his article “Linked (Open) Data at Web Scale: Research, Social and Engineering Challenges in the Digital Humanities,” Barbera provides an efficient survey of the current technological and cultural landscape of linked data, locating major gaps and spaces for scholarly intervention. He flags three major areas that require attention: streaming linked data, versioning, and the social challenges of a linked data economy.

Streaming linked data

With the ubiquity of mobile devices that are embedded with data-generating sensors (Barbera 93), live streaming data is more possible now than ever. Keeping this new abundance of data in mind, Barbera calls for more commercial attention to be paid to the possibility of linking live streamed data. He fails, however, to address the ethics that surround the collection of this kind of data which, if linked data projects are to be pushed beyond research communities, need to be taken into account as a part of the larger cultural landscape.

Versioning

Beyond the interest of tracking the evolution of RDF graphs for the purposes of generating a history, versioning is the knee-jerk response to digital collaboration — an attempt to keep checks and balances while establishing trust within a community. Barbera, however, does not spend much time discussing the need for versioning protocols and tools in LOD. Instead, he moves on to a more pressing issue within the LOD community, one that requires work beyond “technological innovation” (93).

Social challenges and nurturing a linked data economy

Barbera puts forth the controversial idea that cultural heritage and digital humanities researchers are trained to think two-dimensionally and ultimately find it hard to think “in the graph” (96). He urges this group to resist a “two-dimensional thinking derived from the paper-world,” a world in which the limitations of paper are “mimicked rather than revolutionized in the digital world” (96). For Barbera, our minds are influenced by the organizational logics of the tools we employ; within this logic, tabular structures are aligned with two-dimensional “paper-world” thinking and stunt the progress of the linked open data community. Put another way: how can we build up a robust and dynamic linked open data economy if we are unable to conceive of it? And how are we to inhabit new structural logics if we remain shackled by old models? According to Barbera, the time to invest in innovation, especially on a commercial scale, is now. In his concluding remarks, he brings up “monopolistic threats” and the danger they pose to “public good” (98), but does not go much further except to urge a careful strategy that “protect[s] common knowledge-heritage” and “(linked!) public good” (99).
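To make the contrast between tabular and graph structures a little more concrete, here is a minimal Python sketch. It is an illustration of my own, not something taken from Barbera's article: the record, the `ex:` prefix, and the `row_to_triples` helper are all hypothetical. It restates one spreadsheet-style row as subject-predicate-object statements and then extends the graph in a way a single table cell cannot.

```python
# Hypothetical catalogue record: one row of a spreadsheet, columns as keys.
row = {"id": "book42", "title": "An Example Title", "author": "A. N. Author"}


def row_to_triples(row, key="id", prefix="ex:"):
    """Restate one table row as (subject, predicate, object) statements."""
    subject = prefix + row[key]
    return [
        (subject, prefix + column, value)
        for column, value in row.items()
        if column != key
    ]


graph = row_to_triples(row)

# In the table, the author is a string locked inside a cell. In the graph the
# author can become a node of its own and carry further statements.
graph.append(("ex:book42", "ex:creator", "ex:person7"))
graph.append(("ex:person7", "ex:name", "A. N. Author"))

for statement in graph:
    print(statement)
```

The row and the graph hold the same facts at first. The difference is that any value in the graph can become a subject in turn, which is roughly what thinking “in the graph” asks of us.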

Earlier in his paper, Barbera reminds us that with the “rapidly growing amount of data available in the linked open data cloud and in enterprise linked data repositories” the existence of a single, centralized computation of all data is simply not possible (92). The necessity of a decentralized system, then, is promising in its ability to foster a shared management of our data economies but does not come without its own share of complications. After all, a decentralized system relies deeply on the ability of online communities not only to relinquish total control, which comes hand-in-hand with collaboration, but also to pay attention to one another and build together. This concept of “paying attention” may seem like an obvious point, and yet the protocols practised “out in the wild” serve as a shocking reminder of how little we look outside ourselves and consider the projects of others.

One way of approaching this problem of digital collaboration and “paying attention” is to turn to what Kathleen Fitzpatrick calls “generous thinking,” a concept that urges academic institutions and their agents to “cultivate a greater disposition toward listening, toward patience, toward engaging with what is actually in front of us rather than continually pressing forward to where we want to go” (Fitzpatrick). In this context, generosity is not meant in the sense of “giving” but as “generosity of mind,” a kind of deep listening that goes beyond waiting for an opportunity to speak (Fitzpatrick). At the core of Fitzpatrick’s model is a desire to learn how to engage in genuine dialogue, collaborate, and build better together “not only with our colleagues but with our objects of study, our predecessors, and the many potential publics that surround us.”

There’s no question that humanities disciplines have much to offer the LOD community (see last post). However, before we join the LOD community and potentially lose ourselves in the exciting features and unexpected insights to be gained from linked data, the questions of “what exactly do we bring to the table” and “how does LOD help us think through x in our own fields” need to be addressed if we are to take Barbera’s “paper-thinking” critique seriously. What’s more, we must dig deeper into what exactly it means to, as Tim Berners-Lee says, “think in the graph” and what it would look like to do so collectively. I end with a passage from Fitzpatrick’s post that gestures towards an “openness to possibility” and offers a potential answer to the question of what the humanities can offer:

All of these possibilities that we open up — engaging perspectives other than our own, valuing and evaluating the productions and manifestations of our multiplicitous culture, encountering the other in all its irreducible otherness — are the best of what the humanities offer to the university, and the university to the world, and we must allow them to teach us just as much as we teach others. And all of these possibilities begin with cultivating the ability to think generously, to listen — to the text, to our communities, to ourselves — without attaching or rejecting. (Fitzpatrick)

Works Cited

Barbera, Michele. "Linked (Open) Data at Web Scale: Research, Social and Engineering Challenges in the Digital Humanities." JLIS 4.1 (2013): 91-101.

Fitzpatrick, Kathleen. “Generous Thinking: Introduction.” Planned Obsolescence. Last updated 5 October 2016. <http://www.plannedobsolescence.net/generous-thinking-introduction/#more-2828>.

Photo credit: Anthony DELANOIX via Unsplash

abistuckontheweb · 9 years ago

sameAs is not yet closeEnough: on knowledge representation and identity in linked data

I’d like to offer some preliminary thoughts on identity and representation within a linked data context in conversation with Harry Halpin, Ivan Herman, and Patrick Hayes’ short paper, “When owl:sameAs isn’t the Same: An Analysis of Identity Links on the Semantic Web” (2010). Although the paper is now a bit stale, the sameAs issue they outline persists.

In the age of Web 2.0, Linked Open Data (LOD) emerged as a decentralized system for identifying, classifying, and linking information made open on the semantic web. One way of establishing relationships between existing open data on the web is by attributing OWL properties like owl:sameAs. As its name would suggest, sameAs “indicates that two URI references actually refer to the same thing: the individuals have the same ‘identity’” (W3C). However, as Halpin, Herman, and Hayes make very clear, “out in the wild” sameAs is often used as if it were “closeEnough” (2010; 2011). This issue is heightened by the fact that sameAs is one of the most widely used, or “(ab)used,” properties within the linked data community (Halpin et al. 2010, 1). This widespread misuse of owl:sameAs poses a potential threat to linked data when considering the impact of inference in a system that builds by way of referral and interlinking. In their paper, Halpin, Herman, and Hayes (shortened here to “H3”) present four alternative readings of owl:sameAs, concluding with “alternative identity links that rely on named graphs” (Halpin et al. 2010, 1). The four alternative readings of owl:sameAs are Same Thing As But Referentially Opaque, Same Thing As But Different Context, Represents, and Very Similar To, which I’ll quickly recap now:

Same Thing As But Referentially Opaque occurs when two URIs point to the same thing but don’t necessarily share all of the same properties, rendering the reference “opaque” (Halpin et al. 2010, 2). This means that one URI “cannot be substituted for another,” as doing so would violate the Principle of Substitution (Halpin et al. 2010, 2).

Same Thing As But Different Context refers to the problem that arises when two URIs refer to the same thing and share the same properties, but cannot be re-used in a different context because those same properties, however true, simply do not matter there (Halpin et al. 2010, 2). The main claim here is that “there are ‘forms of reference’ appropriate to a context, especially in social contexts” (Halpin et al. 2010, 2).

Represents tries to parse the difference between signifier and signified, working with an “intuitive definition” of “representation” where a URI, like a photograph, represents a thing but is not the “thing itself” (Halpin et al. 2010, 3). Problems arise, according to H3, when identity and representation are conflated. This is not to be confused with instances of “displaced reference” (Halpin et al. 2010, 3), which act synecdochically: a thing, like an email address, stands in for the identity of an entity, like a person; or, as H3 would define it, something is referenced “accidentally or contextually to refer to something” (Halpin et al. 2010, 3).

Very Similar To makes up most of the so-called “noticeable errors” (Halpin et al. 2010, 3), where two things that are closely related but not exactly the same are labelled as identical. H3 use the example of Paris and the Department of Paris in Cyc (Halpin et al. 2010, 3).
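To see why a “closeEnough” use of owl:sameAs matters once inference enters the picture, here is a minimal Python sketch, loosely modelled on the Paris example. It is my own illustration rather than H3's method: the `ex:` URIs, the property names, the population value, and both helper functions are hypothetical. It also simplifies the OWL semantics by substituting identical URIs in subject position only.

```python
from collections import defaultdict

# Hypothetical triples; the URIs and the population value are made up.
triples = [
    ("ex:Paris", "rdf:type", "ex:City"),
    ("ex:Paris", "ex:population", "2200000"),
    ("ex:DepartmentOfParis", "rdf:type", "ex:AdministrativeDepartment"),
    # A "Very Similar To" relationship asserted as strict identity:
    ("ex:DepartmentOfParis", "owl:sameAs", "ex:Paris"),
]


def same_as_groups(triples):
    """Group URIs into identity sets (sameAs is symmetric and transitive)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)
        else:
            find(s)  # register the subject even if it has no sameAs link

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def infer(triples):
    """Copy each property of a URI onto every URI declared identical to it."""
    inferred = set(triples)
    for group in same_as_groups(triples):
        for s, p, o in triples:
            if s in group and p != "owl:sameAs":
                inferred.update((alias, p, o) for alias in group)
    return inferred


result = infer(triples)
print(("ex:DepartmentOfParis", "rdf:type", "ex:City") in result)
print(("ex:Paris", "rdf:type", "ex:AdministrativeDepartment") in result)
```

Under these assumptions, one loose link is enough for the department to be typed as a city and the city as an administrative department. That is the kind of silent error that spreads when identity links are reused across datasets.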

Although this article provides a useful gloss of the sameAs problem, it struggles with cohesion and at times feels a bit rushed -- especially to a linked data outsider. Where, for example, is the knowledge representation primer? And what about the organizational logic of identity on the world wide web and its oppressive history (McPherson)? My first question is in part addressed in a later iteration of the paper published at the International Semantic Web Conference in 2010. In this second version, the authors return to the “sameAs problem” but spend some time first working through the history of knowledge representation and identity within a semantic web context.

According to H3, “the vexing problem of identity has returned with a vengeance to the Semantic Web” (Halpin et al. 2011, 1). However, the problem of precise labelling is not so much a linked data or semantic web problem as it is a knowledge representation problem. “Leibniz’s Law” states that if x and y are identical then they must share all of the same properties. By the same logic, if x and y do not share all of their properties, then x and y are not identical. Debates surrounding the gaps in Leibniz’s logic have raged since its inception, and the law is most popularly challenged with the problem of change over time (e.g. is 5-year-old Abi the same person as 25-year-old Abi?). For the first time, however, this problem is being encountered by a surge of people trying to “independently knit their knowledge representations together using the same standardized language” (Halpin et al. 2010, 1). Within this disparate environment, owl:sameAs ends up used in ways that are “mutually incompatible [and] almost always violate the rather strict logical semantics of identity” (Halpin et al. 2010, 1). Although H3 frame this issue as one rooted in precise labelling, it seems more a question of establishing a culture that promotes responsible interlinking and thoughtful digital collaboration.
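For readers who find notation helpful, the version of Leibniz's Law described above (often called the indiscernibility of identicals) and its contrapositive can be written as follows. This is a standard second-order rendering, offered as a sketch; the symbol P, ranging over properties, is my own choice and not notation taken from H3.

```latex
% If x and y are identical, every property P holds of x exactly when it holds of y.
x = y \;\rightarrow\; \forall P \,\bigl( P(x) \leftrightarrow P(y) \bigr)

% Contrapositive: a single property on which x and y differ shows they are not identical.
\exists P \,\neg \bigl( P(x) \leftrightarrow P(y) \bigr) \;\rightarrow\; x \neq y
```

Read against the second line, every owl:sameAs link is a strong claim: one property that holds of one URI's referent but not the other's is enough to make the link false.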

In light of the systemic racism and misogyny manifest not only in the latest US Elections but ever present in the ways we build, access, and navigate the world wide web (McPherson), the question of responsibility is central if we are to address issues of identity and representation within the semantic web. Although work on addressing the sameAs problem for the sake of linked data has already begun, the conflation of identity with representation within the linked data community persists. Given the current information landscape, digital humanities and (post-)colonial researchers will need a seat at the Linked Open Data table if we are to succeed in working towards representing the discursive nature of identity on the Semantic Web and labelling practices that are as thoughtful as they are accurate.

Works Cited

Halpin, Harry, Ivan Herman, and Patrick Hayes. “When owl:sameAs isn’t the Same: An Analysis of Identity Links on the Semantic Web.” RDF Next Steps Workshop, June 26-27, 2010. Palo Alto, USA.

Halpin, Harry, et al. "When owl:sameAs isn’t the Same: An Analysis of Identity in Linked Data." The 10th International Semantic Web Conference, October 23-27, 2011. Berlin, Germany.

McPherson, Tara. "Why are the Digital Humanities so white? Or Thinking the Histories of Race and Computation." Debates in the Digital Humanities (2012): 139-160.

n. a. “owl:sameAs.” W3C. Last updated November 2009. <https://www.w3.org/TR/owl-ref/#sameAs-def>.

Photo: Hayley Mills in Parent Trap (1961)

abistuckontheweb · 9 years ago

What does Textual Scholarship have in common with the Semantic Web?

A reading of James Smith’s “Working with the Semantic Web” from the newly published collection of essays, Doing Digital Humanities (2016)

Some context: James Smith is a Lead Software Engineer (Kit Check) who also teaches the RDF and Linked Open Data (LOD) course at the Digital Humanities Summer Institute in Victoria (which I had the pleasure of attending this past summer). I came across this chapter on a syllabus designed for a LOD directed reading group I’m involved in and wanted to share a few half-baked observations.

Smith begins his chapter by way of analogy:

The Semantic Web and Linked Data are computational applications of existing scholarly practices: linking to primary and secondary sources, signalling trusted vocabularies and authorities, and positioning a work in a larger conversation. (loc. 6650)1

For many textual scholars, this is a welcome sight: a warm invitation. We know analogy. We understand that analogy works as a powerful narrative tool. And we know when we’re about to be told a good story. Upon arrival, the text signals a comparative framework, a bond Smith continues to return to as he guides readers through what is, for most textual scholars, the strange new world of working not just on, but with, the semantic web. For the purposes of this reading, rather than provide a comprehensive overview, I’d like to focus on two crucial moves Smith makes in this chapter.

First, Smith reviews the basic mechanics of how textual scholarship works. To do this, he uses the following example: “The new sovereign has achieved self-determination” (loc. 6667). With a little pressure, this sentence cracks under the ambiguity of “sovereign” (which sovereign?) and “self-determination” (what self-determination?), and we, as well-trained textual scholars, feel the lack of historical context, of reference. Interestingly, Smith works from an electronic text default, drawing on the function of hyperlinks in digital scholarship before turning to Franco Moretti’s printed chapter in Distant Reading as an example of “intra-textual referencing,” or what Smith would call “crude hyperlinking” (loc. 6667, 6679). I’ve reproduced Moretti’s excerpt here:

The new sovereign — ab-solutus, untied, freed from the ethico-political bonds of the feudal tradition — has achieved what Hegel will call ‘self-determination’: he can decide freely, and thus posit himself as the new source of historical movement: as in the Trauerspiel, and Gorboduc, and Lear, where everything indeed begins with his decision; as in Racine, or La Vida es Sueño. (qtd. in loc. 6679)

Next to the efficiency of hyperlinking, Moretti’s list of references, notes, and notes on references seems wild and dizzying. Necessarily restricted by the technology of print, Moretti “links” to the particular definitions of “sovereignty” he has in mind and inserts a brief description of his take on Hegel’s use of “self-determination.” But why the context overload? Surely there’s such a thing as providing too much context. As Smith is quick to point out, what Moretti is doing with this rudimentary “linking” is ensuring that the reader “doesn’t need to follow the ‘link’” (loc. 6667). With hyperlinks, there’s always a chance that readers will get lost as they go off and explore the contextual crumbs. But consider the print reader who has left a book to go follow a tempting footnote and fetch a referenced text from the library. The print reader’s chances of returning are far lower than the electronic reader’s; perhaps more crucially, the chance that a print reader will set down a book to seek out the referential thread in the first place seems slimmer still. Instead, as Smith points out, the kind of “linking” seen in Moretti’s chapter works to signal to the reader that he “trusts” Hegel’s vocabulary (people who know something of Linked Open Data start grinning here) and conveys a sense of “alignment” between Moretti’s language and Hegel’s, indeed King Lear’s, as Moretti’s writing becomes, to draw on Smith’s language, “informed” by the literature he’s referencing (loc. 66698). Remember, Smith reminds us, “As we read a text, we bring all the material we have encountered before” (loc. 66698).

Second, Smith introduces the concept of “at least one.” The “at least one” concept goes as follows: A textual scholar, let’s use Moretti again, mentions a set of literature “with the hope that we will have read at least one” (Smith, loc. 66698). If the mission is to make a connection, what Moretti needs is for us, the reader, to have read one -- just one. At first, the language here seems almost exasperated (“Have the decency to come to class having read at least one of your readings.” Silence. “O come now, at least one!”). In fact, Smith repeats “hope” and “at least one” twice in one paragraph when referring to this desire to connect over a shared reference. Like a computer, a human reader scans the information, eyes moving swiftly across familiar words, logs the connections away, and moves on. If nothing looks familiar, however, the reader stalls (perhaps over a wave of curiosity, or, less preferably, renewed anxiety). Machines don’t waste their time feeling anxious: if the information doesn’t look familiar, they give up. This shared reference becomes central to Smith’s guide to working on the semantic web, building on his connection to scholarly reading: “It is critical that the scholar read far and wide in their career: the greater the shared background, the more efficient the communication” (loc. 66698).

Scaling back from the macroscopic fantasy of “wide” reading, Smith returns to the bread and butter of textual scholars: close reading. This return serves only to strengthen the natural tie he has been asserting throughout the chapter, one between textual scholars and computer scientists. “The act of making as many connections as possible between the text and what we know,” Smith writes, “is the essence of close reading” (loc. 66698). This essential connection between linking and close reading, Smith goes on to explain, is why textual scholars find themselves a part of “one of the defining fields in the digital humanities” (loc. 66698).

The rest of Smith’s chapter walks through the basics of structuring information, representing information, vocabularies, relationships, using linked data, and publishing linked data.2 The bulk of the heavy lifting, however, what I would underline as the driving force of this piece, has already been worked out in the first half-dozen pages. To avoid any ambiguity — ever the responsible computer scientist — Smith’s argument becomes fully articulated near the end of his chapter, under the very appropriate SUMMARY heading:

It is by bringing to our computational work the practices of our scholarly work that we elevate the digital side of digital humanities to be equal with the traditional humanities scholarship practices. (loc. 6944)

Refreshingly, Smith departs from approaches that urge humanities scholars to take on the praxis and language of scientific methodology.3 Instead, Smith asks what textual scholarship can bring to this kind of work with the semantic web and gestures towards a model of scholarship that is strengthened by this process of coming together, one that is necessarily — and, as Smith would argue, inherently — open to collaborative and cross-disciplinary work.4

Footnotes

1 The “loc. xxxxx” identifiers work in lieu of page numbers and refer to places within the Kindle edition of this text.

2 To the curious and anxious students of linked data: keep reading. Smith gives an accessible and concise overview of how to transform textual information, what readers will soon call a “dataset,” into published, linked data. Though there are moments where readers who are eager to get their hands dirty are left hanging for further instruction, Smith is quick to provide an abundance of links to projects and resources, peppered throughout in the form of footnotes and hyperlinks, as well as a Further Readings section.

3 See John Unsworth’s The Importance of Failure and Franco Moretti’s Conjectures on World Literature.

4 See Susan Brown and John Simpson’s, along with CWRC Project Team and INKE Research Group’s, An Entity By Any Other Name: Linked Open Data as a Basis for a Decentred, Dynamic Scholarly Publishing Ecology.

Works Cited

Brown, Susan, and John Simpson. "An Entity By Any Other Name: Linked Open Data as a Basis for a Decentred, Dynamic Scholarly Publishing Ecology." Scholarly and Research Communication 6.2 (2015).

Moretti, Franco. "Conjectures on World Literature." New Left Review 1 (2000): 54–68.

Smith, James. “Working with the Semantic Web.” In Compton, Lane, and Siemens (eds.) Doing Digital Humanities. Routledge, 2016.

Unsworth, John. “The Importance of Failure.” Journal of Electronic Publishing (1997).

Photo credit: michael podger via Unsplash

#digitalhumanities #semantic web #linked open data #textual scholarship
