Digging into Data and Digitizing the Social Sciences
The recent surge of large data sets has changed the way we interpret and use data in research.
"Digging into data," as digital pundits call it, has opened new opportunities for people and systems to analyze massive data sets to anticipate the future, understand mainstream trends and gain insight into emerging issues in order to innovate and create new products, knowledge and social platforms.
The introduction of smart personal devices and applications to the global marketplace has equipped societies to “make sense” of data sets and enabled individuals to spread and share knowledge at nearly the speed of thought.
In a data- and network-driven world, the breadth and sophistication of digitization will continue to grow, and learning in the social sciences might just get better and weirder in the immediate future.
Just imagine what social science knowledge, learning and public interaction and interfaces would be like in a big-data-driven society.
Will cloud computing and open access enable publics to turbocharge public debates? What types of knowledge could emerge in social sciences driven by macro and micro digital network platforms? Will knowledge divides and unequal access persist in a data-driven society? What is the role of cultural contexts in digging into data? What would social science knowledge be like when historians and human rights advocates, computer scientists and information technology experts collaborate to create new questions and generate new insights in the social sciences?
The security of our personal data and our reputations as researchers and scholars are paramount. How can we fine-tune existing computational research methods and techniques to secure the privacy and safety of our respondents and informants? Are our systems hackable? Is big data safe, ethical, sustainable and achievable? Can we really gain more insights and questions by wading through much more data?
Is big data a boon for researchers from developing countries? How can global South countries and scholars participate in and contribute to the challenge of creating new social science knowledge using large data sets when access to big data is limited?
These are some of the questions that computer scientists, social scientists, historians, philosophers, futurists, policy analysts and others explored at the Digging into Data (DiD) Challenge conference held last October 12 at the Palais des congrès de Montréal in Quebec, Canada.
Organized by the Social Sciences and Humanities Research Council of Canada, the DiD event, held in conjunction with the 2013 World Social Science Forum, explored diverse ways of using large-scale data sets and applications to generate new analytics and insights.
The conference featured the results of the 14 DiD-funded research projects that linked big data to create, sustain and propel new knowledge repositories.
To fortify research innovation around the world, ten international research funding organizations collaborated to sponsor the Digging into Data Challenge.
Many of the past and current DiD-sponsored projects have been featured in notable publications such as the New York Times, Nature and Times Higher Education, among others.
The papers presented applied a broad range of analytical tools to dig into digitized books, newspapers, web searches, sensors and cellphone records to analyze and synthesize the insights emerging from large-scale data sets.
Their purpose was to create and/or re-create novelties, ideas, hypotheses, mental maps and knowledge frameworks to leverage the social sciences in the digital age.
Of the 14 DiD award recipients, the projects ChartEx, Digging into Metadata, Digging into Human Rights Violations, Digging by Debating, Digging into Social Unrest, Digging into Trading Consequences, Data Mining the 1918 Influenza Pandemic and the Data Driven Project into Western Musical Styles impressed me the most. This is not to say that the others were less significant or unimpressive, but rather that these projects' topics were more relevant to my area of concern and research interest.
Generally, the projects I mentioned explored new ways of harnessing the power of data to understand how data could shape public opinion and influence policies, social movements and economies, and how computational capabilities and tools intensify the processes and impact of research in the social sciences and humanities.
The ChartEx project, for instance, aims to build an interactive ‘virtual workbench’ that allows researchers to dig into the records and study people’s lives between the 12th and 16th centuries.
Using medieval charters, historians can now extract information about places, people and events from eras before censuses and birth registries. Recovering these stories can help researchers create richer descriptions of places and people in history. It could help historians visualize or perhaps reinterpret the past in ways that make sense for people in the digital age and the future. (For more, see http://www.chartex.org/)
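To give a flavor of what such extraction involves, here is a minimal sketch of pulling people and dates out of charter-like text with simple patterns. The charter text and patterns are invented for illustration; ChartEx's actual methods are far more sophisticated.

```python
import re

# Hypothetical snippet of a translated medieval charter (not from the ChartEx corpus).
charter = (
    "Know that I, William of York, have granted to Robert of Fountains "
    "one toft in Micklegate, in the year 1230."
)

# Naive patterns: "Name of Place" person references and plausible medieval years.
person_pattern = re.compile(r"\b([A-Z][a-z]+ of [A-Z][a-z]+)\b")
year_pattern = re.compile(r"\b(1[0-5]\d{2})\b")

people = person_pattern.findall(charter)
years = year_pattern.findall(charter)

print(people)  # ['William of York', 'Robert of Fountains']
print(years)   # ['1230']
```

Even this toy version shows why pre-census records are valuable: names, places and dates are embedded in running text and must be surfaced before historians can link them across documents.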
The Digging by Debating project, on the other hand, seeks to implement a multi-scale workbench called “InterDebates” to dig into millions of digitized books, bibliographic databases of journal articles, and comprehensive reference works written by experts. Starting with 2.6 million volumes of the digitized Google Books collection, the project aims to develop new ways of searching and visualizing interaction in the social sciences, particularly philosophy and psychology. The purpose is to help the public map the interactions and analyze the arguments these resources contain. (See http://diggingbydebating.org/)
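Searching millions of volumes efficiently typically rests on an inverted index, which maps each word to the documents containing it. The sketch below, with an invented toy corpus, illustrates the idea; it is not the InterDebates implementation.

```python
from collections import defaultdict

# Toy "digitized volumes"; the real project starts from ~2.6 million volumes.
volumes = {
    "vol1": "the mind and its place in nature",
    "vol2": "principles of psychology and the stream of thought",
    "vol3": "philosophy of mind and the nature of thought",
}

# Build an inverted index: word -> set of volume ids containing that word.
index = defaultdict(set)
for vol_id, text in volumes.items():
    for word in text.split():
        index[word].add(vol_id)

def search(*words):
    """Return ids of volumes containing all the given words."""
    sets = [index.get(w, set()) for w in words]
    return sorted(set.intersection(*sets)) if sets else []

print(search("mind", "nature"))  # ['vol1', 'vol3']
print(search("psychology"))      # ['vol2']
```

The same structure, scaled up and enriched with positions and metadata, is what lets a researcher find every volume where two concepts co-occur without scanning the full text each time.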
Another is the “Digging into Human Rights Violations: Anaphora Resolution and Emergent Witnesses” project. This research aims to develop systems that could help researchers, human rights advocates and courts uncover details and records of human rights violations, reconstruct their stories from fragmented collections of archival witness reports, and reveal patterns of historic disappearances and violence. Its key purpose is to develop software that can aid qualitative researchers in analyzing human rights violations data. The project performed an extensive review of the literature on human rights violations and of political science research methods. Its findings include that human rights violations research, particularly in Canada, lacks analysis of primary data, and that most articles examined only publicly available secondary sources with a narrow geographic focus. (See http://digging.gsu.edu/)
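Anaphora resolution means linking pronouns like "she" or "they" back to the people they refer to, which is essential when piecing together testimony scattered across documents. Here is a deliberately naive heuristic sketch on an invented statement; the project's actual software would use far richer natural language processing.

```python
import re

# Invented toy witness statement, for illustration only.
statement = "Maria saw the soldiers. She reported that they entered the village."

PLURAL_NOUNS = {"soldiers", "villagers", "witnesses"}  # assumed toy lexicon

last_name = None    # candidate antecedent for "she"/"he"
last_plural = None  # candidate antecedent for "they"
resolved = []

# Naive heuristic: resolve each pronoun to the most recent matching antecedent.
for token in re.findall(r"[A-Za-z]+", statement):
    low = token.lower()
    if low in ("she", "he") and last_name:
        resolved.append((token, last_name))
    elif low == "they" and last_plural:
        resolved.append((token, last_plural))
    elif token[0].isupper() and low not in ("she", "he", "they"):
        last_name = token
    elif low in PLURAL_NOUNS:
        last_plural = token

print(resolved)  # [('She', 'Maria'), ('they', 'soldiers')]
```

Chaining such links across many fragmentary reports is what lets "emergent witnesses" surface: a person never named in one document can be identified through references resolved in another.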
For more DiD projects, visit the official website: http://www.diggingintodata.org/Default.aspx
Challenges and Insights
The researchers shared their research experiences and offered some “remarkable insights” that funders, researchers and future DiD challengers must note. Their take on open access and big data solidified previous analyses and assumptions about digitizing the social sciences.
Their experiences were compelling, and I was able to note some of them:
- Computation-based research can enhance the quality and context of our research questions and help us find new ways to construct more relevant questions. It also helps us better understand how large data sets, creativity, questioning and insight link to the digital humanities and the social sciences.
- Constant dialogue between and within disciplinary perspectives is a must. Shared understandings and horizons shape our research designs, approaches to development, processing and testing of our methods, and analysis of outcomes.
- Knowledge dissemination is a must as “data do not travel easily”.
- Conference presentations and journal publications can expand researchers’ audience and reach.
- The importance of libraries, archives and data repositories is highlighted in digital research.
- Distance is quite a challenge for coordinating and handling collected data sets. Face-to-face meetings are beneficial, and developing a team identity is a critical research component.
- Feedback is extremely important, yet distance makes meetings rare and expensive.
- Clear mutual understanding can help resolve difficult interdisciplinary and technical issues.
- Use a shared online workspace for easier sharing of documents, reports, communications, etc.
- Develop a shared glossary of terms.
- Big data success supports small-data agendas.
- Pay attention to technical infrastructure.
The Case for Open Access
John Willinsky, a professor of education and a distinguished innovator at the Stanford Graduate School of Education, keynoted the DiD event. Willinsky passionately advocated open-source software, open data, and the access, use and re-use of data for research to expand the reach and effectiveness of digital scholarship and communication. He discussed the usefulness of data curation to scholarship, science and education, and said that institutions and funding agencies should ensure that data are suitable for use and available for discovery and re-use. Other subsets of the larger curation process include archiving and preservation (UC San Diego, 2013). Willinsky’s message revolved around data creation, data curation, digging into data to create new and alternative pathways, and the case for open access as a public good.
For more discussion of open access, open journals, open software and scholarly publishing, read John Willinsky’s book The Access Principle or check the Public Knowledge Project at http://pkp.sfu.ca/.
Invisible labor and big data collaborations
The afternoon session was keynoted by Sally Wyatt, Chair of the World Social Science Forum and Professor at the Royal Netherlands Academy of Arts and Sciences. Wyatt’s presentation revolved around changing research contexts and data-based collaborative work.
Focusing on issues of digital scholarship in the humanities and social sciences, Wyatt observed that knowledge creation and dissemination go beyond the development and use of new computational tools. The invisible cognitive contents and blind spots that influence knowledge codification, creation and communication could change our ways of knowing and perceiving digital data. The costly duplication involved in re-using data, fraud in the validation of results, conflicts of interest, the lack of digital infrastructure, unstable access to digital knowledge in remote areas, legal and ethical complexities, learning and data gaps, distributional constraints, and digital waste, among others, could hamper the credibility of digital data and research.
For more about digitization, invisible labor and the virtualization of knowledge, check out Sally Wyatt’s forthcoming book Virtual Knowledge: Experimenting in the Humanities and the Social Sciences here: http://research-acumen.eu/wp-content/uploads/VirtualKnowledge-MITpress.pdf
Openness and Digitizing the Social Sciences
The key actors and participants adjourned to an open space to discuss further their experiences and insights on how to improve digital research methods and processes, sustain their projects and deepen interdisciplinary research.