Indiana University

Talks and Events Video Gallery

IEEECluster 2013

David Keyes

David Keyes : To compute, to breathe: computing in the 21st century university

David Keyes, founding dean of the King Abdullah University of Science and Technology's (KAUST) division of Computer, Electrical, and Mathematical Sciences and Engineering, delivers the opening keynote address at the IEEE Cluster 2013 Conference, September 24, 2013 in Indianapolis. 

To see more content from the conference visit:  http://pti.iu.edu/ieeecluster-2013/index.php  Event date: 09/25/2013

Steve Lyness

BoF: HPC Cluster Interconnect Topologies: Is Fat Tree the right Answer

At the IEEE Cluster 2013 Conference hosted by the Indiana University Pervasive Technology Institute, Steve Lyness, VP, Cluster Solutions Engineering at Cray Inc., gives an overview of cluster configuration variations with FDR InfiniBand, Gigabit Ethernet to 40 GigE, or proprietary interconnects like Quadrics, Myrinet, and Aries. He discusses the rising cost of components that don't give us more compute speed but are part of the infrastructure costs associated with a complete end-to-end system. In the end, he discusses buying a new cluster and whether we need a full fat tree interconnect topology or whether there is something more cost effective.

To see more content from the conference visit: http://pti.iu.edu/ieeecluster-2013/index.php  Event date: 09/25/2013

Cyberinfrastructure Software Sustainability and Reusability Workshop

Brad Wheeler

Brad Wheeler delivers a welcome address

Brad Wheeler delivers a welcome address at the Cyberinfrastructure Software Sustainability & Reusability Workshop on March 26th, 2009.

Dr. Jennifer Schopf

Jennifer Schopf gives her presentation "Sustainable Software"

Jennifer Schopf gives her presentation "Sustainable Software" at the Cyberinfrastructure Software Sustainability & Reusability Workshop on March 26th, 2009.

Dennis Gannon

Dennis Gannon gives his presentation "Software Sustainability"

Dennis Gannon gives his presentation "Software Sustainability" at the Cyberinfrastructure Software Sustainability & Reusability Workshop on March 26th, 2009.

Clifford Lynch

Clifford Lynch gives his presentation "Software and the Long Haul"

Clifford Lynch gives his presentation "Software and the Long Haul" at the Cyberinfrastructure Software Sustainability & Reusability Workshop on March 27th, 2009.

Reports Session

Reports from the Breakout Sessions

Reports from the Breakout Sessions of the Cyberinfrastructure Software Sustainability & Reusability Workshop on March 26th, 2009.

Neil Chue Hong

Neil Chue Hong gives his presentation "Cultivating Sustainable Software for Research"

Neil Chue Hong gives his presentation "Cultivating Sustainable Software for Research" at the Cyberinfrastructure Software Sustainability & Reusability Workshop on March 27th, 2009.

Brad Wheeler

Brad Wheeler gives a presentation of "An Industry View of the Sustainability Challenge"

"An Industry View of the Sustainability Challenge" - Dr. Brad Wheeler, Vice President for IT, Dean, and Professor, Indiana University, at the Cyberinfrastructure Software Sustainability & Reusability Workshop on March 26th, 2009.

Big Data for Science

Kim - Sequencing Data

Cancer epigenomics using the next generation sequencing data

Sun Kim gives a talk about cancer epigenomics using the next generation sequencing data at the Big Data for Science workshop held at the Pervasive Technology Institute, Indiana University.

This event was put on by PTI's Digital Science Center July 26th - July 30th, 2010.

For slides from this video or more information about the Big Data for Science Workshop, click http://salsahpc.indiana.edu/tutorial/

Judy Qiu - Cloud Computing

Cloud Computing Platforms

Judy Qiu gives a talk about cloud computing platforms at the Big Data for Science workshop held at the Pervasive Technology Institute, Indiana University.

This event was put on by PTI's Digital Science Center July 26th - July 30th, 2010.

For slides from this video or more information about the Big Data for Science Workshop, click http://salsahpc.indiana.edu/tutorial/

Wernert - Paraview

Scalable and Distributed Visualization using Paraview

Eric Wernert gives a talk about scalable and distributed visualization using Paraview at the Big Data for Science workshop held at the Pervasive Technology Institute, Indiana University.

This event was put on by PTI's Digital Science Center July 26th - July 30th, 2010.

For more information on visualization, visit: http://www.avl.iu.edu

For slides from this video or more information about the Big Data for Science Workshop, click http://salsahpc.indiana.edu/tutorial/

BigData DC

Data Movement & Storage (Data Capacitor WAN Filesystem)

Justin Miller gives a talk about Data Movement & Storage (Data Capacitor WAN Filesystem) at the Big Data for Science workshop held at the Pervasive Technology Institute, Indiana University.

This event was put on by PTI's Digital Science Center July 26th - July 30th, 2010.

For information about IU's Data Capacitor, visit http://rt.uits.iu.edu/systems/hpfs/

For slides from this video or more information about the Big Data for Science Workshop, visit http://salsahpc.indiana.edu/tutorial/

Stewart - FutureGrid Tutorial

Using FutureGrid

Craig Stewart gives a Tutorial on Using FutureGrid at the Big Data for Science workshop held at the Pervasive Technology Institute, Indiana University.

This event was put on by PTI's Digital Science Center July 26th - July 30th, 2010.

For more information about FutureGrid, visit http://futuregrid.org/

For slides from this video or more information about the Big Data for Science Workshop, click http://salsahpc.indiana.edu/tutorial/

Fox - FutureGrid Overview

An Overview of FutureGrid

Geoffrey Fox gives an Overview of FutureGrid at the Big Data for Science workshop held at the Pervasive Technology Institute, Indiana University.

This event was put on by PTI's Digital Science Center July 26th - July 30th, 2010.

For more information about FutureGrid, visit http://futuregrid.org/

For slides from this video or more information about the Big Data for Science Workshop, click http://salsahpc.indiana.edu/tutorial/

HTRC Un-Camp

HTRC Uncamp - John Wilkin

John Wilkin delivers Keynote at HathiTrust Research Center Un-Camp

HTRC Uncamp - Ted Underwood

Ted Underwood speaks at HathiTrust Research Center Un-camp

HTRC Uncamp - Colin Allen

Colin Allen speaks at HathiTrust Research Center Un-camp

D2I Seminar Series

Melanie Wu

Efficient Association Discovery with Keyword-based Constraints on Large Graph Data

In many domains, such as social networks, cheminformatics, bioinformatics, and health informatics, data can be represented naturally in a graph model, with nodes being data entries and edges the relationships between them. The graph nature of these data brings opportunities and challenges to data storage and retrieval. In particular, it opens the door to search problems such as semantic association discovery and semantic search. Our group studied the application requirements in these domains and found that discovering Constrained Acyclic Paths (CAP) is in high demand. Based on these studies, we define the CAP search problem and introduce a set of quantitative metrics for describing keyword-based constraints. In addition, we propose a series of algorithms to efficiently evaluate CAP queries on large-scale graph data. In this talk, I will focus on two main aspects of our study: (1) what a CAP query is and how to express CAP queries in a structured graph query language; and (2) how to efficiently evaluate CAP queries on large graph data.

This talk was sponsored by the Data to Insight Center

Current D2I Seminar Series Schedule >>

Data to Insight Center Events >>

Devarshi Ghoshal

Understanding the I/O Performance of Virtualized Cloud Environments

As cloud resources become increasingly popular, the scientific community is exploring the suitability of the infrastructure to handle High Performance Computing (HPC) applications. Specifically, cloud environments have been suggested as a potential platform for data-intensive science applications. Prior work has shown that applications with significant communication or I/O tend to perform poorly in virtualized cloud environments; however, there is a limited understanding of the I/O characteristics of cloud environments. In my talk I will discuss my work in benchmarking the I/O performance over different cloud and HPC platforms to identify the major bottlenecks in existing infrastructure. Additionally, I will present some analysis to classify the types of applications executing on current HPC systems that could operate in cloud environments.

This talk was sponsored by the Data to Insight Center

Michael Conover

Political Communication on Twitter: Misinformation, Polarization and Partisan Engagement

In this talk we will present results from two lines of research relating to the influence of social media on the political process. We will review the astroturf tracking and detection system 'Truthy' [www.truthy.indiana.edu], as well as recent work on the polarized structure of political communication networks on Twitter. The talk will focus on the implementation, analysis and data mining aspects of these two projects. 

This talk was sponsored by the Data to Insight Center

Joel Saltz

Integrated Biological Informatics

Development of biomarkers that predict response to treatment and models that can direct development of new therapies requires integration of many complementary types of biomedical information captured at multiple scales. In the context of our caBIG® In Silico Brain Tumor Research Center, we are developing methodologies, information models, tools, and analytic pipelines that will make it feasible to systematically carry out large-scale integrative analyses of: 1) whole slide digital pathology and radiology based features, and 2) deep-sequencing data and patterns of protein and gene expression. The methods and tools will be designed to carry out the following closely interrelated tasks: 1) systematically manage, query and analyze results produced by data analyses composed of large numbers of interrelated algorithms, 2) compare results produced by workflows consisting of cascades of multiple algorithms, 3) efficiently manage result datasets that in aggregate will contain trillions of imaging derived features, 4) engage human neuropathologists and radiologists in validation of results and motivation of new analyses, and 5) support histological feature query and analysis patterns needed to link histological features with “omic” and outcome data.

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

David Crandall

Studying the World by Mining Photo-Sharing Websites

The dramatic growth of photo-sharing websites has created immense collections of online images, with Flickr and Facebook alone now hosting over 50 billion images. While users of sites like Flickr are primarily motivated by a desire to share photos with family and friends, collectively (and perhaps unwittingly) they are generating vast repositories of online information about the world. Each of their photos is a visual observation of what a small part of the world looked like at a particular point in time and space. In aggregate, these billions of photos in combination with the non-visual metadata available on photo sharing sites (such as photo timestamps, geo-tags, and captions) present a rich source of information about the state of the world and how it is changing over time. In this talk I'll discuss some of our recent work in data mining and computer vision that aims to unlock this latent information from photo-sharing sites.

This talk was sponsored by the Data to Insight Center.

Cal Lee

Curation of Digital Information at Multiple Levels of Representation

The literature on digital archives tends to place a great emphasis on the "virtual" (i.e. intangible) nature of electronic resources. However, digital objects are created and perpetuated through physical things (e.g. charged magnetic particles, pulses of light, holes in disks). This materiality brings challenges, because data must be read from specific artifacts, which can become damaged or obsolete. The materiality of digital objects also brings unprecedented opportunities for description, interpretation and use.  There is a substantial body of information within the underlying data structures of computer systems that can often be discovered or recovered.  Because of the possibility of interacting with digital information at different levels, there is no single, canonical representation of digital data. To ensure integrity and future use, archivists and other information professionals must make decisions regarding treatment of materials at multiple levels of representation. In this talk, I will report on several projects that involve treatment of data - both computational methods and decision making processes - at multiple levels of representation.

This talk was sponsored by the Data to Insight Center.

Tevfik Kosar

End-to-End Data-Flow Parallelism for Throughput Optimization

Applications in all areas of science are becoming increasingly complex and more demanding in terms of their computational and data requirements. Some experiments and simulations already generate data volumes exceeding several petabytes. Sharing, disseminating, and analyzing these large data sets has become a big challenge despite the petascale computing systems and multigigabit optical networks. The majority of users fail to obtain even a fraction of the theoretical speeds promised by these high-bandwidth networks due to issues such as sub-optimal protocol tuning, I/O subsystem performance bottlenecks on the sending and/or receiving ends, and data server processor limitations. This implies that having high speed networks in place is important but not sufficient. Being able to effectively use these high speed networks is becoming more and more important to achieve high performance computing in a widely distributed setting. In this talk, I will present an application-level end-to-end Throughput Estimation and Optimization Service (TEOS) that we have been developing. TEOS services are based on the implementation of the novel "parallel end-to-end data-flow" models and algorithms being developed by our group. TEOS services will be made available for immediate optimization of the end-to-end performance on existing multi-gigabit networks, and are ready to be extended to next generation networks and high-end storage systems.

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Kevin Bowyer

Research Frontiers In Iris Biometrics

The texture pattern of the iris is being used as the basis for biometric identification of persons in successful, ongoing applications.  At the same time the level of research activity in the field of iris biometrics is increasing dramatically. This talk will explain how iris biometrics works, and motivate new research themes in iris biometrics that have come about as various elements of conventional wisdom about iris biometrics have been proven wrong.  This talk should be readily accessible to those who are not working in biometrics, and potentially controversial to those who are working in iris biometrics.

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Beth Plale

Provenance Capture of Unmanaged Workflows with Karma

For the digital data created as an outcome of scientific discovery to retain its value over time, the data must undergo some level of curation. In order for archival of scientific data to be fully realized, however, curation costs must come down. This will be achieved in part through tools that automate metadata and provenance collection. In this talk I present a logical architecture of a standalone provenance system, and the Karma system that implements it. We focus on the implications of unmanaged workflows, particularly on the representation of provenance information. Achieving flexible forms of provenance creation has tradeoffs in where the burden of effort lies and in the accuracy of the results. Finally, we discuss an evaluation of the performance of Karma under two capture scenarios and increasing workloads and determine the system to be scalable to a mid-range workload.

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Miao Chen

Semantic Relation Extraction from Socially Generated Tags

Social tagging allows users to contribute their preferred tags and results in a large collection of user-generated tags. On the one hand, we can acquire social tags with user descriptions of objects such as photos, videos, books, etc.; on the other hand, the tags are loosely organized without explicit structure. This study is motivated by this problem and aims to connect unlinked tags by mining semantic relations among them. In the experiment we identified tag pairs which are highly correlated with each other and found their exact semantic relation. In order to obtain context for the tag pairs we referred to Google search results, from which semantic relations were extracted for the tag pairs. The results show that our approach achieves a reasonably good rate of accuracy and the derived relations can be used to enrich metadata for social semantics.

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Matt Whitehead

Machine Learning Models

Ensemble machine learning models are often highly accurate on the supervised learning problem of classification. Combining groups of independent models allows for individual specialization and diversification with limited overfitting. The main drawback of using ensembles is the greatly increased computational resource requirements for training. In this talk, Matt explores how training set preprocessing using clustering and singular value decomposition can be used to build accurate ensembles while keeping training times to a minimum. Matt shows results from several domains including sentiment/opinion mining, medical diagnosis, and spam detection.

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Kate Keahey

Cloud Computing

Infrastructure-as-a-Service (IaaS) style cloud computing is emerging as a viable alternative to the acquisition and management of physical resources. But what exactly is cloud computing, how can we leverage it, and what opportunities does it open?

In this talk, Kate Keahey gives an overview of cloud computing and describes Nimbus -- a toolkit that provides an open source, EC2-compatible IaaS implementation as well as user-level tools adapting cloud computing to scientific needs. Keahey describes how application requirements drove the development of various Nimbus capabilities and how applications use these capabilities today. Keahey also discusses her experiences with configuring and running the Science Clouds -- a group of clouds in the academic domain available to scientific projects. Finally, she discusses the emerging trends and innovation opportunities in cloud computing.

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Tom Evans

Data Modeling

Interactions between people and the environment are complex and dynamic. Direct relationships between two specific phenomena (e.g., population density and deforestation) are rare, if not nonexistent. More commonly, data from a variety of sources are needed to adequately understand and explain the social and biophysical factors that are a part of human-environment interactions. One method of integrating the various phenomena affecting human-environment relationships is by creating a spatial representation to provide a spatially explicit data modeling environment. A critical aspect of these spatially linked datasets is the decision of what spatial unit of analysis to use to study a specific social-biophysical process. This presentation by Tom Evans discusses these spatial representations and implications for subsequent spatial data analysis.

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Beth Plale

Metadata and Preservation in Geosciences: Issues at Scale

A recent brown bag presentation with computer scientist Dr. Beth Plale, Director of PTI's Data to Insight Center. Plale explores issues of discovery, process, and preservation in relation to large scale data collections.

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Craig Mattocks

LEAD

LEAD (Linked Environments for Atmospheric Discovery), a service-oriented architecture (SOA) based cyberinfrastructure project in which meteorological analysis tools, forecast models, and data repositories can operate as dynamically adaptive, on-demand, grid-enabled systems, is currently undergoing a transformation to become a persistent, sustained facility upon which the atmospheric sciences community can rely. As part of this effort, LEAD scientists are collaborating with the developers of the WRF (Weather Research and Forecasting model) Portal at NOAA-ESRL-GSD, which is used at the Developmental Test Bed Center (DTC) in the Joint Numerical Test Bed (JNT) at NCAR, to develop enhanced, interoperable capabilities between the two portals. The new fused portal, with its more advanced and intuitive graphical user interface (GUI), will provide unprecedented forecasting capabilities by enabling users of all levels of sophistication and institutional capability to configure and run numerical weather simulations on powerful, shared or local computing resources.

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Adobe Connect Video Talks

Adaptable Metadata

Adaptable and Incremental Metadata Capture in e-Science

Adobe Connect video talk given by Scott Jansen on Adaptable and Incremental Metadata Capture in e-Science

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Adaptable Storage

Addressing Scalability in Distributed Storage

Adobe Connect video talk given by Micah Beck on Addressing Scalability in Distributed Storage

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Adobe Resource Management

An Integrated Resource Management and Scheduling Framework for Production Supercomputers

Adobe Connect video talk given by Wei Tang of the Illinois Institute of Technology on An Integrated Resource Management and Scheduling Framework for Production Supercomputers

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Adobe GENI

GENI - Global Environment for Network Innovations

Adobe Connect video talk given by Vicraj Thomas on GENI - Global Environment for Network Innovations

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Adobe Storm Predictions

Middleware alternatives for storm surge predictions in Windows Azure

Adobe Connect video talk given by Dr. Beth Plale on Middleware alternatives for storm surge predictions in Windows Azure

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Adobe Modeling Network

Modeling Network of Scientists

Adobe Connect video talk given by Ying Ding on Modeling Network of Scientists

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Adobe Data Transformation

Ontologies for Scientific Data Transformation

Adobe Connect video talk given by Leonardo Salayandia on Ontologies for Scientific Data Transformation

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Adobe Data Transformation

Processing Sliding Window Joins over High-Speed Data Streams

Adobe Connect video talk given by Abhirup Chakraborty on Processing Sliding Window Joins over High-Speed Data Streams

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Adobe Synthetic Radar Aperture

Synthetic Aperture Radar Interferometry for 3D Modeling

Adobe Connect video talk given by Maryam Rahnemoonfar on Synthetic Aperture Radar Interferometry for 3D Modeling

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Adobe Temporal Data Mining

Temporal Data Mining of Scientific Data Provenance

Adobe Connect video talk given by Peng Chen on Temporal Data Mining of Scientific Data Provenance.

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium

Adobe Illinois Data Initiative

The Illinois Research Data Initiative: gathering requirements

Adobe Connect video talk given by Beth Sandore on The Illinois Research Data Initiative: gathering requirements 

This talk was sponsored by the Data to Insight Center Seminar Series and the School of Informatics and Computing Colloquium
