Genomes to Life Contractor-Grantee Workshop I
Arlington, Virginia, February 9-12, 2003
The GTL enterprise will generate staggering amounts of data, with an inherent complexity that dwarfs that of the genome era. GTL life sciences research will continue to take place in a distributed, heterogeneous experimental and computational environment. Solving the next generation of complex life science problems will require a systems approach that integrates data across various scales and provides interoperability among computational methods and data resources. It will be essential for GTL biologists to guide their research by leveraging knowledge from multiple research groups and specialized facilities, taking advantage of the results, data, and models generated by others. The persistent heterogeneity is likely to include data types, metadata, file formats, computing systems, programming languages, and systems biology models and concepts. Allowing the GTL biologist to function in such an environment will require innovative improvements in how we organize data and in the integration and interoperation of data resources and the tools that use them. The session will focus on lessons learned in the genome era and on the design elements and requirements for developing data resources and software tools that integrate multimodal data and information across time and scale. The discussion will include new methods of data specification and syntactic interoperation such as BioPerl and BioXML; data warehousing and methods for querying multiple heterogeneous data sources; Data Grid approaches to accessing data across distributed environments; and Web-based approaches. We will specifically address barriers to interoperability and integration, as well as the current state of software technologies and methodologies for interacting with data.