
CyVerse Geospatial

Jeffrey K Gillan edited this page Jun 17, 2024 · 59 revisions


Housed at the University of Arizona, CyVerse is a one-of-a-kind cloud computing system for the academic and research communities. Its mission is to design, deploy, and expand a national cyberinfrastructure for scientific research and to train scientists to use it. CyVerse is an excellent platform to make your geospatial research open and reproducible!

CyVerse (originally called iPlant) has been in existence for 16 years, has spent $120M in research funds, has 135,000 registered users, and has facilitated 1,700 peer-reviewed publications across many scientific fields, including plant genetics, genomics, astronomy, geosciences, health, and agriculture.

CyVerse is completely free for University of Arizona students, staff, and faculty.




Data Storage and Sharing

The CyVerse Data Store is ideal cloud storage for hosting your large geospatial datasets, sharing data with colleagues, and meeting publication and grant archival requirements.

Using cloud storage infrastructure eliminates the need for researchers to maintain their own servers (and APIs) and lets them focus on their research. It makes it easy for individuals to share their data on the web while avoiding costly local storage. Cloud storage also protects your data against loss from local hardware failure.

  • CyVerse Data Store is object cloud storage similar to Azure Blob or Amazon S3
  • Pro account has a 3 TB limit
  • Data can be uploaded and downloaded through the website and multiple command line tools
  • Share your data with your colleagues and the world with a URL
  • Data can be public or private, and you can set permission levels for anyone you share it with
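
As a rough illustration of sharing data with a URL, the sketch below turns a Data Store path into a public link. The base URL (an anonymous, read-only WebDAV endpoint) and the example path are assumptions made for illustration, not details taken from this page; check the current CyVerse documentation for the exact link format that applies to your data.

```python
from urllib.parse import quote

# Assumption: anonymous, read-only WebDAV endpoint for publicly shared data.
# Confirm this base URL against the current CyVerse documentation.
PUBLIC_BASE = "https://data.cyverse.org/dav-anon"


def public_url(data_store_path, base=PUBLIC_BASE):
    """Build a shareable URL from an absolute Data Store path."""
    if not data_store_path.startswith("/"):
        raise ValueError("expected an absolute Data Store path")
    # Percent-encode spaces and other unsafe characters, but keep the slashes.
    return base.rstrip("/") + quote(data_store_path, safe="/")


# Hypothetical path, used only to show the shape of the result.
print(public_url("/iplant/home/shared/example_project/my survey.copc.laz"))
```

A link like this only works for data whose permissions have been set to public; private data still requires an authenticated client.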



Share Your Cloud Native Geospatial Formats

Sharing geospatial data from cloud storage can be greatly improved by using cloud native formats. These formats are designed for the cloud and built for HTTP streaming, which means users can view and analyze data without downloading the entire dataset. By analogy, this is like going from the original Napster model of downloading music to the Spotify model of streaming it.

There is a cloud native format to fit almost any geospatial data type. For example, FlatGeobuf is a cloud native format for vector data, Cloud Optimized GeoTIFF (COG) for raster data, and Cloud Optimized Point Cloud (COPC) for point cloud data. Zarr is a cloud native format that can be used for multi-dimensional raster data.
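
The mechanism behind this kind of streaming is the HTTP range request: a client asks for only the bytes it needs instead of the whole file. Below is a minimal sketch using only the Python standard library. The function names are hypothetical, and real workflows would normally rely on a geospatial library that issues these requests for you; the point is just to show what "reading part of a file over HTTP" looks like.

```python
import struct
import urllib.request


def range_request(url, start, length):
    """Build a GET request for bytes [start, start + length) of a remote file."""
    end = start + length - 1  # the end of an HTTP byte range is inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def fetch_bytes(url, start, length, timeout=30):
    """Download only the requested byte range (needs network access)."""
    with urllib.request.urlopen(range_request(url, start, length), timeout=timeout) as resp:
        # 206 Partial Content means the server honored the Range header;
        # 200 means it ignored the header and is sending the whole file.
        if resp.status != 206:
            raise RuntimeError(f"range request not honored (status {resp.status})")
        return resp.read()


def tiff_kind(header):
    """Classify the first 4 bytes of a file as 'TIFF', 'BigTIFF', or None."""
    if len(header) < 4 or header[:2] not in (b"II", b"MM"):
        return None
    byte_order = "<" if header[:2] == b"II" else ">"  # II = little endian
    (magic,) = struct.unpack(byte_order + "H", header[2:4])
    return {42: "TIFF", 43: "BigTIFF"}.get(magic)
```

For example, `tiff_kind(fetch_bytes(url, 0, 4))` would check whether a remote file is a TIFF after transferring four bytes, where `url` is the public link to a raster you have shared. A COG reader follows the same pattern: it fetches the header first, then only the tiles that cover the area of interest.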



The above image shows an example of a Cloud Optimized Point Cloud (COPC) streamed from the CyVerse Data Store into a web application.







Permanent Archival

The CyVerse Data Commons is our public-facing data storage interface. It enables you to share data with people outside of CyVerse.

Community Released: A folder of data you want to share with colleagues, stakeholders, or the entire world. You control read/write/own permissions and make it public or private.

Curated: Data that is tied to a peer-reviewed publication and needs permanent archival. You can apply many types of metadata templates to your data and receive a permanent DOI.




CyVerse Discovery Environment

The CyVerse Discovery Environment is a place where you can run cloud instances of the most popular and powerful data analysis programs. It is backed by robust computing resources that can help scale your analysis beyond what is possible on your laptop.

  • Launch instances of QGIS, RStudio, Jupyter Notebooks, VSCode, and more
  • Launch existing applications or create and launch your own analysis

  • Get access to a Linux desktop

  • GPU Resources for machine learning and other high-computation data processing

Cloud Computing is all about moving geospatial analysis and computation from your local machine to a remote machine in the cloud. This approach has several advantages over traditional desktop computing.

Advantages of Cloud Computing

  • With cloud computing, you can avoid the upfront cost and complexity of owning and maintaining your own IT infrastructure

  • Cloud computing allows groups or individuals to scale up (or down) their operations quickly as their computing needs change

  • Cloud computing allows users to access their data and applications from anywhere, on any device, at any time

  • Launch and run cloud instances of software applications like Jupyter Notebooks, QGIS, RStudio, and VSCode

  • Large CPU, GPU, and memory resources

  • Ability to run your own containerized software application

Open Science (data sharing, reproducible science)

SpatioTemporal Asset Catalogs

Planet Satellite Imagery

Agisoft Metashape Licenses

Jetstream2

CyVerse is a partner in Jetstream2, a national cyberinfrastructure funded by the National Science Foundation. Housed primarily at Indiana University, Jetstream2 is cloud computing on a grand scale! Any funded research project in the USA could receive a computing allocation on Jetstream2.



CACAO

CyVerse has developed the Cloud Automation & Continuous Analysis Orchestration (CACAO) platform to make it easier to deploy and provision virtual machines in a cloud environment. Using an approach known as infrastructure as code, a user can provision virtual machines of any size for their scientific analysis with a few clicks. CACAO is cloud agnostic, which means it can run on Jetstream2 as well as AWS, Azure, and Google Cloud.



Resources

Abernathey, R. P. et al. (2021) "Cloud-Native Repositories for Big Scientific Data," in Computing in Science & Engineering, vol. 23, no. 2, pp. 26-35, 1 March-April 2021, https://doi.org/10.1109/MCSE.2021.3059437

Chris Holmes's blog on Cloud Native

Cloud-Native Geospatial Outreach Event - April 2022 - from Open Geospatial Consortium (OGC)

Gentemann, C. L., et al. (2021). “Science Storms the Cloud”. AGU Advances, 2, e2020AV000354. https://doi.org/10.1029/2020AV000354

Mapscaping Podcast on Cloud Native Geospatial