How U-M scientists can focus on science while safely transferring data using Globus
On August 16, 2023, U-M data technology experts met with Globus team members to talk about data transfer, a topic of great interest to scientists, since most scientists today are also data scientists. They need not only to collect, curate and analyze data, but also to manage and move it between servers, often in very large quantities. Scientists are also responsible for the security of their data, which requires safe data management tools.
In a world where science is collaborative, it is crucial to move data securely and effectively between research partners in a distributed environment, within or across organizations. But the sheer size of most data sets is one of the many challenges of transferring data between servers. “Moving large amounts of data is plain painful,” said Ken Weiss, IT Project Senior Manager in the U-M Department of Computational Medicine and Bioinformatics (DCMB). “Given the amount of data we are moving these days, it can take days to weeks to months to transfer and you need to be sure it is done as quickly as possible and with reliability.”
This is why the University of Michigan (U-M) subscribed to Globus, a research cyberinfrastructure developed and operated as a not-for-profit service by the University of Chicago. Globus offers a platform that transfers data quickly and securely, and that tracks each transfer. With Globus, scientists can select a set of data and a destination, no matter how large the data set or how distant the destination. The data is transferred in a highly secure environment.
For example, DCMB recently welcomed Kin Fai Au, Professor of Computational Medicine and Bioinformatics, from Ohio State University. “We had to move over 500 terabytes of data from Au’s lab and with Globus and the infrastructure at and between OSU and U-M, we were able to sustain transfer rates of over 20 TB per day,” said Weiss. “Without this service, it would have taken months to move this data over the wire, or we could have loaded a large box of tapes in the back of Dr. Au’s car!”
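As a rough back-of-the-envelope check on the figures Weiss quotes, the short Python sketch below converts 500 TB at 20 TB per day into a duration and an average network rate. It assumes decimal terabytes (10^12 bytes) and a perfectly steady rate. The quote says “over 500” and “over 20,” so the results are approximate. The function names are ours, chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def transfer_days(total_tb: float, tb_per_day: float) -> float:
    """Days needed to move total_tb at a steady tb_per_day."""
    return total_tb / tb_per_day


def average_gbit_per_s(tb_per_day: float) -> float:
    """Average line rate, in gigabits per second, implied by tb_per_day."""
    bits_per_day = tb_per_day * BYTES_PER_TB * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


if __name__ == "__main__":
    print(f"{transfer_days(500, 20):.0f} days")        # 25 days
    print(f"{average_gbit_per_s(20):.2f} Gbit/s")      # 1.85 Gbit/s
```

Under these assumptions the move takes about 25 days, which corresponds to an average of roughly 1.85 gigabits per second held around the clock.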
How does it work?
The platform has been designed with the user in mind, and it takes only a few clicks to initiate a transfer from a web browser interface. Globus makes it happen “in the background.” For example, suppose you have data at Stanford University that you want to move to U-M. You log in to Globus with your U-M credentials and specify where you want to put the data locally through a Globus collection (an endpoint for accessing your data). You then access your Stanford data through a different Globus collection and enter your Stanford credentials. You select the files and/or folders at Stanford University and then initiate the transfer by clicking the “Start” button. That’s it! Globus brokers the transfer on your behalf, and you can log out of Globus, or even shut down your computer, while the transfer continues without you or your computer being involved. When the transfer is completed, you receive an email notification.
A Linux command-line version is also available for those who would like to use Globus services from the command line or in shell scripts. Globus keeps upgrading its service and interface, and is currently customizing its platform for U-M users, so our scientists can focus even more on science rather than on transfer technology.
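For readers who prefer scripting, the browser steps described above map onto the Globus CLI's `globus transfer` command. The Python sketch below only builds that command line and prints it. It does not contact Globus. The collection UUIDs and paths are placeholders, and the helper name `build_transfer_command` is ours. Running the printed command would require the Globus CLI to be installed and a prior `globus login`.

```python
import shlex


def build_transfer_command(src_id, src_path, dst_id, dst_path, label, recursive=True):
    """Return the argument list for a `globus transfer` invocation."""
    cmd = [
        "globus", "transfer",
        f"{src_id}:{src_path}",   # source collection and path
        f"{dst_id}:{dst_path}",   # destination collection and path
        "--label", label,         # human-readable name for the task
    ]
    if recursive:
        cmd.append("--recursive")  # copy a whole directory tree
    return cmd


if __name__ == "__main__":
    # Placeholder UUIDs: substitute the IDs of your own collections.
    SRC = "00000000-0000-0000-0000-000000000001"
    DST = "00000000-0000-0000-0000-000000000002"
    cmd = build_transfer_command(SRC, "/lab/data/", DST, "/scratch/data/", "Stanford to U-M")
    print(shlex.join(cmd))
```

Keeping the command as a list means it can be passed unchanged to `subprocess.run(cmd, check=True)` once the placeholders are replaced with real collection IDs.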
“If you can use a cell phone, you can use Globus,” said Weiss.
Another benefit of using Globus is its ability to handle interruptions and still complete the transfer. If there is a network hardware failure or a full disk, Globus will retry the transfer every 5 minutes for up to 1 week before timing out. Once the network issue is resolved or more storage space is freed up, Globus will continue right where it left off, whereas other transfer protocols would make you start over.
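The retry-and-resume behavior can be pictured with a small simulation. This is a sketch of the general idea only, not Globus's actual implementation: the 5-minute interval and 1-week limit come from the description above, while the chunk bookkeeping, the `make_flaky_send` fault injector and all function names are assumptions made for illustration. Time is simulated with a counter rather than real sleeping.

```python
RETRY_INTERVAL_S = 5 * 60        # retry every 5 minutes
DEADLINE_S = 7 * 24 * 60 * 60    # give up after 1 week


def resumable_transfer(num_chunks, send_chunk):
    """Send chunks in order; after a failure, wait and resume at the same chunk.

    Returns (chunks_done, simulated_seconds_waited).
    """
    done = 0      # checkpoint: number of chunks already delivered
    waited = 0    # simulated seconds spent waiting between retries
    while done < num_chunks:
        try:
            send_chunk(done)
            done += 1                     # advance the checkpoint
        except OSError:
            if waited + RETRY_INTERVAL_S > DEADLINE_S:
                break                     # timed out; the checkpoint is kept
            waited += RETRY_INTERVAL_S    # a real system would sleep here
    return done, waited


def make_flaky_send(fail_chunk, failures):
    """Hypothetical sender that fails `failures` times at chunk `fail_chunk`."""
    remaining = {"n": failures}
    sent = []

    def send(index):
        if index == fail_chunk and remaining["n"] > 0:
            remaining["n"] -= 1
            raise OSError("simulated network outage")
        sent.append(index)

    return send, sent


if __name__ == "__main__":
    send, sent = make_flaky_send(fail_chunk=3, failures=2)
    print(resumable_transfer(6, send))   # (6, 600)
    print(sent)                          # [0, 1, 2, 3, 4, 5]
    print(DEADLINE_S // RETRY_INTERVAL_S, "retries fit in one week")  # 2016
```

In the example run, chunk 3 fails twice, the simulated transfer waits two 5-minute intervals, and then it carries on from chunk 3 without resending chunks 0 to 2.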
Cost and subscription
There is no cost to the individual U-M scientist for Globus since the university pays the service subscription fee. The recipient does, however, need either to create a Globus ID (which is free) or to be a user at an institution that is partnered with Globus.
Beyond moving data
Globus is always improving and adding to its offerings. It is more than “just a data moving” service. Currently, Globus provides the ability to create workflows, “Globus Flows,” to automate repetitive tasks. There is also the ability to prepare and submit compute jobs with “Globus Compute,” a distributed Function as a Service (FaaS) platform that enables reliable, scalable and high-performance function execution on remote clusters, including the Great Lakes cluster at U-M. Note that Globus does not store data. Both high-performance computing and storage options are available to researchers upon request via the U-M Research Computing Package. Visit the ITS Advanced Research Computing website for full details and to request these services.
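To give a feel for the Function as a Service model, the sketch below submits a plain Python function to an executor and collects the result from a future. It uses the standard library's `ThreadPoolExecutor` as a local stand-in so that it runs anywhere. The Globus Compute SDK provides its own `Executor`, built on the same `concurrent.futures` interface, which would send the function to a remote endpoint instead. The function `count_gc` and its input are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def count_gc(sequence: str) -> int:
    """Toy analysis function: count G and C bases in a DNA string."""
    return sum(1 for base in sequence.upper() if base in "GC")


if __name__ == "__main__":
    # Local stand-in for a remote executor (assumption: threads on this machine).
    with ThreadPoolExecutor(max_workers=2) as executor:
        future = executor.submit(count_gc, "ATGCGC")  # returns immediately
        print(future.result())                        # blocks until done: 4
```

The point of the pattern is that the caller writes an ordinary function and receives a future, while where the function actually runs is decided by the executor.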
U-M policy
Before transferring data, please ensure that the appropriate data use agreements are on file with the Office of Research and Sponsored Projects. Alternatively, go directly to the policies and process for filing a Data Use Agreement.
Visit the ARC website to get started with Globus.
For more information or to get help with Globus, please contact: [email protected].