Ending the Data Blockage in Cryo-EM Research


By Walter Hinton

On occasion, medical researchers are hampered in ways that resemble the maladies they study. Such can be the case with Cryo-EM research. 

In just the past two years, cryo-electron microscopy has become a juggernaut in pharmaceutical investigations worldwide. The Nobel Prize-winning technology allows scientists to view biomolecules in three dimensions and in their mid-active state. Cryo-EM has enabled a new era in structural biology and biomedical therapeutics, spanning everything from COVID-19 discoveries to cancer treatments. By one account, over 100 Cryo-EM units have already been sold to biomedical labs around the world.

But there is a bottleneck in Cryo-EM research. Data comes off the microscope in the form of micrographs, which resemble frames of streamed video. During research, a massive amount of micrograph data is generated, often measured in petabytes. This data is ingested by servers attached to the Cryo-EM device. Automated scripts copy the data, then move it to shared file systems that allow teams of researchers to investigate the observations. After moving the data, the servers must replicate the information to backup servers, compare the files to ensure nothing was corrupted, and finally delete the data on the device’s servers before moving on.
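The copy, replicate, verify and delete sequence described above can be sketched in a few lines of Python. This is only an illustration of why the cycle is slow, not any vendor's actual ingest script: the function names, the directory layout and the choice of SHA-256 as the checksum are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large micrographs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def offload(src: Path, shared_dir: Path, backup_dir: Path) -> None:
    """Copy src to shared and backup storage, verify both, then delete src.

    Hypothetical helper: real pipelines differ, but the order of steps
    follows the workflow described in the article.
    """
    expected = sha256_of(src)  # first full read of the data
    for dest_dir in (shared_dir, backup_dir):
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / src.name
        shutil.copy2(src, dest)            # another full read plus a write
        if sha256_of(dest) != expected:    # and a full read to verify
            raise IOError(f"checksum mismatch for {dest}")
    # Free the microscope-side storage only after both copies verify.
    src.unlink()
```

Even in this simplified form, every byte is read or written several times before the microscope-side storage can be cleared, which is where the waiting comes from.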

Like plaque that forms on the walls of arteries, restricting the flow of blood, these cumbersome processes slow the flow of information and with it, the speed of research findings. In a typical day, researchers may copy, move and delete 15 terabytes of data (the usual capacity of a Cryo-EM direct attached storage, or DAS, unit) eight times or more. On a 100Gbps network working at 80% utilization, each copy/move/delete cycle takes a minimum of 60 minutes.

At eight cycles a day, with each taking at least an hour, this impediment results in eight hours of downtime in every 24-hour period, the equivalent of ten days of lost productivity per month. For a research device that carries an average price tag of $7 million, such delays can equate to over $60,000 a month in frustrating downtime.
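For readers who want to check the arithmetic, here is a rough sketch using the figures quoted above. It assumes decimal terabytes and a 30-day month, and it takes the 60-minute cycle time as given. A single 15-terabyte pass over the wire works out to about 25 minutes at these rates, so the 60-minute figure presumably reflects the several passes (copy, replicate, verify) in each cycle. That reading is an assumption, not something the figures themselves state.

```python
CAPACITY_TB = 15       # DAS capacity cited in the article
LINK_GBPS = 100        # network speed cited in the article
UTILIZATION = 0.80     # utilization cited in the article
CYCLES_PER_DAY = 8     # "eight times or more"
CYCLE_MINUTES = 60     # the article's stated minimum per cycle
DAYS_PER_MONTH = 30    # assumption used for the monthly figure


def single_pass_minutes(capacity_tb: float, link_gbps: float, utilization: float) -> float:
    """Minutes to push capacity_tb (decimal TB) once across the link."""
    bits = capacity_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 60


def downtime_hours_per_day(cycles: int, cycle_minutes: float) -> float:
    """Hours per day the microscope waits on copy/move/delete cycles."""
    return cycles * cycle_minutes / 60


def lost_days_per_month(hours_per_day: float, days: int) -> float:
    """Daily downtime expressed as full 24-hour days lost per month."""
    return hours_per_day * days / 24


one_pass = single_pass_minutes(CAPACITY_TB, LINK_GBPS, UTILIZATION)
daily = downtime_hours_per_day(CYCLES_PER_DAY, CYCLE_MINUTES)
monthly = lost_days_per_month(daily, DAYS_PER_MONTH)
print(f"one pass: {one_pass:.0f} min, downtime: {daily:.0f} h/day, {monthly:.0f} days/month")
```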

Bottlenecks have real consequences. Pharmaceutical companies, in particular, are under time constraints, and downtime is not an option. While data copy/move and checksums are performed, the Cryo-EM microscope is unavailable for creating micrographs, and research is stopped. Adding more servers doesn’t solve the ingest challenge, nor will saving the data for batch processing overnight fix the problem. Speed and data flow are at the heart of the issue.

Eliminating the Blockage

Cures for this condition are found in NVMe-oF systems. NVMe-oF (Non-Volatile Memory Express over Fabrics) extends the NVMe protocol across network fabrics such as Ethernet, Fibre Channel and InfiniBand, bringing the benefits of non-volatile memory (typically high-end flash storage) to shared storage at the highest scale. It maximizes the parallelism and performance of flash memory, boosting application performance. It also allows NVMe arrays to ingest data regardless of scale or distance.

Some Cryo-EM solutions presently use NVMe solid state drives to store images, but without fixing the data flow issue. Because micrographs are essentially video, the data crunching is handled by GPUs (graphics processing units), such as those in NVIDIA’s DGX™ A100 systems. Congestion between storage and processing must be removed to provide seamless analytical workflows. Only NVMe-oF systems offer the speed, scale and intelligence necessary for uninterrupted Cryo-EM research.

Other factors support NVMe-oF adoption as well. Because NVMe-oF systems facilitate file sharing, more researchers can access micrographs immediately. The technology can pay for itself, in productivity improvements alone, in just a few months. It fundamentally clears the ingest and processing obstructions, ending the pause/copy/move/validate restrictions. With NVMe-oF, nothing gets in the way of science.

Accelerating Discovery

To gain maximum availability and speed, NVMe-oF solutions should adhere to standards-based networking. A standards-based approach will not only improve throughput, but also make the system easier to deploy and operate. If the organization has existing investments in NAS (Network Attached Storage) or parallel file systems, look for a solution that can seamlessly plug in as a performance storage tier in the defined storage workflow.

Regardless of existing infrastructure, NVMe-oF holds the key to unlocking the full potential of Cryo-EM. In the research realm, time-to-discovery can make all the difference. In an era when the entire world hungers for breakthroughs, labs can’t afford to be crippled by process. IT teams that remove hindrances will be the unsung heroes of medical innovation.

Put in more practical terms, an investment in storage technology that costs 2-3% of the host device it supports, yet generates a 30% increase in productivity, offers great ROI. If restricted data flow is the arterial disease of Cryo-EM research, NVMe-oF is the statin that scientists — and research organizations — so desperately need.
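As a back-of-the-envelope check on that claim, the sketch below combines the $7 million device price, the 2-3% storage cost and the $60,000 monthly downtime figure quoted in this article. It assumes, purely for illustration, that the new storage recovers the full monthly downtime cost. Actual savings will vary by lab.

```python
DEVICE_PRICE_USD = 7_000_000            # average device price cited in the article
STORAGE_COST_FRACTIONS = (0.02, 0.03)   # "2-3% of the host device"
DOWNTIME_COST_PER_MONTH_USD = 60_000    # monthly downtime cost cited in the article


def payback_months(device_price: float, cost_fraction: float, monthly_savings: float) -> float:
    """Months until the storage spend is offset by recovered downtime cost.

    Assumes all of monthly_savings is actually recovered (illustrative only).
    """
    return device_price * cost_fraction / monthly_savings


for fraction in STORAGE_COST_FRACTIONS:
    months = payback_months(DEVICE_PRICE_USD, fraction, DOWNTIME_COST_PER_MONTH_USD)
    print(f"storage at {fraction:.0%} of device price: payback in about {months:.1f} months")
```

Under those assumptions the payback period lands between two and four months.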

Walter Hinton is Director of Product and Solutions Marketing at Pavilion Data Systems, developer of the industry’s first Hyperparallel Flash Array. With a deep technical background and extensive experience in building marketing teams, Walt served as Chief Strategist at StorageTek where he helped create the Storage Networking Industry Association (SNIA). Most recently, he was Sr. Global Director of Product Marketing at Western Digital.