Large datasets

The full model output is several terabytes. Post-processing must be done while the data is still on the supercomputer. That storage is not backed up, so post-processing must be completed before the data is lost or purged.

Before running a large model, take an inventory of the expected end products. Write, test, and automate post-processing scripts before generating data, so they can be scheduled to run as soon as each model run completes. If the end products occupy only a small fraction of the space taken by the raw data, and if rerunning the model is less expensive than archival storage, it may be acceptable to delete the raw data.
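The deletion trade-off above comes down to simple arithmetic. The sketch below is a minimal illustration under stated assumptions, not a prescribed policy: the `RunCosts` fields, the 5% threshold standing in for "a small fraction", the retention period, and all prices are hypothetical, and real storage and compute costs depend on your site.

```python
from dataclasses import dataclass


@dataclass
class RunCosts:
    """Hypothetical inputs for one model run; all costs in the same currency unit."""
    raw_tb: float                     # raw model output, in TB
    products_tb: float                # post-processed end products, in TB
    rerun_cost: float                 # cost to rerun the model from scratch
    archive_cost_per_tb_month: float  # archival storage price per TB per month
    retention_months: float           # how long the raw data would be archived


def archive_cost(c: RunCosts) -> float:
    """Total cost of keeping the raw output in archival storage for the retention period."""
    return c.raw_tb * c.archive_cost_per_tb_month * c.retention_months


def may_delete_raw(c: RunCosts, max_fraction: float = 0.05) -> bool:
    """Return True only when both conditions from the guidance hold:
    the end products are a small fraction of the raw data (assumed threshold: max_fraction),
    and rerunning the model is cheaper than archiving the raw output.
    """
    if c.raw_tb <= 0:
        raise ValueError("raw_tb must be positive")
    small_products = c.products_tb / c.raw_tb <= max_fraction
    cheaper_to_rerun = c.rerun_cost < archive_cost(c)
    return small_products and cheaper_to_rerun


# Hypothetical example: 6 TB of raw output, 0.1 TB of products, archived for 5 years.
example = RunCosts(raw_tb=6.0, products_tb=0.1, rerun_cost=500.0,
                   archive_cost_per_tb_month=20.0, retention_months=60)
print(archive_cost(example), may_delete_raw(example))  # 7200.0 True
```

With these made-up numbers, archiving 6 TB for five years costs 6 × 20 × 60 = 7,200 units versus 500 to rerun, and the products are under 2% of the raw size, so deleting the raw data would be acceptable under these assumptions. If either condition fails, for example when the products are 1 TB, the function keeps the raw data.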

Video Transcript

NC State is an Ag school, and North Carolinians like to support their local farms. NC State is also home to your favorite Advanced Computing Specialist, and she supports the high-performance computing center. In this project, she supports the hydrodynamics modeling for Mobile Bay.

Raising a cow takes a lot of time and effort. It is expensive. And it takes a lot of space. Our hydrodynamics model can take several weeks to run, even on a supercomputer! Processing the resulting output can take several months.

The model generates 6 TB of data. In other words, our data is a cow.

A small farm can’t afford to make any mistakes. Before they start raising the first cow, they need to know who their consumers will be. Which products will they want? And how do they need the products to be packaged?

Likewise, before generating the next six terabytes of data, we need to know who the consumers will be. Which products do they want? And how do they need the products to be packaged?

To get the cow ready for market, the processing must be done in a plant. This job is only done once. The cow goes in… and the products go out.

The data for our model needs to be processed on the supercomputer. Hopefully, this job is only done once. Ideally, the processing is done immediately, when the job finishes!

The data needs to be processed on the supercomputer. But we can’t store it there. It will need to be archived. And we can’t process data in that type of storage. If we don’t have a list of the products needed when a model is finished, we end up moving the data back and forth.

We take great care to create models that get the science right. And to get things right, sometimes you need a cow. We do need to pause, to think, and to make some data management decisions. Preferably… as soon as possible!

Video Description

Three cows are in a pasture. One is very close…and looking at you!

A ‘black and white silent-film’ screen appears, saying: NC State is an Ag school…with supercomputing expertise.

A family wearing NC State apparel poses next to two cows, then a pasture is shown along with the logos of several local beef suppliers.

Next is a picture of the NC State campus, the face of a smiling ‘advanced computing specialist’, and an image of a supercomputing cluster, followed by a short animation of bottom and surface salinity for Mobile Bay.

A ‘black and white silent-film’ screen appears, saying: Raising a cow is hard work…and so is running our model.

A farmer nuzzles a cow, and cartoon cash and coins fall from the sky. A cowboy on horseback herds cows across a wide-open range.

The Mobile Bay hydrodynamics modeling grid is shown, with close-ups of high-resolution areas, followed by a supercomputer and the smiling faces of two people who process the data.

We see an icon of a 6 TB storage drive. Then the cow that was looking at you reappears abruptly.

A ‘black and white silent-film’ screen appears, saying: How do small, family farms manage to deal with those cows??? …and what can we learn from them?

A farmer is shown in a contemplative state. Consumers are represented as boxes from suppliers or food delivery services. Products are shown by a diagram of a cow with areas labeled for the different cuts, and packaging is shown as individual steaks or patties.

A Mobile Bay model output and a supercomputer are shown. The consumers are climate, watershed, and ecology modelers. Products are quantities such as temperature, salinity, precipitation, and nutrients. Packaging is shown as file icons, such as CSV and netCDF.

A ‘black and white silent-film’ screen appears, saying: Limited Time Offer - Order Now! …we don’t want to do this kind of job more than once!

An animated cartoon cow walks into a picture of the interior of a meat processing plant. Different cuts of beef move out of the same plant. There is a sketch of a customer holding a menu, with a waiter. The customer asks, “Can I just order a meatball?”

The animated cartoon cow walks into a picture of a supercomputer. The cow fades into the computational products, such as temperature and salinity, and the packaging, CSV and netCDF files.

The animated cartoon cow starts behind a fence and walks into a picture of a supercomputer, then the file icon image floats up from the supercomputer. The fence is labeled ‘Cold Storage’ and the supercomputer is labeled ‘Network Attached Storage’. The cow moves back and forth between the fence and the computer, and files float up from the computer each time the cow moves behind it, a total of four times. Text appears saying, “This is even more annoying in real life…And it’s slower too!”

A ‘black and white silent-film’ screen appears, saying: Don’t moooove it twice…please let’s discuss a plan for post-processing model results!

A cow is lying in a grassy field. Three young women sit in the grass next to the cow; one is cuddled close, gently holding the cow’s face. Next, a dozen cows stand in a feedlot. A farmer with a concerned face looks down at the cows, hands on hips, as if wondering what to do. Finally, a large herd of cows is driven down a country road. The image zooms in as if to indicate a stampede.

A ‘black and white silent-film’ screen appears, saying: The End. No cows were harmed in the making of this video.