Text
Alpine storage | CU Boulder Research Computing
The first phase of CU Boulder Research Computing’s upcoming Alpine compute environment is supported by an ArcaStream Pixstor that provides 2 PB of storage at greater than 22 GB/s. The Pixstor uses the IBM Spectrum Scale (GPFS) parallel file system, which performs well for both our large-file and our many-file workloads.


Text
Alpine GPU compute nodes | CU Boulder Research Computing
In addition to its CPU compute nodes, the first phase of CU Boulder Research Computing’s upcoming Alpine compute cluster provides a rack of GPU compute nodes. Eight of these nodes each provide three Nvidia A100 GPU accelerators, and eight each provide three AMD MI100 GPU accelerators. Each GPU compute node also has 2x25Gb Ethernet connectivity to storage and the CU Boulder Science Network.



Text
Alpine CPU compute nodes | CU Boulder Research Computing
CU Boulder Research Computing’s upcoming Alpine compute cluster gets its start with these 64 CPU compute nodes. Based on the Dell PowerEdge C6525 platform, each 0.5U compute node provides two AMD EPYC 7543 “Milan” CPUs, 256 GiB of system memory, Mellanox HDR-100 InfiniBand connectivity to the other CPU compute nodes, and 25Gb Ethernet connectivity to storage and the CU Boulder Science Network.


Photo
We’ve gone back and forth on whether to install Nvidia drivers using the prepared packages from Nvidia’s yum repo or with the .run installer; but since this node has a new GPU, I decided to go with the packages: I find them simpler, and I expected them to work (unlike some of our experiences with older GPUs).
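For reference, the package-based approach boils down to a yumrepo resource plus a package resource in Puppet. The sketch below is illustrative only: the repo URL, GPG key path, and the cuda-drivers meta-package name are assumptions, not our actual profile.

# Rough sketch: install the Nvidia driver from Nvidia's yum repo instead of
# the .run installer. URLs and the package name are illustrative assumptions.
yumrepo { 'cuda':
  ensure   => present,
  descr    => 'NVIDIA CUDA repository',
  baseurl  => 'https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/',
  gpgcheck => '1',
  gpgkey   => 'https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/D42D0685.pub',
  enabled  => '1',
}

# A driver meta-package such as cuda-drivers (name assumed here) pulls in the
# driver stack without the full CUDA toolkit.
package { 'cuda-drivers':
  ensure  => installed,
  require => Yumrepo['cuda'],  # manage the repo before installing the package
}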
Photo
I discovered that I already had a class that configured access.conf; so I fixed it up just a bit to make it more flexible for the future, and then configured it with hiera (now that I’ve finally gotten more comfortable with it).
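As a rough sketch (with hypothetical class and parameter names, not our actual module), the shape of that class is something like:

# Hypothetical sketch: render /etc/security/access.conf from a list of rules
# supplied as a class parameter. Names and the default rule are placeholders.
class profile::access (
  Array[String] $rules = ['+ : root : ALL'],
) {
  file { '/etc/security/access.conf':
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0644',
    # join() comes from puppetlabs/stdlib; the trailing empty element
    # gives the file a final newline.
    content => join($rules + [''], "\n"),
  }
}

The rules parameter then gets its value through hiera’s automatic parameter lookup (a key like profile::access::rules at the appropriate level of the hierarchy), which is what makes the class easy to reuse on other hosts.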
Photo
The telegraf service seems like the easiest problem to fix, so I’ll start there.
We’re using a dedicated telegraf module that configures the yumrepo providing telegraf and also installs the package.
Everything looked good to me (mostly) so I tried just applying the profile again; and, of course, now it works.
I suspect the problem was that (a) Puppet tried to install the package before it had configured the repo, and then (b) once the repo was configured, yum hadn’t refreshed its repo cache by the next time I applied the manifest.
I’ll “fix” this for the future by defining a dependency between the telegraf package and the influxdata repo.
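Roughly, that fix amounts to something like the following; the resource titles and URLs here are illustrative, and the real telegraf module may name things differently:

# Sketch of the fix: manage the influxdata repo before installing telegraf.
yumrepo { 'influxdata':
  ensure   => present,
  descr    => 'InfluxData stable repository',
  # single quotes keep $releasever/$basearch literal for yum to expand
  baseurl  => 'https://repos.influxdata.com/rhel/$releasever/$basearch/stable',
  gpgcheck => '1',
  gpgkey   => 'https://repos.influxdata.com/influxdata-archive_compat.key',
  enabled  => '1',
}

package { 'telegraf':
  ensure  => installed,
  require => Yumrepo['influxdata'],  # repo first, then the package
}

If the module itself doesn’t declare that relationship, the same ordering can also be expressed from our profile with a chaining arrow: Yumrepo['influxdata'] -> Package['telegraf'].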
Photo
I’ve been working on standing up a new Blanca login node for ICS to replace their long-standing login node, which is aging out of manufacturer support. But this time I want to actually design a Blanca login service that other contributors could use; so I’m using this opportunity to genericize our configuration manifest. Most of the work is already done, but a few of our profiles need some more attention:
- There’s a flat list of packages we’ve installed just for ICS, so I need to decide how to categorize that for the future.
- The node isn’t on the correct network yet, so I haven’t applied the networking configuration yet.
- The telegraf profile wasn’t able to find the package to install, so I need to investigate that.
- I need to get the Nvidia GPU driver installed so I can start monitoring it.
- And, finally, I need to write an authz configuration for the host. I plan to do this with pam_access, to control who is actually allowed to log in to the node; a rough sketch follows below.
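For the record, there isn’t much to a pam_access setup: an account entry in the PAM stack and a rule list in /etc/security/access.conf. The sketch below uses a placeholder group name rather than the real ICS group:

# /etc/pam.d (account section): hand authorization decisions to pam_access
account  required  pam_access.so

# /etc/security/access.conf: permit root and one (placeholder) group, deny everyone else
+ : root : ALL
+ : (ics-login-users) : ALL
- : ALL : ALL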