I guess now I need to figure out what to do with them. @StefanR5R any suggestions? I'd like to throw them both in one machine unless there are any major performance benefits to building 2 separate machines. This would seem to be the logical motherboard to use? - https://www.newegg.com/asrock-rack-rome ... klink=true
Noctua heatsink/fan preferably but I'm not sure what the best one is that is compatible. Any RAM suggestions?
This is definitely uncharted territory for me.
Unlike most DCers, I basically have two separate types of computers: 1) Headless high-core-count computers, 2) GPU-equipped low-core-count computers. There are a few reasons for and against this approach, and I am sure you already thought about this for your own computer zoo.
Under Linux, there shouldn't be a performance impact from 2P compared to 2x 1P at least for CPU-only projects. On Windows, you may need 2 client instances (or multiples of 2) with CPU affinity enforced to one socket each.
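For illustration, pinning a second client instance to one socket on Linux could look roughly like the sketch below (numactl must be installed; the data directory path is a made-up placeholder, and the boinc flags should be double-checked against boinc --help):

```bash
# Show the NUMA layout first; a 2P board should report two nodes
# (more if NPS2/NPS4 is set in the BIOS).
numactl --hardware

# Hypothetical second BOINC client instance, bound to socket 1's cores and memory.
numactl --cpunodebind=1 --membind=1 \
    boinc --dir /var/lib/boinc-node1 --allow_multiple_clients
```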
I have no insight into GPU application performance on 2P systems.
I went with mainboards which have only 1 Gbit NICs and PCIe 3 myself because I have no need for better I/O and wanted to cut down on power consumption of the remaining (mostly idle) I/O components. The difference between idle 1GbE/10GbE and idle PCIe3/PCIe4 probably isn't much, though.

Icecold wrote: ↑Sun Sep 26, 2021 5:39 pm
This would seem to be the logical motherboard to use? - https://www.newegg.com/asrock-rack-rome ... klink=true
If you want to use GPUs but won't use PCIe risers, some Supermicro boards are not suitable because of conflicts between onboard connectors and GPU coolers.
Of the mainstream DIY TR4/SP3 air coolers, only Noctua coolers are worthwhile according to the tests which I have seen. Use the 140 mm version unless you have height restrictions.
Thanks to an optional lateral offset in the mounting kits of Noctua TR4/SP3 coolers, they should fit on most 2P mainboards. I checked cooler geometry and board geometry based on vendor drawings before I bought mine.
You must provide ample air flow across or top-down onto the VRM section(s), and reasonable air flow over the RAM, NIC, and M.2 SSD. I placed a whole collection of fans of various diameters on top of such board areas for this reason. And of course I have case fans which force front-to-back air flow (even though some of my 2P computers don't have a case, just a shelf with dividers as air guides).
IMO it's worthwhile to populate all 8 channels per CPU. If you don't want that, populate 4. (Not 2, and definitely not 6. Reasonable support for 6 channels per socket started with Epyc 7003 Milan. With Rome, 6-channel RAM performs worse than 4-channel RAM. All arbitrary 1-, 3-, 5-, 7-channel populations work too but have similar performance drawbacks.)
My 2x32-core ( = 128 threads total) computers have 256 GBytes RAM which was always sufficient so far. I don't know if you can get by with 256 GB in a 2x64-core or 128 GB in a 1x64-core computer, or would eventually need 512 or 256 GB respectively; never really looked into this.
I went with DDR4-3200 for Epyc Rome myself, i.e. the top possible speed, but merely out of principle, not based on particular performance measurements.
I think I'm going to just order a motherboard and CPU coolers and hold off on the RAM for now. I have ECC DDR4 here I can pull from another machine to test, but I don't want to invest the extra money until I get the processors and confirm they're not vendor locked.
Even if they are vendor locked, this is a good price, and if you do need to sell them you should get back what you put into them in the near future.
I think the board you are looking at is good, as there are not many options and the additional external I/O does not draw too many watts nowadays.
From what I have been reading on Serve The Home and other sites, the dual boards work well with the 140 mm Noctua coolers and additional airflow in a case over the VRMs when using the OEM CPUs. The ES versions you can overclock, and that can really push the heat into the VRMs as we all know. These appear to be stock and hopefully not vendor locked....
More power to the Team!!!! but now I need to catch up even more
If they're vendor locked, then you can only use a vendor-branded (Dell) motherboard with them, and even then they might not work. So essentially you'd have a $1000 (or however much they cost) paperweight.
The next round of workstation and server CPUs is coming soon, so hopefully they work - and if not, sell them soon.
I should know in a few days, maybe this weekend. The main hold-up will be waiting on the motherboard to arrive, but I ordered that and everything else other than RAM yesterday. I will pull RAM from another machine to test; I didn't want to drop $1500+ on RAM just to find out the CPUs aren't going to work with my motherboard.
Do you think it's likely DDR4-2400 would be a bottleneck for this build? My thought process is that with 8-channel memory the memory speed should matter less, but I wasn't really sure.
Edit - doesn't matter, was able to purchase DDR4-3200 at a decent price.
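For a rough sense of the numbers behind that question - simple back-of-the-envelope arithmetic, 8 bytes per transfer per channel, ignoring real-world efficiency:

```bash
# Theoretical peak bandwidth per socket with all 8 channels populated:
#   DDR4-2400: 8 ch * 2400 MT/s * 8 B = 153.6 GB/s
#   DDR4-3200: 8 ch * 3200 MT/s * 8 B = 204.8 GB/s
echo "scale=1; 8*2400*8/1000" | bc    # 153.6
echo "scale=1; 8*3200*8/1000" | bc    # 204.8
```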
Back when I routinely had 14-core and 22-core BDW-EPs running the same projects at the same times, I did of course see projects in which throughput / (core count × clock) was the same between them, and others which did not scale as well to the 22-core Xeons. Rosetta@home was one of the latter kind, but there were more. Alas I didn't pursue this topic systematically while I still had the 14-core Xeons in operation.
Apparently, PrimeGrid LLR projects are a counterexample: they are not dependent on RAM bandwidth once you have dialed in the right combination of simultaneously running LLR instances and thread count. Here is my most recent PrimeGrid LLR offline test (321-LLR) with results for both types of Xeons, each with 2×4 channels of DDR4-2400c17:
…dual 14-core Xeons: 215 kPPD / (56 used threads × 2.9 GHz) = 1.32 kPPD/thread/GHz
…dual 22-core Xeons: 299 kPPD / (84 used threads × 2.6 GHz) = 1.37 kPPD/thread/GHz
…dual 22-core Xeons: 299 kPPD / (88 available threads × 2.6 GHz) = 1.31 kPPD/thread/GHz
The "available threads" figure is the more significant one, because the use of SMT in LLR has the only purpose to help saturate the FMA units. I.e. the calculation should actually be kPPD / (available FMA units × clock).
But of course the further you are off of the optimum configuration for a given PrimeGrid LLR subproject, the more dependent on RAM performance it becomes.
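As a concrete illustration of dialing in instances × threads under BOINC: this is usually done with an app_config.xml in the project's data directory. A minimal sketch under assumptions - the path, the app name "llr321", and the 14×4 split are placeholders; take the real app name from client_state.xml and pick the split that turns out optimal on your machine:

```bash
# Write a minimal app_config.xml into the PrimeGrid project directory,
# then tell the client to re-read its config files (or restart it).
cat > /var/lib/boinc/projects/www.primegrid.com/app_config.xml <<'EOF'
<app_config>
  <app>
    <name>llr321</name>
    <max_concurrent>14</max_concurrent>
  </app>
  <app_version>
    <app_name>llr321</app_name>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>-t 4</cmdline>
  </app_version>
</app_config>
EOF
```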
In the AMI BIOS which Supermicro uses, it's in "Advanced" -> "NB Configuration". I have no idea if and where it can be found in ASRock's BIOS; perhaps somewhere in "Advanced" -> "AMD CBS". AMD recommends setting TDP and PPT to the same value.
AMD has a few guides covering a variety of further BIOS parameters: Performance Tuning Guides
Particularly, see "Workload Tuning Guide for AMD EPYC™ 7002 Series Processor Based Servers" at the bottom of this list.
And one other thing: Fan speed control — that's of course different between motherboard vendors. But I think the vendors have in common that they can't be bothered to document it. Here is some good info on Supermicro's implementation. I haven't looked for ASRock specific info.
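For the archives, these are the raw commands commonly reported for recent Supermicro BMCs (X10/X11/H11 generation); treat them as unverified hints and check them against the write-up linked above before use:

```bash
ipmitool raw 0x30 0x45 0x00                  # read current fan mode
ipmitool raw 0x30 0x45 0x01 0x01             # set fan mode to "Full"
ipmitool raw 0x30 0x70 0x66 0x00 0x00        # read duty cycle of zone 0 (CPU/system fans)
ipmitool raw 0x30 0x70 0x66 0x01 0x00 0x32   # set zone 0 to 50% duty (0x32 = 50)
```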
I have 120 mm fans pointed at the VRMs currently, but they're not super high airflow. 'Vcore1 MOS Temp' in the BMC shows 65 °C - that seems maybe somewhat high to me. Any thoughts on whether that's too high?
I'm pretty sure the RAM is running correctly in 8-channel mode. I ran sudo dmidecode --type memory in Linux and it goes up to 'Bank Locator: P0 Channel H' and 'Bank Locator: P1 Channel H', which I think means it's running in 8-channel.
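A slightly quicker way to eyeball the same thing is to pair each slot's reported size with its channel, so any empty channel stands out immediately (the Bank Locator strings are what this board reports; other boards word them differently):

```bash
# Each "Size:" line belongs to the "Bank Locator:" printed a few lines below it.
# With all 16 slots populated, no size should read "No Module Installed"
# and the locators should run P0 Channel A..H and P1 Channel A..H.
sudo dmidecode --type memory | grep -E "^\s+Size:|Bank Locator:"
```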
I appreciate all the info you provided on how the BMC works. I'm kicking myself for never using that before. I have a few Xeons that have a BMC and never bothered with it, because I never realized how incredibly useful it is. I doubt I'll buy a motherboard without one going forward, unless it's my main workstation PC.
On that fan control doc - I assumed fan 0 and 1 would be the CPU fan headers, since it seems logical that the first fans would be the CPU fans, so that's how I plugged mine in.
Any suggestions on what to set the PPT to? I have most of my 16 core/32 thread AMD processors set to 95 watts.
I don't actually know, but I believe that's OK for continuous operation.
On my H11DSi board, I have one Noctua NF-A4x20 sitting on the VRM heatsink which covers the VRM area of both CPUs. Another 80 mm fan which sits on some of the RAM banks also reaches a little bit over that heatsink.
When I ran PrimeGrid (at IIRC 2x 180 W PPT/TDP), CPU VRM temperature was 70…72 °C at 3900 RPM of the 40 mm fan, and I don't know what the ambient temperature was. It was with open windows, making for different temperature zones and layers within the living room. :-)
Now with SiDock at 2x 155 W PPT/TDP but actual consumption only 335 W at the wall, CPU VRM temps are 60…61 °C, with 2600 RPM of the 40 mm fan and maybe 24 °C room temperature where this computer sits.
I can't get over how great being able to access the BMC is. Any time anybody posted PPT figures here such as "tested at ____ PPT" I thought "man I can't imagine having to haul a physical keyboard/mouse/monitor over to _____ machine to get in the bios to test that". Being able to do that across the LAN is game changing for me. I have no idea why I was ignorant about that for so long(while purchasing machines advertising IPMI here and there, seeing posts here about it, etc.) but it's fantastic.
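For reference, the same sensors the BMC web page shows can also be read from any box on the LAN with ipmitool; the address and credentials below are placeholders:

```bash
# Read the BMC's temperature and fan sensors over the network (IPMI-over-LAN).
ipmitool -I lanplus -H 192.168.1.50 -U admin -P changeme sdr type Temperature
ipmitool -I lanplus -H 192.168.1.50 -U admin -P changeme sdr type Fan
# Or query one sensor by name, e.g. the VRM reading discussed above:
ipmitool -I lanplus -H 192.168.1.50 -U admin -P changeme sensor get "Vcore1 MOS Temp"
```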
7742's possible range of 225…240 W isn't as "wide" as the 155…180 W range of my 7452s, let alone the ranges of PPTs within which you can drive Ryzens. I suspect the efficiency loss when you go from the minimum and default 225 W to the maximum of 240 W is negligible. So if the VRM temps are OK for you at the latter (when you put a load on it which really utilizes it, e.g. a "small FFT" load like maybe PrimeGrid Sophie Germain singlethreaded on all physical cores), then I'd say you might as well set-and-forget it to 240 W PPT and TDP.
Added in 4 minutes 34 seconds:
PS, AMD tell us in their "Workload Tuning Guide for… 7002 series" to switch cTDP and package power limit to "OPN Max" when we run HPC type workloads.
Added in 14 minutes 25 seconds:
Here is what the "Workload Tuning Guide" says on the matter:

Icecold wrote: ↑Tue Oct 05, 2021 8:46 am
Do you think it's likely DDR4-2400 would be a bottleneck for this build? My thought process being that with 8 channel memory the memory speed should matter less, but I wasn't really sure.
Edit - doesn't matter, was able to purchase DDR4-3200 at a decent price.
AMD wrote:
2.2.3 Memory Clock Speed
Benefit: By default, the BIOS for EPYC 7002 Series processors will run at the maximum allowable clock frequency by the platform. This configuration results in the maximum memory bandwidth for the processor, but in some cases, it may not be the lowest latency. The Infinity Fabric will have a maximum speed of 1467 MHz (lower in some platforms), resulting in a single clock penalty to transfer data from the memory channels onto the Infinity Fabric to progress through the SoC. To achieve the lowest latency, you can set the memory frequency to be equal to the Infinity Fabric speed. Lowering the memory clock speed also results in power savings in the memory controller, thus allowing the rest of the SoC to consume more power potentially resulting in a performance boost elsewhere, depending on the workload.
3.4 HPC and Telco Settings
[NUMA and Memory Settings / Memory Clock Speed]
CFD & Other Memory Bound: Auto
Chem, Physics & Other SIMD Bound: Try 1467
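For context (simple arithmetic, not from the guide): DDR4 transfers twice per memory clock, so matching MEMCLK to the 1467 MHz fabric clock means running the DIMMs at an effective DDR4-2933 instead of DDR4-3200:

```bash
echo "DDR4-$((1467 * 2))"   # 2934 MT/s, i.e. the speed grade marketed as DDR4-2933
```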
I also looked for rackmount cases, but never found one which isn't very deep and yet, importantly, is tall enough to fit the 140/150 mm class tower coolers.
They used to make some cheap 4U cases that you could use the server boards in. Then just take a dremel (or whatever similar tool they have in Germany) and cut out holes for the tower coolers to stick out of.