Artifacts of NVIDIA DGX Spark

This landing page gives an overview of the visualization websites generated for all AI enrichment demo experiments executed on this node. It combines experiment summaries with detailed run tables and keeps the individual experiment and run websites directly accessible.

Each experiment section links to four resources: the visualization website of the enrichment runs, the original experiment website, the reference experiment website (built from the reference descriptions written by the researcher who created the experiment), and a plot gallery. The plots show the performance and quality metrics of the runs.

Hardware and Runtime Information

| Parameter | Value |
| --- | --- |
| Node | Overfit |
| GPU | NVIDIA GB10 |
| Driver | 580.126.20 |
| CPU | GB10 Spark CPU @ 3.9 GHz |
| Memory | 128 GiB |
| Ollama Version | 0.30.10 |

Experiments

MoonGen Latency Test

Experiment identifier: 2026-06-18_18-24-55_986663-moongen

Measures L2 load latency in a two-node setup by varying packet rates and sizes while recording latency and energy consumption.

Use the links below to compare the original experiment website with the reference experiment website and explore the website of the enrichment runs.

Summary Metrics

This summary table gives a high-level overview of the demo experiment. Consult it before inspecting individual runs.

| Metric | Value |
| --- | --- |
| Benchmark | MoonGen Latency Test |
| Runs | 12 |
| Models | 6 |
| peek_chars | 100, 10000 |
| Prompt Tokens | 171,234 |
| Output Tokens | 32,111 |
| Total Tokens | 203,345 |
| Elapsed (s) | 1184.18 |
| GPU Energy (Wh) | 16.726 |
| Mean Normalized BERTScore F1 | 0.3109 |
| Raw Per-Entity BERTScore F1 | 0.6088 |
| Best Normalized BERTScore Run | run04 (gpt-oss:20b, peek=100, 0.3415) |

Detailed Run Comparison

The detailed run table lists every run of this demo experiment, so that models can be compared by quality score, token output, runtime, and energy consumption.

| Run ID | Model | peek_chars | Normalized BERTScore F1 | Raw BERTScore F1 | Elapsed (s) | Output Tokens | Output tok/s | Energy (Wh) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| run00 | gemma3:4b | 100 | 0.3162 | 0.6189 | 61.57 | 609 | 64.54 | 0.496 |
| run01 | gemma3:12b | 100 | 0.2934 | 0.5957 | 40.39 | 521 | 26.08 | 0.483 |
| run02 | qwen3:14b | 100 | 0.3164 | 0.6197 | 190.44 | 4,242 | 23.64 | 2.874 |
| run03 | deepseek-r1:8b | 100 | 0.2876 | 0.6019 | 206.30 | 7,555 | 40.24 | 2.910 |
| run04 | gpt-oss:20b | 100 | 0.3415 | 0.6204 | 37.48 | 1,463 | 57.25 | 0.552 |
| run05 | gpt-oss:120b | 100 | 0.3175 | 0.6032 | 56.27 | 1,651 | 40.64 | 1.022 |
| run06 | gemma3:4b | 10000 | 0.3073 | 0.6101 | 26.89 | 629 | 63.14 | 0.291 |
| run07 | gemma3:12b | 10000 | 0.2930 | 0.5954 | 45.50 | 520 | 25.61 | 0.610 |
| run08 | qwen3:14b | 10000 | 0.3169 | 0.6202 | 213.67 | 4,296 | 21.82 | 3.000 |
| run09 | deepseek-r1:8b | 10000 | 0.2796 | 0.5940 | 210.68 | 7,702 | 38.91 | 3.003 |
| run10 | gpt-oss:20b | 10000 | 0.3379 | 0.6168 | 39.05 | 1,412 | 56.72 | 0.462 |
| run11 | gpt-oss:120b | 10000 | 0.3236 | 0.6092 | 55.94 | 1,511 | 40.03 | 1.021 |
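
The summary metrics for this experiment can be cross-checked against the per-run rows. The following is a minimal sketch, assuming the summary values are plain sums and unweighted means over the twelve runs; the page does not state the aggregation method, and the names `Run` and `summarize` are hypothetical. The rows are copied from the MoonGen run table, with the Output tok/s column omitted.

```python
from typing import NamedTuple


class Run(NamedTuple):
    run_id: str
    model: str
    peek_chars: int
    norm_f1: float
    raw_f1: float
    elapsed_s: float
    output_tokens: int
    energy_wh: float


# Rows copied from the MoonGen Latency Test run table.
RUNS = [
    Run("run00", "gemma3:4b", 100, 0.3162, 0.6189, 61.57, 609, 0.496),
    Run("run01", "gemma3:12b", 100, 0.2934, 0.5957, 40.39, 521, 0.483),
    Run("run02", "qwen3:14b", 100, 0.3164, 0.6197, 190.44, 4242, 2.874),
    Run("run03", "deepseek-r1:8b", 100, 0.2876, 0.6019, 206.30, 7555, 2.910),
    Run("run04", "gpt-oss:20b", 100, 0.3415, 0.6204, 37.48, 1463, 0.552),
    Run("run05", "gpt-oss:120b", 100, 0.3175, 0.6032, 56.27, 1651, 1.022),
    Run("run06", "gemma3:4b", 10000, 0.3073, 0.6101, 26.89, 629, 0.291),
    Run("run07", "gemma3:12b", 10000, 0.2930, 0.5954, 45.50, 520, 0.610),
    Run("run08", "qwen3:14b", 10000, 0.3169, 0.6202, 213.67, 4296, 3.000),
    Run("run09", "deepseek-r1:8b", 10000, 0.2796, 0.5940, 210.68, 7702, 3.003),
    Run("run10", "gpt-oss:20b", 10000, 0.3379, 0.6168, 39.05, 1412, 0.462),
    Run("run11", "gpt-oss:120b", 10000, 0.3236, 0.6092, 55.94, 1511, 1.021),
]


def summarize(runs):
    """Aggregate per-run rows (assumed: plain sums and unweighted means)."""
    n = len(runs)
    best = max(runs, key=lambda r: r.norm_f1)
    return {
        "runs": n,
        "models": len({r.model for r in runs}),
        "output_tokens": sum(r.output_tokens for r in runs),
        "elapsed_s": round(sum(r.elapsed_s for r in runs), 2),
        "energy_wh": round(sum(r.energy_wh for r in runs), 3),
        "mean_norm_f1": round(sum(r.norm_f1 for r in runs) / n, 4),
        "mean_raw_f1": round(sum(r.raw_f1 for r in runs) / n, 4),
        "best_run": (best.run_id, best.model, best.peek_chars, best.norm_f1),
    }


summary = summarize(RUNS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these assumptions the output tokens, elapsed time, both mean F1 scores, and the best run match the summary table. The summed energy differs from the published total by a few thousandths of a Wh, presumably because the per-run values are rounded.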

Quic Implementation Benchmark

Experiment identifier: 2026-06-18_19-12-43_964731-quic

Evaluates performance of different QUIC implementations by running repeated file transfers across varying configurations and comparing results.

Use the links below to compare the original experiment website with the reference experiment website and explore the website of the enrichment runs.

Summary Metrics

This summary table gives a high-level overview of the demo experiment. Consult it before inspecting individual runs.

| Metric | Value |
| --- | --- |
| Benchmark | Quic Implementation Benchmark |
| Runs | 12 |
| Models | 6 |
| peek_chars | 100, 10000 |
| Prompt Tokens | 506,702 |
| Output Tokens | 69,723 |
| Total Tokens | 576,425 |
| Elapsed (s) | 2733.85 |
| GPU Energy (Wh) | 37.329 |
| Mean Normalized BERTScore F1 | 0.3516 |
| Raw Per-Entity BERTScore F1 | 0.3227 |
| Best Normalized BERTScore Run | run09 (deepseek-r1:8b, peek=10000, 0.3758) |

Detailed Run Comparison

The detailed run table lists every run of this demo experiment, so that models can be compared by quality score, token output, runtime, and energy consumption.

| Run ID | Model | peek_chars | Normalized BERTScore F1 | Raw BERTScore F1 | Elapsed (s) | Output Tokens | Output tok/s | Energy (Wh) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| run00 | gemma3:4b | 100 | 0.3303 | 0.2964 | 96.64 | 1,035 | 63.23 | 0.773 |
| run01 | gemma3:12b | 100 | 0.3492 | 0.3321 | 104.86 | 983 | 25.60 | 1.241 |
| run02 | qwen3:14b | 100 | 0.3549 | 0.3071 | 535.07 | 11,634 | 23.47 | 7.885 |
| run03 | deepseek-r1:8b | 100 | 0.3749 | 0.3349 | 406.44 | 14,648 | 39.67 | 5.778 |
| run04 | gpt-oss:20b | 100 | 0.3383 | 0.3059 | 91.74 | 2,953 | 57.26 | 1.099 |
| run05 | gpt-oss:120b | 100 | 0.3618 | 0.3394 | 116.09 | 2,762 | 40.90 | 1.708 |
| run06 | gemma3:4b | 10000 | 0.3233 | 0.2803 | 61.65 | 1,096 | 62.22 | 0.573 |
| run07 | gemma3:12b | 10000 | 0.3476 | 0.3263 | 107.88 | 997 | 25.16 | 1.304 |
| run08 | qwen3:14b | 10000 | 0.3505 | 0.3182 | 540.08 | 11,455 | 22.71 | 7.744 |
| run09 | deepseek-r1:8b | 10000 | 0.3758 | 0.3137 | 453.41 | 16,448 | 38.65 | 6.457 |
| run10 | gpt-oss:20b | 10000 | 0.3502 | 0.3589 | 87.42 | 2,903 | 56.50 | 0.993 |
| run11 | gpt-oss:120b | 10000 | 0.3629 | 0.3595 | 132.57 | 2,809 | 35.10 | 1.775 |
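
When comparing runtime and energy across models, two derived figures can help: average power draw and energy per 1,000 output tokens. Neither is published on this page. The sketch below computes both from the QUIC run table, assuming the energy value of a run was measured over the same window as its elapsed time (the page does not confirm this). The helper names `avg_power_w` and `wh_per_1k_tokens` are hypothetical.

```python
# (run_id, model, elapsed_s, output_tokens, energy_wh), copied from the QUIC run table.
QUIC_RUNS = [
    ("run00", "gemma3:4b", 96.64, 1035, 0.773),
    ("run01", "gemma3:12b", 104.86, 983, 1.241),
    ("run02", "qwen3:14b", 535.07, 11634, 7.885),
    ("run03", "deepseek-r1:8b", 406.44, 14648, 5.778),
    ("run04", "gpt-oss:20b", 91.74, 2953, 1.099),
    ("run05", "gpt-oss:120b", 116.09, 2762, 1.708),
    ("run06", "gemma3:4b", 61.65, 1096, 0.573),
    ("run07", "gemma3:12b", 107.88, 997, 1.304),
    ("run08", "qwen3:14b", 540.08, 11455, 7.744),
    ("run09", "deepseek-r1:8b", 453.41, 16448, 6.457),
    ("run10", "gpt-oss:20b", 87.42, 2903, 0.993),
    ("run11", "gpt-oss:120b", 132.57, 2809, 1.775),
]


def avg_power_w(energy_wh, elapsed_s):
    """Average power in watts: 1 Wh = 3600 J, and W = J / s."""
    return energy_wh * 3600.0 / elapsed_s


def wh_per_1k_tokens(energy_wh, output_tokens):
    """Energy spent per 1,000 generated tokens."""
    return energy_wh / output_tokens * 1000.0


derived = {
    run_id: (avg_power_w(energy, elapsed), wh_per_1k_tokens(energy, tokens))
    for run_id, _model, elapsed, tokens, energy in QUIC_RUNS
}

for run_id, model, *_ in QUIC_RUNS:
    power, per_1k = derived[run_id]
    print(f"{run_id} {model:<15} {power:6.1f} W  {per_1k:.3f} Wh/1k tok")
```

These figures depend on how the energy was sampled, so treat them as rough comparisons rather than measurements.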

Multipath

Experiment identifier: 2026-06-18_20-33-56_911701-multipath

Automates deployment and evaluation of a multipath-enabled IPv6 topology to study transport protocols using multiple network paths simultaneously.

Use the links below to compare the original experiment website with the reference experiment website and explore the website of the enrichment runs.

Summary Metrics

This summary table gives a high-level overview of the demo experiment. Consult it before inspecting individual runs.

| Metric | Value |
| --- | --- |
| Benchmark | Multipath |
| Runs | 12 |
| Models | 6 |
| peek_chars | 100, 10000 |
| Prompt Tokens | 285,984 |
| Output Tokens | 63,906 |
| Total Tokens | 349,890 |
| Elapsed (s) | 2356.69 |
| GPU Energy (Wh) | 31.839 |
| Mean Normalized BERTScore F1 | 0.1952 |
| Raw Per-Entity BERTScore F1 | 0.5853 |
| Best Normalized BERTScore Run | run04 (gpt-oss:20b, peek=100, 0.2149) |

Detailed Run Comparison

The detailed run table lists every run of this demo experiment, so that models can be compared by quality score, token output, runtime, and energy consumption.

| Run ID | Model | peek_chars | Normalized BERTScore F1 | Raw BERTScore F1 | Elapsed (s) | Output Tokens | Output tok/s | Energy (Wh) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| run00 | gemma3:4b | 100 | 0.1922 | 0.5920 | 95.57 | 1,563 | 64.10 | 0.780 |
| run01 | gemma3:12b | 100 | 0.1738 | 0.5777 | 98.90 | 1,395 | 25.90 | 1.157 |
| run02 | qwen3:14b | 100 | 0.1757 | 0.5750 | 475.78 | 10,699 | 23.60 | 7.038 |
| run03 | deepseek-r1:8b | 100 | 0.2126 | 0.5891 | 331.44 | 11,792 | 40.35 | 4.539 |
| run04 | gpt-oss:20b | 100 | 0.2149 | 0.5952 | 89.56 | 3,589 | 57.09 | 1.130 |
| run05 | gpt-oss:120b | 100 | 0.2021 | 0.5830 | 104.72 | 2,915 | 40.72 | 1.556 |
| run06 | gemma3:4b | 10000 | 0.1922 | 0.5920 | 58.48 | 1,563 | 63.14 | 0.530 |
| run07 | gemma3:12b | 10000 | 0.1738 | 0.5777 | 99.85 | 1,395 | 25.45 | 1.171 |
| run08 | qwen3:14b | 10000 | 0.1757 | 0.5750 | 487.97 | 10,699 | 22.97 | 6.911 |
| run09 | deepseek-r1:8b | 10000 | 0.2126 | 0.5891 | 317.49 | 11,792 | 39.45 | 4.418 |
| run10 | gpt-oss:20b | 10000 | 0.2149 | 0.5952 | 90.55 | 3,589 | 56.43 | 1.032 |
| run11 | gpt-oss:120b | 10000 | 0.2021 | 0.5830 | 106.39 | 2,915 | 40.51 | 1.575 |
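
In this table, every model reports the same quality scores and output token count for both peek_chars settings; only runtime, throughput, and energy differ. A quick way to check for this kind of insensitivity is to pair the two runs of each model and subtract. The sketch below does that with values copied from the Multipath run table. The function name `peek_deltas` is hypothetical, and the page does not explain why the outputs coincide.

```python
# (model, peek_chars, normalized_f1, raw_f1, output_tokens), from the Multipath run table.
MULTIPATH_ROWS = [
    ("gemma3:4b", 100, 0.1922, 0.5920, 1563),
    ("gemma3:12b", 100, 0.1738, 0.5777, 1395),
    ("qwen3:14b", 100, 0.1757, 0.5750, 10699),
    ("deepseek-r1:8b", 100, 0.2126, 0.5891, 11792),
    ("gpt-oss:20b", 100, 0.2149, 0.5952, 3589),
    ("gpt-oss:120b", 100, 0.2021, 0.5830, 2915),
    ("gemma3:4b", 10000, 0.1922, 0.5920, 1563),
    ("gemma3:12b", 10000, 0.1738, 0.5777, 1395),
    ("qwen3:14b", 10000, 0.1757, 0.5750, 10699),
    ("deepseek-r1:8b", 10000, 0.2126, 0.5891, 11792),
    ("gpt-oss:20b", 10000, 0.2149, 0.5952, 3589),
    ("gpt-oss:120b", 10000, 0.2021, 0.5830, 2915),
]


def peek_deltas(rows, low=100, high=10000):
    """Per model: (high - low) for normalized F1, raw F1, and output tokens."""
    by_key = {(model, peek): metrics for model, peek, *metrics in rows}
    models = sorted({model for model, *_ in rows})
    return {
        model: tuple(h - l for h, l in zip(by_key[(model, high)], by_key[(model, low)]))
        for model in models
    }


deltas = peek_deltas(MULTIPATH_ROWS)
unchanged = [model for model, d in deltas.items() if all(x == 0 for x in d)]
print(f"{len(unchanged)} of {len(deltas)} models unchanged between peek settings")
```

The same function applied to the other two experiments would show non-zero deltas, since their scores differ between peek_chars=100 and peek_chars=10000.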

Combined Comparison

This section compares the demo experiments against each other and then groups the same runs by model profile for a cross-experiment view.

Combined Plots

Combined plot gallery for this node

Aggregates values across all published experiments and tested models on this node.

Visualization website of the combined plots

Experiment Comparison

| Experiment | Identifier | Runs | Total Tokens | Elapsed (s) | Mean Normalized BERTScore F1 | Mean Raw BERTScore F1 | Best Normalized BERTScore Run |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MoonGen Latency Test | 2026-06-18_18-24-55_986663-moongen | 12 | 203,345 | 1184.18 | 0.3109 | 0.6088 | run04 (gpt-oss:20b, peek=100, 0.3415) |
| Quic Implementation Benchmark | 2026-06-18_19-12-43_964731-quic | 12 | 576,425 | 2733.85 | 0.3516 | 0.3227 | run09 (deepseek-r1:8b, peek=10000, 0.3758) |
| Multipath | 2026-06-18_20-33-56_911701-multipath | 12 | 349,890 | 2356.69 | 0.1952 | 0.5853 | run04 (gpt-oss:20b, peek=100, 0.2149) |

Model Comparison

| Model | Experiments | Mean Normalized BERTScore F1 | Mean Raw BERTScore F1 | Mean Elapsed (s) | Mean Output tok/s | Mean Energy/run (Wh) |
| --- | --- | --- | --- | --- | --- | --- |
| deepseek-r1:8b | MoonGen Latency Test, Multipath, Quic Implementation Benchmark | 0.2905 | 0.5038 | 320.96 | 39.55 | 4.518 |
| gemma3:12b | MoonGen Latency Test, Multipath, Quic Implementation Benchmark | 0.2718 | 0.5008 | 82.90 | 25.63 | 0.994 |
| gemma3:4b | MoonGen Latency Test, Multipath, Quic Implementation Benchmark | 0.2769 | 0.4983 | 66.80 | 63.40 | 0.574 |
| gpt-oss:120b | MoonGen Latency Test, Multipath, Quic Implementation Benchmark | 0.2950 | 0.5129 | 95.33 | 39.65 | 1.443 |
| gpt-oss:20b | MoonGen Latency Test, Multipath, Quic Implementation Benchmark | 0.2996 | 0.5154 | 72.63 | 56.88 | 0.878 |
| qwen3:14b | MoonGen Latency Test, Multipath, Quic Implementation Benchmark | 0.2817 | 0.5025 | 407.17 | 23.03 | 5.909 |
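
Each model row appears to be an unweighted mean over that model's six runs (two peek_chars settings in each of the three experiments). The page does not state the aggregation method, so the sketch below treats it as an assumption and checks it for two models, using values copied from the per-experiment run tables. The name `per_model_means` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (model, normalized_f1, raw_f1, elapsed_s, output_tok_s, energy_wh)
RUNS_BY_EXPERIMENT = [
    # MoonGen Latency Test (run04, run10, run00, run06)
    ("gpt-oss:20b", 0.3415, 0.6204, 37.48, 57.25, 0.552),
    ("gpt-oss:20b", 0.3379, 0.6168, 39.05, 56.72, 0.462),
    ("gemma3:4b", 0.3162, 0.6189, 61.57, 64.54, 0.496),
    ("gemma3:4b", 0.3073, 0.6101, 26.89, 63.14, 0.291),
    # Quic Implementation Benchmark (run04, run10, run00, run06)
    ("gpt-oss:20b", 0.3383, 0.3059, 91.74, 57.26, 1.099),
    ("gpt-oss:20b", 0.3502, 0.3589, 87.42, 56.50, 0.993),
    ("gemma3:4b", 0.3303, 0.2964, 96.64, 63.23, 0.773),
    ("gemma3:4b", 0.3233, 0.2803, 61.65, 62.22, 0.573),
    # Multipath (run04, run10, run00, run06)
    ("gpt-oss:20b", 0.2149, 0.5952, 89.56, 57.09, 1.130),
    ("gpt-oss:20b", 0.2149, 0.5952, 90.55, 56.43, 1.032),
    ("gemma3:4b", 0.1922, 0.5920, 95.57, 64.10, 0.780),
    ("gemma3:4b", 0.1922, 0.5920, 58.48, 63.14, 0.530),
]


def per_model_means(runs):
    """Group runs by model and average each metric column (unweighted)."""
    grouped = defaultdict(list)
    for model, *metrics in runs:
        grouped[model].append(metrics)
    return {
        model: tuple(mean(column) for column in zip(*rows))
        for model, rows in grouped.items()
    }


means = per_model_means(RUNS_BY_EXPERIMENT)
for model, (norm_f1, raw_f1, elapsed, tok_s, energy) in means.items():
    print(f"{model}: {norm_f1:.4f} {raw_f1:.4f} {elapsed:.2f} {tok_s:.2f} {energy:.3f}")
```

For both models the recomputed means agree with the published rows to within rounding, which is consistent with the unweighted-mean assumption.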