Artifacts of GPU Server

This landing page gives an overview of the visualization websites generated for all AI enrichment demo experiments executed on this node. It combines experiment summaries with detailed run tables and keeps the individual experiment and run websites directly accessible.

Each experiment section links to four resources: the visualization website of the enrichment runs, the original experiment website, the reference experiment website (built from the reference descriptions written by the researcher who created the experiment), and a plot gallery. The plots show the performance and quality metrics of the runs.

Hardware and Runtime Information

| Parameter | Value |
| --- | --- |
| Node | Hactar |
| GPU | 4x NVIDIA GeForce GTX 1080 Ti |
| Driver | 580.82.07 |
| CPU | 2x Xeon E5-2620 v4 @ 2.10GHz |
| Memory | 128 GiB |
| Ollama Version | 0.30.10 |

Experiments

MoonGen Latency Test

Experiment identifier: 2026-06-18_18-40-22_785154-moongen

Measures L2 load latency in a two-node setup by varying packet rates and sizes while recording latency and energy consumption.

Use the links below to compare the original experiment website with the reference experiment website and explore the website of the enrichment runs.

Summary Metrics

This summary table gives a high-level orientation for the demo experiment before you inspect individual runs.

| Metric | Value |
| --- | --- |
| Benchmark | MoonGen Latency Test |
| Runs | 10 |
| Models | 5 |
| peek_chars | 100, 10000 |
| Prompt Tokens | 145,277 |
| Output Tokens | 30,459 |
| Total Tokens | 175,736 |
| Elapsed (s) | 1438.05 |
| GPU Energy (Wh) | 152.447 |
| Mean Normalized BERTScore F1 | 0.3143 |
| Raw Per-Entity BERTScore F1 | 0.6108 |
| Best Normalized BERTScore Run | run03 (deepseek-r1:8b, peek=100, 0.3422) |
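
The summary values appear to be plain aggregates of the rows in the Detailed Run Comparison table for this experiment: sums for tokens, runtime, and energy, and unweighted means for the two BERTScore columns. The following Python sketch recomputes them from the published rows as a cross-check. The tuple layout and the `summarize` helper are illustrative assumptions and not the code that generated this page.

```python
# Rows copied from the MoonGen "Detailed Run Comparison" table.
# Assumed layout (illustrative):
# (run_id, model, peek_chars, normalized_f1, raw_f1, elapsed_s, output_tokens, energy_wh)
RUNS = [
    ("run00", "gemma3:4b", 100, 0.3040, 0.6099, 48.99, 621, 3.706),
    ("run01", "gemma3:12b", 100, 0.2928, 0.5869, 83.34, 529, 10.263),
    ("run02", "qwen3:14b", 100, 0.3027, 0.6110, 200.33, 4165, 26.488),
    ("run03", "deepseek-r1:8b", 100, 0.3422, 0.6267, 196.44, 6310, 15.448),
    ("run04", "gpt-oss:20b", 100, 0.3413, 0.6311, 56.01, 1412, 14.769),
    ("run05", "gemma3:4b", 10000, 0.2982, 0.5968, 51.04, 649, 3.078),
    ("run06", "gemma3:12b", 10000, 0.2948, 0.5938, 101.68, 543, 11.633),
    ("run07", "qwen3:14b", 10000, 0.3060, 0.6173, 247.39, 4272, 30.766),
    ("run08", "deepseek-r1:8b", 10000, 0.3254, 0.6096, 391.35, 10643, 28.397),
    ("run09", "gpt-oss:20b", 10000, 0.3358, 0.6245, 61.48, 1315, 7.900),
]


def summarize(runs):
    """Aggregate per-run rows into summary metrics (hypothetical helper)."""
    n = len(runs)
    best = max(runs, key=lambda r: r[3])  # highest normalized BERTScore F1
    return {
        "runs": n,
        "models": len({r[1] for r in runs}),
        "output_tokens": sum(r[6] for r in runs),
        "elapsed_s": round(sum(r[5] for r in runs), 2),
        "energy_wh": round(sum(r[7] for r in runs), 3),
        "mean_normalized_f1": round(sum(r[3] for r in runs) / n, 4),
        "mean_raw_f1": round(sum(r[4] for r in runs) / n, 4),
        "best_run": (best[0], best[1], best[2], best[3]),
    }


summary = summarize(RUNS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Recomputed this way, the counts, token total, runtime, and both mean scores match the summary. The energy sum of the rounded per-run values can differ from the listed 152.447 Wh in the last digit, which would be consistent with the page summing unrounded measurements.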

Detailed Run Comparison

The detailed run table lists every run of this demo experiment in full and can be used to compare models, token output, runtime, and energy consumption.

| Run ID | Model | peek_chars | Normalized BERTScore F1 | Raw BERTScore F1 | Elapsed (s) | Output Tokens | Output tok/s | Energy (Wh) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| run00 | gemma3:4b | 100 | 0.3040 | 0.6099 | 48.99 | 621 | 53.63 | 3.706 |
| run01 | gemma3:12b | 100 | 0.2928 | 0.5869 | 83.34 | 529 | 24.62 | 10.263 |
| run02 | qwen3:14b | 100 | 0.3027 | 0.6110 | 200.33 | 4,165 | 23.85 | 26.488 |
| run03 | deepseek-r1:8b | 100 | 0.3422 | 0.6267 | 196.44 | 6,310 | 35.82 | 15.448 |
| run04 | gpt-oss:20b | 100 | 0.3413 | 0.6311 | 56.01 | 1,412 | 44.58 | 14.769 |
| run05 | gemma3:4b | 10000 | 0.2982 | 0.5968 | 51.04 | 649 | 52.65 | 3.078 |
| run06 | gemma3:12b | 10000 | 0.2948 | 0.5938 | 101.68 | 543 | 24.40 | 11.633 |
| run07 | qwen3:14b | 10000 | 0.3060 | 0.6173 | 247.39 | 4,272 | 22.03 | 30.766 |
| run08 | deepseek-r1:8b | 10000 | 0.3254 | 0.6096 | 391.35 | 10,643 | 30.28 | 28.397 |
| run09 | gpt-oss:20b | 10000 | 0.3358 | 0.6245 | 61.48 | 1,315 | 43.87 | 7.900 |

Quic Implementation Benchmark

Experiment identifier: 2026-06-18_20-34-16_305947-quic

Evaluates performance of different QUIC implementations by running repeated file transfers across varying configurations and comparing results.

Use the links below to compare the original experiment website with the reference experiment website and explore the website of the enrichment runs.

Summary Metrics

This summary table gives a high-level orientation for the demo experiment before you inspect individual runs.

| Metric | Value |
| --- | --- |
| Benchmark | Quic Implementation Benchmark |
| Runs | 10 |
| Models | 5 |
| peek_chars | 100, 10000 |
| Prompt Tokens | 425,292 |
| Output Tokens | 60,933 |
| Total Tokens | 486,225 |
| Elapsed (s) | 3214.57 |
| GPU Energy (Wh) | 310.399 |
| Mean Normalized BERTScore F1 | 0.3481 |
| Raw Per-Entity BERTScore F1 | 0.3182 |
| Best Normalized BERTScore Run | run07 (qwen3:14b, peek=10000, 0.3643) |

Detailed Run Comparison

The detailed run table lists every run of this demo experiment in full and can be used to compare models, token output, runtime, and energy consumption.

| Run ID | Model | peek_chars | Normalized BERTScore F1 | Raw BERTScore F1 | Elapsed (s) | Output Tokens | Output tok/s | Energy (Wh) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| run00 | gemma3:4b | 100 | 0.3391 | 0.2997 | 121.00 | 1,031 | 52.61 | 6.876 |
| run01 | gemma3:12b | 100 | 0.3484 | 0.3190 | 231.92 | 981 | 24.04 | 24.786 |
| run02 | qwen3:14b | 100 | 0.3540 | 0.3041 | 657.99 | 12,648 | 22.28 | 74.321 |
| run03 | deepseek-r1:8b | 100 | 0.3420 | 0.2988 | 442.89 | 12,920 | 34.32 | 32.488 |
| run04 | gpt-oss:20b | 100 | 0.3631 | 0.3776 | 137.16 | 2,780 | 43.46 | 18.632 |
| run05 | gemma3:4b | 10000 | 0.3339 | 0.2887 | 118.08 | 1,113 | 52.51 | 6.267 |
| run06 | gemma3:12b | 10000 | 0.3476 | 0.3106 | 239.87 | 1,017 | 23.98 | 25.268 |
| run07 | qwen3:14b | 10000 | 0.3643 | 0.3274 | 652.17 | 12,161 | 22.05 | 71.231 |
| run08 | deepseek-r1:8b | 10000 | 0.3423 | 0.3199 | 465.36 | 13,233 | 33.87 | 33.489 |
| run09 | gpt-oss:20b | 10000 | 0.3461 | 0.3365 | 148.14 | 3,049 | 42.75 | 17.042 |
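
When comparing runtime and energy across models, two derived figures can be easier to read than the raw columns: the average GPU power over a run and the energy spent per 1,000 output tokens. The Python sketch below computes both for the peek_chars=100 rows of the table above. It assumes that "Energy (Wh)" covers the same interval as "Elapsed (s)", which this page does not state, and the function names are illustrative.

```python
def average_power_w(energy_wh: float, elapsed_s: float) -> float:
    """Mean power in watts. 1 Wh = 3600 J, so W = Wh * 3600 / s."""
    return energy_wh * 3600.0 / elapsed_s


def energy_per_1k_tokens_wh(energy_wh: float, output_tokens: int) -> float:
    """Energy in Wh attributed to each 1,000 output tokens."""
    return energy_wh * 1000.0 / output_tokens


# (run_id, model, elapsed_s, output_tokens, energy_wh), peek_chars=100 rows only.
QUIC_RUNS = [
    ("run00", "gemma3:4b", 121.00, 1031, 6.876),
    ("run01", "gemma3:12b", 231.92, 981, 24.786),
    ("run02", "qwen3:14b", 657.99, 12648, 74.321),
    ("run03", "deepseek-r1:8b", 442.89, 12920, 32.488),
    ("run04", "gpt-oss:20b", 137.16, 2780, 18.632),
]

# Map each run to (average power in W, Wh per 1,000 output tokens).
derived = {
    run_id: (
        average_power_w(energy_wh, elapsed_s),
        energy_per_1k_tokens_wh(energy_wh, output_tokens),
    )
    for run_id, _model, elapsed_s, output_tokens, energy_wh in QUIC_RUNS
}

for run_id, model, *_ in QUIC_RUNS:
    power_w, wh_per_1k = derived[run_id]
    print(f"{run_id} {model:15s} {power_w:7.1f} W  {wh_per_1k:7.3f} Wh/1k tokens")
```

Energy per 1,000 output tokens attributes all energy to output tokens, so it also includes prompt processing. Treat it as a rough comparison aid rather than a per-token cost.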

Multipath

Experiment identifier: 2026-06-18_22-42-17_403726-multipath

Automates deployment and evaluation of a multipath-enabled IPv6 topology to study transport protocols using multiple network paths simultaneously.

Use the links below to compare the original experiment website with the reference experiment website and explore the website of the enrichment runs.

Summary Metrics

This summary table gives a high-level orientation for the demo experiment before you inspect individual runs.

| Metric | Value |
| --- | --- |
| Benchmark | Multipath |
| Runs | 10 |
| Models | 5 |
| peek_chars | 100, 10000 |
| Prompt Tokens | 238,698 |
| Output Tokens | 58,957 |
| Total Tokens | 297,655 |
| Elapsed (s) | 2659.69 |
| GPU Energy (Wh) | 258.813 |
| Mean Normalized BERTScore F1 | 0.1904 |
| Raw Per-Entity BERTScore F1 | 0.5835 |
| Best Normalized BERTScore Run | run09 (gpt-oss:20b, peek=10000, 0.2063) |

Detailed Run Comparison

The detailed run table lists every run of this demo experiment in full and can be used to compare models, token output, runtime, and energy consumption.

| Run ID | Model | peek_chars | Normalized BERTScore F1 | Raw BERTScore F1 | Elapsed (s) | Output Tokens | Output tok/s | Energy (Wh) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| run00 | gemma3:4b | 100 | 0.2012 | 0.5927 | 101.64 | 1,571 | 53.72 | 5.932 |
| run01 | gemma3:12b | 100 | 0.1836 | 0.5887 | 185.60 | 1,406 | 24.60 | 20.311 |
| run02 | qwen3:14b | 100 | 0.1746 | 0.5678 | 504.67 | 10,360 | 23.22 | 59.116 |
| run03 | deepseek-r1:8b | 100 | 0.1849 | 0.5753 | 438.68 | 13,617 | 34.64 | 32.542 |
| run04 | gpt-oss:20b | 100 | 0.2052 | 0.5835 | 113.24 | 2,817 | 43.95 | 15.596 |
| run05 | gemma3:4b | 10000 | 0.2003 | 0.5915 | 95.42 | 1,571 | 53.61 | 4.886 |
| run06 | gemma3:12b | 10000 | 0.1802 | 0.5807 | 186.35 | 1,427 | 24.57 | 19.519 |
| run07 | qwen3:14b | 10000 | 0.1725 | 0.5795 | 514.05 | 10,524 | 23.09 | 58.460 |
| run08 | deepseek-r1:8b | 10000 | 0.1954 | 0.5887 | 406.62 | 12,841 | 35.53 | 29.178 |
| run09 | gpt-oss:20b | 10000 | 0.2063 | 0.5864 | 113.41 | 2,823 | 44.24 | 13.273 |

Combined Comparison

This section compares the demo experiments against each other and then groups the same runs by model profile for a cross-experiment view.

Combined Plots

Combined plot gallery for this node

Aggregates values across all published experiments and tested models on this node.

Visualization website of the combined plots

Experiment Comparison

| Experiment | Identifier | Runs | Total Tokens | Elapsed (s) | Mean Normalized BERTScore F1 | Mean Raw BERTScore F1 | Best Normalized BERTScore Run |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MoonGen Latency Test | 2026-06-18_18-40-22_785154-moongen | 10 | 175,736 | 1438.05 | 0.3143 | 0.6108 | run03 (deepseek-r1:8b, peek=100, 0.3422) |
| Quic Implementation Benchmark | 2026-06-18_20-34-16_305947-quic | 10 | 486,225 | 3214.57 | 0.3481 | 0.3182 | run07 (qwen3:14b, peek=10000, 0.3643) |
| Multipath | 2026-06-18_22-42-17_403726-multipath | 10 | 297,655 | 2659.69 | 0.1904 | 0.5835 | run09 (gpt-oss:20b, peek=10000, 0.2063) |

Model Comparison

| Model | Experiments | Mean Normalized BERTScore F1 | Mean Raw BERTScore F1 | Mean Elapsed (s) | Mean Output tok/s | Mean Energy/run (Wh) |
| --- | --- | --- | --- | --- | --- | --- |
| deepseek-r1:8b | MoonGen Latency Test, Multipath, Quic Implementation Benchmark | 0.2887 | 0.5032 | 390.22 | 34.08 | 28.590 |
| gemma3:12b | MoonGen Latency Test, Multipath, Quic Implementation Benchmark | 0.2746 | 0.4966 | 171.46 | 24.37 | 18.630 |
| gemma3:4b | MoonGen Latency Test, Multipath, Quic Implementation Benchmark | 0.2794 | 0.4966 | 89.36 | 53.12 | 5.124 |
| gpt-oss:20b | MoonGen Latency Test, Multipath, Quic Implementation Benchmark | 0.2996 | 0.5233 | 104.91 | 43.81 | 14.535 |
| qwen3:14b | MoonGen Latency Test, Multipath, Quic Implementation Benchmark | 0.2790 | 0.5012 | 462.77 | 22.75 | 53.397 |
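
The per-model figures are consistent with an unweighted mean over each model's six runs, that is, three experiments with two peek_chars settings each. The Python sketch below groups the rows of the three run tables by model and recomputes three of the columns. The reduced row layout and the variable names are illustrative assumptions, not the page's own aggregation code.

```python
from collections import defaultdict
from statistics import mean

# (model, normalized_f1, elapsed_s, energy_wh), copied from the three run tables.
ROWS = [
    # MoonGen Latency Test
    ("gemma3:4b", 0.3040, 48.99, 3.706), ("gemma3:12b", 0.2928, 83.34, 10.263),
    ("qwen3:14b", 0.3027, 200.33, 26.488), ("deepseek-r1:8b", 0.3422, 196.44, 15.448),
    ("gpt-oss:20b", 0.3413, 56.01, 14.769), ("gemma3:4b", 0.2982, 51.04, 3.078),
    ("gemma3:12b", 0.2948, 101.68, 11.633), ("qwen3:14b", 0.3060, 247.39, 30.766),
    ("deepseek-r1:8b", 0.3254, 391.35, 28.397), ("gpt-oss:20b", 0.3358, 61.48, 7.900),
    # Quic Implementation Benchmark
    ("gemma3:4b", 0.3391, 121.00, 6.876), ("gemma3:12b", 0.3484, 231.92, 24.786),
    ("qwen3:14b", 0.3540, 657.99, 74.321), ("deepseek-r1:8b", 0.3420, 442.89, 32.488),
    ("gpt-oss:20b", 0.3631, 137.16, 18.632), ("gemma3:4b", 0.3339, 118.08, 6.267),
    ("gemma3:12b", 0.3476, 239.87, 25.268), ("qwen3:14b", 0.3643, 652.17, 71.231),
    ("deepseek-r1:8b", 0.3423, 465.36, 33.489), ("gpt-oss:20b", 0.3461, 148.14, 17.042),
    # Multipath
    ("gemma3:4b", 0.2012, 101.64, 5.932), ("gemma3:12b", 0.1836, 185.60, 20.311),
    ("qwen3:14b", 0.1746, 504.67, 59.116), ("deepseek-r1:8b", 0.1849, 438.68, 32.542),
    ("gpt-oss:20b", 0.2052, 113.24, 15.596), ("gemma3:4b", 0.2003, 95.42, 4.886),
    ("gemma3:12b", 0.1802, 186.35, 19.519), ("qwen3:14b", 0.1725, 514.05, 58.460),
    ("deepseek-r1:8b", 0.1954, 406.62, 29.178), ("gpt-oss:20b", 0.2063, 113.41, 13.273),
]

# Group the runs by model name.
by_model = defaultdict(list)
for model, f1, elapsed_s, energy_wh in ROWS:
    by_model[model].append((f1, elapsed_s, energy_wh))

# Unweighted mean of each column over all runs of a model.
model_means = {
    model: {
        "runs": len(runs),
        "mean_normalized_f1": mean(r[0] for r in runs),
        "mean_elapsed_s": mean(r[1] for r in runs),
        "mean_energy_wh": mean(r[2] for r in runs),
    }
    for model, runs in by_model.items()
}

for model in sorted(model_means):
    m = model_means[model]
    print(f"{model:15s} {m['mean_normalized_f1']:.4f} "
          f"{m['mean_elapsed_s']:8.2f} {m['mean_energy_wh']:7.3f}")
```

Because every model contributes the same number of runs to every experiment, averaging per run and averaging per experiment give the same result here. With unequal run counts, the two would diverge and the weighting would need to be stated.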