Artifacts of GPU Server

This landing page provides an overview of the visualization websites generated for all AI enrichment demo experiments that were executed on this node. It brings together experiment summaries and detailed run tables while keeping the individual experiment and run websites directly accessible.

Each experiment section links to four resources: the visualization website of the enrichment runs, the original experiment website, the reference experiment website (which uses descriptions written by the researcher who created the experiment), and a plot gallery. The plots show the performance and quality metrics.

How to Read This Page

First, read the description for each experiment to understand what it covers.
Then open the visualization website of the original experiment and compare it with the visualization website of the reference experiment.
Check the summary metrics for a quick overview of the results.
For more detail, use the detailed run tables.
If you want more metrics and charts, open the plot gallery.
If you want to inspect the actual enrichment runs for this RO-Crate, open the visualization website of enrichment runs.

Hardware and Runtime Information

Parameter	Value
Node	Hactar
GPU	4x NVIDIA GeForce GTX 1080 Ti
Driver	580.82.07
CPU	2x Xeon E5-2620 v4 @ 2.10GHz
Memory	128 GiB
Ollama Version	0.30.10

Experiments

MoonGen Latency Test

Experiment identifier: 2026-06-18_18-40-22_785154-moongen

Measures L2 load latency in a two-node setup by varying packet rates and sizes while recording latency and energy consumption.

Use the links below to compare the original experiment website with the reference experiment website and explore the website of the enrichment runs.

EnrichmentExperiment

Visualization website of enrichment runs: includes the enrichment runs for this experiment across all tested models and parameters.

Visualization website of enrichment runs

OriginalExperiment

Visualization website of original experiment: shows the original experiment website without additional descriptive text.

Visualization website of original experiment

ReferenceExperiment

Visualization website of reference experiment: shows the reference experiment website with researcher-written descriptions.

Visualization website of reference experiment

PlotGallery

Generated metric plots and summary charts: browse all published boxplots and combined summary charts for this experiment.

Plot gallery for this experiment

Summary Metrics

This summary table gives a high-level overview of the demo experiment. Consult it before inspecting individual runs.

Metric	Value
Benchmark	MoonGen Latency Test
Runs	10
Models	5
peek_chars	100, 10000
Prompt Tokens	145,277
Output Tokens	30,459
Total Tokens	175,736
Elapsed (s)	1438.05
GPU Energy (Wh)	152.447
Mean Normalized BERTScore F1	0.3143
Mean Raw Per-Entity BERTScore F1	0.6108
Best Normalized BERTScore Run	run03 (deepseek-r1:8b, peek=100, 0.3422)

Detailed Run Comparison

The detailed run table lists every run of this demo experiment with its full metrics, so that models can be compared by BERTScore, token output, runtime, and energy consumption.

Run ID	Model	peek_chars	Normalized BERTScore F1	Raw BERTScore F1	Elapsed (s)	Output Tokens	Output tok/s	Energy (Wh)
run00	gemma3:4b	100	0.3040	0.6099	48.99	621	53.63	3.706
run01	gemma3:12b	100	0.2928	0.5869	83.34	529	24.62	10.263
run02	qwen3:14b	100	0.3027	0.6110	200.33	4,165	23.85	26.488
run03	deepseek-r1:8b	100	0.3422	0.6267	196.44	6,310	35.82	15.448
run04	gpt-oss:20b	100	0.3413	0.6311	56.01	1,412	44.58	14.769
run05	gemma3:4b	10000	0.2982	0.5968	51.04	649	52.65	3.078
run06	gemma3:12b	10000	0.2948	0.5938	101.68	543	24.40	11.633
run07	qwen3:14b	10000	0.3060	0.6173	247.39	4,272	22.03	30.766
run08	deepseek-r1:8b	10000	0.3254	0.6096	391.35	10,643	30.28	28.397
run09	gpt-oss:20b	10000	0.3358	0.6245	61.48	1,315	43.87	7.900
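
The summary metrics for this experiment appear to be plain aggregates of the rows above: sums for output tokens, elapsed time, and energy, and means for the two BERTScore columns. The following Python sketch reproduces them under that assumption. The tuple layout and the `summarize` helper are illustrative names and are not part of the published tooling.

```python
from statistics import mean

# Rows copied from the MoonGen detailed run table:
# (run_id, model, peek_chars, normalized_f1, raw_f1, elapsed_s, output_tokens, energy_wh)
MOONGEN_RUNS = [
    ("run00", "gemma3:4b", 100, 0.3040, 0.6099, 48.99, 621, 3.706),
    ("run01", "gemma3:12b", 100, 0.2928, 0.5869, 83.34, 529, 10.263),
    ("run02", "qwen3:14b", 100, 0.3027, 0.6110, 200.33, 4165, 26.488),
    ("run03", "deepseek-r1:8b", 100, 0.3422, 0.6267, 196.44, 6310, 15.448),
    ("run04", "gpt-oss:20b", 100, 0.3413, 0.6311, 56.01, 1412, 14.769),
    ("run05", "gemma3:4b", 10000, 0.2982, 0.5968, 51.04, 649, 3.078),
    ("run06", "gemma3:12b", 10000, 0.2948, 0.5938, 101.68, 543, 11.633),
    ("run07", "qwen3:14b", 10000, 0.3060, 0.6173, 247.39, 4272, 30.766),
    ("run08", "deepseek-r1:8b", 10000, 0.3254, 0.6096, 391.35, 10643, 28.397),
    ("run09", "gpt-oss:20b", 10000, 0.3358, 0.6245, 61.48, 1315, 7.900),
]


def summarize(runs):
    """Aggregate per-run rows into summary-style metrics (hypothetical helper)."""
    best = max(runs, key=lambda r: r[3])  # highest normalized BERTScore F1
    return {
        "runs": len(runs),
        "models": len({r[1] for r in runs}),
        "output_tokens": sum(r[6] for r in runs),
        "elapsed_s": round(sum(r[5] for r in runs), 2),
        "gpu_energy_wh": round(sum(r[7] for r in runs), 3),
        "mean_normalized_f1": round(mean(r[3] for r in runs), 4),
        "mean_raw_f1": round(mean(r[4] for r in runs), 4),
        "best_run": (best[0], best[1], best[2], best[3]),
    }


summary = summarize(MOONGEN_RUNS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Because the table values are already rounded, recomputed sums can differ from the published summary in the last digit (for example in the energy total).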

Quic Implementation Benchmark

Experiment identifier: 2026-06-18_20-34-16_305947-quic

Evaluates performance of different QUIC implementations by running repeated file transfers across varying configurations and comparing results.

Use the links below to compare the original experiment website with the reference experiment website and explore the website of the enrichment runs.

EnrichmentExperiment

Visualization website of enrichment runs: includes the enrichment runs for this experiment across all tested models and parameters.

Visualization website of enrichment runs

OriginalExperiment

Visualization website of original experiment: shows the original experiment website without additional descriptive text.

Visualization website of original experiment

ReferenceExperiment

Visualization website of reference experiment: shows the reference experiment website with researcher-written descriptions.

Visualization website of reference experiment

PlotGallery

Generated metric plots and summary charts: browse all published boxplots and combined summary charts for this experiment.

Plot gallery for this experiment

Summary Metrics

This summary table gives a high-level overview of the demo experiment. Consult it before inspecting individual runs.

Metric	Value
Benchmark	Quic Implementation Benchmark
Runs	10
Models	5
peek_chars	100, 10000
Prompt Tokens	425,292
Output Tokens	60,933
Total Tokens	486,225
Elapsed (s)	3214.57
GPU Energy (Wh)	310.399
Mean Normalized BERTScore F1	0.3481
Mean Raw Per-Entity BERTScore F1	0.3182
Best Normalized BERTScore Run	run07 (qwen3:14b, peek=10000, 0.3643)

Detailed Run Comparison

The detailed run table lists every run of this demo experiment with its full metrics, so that models can be compared by BERTScore, token output, runtime, and energy consumption.

Run ID	Model	peek_chars	Normalized BERTScore F1	Raw BERTScore F1	Elapsed (s)	Output Tokens	Output tok/s	Energy (Wh)
run00	gemma3:4b	100	0.3391	0.2997	121.00	1,031	52.61	6.876
run01	gemma3:12b	100	0.3484	0.3190	231.92	981	24.04	24.786
run02	qwen3:14b	100	0.3540	0.3041	657.99	12,648	22.28	74.321
run03	deepseek-r1:8b	100	0.3420	0.2988	442.89	12,920	34.32	32.488
run04	gpt-oss:20b	100	0.3631	0.3776	137.16	2,780	43.46	18.632
run05	gemma3:4b	10000	0.3339	0.2887	118.08	1,113	52.51	6.267
run06	gemma3:12b	10000	0.3476	0.3106	239.87	1,017	23.98	25.268
run07	qwen3:14b	10000	0.3643	0.3274	652.17	12,161	22.05	71.231
run08	deepseek-r1:8b	10000	0.3423	0.3199	465.36	13,233	33.87	33.489
run09	gpt-oss:20b	10000	0.3461	0.3365	148.14	3,049	42.75	17.042
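
Energy and runtime can be combined into two derived figures that make the runs above easier to compare: the average power draw (Wh converted to watt-seconds, divided by elapsed seconds) and the energy spent per 1,000 output tokens. The Python sketch below computes both. It assumes that the Energy (Wh) column covers the same time window as Elapsed (s), which this page does not state; the helper names are illustrative.

```python
# (run_id, model, elapsed_s, output_tokens, energy_wh) copied from the run table above
QUIC_RUNS = [
    ("run00", "gemma3:4b", 121.00, 1031, 6.876),
    ("run01", "gemma3:12b", 231.92, 981, 24.786),
    ("run02", "qwen3:14b", 657.99, 12648, 74.321),
    ("run03", "deepseek-r1:8b", 442.89, 12920, 32.488),
    ("run04", "gpt-oss:20b", 137.16, 2780, 18.632),
]


def mean_power_w(energy_wh, elapsed_s):
    """Average power in watts: 1 Wh equals 3600 watt-seconds."""
    return energy_wh * 3600.0 / elapsed_s


def wh_per_1k_tokens(energy_wh, output_tokens):
    """Energy in Wh spent per 1,000 generated tokens."""
    return energy_wh / output_tokens * 1000.0


# Derived metrics per run: run_id -> (mean power in W, Wh per 1k output tokens)
quic_efficiency = {
    run_id: (mean_power_w(energy, elapsed), wh_per_1k_tokens(energy, tokens))
    for run_id, _model, elapsed, tokens, energy in QUIC_RUNS
}

for run_id, (power, per_1k) in quic_efficiency.items():
    print(f"{run_id}: {power:.1f} W average, {per_1k:.2f} Wh per 1k output tokens")
```

Only the peek_chars=100 rows are included to keep the sketch short; the remaining rows can be added in the same layout.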

Multipath

Experiment identifier: 2026-06-18_22-42-17_403726-multipath

Automates deployment and evaluation of a multipath-enabled IPv6 topology to study transport protocols using multiple network paths simultaneously.

Use the links below to compare the original experiment website with the reference experiment website and explore the website of the enrichment runs.

EnrichmentExperiment

Visualization website of enrichment runs: includes the enrichment runs for this experiment across all tested models and parameters.

Visualization website of enrichment runs

OriginalExperiment

Visualization website of original experiment: shows the original experiment website without additional descriptive text.

Visualization website of original experiment

ReferenceExperiment

Visualization website of reference experiment: shows the reference experiment website with researcher-written descriptions.

Visualization website of reference experiment

PlotGallery

Generated metric plots and summary charts: browse all published boxplots and combined summary charts for this experiment.

Plot gallery for this experiment

Summary Metrics

This summary table gives a high-level overview of the demo experiment. Consult it before inspecting individual runs.

Metric	Value
Benchmark	Multipath
Runs	10
Models	5
peek_chars	100, 10000
Prompt Tokens	238,698
Output Tokens	58,957
Total Tokens	297,655
Elapsed (s)	2659.69
GPU Energy (Wh)	258.813
Mean Normalized BERTScore F1	0.1904
Mean Raw Per-Entity BERTScore F1	0.5835
Best Normalized BERTScore Run	run09 (gpt-oss:20b, peek=10000, 0.2063)

Detailed Run Comparison

The detailed run table lists every run of this demo experiment with its full metrics, so that models can be compared by BERTScore, token output, runtime, and energy consumption.

Run ID	Model	peek_chars	Normalized BERTScore F1	Raw BERTScore F1	Elapsed (s)	Output Tokens	Output tok/s	Energy (Wh)
run00	gemma3:4b	100	0.2012	0.5927	101.64	1,571	53.72	5.932
run01	gemma3:12b	100	0.1836	0.5887	185.60	1,406	24.60	20.311
run02	qwen3:14b	100	0.1746	0.5678	504.67	10,360	23.22	59.116
run03	deepseek-r1:8b	100	0.1849	0.5753	438.68	13,617	34.64	32.542
run04	gpt-oss:20b	100	0.2052	0.5835	113.24	2,817	43.95	15.596
run05	gemma3:4b	10000	0.2003	0.5915	95.42	1,571	53.61	4.886
run06	gemma3:12b	10000	0.1802	0.5807	186.35	1,427	24.57	19.519
run07	qwen3:14b	10000	0.1725	0.5795	514.05	10,524	23.09	58.460
run08	deepseek-r1:8b	10000	0.1954	0.5887	406.62	12,841	35.53	29.178
run09	gpt-oss:20b	10000	0.2063	0.5864	113.41	2,823	44.24	13.273
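
Each model in the table above was run once with peek_chars=100 and once with peek_chars=10000, so the effect of the larger setting can be read off as a per-model difference. The Python sketch below pairs the two runs of each model and reports the change in normalized BERTScore F1. The `peek_deltas` helper is an illustrative name, and a single pair of runs per model is too little data to call any difference significant.

```python
# (model, peek_chars, normalized_f1) copied from the Multipath run table above
MULTIPATH_RUNS = [
    ("gemma3:4b", 100, 0.2012),
    ("gemma3:12b", 100, 0.1836),
    ("qwen3:14b", 100, 0.1746),
    ("deepseek-r1:8b", 100, 0.1849),
    ("gpt-oss:20b", 100, 0.2052),
    ("gemma3:4b", 10000, 0.2003),
    ("gemma3:12b", 10000, 0.1802),
    ("qwen3:14b", 10000, 0.1725),
    ("deepseek-r1:8b", 10000, 0.1954),
    ("gpt-oss:20b", 10000, 0.2063),
]


def peek_deltas(rows, low=100, high=10000):
    """Return F1(high) - F1(low) per model (hypothetical helper)."""
    score = {(model, peek): f1 for model, peek, f1 in rows}
    models = sorted({model for model, _peek, _f1 in rows})
    return {m: round(score[(m, high)] - score[(m, low)], 4) for m in models}


deltas = peek_deltas(MULTIPATH_RUNS)
improved = [model for model, delta in deltas.items() if delta > 0]

for model, delta in deltas.items():
    print(f"{model}: {delta:+.4f}")
print("Higher score with peek_chars=10000:", ", ".join(improved))
```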

Combined Comparison

This section compares the demo experiments against each other and then groups the same runs by model profile for a cross-experiment view.

Combined Plots

Combined plot gallery for this node

Aggregates values across all published experiments and tested models on this node.

Visualization website of the combined plots

Experiment Comparison

Experiment	Identifier	Runs	Total Tokens	Elapsed (s)	Mean Normalized BERTScore F1	Mean Raw BERTScore F1	Best Normalized BERTScore Run
MoonGen Latency Test	2026-06-18_18-40-22_785154-moongen	10	175,736	1438.05	0.3143	0.6108	run03 (deepseek-r1:8b, peek=100, 0.3422)
Quic Implementation Benchmark	2026-06-18_20-34-16_305947-quic	10	486,225	3214.57	0.3481	0.3182	run07 (qwen3:14b, peek=10000, 0.3643)
Multipath	2026-06-18_22-42-17_403726-multipath	10	297,655	2659.69	0.1904	0.5835	run09 (gpt-oss:20b, peek=10000, 0.2063)

Model Comparison

Model	Experiments	Mean Normalized BERTScore F1	Mean Raw BERTScore F1	Mean Elapsed (s)	Mean Output tok/s	Mean Energy/run (Wh)
deepseek-r1:8b	MoonGen Latency Test, Multipath, Quic Implementation Benchmark	0.2887	0.5032	390.22	34.08	28.590
gemma3:12b	MoonGen Latency Test, Multipath, Quic Implementation Benchmark	0.2746	0.4966	171.46	24.37	18.630
gemma3:4b	MoonGen Latency Test, Multipath, Quic Implementation Benchmark	0.2794	0.4966	89.36	53.12	5.124
gpt-oss:20b	MoonGen Latency Test, Multipath, Quic Implementation Benchmark	0.2996	0.5233	104.91	43.81	14.535
qwen3:14b	MoonGen Latency Test, Multipath, Quic Implementation Benchmark	0.2790	0.5012	462.77	22.75	53.397
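
The per-model means above are consistent with averaging each model's six runs (two peek_chars settings in each of the three experiments). The Python sketch below illustrates that grouping for the normalized BERTScore F1 column. It assumes an unweighted mean over runs, and the data layout and the `mean_f1_by_model` helper are illustrative rather than taken from the published tooling.

```python
from collections import defaultdict
from statistics import mean

# Normalized BERTScore F1 per experiment and model, as (peek=100, peek=10000),
# copied from the three detailed run tables.
NORMALIZED_F1 = {
    "MoonGen Latency Test": {
        "gemma3:4b": (0.3040, 0.2982), "gemma3:12b": (0.2928, 0.2948),
        "qwen3:14b": (0.3027, 0.3060), "deepseek-r1:8b": (0.3422, 0.3254),
        "gpt-oss:20b": (0.3413, 0.3358),
    },
    "Quic Implementation Benchmark": {
        "gemma3:4b": (0.3391, 0.3339), "gemma3:12b": (0.3484, 0.3476),
        "qwen3:14b": (0.3540, 0.3643), "deepseek-r1:8b": (0.3420, 0.3423),
        "gpt-oss:20b": (0.3631, 0.3461),
    },
    "Multipath": {
        "gemma3:4b": (0.2012, 0.2003), "gemma3:12b": (0.1836, 0.1802),
        "qwen3:14b": (0.1746, 0.1725), "deepseek-r1:8b": (0.1849, 0.1954),
        "gpt-oss:20b": (0.2052, 0.2063),
    },
}


def mean_f1_by_model(scores_by_experiment):
    """Group all runs by model and average them (unweighted, hypothetical helper)."""
    grouped = defaultdict(list)
    for per_model in scores_by_experiment.values():
        for model, scores in per_model.items():
            grouped[model].extend(scores)
    return {model: round(mean(values), 4) for model, values in grouped.items()}


model_means = mean_f1_by_model(NORMALIZED_F1)
ranking = sorted(model_means, key=model_means.get, reverse=True)

for model in ranking:
    print(f"{model}: {model_means[model]:.4f}")
```

Since the inputs are the rounded table values, a recomputed mean can differ from the published one in the fourth decimal place.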
