Artifacts of NVIDIA DGX Spark

This landing page provides an overview of the visualization websites generated for all AI enrichment demo experiments that were executed on this node. It brings together experiment summaries and detailed run tables while keeping the individual experiment and run websites directly accessible.

Each experiment section links to four resources: the visualization website of the enrichment runs, the original experiment website, the reference experiment website (which uses descriptions written by the researcher who created the experiment), and a plot gallery. The plots show the performance and quality metrics.

How to Read This Page

First, read the description for each experiment to understand what it covers.
Then open the visualization website of the original experiment and compare it with the visualization website of the reference experiment.
Check the summary metrics for a quick overview of the results.
For more detail, use the detailed run tables.
If you want more metrics and charts, open the plot gallery.
If you want to inspect the actual enrichment runs for this RO-Crate, open the visualization website of enrichment runs.

Hardware and Runtime Information

Parameter	Value
Node	Overfit
GPU	NVIDIA GB10
Driver	580.126.20
CPU	GB10 Spark CPU @ 3.9GHz
Memory	128 GiB
Ollama Version	0.30.10

Experiments

MoonGen Latency Test

Experiment identifier: 2026-06-18_18-24-55_986663-moongen

Measures L2 load latency in a two-node setup by varying packet rates and sizes while recording latency and energy consumption.

Use the links below to compare the original experiment website with the reference experiment website and explore the website of the enrichment runs.

EnrichmentExperiment

Visualization website of enrichment runs: includes the enrichment runs for this experiment across all tested models and parameters.

Visualization website of enrichment runs

OriginalExperiment

Visualization website of original experiment: shows the original experiment website without additional descriptive text.

Visualization website of original experiment

ReferenceExperiment

Visualization website of reference experiment: shows the reference experiment website with researcher-written descriptions.

Visualization website of reference experiment

PlotGallery

Generated metric plots and summary charts: browse all published boxplots and combined summary charts for this experiment.

Plot gallery for this experiment

Summary Metrics

This summary table gives a high-level overview of the demo experiment before you inspect individual runs.

Metric	Value
Benchmark	MoonGen Latency Test
Runs	12
Models	6
peek_chars	100, 10000
Prompt Tokens	171,234
Output Tokens	32,111
Total Tokens	203,345
Elapsed (s)	1184.18
GPU Energy (Wh)	16.726
Mean Normalized BERTScore F1	0.3109
Raw Per-Entity BERTScore F1	0.6088
Best Normalized BERTScore Run	run04 (gpt-oss:20b, peek=100, 0.3415)

Detailed Run Comparison

The detailed run table lists every run with its full set of metrics, so you can compare models, token output, runtime, and energy consumption within this demo experiment.

Run ID	Model	peek_chars	Normalized BERTScore F1	Raw BERTScore F1	Elapsed (s)	Output Tokens	Output tok/s	Energy (Wh)
run00	gemma3:4b	100	0.3162	0.6189	61.57	609	64.54	0.496
run01	gemma3:12b	100	0.2934	0.5957	40.39	521	26.08	0.483
run02	qwen3:14b	100	0.3164	0.6197	190.44	4,242	23.64	2.874
run03	deepseek-r1:8b	100	0.2876	0.6019	206.30	7,555	40.24	2.910
run04	gpt-oss:20b	100	0.3415	0.6204	37.48	1,463	57.25	0.552
run05	gpt-oss:120b	100	0.3175	0.6032	56.27	1,651	40.64	1.022
run06	gemma3:4b	10000	0.3073	0.6101	26.89	629	63.14	0.291
run07	gemma3:12b	10000	0.2930	0.5954	45.50	520	25.61	0.610
run08	qwen3:14b	10000	0.3169	0.6202	213.67	4,296	21.82	3.000
run09	deepseek-r1:8b	10000	0.2796	0.5940	210.68	7,702	38.91	3.003
run10	gpt-oss:20b	10000	0.3379	0.6168	39.05	1,412	56.72	0.462
run11	gpt-oss:120b	10000	0.3236	0.6092	55.94	1,511	40.03	1.021
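
The summary metrics for this experiment can be cross-checked against the run table. The following is a minimal sketch, assuming the summary values are plain sums and arithmetic means over the twelve runs (the published numbers are consistent with that, but the page does not state how they were computed). The tuple layout and the `summarize` helper are illustrative and not part of the published tooling.

```python
from statistics import mean

# (run_id, model, peek_chars, normalized_f1, elapsed_s, output_tokens)
# Values copied from the MoonGen Latency Test run table above.
RUNS = [
    ("run00", "gemma3:4b", 100, 0.3162, 61.57, 609),
    ("run01", "gemma3:12b", 100, 0.2934, 40.39, 521),
    ("run02", "qwen3:14b", 100, 0.3164, 190.44, 4242),
    ("run03", "deepseek-r1:8b", 100, 0.2876, 206.30, 7555),
    ("run04", "gpt-oss:20b", 100, 0.3415, 37.48, 1463),
    ("run05", "gpt-oss:120b", 100, 0.3175, 56.27, 1651),
    ("run06", "gemma3:4b", 10000, 0.3073, 26.89, 629),
    ("run07", "gemma3:12b", 10000, 0.2930, 45.50, 520),
    ("run08", "qwen3:14b", 10000, 0.3169, 213.67, 4296),
    ("run09", "deepseek-r1:8b", 10000, 0.2796, 210.68, 7702),
    ("run10", "gpt-oss:20b", 10000, 0.3379, 39.05, 1412),
    ("run11", "gpt-oss:120b", 10000, 0.3236, 55.94, 1511),
]


def summarize(runs):
    """Aggregate a run table into summary metrics (hypothetical helper)."""
    # max() returns the first maximal element, so ties resolve to the earlier run.
    best = max(runs, key=lambda r: r[3])
    return {
        "runs": len(runs),
        "models": len({r[1] for r in runs}),
        "output_tokens": sum(r[5] for r in runs),
        "elapsed_s": round(sum(r[4] for r in runs), 2),
        "mean_normalized_f1": round(mean(r[3] for r in runs), 4),
        "best_run": (best[0], best[1], best[2], best[3]),
    }


summary = summarize(RUNS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The same helper can be pointed at the run tables of the other experiments by swapping in their rows.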

Quic Implementation Benchmark

Experiment identifier: 2026-06-18_19-12-43_964731-quic

Evaluates performance of different QUIC implementations by running repeated file transfers across varying configurations and comparing results.

Use the links below to compare the original experiment website with the reference experiment website and explore the website of the enrichment runs.

EnrichmentExperiment

Visualization website of enrichment runs: includes the enrichment runs for this experiment across all tested models and parameters.

Visualization website of enrichment runs

OriginalExperiment

Visualization website of original experiment: shows the original experiment website without additional descriptive text.

Visualization website of original experiment

ReferenceExperiment

Visualization website of reference experiment: shows the reference experiment website with researcher-written descriptions.

Visualization website of reference experiment

PlotGallery

Generated metric plots and summary charts: browse all published boxplots and combined summary charts for this experiment.

Plot gallery for this experiment

Summary Metrics

This summary table gives a high-level overview of the demo experiment before you inspect individual runs.

Metric	Value
Benchmark	Quic Implementation Benchmark
Runs	12
Models	6
peek_chars	100, 10000
Prompt Tokens	506,702
Output Tokens	69,723
Total Tokens	576,425
Elapsed (s)	2733.85
GPU Energy (Wh)	37.329
Mean Normalized BERTScore F1	0.3516
Raw Per-Entity BERTScore F1	0.3227
Best Normalized BERTScore Run	run09 (deepseek-r1:8b, peek=10000, 0.3758)

Detailed Run Comparison

The detailed run table lists every run with its full set of metrics, so you can compare models, token output, runtime, and energy consumption within this demo experiment.

Run ID	Model	peek_chars	Normalized BERTScore F1	Raw BERTScore F1	Elapsed (s)	Output Tokens	Output tok/s	Energy (Wh)
run00	gemma3:4b	100	0.3303	0.2964	96.64	1,035	63.23	0.773
run01	gemma3:12b	100	0.3492	0.3321	104.86	983	25.60	1.241
run02	qwen3:14b	100	0.3549	0.3071	535.07	11,634	23.47	7.885
run03	deepseek-r1:8b	100	0.3749	0.3349	406.44	14,648	39.67	5.778
run04	gpt-oss:20b	100	0.3383	0.3059	91.74	2,953	57.26	1.099
run05	gpt-oss:120b	100	0.3618	0.3394	116.09	2,762	40.90	1.708
run06	gemma3:4b	10000	0.3233	0.2803	61.65	1,096	62.22	0.573
run07	gemma3:12b	10000	0.3476	0.3263	107.88	997	25.16	1.304
run08	qwen3:14b	10000	0.3505	0.3182	540.08	11,455	22.71	7.744
run09	deepseek-r1:8b	10000	0.3758	0.3137	453.41	16,448	38.65	6.457
run10	gpt-oss:20b	10000	0.3502	0.3589	87.42	2,903	56.50	0.993
run11	gpt-oss:120b	10000	0.3629	0.3595	132.57	2,809	35.10	1.775
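
One way to compare energy consumption against token output is to normalize energy by the number of output tokens. The sketch below computes Wh per 1,000 output tokens for each run in the table above. This ratio is a derived metric that is not published on this page, and `wh_per_1k_tokens` is a hypothetical helper. Treat it as a rough indicator only: the energy column covers the whole run (including prompt processing), and models that emit long reasoning output produce many more tokens per run.

```python
# (run_id, model, output_tokens, energy_wh)
# Values copied from the Quic Implementation Benchmark run table above.
RUNS = [
    ("run00", "gemma3:4b", 1035, 0.773),
    ("run01", "gemma3:12b", 983, 1.241),
    ("run02", "qwen3:14b", 11634, 7.885),
    ("run03", "deepseek-r1:8b", 14648, 5.778),
    ("run04", "gpt-oss:20b", 2953, 1.099),
    ("run05", "gpt-oss:120b", 2762, 1.708),
    ("run06", "gemma3:4b", 1096, 0.573),
    ("run07", "gemma3:12b", 997, 1.304),
    ("run08", "qwen3:14b", 11455, 7.744),
    ("run09", "deepseek-r1:8b", 16448, 6.457),
    ("run10", "gpt-oss:20b", 2903, 0.993),
    ("run11", "gpt-oss:120b", 2809, 1.775),
]


def wh_per_1k_tokens(output_tokens, energy_wh):
    """Energy per 1,000 output tokens (derived metric, not shown on the page)."""
    return energy_wh / output_tokens * 1000.0


efficiency = {
    run_id: wh_per_1k_tokens(tokens, wh) for run_id, _model, tokens, wh in RUNS
}

# Print runs from lowest to highest energy per 1,000 output tokens.
for run_id, value in sorted(efficiency.items(), key=lambda kv: kv[1]):
    print(f"{run_id}: {value:.3f} Wh per 1,000 output tokens")
```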

Multipath

Experiment identifier: 2026-06-18_20-33-56_911701-multipath

Automates deployment and evaluation of a multipath-enabled IPv6 topology to study transport protocols using multiple network paths simultaneously.

Use the links below to compare the original experiment website with the reference experiment website and explore the website of the enrichment runs.

EnrichmentExperiment

Visualization website of enrichment runs: includes the enrichment runs for this experiment across all tested models and parameters.

Visualization website of enrichment runs

OriginalExperiment

Visualization website of original experiment: shows the original experiment website without additional descriptive text.

Visualization website of original experiment

ReferenceExperiment

Visualization website of reference experiment: shows the reference experiment website with researcher-written descriptions.

Visualization website of reference experiment

PlotGallery

Generated metric plots and summary charts: browse all published boxplots and combined summary charts for this experiment.

Plot gallery for this experiment

Summary Metrics

This summary table gives a high-level overview of the demo experiment before you inspect individual runs.

Metric	Value
Benchmark	Multipath
Runs	12
Models	6
peek_chars	100, 10000
Prompt Tokens	285,984
Output Tokens	63,906
Total Tokens	349,890
Elapsed (s)	2356.69
GPU Energy (Wh)	31.839
Mean Normalized BERTScore F1	0.1952
Raw Per-Entity BERTScore F1	0.5853
Best Normalized BERTScore Run	run04 (gpt-oss:20b, peek=100, 0.2149)

Detailed Run Comparison

The detailed run table lists every run with its full set of metrics, so you can compare models, token output, runtime, and energy consumption within this demo experiment.

Run ID	Model	peek_chars	Normalized BERTScore F1	Raw BERTScore F1	Elapsed (s)	Output Tokens	Output tok/s	Energy (Wh)
run00	gemma3:4b	100	0.1922	0.5920	95.57	1,563	64.10	0.780
run01	gemma3:12b	100	0.1738	0.5777	98.90	1,395	25.90	1.157
run02	qwen3:14b	100	0.1757	0.5750	475.78	10,699	23.60	7.038
run03	deepseek-r1:8b	100	0.2126	0.5891	331.44	11,792	40.35	4.539
run04	gpt-oss:20b	100	0.2149	0.5952	89.56	3,589	57.09	1.130
run05	gpt-oss:120b	100	0.2021	0.5830	104.72	2,915	40.72	1.556
run06	gemma3:4b	10000	0.1922	0.5920	58.48	1,563	63.14	0.530
run07	gemma3:12b	10000	0.1738	0.5777	99.85	1,395	25.45	1.171
run08	qwen3:14b	10000	0.1757	0.5750	487.97	10,699	22.97	6.911
run09	deepseek-r1:8b	10000	0.2126	0.5891	317.49	11,792	39.45	4.418
run10	gpt-oss:20b	10000	0.2149	0.5952	90.55	3,589	56.43	1.032
run11	gpt-oss:120b	10000	0.2021	0.5830	106.39	2,915	40.51	1.575

Combined Comparison

This section compares the demo experiments against each other and then groups the same runs by model profile for a cross-experiment view.

Combined Plots

Combined plot gallery for this node

Aggregates values across all published experiments and tested models on this node.

Visualization website of the combined plots

Experiment Comparison

Experiment	Identifier	Runs	Total Tokens	Elapsed (s)	Mean Normalized BERTScore F1	Mean Raw BERTScore F1	Best Normalized BERTScore Run
MoonGen Latency Test	2026-06-18_18-24-55_986663-moongen	12	203,345	1184.18	0.3109	0.6088	run04 (gpt-oss:20b, peek=100, 0.3415)
Quic Implementation Benchmark	2026-06-18_19-12-43_964731-quic	12	576,425	2733.85	0.3516	0.3227	run09 (deepseek-r1:8b, peek=10000, 0.3758)
Multipath	2026-06-18_20-33-56_911701-multipath	12	349,890	2356.69	0.1952	0.5853	run04 (gpt-oss:20b, peek=100, 0.2149)

Model Comparison

Model	Experiments	Mean Normalized BERTScore F1	Mean Raw BERTScore F1	Mean Elapsed (s)	Mean Output tok/s	Mean Energy/run (Wh)
deepseek-r1:8b	MoonGen Latency Test, Multipath, Quic Implementation Benchmark	0.2905	0.5038	320.96	39.55	4.518
gemma3:12b	MoonGen Latency Test, Multipath, Quic Implementation Benchmark	0.2718	0.5008	82.90	25.63	0.994
gemma3:4b	MoonGen Latency Test, Multipath, Quic Implementation Benchmark	0.2769	0.4983	66.80	63.40	0.574
gpt-oss:120b	MoonGen Latency Test, Multipath, Quic Implementation Benchmark	0.2950	0.5129	95.33	39.65	1.443
gpt-oss:20b	MoonGen Latency Test, Multipath, Quic Implementation Benchmark	0.2996	0.5154	72.63	56.88	0.878
qwen3:14b	MoonGen Latency Test, Multipath, Quic Implementation Benchmark	0.2817	0.5025	407.17	23.03	5.909
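
The per-model rows above appear to be arithmetic means over all six runs of a model (two peek_chars settings in each of the three experiments). The following sketch reproduces that grouping for two of the models, using values copied from the detailed run tables. The row layout and the `compare_models` helper are assumptions for illustration, not the code that generated this page.

```python
from collections import defaultdict
from statistics import mean

# (experiment, model, normalized_f1, elapsed_s, energy_wh)
# Values copied from the detailed run tables of the three experiments.
ROWS = [
    ("MoonGen Latency Test", "gpt-oss:20b", 0.3415, 37.48, 0.552),
    ("MoonGen Latency Test", "gpt-oss:20b", 0.3379, 39.05, 0.462),
    ("Quic Implementation Benchmark", "gpt-oss:20b", 0.3383, 91.74, 1.099),
    ("Quic Implementation Benchmark", "gpt-oss:20b", 0.3502, 87.42, 0.993),
    ("Multipath", "gpt-oss:20b", 0.2149, 89.56, 1.130),
    ("Multipath", "gpt-oss:20b", 0.2149, 90.55, 1.032),
    ("MoonGen Latency Test", "gemma3:4b", 0.3162, 61.57, 0.496),
    ("MoonGen Latency Test", "gemma3:4b", 0.3073, 26.89, 0.291),
    ("Quic Implementation Benchmark", "gemma3:4b", 0.3303, 96.64, 0.773),
    ("Quic Implementation Benchmark", "gemma3:4b", 0.3233, 61.65, 0.573),
    ("Multipath", "gemma3:4b", 0.1922, 95.57, 0.780),
    ("Multipath", "gemma3:4b", 0.1922, 58.48, 0.530),
]


def compare_models(rows):
    """Group runs by model and average their metrics (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[1]].append(row)

    table = {}
    for model, runs in sorted(grouped.items()):
        table[model] = {
            "experiments": sorted({r[0] for r in runs}),
            "mean_normalized_f1": round(mean(r[2] for r in runs), 4),
            "mean_elapsed_s": round(mean(r[3] for r in runs), 2),
            "mean_energy_wh": round(mean(r[4] for r in runs), 3),
        }
    return table


model_table = compare_models(ROWS)
for model, metrics in model_table.items():
    print(model, metrics)
```

Because every model was run the same number of times in every experiment, the mean over runs equals the mean of the per-experiment means. With unequal run counts the two would differ, and the choice between them would need to be stated.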
