SpiecsEngine
 
◆ Create()

void Spices::NsightPerfGPUProfilerContinuous::Create(VulkanState& state)

Begin this session.

Initialize the periodic sampler.

Create the metrics evaluator.

Transfer ownership of the raw evaluator and its scratch buffer to metricsEvaluator.

Create the config builder, which is used to create a counter configuration.

Transfer ownership of pRawMetricsConfig to configBuilder.

Add the metrics to the config builder.

By setting "keepInstances" to false, the counter data will only store GPU-level values, reducing its size and improving the performance of metric evaluation. However, this option has the drawback of making max/min submetrics non-evaluable.

Create the counter configuration out of the config builder.

The periodic sampler supports only single-pass configurations, meaning that all scheduled metrics must be collectable in a single pass.

Initialize the counter data. The setting below determines the maximum size of the counter data image. Because the counter data is requested to work in ring-buffer mode, when the put pointer reaches the end it wraps to the beginning and overwrites previous data even if that data hasn't been read yet. The size specified here must therefore be large enough to cover the latency between a sample being written and being read.

Update the metrics evaluator with the actual device's attributes stored in the counter data.

Output the header in CSV format.

Start a periodic sampler session.

Apply the previously generated counter configuration to the periodic sampler.

Start sampling. Ideally, sampling should only start right before executing the target workloads to prevent the record buffer from being occupied by records generated by GPU triggers before the target workloads. However, in this use case, it is acceptable because the trigger source is set to "NVPW_GPU_PERIODIC_SAMPLER_TRIGGER_SOURCE_GPU_ENGINE_TRIGGER", which doesn't automatically generate GPU triggers but relies on clients manually pushing triggers through the command list. Furthermore, since the metric configuration used is for low-speed sampling, no "overflow prevention records" will be emitted.

Set m_IsInSession to true.
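The overwrite behaviour described in the counter-data step can be illustrated with a minimal sketch. `SampleRing` and its members are hypothetical names invented for this illustration; they are not part of the Nsight Perf SDK and do not reflect how `RingBufferCounterData` is implemented. The sketch only shows why a buffer that is too small for the read latency silently loses the oldest samples.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity ring buffer (illustration only, not SDK code).
class SampleRing
{
public:
    explicit SampleRing(std::size_t capacity) : m_Slots(capacity) {}

    // Write a sample at the put position, overwriting whatever was there,
    // even if the reader has not consumed it yet.
    void Put(std::uint64_t sample)
    {
        m_Slots[m_Put % m_Slots.size()] = sample;
        ++m_Put;
    }

    // Number of unread samples that have already been overwritten.
    std::size_t Lost() const
    {
        const std::size_t unread = m_Put - m_Get;
        return unread > m_Slots.size() ? unread - m_Slots.size() : 0;
    }

    // Return every sample that still exists, oldest first.
    std::vector<std::uint64_t> Drain()
    {
        m_Get += Lost(); // skip samples that were overwritten
        std::vector<std::uint64_t> out;
        for (; m_Get < m_Put; ++m_Get)
        {
            out.push_back(m_Slots[m_Get % m_Slots.size()]);
        }
        return out;
    }

private:
    std::vector<std::uint64_t> m_Slots;
    std::size_t m_Put = 0; // total samples written
    std::size_t m_Get = 0; // total samples consumed or skipped
};
```

With a capacity of 4 and six `Put` calls (values 0 to 5) before the first `Drain`, `Lost()` reports 2 and `Drain()` yields only the four newest samples, 2, 3, 4 and 5. Sizing the buffer for the worst-case read delay avoids that loss.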

Definition at line 59 of file NsightPerfGPUProfilerContinuous.cpp.

{
    SPICES_PROFILE_ZONE;

    //NSPERF_CHECK(nv::perf::InitializeNvPerf())
    NSPERF_CHECK(nv::perf::VulkanIsNvidiaDevice(state.m_PhysicalDevice))
    const size_t deviceIndex = nv::perf::VulkanGetNvperfDeviceIndex(state.m_Instance, state.m_PhysicalDevice, state.m_Device);

    NSPERF_CHECK(sampler.Initialize(deviceIndex))
    const nv::perf::DeviceIdentifiers deviceIdentifiers = sampler.GetDeviceIdentifiers();

    {
        std::vector<uint8_t> metricsEvaluatorScratchBuffer;
        NVPW_MetricsEvaluator* pMetricsEvaluator = nv::perf::sampler::DeviceCreateMetricsEvaluator(metricsEvaluatorScratchBuffer, deviceIdentifiers.pChipName);

        metricsEvaluator = nv::perf::MetricsEvaluator(pMetricsEvaluator, std::move(metricsEvaluatorScratchBuffer));
    }

    nv::perf::MetricsConfigBuilder configBuilder;
    {
        NVPA_RawMetricsConfig* pRawMetricsConfig = nv::perf::sampler::DeviceCreateRawMetricsConfig(deviceIdentifiers.pChipName);

        NSPERF_CHECK(configBuilder.Initialize(metricsEvaluator, pRawMetricsConfig, deviceIdentifiers.pChipName))
    }

    for (size_t ii = 0; ii < sizeof(Metrics) / sizeof(Metrics[0]); ++ii)
    {
        const char* const pMetric = Metrics[ii];
        NVPW_MetricEvalRequest request{};
        NSPERF_CHECK(ToMetricEvalRequest(metricsEvaluator, pMetric, request))

        constexpr bool keepInstances = false;
        NSPERF_CHECK(configBuilder.AddMetrics(&request, 1, keepInstances))
        metricEvalRequests.emplace_back(std::move(request));
    }

    nv::perf::CounterConfiguration counterConfiguration;
    NSPERF_CHECK(CreateConfiguration(configBuilder, counterConfiguration))

    assert(counterConfiguration.numPasses == 1);

    constexpr uint32_t MaxSamples = 1024;
    // Setting this to true enables extra validation, which is useful for debugging.
    // In production environments, it can be set to false for improved performance.
    constexpr bool Validate = true;
    NSPERF_CHECK(counterData.Initialize(
        MaxSamples,
        Validate,
        [&](
            uint32_t maxSamples,
            NVPW_PeriodicSampler_CounterData_AppendMode appendMode,
            std::vector<uint8_t>& counterData
        )
        {
            return nv::perf::sampler::GpuPeriodicSamplerCreateCounterData(
                deviceIndex,
                counterConfiguration.counterDataPrefix.data(),
                counterConfiguration.counterDataPrefix.size(),
                maxSamples,
                appendMode,
                counterData
            );
        }))

    NSPERF_CHECK(MetricsEvaluatorSetDeviceAttributes(
        metricsEvaluator,
        counterData.GetCounterData().data(),
        counterData.GetCounterData().size()
    ))

    {
        std::cout << "StartTime, EndTime, Duration";
        const auto countersEnumerator = EnumerateCounters(metricsEvaluator);
        const auto ratiosEnumerator = EnumerateRatios(metricsEvaluator);
        const auto throughputsEnumerator = EnumerateThroughputs(metricsEvaluator);
        for (const NVPW_MetricEvalRequest& metricEvalRequest : metricEvalRequests)
        {
            std::cout << ", " << ToString(countersEnumerator, ratiosEnumerator, throughputsEnumerator, metricEvalRequest);
        }
        std::cout << "\n";
    }

    constexpr size_t SamplingFrequency = 120; // 120 Hz
    constexpr size_t samplingIntervalInNanoSeconds = 1000 * 1000 * 1000 / SamplingFrequency;
    // Tolerate DecodeCounters() latency of up to 10 seconds. The product is
    // computed in 64-bit arithmetic because 10^10 does not fit in a 32-bit int.
    constexpr size_t MaxDecodeLatencyInNanoSeconds = 10ull * 1000 * 1000 * 1000;
    const nv::perf::sampler::GpuPeriodicSampler::GpuPulseSamplingInterval samplingInterval = sampler.GetGpuPulseSamplingInterval(samplingIntervalInNanoSeconds);
    const size_t maxNumUndecodedSamples = MaxDecodeLatencyInNanoSeconds / samplingIntervalInNanoSeconds;
    size_t recordBufferSize = 0;
    NSPERF_CHECK(nv::perf::sampler::GpuPeriodicSamplerCalculateRecordBufferSize(deviceIndex, counterConfiguration.configImage, maxNumUndecodedSamples, recordBufferSize))

    const size_t MaxNumUndecodedSamplingRanges = 1; // must be 1
    NSPERF_CHECK(sampler.BeginSession(
        recordBufferSize,
        MaxNumUndecodedSamplingRanges,
        { samplingInterval.triggerSource },
        samplingInterval.samplingInterval
    ))

    constexpr size_t passIndex = 0; // This is a single-pass configuration, so the pass index is fixed at 0.
    NSPERF_CHECK(sampler.SetConfig(counterConfiguration.configImage, passIndex))

    NSPERF_CHECK(sampler.StartSampling())

    m_IsInSession = true;
}
#define NSPERF_CHECK(val)
#define SPICES_PROFILE_ZONE
std::vector< NVPW_MetricEvalRequest > metricEvalRequests
Metrics requests.
nv::perf::MetricsEvaluator metricsEvaluator
MetricsEvaluator.
nv::perf::sampler::RingBufferCounterData counterData
This is used to store the counter values collected during profiling.
nv::perf::sampler::GpuPeriodicSampler sampler
The periodic sampler.
const char * Metrics[]
The following metrics are for demonstration purposes only. For a more comprehensive set of single-pas...

References m_IsInSession.

Referenced by BeginFrame() and NsightPerfGPUProfilerContinuous().
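The record-buffer sizing in Create() rests on two small calculations: the sampling interval derived from the sampling frequency, and the number of samples that can accumulate while decoding lags behind. The sketch below reproduces only that arithmetic. The helper names `SamplingIntervalNs` and `MaxUndecodedSamples` are hypothetical and do not exist in the engine or the SDK. A ten-second budget expressed in nanoseconds (10^10) does not fit in a 32-bit int, so the sketch does all arithmetic in 64-bit unsigned integers.

```cpp
#include <cstdint>

// Hypothetical helpers; only the arithmetic mirrors the sizing step.
constexpr std::uint64_t NanosPerSecond = 1000000000ull;

// Interval between two samples for a sampling frequency given in Hz.
constexpr std::uint64_t SamplingIntervalNs(std::uint64_t frequencyHz)
{
    return NanosPerSecond / frequencyHz;
}

// Number of samples that can pile up while decoding lags by maxLatencyNs.
constexpr std::uint64_t MaxUndecodedSamples(std::uint64_t maxLatencyNs, std::uint64_t intervalNs)
{
    return maxLatencyNs / intervalNs;
}

// 120 Hz gives one sample roughly every 8.33 ms.
static_assert(SamplingIntervalNs(120) == 8333333, "unexpected interval");

// A 10 s decode-latency budget at 120 Hz leaves room for 1200 samples.
static_assert(MaxUndecodedSamples(10 * NanosPerSecond, SamplingIntervalNs(120)) == 1200,
              "unexpected sample count");

// A 1 s budget at the same rate leaves room for 120 samples.
static_assert(MaxUndecodedSamples(1 * NanosPerSecond, SamplingIntervalNs(120)) == 120,
              "unexpected sample count");
```

Because the checks are `static_assert`s, a mistake in the arithmetic shows up at compile time rather than as a record buffer that is too small at run time.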