support for per-vGPU mode profiling#165
Conversation
|
Looks like this PR breaks some of the travis tests. |
For profiling counters in per-vGPU mode, this menu requests for vgpu_id and hw_id of each vGPU when vGPUs are created Signed-off-by: Pengyuan Jiao <pengyuan.jiao@intel.com>
When a vGPU is selected on the menu, the related hw_id is passed to client-c through webui, then per-vgpu filtering by hw_id is achieved in gputop-client-c.c in userspace. Signed-off-by: Pengyuan Jiao <pengyuan.jiao@intel.com>
add new option "-vgpu" for gputop-csv to print counter data in per-vGPU mode. Signed-off-by: Pengyuan Jiao <pengyuan.jiao@intel.com>
|
I have modified my patch to pass the travis CI test, please check, thanks. |
| } | ||
| ); | ||
|
|
||
| parser.addArgument( |
There was a problem hiding this comment.
I think you can drop '-vgpu'.
There was a problem hiding this comment.
Thanks, I have dropped '-vgpu'.
| var col_width = 0; | ||
|
|
||
| if (this.pretty_print_csv_) { | ||
|
|
There was a problem hiding this comment.
Please remove the newlines.
There was a problem hiding this comment.
Sorry for my mistake, I have removed this line.
| this.column_titles_.map((line) => { | ||
| this.stream.write(line + this.endl); | ||
| }); | ||
|
|
There was a problem hiding this comment.
Please remove the newlines.
There was a problem hiding this comment.
Sorry for my mistake, I have removed this line.
| } | ||
| } | ||
|
|
||
| var flag = 0; |
There was a problem hiding this comment.
Maybe you find a better name for this.
Using a boolean type seems appropriate too.
Probably make it part of the GputopCSV instance.
There was a problem hiding this comment.
Thanks, I have renamed this parameter with "warning_once" and put it into GputopCSV instance.
| for (var i = 0; i < vgpu_id_.length; i++) { | ||
| if (vgpu_id_[i] === parseInt(args.vgpu)) { | ||
| vgpu_id = vgpu_id_[i]; | ||
| break; |
There was a problem hiding this comment.
Please align this break statement.
There was a problem hiding this comment.
Sorry for my mistake, I have corrected it.
| }).call(this, this.metrics_[i]); | ||
| } | ||
| this.select_metric_set(this.metrics_[0]); | ||
| this.select_metric_set(this.metrics_[0], 0); |
There was a problem hiding this comment.
Why not this.current_hw_id?
There was a problem hiding this comment.
yes, I have replaced this with this.current_hw_id.
| this.update_metric_set_stream(metric); | ||
| var hw_id = this.current_hw_id; | ||
| if (hw_id === undefined) | ||
| return; |
There was a problem hiding this comment.
So with this change, if no hw id is selected, we can't pause the stream?
There was a problem hiding this comment.
Let me briefly introduce my idea. If we create 2 vGPUs, there are 3 states. ctx_mode = ["Global", "vGPU1", "vGPU2"], and corresponding hw_id = ["0", hw_id1, hw_id2]. The "Global" state means the current state, without applying my patch, that is, it shows the whole counter values for any context id. All of the vgpu_id and hw_id starts from non-zero value. Hence I define the hw_id as 0 to map the "Global" state. So in the beginning without selecting any hw id, the current_hw_id is 0, it works in the "Global" state.
As a result, we can pause the stream whether we select one hw id or not.
| { | ||
| repeated uint64 ctx_hw_id = 1; | ||
| repeated uint64 vgpu_id = 2; | ||
| } |
There was a problem hiding this comment.
Does this mean you have the UI query every single vgpu, hw_id tuple one by one?
Sounds like you should return an array instead.
There was a problem hiding this comment.
Yes, I have the UI query each vgpu to the vgpu_id and ctx_hw_id array one by one. Beside of the vgpu_id and ctx_id, we also get the current vgpu number, which is defined as "n_ctx_hw_id" and "n_vgpu_id".
In the function handle_get_features, the array notices[] = "RC6 power saving mode disabled" is sent through defining the "notices" and "n_notices". In the same way, we send the vgpu_id array, ctx_hw_id array, n_ctx_hw_id and n_ctx_hw_id every time after receiving the query in the gputop-server. Then the vgpu_id and ctx_hw_id array can be resolved in the client.
| return res; | ||
| } | ||
|
|
||
| void handle_update_hw_id_map(uint64_t *vgpu_id, |
There was a problem hiding this comment.
This function should be static.
You can remove the double space after void.
There was a problem hiding this comment.
Thanks, I have corrected it.
|
|
||
| while (entry = readdir(vgpu_dir)) { | ||
| if (entry->d_type == DT_DIR && (strlen(entry->d_name)==UUID_length)) { | ||
| int ret_vgpu_id_path = asprintf(&vgpu_id_path, "%s/%s/intel_vgpu/vgpu_id", vgpu_path, entry->d_name); |
There was a problem hiding this comment.
Indentation is wrong here.
You seem to mix 2 different spacing in your change (2 vs 4 spaces), please be consistent with the rest of the file. I believe it's 4 spaces.
There was a problem hiding this comment.
Sorry for my mistake, I have corrected it.
|
Closing since you've opened #166. |
There are 4 commits in the pull-request.
The first one is, same as #164, to fix counter filtering and graph plotting issues when one metric set is selected again. I think you can ignore or close that #164 request.
The last 3 commits are about per virtual GPU counter profiling, which are usage for graphics workloads running inside virtual machines. User can select a specific vGPU on the webui or through gputop-csv to observe each vGPU performance.