I am currently using CCP as part of a research project, but I am having trouble logging congestion control information correctly: no logs are produced after around 30 seconds from starting the userspace agent. I am using the reno binary generated from this project; details of the modifications made to obtain measurements are given below.
In impl GenericCongAvoidAlg for Reno in src/reno.rs, a reference to the DatapathInfo struct is passed and additional information about the flow is stored to be logged out later:
fn new_flow(&self, logger: Option<slog::Logger>, info: &DatapathInfo) -> Self::Flow {
    let le_src_ip = info.src_ip.to_be();
    let src_ip = Ipv4Addr::from(le_src_ip);
    let le_dst_ip = info.dst_ip.to_be();
    let dst_ip = Ipv4Addr::from(le_dst_ip);
    Reno {
        logger,
        mss: info.mss,
        init_cwnd: f64::from(info.init_cwnd),
        cwnd: f64::from(info.init_cwnd),
        src_ip: Some(src_ip.to_string()),
        src_port: Some(info.src_port),
        dst_ip: Some(dst_ip.to_string()),
        dst_port: Some(info.dst_port),
    }
}
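For reference, the address conversion above can be isolated into a small function. This is a minimal sketch that assumes the datapath reports addresses as a `u32` in network (big-endian) byte order, which I have not confirmed; `ipv4_from_network_order` is a hypothetical helper name, not part of portus.

```rust
use std::net::Ipv4Addr;

/// Converts an IPv4 address stored in a `u32` in network (big-endian) byte
/// order into an `Ipv4Addr`.
/// Assumption: the datapath supplies addresses in network byte order.
fn ipv4_from_network_order(raw: u32) -> Ipv4Addr {
    // `u32::from_be` turns a big-endian value into a host-order integer.
    // `Ipv4Addr::from(u32)` then maps the most significant byte to the
    // first octet of the address.
    Ipv4Addr::from(u32::from_be(raw))
}
```

`to_be()` and `u32::from_be()` perform the same byte swap on a little-endian host and are both no-ops on a big-endian host, so this should behave identically to the code above; `from_be` only states the direction of the conversion more directly.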
In impl GenericCongAvoidFlow for Reno in src/reno.rs the following log statement is added in each of the implemented functions in order to provide information about the congestion state of the flow:
self.logger.as_ref().map(|log| {
    info!(log, "curr_cwnd()";
        "curr_cwnd (bytes)" => self.cwnd,
        "mss (bytes)" => self.mss,
        "src_ip" => self.src_ip.as_ref(),
        "src_port" => self.src_port,
        "dst_ip" => self.dst_ip.as_ref(),
        "dst_port" => self.dst_port,
    );
});
In addition, the trait declaration for GenericCongAvoidAlg in src/lib.rs is modified to mirror the implementation above:
fn new_flow(&self, logger: Option<slog::Logger>, info: &DatapathInfo) -> Self::Flow;
Some minor modifications were made to src/cubic.rs in order to comply with the modified trait declaration, but these compile successfully and the cubic binary has not been used.
Initially I used the default logging setup, which is initialised in src/bin/reno.rs as:
let log = portus::algs::make_logger();
I redirected the standard output and standard error streams to a file and observed that the logs stopped after around 30 seconds. I then added the following to src/bin_helper.rs to log directly to a file instead. The chan_size of 2048 was arrived at through trial and error; with this value, no messages advising of message loss appear in the resulting file:
pub fn make_logger() -> slog::Logger {
    let now = std::time::SystemTime::now();
    let ts = now.duration_since(std::time::SystemTime::UNIX_EPOCH)
        .expect("Time went backwards");
    let ms_ts = ts.as_millis();
    let log_path = format!("/usr/src/output/server/cwnd/{}.txt", ms_ts);
    let file = std::fs::OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open(log_path)
        .unwrap();
    let decorator = slog_term::PlainDecorator::new(file);
    let drain = slog_term::FullFormat::new(decorator).build().fuse();
    let drain = slog_async::Async::new(drain).chan_size(2048).build().fuse();
    slog::Logger::root(drain, o!())
}
So far as I can tell, the main userspace agent is running correctly, since the data flows measured at the client side show an approximately equal share of bandwidth being assigned to each. For some reason, however, the log output stops after around 30 seconds. In an attempt to eliminate the slog library as a cause, I replaced the above statements with simple println!(...) and eprintln!(...) lines, but the same behaviour was observed.
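To make the "no slog" experiment concrete, the following is a minimal std-only sketch of the kind of record writer I have in mind. `write_cwnd_record` and the record layout are hypothetical and chosen for illustration only; it flushes after every record so that buffering in the sink cannot hide later output.

```rust
use std::io::{self, Write};
use std::time::{SystemTime, UNIX_EPOCH};

/// Writes one timestamped congestion-window record to `out` and flushes
/// immediately. Hypothetical helper, not part of portus.
fn write_cwnd_record<W: Write>(out: &mut W, cwnd: f64, mss: u32) -> io::Result<()> {
    // Milliseconds since the Unix epoch; falls back to 0 if the clock is
    // set before the epoch.
    let ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis())
        .unwrap_or(0);
    writeln!(out, "{} cwnd_bytes={} mss_bytes={}", ms, cwnd, mss)?;
    // Flush explicitly so the record reaches the sink before returning.
    out.flush()
}
```

Any `Write` implementation can be passed in, for example a `std::fs::File` or `std::io::stderr()`.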
Given this doesn't seem to be a problem with the logging method, is there some behaviour (either here or in portus) that causes the four functions in impl GenericCongAvoidFlow for Reno to no longer be called after a certain point in time?
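One way to separate "the callbacks stop being invoked" from "the output is lost somewhere" would be to count invocations independently of any logging. The sketch below is std-only and uses a hypothetical `CallStats` type that is not part of portus; it assumes one instance is stored in the flow struct and that `record()` is called at the top of each of the four functions.

```rust
use std::time::{Duration, Instant};

/// Tracks how often a callback has run and when it last ran.
/// Hypothetical diagnostic type, not part of portus.
#[derive(Debug, Default)]
struct CallStats {
    calls: u64,
    last_call: Option<Instant>,
}

impl CallStats {
    /// Call at the start of each callback being observed.
    fn record(&mut self) {
        self.calls += 1;
        self.last_call = Some(Instant::now());
    }

    /// Time since the most recent call, or `None` if never called.
    fn idle_for(&self) -> Option<Duration> {
        self.last_call.map(|t| t.elapsed())
    }
}
```

If `calls` stops increasing while traffic is still flowing, the callbacks themselves are no longer being invoked. Reading the counters after the callbacks have gone quiet has to happen outside them (for example from a periodic thread with shared state), which this sketch leaves out.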
Additionally, given the example script in the eval-scripts repository uses tcpprobe to measure the congestion window, is there a reason you chose not to instrument the userspace code to obtain this measurement?
I've previously been using the tcp_probe tracepoint to obtain measurements of the congestion window for the kernel datapath using the in-kernel implementations of Reno & Cubic. I'd like to include the QUIC datapath as part of my research and I intended to use the instrumentation detailed above as a single solution for obtaining necessary measurements.
In summary:
- Is the absence of further calls to the four functions in the impl GenericCongAvoidFlow for Reno block in src/reno.rs deliberate?
- Is there a reason why measurements of the congestion window should not be obtained here?
- If there is a reason detailed in answer to question 2, would you be able to provide details of how you obtained congestion window measurements for QUIC in figure 8 in this paper?
- If there is not a reason detailed in answer to question 2, would you be open to collaborating on a way to obtain these measurements in the userspace code?