
Obtaining congestion window measurements #5

@03k64

I am currently using CCP as part of a research project, but I am having trouble logging congestion control information correctly: no log output is produced after roughly 30 seconds from starting the userspace agent. I am using the reno binary built from this project; the modifications I made to obtain measurements are detailed below.

In impl GenericCongAvoidAlg for Reno in src/reno.rs, new_flow now also receives a reference to the DatapathInfo struct, and additional information about the flow is stored so that it can be logged later:

fn new_flow(&self, logger: Option<slog::Logger>, info: &DatapathInfo) -> Self::Flow {
    let le_src_ip = info.src_ip.to_be();
    let src_ip = Ipv4Addr::from(le_src_ip);
    let le_dst_ip = info.dst_ip.to_be();
    let dst_ip = Ipv4Addr::from(le_dst_ip);

    Reno {
        logger,
        mss: info.mss,
        init_cwnd: f64::from(info.init_cwnd),
        cwnd: f64::from(info.init_cwnd),
        src_ip: Some(src_ip.to_string()),
        src_port: Some(info.src_port),
        dst_ip: Some(dst_ip.to_string()),
        dst_port: Some(info.dst_port),
    }
}
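The byte-order handling in new_flow can be checked on its own. The sketch below assumes, as the code above implies, that DatapathInfo carries each IPv4 address as a u32 whose in-memory bytes are in network order. Under that assumption u32::from_be (the same operation as the to_be() call above, but with a name that states the intent) yields the host-order value that Ipv4Addr::from expects. The helper name ipv4_from_datapath is hypothetical.

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: converts an address whose bytes are stored in network
/// order (assumed for DatapathInfo's src_ip / dst_ip) into an Ipv4Addr.
/// u32::from_be swaps bytes on little-endian hosts and is a no-op on
/// big-endian hosts, exactly like to_be().
fn ipv4_from_datapath(raw: u32) -> Ipv4Addr {
    Ipv4Addr::from(u32::from_be(raw))
}

fn main() {
    // Simulate a field whose memory holds the octets 10.0.0.1 in wire order.
    let raw = u32::from_ne_bytes([10, 0, 0, 1]);
    println!("{}", ipv4_from_datapath(raw)); // prints "10.0.0.1"
}
```

Building the test input with from_ne_bytes keeps the check independent of the host's endianness.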

In impl GenericCongAvoidFlow for Reno in src/reno.rs, the following log statement is added to each of the implemented functions to record the congestion state of the flow:

if let Some(log) = self.logger.as_ref() {
    info!(log, "curr_cwnd()";
        "curr_cwnd (bytes)" => self.cwnd,
        "mss (bytes)" => self.mss,
        "src_ip" => self.src_ip.as_ref(),
        "src_port" => self.src_port,
        "dst_ip" => self.dst_ip.as_ref(),
        "dst_port" => self.dst_port,
    );
}
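The log statement above uses the fixed message "curr_cwnd()" in every function and carries no timestamp, which makes it hard to tell which callback produced a given line or to plot the window over time. Below is a std-only sketch of one alternative record format, in which each callback passes its own name and the elapsed time since the flow started is included. CwndTrace, record and the CSV column order are hypothetical. With slog, the flow fields could instead be attached once in new_flow through a child logger (log.new(o!(...))).

```rust
use std::time::Instant;

/// Hypothetical per-flow trace state; in practice these fields would sit
/// alongside cwnd and mss in the Reno struct.
struct CwndTrace {
    start: Instant, // monotonic clock, unaffected by wall-clock changes
    flow: String,   // e.g. "10.0.0.1:5001->10.0.0.2:4433"
}

impl CwndTrace {
    fn new(flow: String) -> Self {
        CwndTrace { start: Instant::now(), flow }
    }

    /// Formats one CSV record:
    /// elapsed ms, calling function, cwnd (bytes), cwnd (segments), flow id.
    fn record(&self, event: &str, cwnd_bytes: f64, mss: u32) -> String {
        let t_ms = self.start.elapsed().as_millis();
        let segments = cwnd_bytes / f64::from(mss);
        format!("{},{},{:.0},{:.2},{}", t_ms, event, cwnd_bytes, segments, self.flow)
    }
}

fn main() {
    let trace = CwndTrace::new("10.0.0.1:5001->10.0.0.2:4433".to_string());
    // Each callback would pass its own name instead of a fixed label.
    println!("{}", trace.record("increase", 14600.0, 1460));
}
```

Recording the window in both bytes and segments keeps the trace directly comparable with kernel-side measurements.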

In addition, the trait declaration for GenericCongAvoidAlg in src/lib.rs is modified to match the implementation above:

fn new_flow(&self, logger: Option<slog::Logger>, info: &DatapathInfo) -> Self::Flow;

Minor changes were also made to src/cubic.rs so that it complies with the modified trait declaration; these compile successfully, but the cubic binary has not been used.

Initially I used the default logging setup, which is initialised in src/bin/reno.rs as:

let log = portus::algs::make_logger();

I redirected standard output and standard error to a file and observed the logs stopping after around 30 seconds. I then added the following to src/bin_helper.rs to log directly to a file instead. The chan_size of 2048 was arrived at by trial and error; with this value, no messages reporting dropped log records appear in the resulting file:

pub fn make_logger() -> slog::Logger {
    let now = std::time::SystemTime::now();
    let ts = now.duration_since(std::time::SystemTime::UNIX_EPOCH)
        .expect("Time went backwards");
    let ms_ts = ts.as_millis();

    let log_path = format!("/usr/src/output/server/cwnd/{}.txt", ms_ts);
    
    let file = std::fs::OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open(log_path)
        .unwrap();

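    // Plain-text formatting, written to the file opened above.
    let decorator = slog_term::PlainDecorator::new(file);
    let drain = slog_term::FullFormat::new(decorator).build().fuse();
    // Records are queued in a bounded channel of chan_size entries and written by
    // a background thread; with slog_async's default overflow strategy, records
    // arriving while the channel is full are dropped and the loss is reported.
    let drain = slog_async::Async::new(drain).chan_size(2048).build().fuse();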
    slog::Logger::root(drain, o!())
}
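chan_size bounds the queue between the logging call sites and slog_async's background writer thread. As I understand it, with the builder's default overflow strategy, records that arrive while the queue is full are dropped and the loss is reported, which is what tuning chan_size by trial and error works around. The sketch below models that behaviour with a std sync_channel rather than slog_async itself; DroppingLogger and try_log are hypothetical names. slog_async's builder also offers overflow_strategy(OverflowStrategy::Block), which makes callers wait instead of dropping, at the cost of possibly stalling the agent's callbacks behind file I/O.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Simplified model of a drop-on-overflow asynchronous logger front end.
struct DroppingLogger {
    tx: SyncSender<String>,
    dropped: u64,
}

impl DroppingLogger {
    /// Creates a logger with a queue of chan_size entries; the Receiver stands in
    /// for the background writer thread.
    fn new(chan_size: usize) -> (Self, Receiver<String>) {
        let (tx, rx) = sync_channel(chan_size);
        (DroppingLogger { tx, dropped: 0 }, rx)
    }

    /// Enqueues a record without blocking; counts it as dropped if the queue is
    /// full or the writer has gone away.
    fn try_log(&mut self, msg: String) {
        match self.tx.try_send(msg) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.dropped += 1
            }
        }
    }
}

fn main() {
    // A queue of 2 with no consumer draining it: records 3 to 5 are dropped.
    let (mut logger, rx) = DroppingLogger::new(2);
    for i in 0..5 {
        logger.try_log(format!("record {}", i));
    }
    println!("queued={} dropped={}", rx.try_iter().count(), logger.dropped);
}
```

In this model, a larger chan_size only helps if the writer eventually catches up; a sustained logging rate above the file's write rate will still drop records.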

As far as I can tell, the main userspace agent is running correctly, since the flows measured at the client side each receive an approximately equal share of the bandwidth; for some reason, however, the log output stops after around 30 seconds. To rule out the slog library as the cause, I replaced the statements above with simple println!(...) and eprintln!(...) calls, but observed the same behaviour.

Since this does not seem to be a problem with the logging method, is there some behaviour (either here or in portus) that causes the four functions in impl GenericCongAvoidFlow for Reno to stop being called after a certain point?
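One way to separate "the callbacks stop being invoked" from "the output stops being written" is to count invocations independently of any logger and sample the counts from outside the callbacks, for example from a timer thread. If the counts keep advancing while the log goes quiet, the problem lies in the output path; if they freeze, the callbacks really have stopped. Below is a minimal std-only sketch; CallCounters, any_advanced and the four-slot layout (one slot per instrumented function) are hypothetical, and the spawned thread merely stands in for the agent.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Invocation counters, one slot per instrumented callback
/// (slot assignment is hypothetical).
#[derive(Default)]
struct CallCounters {
    counts: [AtomicU64; 4],
}

impl CallCounters {
    /// Called at the top of each instrumented function.
    fn hit(&self, slot: usize) {
        self.counts[slot].fetch_add(1, Ordering::Relaxed);
    }

    /// Reads all counters (not atomic across slots, which is fine for monitoring).
    fn snapshot(&self) -> [u64; 4] {
        let mut out = [0u64; 4];
        for (o, c) in out.iter_mut().zip(self.counts.iter()) {
            *o = c.load(Ordering::Relaxed);
        }
        out
    }
}

/// True if any callback ran between the two snapshots.
fn any_advanced(now: &[u64; 4], prev: &[u64; 4]) -> bool {
    now.iter().zip(prev.iter()).any(|(n, p)| n > p)
}

fn main() {
    let counters = Arc::new(CallCounters::default());
    let worker = Arc::clone(&counters);

    // Stand-in for the agent: "callbacks" fire for at least 100 ms, then stop.
    let agent = thread::spawn(move || {
        for i in 0..100 {
            worker.hit(i % 4);
            thread::sleep(Duration::from_millis(1));
        }
    });

    // Monitor: sample periodically, independently of any logger.
    let mut prev = counters.snapshot();
    for _ in 0..5 {
        thread::sleep(Duration::from_millis(50));
        let now = counters.snapshot();
        eprintln!("counts={:?} advancing={}", now, any_advanced(&now, &prev));
        prev = now;
    }
    agent.join().unwrap();
}
```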

Additionally, given that the example script in the eval-scripts repository uses tcpprobe to measure the congestion window, is there a reason you chose not to instrument the userspace code to obtain this measurement?

I have previously used the tcp_probe tracepoint to measure the congestion window on the kernel datapath with the in-kernel implementations of Reno and Cubic. I would like to include the QUIC datapath in my research, and I intended the instrumentation detailed above to serve as a single solution for obtaining the necessary measurements.
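If the userspace measurements are to be compared with the existing tcp_probe traces, the units need to line up: the tracepoint's snd_cwnd field counts segments, whereas the Reno log above records "curr_cwnd (bytes)". A minimal sketch, assuming each trace line contains a whitespace-separated snd_cwnd=<n> token (the sample line is illustrative, not captured output) and that dividing the byte count by the flow's MSS is an acceptable normalisation. snd_cwnd_from_trace and bytes_to_segments are hypothetical helpers.

```rust
/// Extracts the snd_cwnd=<n> value (in segments) from one tcp_probe trace line.
fn snd_cwnd_from_trace(line: &str) -> Option<u32> {
    line.split_whitespace()
        .find_map(|tok| tok.strip_prefix("snd_cwnd="))
        .and_then(|v| v.parse().ok())
}

/// Converts a byte-denominated window (as logged by the Reno flow) to segments.
fn bytes_to_segments(cwnd_bytes: f64, mss: u32) -> f64 {
    cwnd_bytes / f64::from(mss)
}

fn main() {
    // Illustrative line only; real tracepoint output contains more fields.
    let line = "tcp_probe: family=AF_INET src=10.0.0.1:5001 dest=10.0.0.2:4433 snd_cwnd=10 ssthresh=2147483647";
    let kernel = snd_cwnd_from_trace(line).unwrap();
    let ccp = bytes_to_segments(14600.0, 1460);
    println!("kernel={} segments, ccp={:.1} segments", kernel, ccp);
}
```

Normalising to segments rather than bytes keeps the kernel traces unchanged and only rescales the CCP log.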

In summary:

  1. Is the absence of further calls to the four functions in the impl GenericCongAvoidFlow for Reno block in src/reno.rs deliberate?
  2. Is there a reason why congestion window measurements should not be obtained here?
  3. If so, could you provide details of how you obtained the congestion window measurements for QUIC in figure 8 of this paper?
  4. If not, would you be open to collaborating on a way to obtain these measurements in the userspace code?
