privacynet

Privacy-preserving network telemetry — IP anonymization, flow aggregation, and differential privacy for sharing infrastructure data safely.

The Problem

Network engineers and security teams need to share telemetry (NetFlow, sFlow, IPFIX) for collaborative analysis, benchmarking, and research. But raw flow data contains sensitive information: internal IP addresses reveal topology, individual flows expose user behavior, and exact counts can be used to fingerprint organizations.

privacynet provides a composable toolkit to sanitize network telemetry before sharing, with configurable privacy guarantees ranging from simple anonymization to formal differential privacy.

Installation

pip install privacynet

Or from source:

git clone https://github.com/cwccie/privacynet.git
cd privacynet
pip install -e ".[dev]"

Quick Start

import pandas as pd
from privacynet import PrivacyPipeline

# Load your NetFlow data
df = pd.read_csv("flows.csv")

# Apply medium privacy (anonymize IPs + aggregate by time window)
pipeline = PrivacyPipeline(level="medium", key="your-secret-key")
safe_df = pipeline.process(df)

# Share safe_df with collaborators
safe_df.to_csv("safe_flows.csv", index=False)

Privacy Levels

| Level | Anonymization | Aggregation | DP Noise | Use Case |
|---|---|---|---|---|
| low | Prefix-preserving IPs | No | No | Internal sharing, subnet analysis |
| medium | Prefix-preserving IPs | 5-min window | No | Cross-team sharing, trend analysis |
| high | Prefix-preserving IPs | 5-min window | Laplace | External sharing, published research |
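
To make the table concrete, here is a minimal sketch of the three levels expressed as data. `LevelConfig`, `LEVELS`, and `stages` are hypothetical names for illustration and are not part of the privacynet API; the stage order (anonymize, then aggregate, then add noise) is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LevelConfig:
    """Hypothetical description of one privacy level from the table."""
    anonymization: str
    window: Optional[str]    # None means flows are not aggregated
    dp_noise: Optional[str]  # None means no differential-privacy noise


# Values copied from the table; the structure itself is illustrative only.
LEVELS = {
    "low": LevelConfig("prefix", None, None),
    "medium": LevelConfig("prefix", "5min", None),
    "high": LevelConfig("prefix", "5min", "laplace"),
}


def stages(level: str) -> List[str]:
    """List the processing stages a level enables, in the assumed order."""
    cfg = LEVELS[level]
    steps = [f"anonymize({cfg.anonymization})"]
    if cfg.window is not None:
        steps.append(f"aggregate({cfg.window})")
    if cfg.dp_noise is not None:
        steps.append(f"noise({cfg.dp_noise})")
    return steps
```

Each level adds one stage on top of the previous one, so moving from `low` to `high` only ever removes detail from the shared data.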

Components

IP Anonymization

Three methods trade privacy against utility. Set a default method in the constructor and override it per call:

from privacynet import IPAnonymizer

anon = IPAnonymizer(method="prefix", key="secret")

# Prefix-preserving: maintains subnet relationships
anon.anonymize_ip("10.0.1.100")           # → deterministic mapping

# Subnet truncation: zeroes the host portion (the last octet in this example)
anon.anonymize_ip("10.0.1.100", method="truncate")  # → "10.0.1.0"

# Random mapping: preserves no subnet structure; consistent only within a session
anon.anonymize_ip("10.0.1.100", method="random")    # → random but consistent

# Also supports MAC and hostname anonymization
anon.anonymize_mac("aa:bb:cc:dd:ee:ff")   # → "aa:bb:cc:<hashed>"
anon.anonymize_hostname("db-primary")      # → "host-a3f1b2c9"
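
To show what "prefix-preserving" and "truncate" mean in practice, here is a minimal standard-library sketch. It is not privacynet's implementation: `prefix_preserving_ip` is a simplified keyed scheme in the spirit of Crypto-PAn built on HMAC-SHA256, it handles IPv4 only, and all function names are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def truncate_ip(ip: str, prefix_len: int = 24) -> str:
    """Zero the host bits, keeping only the first prefix_len bits."""
    net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(net.network_address)


def prefix_preserving_ip(ip: str, key: bytes) -> str:
    """Map an IPv4 address so that shared prefixes stay shared.

    Bit i of the output is bit i of the input XORed with a keyed
    pseudorandom bit that depends only on the i bits before it.
    """
    addr = int(ipaddress.IPv4Address(ip))
    out = 0
    for i in range(32):
        prefix = addr >> (32 - i)                  # the i leading bits
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (addr >> (31 - i)) & 1
        out = (out << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(out))


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

With the same key, two inputs that share their first n bits produce outputs that share exactly their first n bits, so subnet relationships survive while the real addresses do not. Truncation is simpler but collapses every host in a subnet to one value.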

Flow Aggregation

Group individual flows to hide specific connections while preserving statistical value:

from privacynet import FlowAggregator

agg = FlowAggregator(min_group_size=5)  # k-anonymity: k=5

# Aggregate by time window
result = agg.temporal_aggregate(df, window="5min")

# Aggregate by source subnet
result = agg.subnet_aggregate(df, prefix_len=24)

# Aggregate by protocol
result = agg.protocol_aggregate(df)
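
The idea behind time-window aggregation with a minimum group size can be sketched without any dependencies. The code below is an illustration, not the `FlowAggregator` implementation: it assumes flows are dictionaries with hypothetical `ts` (integer epoch seconds), `bytes`, and `packets` fields, and it assumes that windows with fewer than `min_group_size` flows are dropped.

```python
from collections import defaultdict


def temporal_aggregate(flows, window_s=300, min_group_size=5):
    """Sum flows per time window and suppress windows with too few flows."""
    buckets = defaultdict(lambda: {"flows": 0, "bytes": 0, "packets": 0})
    for flow in flows:
        # Round the timestamp down to the start of its window.
        start = flow["ts"] - flow["ts"] % window_s
        bucket = buckets[start]
        bucket["flows"] += 1
        bucket["bytes"] += flow["bytes"]
        bucket["packets"] += flow["packets"]
    # Keep only windows that contain at least min_group_size flows.
    return [
        {"window_start": start, **totals}
        for start, totals in sorted(buckets.items())
        if totals["flows"] >= min_group_size
    ]
```

Suppressing small groups means that every published row summarizes at least `min_group_size` flows, so no row can be traced to a single connection.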

Differential Privacy

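Add noise calibrated to each query's sensitivity and the privacy parameter epsilon, which yields a formal epsilon-differential-privacy guarantee (a smaller epsilon means more noise and stronger privacy):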

from privacynet import DPMechanism

dp = DPMechanism(epsilon=1.0)

# Laplace mechanism for numeric values
noisy_bytes = dp.add_laplace_noise(value=150000, sensitivity=1000)

# Private count queries
noisy_count = dp.private_count(true_count=42)

# Track privacy budget
print(f"Budget spent: {dp.budget_spent}")
print(f"Remaining: {dp.remaining_budget(total_budget=10.0)}")
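
For readers new to the Laplace mechanism, here is a minimal standard-library sketch of the calibration and the budget accounting. `LaplaceSketch` is a hypothetical class, not `DPMechanism`; it assumes each query spends the full epsilon and that budgets add up under sequential composition.

```python
import random


class LaplaceSketch:
    """Illustrative Laplace mechanism with a running privacy budget."""

    def __init__(self, epsilon, seed=None):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        self.epsilon = epsilon
        self.budget_spent = 0.0
        self._rng = random.Random(seed)

    def add_noise(self, value, sensitivity):
        """Return value plus Laplace noise with scale sensitivity / epsilon."""
        if sensitivity <= 0:
            raise ValueError("sensitivity must be positive")
        scale = sensitivity / self.epsilon
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with location 0 and that scale.
        noise = (self._rng.expovariate(1 / scale)
                 - self._rng.expovariate(1 / scale))
        self.budget_spent += self.epsilon  # sequential composition
        return value + noise

    def remaining_budget(self, total_budget):
        return total_budget - self.budget_spent
```

With `epsilon=1.0` and `sensitivity=1000`, the noise scale is 1000 and its standard deviation is about 1414, roughly 1% of a 150000-byte value. Every additional query on the same data spends more budget, which is why tracking it matters.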

Validation

Verify that anonymized data retains analytical utility:

from privacynet import PrivacyValidator

validator = PrivacyValidator()
report = validator.validate(original_df, anonymized_df, value_cols=["bytes", "packets"])

print(f"Distribution similarity (bytes): {report['dist_similarity_bytes']:.2f}")
print(f"Correlation preservation: {report['correlation_diff']:.4f}")
print(f"Mean relative error (bytes): {report['mre_bytes']:.4f}")
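
The README does not define how these metrics are computed, so the following is only one plausible reading, sketched with the standard library. `mean_relative_error` compares column means and `correlation_diff` compares Pearson correlations between two columns before and after anonymization; both function names and definitions are assumptions, not the `PrivacyValidator` internals.

```python
from math import sqrt
from statistics import fmean


def mean_relative_error(original, anonymized):
    """Relative error of the column mean; assumes a nonzero original mean."""
    orig_mean = fmean(original)
    return abs(fmean(anonymized) - orig_mean) / abs(orig_mean)


def pearson(xs, ys):
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_diff(orig_x, orig_y, anon_x, anon_y):
    """Absolute change in the x-y correlation caused by anonymization."""
    return abs(pearson(orig_x, orig_y) - pearson(anon_x, anon_y))
```

Under these definitions, values near zero for both metrics suggest that the anonymized data still supports the same aggregate analysis as the original.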

CLI

# Anonymize IPs in a CSV
privacynet anonymize flows.csv --method prefix --key my-secret

# Aggregate by time window
privacynet aggregate flows.csv --window 5min --min-group 5

# Run the full pipeline
privacynet pipeline flows.csv --level high --key my-secret

Development

pip install -e ".[dev]"
pytest --cov=privacynet
ruff check src/ tests/

License

MIT License — Copyright (c) 2026 Corey Wade
