# phdsan - PhD Structural Analysis for Offshore Wind
phdsan is a Python package for structural health monitoring and dynamic analysis of offshore wind turbine installations. It provides tools for processing sensor data, performing advanced signal analysis, and extracting structural parameters from vibration measurements collected during installation operations.
## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Tutorials](#tutorials)
- [Package Structure](#package-structure)
- [Database Schema](#database-schema)
- [CLI Reference](#cli-reference)
- [API Documentation](#api-documentation)
- [Development](#development)
- [Citation](#citation)
- [License](#license)
## Features

### Data Management
- DuckDB-based storage: Efficient time-series database with full reproducibility tracking
- Multi-sensor support: TOM boxes (IMU + GPS), MSR sensors, environmental data
- Data catalog system: Automatic discovery and ingestion status tracking
- Environmental data integration: Wave, wind, lidar, and atmospheric measurements
### Signal Processing
- Filtering: Butterworth bandpass/highpass/lowpass filters with zero-phase option
- Integration: Double integration with drift correction (detrending, highpass)
- Geometric transforms: GPS-based coordinate transformations for nacelle alignment
- Envelope analysis: Hilbert transform-based envelope extraction
- Wavelet analysis: Continuous wavelet transforms with ridge detection
### Advanced Analysis

- Stochastic methods (see the sketch after this list):
  - Narrowband signal analysis (RMS, peak, crest factor, spectral bandwidth)
  - Rayleigh distribution testing
  - Random decrement damping estimation
- Nonlinear dynamics:
  - Phase space reconstruction (Takens embedding, optimal delay/dimension selection)
  - Lyapunov exponent estimation
  - Correlation dimension (Grassberger-Procaccia algorithm)
  - Bispectrum/bicoherence for quadratic coupling detection
- Pattern recognition:
  - Dynamic Time Warping (DTW) with constraints (Sakoe-Chiba, Itakura)
  - Signal similarity analysis
- Fatigue analysis:
  - Rainflow cycle counting (ASTM E1049)
  - Cumulative fatigue damage (Palmgren-Miner rule)
  - S-N curve support
- Environmental correlation:
  - Metocean data ingestion (ERA5, buoy data)
  - Wind-wave-structure correlation analysis
  - Time-aligned multi-source analysis
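To give a flavor of the narrowband metrics above, the sketch below computes RMS, peak, and crest factor with plain NumPy. It is illustrative only and does not use the `phdsan_analysis` API.

```python
import numpy as np

def narrowband_metrics(x: np.ndarray) -> dict:
    """Basic narrowband descriptors of a vibration signal."""
    x = x - x.mean()                 # remove static offset
    rms = np.sqrt(np.mean(x**2))     # root-mean-square level
    peak = np.max(np.abs(x))         # absolute peak
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,  # ~1.41 for a pure sine, higher for impulsive signals
    }

# Example: a pure sine has a crest factor of sqrt(2)
t = np.linspace(0, 10, 1000)
print(narrowband_metrics(np.sin(2 * np.pi * 0.3 * t)))
```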
### Reproducibility
- Processing run tracking: Every analysis records full configuration, git commit, and environment
- Data lineage: Track source data for all derived results
- Configuration validation: Type-checked dataclass configs with comprehensive validation
- Version control integration: Automatic git commit hash recording
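For intuition, run tracking of this kind can be as simple as snapshotting the git commit and environment alongside the configuration. The sketch below is a hypothetical illustration of that idea, not the `phdsan_db` implementation.

```python
import json
import platform
import subprocess
import sys

def capture_run_metadata(config: dict) -> dict:
    # Hypothetical helper: record enough context to reproduce a run.
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return {
        "git_commit": commit,
        "python": sys.version,
        "platform": platform.platform(),
        "config": config,
    }

print(json.dumps(capture_run_metadata({"filter_order": 4}), indent=2))
```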
## Installation

### Requirements
- Python 3.13 or higher
- uv package manager (recommended)
### From Source

```bash
# Clone the repository
git clone https://github.com/flucto-gmbh/phdsan.git
cd phdsan

# Install with uv (recommended)
make install
# or manually:
uv sync && uv run pre-commit install

# Verify installation
uv run pytest tests/
```
### Development Installation

```bash
# Install with development dependencies
make install
uv run pre-commit install

# Run tests
make test

# Run linter
uv run ruff check --fix

# Build documentation (if available)
# make docs
```
## Quick Start

### 1. Initialize a Database

```python
from phdsan_db.connection import create_database, DatabaseConnection

# Create a new database
create_database("turbine_data.db")

# Use the database
with DatabaseConnection("turbine_data.db") as conn:
    # Your analysis here
    pass
```
### 2. Ingest Sensor Data

```python
from phdsan_db.ingestion import ingest_from_catalog
from phdsan_db.catalog.scanner import scan_directory

# Scan for available data
catalog_entries = scan_directory("data/raw/turbine_20")

# Ingest TOM data
with DatabaseConnection("turbine_data.db") as conn:
    for entry in catalog_entries:
        ingest_from_catalog(conn, entry)
```
### 3. Process Acceleration Data

```python
from phdsan_processing.pipeline import process_tom_to_deflection
from phdsan_core.config import ProcessingConfig

# Configure processing
config = ProcessingConfig(
    filter_type="butterworth",
    filter_order=4,
    highpass_freq=0.01,
    lowpass_freq=1.0,
    sampling_rate_hz=33.3333,
)

# Process data
with DatabaseConnection("turbine_data.db") as conn:
    deflection_df = process_tom_to_deflection(
        conn,
        turbine_id="20_BW27",
        location="helihoist-1",
        start_time="2021-07-15 10:00",
        end_time="2021-07-15 12:00",
        config=config,
    )
```
### 4. Perform Analysis

```python
from phdsan_analysis.nonlinear.phase_space import analyze_phase_space_embedding
from phdsan_analysis.fatigue.rainflow import analyze_rainflow_cycles

# Phase space reconstruction
delay, dim, diagnostics = analyze_phase_space_embedding(
    deflection_df,
    component="deflection",
)

# Rainflow counting
cycles_df = analyze_rainflow_cycles(deflection_df, component="deflection")
```
## Tutorials

### Tutorial 1: Data Ingestion
This tutorial demonstrates how to discover, catalog, and ingest sensor data into the database.
#### Step 1: Scan for available data

```python
from phdsan_db.catalog.scanner import scan_directory
from phdsan_db.connection import create_database, DatabaseConnection

# Create database
create_database("tutorial.db")

# Scan raw data directory
catalog_entries = scan_directory(
    "data/raw/turbine_20_BW27",
    turbine_id="20_BW27",
    recursive=True,
)

print(f"Found {len(catalog_entries)} data sources")
for entry in catalog_entries:
    print(f"  {entry['sensor']}/{entry['location']}: {entry['file_count']} files")
```
#### Step 2: Ingest data with progress tracking

```python
from phdsan_db.ingestion import ingest_from_catalog
from phdsan_db.models import get_catalog_entries

with DatabaseConnection("tutorial.db") as conn:
    # Ingest each catalog entry
    for i, entry in enumerate(catalog_entries, 1):
        print(f"[{i}/{len(catalog_entries)}] Ingesting {entry['sensor']}/{entry['location']}...")
        rows_inserted = ingest_from_catalog(conn, entry)
        print(f"  → Inserted {rows_inserted:,} records")

    # Verify ingestion
    catalog = get_catalog_entries(conn)
    print(f"\nTotal catalog entries: {len(catalog)}")
```
#### Step 3: Query ingested data

```python
from phdsan_db.models import get_tom_data
from datetime import datetime

with DatabaseConnection("tutorial.db") as conn:
    # Get data for specific time range
    df = get_tom_data(
        conn,
        turbine_id="20_BW27",
        location="helihoist-1",
        start_time=datetime(2021, 7, 15, 10, 0),
        end_time=datetime(2021, 7, 15, 12, 0),
    )

print(f"Retrieved {len(df)} measurements")
print(f"Time range: {df['timestamp'].min()} to {df['timestamp'].max()}")
print(f"Columns: {list(df.columns)}")
```
### Tutorial 2: Signal Processing
This tutorial shows how to filter and integrate acceleration data to compute deflections.
#### Step 1: Load raw acceleration data

```python
from phdsan_db.connection import DatabaseConnection
from phdsan_db.models import get_tom_data
import pandas as pd

with DatabaseConnection("tutorial.db") as conn:
    df = get_tom_data(
        conn,
        turbine_id="20_BW27",
        location="helihoist-1",
        start_time="2021-07-15 10:00",
        end_time="2021-07-15 12:00",
    )

# Set timestamp as index
df = df.set_index("timestamp")
print(f"Sampling rate: ~{1 / df.index.to_series().diff().median().total_seconds():.1f} Hz")
```
#### Step 2: Apply bandpass filtering

```python
from phdsan_processing.filters import apply_bandpass_filter
from phdsan_core.config import FilterConfig

# Configure filter
filter_config = FilterConfig(
    filter_type="butterworth",
    filter_order=4,
    lowcut_hz=0.01,
    highcut_hz=1.0,
    sampling_rate_hz=33.3333,
    zero_phase=True,
)

# Filter acceleration
df_filtered = df.copy()
for axis in ['acc_x', 'acc_y', 'acc_z']:
    df_filtered[f"{axis}_filtered"] = apply_bandpass_filter(
        df[axis].to_numpy(),
        config=filter_config,
    )

print("Applied bandpass filter (0.01-1.0 Hz)")
```
#### Step 3: Integrate to velocity and displacement

```python
from phdsan_processing.integration import double_integrate
from phdsan_core.config import IntegrationConfig

# Configure integration with drift correction
integration_config = IntegrationConfig(
    sampling_rate_hz=33.3333,
    detrend_method="linear",
    apply_highpass=True,
    highpass_freq=0.01,
    highpass_order=2,
)

# Integrate filtered Z-axis acceleration
velocity, displacement = double_integrate(
    df_filtered['acc_z_filtered'].to_numpy(),
    config=integration_config,
)

df_filtered['vel_z'] = velocity
df_filtered['pos_z'] = displacement
print(f"Displacement range: {displacement.min():.3f} to {displacement.max():.3f} m")
#### Step 4: Visualize results

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)

# Acceleration
axes[0].plot(df_filtered.index, df_filtered['acc_z_filtered'])
axes[0].set_ylabel("Acceleration (m/s²)")
axes[0].set_title("Filtered Acceleration (Z-axis)")
axes[0].grid(True)

# Velocity
axes[1].plot(df_filtered.index, df_filtered['vel_z'])
axes[1].set_ylabel("Velocity (m/s)")
axes[1].set_title("Integrated Velocity")
axes[1].grid(True)

# Displacement
axes[2].plot(df_filtered.index, df_filtered['pos_z'])
axes[2].set_ylabel("Displacement (m)")
axes[2].set_title("Double-Integrated Displacement")
axes[2].grid(True)
axes[2].set_xlabel("Time")

plt.tight_layout()
plt.savefig("signal_processing_results.png", dpi=150)
```
### Tutorial 3: Nonlinear Analysis
This tutorial demonstrates phase space reconstruction and Lyapunov exponent estimation.
#### Step 1: Prepare displacement data

```python
# Using displacement from Tutorial 2
signal = df_filtered['pos_z'].to_numpy()
fs = 33.3333  # Hz

print(f"Signal length: {len(signal)} samples ({len(signal) / fs:.1f} seconds)")
```
#### Step 2: Find optimal embedding parameters

```python
from phdsan_analysis.nonlinear.phase_space import analyze_phase_space_embedding
from phdsan_core.config import PhaseSpaceConfig

# Configure phase space analysis
ps_config = PhaseSpaceConfig(
    time_delay_method="ami",  # Average Mutual Information
    ami_max_delay=100,
    ami_bins=16,
    max_embedding_dim=10,
    fnn_threshold=0.05,
)

# Find optimal parameters
optimal_delay, optimal_dim, diagnostics = analyze_phase_space_embedding(
    df_filtered,
    component="pos_z",
    config=ps_config,
)

print(f"Optimal time delay: {optimal_delay} samples")
print(f"Optimal embedding dimension: {optimal_dim}")
```
#### Step 3: Visualize phase space

```python
from phdsan_analysis.nonlinear.phase_space import embed_time_series
import matplotlib.pyplot as plt

# Create 3D embedding
embedded = embed_time_series(signal, embedding_dim=3, time_delay=optimal_delay)

# Plot 3D phase space
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.plot(embedded[:, 0], embedded[:, 1], embedded[:, 2],
        linewidth=0.5, alpha=0.7)
ax.set_xlabel("x(t)")
ax.set_ylabel(f"x(t + {optimal_delay}Δt)")
ax.set_zlabel(f"x(t + {2 * optimal_delay}Δt)")
ax.set_title("3D Phase Space Reconstruction")
plt.savefig("phase_space_3d.png", dpi=150)
```
#### Step 4: Estimate Lyapunov exponent

```python
from phdsan_analysis.nonlinear.lyapunov import estimate_largest_lyapunov_exponent

# Estimate largest Lyapunov exponent
lambda_max, is_chaotic, diagnostics = estimate_largest_lyapunov_exponent(
    signal,
    embedding_dim=optimal_dim,
    time_delay=optimal_delay,
    sampling_rate=fs,
)

print(f"Largest Lyapunov exponent: {lambda_max:.6f}")
print(f"Is chaotic: {is_chaotic}")
print(f"Confidence interval: [{diagnostics['confidence_interval'][0]:.6f}, {diagnostics['confidence_interval'][1]:.6f}]")
```
### Tutorial 4: Fatigue Analysis
This tutorial shows how to perform rainflow cycle counting and estimate fatigue damage.
#### Step 1: Extract stress cycles

```python
from phdsan_analysis.fatigue.rainflow import analyze_rainflow_cycles

# Perform rainflow counting on displacement signal
cycles_df = analyze_rainflow_cycles(
    df_filtered,
    component="pos_z",
    use_residuals=True,
)

print(f"Detected {len(cycles_df)} cycles")
print(f"Range: {cycles_df['range'].min():.4f} to {cycles_df['range'].max():.4f}")
print(f"Total counted cycles: {cycles_df['count'].sum():.1f}")
```
#### Step 2: Visualize cycle distribution

```python
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Range histogram
axes[0].hist(cycles_df['range'], bins=50, weights=cycles_df['count'],
             edgecolor='black', alpha=0.7)
axes[0].set_xlabel("Cycle Range")
axes[0].set_ylabel("Cycle Count")
axes[0].set_title("Rainflow Cycle Range Distribution")
axes[0].grid(True, alpha=0.3)

# Range-mean scatter
scatter = axes[1].scatter(cycles_df['mean'], cycles_df['range'],
                          c=cycles_df['count'], s=cycles_df['count'] * 10,
                          alpha=0.6, cmap='viridis')
axes[1].set_xlabel("Cycle Mean")
axes[1].set_ylabel("Cycle Range")
axes[1].set_title("Rainflow Cycle Distribution (Mean vs Range)")
axes[1].grid(True, alpha=0.3)
plt.colorbar(scatter, ax=axes[1], label="Count")

plt.tight_layout()
plt.savefig("rainflow_cycles.png", dpi=150)
```
#### Step 3: Estimate fatigue damage

```python
from phdsan_analysis.fatigue.damage import estimate_fatigue_damage
from phdsan_core.config import FatigueConfig

# Configure S-N curve for steel (example)
fatigue_config = FatigueConfig(
    sn_curve_name="Steel-C",
    sn_exponent=3.0,  # Wöhler exponent (m)
    stress_scaling_factor=1.0,  # Adjust if needed
)

# Estimate cumulative damage
damage_result = estimate_fatigue_damage(
    cycles_df,
    config=fatigue_config,
)

print(f"Total Miner damage: {damage_result['total_damage']:.6f}")
print(f"Estimated lifetime: {damage_result['lifetime_years']:.1f} years")
print(f"Most damaging cycle range: {damage_result['max_damage_cycle_range']:.4f}")
```
### Tutorial 5: Environmental Correlation
This tutorial demonstrates correlating structural response with environmental conditions.
#### Step 1: Ingest environmental data

```python
from phdsan_io.wave_parser import parse_wave_data
from phdsan_io.wind_parser import parse_wind_data
from phdsan_db.environmental_models import insert_wave_measurements, insert_wind_measurements

# Parse wave data (example: buoy data)
wave_df = parse_wave_data("data/environmental/buoy_waves.csv", source="Buoy_X")

# Parse wind data
wind_df = parse_wind_data("data/environmental/met_mast.csv", source="MetMast_Y")

# Ingest into database
with DatabaseConnection("tutorial.db") as conn:
    insert_wave_measurements(conn, wave_df)
    insert_wind_measurements(conn, wind_df)

print(f"Ingested {len(wave_df)} wave measurements")
print(f"Ingested {len(wind_df)} wind measurements")
```
#### Step 2: Time-align structural and environmental data

```python
from phdsan_io.aggregation import aggregate_timeseries

# Aggregate to hourly bins
structural_hourly = aggregate_timeseries(
    df_filtered[['pos_z']],
    freq='1H',
    aggregation={'pos_z': ['mean', 'std', 'max', 'min']},
)

# Query environmental data
with DatabaseConnection("tutorial.db") as conn:
    wave_data = conn.execute("""
        SELECT timestamp, significant_height, peak_period, wave_direction
        FROM wave_measurements
        WHERE source = 'Buoy_X'
        ORDER BY timestamp
    """).df()

wave_data = wave_data.set_index('timestamp')

# Merge on timestamp
combined = structural_hourly.join(wave_data, how='inner')
print(f"Combined dataset: {len(combined)} hourly records")
```
#### Step 3: Correlation analysis

```python
from phdsan_analysis.environmental.correlation import analyze_environmental_correlation

# Analyze correlations
correlation_results = analyze_environmental_correlation(
    combined,
    structural_component="pos_z_max",
    environmental_components=["significant_height", "peak_period"],
)

print("Correlation Results:")
for env_var, result in correlation_results.items():
    print(f"  {env_var}:")
    print(f"    Pearson r = {result['pearson_r']:.3f} (p = {result['p_value']:.4f})")
    print(f"    Spearman ρ = {result['spearman_rho']:.3f}")
```
#### Step 4: Visualize correlations

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Displacement vs Wave Height
axes[0].scatter(combined['significant_height'], combined['pos_z_max'],
                alpha=0.5, s=20)
axes[0].set_xlabel("Significant Wave Height (m)")
axes[0].set_ylabel("Max Displacement (m)")
axes[0].set_title(f"Displacement vs Wave Height\n(r = {correlation_results['significant_height']['pearson_r']:.3f})")
axes[0].grid(True, alpha=0.3)

# Displacement vs Wave Period
axes[1].scatter(combined['peak_period'], combined['pos_z_max'],
                alpha=0.5, s=20, color='orange')
axes[1].set_xlabel("Peak Wave Period (s)")
axes[1].set_ylabel("Max Displacement (m)")
axes[1].set_title(f"Displacement vs Wave Period\n(r = {correlation_results['peak_period']['pearson_r']:.3f})")
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig("environmental_correlation.png", dpi=150)
```
## Package Structure

This is a uv workspace monorepo with multiple packages:

```
phdsan/
├── packages/
│   ├── phdsan-core/         # Core types, config, exceptions
│   │   └── src/phdsan_core/
│   ├── phdsan-db/           # Database connection, schema, models, catalog
│   │   └── src/phdsan_db/
│   ├── phdsan-io/           # Data parsers (TOM, MSR, wave, wind, LIDAR)
│   │   └── src/phdsan_io/
│   ├── phdsan-processing/   # Filtering, integration, pipelines
│   │   └── src/phdsan_processing/
│   ├── phdsan-analysis/     # Analysis modules
│   │   └── src/phdsan_analysis/
│   │       ├── fatigue/     # Rainflow, damage estimation
│   │       ├── nonlinear/   # Phase space, Lyapunov, bispectrum
│   │       ├── similarity/  # Dynamic Time Warping
│   │       ├── stochastic/  # Envelope, Rayleigh, random decrement
│   │       └── timefreq/    # Wavelet analysis
│   ├── phdsan-ais/          # AIS vessel tracking
│   │   └── src/phdsan_ais/
│   ├── phdsan-windfarm/     # Windfarm matching and installations
│   │   └── src/phdsan_windfarm/
│   ├── phdsan-metocean/     # ERA5 environmental data
│   │   └── src/phdsan_metocean/
│   └── phdsan-cli/          # Command-line interface
│       └── src/phdsan_cli/
├── tests/                   # Test suite
├── notebooks/               # Example Jupyter notebooks
└── docs/                    # Sphinx documentation
```
## Database Schema

The phdsan database uses DuckDB and contains the following table groups:

### Raw Sensor Data

- `tom_measurements`: TOM box IMU + GPS data (~30 Hz)
- `msr_measurements`: MSR accelerometer data
- `installations`: Sensor installation metadata

### Environmental Data

- `wave_measurements`: Wave height, period, direction
- `wind_measurements`: Wind speed, direction, turbulence
- `lidar_measurements`: Vertical wind profiles
- `atmospheric_measurements`: Gridded reanalysis data (ERA5)

### Processing Metadata

- `processing_runs`: Configuration, git hash, environment
- `data_lineage`: Source data tracking

### Processed Data

- `processed_acceleration`: Filtered accelerations
- `processed_position`: Integrated positions/velocities

### Analysis Results

- Stochastic: `envelope_analysis`, `narrowband_metrics`, `rayleigh_tests`, `random_decrement_results`
- Nonlinear: `phase_space_embeddings`, `lyapunov_results`, `correlation_dimension`, `bispectrum_analysis`
- Similarity: `dtw_pairwise`
- Time-Frequency: `wavelet_analysis_summary`, `wavelet_ridges`
- Fatigue: `rainflow_cycles`, `fatigue_damage`
See `packages/phdsan-db/src/phdsan_db/schema.py` for detailed schema definitions.
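To explore the schema interactively, DuckDB's standard introspection commands work directly on the database file:

```python
import duckdb

con = duckdb.connect("turbine_data.db")
print(con.execute("SHOW TABLES").df())                # list all tables
print(con.execute("DESCRIBE tom_measurements").df())  # column names and types
con.close()
```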
## CLI Reference

### Data Management

```bash
# Scan directory for available data
phdsan catalog scan data/raw/turbine_20 --turbine-id 20_BW27 --output catalog.json

# Ingest data into database
phdsan ingest tom-measurements data.db data/raw/turbine_20/tom/helihoist-1 \
    --turbine-id 20_BW27 --location helihoist-1

# Clean database (drop all tables)
phdsan clean data.db --confirm
```
### Signal Processing

```bash
# Process TOM data to deflection
phdsan process tom-to-deflection data.db 20_BW27 helihoist-1 \
    --start "2021-07-15 10:00" --end "2021-07-15 12:00" \
    --output deflection.parquet \
    --filter-type butterworth --highpass 0.01 --lowpass 1.0
```
### Analysis

```bash
# Rainflow cycle counting
phdsan analyze rainflow deflection.parquet --component deflection \
    --output cycles.csv

# Fatigue damage estimation
phdsan analyze fatigue-damage cycles.csv --sn-exponent 3.0 \
    --output damage.json

# Random decrement damping estimation
phdsan analyze random-decrement deflection.parquet --component deflection \
    --trigger-method level_crossing --threshold 1.5 \
    --output damping.json
```
## API Documentation

### Core Configuration Classes

All analysis functions use type-checked `@dataclass` configurations:

```python
from phdsan_core.config import (
    FilterConfig,
    IntegrationConfig,
    PhaseSpaceConfig,
    BispectrumConfig,
    FatigueConfig,
    RandomDecrementConfig,
)

# Example: Configure filter
filter_config = FilterConfig(
    filter_type="butterworth",
    filter_order=4,
    lowcut_hz=0.01,
    highcut_hz=1.0,
    sampling_rate_hz=33.3333,
    zero_phase=True,
)

# Validate configuration
filter_config.validate()  # Raises InvalidParameterError if invalid
```
### Exception Hierarchy

```python
from phdsan_core.exceptions import (
    PhdsanError,              # Base exception
    DataValidationError,      # Data validation failures
    InvalidParameterError,    # Invalid config parameters
    IncompatibleConfigError,  # Incompatible config combinations
    ProcessingError,          # Processing failures
)
```
See `API_REFERENCE.md` for the complete API reference.
## Development

### Running Tests

```bash
# Run all tests
make test
# or
uv run pytest

# Run specific test module
uv run pytest tests/test_analysis/test_nonlinear/

# Run with coverage
uv run pytest --cov=src/phdsan --cov-report=html
```
### Code Quality

```bash
# Run linter
uv run ruff check --fix

# Run formatter
uv run ruff format

# Type checking (if configured)
uv run mypy src/
```
### Pre-commit Hooks
Pre-commit hooks automatically run on every commit:
- TOML/YAML syntax validation
- Trailing whitespace removal
- Ruff linting and formatting
To run manually:

```bash
uv run pre-commit run --all-files
```
## Citation

If you use this software in your research, please cite:

```bibtex
@software{phdsan2025,
  title  = {phdsan: Structural Analysis for Offshore Wind},
  author = {[Your Name]},
  year   = {2025},
  url    = {https://github.com/flucto-gmbh/phdsan}
}
```
## License
[Specify license here - e.g., MIT, Apache 2.0, etc.]
## Acknowledgments
This work was conducted as part of PhD research on offshore wind turbine structural health monitoring.
## Issues and Contributing
If you encounter any problems, please file an issue with a detailed description.
Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request
**Project Status:** Active development | **Test coverage:** 61% | **Tests:** 545 passing
For more information, see the documentation or example notebooks.