Turning an AIS firehose into an analyst's triage queue
A Palantir Foundry prototype that takes a noisy, high-volume stream of vessel positions and turns it into something an analyst can actually work: a short queue of "look at this" instead of an ocean of dots.
This was a private build in a Foundry environment. What follows is the engineering reasoning and architecture — the parts I can speak to generally — with environment and data specifics deliberately left out.
The problem
AIS, the Automatic Identification System, is the signal ships broadcast: position, heading, speed, identity. In aggregate it's a firehose: high volume, irregular, and messy, with dropped fields, duplicate messages, and the occasional impossible coordinate. An analyst staring at every vessel learns nothing. The job was to compress that stream into the handful of events worth a human's attention.
Architecture
The pipeline has four stages: ingest the live stream, parse and validate it with PySpark, model the cleaned world as an Ontology of vessels and anomaly events, then surface that to an analyst as map-based triage in Workshop. The Ontology is the hinge: it turns rows into domain objects an analyst can navigate and an application can build on.
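To make "rows into domain objects" concrete, here is a minimal sketch in plain Python dataclasses. It is not the Foundry Ontology API, and it is not the schema from the build: every field name below (`mmsi`, `track`, `kind`, `reason`, and so on) is a hypothetical chosen for illustration. The point is only the shape: a vessel owns its track and its anomalies, so an application can ask the vessel rather than join tables.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Position:
    """One cleaned position report (hypothetical fields)."""
    timestamp: datetime
    lat: float
    lon: float
    speed_knots: float
    heading_deg: float


@dataclass
class AnomalyEvent:
    """Something worth an analyst's attention, linked back to its vessel."""
    event_id: str
    mmsi: str            # assumed link key: the identifier a vessel broadcasts
    kind: str            # hypothetical label, e.g. "implausible_jump"
    detected_at: datetime
    reason: str          # human-readable explanation of why it was flagged


@dataclass
class Vessel:
    """A vessel with its track and anomalies as first-class links."""
    mmsi: str
    name: str
    track: List[Position] = field(default_factory=list)
    anomalies: List[AnomalyEvent] = field(default_factory=list)

    def triage_summary(self) -> str:
        # One line an analyst-facing list could show per vessel.
        return (
            f"{self.name} ({self.mmsi}): "
            f"{len(self.track)} positions, {len(self.anomalies)} anomalies"
        )
```

Keeping `reason` on the event is a deliberate choice in this sketch: a flag that cannot explain itself is hard for an analyst to trust.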
Decisions & tradeoffs
| Decision | Why | What I traded away |
|---|---|---|
| Model the domain as an Ontology of Vessel / AnomalyEvent objects, not raw tables | Analysts think in vessels and incidents, not joins. Objects make a vessel's track and its anomalies first-class, and give the Workshop app something stable to build on. | More modeling work upfront than dumping the stream into flat tables. |
| Validate and normalize aggressively at ingest with PySpark | The stream is genuinely dirty; if bad coordinates and duplicates reach the Ontology, every downstream triage decision inherits the noise. Better to drop or flag at the boundary. | Risk of discarding edge-case-but-real signals — so questionable records get flagged, not silently dropped. |
| Ship a triage queue in Workshop, not a generic dashboard | The deliverable is a workflow, not a chart. Matching how an analyst actually triages — surface, inspect, decide — is what makes it usable. | Less general-purpose than an open-ended explorer; tuned to one job. |
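The "flag, don't silently drop" rule from the second row can be sketched in a few lines. This is plain Python over dictionaries rather than the PySpark job itself, and the field names, the `quality_flags` column, and the choice to drop only exact duplicates are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical minimum set of fields a usable position report needs.
REQUIRED_FIELDS = ("mmsi", "timestamp", "lat", "lon")


def validate(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop exact duplicates; keep everything else, annotated with flags."""
    seen = set()
    kept = []
    for rec in records:
        key = tuple(rec.get(f) for f in REQUIRED_FIELDS)
        if key in seen:
            continue  # exact repeat of a message already kept
        seen.add(key)

        flags = []
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) is None]
        if missing:
            flags.append("missing:" + ",".join(missing))

        lat, lon = rec.get("lat"), rec.get("lon")
        # Coordinates outside the valid geographic range are impossible,
        # but the record is kept so an analyst can still see it.
        if lat is not None and not -90.0 <= lat <= 90.0:
            flags.append("lat_out_of_range")
        if lon is not None and not -180.0 <= lon <= 180.0:
            flags.append("lon_out_of_range")

        kept.append({**rec, "quality_flags": flags})
    return kept
```

Downstream steps can then decide what to do with flagged rows, for example excluding them from the map while still counting them, without the boundary having thrown the evidence away.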
What was hard
The interesting tension is that "anomaly" is a judgment call. Too sensitive and the triage queue is as useless as the raw firehose; too strict and you miss the thing that mattered. Tuning the validation and anomaly logic until the queue was short enough to act on, and honest about what it flagged, was the real work, far more than wiring the pipeline together.
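One way to see the sensitivity tradeoff is to pick a single rule and sweep its threshold. The rule below, the speed implied by two consecutive positions, is a stand-in chosen for illustration and not necessarily the anomaly logic used in the build; the function names and the `(seconds, lat, lon)` track format are likewise assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Mean Earth radius converted from kilometres to nautical miles.
EARTH_RADIUS_NM = 6371.0 / 1.852


def haversine_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def implied_speeds_knots(
    track: Sequence[Tuple[float, float, float]]
) -> List[float]:
    """Speed between consecutive (t_seconds, lat, lon) points, time-sorted."""
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
        hours = (t1 - t0) / 3600.0
        if hours <= 0:
            continue  # skip non-advancing timestamps rather than divide by zero
        speeds.append(haversine_nm(la0, lo0, la1, lo1) / hours)
    return speeds


def queue_length(speeds: Sequence[float], threshold_knots: float) -> int:
    """How many hops would land in the triage queue at this threshold."""
    return sum(1 for s in speeds if s > threshold_knots)
```

Running `queue_length` over the same speeds at several thresholds shows the queue shrinking as the threshold rises. The code cannot say which threshold is right; that depends on what an analyst can realistically review and what they can afford to miss.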
Why I built it
It has the shape of the work I want to do: a messy real-world domain, a high-volume data problem, and a person who has to make decisions from the result. Modeling the world into objects someone can act on — and being honest about the data's limits — is exactly the judgment forward-deployed engineering runs on.