
Time-series data: sourcing and delivery

DataSupplier·13 min read

A great deal of valuable external data is time-series: prices, demand, telemetry, weather. It has its own pitfalls around timestamps, gaps and revisions. This guide covers sourcing and delivering time-series data well.

Why time-series needs special care

Time-series data is defined by its time index, and small inconsistencies in time zones, frequencies or revisions cause large analytical errors. Getting the temporal structure right is as important as getting the values right.

Timestamps and frequency

Confirm the time zone, whether timestamps mark the start or end of a period, and the frequency (and whether it is regular). Mixing conventions across sources is a common, costly mistake.

Gaps and irregularity

Real time-series have gaps: missing readings, holidays, outages. How gaps are represented and filled (or left unfilled) affects every downstream calculation, so gap handling should be documented at sourcing time.
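One way to make gaps explicit, rather than leaving them implicit in the spacing of the rows, is to place the readings on a regular grid and mark absent periods with a null. The following is a minimal sketch using only the Python standard library; the function name reindex_regular and the sample values are illustrative assumptions, not part of any particular feed.

```python
from datetime import datetime, timedelta, timezone


def reindex_regular(readings, step):
    """Place readings on a regular grid, using None for missing periods.

    readings: dict mapping timezone-aware datetimes to values
    step:     expected spacing between consecutive readings (timedelta)
    """
    if not readings:
        return []
    start, end = min(readings), max(readings)
    # Refuse readings that do not sit on the expected grid, so that
    # irregular data is surfaced instead of being silently dropped.
    off_grid = [ts for ts in readings if (ts - start) % step]
    if off_grid:
        raise ValueError(f"{len(off_grid)} reading(s) are off the expected grid")
    grid = []
    ts = start
    while ts <= end:
        grid.append((ts, readings.get(ts)))  # None marks a gap
        ts += step
    return grid


# Illustrative hourly readings (made-up values) with 02:00 UTC missing.
base = datetime(2024, 1, 1, tzinfo=timezone.utc)
hour = timedelta(hours=1)
readings = {
    base: 10.0,
    base + hour: 11.0,
    base + 3 * hour: 13.0,
    base + 4 * hour: 12.5,
}

grid = reindex_regular(readings, hour)
missing = [ts for ts, value in grid if value is None]
print(f"{len(missing)} missing reading(s): {[ts.isoformat() for ts in missing]}")
```

The gap is left as None rather than forward-filled or interpolated. Whether and how to fill it is an analytical decision that depends on the use case, and keeping the null preserves the information that no reading existed.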

Revisions and point-in-time

Many series are revised after first publication. For backtesting and audit, point-in-time data (what was known when) matters; using revised data as if it were available earlier creates look-ahead bias.

Storage and delivery

Columnar formats such as Parquet and time-series databases suit large histories; APIs and streams suit live updates. Consistent indexing and metadata keep long histories usable.
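The metadata that keeps a long history usable can be small. As a sketch, the record below captures the temporal properties this guide asks you to confirm and serialises them to JSON so they can travel alongside the data file. The class name SeriesMetadata, its field names and the sample values are hypothetical, chosen for illustration rather than taken from any standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SeriesMetadata:
    """Hypothetical sidecar record describing one delivered series."""
    series_id: str
    time_zone: str          # zone the timestamps are expressed in
    timestamp_marks: str    # "interval_start" or "interval_end"
    frequency: str          # ISO 8601 duration, e.g. "PT1H" for hourly
    gap_policy: str         # how missing periods are represented
    revision_policy: str    # how revised values are delivered

    def __post_init__(self):
        allowed = ("interval_start", "interval_end")
        if self.timestamp_marks not in allowed:
            raise ValueError(f"timestamp_marks must be one of {allowed}")


# Illustrative values only.
meta = SeriesMetadata(
    series_id="example_hourly_load",
    time_zone="UTC",
    timestamp_marks="interval_start",
    frequency="PT1H",
    gap_policy="missing periods kept as nulls, never forward-filled",
    revision_policy="every vintage retained with its publication date",
)

# Serialise to JSON so the description can be stored next to the data.
sidecar = json.dumps(asdict(meta), indent=2)
print(sidecar)
```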

In a managed model

A managed partner can align timestamps, document gaps and revisions, and deliver consistent historical and live time-series in your preferred format.

Timestamps, time zones and conventions

Most time-series errors are temporal, not numeric. Confirm the time zone (and how daylight-saving is handled), whether a timestamp marks the start or end of its interval, and whether the frequency is truly regular. Two sources that disagree on any of these will misalign when joined, producing subtle, hard-to-trace errors. Converting every series to one canonical UTC, interval-start convention before analysis removes a whole class of bugs.
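The conversion to a canonical convention can be sketched in a few lines of standard-library Python. The function name to_utc_interval_start, its parameters and the two example sources are assumptions made for illustration. The example uses a fixed UTC+1 offset to stay self-contained; for a zone with daylight-saving changes, a zoneinfo.ZoneInfo object could be passed as source_tz instead, though ambiguous local times around the clock change would still need a documented rule.

```python
from datetime import datetime, timedelta, timezone


def to_utc_interval_start(ts, interval, marks_end, source_tz=None):
    """Convert one timestamp to the UTC, interval-start convention.

    ts:        datetime from the source (naive or timezone-aware)
    interval:  length of the period the value covers (timedelta)
    marks_end: True if the source stamps the END of the period
    source_tz: tzinfo to assume when ts is naive
    """
    if ts.tzinfo is None:
        if source_tz is None:
            raise ValueError("naive timestamp needs an explicit source time zone")
        ts = ts.replace(tzinfo=source_tz)
    # Convert to UTC first, then shift, so the arithmetic is done on
    # absolute time rather than on local wall-clock time.
    ts_utc = ts.astimezone(timezone.utc)
    if marks_end:
        ts_utc -= interval
    return ts_utc


hour = timedelta(hours=1)

# Hypothetical source A: already UTC, stamps the start of each hour.
a = to_utc_interval_start(
    datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc), hour, marks_end=False
)

# Hypothetical source B: naive local time at UTC+1, stamps the end of each hour.
b = to_utc_interval_start(
    datetime(2024, 3, 1, 14, 0), hour, marks_end=True,
    source_tz=timezone(timedelta(hours=1)),
)

print(a.isoformat(), b.isoformat(), a == b)
```

The two raw timestamps differ by two hours on paper, yet they describe the same interval once both are normalised. Joined on the raw values, the rows would not match, and truncating both to the calendar day would hide the misalignment rather than fix it.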

Revisions and point-in-time data

Many series, especially economic and operational ones, are revised after first publication. For backtesting, forecasting and audit, you need point-in-time data: the values as they were known on a given date, not the latest revised figures. Using revised data as if it had been available earlier creates look-ahead bias, which flatters models in backtests that then underperform in production. A vintage-aware feed records what was known when, and is worth insisting on.
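A point-in-time lookup can be illustrated with a small vintage table, where each row records a reference period, the date the figure was published, and the value. The sketch below assumes that three-column layout; the function name as_of and all figures are made up for illustration.

```python
from datetime import date

# (reference_period, published_on, value); made-up figures for illustration.
vintages = [
    ("2023-Q4", date(2024, 1, 30), 1.2),  # first release
    ("2023-Q4", date(2024, 2, 28), 1.5),  # first revision
    ("2023-Q4", date(2024, 3, 28), 1.4),  # second revision
]


def as_of(vintages, period, known_on):
    """Return the value for `period` as it was known on `known_on`.

    Picks the most recent vintage published on or before `known_on`,
    or None if nothing had been published yet.
    """
    candidates = [
        (published, value)
        for ref_period, published, value in vintages
        if ref_period == period and published <= known_on
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]


first_known = as_of(vintages, "2023-Q4", date(2024, 2, 1))
latest = as_of(vintages, "2023-Q4", date(2024, 12, 31))
before_release = as_of(vintages, "2023-Q4", date(2024, 1, 1))
print(first_known, latest, before_release)
```

A backtest that simulates a decision made on 1 February 2024 should see the first release only. Feeding it the latest revised figure would give the model information that did not exist at the time.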

Key takeaways

Time-series errors usually come from timestamps, frequencies and revisions.
Confirm time zone, period convention and frequency across sources.
Document how gaps are represented and filled.
Use point-in-time data to avoid look-ahead bias in backtesting.

