Pipeline API (v2) — How to Use It

🧪 Beta. This API is in beta — we are not committing to backward compatibility yet. Endpoints and request/response shapes may change without notice during the beta, so build defensively and expect breaking changes. (The path is deliberately /api/v2-beta/… to make this explicit.)

Toloka's Pipeline API (v2) lets you operate the Pipeline product programmatically over HTTP: bring your data, run a pipeline you designed, watch it, and pull the results back.

📖 API reference (interactive docs)

The complete, always-current reference is public — no login needed to read it:

Interactive docs (try requests right in the browser): https://platform.toloka.ai/api/v2-beta/scalar
OpenAPI schema (JSON): https://platform.toloka.ai/api/v2-beta/doc

🔑 Authentication

Every data request carries an API-key header:

Authorization: ApiKey <your-api-key>
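As a minimal sketch, the header can be attached like this in Python using only the standard library. The base URL is an assumption inferred from the documentation links above, and `build_request` is a hypothetical helper name; the request is built but not sent.

```python
import urllib.request
from typing import Optional

# Assumption: the API is served from the same host as the public docs.
BASE_URL = "https://platform.toloka.ai"


def build_request(method: str, path: str, api_key: str,
                  body: Optional[bytes] = None) -> urllib.request.Request:
    """Build (but do not send) a request carrying the API-key header."""
    headers = {"Authorization": f"ApiKey {api_key}"}
    if body is not None:
        # Assumption: request bodies are JSON.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=body,
                                  headers=headers, method=method)


# Example: a request that would list the projects you can access.
req = build_request("GET", "/api/v2-beta/projects", "my-secret-key")
```

Passing `req` to `urllib.request.urlopen` would send it; keep the key out of source control.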

The documentation pages themselves are public; all data endpoints require the key.

🧠 How it works — the core concepts

1. You design pipelines in the Toloka app, by hand. The API does not build a pipeline from scratch. You first create a project and design its pipeline — the workflow itself (for example, "annotate bounding boxes on these images") — once, in the app UI.

2. The API operates existing pipelines. Once a pipeline exists, the API lets you copy it, attach data to it, run it, watch the run, and download results.

3. Everything is run-based, and a run is a snapshot.

Important: You cannot add new items to an existing run. A run processes a fixed set of dataset items, captured when the run starts. There is no "append an item to a running job."

To process new data, you create a new run. The loop is always:

1. Take your pipeline: copy an existing one (or import an exported .pipeline file).
2. Attach a dataset to it (new data, or a selection of existing items).
3. Create a run.
4. Watch the run via the API until it finishes.

So "send more data" or "re-process" always means: a new pipeline bound to that data → a new run — never a change to a live run. This keeps every run reproducible and auditable.

🚀 End-to-end workflow

(First build the project and its pipeline in the app. Then, via the API:)

1. Pick or create a project

GET /api/v2-beta/projects — list the projects you can access
POST /api/v2-beta/projects — create a project (the pipeline graph is still authored in the app)

2. Prepare a dataset

POST /api/v2-beta/projects/{projectId}/datasets — create a dataset
Add your data — two ways, depending on size:
- Small payloads (≤ 2 MB), one synchronous call: POST /api/v2-beta/datasets/{datasetId}/items — send rows inline as a JSON array and, optionally, declare the fields to create. Each row is a JSON object keyed by field name.
- Large files (upload straight to storage, bypassing our servers):
  1. POST /api/v2-beta/datasets/{datasetId}/uploads → returns a one-time uploadUrl (you can declare fields here too)
  2. Upload the file bytes — a JSON array of row objects — directly to that uploadUrl
  3. POST /api/v2-beta/uploads/{uploadId}/finalize — kicks off ingestion
  4. GET /api/v2-beta/uploads/{uploadId} — poll until the status is succeeded
GET /api/v2-beta/datasets/{datasetId}/fields — list a dataset's fields (e.g. a url field for image links)

Fields are declared when you add data (on items or uploads); they're created up front and filled as rows land. When you declare fields, each row is mapped into them by name; omit fields and the whole row is stored under an uploaded field. (There is no standalone "add a field to an existing dataset" — adding a field after data already exists would require re-propagating values, so declare fields with the data instead.)
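The choice between the two ingestion routes in step 2 can be sketched as a small planning function. This is an illustration under stated assumptions, not the service's own logic: `plan_upload` is a hypothetical helper, the 2 MB limit is interpreted as 2 × 1024 × 1024 bytes of serialized JSON, and the HTTP method used against the `uploadUrl` (shown as PUT) is a guess to confirm in the interactive docs.

```python
import json
from typing import Dict, List, Tuple

# Assumption: "2 MB" means 2 * 1024 * 1024 bytes of the serialized rows.
INLINE_LIMIT_BYTES = 2 * 1024 * 1024


def plan_upload(dataset_id: str, rows: List[Dict]) -> List[Tuple[str, str]]:
    """Return the (method, path) calls needed to ingest `rows`."""
    payload = json.dumps(rows).encode("utf-8")
    if len(payload) <= INLINE_LIMIT_BYTES:
        # Small payload: one synchronous call with rows inline.
        return [("POST", f"/api/v2-beta/datasets/{dataset_id}/items")]
    # Large payload: request an upload URL, upload, finalize, then poll.
    return [
        ("POST", f"/api/v2-beta/datasets/{dataset_id}/uploads"),
        ("PUT", "<uploadUrl>"),  # assumed method; URL comes from step 1
        ("POST", "/api/v2-beta/uploads/{uploadId}/finalize"),
        ("GET", "/api/v2-beta/uploads/{uploadId}"),  # poll until succeeded
    ]
```

If the inline request also carries field declarations, the real body will be slightly larger than the rows alone, so leaving some headroom below the limit is prudent.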

3. Bring your pipeline

Copy an existing pipeline: POST /api/v2-beta/pipelines/{pipelineId}/copy
Or import a .pipeline file exported from the app: POST /api/v2-beta/projects/{projectId}/pipelines/import
You can also export any pipeline as a portable file: GET /api/v2-beta/pipelines/{pipelineId}/export

4. Attach the dataset to the pipeline

POST /api/v2-beta/pipelines/{pipelineId}/dataset — binds the dataset to the pipeline
- Optional itemLimit — cap how many items the run processes
- Optional itemFilter — select which items (e.g. by field values)
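A request body for this call might be assembled as below. Only `itemLimit` and `itemFilter` are named in this article; the `datasetId` key and the internal shape of the filter are assumptions, and `attach_dataset_body` is a hypothetical helper. Check the OpenAPI schema for the real shape.

```python
import json
from typing import Any, Dict, Optional


def attach_dataset_body(dataset_id: str,
                        item_limit: Optional[int] = None,
                        item_filter: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a body for POST /api/v2-beta/pipelines/{pipelineId}/dataset."""
    body: Dict[str, Any] = {"datasetId": dataset_id}  # assumed key name
    if item_limit is not None:
        if item_limit <= 0:
            raise ValueError("item_limit must be a positive integer")
        body["itemLimit"] = item_limit
    if item_filter is not None:
        # Assumption: the filter is passed through as a JSON object.
        body["itemFilter"] = item_filter
    return json.dumps(body)
```

Omitting both optional arguments yields a body that binds the whole dataset.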

5. Start the run

POST /api/v2-beta/pipelines/{pipelineId}/runs

6. Watch the run

GET /api/v2-beta/runs/{runId} — status and progress (total / completed / failed …)
GET /api/v2-beta/pipelines/{pipelineId}/runs — list all runs of a pipeline
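Watching a run amounts to polling `GET /api/v2-beta/runs/{runId}` until it reaches a terminal state. The sketch below keeps the HTTP call outside the function so it can be tried without network access. The `status` field name and the status values used in the example are assumptions; `wait_for_run` is a hypothetical helper.

```python
import time
from typing import Any, Callable, Dict, Iterable


def wait_for_run(fetch_run: Callable[[], Dict[str, Any]],
                 terminal_statuses: Iterable[str],
                 interval_s: float = 5.0,
                 max_polls: int = 120,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll `fetch_run` until its status is terminal, then return the run."""
    terminal = set(terminal_statuses)
    for attempt in range(max_polls):
        run = fetch_run()  # e.g. a GET on /api/v2-beta/runs/{runId}
        if run.get("status") in terminal:  # assumed field name
            return run
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"run not finished after {max_polls} polls")


# Example with canned responses instead of real HTTP calls.
_responses = iter([{"status": "running"}, {"status": "succeeded"}])
final = wait_for_run(lambda: next(_responses), {"succeeded", "failed"},
                     sleep=lambda s: None)
```

Bounding the number of polls avoids waiting forever if a status value changes during the beta.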

7. Get the results

POST /api/v2-beta/datasets/{datasetId}/exports — request an export of the fields you want (identify each field by its name or id)
GET /api/v2-beta/exports/{exportId} — poll until ready; the response includes a downloadUrl
Download the result file from downloadUrl

♻️ "I have more data to process"

1. Copy the pipeline: POST /api/v2-beta/pipelines/{pipelineId}/copy
2. Attach the new dataset / items: POST /api/v2-beta/pipelines/{pipelineId}/dataset
3. Run: POST /api/v2-beta/pipelines/{pipelineId}/runs
4. Watch: GET /api/v2-beta/runs/{runId}

Each batch of data gets its own pipeline and its own run.
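The per-batch loop can be sketched as one function that copies the pipeline, binds the dataset to the copy, and starts a run. The HTTP layer is injected as `call(method, path, body)`, a hypothetical callable, so the sequence is visible without committing to a client library. The `id` response field and the `datasetId` body key are assumptions.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
ApiCall = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def process_new_batch(call: ApiCall, template_pipeline_id: str,
                      dataset_id: str) -> str:
    """Copy a pipeline, attach a dataset to the copy, start a run.

    Returns the new run's id so the caller can watch it.
    """
    copy = call("POST",
                f"/api/v2-beta/pipelines/{template_pipeline_id}/copy", None)
    new_pipeline_id = copy["id"]  # assumed response field

    # Bind the data to the copy, not to the template pipeline.
    call("POST", f"/api/v2-beta/pipelines/{new_pipeline_id}/dataset",
         {"datasetId": dataset_id})  # assumed body key

    run = call("POST", f"/api/v2-beta/pipelines/{new_pipeline_id}/runs", None)
    return run["id"]  # assumed response field
```

The returned run id is what you would then poll at `GET /api/v2-beta/runs/{runId}`.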

For exact request/response shapes, parameters, and examples, use the interactive docs linked at the top — every endpoint is documented there.
