🧪 Beta. This API is in beta: we are not committing to backward compatibility yet. Endpoints and request/response shapes may change without notice during the beta, so build defensively and expect breaking changes. (The path is deliberately /api/v2-beta/… to make this explicit.)
Toloka's Pipeline API (v2) lets you operate the Pipeline product programmatically over HTTP: bring your data, run a pipeline you designed, watch it, and pull the results back.
API reference (interactive docs)
The complete, always-current reference is public; no login is needed to read it:
- Interactive docs (try requests right in the browser): https://platform.toloka.ai/api/v2-beta/scalar
- OpenAPI schema (JSON): https://platform.toloka.ai/api/v2-beta/doc
Authentication
Every data request carries an API-key header:
Authorization: ApiKey <your-api-key>
The documentation pages themselves are public; all data endpoints require the key.
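As a minimal sketch of the header above, the snippet below builds (but does not send) an authenticated request with Python's standard library. The base URL is an assumption taken from the host of the public docs links, and `build_request` is a hypothetical helper name, not part of any Toloka SDK.

```python
import urllib.request

# Assumption: data endpoints live on the same host as the public docs.
BASE_URL = "https://platform.toloka.ai/api/v2-beta"


def build_request(method, path, api_key, body=None):
    """Build (but do not send) an authenticated request for a data endpoint."""
    headers = {
        # Header format as described in the Authentication section.
        "Authorization": f"ApiKey {api_key}",
        "Accept": "application/json",
    }
    if body is not None:
        # body must already be JSON-encoded bytes.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method=method
    )


req = build_request("GET", "/projects", api_key="<your-api-key>")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the resulting object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without a key.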
🧠 How it works: the core concepts
1. You design pipelines in the Toloka app, by hand. The API does not build a pipeline from scratch. You first create a project and design its pipeline, the workflow itself (for example, "annotate bounding boxes on these images"), once, in the app UI.
2. The API operates existing pipelines. Once a pipeline exists, the API lets you copy it, attach data to it, run it, watch the run, and download results.
3. Everything is run-based, and a run is a snapshot.
Important: You cannot add new items to an existing run. A run processes a fixed set of dataset items, captured when the run starts. There is no "append an item to a running job."
To process new data, you create a new run. The loop is always:
1. Take your pipeline: copy an existing one (or import an exported .pipeline file).
2. Attach a dataset to it (new data, or a selection of existing items).
3. Create a run.
4. Watch the run via the API until it finishes.
So "send more data" or "re-process" always means: a new pipeline bound to that data → a new run, never a change to a live run. This keeps every run reproducible and auditable.
End-to-end workflow
(First build the project and its pipeline in the app. Then, via the API:)
1. Pick or create a project
- GET /api/v2-beta/projects: list the projects you can access
- POST /api/v2-beta/projects: create a project (the pipeline graph is still authored in the app)
2. Prepare a dataset
- POST /api/v2-beta/projects/{projectId}/datasets: create a dataset
- Add your data in one of two ways, depending on size:
  - Small payloads (≤ 2 MB), one synchronous call:
    - POST /api/v2-beta/datasets/{datasetId}/items: send rows inline as a JSON array and, optionally, declare the fields to create. Each row is a JSON object keyed by field name.
  - Large files (upload straight to storage, bypassing our servers):
    1. POST /api/v2-beta/datasets/{datasetId}/uploads: returns a one-time uploadUrl (you can declare fields here too)
    2. Upload the file bytes (a JSON array of row objects) directly to that uploadUrl
    3. POST /api/v2-beta/uploads/{uploadId}/finalize: kicks off ingestion
    4. GET /api/v2-beta/uploads/{uploadId}: poll until the status is succeeded
- GET /api/v2-beta/datasets/{datasetId}/fields: list a dataset's fields (e.g. a url field for image links)
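The choice between the inline and upload routes can be sketched as a size check on the encoded payload. This is only an illustration: `plan_ingestion` is a hypothetical helper, the 2 MB limit is interpreted here as 2 MiB of UTF-8 JSON (the docs do not say how it is measured), and the "UPLOAD" entry is a placeholder because the HTTP method for the storage upload is not stated here.

```python
import json

# Assumption: "2 MB" is read as 2 MiB of UTF-8 encoded JSON.
INLINE_LIMIT_BYTES = 2 * 1024 * 1024


def plan_ingestion(dataset_id, rows):
    """Return the ordered (verb, target) steps for adding rows to a dataset."""
    payload = json.dumps(rows).encode("utf-8")
    if len(payload) <= INLINE_LIMIT_BYTES:
        # Small payload: one synchronous call with the rows inline.
        return [("POST", f"/api/v2-beta/datasets/{dataset_id}/items")]
    # Large payload: request an upload URL, upload, finalize, then poll.
    return [
        ("POST", f"/api/v2-beta/datasets/{dataset_id}/uploads"),
        ("UPLOAD", "<uploadUrl>"),  # placeholder: method not specified in the text
        ("POST", "/api/v2-beta/uploads/{uploadId}/finalize"),
        ("GET", "/api/v2-beta/uploads/{uploadId}"),
    ]


small = [{"url": "https://example.com/a.jpg"}]
for step in plan_ingestion("ds_1", small):
    print(*step)
```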
Fields are declared when you add data (on items or uploads); they're created up front and filled as rows land. When you declare fields, each row is mapped into them by name; omit fields and the whole row is stored under a field named uploaded. (There is no standalone "add a field to an existing dataset"; adding a field after data already exists would require re-propagating values, so declare fields with the data instead.)
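To make the mapping rule concrete, here is a small local model of it. It is not the service's implementation: `map_row` is a hypothetical function, and the handling of keys that are missing from a row (stored as None) or not declared (dropped) is an assumption the text above does not confirm.

```python
def map_row(row, fields=None):
    """Model how one uploaded row might land in dataset fields."""
    if not fields:
        # No fields declared: the whole row goes under a field named "uploaded".
        return {"uploaded": row}
    # Fields declared: values are matched by name.
    # Assumption: missing keys become None and undeclared keys are dropped.
    return {name: row.get(name) for name in fields}


row = {"url": "https://example.com/a.jpg", "label": "cat"}
print(map_row(row, fields=["url", "label"]))
print(map_row(row))
```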
3. Bring your pipeline
- Copy an existing pipeline: POST /api/v2-beta/pipelines/{pipelineId}/copy
- Or import a .pipeline file exported from the app: POST /api/v2-beta/projects/{projectId}/pipelines/import
- You can also export any pipeline as a portable file: GET /api/v2-beta/pipelines/{pipelineId}/export
4. Attach the dataset to the pipeline
- POST /api/v2-beta/pipelines/{pipelineId}/dataset: binds the dataset to the pipeline
  - Optional itemLimit: cap how many items the run processes
  - Optional itemFilter: select which items (e.g. by field values)
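A request body for the attach call might be assembled as below. Only the option names itemLimit and itemFilter come from the text above; the `datasetId` key, the shape of the filter value, and the helper name `attach_payload` are assumptions, so check the OpenAPI schema for the real shape.

```python
import json


def attach_payload(dataset_id, item_limit=None, item_filter=None):
    """Build a hypothetical JSON body for POST .../pipelines/{pipelineId}/dataset."""
    body = {"datasetId": dataset_id}  # assumption: key name for the dataset
    if item_limit is not None:
        if item_limit <= 0:
            raise ValueError("item_limit must be a positive integer")
        body["itemLimit"] = item_limit
    if item_filter is not None:
        # Assumption: the filter is passed through as-is; its schema is not shown here.
        body["itemFilter"] = item_filter
    return body


payload = attach_payload("ds_1", item_limit=100)
print(json.dumps(payload))
```

Leaving both options out binds the whole dataset, which matches the "optional" wording above.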
5. Start the run
POST /api/v2-beta/pipelines/{pipelineId}/runs
6. Watch the run
- GET /api/v2-beta/runs/{runId}: status and progress (total / completed / failed …)
- GET /api/v2-beta/pipelines/{pipelineId}/runs: list all runs of a pipeline
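Watching a run usually means polling until it reaches a terminal state. The sketch below is a generic polling loop with the HTTP call injected as a function, so it runs here against canned responses. The `status` key, the terminal values, and the `completed`/`total` key names are assumptions; the text above confirms only that status and progress are returned.

```python
import time

# Assumption: runs expose a "status" string and these values are terminal.
TERMINAL_STATUSES = {"succeeded", "failed"}


def wait_until(fetch, is_done, interval=5.0, timeout=3600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_done(result) is true or the timeout passes."""
    deadline = clock() + timeout
    while True:
        state = fetch()
        if is_done(state):
            return state
        if clock() >= deadline:
            raise TimeoutError("run did not finish before the timeout")
        sleep(interval)


# Canned responses standing in for GET /api/v2-beta/runs/{runId}.
canned = iter([
    {"status": "running", "completed": 3, "total": 10},
    {"status": "running", "completed": 8, "total": 10},
    {"status": "succeeded", "completed": 10, "total": 10},
])
final = wait_until(
    fetch=lambda: next(canned),
    is_done=lambda run: run["status"] in TERMINAL_STATUSES,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(final)
```

Injecting `fetch` keeps the loop independent of any HTTP client; the same helper could poll an upload or an export with a different `is_done` check.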
7. Get the results
- POST /api/v2-beta/datasets/{datasetId}/exports: request an export of the fields you want (identify each field by its name or id)
- GET /api/v2-beta/exports/{exportId}: poll until ready; the response includes a downloadUrl
- Download the result file from downloadUrl
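The export steps above could be modelled with two small helpers: one that selects fields by name or id, and one that checks a poll response for downloadUrl. The `fields`, `name`, and `id` keys in the request body are assumptions, as are the helper names; only downloadUrl is named in the text.

```python
def export_body(names=(), ids=()):
    """Build a hypothetical export request body from field names and/or ids."""
    refs = [{"name": n} for n in names] + [{"id": i} for i in ids]
    if not refs:
        raise ValueError("select at least one field to export")
    return {"fields": refs}  # assumption: key name and shape


def download_url(export_status):
    """Return downloadUrl once the export response carries one, else None."""
    return export_status.get("downloadUrl")


print(export_body(names=["url", "label"], ids=["fld_42"]))
print(download_url({"status": "pending"}))
print(download_url({"downloadUrl": "https://example.com/result.json"}))
```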
♻️ "I have more data to process"
1. Copy the pipeline: POST /api/v2-beta/pipelines/{pipelineId}/copy
2. Attach the new dataset / items: POST /api/v2-beta/pipelines/{pipelineId}/dataset
3. Run: POST /api/v2-beta/pipelines/{pipelineId}/runs
4. Watch: GET /api/v2-beta/runs/{runId}
Each batch of data gets its own pipeline and its own run.
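The copy, attach, run sequence can be sketched as one function that takes the HTTP transport as a parameter. This is an illustration under assumptions: `process_new_batch` and `fake_call` are hypothetical names, and the `id` key in the copy and run responses and the `datasetId` body key are guesses at the response and request shapes.

```python
def process_new_batch(call, source_pipeline_id, dataset_id):
    """Copy a pipeline, bind a dataset to the copy, start a run, return the run id."""
    copy = call("POST", f"/api/v2-beta/pipelines/{source_pipeline_id}/copy", None)
    pipeline_id = copy["id"]  # assumption: the copy response carries the new id
    call("POST", f"/api/v2-beta/pipelines/{pipeline_id}/dataset",
         {"datasetId": dataset_id})  # assumption: body key name
    run = call("POST", f"/api/v2-beta/pipelines/{pipeline_id}/runs", None)
    return run["id"]  # assumption: the run response carries its id


# Fake transport so the sketch runs without network access or an API key.
log = []


def fake_call(method, path, body):
    log.append((method, path))
    if path.endswith("/copy"):
        return {"id": "pl_copy"}
    if path.endswith("/runs"):
        return {"id": "run_1"}
    return {}


run_id = process_new_batch(fake_call, "pl_src", "ds_new")
print(run_id)
for entry in log:
    print(*entry)
```

The returned run id is what you would then poll with GET /api/v2-beta/runs/{runId}.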
For exact request/response shapes, parameters, and examples, use the interactive docs linked at the top; every endpoint is documented there.