Layer 0 — Token Intelligence
Every article enters Polari through Layer 0. It filters noise, assigns a quality score, generates a 384-dimensional semantic embedding, and creates a fingerprint for deduplication. Articles that pass the quality threshold proceed to Layers 1–3.
What Layer 0 does
Layer 0 runs the following operations on every submitted article, in order:
- Quality scoring: a weighted multi-factor score (0–1) that determines whether the article contains meaningful signal or noise.
- Embedding generation: a 384-dimensional vector produced with BAAI/bge-small-en-v1.5, stored for semantic similarity search.
- Semantic hashing: a content fingerprint for fast cross-source deduplication, independent of URL or title.
Quality scoring
The quality score is the most important output from Layer 0. It gates all downstream processing and is available on every article object.
| Factor | Weight | What it measures |
|---|---|---|
| Content depth | 40% | Length, information density, paragraph structure |
| Coherence | 30% | Sentence count, avg sentence length, sentiment consistency |
| Source credibility | 20% | Domain reputation scoring |
| Spam detection | 10% | Excessive caps, repetition, URL density, special characters |
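The weights above combine into a single 0–1 score as a weighted sum. The sketch below shows only that arithmetic. It is a hypothetical illustration, not Polari's scorer: how each per-factor sub-score is computed is not documented, and it assumes every sub-score is already normalized to 0–1 with spam detection expressed so that 1.0 means "no spam".

```python
# Weights from the factor table; they sum to 1.0.
WEIGHTS = {
    "content_depth": 0.40,
    "coherence": 0.30,
    "source_credibility": 0.20,
    "spam_detection": 0.10,
}


def quality_score(factors: dict[str, float]) -> float:
    """Weighted sum of per-factor sub-scores, each assumed to lie in [0, 1].

    Hypothetical: assumes spam_detection is scored so 1.0 means "clean".
    """
    missing = WEIGHTS.keys() - factors.keys()
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)


# 0.4*0.9 + 0.3*0.8 + 0.2*0.7 + 0.1*1.0
print(round(quality_score({
    "content_depth": 0.9,
    "coherence": 0.8,
    "source_credibility": 0.7,
    "spam_detection": 1.0,
}), 2))  # prints 0.84
```

Because the weights sum to 1.0, a score can only reach the top of the range if every factor is strong; a perfect spam score alone contributes at most 0.1.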
Quality tiers
| Score | Rating | Typical content |
|---|---|---|
| 0.8+ | Excellent | High-quality journalism, academic content |
| 0.65–0.8 | Good | Professional content, established outlets |
| 0.5–0.65 | Medium | Mixed quality, unknown sources |
| 0.3–0.5 | Low | Short content, social media, thin articles |
| <0.3 | Noise | Spam, clickbait, malformed content |
Use min_quality on search endpoints to filter results to your desired signal level. All articles are forwarded downstream regardless of score.
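If you want to label results with these tier names on the client side, a small lookup is enough. This is a sketch under one stated assumption: the table does not say which tier a boundary value belongs to, so each lower bound is treated as inclusive (0.8 is "Excellent", 0.65 is "Good").

```python
def quality_tier(score: float) -> str:
    """Map a 0-1 quality_score to a tier name from the quality tiers table.

    Assumption: lower bounds are inclusive; the docs do not specify this.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"quality_score must be in [0, 1], got {score}")
    # Check thresholds from highest to lowest; the first match wins.
    for lower_bound, rating in (
        (0.8, "Excellent"),
        (0.65, "Good"),
        (0.5, "Medium"),
        (0.3, "Low"),
    ):
        if score >= lower_bound:
            return rating
    return "Noise"


print(quality_tier(0.72))  # prints Good
print(quality_tier(0.12))  # prints Noise
```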
Async job pattern
All Layer 0 processing endpoints return immediately with a job_id. Poll /v1/status/{job_id} until the job completes, then retrieve the result. Most articles complete in under 1 second.
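A minimal polling loop, using only the Python standard library, might look like the sketch below. Several details are assumptions because this section does not specify them: the status body is taken to be JSON with a "status" field, "completed" and "failed" are guessed terminal values, and no authentication header is shown. The status fetcher is passed in as a parameter so the loop can be tested without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable

BASE_URL = "https://layer0.api.polariapi.com"
# Assumed terminal status values; this section does not list the states
# that /v1/status/{job_id} can report.
TERMINAL_STATES = {"completed", "failed"}


def http_get_status(job_id: str) -> dict[str, Any]:
    """GET /v1/status/{job_id} and decode the JSON body (auth not shown)."""
    with urllib.request.urlopen(f"{BASE_URL}/v1/status/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(
    job_id: str,
    get_status: Callable[[str], dict[str, Any]] = http_get_status,
    interval: float = 0.25,
    max_attempts: int = 40,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job reports a terminal state, or raise TimeoutError."""
    for attempt in range(max_attempts):
        status = get_status(job_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if attempt < max_attempts - 1:
            sleep(interval)  # fixed interval; most jobs finish in under 1 s
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

With the defaults, wait_for_job(job_id) gives up after roughly 10 seconds of polling. A short fixed interval suits a service where most jobs finish within a second; for larger batches you may prefer a longer interval or exponential backoff.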
Endpoints
All endpoints are served from the base URL https://layer0.api.polariapi.com. For example:
POST https://layer0.api.polariapi.com/v1/process
Submit article
| Field | Type | Description |
|---|---|---|
| text (required) | string | Article body. Minimum 100 characters. |
| metadata.title | string | Article headline |
| metadata.url | string | Used as deduplication key if provided |
| metadata.source | string | Publisher name — factors into source credibility scoring |
| metadata.author | string | Byline |
| metadata.published_date | ISO 8601 | Original publication datetime |
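The sketch below builds (but does not send) a submit request with the standard library. It assumes that /v1/process, the example path shown above, is the submit-article endpoint, that the body is JSON with the metadata.* fields nested under a "metadata" object, and that authentication is handled separately because this section does not describe it. The client-side length check mirrors the documented 100-character minimum.

```python
import json
import urllib.request

BASE_URL = "https://layer0.api.polariapi.com"
MIN_TEXT_LENGTH = 100  # documented minimum for the text field
METADATA_FIELDS = {"title", "url", "source", "author", "published_date"}


def build_submit_request(text: str, **metadata: str) -> urllib.request.Request:
    """Build a POST request for one article without sending it.

    Assumptions: /v1/process is the submit endpoint and metadata is a
    nested JSON object; no auth header is added here.
    """
    if len(text) < MIN_TEXT_LENGTH:
        raise ValueError(f"text must be at least {MIN_TEXT_LENGTH} characters")
    unknown = metadata.keys() - METADATA_FIELDS
    if unknown:
        raise ValueError(f"unknown metadata fields: {sorted(unknown)}")
    body = {"text": text, "metadata": metadata}
    return urllib.request.Request(
        f"{BASE_URL}/v1/process",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Usage would look like build_submit_request(article_text, title="...", source="Example News", published_date="2024-05-01T09:30:00Z"), then passing the request to urllib.request.urlopen and reading the job_id from the response. The source name and date here are placeholders.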
Submit batch
Submit up to 50 articles in a single request. Articles process in parallel. Counts as one API call against your rate limit regardless of article count.
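Because each batch request counts as one API call, splitting a large collection into groups of up to 50 minimizes rate-limit usage: 120 articles become three requests, and therefore three calls. The batch endpoint path and request body shape are not given in this section, so the sketch below stops at the chunking step.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")
MAX_BATCH_SIZE = 50  # documented per-request limit


def batches(articles: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` articles, preserving order."""
    if not 1 <= size <= MAX_BATCH_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(articles), size):
        yield articles[start:start + size]


chunks = list(batches(list(range(120))))
print([len(c) for c in chunks])  # prints [50, 50, 20]
```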
Poll job status
Retrieve article
| Parameter | Type | Description |
|---|---|---|
| include_embedding | boolean | Return the raw 384-dimensional embedding vector. Default: false. Pass ?include_embedding=true to include it. |
| Field | Description |
|---|---|
| quality_score | 0–1. Signal quality of the article. Use min_quality on search endpoints to filter by this value. |
| semantic_hash | Content fingerprint for cross-source deduplication, independent of URL or title. |
| token_count | Article length in tokens. |
| embedding | Array of 384 floats. Only present when ?include_embedding=true is passed; omitted by default. |
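Once you have retrieved several article objects, semantic_hash lets you collapse the same story syndicated across outlets. The sketch below keeps one article per hash and prefers the higher quality_score. It assumes duplicates share an identical semantic_hash string; the "id" field in the example data is hypothetical and used only for display.

```python
from typing import Any


def dedupe_by_semantic_hash(articles: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep one article per semantic_hash, preferring the highest quality_score.

    The output keeps the position at which each hash was first seen.
    """
    best: dict[str, dict[str, Any]] = {}
    for article in articles:
        key = article["semantic_hash"]
        current = best.get(key)
        if current is None or article["quality_score"] > current["quality_score"]:
            best[key] = article  # replacing a value keeps the key's original position
    return list(best.values())


articles = [
    {"id": "a", "semantic_hash": "h1", "quality_score": 0.55},
    {"id": "b", "semantic_hash": "h2", "quality_score": 0.90},
    {"id": "c", "semantic_hash": "h1", "quality_score": 0.81},
]
print([a["id"] for a in dedupe_by_semantic_hash(articles)])  # prints ['c', 'b']
```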
Semantic search
Searches all processed articles and ranks them by embedding similarity rather than keyword overlap, so it finds conceptually related articles even when they share no exact terms.
| Parameter | Type | Description |
|---|---|---|
| query (required) | string | Natural language search query |
| limit | integer | Max results. Default: 10. Max: 100 |
| min_quality | float | Minimum quality score filter (0.0–1.0) |
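To see how limit and min_quality interact with similarity ranking, the local sketch below applies the same two controls to articles you have already fetched with ?include_embedding=true. It is not Polari's search implementation: it assumes cosine similarity (a common choice for bge embeddings), takes a precomputed query vector instead of embedding the query text, and uses 3-dimensional toy vectors in place of 384-dimensional ones. The "id" field is hypothetical.

```python
import math
from typing import Any, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    query_vec: Sequence[float],
    articles: list[dict[str, Any]],
    limit: int = 10,
    min_quality: Optional[float] = None,
) -> list[dict[str, Any]]:
    """Filter by quality first, then return up to `limit` most similar articles."""
    if not 1 <= limit <= 100:  # mirrors the documented default 10, max 100
        raise ValueError("limit must be between 1 and 100")
    candidates = [
        a for a in articles
        if min_quality is None or a["quality_score"] >= min_quality
    ]
    candidates.sort(key=lambda a: cosine(query_vec, a["embedding"]), reverse=True)
    return candidates[:limit]


articles = [
    {"id": "a", "quality_score": 0.9, "embedding": [1.0, 0.0, 0.0]},
    {"id": "b", "quality_score": 0.4, "embedding": [0.9, 0.1, 0.0]},
    {"id": "c", "quality_score": 0.7, "embedding": [0.0, 1.0, 0.0]},
]
top = rank_by_similarity([1.0, 0.1, 0.0], articles, limit=2, min_quality=0.5)
print([a["id"] for a in top])  # prints ['a', 'c']
```

Without the quality filter, article "b" would rank first because its vector is the closest match; min_quality removes it before ranking, which is why a quality threshold can change the top result and not just shorten the list.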