ShortVideoFeedAlgorithm.md

Shorts Feed Algorithm

Routing

GetShortsFeed(page, pageSize, channelId)
│
├─ channelId provided?  → Channel Feed
├─ authenticated user?  → Authenticated Feed
└─ otherwise            → Anonymous Feed
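A minimal Python sketch of the dispatch order. It assumes the caller has already resolved an optional channel ID and an optional authenticated user ID; `FeedKind` and `route_shorts_feed` are hypothetical names, not the service's API. The only rule it encodes is precedence: channelId is checked before authentication.

```python
from enum import Enum
from typing import Optional


class FeedKind(Enum):
    CHANNEL = "channel"
    AUTHENTICATED = "authenticated"
    ANONYMOUS = "anonymous"


def route_shorts_feed(channel_id: Optional[str], user_id: Optional[str]) -> FeedKind:
    """Pick the feed variant, checking conditions in the order of the routing tree."""
    if channel_id is not None:
        return FeedKind.CHANNEL        # channelId wins, even for signed-in users
    if user_id is not None:
        return FeedKind.AUTHENTICATED  # per-user filtering against the seen set
    return FeedKind.ANONYMOUS          # shared global cache


# Usage: a signed-in user browsing a channel still gets the channel feed.
print(route_shorts_feed("chan-1", "user-42"))
```

Because channelId is checked first, a signed-in user browsing a channel gets the shared channel feed, whose cache key contains no user ID.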

Channel Feed

Chronological DB query, cached per channel/page.

Flow:

  1. Get cache version
  2. Check cache → hit? Return cached result
  3. Miss: query DB (IsShort, Complete, Public, Approved channel, filtered by channelId, ordered by ScheduledAt DESC)
  4. Cache result, return

| Step | Call | Type |
|---|---|---|
| 1 | GET videos:short:version | Redis |
| 2 | GET videos:short:v{ver}:page-{p}:pagesize-{ps}:{channelId} | Redis |
| | Cache hit → stop here | |
| 3a | SELECT COUNT(*) FROM Videos WHERE ... | DB |
| 3b | SELECT Id FROM Videos WHERE ... ORDER BY ScheduledAt DESC OFFSET/LIMIT | DB |
| 4 | SET videos:short:v{ver}:page-{p}:pagesize-{ps}:{channelId} | Redis |

| Scenario | Redis | DB |
|---|---|---|
| Cache hit | 2 | 1 (GetVideos) |
| Cache miss | 3 | 3 (count + query + GetVideos) |
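The channel flow is a versioned cache-aside read. The sketch below is a minimal Python model under stated assumptions: `FakeRedis` is a hypothetical in-memory stand-in for GET/SET, `fake_query` is a hypothetical stand-in for the COUNT + SELECT pair, the cached value is assumed to be the page's ID list, and hydration via GetVideos is left out. It also shows what the version key buys: bumping it makes every old page key unreachable without deleting anything.

```python
from typing import Callable, Dict, List, Optional


class FakeRedis:
    """Hypothetical in-memory stand-in for the Redis GET/SET calls; counts round trips."""

    def __init__(self) -> None:
        self.data: Dict[str, object] = {}
        self.calls = 0

    def get(self, key: str) -> Optional[object]:
        self.calls += 1
        return self.data.get(key)

    def set(self, key: str, value: object) -> None:
        self.calls += 1
        self.data[key] = value


def channel_feed_ids(
    redis: FakeRedis,
    channel_id: str,
    page: int,
    page_size: int,
    query_db: Callable[[str, int, int], List[str]],
) -> List[str]:
    # Step 1: the version is baked into every page key.
    version = redis.get("videos:short:version") or 0
    key = f"videos:short:v{version}:page-{page}:pagesize-{page_size}:{channel_id}"
    # Step 2: on a hit, return without touching the DB.
    cached = redis.get(key)
    if cached is not None:
        return cached
    # Step 3: on a miss, run the chronological query (COUNT + SELECT in the real flow).
    ids = query_db(channel_id, page, page_size)
    # Step 4: store the page so the next caller hits the cache.
    redis.set(key, ids)
    return ids


def fake_query(channel_id: str, page: int, page_size: int) -> List[str]:
    # Hypothetical DB stand-in: IDs for the requested 1-based page.
    start = (page - 1) * page_size
    return [f"{channel_id}-video-{i}" for i in range(start, start + page_size)]


r = FakeRedis()
channel_feed_ids(r, "chan-1", 1, 10, fake_query)   # miss: GET, GET, SET
channel_feed_ids(r, "chan-1", 1, 10, fake_query)   # hit: GET, GET
print(r.calls)  # 5: three calls on the miss, two on the hit
```

The round-trip counts line up with the table: 3 Redis calls on a miss and 2 on a hit.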

Anonymous Feed

Shared global cache. Pages are read first from the ranked sorted set (top 100 by score), then from the random-pool sorted set (up to 500 IDs, reshuffled daily). Falls back to the DB if both pools are empty.

Flow:

  1. Get cache version
  2. Check cache → hit? Return cached result
  3. Miss: read ranked pool page via ZRANGEBYSCORE (descending by score)
  4. If the page runs past the end of the ranked pool, fill the remainder from the random pool (ascending by position index)
  5. If both pools empty, fall back to DB query
  6. Cache result, return

| Step | Call | Type |
|---|---|---|
| 1 | GET videos:short:version | Redis |
| 2 | GET videos:short:v{ver}:page-{p}:pagesize-{ps}:feed | Redis |
| | Cache hit → stop here | |
| 3 | ZCARD videos:short:sorted | Redis |
| 4 | ZCARD videos:short:random-pool | Redis |
| 5 | ZRANGEBYSCORE videos:short:sorted (page slice) | Redis |
| 6 | ZRANGEBYSCORE videos:short:random-pool (if needed) | Redis |
| 7 | SET ...feed (cache write) | Redis |

| Scenario | Redis | DB |
|---|---|---|
| Cache hit | 2 | 1 (GetVideos) |
| Cache miss, pools populated | 6–7 | 1 (GetVideos) |
| Cache miss, pools empty | 5 | 3 (count + query + GetVideos) |

All anonymous users share the same cache key per page, so a single miss populates it for everyone.
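The rule for a page that crosses from the ranked pool into the random pool reduces to offset arithmetic over the two ZCARD results. This Python sketch assumes 1-based page numbers and uses plain lists in place of the sorted sets; `plan_page` and `read_page` are hypothetical helpers. A zero count for a pool means that pool's range read can be skipped.

```python
from typing import List, Tuple

Slice = Tuple[int, int]  # (offset, count)


def plan_page(page: int, page_size: int, ranked_count: int, random_count: int) -> Tuple[Slice, Slice]:
    """Split a 1-based page into a ranked-pool slice and a random-pool slice."""
    start = (page - 1) * page_size
    end = start + page_size
    # Part of [start, end) that lies inside the ranked pool.
    ranked_start = min(start, ranked_count)
    ranked_end = min(end, ranked_count)
    # Remainder, re-based to random-pool positions (which start at 0).
    random_start = min(max(start - ranked_count, 0), random_count)
    random_end = min(max(end - ranked_count, 0), random_count)
    return (ranked_start, ranked_end - ranked_start), (random_start, random_end - random_start)


def read_page(page: int, page_size: int, ranked: List[str], random_pool: List[str]) -> List[str]:
    (r_off, r_n), (p_off, p_n) = plan_page(page, page_size, len(ranked), len(random_pool))
    # Each non-empty slice corresponds to one range read with LIMIT offset count.
    return ranked[r_off:r_off + r_n] + random_pool[p_off:p_off + p_n]


ranked = [f"ranked-{i}" for i in range(100)]
random_pool = [f"random-{i}" for i in range(500)]
print(read_page(7, 15, ranked, random_pool))  # ranked-90 ... ranked-99, then random-0 ... random-4
```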


Authenticated Feed

Per-user filtering. No result caching (each user's seen set is different).

Data sources

| Redis key | Type | Max size | TTL |
|---|---|---|---|
| videos:short:sorted | Sorted set | 100 | 12h |
| videos:short:random-pool | Sorted set | 500 | 12h |
| videos:user:{id}:seen-shorts | Set | Unbounded | 30d |

Flow

1. Fetch ALL ranked IDs in one call (max 100)
2. Get random pool count (ZCARD)
3. totalPoolSize = ranked.Count + randomPoolCount
   └─ If 0 → fall back to DB query
4. Get seen count (SCARD)
   └─ If seenCount ≥ totalPoolSize → skip to Ranked Fallback
5. Filter ranked: single bulk SISMEMBER for all ≤100 IDs
   └─ Collect unseen IDs until we have enough (skip + pageSize)
6. If still need more → walk random pool with adaptive batching:
   a. Compute seenRatio = seenCount / totalPoolSize (capped at 0.95)
   b. batchSize = clamp(ceil(needed / (1 - seenRatio) * 1.5), needed, 500)
   c. Fetch batch via ZRANGEBYSCORE
   d. Bulk SISMEMBER the batch
   e. Collect unseen; if not enough, fetch next batch
7. If collected 0 total → Ranked Fallback
8. Paginate collected list, return
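Steps 1 to 8, including the adaptive batch size from step 6b and the Ranked Fallback described below, can be sketched in Python as follows. Assumptions: the pools and the seen set are passed in as in-memory collections standing in for the Redis keys, Python set membership stands in for the bulk SISMEMBER calls, pages are 1-based, and `query_db` is a hypothetical stand-in for the DB fallback. All names are hypothetical. The batch size is computed with exact fractions because, in plain floating point, 10 / (1 - 500/600) * 1.5 may round to just above 90, and the ceiling then yields 91.

```python
import math
from fractions import Fraction
from typing import Callable, List, Set


def adaptive_batch_size(needed: int, seen_count: int, total_pool_size: int) -> int:
    """Steps 6a-6b: clamp(ceil(needed / (1 - seenRatio) * 1.5), needed, 500)."""
    seen_ratio = min(Fraction(seen_count, total_pool_size), Fraction(95, 100))
    raw = math.ceil(needed / (1 - seen_ratio) * Fraction(3, 2))
    return max(needed, min(raw, 500))


def authenticated_page(
    ranked: List[str],
    random_pool: List[str],
    seen: Set[str],
    page: int,
    page_size: int,
    query_db: Callable[[int, int], List[str]],
) -> List[str]:
    total_pool_size = len(ranked) + len(random_pool)      # steps 1-3
    if total_pool_size == 0:
        return query_db(page, page_size)                  # both pools empty
    skip = (page - 1) * page_size
    want = skip + page_size
    seen_count = len(seen)                                # step 4 (SCARD)
    collected: List[str] = []
    if seen_count < total_pool_size:
        # Step 5: one bulk membership check over the whole ranked list.
        collected = [vid for vid in ranked if vid not in seen][:want]
        # Step 6: walk the random pool in adaptive batches until enough are unseen.
        cursor = 0
        while len(collected) < want and cursor < len(random_pool):
            size = adaptive_batch_size(want - len(collected), seen_count, total_pool_size)
            batch = random_pool[cursor:cursor + size]     # range read by position index
            cursor += size
            collected += [vid for vid in batch if vid not in seen]
    if not collected:
        # Step 7, Ranked Fallback: serve both pools in order, ignoring seen status.
        return (ranked + random_pool)[skip:skip + page_size]
    return collected[skip:skip + page_size]               # step 8


ranked = [f"r{i}" for i in range(100)]
pool = [f"p{i}" for i in range(500)]
heavy_seen = set(ranked) | {f"p{i}" for i in range(50)}  # the "heavy user" profile
print(adaptive_batch_size(10, 150, 600), adaptive_batch_size(10, 500, 600))  # 20 90
print(authenticated_page(ranked, pool, heavy_seen, 1, 10, lambda p, s: []))  # p50 through p59
```

One design consequence: SCARD counts every ID in the user's seen set, including videos that have since rotated out of both pools, so the seenCount ≥ totalPoolSize early exit and seenRatio are estimates rather than exact overlap counts.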

Ranked Fallback: serves the ranked pool followed by the random pool, in order and ignoring seen status, so the feed is never empty.

Call counts by user profile

New user (seen 0)

| Step | Call | Type |
|---|---|---|
| 1 | ZRANGEBYSCORE videos:short:sorted (all ≤100) | Redis |
| 2 | ZCARD videos:short:random-pool | Redis |
| 3 | SCARD videos:user:{id}:seen-shorts → 0 | Redis |
| 4 | SISMEMBER × ≤100 (bulk) → all false | Redis |
| | Collected ≥10 from ranked, done | |
| 5 | GetVideos(10 ids) | DB |

| Redis | DB | Notes |
|---|---|---|
| 4 | 1 | Single SISMEMBER batch, nothing filtered |

Moderate user (seen ~40)

| Step | Call | Type |
|---|---|---|
| 1 | ZRANGEBYSCORE sorted (all ≤100) | Redis |
| 2 | ZCARD random-pool | Redis |
| 3 | SCARD seen-shorts → 40 | Redis |
| 4 | SISMEMBER × ≤100 → ~40 seen, ~60 unseen | Redis |
| | Collected ≥10 from ranked, done | |
| 5 | GetVideos(10 ids) | DB |

| Redis | DB | Notes |
|---|---|---|
| 4 | 1 | Still resolved within ranked set |

Active user (seen ~90 of 100 ranked)

| Step | Call | Type |
|---|---|---|
| 1 | ZRANGEBYSCORE sorted (all 100) | Redis |
| 2 | ZCARD random-pool | Redis |
| 3 | SCARD seen-shorts → 90 | Redis |
| 4 | SISMEMBER × 100 → ~90 seen, ~10 unseen | Redis |
| | Collected 10, done | |
| 5 | GetVideos(10 ids) | DB |

| Redis | DB | Notes |
|---|---|---|
| 4 | 1 | All ranked checked in 1 bulk call |

Heavy user (seen ~150: all ranked + 50 random)

| Step | Call | Type |
|---|---|---|
| 1 | ZRANGEBYSCORE sorted (all 100) | Redis |
| 2 | ZCARD random-pool → 500 | Redis |
| 3 | SCARD seen-shorts → 150 (< 600) | Redis |
| 4 | SISMEMBER × 100 → all seen, 0 unseen | Redis |
| | Need 10 more from random pool | |
| 5 | ZRANGEBYSCORE random-pool (adaptive batch ~20) | Redis |
| 6 | SISMEMBER × ~20 → ~5 seen, ~15 unseen | Redis |
| | Collected ≥10, done | |
| 7 | GetVideos(10 ids) | DB |

Adaptive batch calculation: seenRatio = 150/600 = 0.25 → ceil(10 / 0.75 * 1.5) = 20

| Redis | DB | Notes |
|---|---|---|
| 6 | 1 | 1 batch for random pool |

Very heavy user (seen ~500 of 600)

| Step | Call | Type |
|---|---|---|
| 1 | ZRANGEBYSCORE sorted (all 100) | Redis |
| 2 | ZCARD random-pool → 500 | Redis |
| 3 | SCARD seen-shorts → 500 (< 600) | Redis |
| 4 | SISMEMBER × 100 → all seen | Redis |
| | Need 10 from random pool | |
| 5 | ZRANGEBYSCORE random-pool (adaptive batch ~90) | Redis |
| 6 | SISMEMBER × ~90 → ~75 seen, ~15 unseen | Redis |
| | Collected ≥10, done | |
| 7 | GetVideos(10 ids) | DB |

Adaptive batch calculation: seenRatio = 500/600 = 5/6 ≈ 0.833 → ceil(10 / (1/6) * 1.5) = 90

| Redis | DB | Notes |
|---|---|---|
| 6 | 1 | Larger batch compensates for high filter rate |

Exhausted user (seen ≥ 600, i.e. at least the combined pool size)

| Step | Call | Type |
|---|---|---|
| 1 | ZRANGEBYSCORE sorted (all ≤100) | Redis |
| 2 | ZCARD random-pool → 500 | Redis |
| 3 | SCARD seen-shorts → 700 ≥ 600 → early exit | Redis |
| 4 | ZCARD sorted (in fallback) | Redis |
| 5 | ZCARD random-pool (in fallback) | Redis |
| 6 | ZRANGEBYSCORE sorted (page slice) | Redis |
| 7 | GetVideos(10 ids) | DB |

| Redis | DB | Notes |
|---|---|---|
| 6 | 1 | SCARD short-circuits, no batch walking |

Summary

| Scenario | Redis | DB |
|---|---|---|
| Channel (cache hit) | 2 | 1 |
| Channel (cache miss) | 3 | 3 |
| Anonymous (cache hit) | 2 | 1 |
| Anonymous (cache miss) | 5–7 | 1–3 |
| Auth: new (seen 0) | 4 | 1 |
| Auth: moderate (seen ~40) | 4 | 1 |
| Auth: active (seen ~90) | 4 | 1 |
| Auth: heavy (seen ~150) | 6 | 1 |
| Auth: very heavy (seen ~500) | 6 | 1 |
| Auth: exhausted (seen ≥ pool) | 6 | 1 |
| Auth: empty pools | 2 | 3 |

All DB counts include the final GetVideos(ids) call to hydrate video DTOs.


Pool generation (workflow)

The RankShortVideosWorkflow runs on a schedule and populates both pools:

  1. GetShortVideoMetrics — queries DB + ClickHouse for top 100 short video candidates with engagement metrics
  2. RankShortVideos — scores and writes to videos:short:sorted (sorted set, score = ranking score)
  3. GenerateRandomPool — queries all eligible short IDs from DB, excludes ranked 100, shuffles with deterministic daily seed (DateOnly.DayNumber), writes up to 500 to videos:short:random-pool (sorted set, score = position index)
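GenerateRandomPool's deterministic daily shuffle could look like the Python sketch below, assuming the eligible and ranked IDs are already loaded; `generate_random_pool` and `day_number` are hypothetical names. Two caveats: sorting the candidates before shuffling is an addition not stated above (it makes the result independent of DB row order), and Python's `random.Random` produces a different sequence from .NET's `System.Random`, so the order only reproduces across runs of this sketch, not the .NET implementation.

```python
import random
from datetime import date
from typing import Dict, Iterable, List


def day_number(d: date) -> int:
    # DateOnly.DayNumber counts days since 0001-01-01 (so that date is 0);
    # Python's toordinal() numbers 0001-01-01 as 1, hence the -1.
    return d.toordinal() - 1


def generate_random_pool(
    eligible_ids: Iterable[str], ranked_ids: Iterable[str], today: date, limit: int = 500
) -> Dict[str, int]:
    """Return {video_id: position index}, i.e. members and scores for the random-pool sorted set."""
    ranked = set(ranked_ids)
    # Assumption: sort first so the shuffle depends only on the day, not on DB row order.
    candidates: List[str] = sorted(v for v in eligible_ids if v not in ranked)
    random.Random(day_number(today)).shuffle(candidates)
    return {vid: pos for pos, vid in enumerate(candidates[:limit])}


eligible = [f"v{i}" for i in range(1000)]
ranked = eligible[:100]
pool = generate_random_pool(eligible, ranked, date(2024, 5, 1))
print(len(pool))  # 500
```

Using the position index as the score means an ascending read of the sorted set replays the shuffled order, which is what the random-pool page reads rely on.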

Seen tracking

When SaveWatchtime is called for a short video, MarkShortAsSeen adds the video ID to the user's videos:user:{id}:seen-shorts Redis set (30-day TTL). This is a single SADD, an O(1) operation.
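A sketch of MarkShortAsSeen's effect on the seen set, using a hypothetical in-memory `SeenStore` in place of Redis. Whether the 30-day TTL is refreshed on every watch (a sliding expiry, modelled here as an EXPIRE after each SADD) or set only once is not stated above, so treat that as an assumption. Under a sliding expiry an active user's set never expires, which is consistent with the Unbounded max size in the data sources table.

```python
from typing import Dict, Set, Tuple

SEEN_TTL_SECONDS = 30 * 24 * 60 * 60


class SeenStore:
    """Hypothetical in-memory stand-in for Redis sets with key-level expiry."""

    def __init__(self) -> None:
        self._keys: Dict[str, Tuple[Set[str], float]] = {}

    def _live(self, key: str, now: float) -> Set[str]:
        entry = self._keys.get(key)
        if entry is None or now >= entry[1]:
            return set()  # a missing or expired key reads as an empty set
        return entry[0]

    def sadd_and_expire(self, key: str, member: str, ttl: int, now: float) -> None:
        members = self._live(key, now)
        members.add(member)                      # SADD: O(1), re-adding is a no-op
        self._keys[key] = (members, now + ttl)   # EXPIRE: assumed refreshed on every call

    def scard(self, key: str, now: float) -> int:
        return len(self._live(key, now))


def mark_short_as_seen(store: SeenStore, user_id: str, video_id: str, now: float) -> None:
    store.sadd_and_expire(f"videos:user:{user_id}:seen-shorts", video_id, SEEN_TTL_SECONDS, now)


store = SeenStore()
mark_short_as_seen(store, "42", "video-1", now=0)
mark_short_as_seen(store, "42", "video-1", now=60)  # duplicate watch
print(store.scard("videos:user:42:seen-shorts", now=60))  # 1
```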