mirror of https://github.com/PostHog/posthog.git synced 2024-11-28 18:26:15 +01:00

29 Commits

Author SHA1 Message Date
Julian Bez
6c95fd18ba
chore(ruff): Add ruff rules for exception handling (#23251) 2024-06-27 12:39:21 +01:00
Ben White
33a0757a7c
fix: Loading of embedddings (#22260) 2024-05-20 16:26:22 +01:00
Julian Bez
9576fab1e4
chore: Add Pyupgrade rules (#21714)
* Add Pyupgrade rules
* Set correct Python version
2024-04-25 08:22:28 +01:00
David Newell
140dfbceda
fix: sparkline generation (#21274) 2024-04-02 16:17:16 +01:00
David Newell
4e22252235
chore: run clustering in background task (#21080) 2024-03-28 12:12:51 +00:00
David Newell
6b6d40a666
feat: sparkline errors (#21081)
* feat: errors page playlist link

* feat: create playlist from errors

* remove sample data

* update title

* add frontend protection to samples

* feat: sparkline errors
2024-03-22 10:20:37 +00:00
David Newell
21922ff9e3
feat: create playlists from errors (#21037) 2024-03-21 16:59:21 +00:00
David Newell
088399bbe0
feat: error clustering UI (#20958) 2024-03-19 16:02:07 +00:00
David Newell
8b92cd1c62
fix: picking embedding input samples (#20913) 2024-03-14 09:12:13 +00:00
David Newell
6b4a9f20b1
chore: include input in fetched data (#20908) 2024-03-13 16:52:55 +00:00
David Newell
dc0faa5a79
chore: add input to clickhouse rows (#20901) 2024-03-13 15:57:22 +00:00
David Newell
428c48084b
fix: error clustering data shape (#20859)
* fix: error clustering data shape

* use new input column

* remove logger
2024-03-13 14:49:20 +00:00
David Newell
6251ed481f
feat: error clustering UI (#20823) 2024-03-11 18:00:35 +00:00
David Newell
a677a3fd64
fix: embeddings runner variable name (#20802) 2024-03-08 20:43:41 +00:00
David Newell
1d3c7417fb
fix: add labelnames to prometheus metrics (#20800) 2024-03-08 19:32:29 +00:00
David Newell
0c1c05e38c
feat: cluster errors (#20779) 2024-03-08 18:16:24 +00:00
David Newell
ea340fc765
feat: embed errors (#20752) 2024-03-08 18:12:06 +00:00
David Newell
6c5ad0c414
fix: return most not least similar recordings (#20693) 2024-03-05 16:29:05 +00:00
Paul D'Ambra
776e0e1c38
feat: fewer tokens sent to embed from urls (#20680)
* feat: fewer tokens sent to embed from urls

* need to stringify the input before logging it
2024-03-03 17:23:00 +00:00
Paul D'Ambra
0d476baeb6
fix: fewer loopy loops (#20678)
* fix: fewer loopy loops

* and add a rate limit to the queue

* swallow open AI errors

* count different failures differently
2024-03-03 13:59:31 +00:00
Paul D'Ambra
2085f68b05
fix: some embeddings faff (#20649)
* meaningful histogram buckets for tracking tokens

* we're not being clever about deduplicating, let's only ever look at the most recent

* Select a random selection of eligible recordings
2024-02-29 22:33:54 +00:00
Paul D'Ambra
8d0efa1c5b
fix: query needs a max time range (#20626) 2024-02-29 10:17:58 +00:00
David Newell
a9496eace8
chore: count tokens before hitting OpenAI (#20621)
* chore: count tokens before hitting OpenAI

* log the offending input

---------

Co-authored-by: Paul D'Ambra <paul@posthog.com>
2024-02-28 23:04:43 +00:00
Paul D'Ambra
5e89d9124a
chore: even more logging (#20612)
* chore: even more embeddings logging

* and more settings

* fix

* fix
2024-02-28 21:24:39 +00:00
David Newell
c09812e1b4
chore: add better logs to embeddings (#20582) 2024-02-27 17:36:58 +00:00
David Newell
125d4e8a3e
feat: embeddings similarity (#20268) 2024-02-21 13:34:11 +00:00
Paul D'Ambra
12b685a22d
chore: better buckets for timings (#20362) 2024-02-15 12:29:00 +00:00
Paul D'Ambra
3e23550b93
fix: don't load recordings we know we'll skip (#20360)
* fix: don't load recordings we know we'll skip

* fix
2024-02-15 12:18:45 +00:00
David Newell
4f6d9c8673
feat: generate recording text embeddings (#20046)
* make migration

* general flow

* abstract shared methods

* generate input

* remove postgres migration

* generate embedding strings

* remove random file

* Update query snapshots

* Update query snapshots

* feat: create periodic replay embedding

* first sketch of table

* batch and flush embeddings

* add default to timestamp generation

* fetch recordings query

* save first embeddings to CH

* dump session metadata into tokens

* fix lint

* brain dump to help th future traveller

* prom timing instead

* fix input concatenation

* add an e :/

* obey mypy

* some time limits to reduce what we query

* a little fiddling to get it to run locally

* paging and counting

* Update query snapshots

* Update query snapshots

* move the AI stuff to EE for now

* Update query snapshots

* kick off the task with interval from settings

* push embedding generation onto its own queue

* on a different queue

* EE to the max

* doh

* fix

* fangling

* Remove clashes so we can merge this into the other PR

* Remove clashes so we can merge this into the other PR

* start wiring up Celery task

* hmmm

* it's a chord

* wire up celery simple version

* rename

* why is worker failing

* Update .run/Celery.run.xml

* update embedding input to remove duplicates

* ttl on the table

* Revert "update embedding input to remove duplicates"

This reverts commit 9a09d9c9f0.

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Paul D'Ambra <paul@posthog.com>
2024-02-14 12:50:42 +00:00