In addition to events having properties, users can also have properties. There are two different methods that can be used: set and set_once. Depending on the integration library the actual function calls look a bit different, but internally they all work the same way. Note that both methods can be used in properties with normal event capture, e.g. using our js library:
posthog.capture('Set some user properties',{$set: { location: 'London' },$set_once: { referred_by: 'some ID' },}
When to use set and set_once?
Use set, when the value should always be the newest one, e.g. user email, if user does an update to their email, we always want to use the latest value.
Use set_once when we want the first value and it to never be updated afterwards, e.g. initial URL for the first URL we ever saw this user on.
Sometimes we might want to mix set and set_once usage. Imagine, when we have some heuristics that help us determine a value, but users can also specify it. In these cases it might be beneficial to use set_once for the heuristically computed value and set for user specified value. Note that we will never override an existing value if set_once is used, so if heuristics are used on a continuous basis that would not be a good strategy.
How do values get overridden?
The section below explains how overrides will work in the future, but currently overrides ignore timestamps. This means at the moment
setalways overrides,set_onceonly writes when the value doesn't exist andaliasupon property name conflict uses the value from person 1.
We rely on timestamps to know the order in which events happened. Note that we don't use ingestion order because events can arrive to PostHog in different order compared to when they were performed, e.g. when network was unreliable and some packets were resent later or when we're importing older events from a different PostHog instance.
This table explains, when values get overridden (TS means timestamp):
| value exists | method | previous method | previous TS is ___ call TS | write/override | |
|---|---|---|---|---|---|
| 1 | no | N/A | N/A | N/A | yes |
| 2 | yes | set | set | before | yes |
| 3 | yes | set | set_once | before | yes |
| 4 | yes | set | set | equal | no |
| 5 | yes | set | set_once | equal | yes |
| 6 | yes | set | set | after | no |
| 7 | yes | set | set_once | after | yes |
| 8 | yes | set_once | set | before | no |
| 9 | yes | set_once | set_once | before | no |
| 10 | yes | set_once | set | equal | no |
| 11 | yes | set_once | set_once | equal | no |
| 12 | yes | set_once | set | after | no |
| 13 | yes | set_once | set_once | after | yes |
When identify or alias methods are used, then all user properties individually get merged based on the same logic.
P1 refers to Person or 1's property and P2 to Person or distinct id 2's property. Notice how "yes" from above turned into "2".
| P1 exists | P2 exists | P2 method | P1 method | P1 TS is ___ P2 TS | value used | |
|---|---|---|---|---|---|---|
| 0 | yes | no | N/A | N/A | N/A | 1 |
| 1 | no | yes | N/A | N/A | N/A | 2 |
| 2 | yes | yes | set | set | before | 2 |
| 3 | yes | yes | set | set_once | before | 2 |
| 4 | yes | yes | set | set | equal | 1 or 2 |
| 5 | yes | yes | set | set_once | equal | 2 |
| 6 | yes | yes | set | set | after | 1 |
| 7 | yes | yes | set | set_once | after | 2 |
| 8 | yes | yes | set_once | set | before | 1 |
| 9 | yes | yes | set_once | set_once | before | 1 |
| 10 | yes | yes | set_once | set | equal | 1 |
| 11 | yes | yes | set_once | set_once | equal | 1 or 2 |
| 12 | yes | yes | set_once | set | after | 1 |
| 13 | yes | yes | set_once | set_once | after | 2 |
Note: Here with equal timestamps and the current call (set/set_once) equal to the previous (rows 4 and 11) this means that we could end up using either person's value. We will optimize for efficiency here to minimize database writes (similarly to why set/set_once doesn't override). To elaborate look at the following examples using our python library:
For identify this series of calls would mean we end up with location being Rome or New York.
ts = datetime.datetime.now()posthog.capture('Alice', event='set location', properties={'$set': {'location': 'Rome'}}, timestamp=ts)posthog.identify('Alice', {'location': 'New York'}, timestamp=ts)
For alias this series of calls would also mean we end up with location being Rome or New York.
ts = datetime.datetime.now()posthog.capture('Alice 1', event='set location', properties={'$set': {'location': 'Rome'}}, timestamp=ts)posthog.capture('Alice 2', event='set location', properties={'$set': {'location': 'New York'}}, timestamp=ts)posthog.alias('Alice 1', 'Alice 2')