this post was submitted on 07 May 2024

Mullvad VPN: Introducing Defense against AI-guided Traffic Analysis (DAITA) (mullvad.net)

submitted 8 months ago by ForgottenFlux@lemmy.world to c/technology@lemmy.world

Even if you have encrypted your traffic with a VPN (or the Tor Network), advanced traffic analysis is a growing threat against your privacy. Therefore, we now introduce DAITA.

Through constant packet sizes, random background traffic and data pattern distortion we are taking the first step in our battle against sophisticated traffic analysis.

[–] jet@hackertalks.com 2 points 8 months ago (1 children)

I'm afraid just generating random traffic from your IP address won't do anything against traffic flow analysis. Because most internet traffic is point to point, people who are interested in the flow just follow the traffic moving between various points. So if you're sending extra traffic to other random sites, it doesn't interfere with point-to-point flow analysis.

In the context of a VPN, an observer has to work harder to determine what traffic is going where, because all of your traffic is encrypted and all of it goes from your network to another virtual network. An outside observer just sees the size and frequency of traffic, not the destinations. In this context, since they don't see the destinations, it makes sense to add random traffic flows, because that'll obscure the signal the observers are looking for.
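
As a rough illustration of the "constant packet sizes" and "random background traffic" ideas from the post, here is a minimal Python sketch. The 1000-byte cell size, the 2-byte length header, and all function names are assumptions made for this example; they are not Mullvad's actual DAITA parameters or API.

```python
CELL_SIZE = 1000  # hypothetical fixed packet size in bytes, chosen for illustration


def to_constant_size_cells(payload: bytes, cell_size: int = CELL_SIZE) -> list[bytes]:
    """Split payload into cells of exactly cell_size bytes, padding the last one.

    A 2-byte length header in each cell lets the receiver strip the padding.
    """
    body = cell_size - 2  # room left for data after the length header
    cells = []
    for start in range(0, max(len(payload), 1), body):
        chunk = payload[start:start + body]
        header = len(chunk).to_bytes(2, "big")
        cells.append(header + chunk + bytes(body - len(chunk)))
    return cells


def cover_cell(cell_size: int = CELL_SIZE) -> bytes:
    """A dummy cell: the length header is zero, so the receiver discards it."""
    return (0).to_bytes(2, "big") + bytes(cell_size - 2)


def observed_sizes(cells: list[bytes]) -> list[int]:
    """What an on-path observer learns once the cells are encrypted: sizes only."""
    return [len(cell) for cell in cells]


print(observed_sizes(to_constant_size_cells(b"hi")))
print(observed_sizes(to_constant_size_cells(b"x" * 2500)))
print(observed_sizes([cover_cell()]))
```

In this sketch a 2-byte message, a 2500-byte message, and a dummy cell all look the same per packet to the observer; only the number of packets and their timing still leak, which is the part cover traffic is meant to blur.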

[–] MigratingtoLemmy@lemmy.world 2 points 8 months ago (1 children)

Considering that VPNs are point-to-point too (home->VPN), I was wondering if one could use DAITA with TCP directly instead of having to use a VPN. Imagine if TCP had DAITA baked in.

[–] jet@hackertalks.com 2 points 8 months ago* (last edited 8 months ago) (1 children)

Even if you baked variable packet sizes into TCP, it would be trivial for anybody monitoring network flow to see who you're talking to. There would be no ambiguity.

The only reason this makes sense for a VPN is that there's a lot of traffic bundled together, so a third party doesn't actually know where your traffic flow is going.

Consider what happens if you run your own personal VPN endpoint, so you are the only user on the VPN. Even with randomized traffic injected into your VPN connection, it would be trivial for any third party monitoring traffic flow to know that the traffic is yours, because yours is the only connection talking to the VPN server. The same thought experiment applies when you don't have a VPN at all.

[–] MigratingtoLemmy@lemmy.world 2 points 8 months ago (1 children)

If I were to send packets to a single entity over time, I'd have no use for DAITA. I agree with you on this.

However, let's say that I run a bunch of VPN endpoints across VPSes, and the entity trying to track me doesn't know about all of these IP ranges. I could be renting from a colo, the cloud and even a bunch of friends who have their ports open. If I were to mix this in with my usual internet traffic, it becomes significantly harder for third parties to figure out what I'm doing connecting to all of these different IPs. A state actor could put the resources behind it, but the average third party will have a hard time with it. I can certainly see use-cases for it.

[–] jet@hackertalks.com 2 points 8 months ago (1 children)

I think we're mixing up vocabulary.

Every IP you talk to is visible to anybody monitoring your network. The sale of net flow data is commonly acknowledged by ISPs. So every computer you talk to is common knowledge for sale.

In your scenario, let's say you have five VPN connections set up to go to five endpoints that you control. But if nobody else is using those same endpoints, your net flow data still exposes exactly what you're doing. There's no ambiguity. Your traffic is plainly obvious to anybody observing the network, even if those VPN connections are adding randomized traffic onto the links.
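
One way to picture the argument above is to count how many distinct sources talk to each endpoint. This is a minimal sketch with made-up flow records: the IPs come from the reserved documentation ranges, and the two-field record is a simplification of real flow export data, which also carries ports, byte counts, and timestamps.

```python
from collections import defaultdict

# Hypothetical (source IP, destination IP) flow records.
flows = [
    ("198.51.100.7", "203.0.113.10"),  # you -> a private endpoint only you use
    ("198.51.100.7", "203.0.113.11"),  # you -> another private endpoint
    ("198.51.100.7", "192.0.2.50"),    # you -> a shared VPN server
    ("198.51.100.8", "192.0.2.50"),    # other customers -> the same shared server
    ("198.51.100.9", "192.0.2.50"),
]


def anonymity_sets(records):
    """Map each destination IP to the set of distinct sources seen talking to it."""
    sources = defaultdict(set)
    for src, dst in records:
        sources[dst].add(src)
    return dict(sources)


by_endpoint = anonymity_sets(flows)
for dst, srcs in sorted(by_endpoint.items()):
    print(dst, "is used by", len(srcs), "source(s)")
```

An endpoint with a single source gives the observer nothing to be unsure about, no matter how much padding rides on that link. Only the shared server leaves any question about which outbound traffic belongs to which user.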

[–] MigratingtoLemmy@lemmy.world 1 points 8 months ago (1 children)

Except that I will not necessarily be connecting to the exact same IPs over time, just going to do so in specific ranges which the VPS/colo owns. There's plenty of people who are going to be renting VPSes and will have their traffic originate from the same IP range as mine, which means that if everybody using TCP had their traffic anonymized like so, the third party wouldn't actually know that MigratingToLemmy specifically was connecting to AWS at a certain time and from a certain location, so to speak. This hypothesis doesn't include correlation through other data in the threat model. But it could definitely prevent correlation with traffic across locations, which is similar to what Mullvad states.

[–] jet@hackertalks.com 2 points 8 months ago* (last edited 8 months ago) (2 children)

I'm sorry, no. This will not help you avoid flow analysis.

[–] xabadak@lemmings.world 2 points 8 months ago (1 children)

I think you both are talking past each other. You said "But if nobody else is using those same endpoints." but @MigratingtoLemmy@lemmy.world said "There’s plenty of people who are going to be renting VPSes and will have their traffic originate from the same IP range as mine". Reading this thread, it seems like you both have different network setups in mind.

[–] jet@hackertalks.com 1 points 8 months ago (1 children)

Thanks for pointing that out. I tried to address that when I responded about net flow analysis. Having the same IP range as other people does not let you hide in the crowd. The net flow data will identify exact IPs.

[–] xabadak@lemmings.world 1 points 8 months ago (1 children)

Hypothetically, what if everybody in the world were using mixnets to obfuscate destination/origin, and then Mullvad's DAITA to obfuscate traffic timing and size? Would netflow analysis be able to defeat that?

[–] jet@hackertalks.com 1 points 8 months ago (1 children)

What is a mix net? Something like Tor? An onion overlay network where the routing goes between multiple hops before it exits the network?

Let's go through a few scenarios first.

Scenario A: you have a link to a common VPN endpoint that other people use. On this link you generate traffic, a consistent 1 megabyte per second up and down.

There is now ambiguity about what traffic goes into the VPN and what comes back to you. An outside observer would not be able to deduce what traffic is yours just by size and timing.

This is the gold standard. You remove all possible signal data.
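
A minimal sketch of Scenario A, assuming a hypothetical per-second tick and a made-up function name: real data is queued and sent when there is room, and every tick is topped up with dummy bytes so the wire always carries the same amount.

```python
def constant_rate_link(real_bytes_per_tick, rate):
    """Return (bytes on the wire per tick, real bytes still queued at the end).

    Every tick carries exactly `rate` bytes: queued real data first, then padding.
    """
    backlog = 0
    on_wire = []
    for real in real_bytes_per_tick:
        backlog += real
        sent_real = min(backlog, rate)       # real data that fits in this tick
        backlog -= sent_real
        padding = rate - sent_real           # dummy bytes fill the rest
        on_wire.append(sent_real + padding)  # always equals rate
    return on_wire, backlog


RATE = 1_000_000  # 1 megabyte per tick, matching the scenario above

idle_view, _ = constant_rate_link([0, 0, 0, 0, 0], RATE)
busy_view, left_over = constant_rate_link([3_000_000, 0, 0, 500_000, 0], RATE)

print(idle_view == busy_view)  # the observer's view does not depend on real usage
print(left_over)
```

The price is visible in the sketch too: the 3 MB burst takes three ticks to drain, and the idle link still burns the full rate, so the gold standard costs both latency and bandwidth.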

Scenario B: everyone is using an onion overlay network, and their traffic has a little padding added and a little extra timing jitter added at every link. This would reduce the probability that an outside observer could deduce the entire end-to-end flow of your traffic. But the type of your traffic could defeat whatever level of obscuring is happening. Imagine you have a real-time connection to a network and you're typing out Morse code, a ".... - - - -" sort of thing. Imagine each of those packets has a different size. If I'm observing the network for long enough, I'm going to notice the Morse-code-like packets, with their timing and size, going through the onion network. There will be some ambiguity, but enough traffic over enough time would give me high confidence that you're the source of the traffic, because the extra obscuring traffic has a probability, but not a guarantee, of masking the shape of your traffic.

So Scenario A is the gold standard, and Scenario B would be better than nothing. Having a global onion network has its own issues: now you have to trust many nodes instead of one node. All this is down to your threat model and how much effort you're willing to put in.
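
The Scenario B concern can be sketched as a simple correlation test. Everything here is an assumption for illustration: the packet sizes, the 0 to 300 bytes of random padding, and the use of plain Pearson correlation on sizes only. A real attacker would also use timing, and a real overlay would add delay and reordering that this toy model ignores.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical "Morse-like" flow entering the overlay: short and long packets.
morse = [200, 200, 1200, 200, 1200, 1200, 200, 1200] * 25  # 200 packets

# What an observer sees leaving the overlay: the same packets plus a little padding.
padded = [size + rng.randint(0, 300) for size in morse]

# An unrelated user's flow leaving the overlay in the same window.
unrelated = [rng.randint(200, 1500) for _ in morse]

print("same flow after padding:", round(pearson(morse, padded), 2))
print("unrelated flow:         ", round(pearson(morse, unrelated), 2))
```

With padding that is small compared to the differences between packets, the padded flow stays strongly correlated with the original while the unrelated flow does not. That gap is what lets a patient observer pick the right flow with growing confidence.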

[–] xabadak@lemmings.world 2 points 8 months ago

Yeah, Tor is an example of a mixnet. What I was talking about was a combination of your Scenario A and Scenario B, where you have a mixnet where everybody's traffic goes through multiple proxies, and many people are using each proxy, and you have padding and timing added to make sure traffic flows are consistent. As far as trusting nodes, you have to do that regardless of your setup. If you don't use any VPN, you have to trust your ISP. If you use a VPN like Mullvad, you have to trust Mullvad. If you use a mixnet, you have to trust that all your chosen proxies aren't colluding. So like you said, it's up to your own judgement and threat model.
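
The "all your chosen proxies aren't colluding" trade-off can be put into rough numbers. This is a toy calculation that assumes each hop is picked independently and uniformly at random and that a fixed fraction of proxies is colluding. Real networks do not select hops that way, and an adversary who watches only the first and last hop may already be able to correlate traffic, so treat this as an optimistic bound and not a measurement.

```python
def p_all_hops_collude(colluding_fraction: float, hops: int) -> float:
    """Chance that every hop on a path is colluding, assuming independent uniform picks."""
    if not 0.0 <= colluding_fraction <= 1.0:
        raise ValueError("colluding_fraction must be between 0 and 1")
    if hops < 1:
        raise ValueError("hops must be at least 1")
    return colluding_fraction ** hops


# Hypothetical case: 10% of proxies collude.
for hops in (1, 2, 3):
    print(hops, "hop(s):", p_all_hops_collude(0.10, hops))
```

Under these assumptions one hop (a single VPN) leaves a 10% chance of full exposure, while three hops bring it down to about 0.1%. That is the sense in which a mixnet spreads trust, and it does not remove it.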

[–] MigratingtoLemmy@lemmy.world 1 points 8 months ago (1 children)

What am I missing?

[–] jet@hackertalks.com 1 points 8 months ago

https://hackertalks.com/comment/3687086