Hey there 👋
Glad to have you back.
Do you remember the time when making a call actually meant something very specific?
You had a phone sitting in one place. You picked up the receiver, dialed each number one by one, and waited for the connection. There was a clear start, a clear end, and in many cases, a cost attached to how long you stayed on.
But underneath that experience was a very rigid setup. Calls were built on dedicated lines and once a connection was established, that line stayed reserved for the entire duration of the call.
Now compare that to how things work today.
You open WhatsApp or Discord, tap once, and you’re instantly connected with ‘Undertaker69’ from the Netherlands. You can talk with him for as long as you want, with nothing more than an internet connection.
This change wasn’t the result of some sort of magic. It came from a complete change in how voice communication works underneath the surface, and like every major change in technology, it brought both convenience and new risks.
That is what we will be exploring in today’s piece. Let’s start!
What Voice over Internet Protocol (VoIP) Means
Voice over Internet Protocol, or VoIP, is a method of transmitting voice over the internet instead of traditional telephone lines.
In older systems, a call required a dedicated connection between two points. That connection stayed active for the entire duration of the call, even if no one was speaking.
But VoIP handles this differently - your voice is converted into digital data, broken into smaller packets, and sent across the internet. These packets travel through networks just like any other type of data, regardless of what they contain.
Because of this change, the system focuses on delivering data efficiently. Packets can take different routes before reaching their destination, depending on network conditions. And because voice is now just data, it can be integrated into platforms, combined with other features, and scaled without relying on traditional telecom systems.
That’s why modern communication tools treat voice as a part of a larger environment.
How Voice over Internet Protocol (VoIP) Works
At a basic level, VoIP follows a structured process.
When you speak into a device, your voice is captured as an analog signal. That signal is converted into digital form using codecs, which prepare the data for transmission. The data is then compressed and divided into smaller packets.
These packets are sent across the internet using packet switching. Unlike traditional systems, there is no single fixed path. Each packet can take a different route depending on network availability.
On the receiving side, those packets are collected, reassembled, decompressed, and converted back into audio.
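To make the send-and-reassemble steps a bit more concrete, here is a minimal Python sketch. It is not how any particular VoIP stack does it. The two-field header, the 160-sample packet size (20 ms of audio at 8,000 samples per second), and the fake audio bytes are all assumptions chosen for illustration, and real systems add compression, encryption, and much more.

```python
import random
import struct

# Hypothetical header for this sketch: 16-bit sequence number, 32-bit timestamp.
HEADER = struct.Struct("!HI")
SAMPLES_PER_PACKET = 160  # assumed: 20 ms of audio at 8000 samples per second


def packetize(samples: bytes) -> list[bytes]:
    """Split encoded audio into packets tagged with a sequence number and timestamp."""
    packets = []
    for seq, start in enumerate(range(0, len(samples), SAMPLES_PER_PACKET)):
        payload = samples[start:start + SAMPLES_PER_PACKET]
        packets.append(HEADER.pack(seq, start) + payload)
    return packets


def reassemble(packets: list[bytes]) -> bytes:
    """Sort packets by sequence number and join their payloads into one stream."""
    decoded = []
    for packet in packets:
        seq, _timestamp = HEADER.unpack_from(packet)
        decoded.append((seq, packet[HEADER.size:]))
    decoded.sort(key=lambda item: item[0])
    return b"".join(payload for _, payload in decoded)


# One second of fake 8-bit audio (one byte per sample), standing in for codec output.
audio = bytes(i % 256 for i in range(8000))
packets = packetize(audio)
random.shuffle(packets)  # simulate packets arriving out of order
restored = reassemble(packets)
print(len(packets), restored == audio)
```

The sequence number is what lets the receiver put things back in order, no matter which route each packet took.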
The process happens quickly enough that it feels like a real-time conversation, but the underlying behavior is very different from traditional telephony. Because everything depends on network conditions, factors like latency (delay), packet loss, and jitter (variation in packet arrival times) can affect call quality. This is the trade-off. The system gains flexibility and scalability, but it becomes dependent on the quality of the internet connection.
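If you are wondering how those three factors can be measured, the sketch below is one way to do it in Python. The jitter function follows the running-average estimator described in the RTP specification (RFC 3550). The function names and the sample timings in milliseconds are made up for illustration.

```python
def average_latency(send_times, recv_times):
    """Mean one-way delay, assuming both clocks are synchronized."""
    delays = [r - s for s, r in zip(send_times, recv_times)]
    return sum(delays) / len(delays)


def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate in the style of RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D is how much the spacing between two packets changed in transit.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


def loss_rate(received_seqs, expected_count):
    """Fraction of expected packets that never arrived."""
    return 1 - len(set(received_seqs)) / expected_count


# Hypothetical timings in milliseconds: packets are sent every 20 ms.
sent = [0, 20, 40, 60]
received = [50, 70, 95, 110]

latency = average_latency(sent, received)
jitter = interarrival_jitter(sent, received)
loss = loss_rate([0, 1, 3], expected_count=4)  # packet 2 went missing
print(latency, jitter, loss)
```

A call can survive a steady delay reasonably well. It is the uneven spacing and the missing packets that usually produce the robotic or choppy audio you have probably heard.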
There is also a coordination layer involved. Systems use protocols such as Session Initiation Protocol (SIP) to establish, manage, and terminate calls. These protocols handle the signaling side of communication, ensuring both ends know when a session starts and ends.
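SIP messages are plain text, which makes them easy to look at. The Python sketch below builds a heavily simplified INVITE (start a call) and BYE (end it) and reads them back. The addresses, tag, branch, and Call-ID values are placeholders, the helper names are my own, and a real request carries more headers (plus a session description) than shown here.

```python
def build_request(method, target, caller, call_id, cseq):
    """Build a simplified SIP request as text. Real requests need more headers."""
    lines = [
        f"{method} {target} SIP/2.0",
        "Via: SIP/2.0/UDP client.example.org;branch=z9hG4bK776asdhds",  # placeholder
        "Max-Forwards: 70",
        f"To: <{target}>",
        f"From: <{caller}>;tag=1928301774",  # placeholder tag
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} {method}",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_message(message):
    """Split a SIP message into its start line and a dict of headers."""
    head = message.split("\r\n\r\n", 1)[0]
    start_line, *header_lines = head.split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return start_line, headers


call_id = "a84b4c76e66710"  # placeholder; the same Call-ID ties both requests to one call
invite = build_request("INVITE", "sip:bob@example.com", "sip:alice@example.org", call_id, 1)
bye = build_request("BYE", "sip:bob@example.com", "sip:alice@example.org", call_id, 2)

for message in (invite, bye):
    start_line, headers = parse_message(message)
    print(start_line, "|", headers["CSeq"])
```

Notice that none of this carries audio. Signaling only sets up and tears down the session, while the voice packets travel separately.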
How Voice Calling Worked in the Past
Before voice moved onto the internet, communication ran through traditional telephone networks built on circuit-switched systems.
These systems relied on in-band signaling, where the same line carried both voice and the control signals that told the network what to do. Routing a call, marking a line as busy, connecting long-distance, all of it depended on specific tones moving through the line.
The design worked well for its time, but the system trusted whatever signal it received and that assumption turned out to be a big problem.
As covered in the earlier phreaking piece, people eventually figured out that if the network responded to tones, those tones could be reproduced. Once that happened, the system could be influenced from the outside.
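To see why a tone-driven design is so trusting, here is a small Python sketch of a tone detector built on the Goertzel algorithm, a standard way to measure how much of one frequency is present in a signal. The 2600 Hz value, the 8,000 Hz sample rate, and the detection threshold are illustrative assumptions and are not taken from any specific switch design.

```python
import math


def make_tone(freq_hz, sample_rate=8000, count=400):
    """Generate `count` samples of a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]


def goertzel_power(samples, sample_rate, target_hz):
    """Goertzel algorithm: signal power at the frequency bin nearest target_hz."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def hears_control_tone(samples, sample_rate=8000, target_hz=2600):
    """Return True if the target tone dominates. The threshold is an assumption."""
    threshold = 0.25 * (len(samples) / 2) ** 2
    return goertzel_power(samples, sample_rate, target_hz) > threshold


print(hears_control_tone(make_tone(2600)))  # tone at the control frequency
print(hears_control_tone(make_tone(1000)))  # some other tone
```

The detector only answers one question: is the tone there? It has no way of knowing whether the tone came from network equipment or from someone on the line, and that is the whole weakness.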
Over time, it became clear that this approach couldn’t be secured through small fixes. The entire design needed to change, which led to a transition where control signals were separated from voice communication. Instead of relying on tones coming through the same line, signaling moved to dedicated systems like Signaling System No. 7 (SS7).
If you want to dive into more details regarding how earlier telephones worked and what phreaking is, you can check out the following piece:
Important Note
It’s easy to assume that VoIP replaced older telephone technologies like Signaling System No. 7 (SS7) or was introduced to fix issues like phreaking, but keep in mind that this is NOT correct.
SS7 belongs to traditional telecom infrastructure. It was introduced to separate control signals from voice and reduce the kind of weaknesses exposed in earlier systems, while VoIP treats voice as data and sends it over packet-based networks instead of dedicated lines.
These technologies exist at different layers and solve different problems.
Early Foundations of Voice over Internet Protocol (VoIP)
Now that the fundamentals are clear, it makes sense to step back and look at how all of this began.
Voice over the internet came from decades of experimentation around how voice could be processed, transmitted, and eventually detached from physical communication systems.
To understand where we are today, we need to move through that timeline.
Pre-VoIP Era
1920s–1960s: Early Voice Research
The foundations of VoIP can be traced back to 1925, when AT&T and Western Electric established Bell Labs to advance communication technologies.
A major breakthrough came in 1939, when engineer Homer Dudley developed the Voder, the first electronic voice synthesizer. It used oscillators, filters, and manual controls to recreate human speech, demonstrating that voice could be generated and manipulated electronically.

The Voder was closely related to Dudley’s vocoder, which could analyze and encode speech. During World War II, vocoder technology was used in the SIGSALY system to secure Allied communications, showing early practical use of voice processing.
By 1966, researchers introduced Linear Predictive Coding (LPC), which made it possible to convert speech into digital signals that could be transmitted and reconstructed. This became a core building block for modern voice communication systems.
Around the same time, another piece of the puzzle was forming.
In 1969, ARPA (now DARPA) developed ARPANET, an early computer network designed to maintain communication even under extreme conditions. Unlike traditional phone systems that relied on dedicated circuits, ARPANET used packet switching, where data was broken into smaller pieces and routed dynamically across the network.
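This approach required protocols to standardize how data was packaged and transmitted. ARPANET initially relied on the Network Control Program (NCP), and TCP/IP, which the network adopted in 1983, later took over that role.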
The system didn’t carry voice in a practical sense yet, but the idea was in place. Communication no longer had to depend on fixed connections.
If you want to go deeper into how ARPANET evolved into the modern internet and why packet switching changed everything, I’ve covered that in detail in a separate piece and yes it connects directly to what’s happening here.
1970s–1980s: Packet-Based Voice Experiments
By the early 1970s, researchers began applying these concepts to voice communication.
Using Linear Predictive Coding, engineers at MIT’s Lincoln Lab transmitted the first voice packet over ARPANET in 1973. A year later, two-way voice communication was successfully tested between different systems, marking one of the earliest working examples of VoIP.
Further developments followed. By 1976, conference calls were conducted using packet-based voice transmission. By 1982, systems were capable of connecting across multiple network types, including cable networks, packet radio, and traditional telephone systems.
At the same time, access to these networks remained limited.
Services like CompuServe, launched in 1969 (commercially available to consumers in 1979), introduced early forms of public online access. Users could exchange messages through bulletin boards and email, and by 1980, instant messaging had emerged as a concept.

These developments showed that communication was moving toward digital systems, but voice transmission over networks was still far from mainstream use.
1990s: Pre-Mainstream VoIP
By the 1990s, VoIP began transitioning from research to practical use.
In 1993, early video conferencing systems like TeleSuite demonstrated that real-time communication over the internet was possible, even if limited. Businesses began experimenting with these tools to reduce the need for physical presence.

In 1997, the first hosted IP PBX system, VirtualPBX, was introduced. It allowed users to make and receive calls over the internet without relying on traditional on-premise hardware. This was a big move toward software-based communication systems.
Shortly after, in 1999, the open-source PBX platform Asterisk was released, allowing developers to build and customize communication systems more freely.
As the technology progressed, it became clear that some form of standardization was needed. That role was filled by SIP (Session Initiation Protocol), which had already been introduced in 1996 by the Internet Engineering Task Force. It provided a structured way to establish, manage, and terminate communication sessions between devices. Basically, it handled things like user availability, call setup, and connection management, which made large-scale VoIP systems more reliable.
These developments laid the groundwork for modern VoIP, even though the technology was still not widely used by everyday users.
Post-VoIP Era
Early 2000s: VoIP Goes Public
By the early 2000s, VoIP had moved past experimentation and started becoming usable in a real-world sense. A major turning point came in 2003 with the launch of Skype. Within just two years, it had reached more than 50 million users, which gives you an idea of how quickly the model spread.
Skype introduced free voice calls between users on the same platform, while charging for calls that connected to the traditional Public Switched Telephone Network (PSTN). That hybrid model made it accessible while still fitting into existing infrastructure. As time passed, it expanded beyond voice into video calls, file sharing, and messaging, turning it into a broader communication platform rather than just a calling tool.
Its growth didn’t go unnoticed. The company was acquired by eBay in 2005 and later by Microsoft in 2011, reflecting how valuable internet-based communication had become. Around the same time, alternatives started entering the market. Services like Vonage Business, developed in 2001 (went public in 2006), reached around 2 million subscribers by 2006, showing that VoIP was no longer limited to a single platform.
Regulation also started catching up. In 2004, Federal Communications Commission Chairman Michael Powell classified VoIP as an information service rather than a traditional phone service. This distinction reduced regulatory burden and taxes, which helped accelerate adoption. By 2005, VoIP services connecting to the PSTN were required to support Enhanced 911 (E911), ensuring emergency services could still locate users during calls.
At the same time, mobility started entering the picture. In 2005, Calypso Wireless introduced the C1250i, one of the first mobile phones capable of switching between cellular networks and Wi-Fi. This made real-time internet-based calling on mobile devices possible.
By 2006, apps like Truphone extended this further, allowing users to make calls over internet connections using SIP instead of relying entirely on cellular networks. These developments marked the point where VoIP started moving beyond desktops and into everyday usage scenarios.
Mid-to-Late 2000s to Present: Expansion of VoIP
As the decade progressed, VoIP continued expanding at both consumer and enterprise levels. By 2012, hosted VoIP services were growing at an annual rate of around 17%, while SIP trunking saw a sharp increase of roughly 83% between 2011 and 2012, as stated in this report.
By 2015, many organizations had either fully adopted VoIP or were in the process of moving away from traditional telephony. The shift became large enough that companies like AT&T pushed for regulatory changes, requesting permission to phase out copper wire infrastructure in favor of fiber optics and IP-based systems.
During this period, the VoIP market expanded rapidly, with multiple providers competing on pricing, features, and scalability. Mobile VoIP gained more attention as smartphones became central to communication.
And at this point, it becomes easier to see what happened next.
Look around today. Platforms like Discord, WhatsApp, and Google Meet have voice built directly into them. Calling is no longer a separate system you think about. It’s embedded into the platforms you already use.
Why Traditional Calling Fell Behind
The transition toward VoIP didn’t happen because of one single thing or some magic. It happened because multiple conditions came together at the same time.
Internet infrastructure improved, broadband became widely available, and smartphones brought that connectivity into everyday use. Communication was no longer tied to a fixed location. People could connect from anywhere, which made internet-based systems more practical than traditional ones.
Cost also played a major role. Traditional telephone systems relied on dedicated infrastructure, which made long-distance communication expensive. VoIP reduced those costs significantly by using existing internet networks. This made it especially attractive for businesses and international communication.
At the same time, the way people communicated was changing. Online communities, gaming platforms, and remote work environments created a need for constant communication.
Another factor was integration. VoIP didn’t exist as a standalone tool. It worked alongside other services inside the same platform. Messaging, video, file sharing, and voice all existed in one place, which made the experience more efficient.
Put simply, the newer system aligned better with how people were already using technology. It was easier to integrate into everyday workflows, and that is why adopting it became the default.
The Flip Side of the Coin
So far, we’ve looked at how VoIP evolved and why it became the default way we communicate.
But this is also the part where things usually get overlooked.
Every system that improves access also increases the surface area for abuse. VoIP is no exception. The same flexibility that makes communication easier also makes it easier to manipulate.
And if you’ve read the earlier piece on phreaking, this shouldn’t come as a surprise.
Telecommunication Fraud
VoIP systems introduced a different kind of weakness compared to traditional telephone networks.
Instead of tones being exploited like in the phreaking era, the focus now is on how calls are routed, authenticated, and billed. One of the most common forms is toll fraud, where attackers gain access to a system and route calls through it to generate revenue or avoid charges. Call routing abuse follows a similar pattern, where the infrastructure itself is used in ways it wasn’t intended.
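On the defensive side, one common idea is to watch call records for patterns that don’t fit normal use. The Python sketch below is a toy version of that idea. The record fields, the "+1" home prefix, the off-hours window, and the 60-minute threshold are all hypothetical, and real fraud monitoring looks at far more signals than this.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    extension: str    # internal extension that placed the call
    destination: str  # dialed number in international format
    hour: int         # local hour the call started (0-23)
    minutes: float    # call duration


def flag_suspicious(records, home_prefix="+1", max_minutes=60.0):
    """Flag extensions with heavy off-hours international usage (assumed rule)."""
    totals = defaultdict(float)
    for record in records:
        off_hours = record.hour < 6 or record.hour >= 22
        international = not record.destination.startswith(home_prefix)
        if off_hours and international:
            totals[record.extension] += record.minutes
    return sorted(ext for ext, total in totals.items() if total > max_minutes)


# Made-up records with placeholder numbers.
records = [
    CallRecord("101", "+14155550100", 14, 12.0),   # daytime, domestic
    CallRecord("204", "+252000000001", 2, 45.0),   # 2 AM, international
    CallRecord("204", "+252000000002", 3, 40.0),   # 3 AM, international
    CallRecord("305", "+442000000000", 23, 10.0),  # late, international, but short
]
flagged = flag_suspicious(records)
print(flagged)
```

A rule this simple would produce plenty of false alarms in practice, but it shows the general shape: toll fraud tends to show up as volume in places and at times where the business normally has none.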
The technical details can get deep and boring, but the pattern is familiar.
In the past, attackers learned how telephone systems interpreted signals and used that knowledge to manipulate them. Today, the same mindset applies to internet-based systems.
So, yeah…the technology changed, but the approach didn’t, and the scale has increased.
According to the Communications Fraud Control Association, global telecommunications fraud losses reached an estimated $38.95 billion in 2023, reflecting a 12% increase compared to the previous year.

And by the way, it’s important to keep in mind that VoIP is only one part of the broader telecom ecosystem, even though it plays a role in how these systems are exploited. My point isn’t that VoIP is inherently insecure. The point is that large, interconnected systems create opportunities, and those opportunities get used, both in good and bad ways.
AI Voice Cloning
Welcome to the modern world, where you can trust neither what you see nor what you hear.
AI-generated voices can now mimic real people with a high degree of accuracy. There are tools available that can clone a voice from a short sample and reproduce tone, accent, and speaking style. In some cases, these systems are paired with visual deepfakes, so both voice and appearance can be manipulated.
This has already moved beyond theory.
In one reported case, scammers used an AI-generated voice to impersonate a CEO and convince a company executive to transfer $243,000.
And interestingly, I’ve tried a version of this myself.
As a joke, I once transformed my voice into something completely different, kept the conversation going, and ended up convincing a guy to send me money. He was a friend…or maybe he wasn’t.
Anyway, this small experiment says more than it should because this is exactly how social engineering works - you’re working your way into someone’s trust. The voice is used to make the interaction feel real, you start to trust that person and before you know it, the damage has been done.
So yeah…voice just became another attack surface, but you can protect yourself from these attacks and I’ve already covered that in detail in my social engineering guide:
What This Means Going Forward
Looking back at how voice communication has evolved, the progression is hard to ignore.
We moved from analog telephone systems to internet-based communication, and now to AI-generated voice. Each step made communication more accessible and faster.
At the same time, each step introduced new risks as well.
VoIP is becoming a part of a broader environment where communication blends into platforms, applications, and automated systems. Real-time translation, AI-assisted conversations, and synthetic voices are already being developed and integrated.
But alongside that, trust becomes the central issue.
If voice can be generated or spoofed, then hearing someone is no longer enough to verify who they are. We’re already seeing this pattern in other areas, where AI-generated images and videos are becoming harder to distinguish from real ones.
Voice is heading in the same direction, and like most technology shifts, the outcome comes down to how the system is used.












