Some time ago, I had to interrupt a program's execution at very specific instruction addresses. Basically, I needed to implement breakpoints from scratch. You will probably never want to do this, and that's completely normal. However, it's an interesting thing to know about, and if you are some explorer from the internet who wants to implement breakpoint-like functionality without 0xCC, you're in the right place!

First, a little about how breakpoints are implemented in your favorite C++ debugger on x86 platforms...

How breakpoints work

Most people don't know how breakpoints work at the core. You hopefully at least know that breakpoints will stop your program whenever it hits a specific line of code. If you've done much at the assembly level, you probably know they actually stop when you hit a specific processor instruction. But how are they implemented? You may find the solution to be surprisingly simple.

I'm going to assume that you know that code is compiled down into assembly code, and have a basic understanding that assembly code is essentially human-readable machine code.

So say we have the following assembly code for a function that adds 2 and 5 and returns the result (the machine-code bytes are shown on the left):

B8 05 00 00 00      mov eax,5  
83 C0 02            add eax,2  
C3                  ret

Now say I want to place a breakpoint on the add instruction. The debugger implements this by rewriting the code in place in memory: it overwrites the first byte of the add instruction with int 3 (0xCC) and leaves the instruction's remaining bytes behind.

B8 05 00 00 00      mov eax,5  
CC                  int 3  
C0 02               ; Effectively garbage  
C3                  ret

When the processor executes int 3, it raises the breakpoint exception, and the operating system notifies the debugger attached to the application. The debugger lets you know the breakpoint was hit and lets you do your thing. Before you resume the program, it puts the original byte back and makes sure execution restarts at the beginning of the restored instruction.
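At its core, the patch-and-restore dance above is just saving one byte and writing another. Here's a minimal sketch that simulates it on an ordinary byte array holding the function's bytes from the listing above, rather than on live code, so no page protections or cache flushes are involved. The `breakpoint` struct and the `breakpoint_set`/`breakpoint_clear` helpers are hypothetical names for illustration.

```c
#define INT3_OPCODE 0xCC

/* Hypothetical bookkeeping for one software breakpoint. */
struct breakpoint {
    unsigned char *address;  /* first byte of the instruction we patched */
    unsigned char saved;     /* the original byte that int 3 replaced */
};

/* Save the instruction's first byte, then overwrite it with int 3. */
static void breakpoint_set(struct breakpoint *bp, unsigned char *address)
{
    bp->address = address;
    bp->saved = *address;
    *address = INT3_OPCODE;
}

/* Put the original byte back so the instruction executes normally again. */
static void breakpoint_clear(const struct breakpoint *bp)
{
    *bp->address = bp->saved;
}

/* mov eax,5 / add eax,2 / ret, as in the listing above. */
static unsigned char add_function[] = {
    0xB8, 0x05, 0x00, 0x00, 0x00,  /* mov eax,5 */
    0x83, 0xC0, 0x02,              /* add eax,2 */
    0xC3                           /* ret */
};
```

Setting a breakpoint at offset 5 (the add) leaves 0xCC followed by the untouched 0xC0 0x02, matching the second listing; clearing it brings back 0x83.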

Unfortunately, the mechanics of rewriting the instruction are out of scope for this post, and plenty of tutorials on the internet cover them. If you're comfortable reading reference material, the general process is:

Make the memory pages holding the code writable
Save the original instruction byte and write int 3 over it
Flush the instruction cache
Disable writing on those pages again

There's a bit more to it, like getting the application into a suspended state, but that should get you started.

Why int 3 is special

What is so special about int 3? As you may be able to guess, the int assembly instruction simply causes the processor to call the given interrupt.

int 3 is extra special though. Normally, an int instruction is two bytes: 0xCD 0x03. However, the x86 instruction set has a special opcode specifically for int 3 which is only one byte: 0xCC. Intel's x86 developer's manual describes it as "Interrupt 3—trap to debugger."

This makes it ideal for replacing other instructions, because some instructions (like ret) are only one byte long. If we replaced ret (0xC3) with a two-byte instruction, we would overwrite unrelated code that follows the ret (probably the prologue of another function).
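To make the size argument concrete, here's a small simulation on a byte array (not live code): a ret ending one function, followed by push ebp (0x55), a typical first byte of the next function's prologue. The two-byte code is an assumption chosen for illustration, and `patch_ret` is a hypothetical helper. Writing the one-byte 0xCC over the ret stays inside that instruction, while the two-byte 0xCD 0x03 form spills into the neighbor.

```c
#include <string.h>

/* int 3 has a dedicated one-byte encoding... */
static const unsigned char INT3_ONE_BYTE[] = { 0xCC };
/* ...and the generic "int imm8" encoding: opcode 0xCD, then the interrupt number. */
static const unsigned char INT3_TWO_BYTE[] = { 0xCD, 0x03 };

/* Simulated code: a ret (0xC3) ending one function, then push ebp (0x55) starting the next. */
static const unsigned char original_code[] = { 0xC3, 0x55 };

/* Copy the original bytes into out, then write a breakpoint over the ret at offset 0. */
static void patch_ret(unsigned char out[2], const unsigned char *patch, size_t patch_len)
{
    memcpy(out, original_code, sizeof original_code);
    memcpy(out, patch, patch_len);
}
```

With INT3_ONE_BYTE the buffer becomes CC 55 and the next function is intact; with INT3_TWO_BYTE it becomes CD 03 and the push ebp is gone.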

Why I couldn't use int 3

Unfortunately, for my particular use case, I couldn't use int 3! Why? For this specific project, we needed to be able to use the Visual Studio debugger on the program as it ran. In Windows environments, an attached debugger gets the first look at an int 3 before any handler inside the program does, so it always goes straight to the debugger. You can't even intercept it with a vectored exception handler!
(If a debugger isn't attached though, you can capture the int 3 as an EXCEPTION_BREAKPOINT exception.)

So I needed a replacement that still met the requirements int 3 meets for debuggers:

Must be a one-byte instruction*
Must transfer execution to a specific point we control (our handler)
Must let us resume execution normally afterwards

Quite the tall order!
*Or a multi-byte instruction that, as far as we care, behaves the same no matter what its operand bytes are.

The quest for another method

Long story short, there is actually an instruction that meets these criteria: the hlt instruction (0xF4, one byte)! It is normally used to halt the processor until the next interrupt arrives. However, it is a privileged instruction: when you execute it from unprivileged (user-mode) code, the processor raises a general-protection exception, #GP(0), instead. On Windows, this causes your vectored exception handler to fire with an EXCEPTION_PRIV_INSTRUCTION exception. (On Linux, the fault is delivered to the process as a SIGSEGV signal.)
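The Windows sample at the end of this post handles EXCEPTION_PRIV_INSTRUCTION in a vectored exception handler. For comparison, here's a minimal sketch of the same "trap on hlt and skip it" idea on Linux, assuming x86-64 and glibc: a SIGSEGV handler installed with `sigaction` and `SA_SIGINFO` checks that the faulting instruction is 0xF4, then bumps the saved RIP past it. `install_hlt_handler`, `hit_breakpoint`, and `hlt_hits` are hypothetical names.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <ucontext.h>

#ifndef REG_RIP
#define REG_RIP 16  /* glibc's index of RIP in gregs[] on x86-64 */
#endif

#define HLT_OPCODE 0xF4

/* How many hlt "breakpoints" the handler has swallowed (hypothetical counter). */
static volatile sig_atomic_t hlt_hits = 0;

static void on_sigsegv(int sig, siginfo_t *info, void *context)
{
    ucontext_t *uc = context;
    /* #GP is a fault, so RIP still points at the instruction that caused it. */
    unsigned char *ip = (unsigned char *)uc->uc_mcontext.gregs[REG_RIP];
    (void)info;

    if (*ip != HLT_OPCODE) {
        /* Not a hlt: restore the default action so the fault recurs and kills the process. */
        signal(sig, SIG_DFL);
        return;
    }

    hlt_hits++;
    uc->uc_mcontext.gregs[REG_RIP] += 1;  /* skip the one-byte hlt, as the Windows sample does */
}

static int install_hlt_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigsegv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* A compile-time "breakpoint", like the hlt placed in the Windows sample's thread loop. */
static void hit_breakpoint(void)
{
    __asm__ volatile("hlt");
}
```

After `install_hlt_handler()`, each call to `hit_breakpoint()` bumps `hlt_hits` and returns normally. Dereferencing RIP is only safe here because every SIGSEGV in this toy comes from valid code; a handler in a larger program should first compare RIP against its own breakpoint addresses.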

So if you implement breakpoints exactly like a debugger would, but with the hlt instruction, you can coexist with a real debugger that is also patching your code.
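Putting the pieces together, here's a hedged sketch of a tiny but real hlt breakpoint, again on x86-64 Linux with glibc rather than Windows. It copies the mov/add/ret function from the start of this post into a page we map ourselves, overwrites the first byte of the add with 0xF4, and lets a SIGSEGV handler restore the original byte. Because #GP is a fault, the saved RIP still points at the patched address, so simply returning from the handler re-executes the restored add. The names (`hlt_breakpoint_set`, `bp_address`, and so on) are hypothetical, the breakpoint is one-shot (re-arming it would need a single-step first, as debuggers do), and for brevity the page is mapped read/write/execute instead of toggling protections as in the general process described earlier.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <ucontext.h>
#include <unistd.h>

#ifndef REG_RIP
#define REG_RIP 16  /* glibc's index of RIP in gregs[] on x86-64 */
#endif

#define HLT_OPCODE 0xF4

/* Hypothetical bookkeeping for a single one-shot breakpoint. */
static unsigned char *bp_address;
static unsigned char bp_saved;
static volatile sig_atomic_t bp_hits;

static void on_sigsegv(int sig, siginfo_t *info, void *context)
{
    ucontext_t *uc = context;
    unsigned char *ip = (unsigned char *)uc->uc_mcontext.gregs[REG_RIP];
    (void)info;

    /* Only handle faults raised by our own hlt; anything else gets the default action. */
    if (ip != bp_address || *ip != HLT_OPCODE) {
        signal(sig, SIG_DFL);
        return;
    }

    bp_hits++;
    *ip = bp_saved;  /* restore the original instruction byte */
    /* RIP already points at ip, so returning re-executes the restored instruction. */
}

static int install_breakpoint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigsegv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Save the byte at address and replace it with hlt. */
static void hlt_breakpoint_set(unsigned char *address)
{
    bp_address = address;
    bp_saved = *address;
    *address = HLT_OPCODE;
}

/* mov eax,5 / add eax,2 / ret, from the listing at the top of this post. */
static const unsigned char add_function_bytes[] = {
    0xB8, 0x05, 0x00, 0x00, 0x00, 0x83, 0xC0, 0x02, 0xC3
};

/* Copy the function into a fresh read/write/execute page; returns NULL on failure. */
static unsigned char *load_add_function(void)
{
    void *page = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE),
                      PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    memcpy(page, add_function_bytes, sizeof add_function_bytes);
    return page;
}
```

With the breakpoint set at offset 5, calling the loaded function still returns 7: the handler fires once, puts 0x83 back, and the add runs as if nothing happened.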

Is this a hack? Yes. Does it work? Yes. If (for whatever reason) you find yourself wanting breakpoint-like functionality without using the 0xCC instruction, now you know how! Uhh... but don't try it in a kernel-mode driver. In kernel mode, hlt is a legal instruction, so instead of faulting it really would halt the processor until the next interrupt. That'd probably not go well.

Of course, there is the theoretical problem of what happens when you and the debugger start fighting over a specific instruction, but we weren't too concerned about this situation in our case.

Sample code

Here's a simple example of this concept in use. In this example, I left out all of the debugger-like stuff in favor of having the hlt instructions placed at compile-time.

This sample simulates round-robin cooperative thread-switching in a Windows application.

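//A simple example of using the hlt instruction to trigger a context switch
//NOTE: You should never actually use this as a method of thread cooperation.
//This is an example from https://blog.pathogenstudios.com/rolling-your-own-breakpoints-without-0xcc/
//Builds with MSVC for 32-bit (x86) or 64-bit (x64) Windows.
#include <stdio.h>
#include <Windows.h>
#include <intrin.h> //for __halt()
#include <assert.h>

#define NUM_THREADS 2
static HANDLE threads[NUM_THREADS];
static int currentThread = 0;

#define HLT_INSTRUCTION 0xF4

//The instruction pointer in CONTEXT is Eip on x86 and Rip on x64:
#ifdef _M_X64
#define INSTRUCTION_POINTER(context) ((context)->Rip)
#else
#define INSTRUCTION_POINTER(context) ((context)->Eip)
#endif

LONG CALLBACK VectoredHandler(PEXCEPTION_POINTERS exceptionInfo)
{
    PCONTEXT context = exceptionInfo->ContextRecord;

    //If the exception isn't EXCEPTION_PRIV_INSTRUCTION, or the instruction that caused the exception isn't HLT, we don't do anything:
    if (exceptionInfo->ExceptionRecord->ExceptionCode != EXCEPTION_PRIV_INSTRUCTION
        || *(const unsigned char*)INSTRUCTION_POINTER(context) != HLT_INSTRUCTION
    )
    {
        return EXCEPTION_CONTINUE_SEARCH;
    }

    //Advance past the one-byte hlt instruction: (Normally, you'd just restore the old instruction instead.)
    INSTRUCTION_POINTER(context)++;

    //If this was a HLT, we do a "task switch": wake the next thread, then suspend this one.
    //(The handler runs on the thread that executed hlt, so SuspendThread suspends the handler
    //itself until another thread resumes it.)
    int previousThread = currentThread;
    currentThread++;
    if (currentThread >= NUM_THREADS) { currentThread = 0; }
    ResumeThread(threads[currentThread]);
    SuspendThread(threads[previousThread]);

    return EXCEPTION_CONTINUE_EXECUTION;
}

//Matches LPTHREAD_START_ROUTINE exactly, including the WINAPI calling convention, so no cast is needed.
DWORD WINAPI ThreadMain(LPVOID parameter)
{
    int threadNumber = (int)(INT_PTR)parameter;
    while (1)
    {
        printf("Thread %d is running...\n", threadNumber);
        printf("Thread %d is switching away...\n", threadNumber);
        __halt(); //Emits hlt, which faults with EXCEPTION_PRIV_INSTRUCTION in user mode
        printf("Thread %d is waking up!\n", threadNumber);
    }
}

int main(void)
{
    //Register the exception handler (first in the chain):
    PVOID exceptionHandler = AddVectoredExceptionHandler(1, VectoredHandler);
    assert(exceptionHandler != NULL);

    //Create the threads suspended, so none of them run until we resume one:
    for (int i = 0; i < NUM_THREADS; i++)
    {
        threads[i] = CreateThread(NULL, 0, ThreadMain, (LPVOID)(INT_PTR)(i + 1), CREATE_SUSPENDED, NULL);
        assert(threads[i] != NULL);
    }

    //Start the first thread and wait for all threads to exit.
    //Calls with side effects stay outside assert(), which compiles to nothing when NDEBUG is defined.
    //(These threads loop forever, so in practice this waits until the process is killed.)
    DWORD previousSuspendCount = ResumeThread(threads[0]);
    assert(previousSuspendCount == 1);
    DWORD waitResult = WaitForMultipleObjects(NUM_THREADS, threads, TRUE, INFINITE);
    assert(waitResult != WAIT_FAILED);

    //Cleanup:
    for (int i = 0; i < NUM_THREADS; i++)
    {
        BOOL closed = CloseHandle(threads[i]);
        assert(closed);
    }
    ULONG removed = RemoveVectoredExceptionHandler(exceptionHandler);
    assert(removed);
    return 0;
}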

If you run this program, you'll see that the two threads take turns instead of running at the same time: each thread prints its messages, hits hlt, and hands control to the other.

(NOTE: The documentation for SuspendThread says it is meant for debuggers and is not intended for thread synchronization, so you shouldn't ever actually do this. Although SuspendThread is probably safe in this situation, you're much better off using synchronization objects or some other thread cooperation method.)