Three Ways to Lose Data (And How the Outbox Pattern Prevents It)
I first read about the outbox pattern in one of Oskar Dudycz's posts, and I ran into it a second time in a Better Software Design episode. As they say, the third time's a charm: when the go-event-driven training from Three Dots Labs covered it in one of its modules, in theory and in practice, I decided to give it a deeper dive. In this post I'll try to shed some light on when and why the pattern is useful. The examples come from one of the projects built along with that training.
The need for this kind of solution arises when you have to maintain consistency between database writes and event publishing in high-traffic, distributed systems. Along the way I'll explore several of the problems this pattern solves, with examples in Go.
The Business Context
Our ticketing system had a problem. We relied on a third-party service to handle ticket bookings, but it couldn't handle the traffic spikes during popular shows. When a big concert went on sale, the third-party service would go down, and our customers couldn't buy tickets.
The solution seemed straightforward: handle bookings ourselves and notify the ticket provider asynchronously through events. We'd check seat availability, store the booking in our PostgreSQL database, and emit a BookingMade event that other services could consume.
Simple, right? Not quite.
The most critical constraint in our system is preventing overbooking. We can't sell the same seat twice. This means our seat availability check and booking insertion must be atomic - they must succeed or fail together. When you add asynchronous event publishing to this equation, things get tricky fast.
Scenario 1: Commit First, Publish After (The Lost Event)
Let's start with what seems like the most intuitive approach: complete the database transaction first, then publish the event.

Here's what the code might look like:
func (b BookingsRepository) AddBooking(ctx context.Context, booking models.Booking) error {
    tx, err := b.db.BeginTxx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
    if err != nil {
        return err
    }
    defer tx.Rollback()

    _, err = tx.ExecContext(ctx, `
        INSERT INTO bookings (booking_id, show_id, number_of_tickets, customer_email)
        VALUES ($1, $2, $3, $4)
    `, booking.BookingID, booking.ShowID, booking.NumberOfTickets, booking.CustomerEmail)
    if err != nil {
        return err
    }

    // The SUM already includes the booking we just inserted,
    // so a negative remainder means we've oversold.
    var remainingSeats int
    err = tx.QueryRowContext(ctx, `
        SELECT (s.number_of_tickets - COALESCE(SUM(b.number_of_tickets), 0))
        FROM shows s
        LEFT JOIN bookings b ON s.show_id = b.show_id
        WHERE s.show_id = $1
        GROUP BY s.number_of_tickets
    `, booking.ShowID).Scan(&remainingSeats)
    if err != nil {
        return err
    }
    if remainingSeats < 0 {
        return ErrNoPlacesLeft
    }

    if err := tx.Commit(); err != nil {
        return err
    }

    // 💥 THE BOOM: if the service dies here, the event is never published
    event := BookingMade{
        BookingID:  booking.BookingID,
        ShowID:     booking.ShowID,
        NumTickets: booking.NumberOfTickets,
    }
    return b.publisher.Publish(ctx, event)
}
What Goes Wrong
You can easily guess what happens: the transaction is committed to the database, but the event is never published. If the service crashes, loses its network connection, or is killed by the orchestrator for any reason between the commit and the publish, we have a persisted booking in the database, but the event is lost forever and never reaches any listener.
The outcome:
- The booking exists in your database
- Your customer sees a successful purchase
- But the third-party service never receives the event
- No ticket is generated
- The customer shows up at the venue with no valid ticket
Why It's Dangerous
This is a silent failure. No error is thrown, no alert fires, and you might not discover the problem until customers start complaining. By then, the concert might be over.
Scenario 2: Publish First, Commit After (The Phantom Event)
Maybe the solution is to reverse the order? Publish the event first, then commit the transaction.

The code would look like this:
func (b BookingsRepository) AddBooking(ctx context.Context, booking models.Booking) error {
    tx, err := b.db.BeginTxx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
    if err != nil {
        return err
    }
    defer tx.Rollback()

    _, err = tx.ExecContext(ctx, `
        INSERT INTO bookings (booking_id, show_id, number_of_tickets, customer_email)
        VALUES ($1, $2, $3, $4)
    `, booking.BookingID, booking.ShowID, booking.NumberOfTickets, booking.CustomerEmail)
    if err != nil {
        return err
    }

    var remainingSeats int
    err = tx.QueryRowContext(ctx, /* ... */).Scan(&remainingSeats)
    if err != nil {
        return err
    }
    if remainingSeats < 0 {
        return ErrNoPlacesLeft
    }

    event := BookingMade{
        BookingID:  booking.BookingID,
        ShowID:     booking.ShowID,
        NumTickets: booking.NumberOfTickets,
    }
    if err := b.publisher.Publish(ctx, event); err != nil {
        return err
    }

    // 💥 THE BOOM: if the service dies here, the commit never happens
    return tx.Commit()
}
What Goes Wrong
Now the event is published and subscribers start processing it, but the transaction has not been committed yet. If the service crashes before tx.Commit() succeeds, the transaction is rolled back and the booking is never persisted in the database.
The outcome:
This is worse than Scenario 1. Here's what happens:
- The third-party service receives the BookingMade event
- They generate and email a ticket to the customer
- The customer thinks they have successfully booked a seat
- But the booking doesn't exist in your database
- Your system thinks the seat is still available
- Another customer successfully books the same seat
- Two people, same seat, KISS concert confrontation
This violates your core business constraint: no overbooking.
Why It's Worse
Not only does this cause customer frustration, but it's also harder to detect. The event made it through the system. Tickets were generated. Everyone thinks everything worked. The race condition only becomes apparent when two people show up to the same seat.
Scenario 3: The Network Partition (The Zombie Event)
Let's say you've been careful and avoided the crash scenarios. You commit first, and your service doesn't crash. But distributed systems have another trick up their sleeve: network partitions.
func (b BookingsRepository) AddBooking(ctx context.Context, booking models.Booking) error {
    tx, err := b.db.BeginTxx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // Insert booking and check availability
    if err := tx.Commit(); err != nil {
        return err
    }

    event := BookingMade{...}
    err = b.publisher.Publish(ctx, event)
    // 💥 Network timeout! Did it succeed or fail?
    if err != nil {
        // What do we return here?
        return err
    }
    return nil
}
What Goes Wrong
Network issues create uncertainty:
- The publish call times out
- Did the event reach the pub/sub? Maybe.
- Should you retry? If you retry and the first attempt succeeded, you'll publish duplicates.
- Should you not retry? Then you're back to Scenario 1 - lost events.
- What do you return to the user? The booking succeeded, but the event status is unknown.
The Dilemma
This case is called a zombie event. The event might never have been published, or it might have been published successfully with the response lost along the way. The biggest twist is that we do not know for certain which one happened.
The Outbox Pattern: A Solution to All Three Scenarios
All three scenarios above share a common root cause: you're trying to make two separate systems (your database and your message broker) consistent with each other, but you can't wrap them in a single transaction.
The Outbox Pattern solves this by using a simple but powerful principle:
Store both your business data and your events in the same database, in the same transaction. Then, have a separate process forward the events from the database to your message broker.
Here's how it works:

The two-phase approach:
- Write Phase: The service stores both the booking and the event in PostgreSQL within a single transaction
- Forward Phase: A separate forwarder process reads events from the database and publishes them to the message broker
Now, let's see how this is implemented in the real ticketing system using the Watermill library.
The Real Implementation
1. The Booking Repository
Here's the actual production code that handles booking creation:
func (b BookingsRepository) AddBooking(ctx context.Context, booking models.Booking) error {
    return updateInTx(
        ctx,
        b.db,
        sql.LevelSerializable, // Strongest isolation level to prevent race conditions
        func(ctx context.Context, tx *sqlx.Tx) error {
            _, err := tx.NamedExecContext(ctx, `
                INSERT INTO
                    bookings (booking_id, show_id, number_of_tickets, customer_email)
                VALUES
                    (:booking_id, :show_id, :number_of_tickets, :customer_email)
            `, booking)
            if err != nil {
                var pqErr *pq.Error
                if errors.As(err, &pqErr) && pqErr.Code == "23505" {
                    return ErrBookingAlreadyExists
                }
                return fmt.Errorf("could not add booking: %w", err)
            }

            // Check available seats
            availableSeats := 0
            err = tx.GetContext(ctx, &availableSeats, `
                SELECT
                    number_of_tickets AS available_seats
                FROM
                    shows
                WHERE
                    show_id = $1
            `, booking.ShowID)
            if err != nil {
                return fmt.Errorf("could not get available seats: %w", err)
            }

            alreadyBookedSeats := 0
            err = tx.GetContext(ctx, &alreadyBookedSeats, `
                SELECT
                    COALESCE(SUM(number_of_tickets), 0) AS already_booked_seats
                FROM
                    bookings
                WHERE
                    show_id = $1
            `, booking.ShowID)
            if err != nil {
                return fmt.Errorf("could not get already booked seats: %w", err)
            }

            // Verify we're not overbooking. The SUM above already includes
            // the booking we just inserted, so a negative remainder means
            // we've sold too many seats.
            remainingSeats := availableSeats - alreadyBookedSeats
            if remainingSeats < 0 {
                return ErrNoPlacesLeft
            }

            // HERE'S THE KEY: create an outbox publisher that writes to the database
            outboxPublisher, err := NewOutboxPublisher(tx, b.logger)
            if err != nil {
                return fmt.Errorf("could not create outbox publisher: %w", err)
            }
            eventBus := handler.NewEventBus(outboxPublisher)

            bookingMadeEvent := events.NewBookingMade(
                booking.BookingID,
                booking.CustomerEmail,
                booking.ShowID,
                booking.NumberOfTickets,
                booking.BookingID,
            )
            err = eventBus.Publish(ctx, bookingMadeEvent)
            if err != nil {
                return fmt.Errorf("could not publish BookingMade event: %w", err)
            }

            // If we return nil, the entire transaction commits:
            // - The booking is stored
            // - The event is stored in the watermill_messages table
            // Both atomically!
            return nil
        },
    )
}
A couple of things are worth pointing out. I use sql.LevelSerializable, the strongest isolation level, to prevent race conditions. The outbox publisher is created within the transaction, so when the event is published to the event bus via eventBus.Publish(), it is written to a database table, not to the message broker. The booking and the event are inserted together in one atomic commit.
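The updateInTx helper isn't shown in the project excerpt above, so here's a minimal sketch of what such a helper could look like. The signature and behavior are my assumptions, not the training project's actual code:
import (
    "context"
    "database/sql"
    "fmt"

    "github.com/jmoiron/sqlx"
)

// updateInTx (hypothetical sketch): opens a transaction at the given
// isolation level, runs fn inside it, and commits only if fn returns nil;
// any error triggers a rollback via the deferred function.
func updateInTx(
    ctx context.Context,
    db *sqlx.DB,
    isolation sql.IsolationLevel,
    fn func(ctx context.Context, tx *sqlx.Tx) error,
) (err error) {
    tx, err := db.BeginTxx(ctx, &sql.TxOptions{Isolation: isolation})
    if err != nil {
        return fmt.Errorf("could not begin transaction: %w", err)
    }
    defer func() {
        if err != nil {
            // Roll back on any error; the rollback error is secondary.
            _ = tx.Rollback()
        }
    }()

    if err = fn(ctx, tx); err != nil {
        return err
    }
    return tx.Commit()
}
One note on serializable isolation: PostgreSQL may abort such a transaction with a serialization failure (SQLSTATE 40001) under contention, so a production-grade helper would likely retry the whole closure in that case.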
2. The Outbox Publisher
Here's how the outbox publisher is set up:
func NewOutboxPublisher(tx *sqlx.Tx, logger watermill.LoggerAdapter) (message.Publisher, error) {
    var publisher message.Publisher

    sqlPublisher, err := watermillSQL.NewPublisher(
        tx,
        watermillSQL.PublisherConfig{
            SchemaAdapter: watermillSQL.DefaultPostgreSQLSchema{},
        },
        logger,
    )
    if err != nil {
        return nil, err
    }

    // Decorate the SQL publisher with correlation IDs and tracing
    publisher = log.CorrelationPublisherDecorator{Publisher: sqlPublisher}
    publisher = tracing.NewTracePublisher(publisher)

    // Wrap each message in a forwarder envelope addressed to the outbox topic
    publisher = forwarder.NewPublisher(publisher, forwarder.PublisherConfig{
        ForwarderTopic: OutboxTopic,
    })

    // Decorate again at the envelope level
    publisher = log.CorrelationPublisherDecorator{Publisher: publisher}
    publisher = tracing.NewTracePublisher(publisher)

    return publisher, nil
}
As shown above, the sqlPublisher writes messages to a database table, and the decorator pattern adds correlation IDs and tracing around it. At the end, the forwarder publisher wraps each message in an envelope addressed to one special topic, "events_to_forward".
This all happens within the same transaction that handles business data.
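The log.CorrelationPublisherDecorator and tracing.NewTracePublisher used here are project-specific helpers, but the decorator idea generalizes. As a rough illustration (the type, the metadata key, and the fallback behavior are my own assumptions, not code from the project):
import "github.com/ThreeDotsLabs/watermill/message"

// correlationDecorator is an illustrative publisher decorator: it stamps a
// correlation ID into each message's metadata before delegating to the
// wrapped publisher.
type correlationDecorator struct {
    wrapped message.Publisher
}

func (d correlationDecorator) Publish(topic string, messages ...*message.Message) error {
    for _, msg := range messages {
        if msg.Metadata.Get("correlation_id") == "" {
            // Assumed convention: fall back to the message UUID when no
            // correlation ID was set upstream.
            msg.Metadata.Set("correlation_id", msg.UUID)
        }
    }
    return d.wrapped.Publish(topic, messages...)
}

func (d correlationDecorator) Close() error {
    return d.wrapped.Close()
}
Because every decorator satisfies message.Publisher, they can be stacked in any order - which is exactly what NewOutboxPublisher does twice, once around the SQL publisher and once around the envelope publisher.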
Behind the scenes, the outbox publisher writes to a table that looks like:
CREATE TABLE watermill_messages (
    uuid       VARCHAR(36) NOT NULL PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    payload    BYTEA,
    metadata   JSONB,
    topic      VARCHAR(255) NOT NULL
);
3. The Forwarder
A separate process reads from the outbox table and forwards messages to the message broker - in our case, Redis:
const OutboxTopic = "events_to_forward"

func NewForwarder(
    db *sqlx.DB,
    publisher message.Publisher,
    logger watermill.LoggerAdapter,
) (*forwarder.Forwarder, error) {
    sqlSubscriber, err := watermillSQL.NewSubscriber(
        db,
        watermillSQL.SubscriberConfig{
            SchemaAdapter:    watermillSQL.DefaultPostgreSQLSchema{},
            OffsetsAdapter:   watermillSQL.DefaultPostgreSQLOffsetsAdapter{},
            InitializeSchema: true,
        },
        logger,
    )
    if err != nil {
        return nil, err
    }

    err = sqlSubscriber.SubscribeInitialize(OutboxTopic)
    if err != nil {
        return nil, err
    }

    fwd, err := forwarder.NewForwarder(
        sqlSubscriber,
        publisher,
        logger,
        forwarder.Config{
            ForwarderTopic: OutboxTopic,
        },
    )
    if err != nil {
        return nil, err
    }
    return fwd, nil
}
Look closely at this factory function: the publisher passed in now publishes to Redis, not to the database as before. The sqlSubscriber reads events from the database table, so the forwarder uses the database as its source and Redis (or any other pub/sub) as its destination. The forwarder runs independently of the booking request lifecycle, continuously polling the database for new events.
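Wiring this up is then a matter of constructing the forwarder at startup and letting it run for the lifetime of the service. A minimal sketch, assuming the Redis publisher comes from watermill-redisstream (the address and client setup here are illustrative):
import (
    "context"

    "github.com/ThreeDotsLabs/watermill"
    "github.com/ThreeDotsLabs/watermill-redisstream/pkg/redisstream"
    "github.com/jmoiron/sqlx"
    "github.com/redis/go-redis/v9"
)

func runForwarder(ctx context.Context, db *sqlx.DB, logger watermill.LoggerAdapter) error {
    // Publish forwarded messages to Redis; any message.Publisher would do here.
    redisClient := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
    redisPublisher, err := redisstream.NewPublisher(redisstream.PublisherConfig{
        Client: redisClient,
    }, logger)
    if err != nil {
        return err
    }

    fwd, err := NewForwarder(db, redisPublisher, logger)
    if err != nil {
        return err
    }

    // Run blocks until ctx is cancelled, polling the outbox table and
    // republishing each enveloped message to its original topic.
    return fwd.Run(ctx)
}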
How the Outbox Solves Each Scenario
Now that we've covered the implementation details of the pattern, let's look back at the scenarios we hoped it would solve.
Scenario 1 Solved: No More Lost Events
Original Problem: Service commits the transaction, then crashes before publishing the event.
With Outbox:
- The service commits both the booking and the event to PostgreSQL in one transaction
- If the service crashes after the commit, both pieces of data are safely persisted
- The forwarder process (running independently) will pick up the event and publish it
- Even if the forwarder is down temporarily, the event waits in the database
- When the forwarder restarts, it resumes from its last offset and publishes pending events
Result: No lost events. Ever.
Scenario 2 Solved: No More Phantom Events
Original Problem: Service publishes the event, then crashes before committing, causing overbooking.
With Outbox:
- Events are stored in the same PostgreSQL transaction as the booking
- If the transaction rolls back (due to crash, validation failure, or any error), the event is also rolled back
- The event never makes it to the watermill_messages table
- The forwarder never sees it
- No phantom event reaches third-party services
Result: No phantom events. No overbooking. Data stays consistent.
Scenario 3 Solved: No More Uncertainty
Original Problem: Network partition during event publishing creates uncertainty - did it succeed or fail?
With Outbox:
- Network issues between your service and the message broker don't matter during the write phase
- The service only writes to PostgreSQL (local, reliable)
- The transaction either commits or it doesn't - there's no ambiguity
- If network issues occur during the forward phase (forwarder → Redis), the forwarder simply retries
- The offset tracker ensures the forwarder knows exactly which messages have been successfully forwarded
Result: No uncertainty. Clear, deterministic behavior.
There's One Catch: Idempotency
Notice in the diagram earlier that if the forwarder crashes after publishing to Redis but before marking the message as forwarded, it will republish the message when it restarts:

This means your event handlers must be idempotent - they must handle the same event multiple times without causing incorrect behavior. This is a standard requirement for distributed systems with at-least-once delivery guarantees.
For example:
func (h *BookingHandler) HandleBookingMade(event BookingMade) error {
    // Check whether we've handled this event before
    exists, err := h.repo.BookingProcessed(event.BookingID)
    if err != nil {
        return err
    }
    if exists {
        // Already processed, safe to skip
        return nil
    }

    err = h.thirdPartyService.NotifyBooking(event)
    if err != nil {
        return err
    }

    // Record the booking as processed so a redelivery becomes a no-op
    return h.repo.MarkBookingProcessed(event.BookingID)
}
Trade-offs and Considerations
The outbox pattern isn't free. Here's what you gain and what you give up:
Benefits
- Atomic writes: Your data and events are truly consistent. Either both are saved or neither is.
- No lost events: Events are persisted durably before any async processing happens.
- No phantom events: If the transaction fails, the event doesn't exist.
- Built-in crash recovery: The forwarder automatically resumes from where it left off.
- Clear error handling: Errors during the write phase cause transaction rollback. Errors during forwarding are retried.
Trade-offs
- Additional complexity: You need to run and monitor a forwarder process.
- Eventual consistency: There's a delay between committing the transaction and the event reaching subscribers. This is usually milliseconds, but it's not instant.
- Requires idempotent handlers: Your event consumers must handle duplicate events gracefully.
- Database is temporary event store: Your PostgreSQL database now stores events temporarily. You'll need to clean up old messages periodically (see the sketch after this list).
- Transaction isolation matters: The ticketing system uses sql.LevelSerializable to prevent race conditions during seat checks. This can impact throughput under high concurrency.
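For the cleanup mentioned above, a small retention job can prune old rows from the outbox table. This is a hypothetical sketch - the table name comes from the schema shown earlier, while the retention window and the assumption that the forwarder never lags longer than that window are mine:
import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/jmoiron/sqlx"
)

// cleanupOutbox deletes outbox messages older than the retention window.
// Offsets are tracked in a separate table, so rows the forwarder has long
// since consumed can be dropped; a stricter variant would also compare
// against the forwarder's current offset before deleting.
func cleanupOutbox(ctx context.Context, db *sqlx.DB, retention time.Duration) error {
    res, err := db.ExecContext(ctx, `
        DELETE FROM watermill_messages
        WHERE created_at < $1
    `, time.Now().Add(-retention))
    if err != nil {
        return fmt.Errorf("could not clean up outbox: %w", err)
    }
    if n, err := res.RowsAffected(); err == nil {
        log.Printf("outbox cleanup: removed %d forwarded messages", n)
    }
    return nil
}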
Implementation Details Worth Noting
Offset Tracking: The forwarder uses DefaultPostgreSQLOffsetsAdapter to track which messages have been forwarded. This is stored in a separate table:
CREATE TABLE watermill_offsets (
    topic          VARCHAR(255) NOT NULL,
    consumer_group VARCHAR(255) NOT NULL,
    offset_value   BIGINT NOT NULL,
    PRIMARY KEY (topic, consumer_group)
);
Decorators for Observability: The decorator pattern adds correlation IDs and OpenTelemetry traces at both the message level and the envelope level, giving you end-to-end visibility:
- When a booking request comes in, it gets a correlation ID
- This ID is attached to the database transaction
- It's embedded in the event payload
- It's forwarded through the envelope
- You can trace the entire flow from HTTP request → database → outbox → forwarder → Redis → event handler
Schema Auto-initialization: The forwarder is configured with InitializeSchema: true, so it automatically creates the necessary tables on startup. This simplifies deployment but means your application needs DDL permissions.
Conclusion
The Outbox Pattern is a powerful solution for maintaining consistency between database writes and event publishing in distributed systems. By storing both business data and events in the same database transaction, you eliminate the three critical failure modes we explored:
- Lost Events - When services crash after committing but before publishing
- Phantom Events - When events are published but transactions roll back, leading to data inconsistencies
- Zombie Events - When network partitions create uncertainty about whether events were delivered
While the pattern introduces additional complexity through the forwarder process and requires idempotent event handlers, these trade-offs are worthwhile for systems where data consistency is critical. In our ticketing system example, preventing overbooking is non-negotiable, making the Outbox Pattern an essential architectural choice.
The implementation using Watermill demonstrates that this pattern can be applied practically in Go applications with relatively straightforward code. The key insight is simple but powerful: if you can't make two systems transactional together, make one system the source of truth and derive the other from it asynchronously.
Whether you're building a ticketing system, an e-commerce platform, or any distributed system where events must reliably reflect state changes, the Outbox Pattern provides a battle-tested approach to achieving consistency without sacrificing reliability.