
The Flaky Test Chronicles III: The Determinism Principle

Time, Randomness, and Predictable Test Data

December 26th ʼ25 · 16 min · 3030 words

The test passed at 11:59 PM. It failed at midnight. Same code. Same data. Different day.

You run the test. It passes. You run it again. It fails. You run it five more times. Three passes, two failures. You haven't changed anything. The code is the same. The data is the same. But something is different.

In Part 2: Mock Madness, we conquered Mockery's alias and overload mocks. Now we face the enemies of determinism - the forces that make tests behave differently on each run:

  • Time - milliseconds pass between operations

  • Randomness - Faker values change every run

  • Data order - database records with identical timestamps

  • Factory states - implicit defaults that vary


The Timestamp Trap

The most insidious flaky tests are time-related. The code works correctly. The test looks reasonable. It passes locally. It fails in CI at random.

You spend hours debugging. Then you realize: millisecond differences.

#[Test]
public function it_reports_overdue_order(): void
{
    $order = Order::factory()->create();

    $expectedPayload = OrderReportDTO::fromOrder($order);

    ProcessOrderJob::dispatchSync($order->id);

    // FAILS: payload timestamps differ by milliseconds
    $this->assertDatabaseHas('order_reports', [
        'payload' => $expectedPayload->toString(),
    ]);
}

When $expectedPayload is built, the timestamp is X. When the job runs, the timestamp is Y. The difference is 3 milliseconds, so the JSON payloads don't match and the test fails.
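The drift is easy to reproduce outside Laravel. Here is a minimal sketch in plain PHP, where buildPayload() is a hypothetical stand-in for whatever the DTO does when it serializes a timestamp:

```php
<?php

// Hypothetical stand-in for a DTO that embeds a timestamp when serialized.
function buildPayload(int $orderId, DateTimeImmutable $generatedAt): string
{
    return json_encode([
        'order_id'     => $orderId,
        'generated_at' => $generatedAt->format('Y-m-d\TH:i:s.u'),
    ]);
}

// Unfrozen: each side reads the clock on its own.
$expected = buildPayload(1, new DateTimeImmutable());
usleep(3000); // roughly the few milliseconds between test setup and the job
$actual = buildPayload(1, new DateTimeImmutable());

var_dump($expected === $actual); // bool(false)

// "Frozen": both sides share one instant, so the payloads match.
$frozen = new DateTimeImmutable('2025-12-15 12:51:00');
var_dump(buildPayload(1, $frozen) === buildPayload(1, $frozen)); // bool(true)
```

The second half is all that freezing time does for you: every reader of the clock sees the same instant.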


When to Freeze Time

Situations where you need to freeze time:

  1. Payload/DTO comparisons - Timestamps in serialized data

  2. Business day calculations - When day of week matters

  3. Date-dependent assertions - “Created today”, “Expires in 30 days”

  4. Scheduling tests - Cron jobs, delayed jobs


The Five Methods of Time Control

Laravel provides powerful tools for controlling time.

Method 1: Simple Freeze

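The most basic approach. travelTo() freezes now() at the given instant, and startOfMinute() zeroes out the seconds and milliseconds so that stored and in-memory timestamps compare equal.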

$this->travelTo(now()->startOfMinute());

// All now() calls return the frozen time
$order = Order::factory()->create();

$this->assertEquals(now(), $order->created_at);

Method 2: Freeze to Specific Day

When day of week matters:

// Freeze to next Monday for day-of-week dependent tests
$this->travelTo(now()->next('Monday'));

$paymentDate = $account->getNextPaymentDate();

$this->assertEquals(now()->next('Wednesday'), $paymentDate);
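Why pinning the weekday helps can be seen with plain PHP dates. This sketch assumes an arbitrary fixed base date in place of a frozen now(); nothing in it is Laravel-specific:

```php
<?php

// Arbitrary fixed base date (a Friday), standing in for a frozen "now".
$base = new DateTimeImmutable('2025-12-26 00:00:00', new DateTimeZone('UTC'));

$monday    = $base->modify('next monday');
$wednesday = $monday->modify('next wednesday');

echo $monday->format('l Y-m-d'), PHP_EOL;    // Monday 2025-12-29
echo $wednesday->format('l Y-m-d'), PHP_EOL; // Wednesday 2025-12-31
```

Once the starting point is a known Monday, "next Wednesday" is always two days later, whatever day the suite actually runs on.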

Method 3: Freeze with Callback

Time is frozen only within the callback:

// Time is frozen only within the callback
$this->travelTo(now()->subDay(), function () {
    $this->getJson('middleware-test-route')->assertOk();
});

// Time is back to normal here

Method 4: Travel Back Explicitly

When you want to handle cleanup yourself:

$this->travelTo(now()->next('Monday'));

// ... test logic ...

$this->travelBack();  // Restore original time

Method 5: Date::setTestNow()

use Illuminate\Support\Facades\Date;

Date::setTestNow(Date::parse('2025-12-15 12:51:25'));

// ... test logic ...

Date::setTestNow(); // Always reset (see Part 4: The Teardown Tango for tearDown ordering)
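To see why the reset matters, consider a toy global clock. ToyClock below is a hypothetical class written for this illustration; it is not how Carbon or Laravel implement setTestNow(), only a sketch of the same kind of process-wide state:

```php
<?php

// Toy stand-in for a globally overridable clock. This is NOT Carbon's
// implementation, only an illustration of why the override must be reset.
final class ToyClock
{
    private static ?DateTimeImmutable $testNow = null;

    public static function setTestNow(?DateTimeImmutable $now = null): void
    {
        self::$testNow = $now;
    }

    public static function now(): DateTimeImmutable
    {
        return self::$testNow ?? new DateTimeImmutable();
    }
}

// "Test A" freezes the clock and forgets to reset it.
ToyClock::setTestNow(new DateTimeImmutable('2025-12-15 12:51:25'));

// "Test B" runs later in the same process and silently inherits the override.
$leaked = ToyClock::now()->format('Y-m-d H:i:s');
echo $leaked, PHP_EOL; // 2025-12-15 12:51:25

// Resetting restores the real clock for every test that follows.
ToyClock::setTestNow();
$restored = ToyClock::now();
```

Any override stored in static state outlives the test that set it unless something clears it.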

The created_at > Nightmare

Tests can become flaky when code checks “does a newer record exist?” and the test creates multiple related records back-to-back. Millisecond differences between records can flip the boolean result.

Before (Flaky):

// Code under test:
// $hasNewerSubmission = $user->submissions()
//     ->where('created_at', '>', $submission->created_at)
//     ->exists();

$submissions = Submission::factory()
    ->for($user)
    ->sequence(
        ['channel' => ChannelType::Web],
        ['channel' => ChannelType::Mobile],
    )
    ->rejected()
    ->count(2)
    ->create();

// Depending on tiny timestamp differences, the "web" submission
// can be filtered out.

After (Stable):

$createdAt = now()->startOfSecond();

$submissions = Submission::factory()
    ->for($user)
    ->sequence(
        ['channel' => ChannelType::Web],
        ['channel' => ChannelType::Mobile],
    )
    ->rejected()
    ->count(2)
    ->create([
        'created_at' => $createdAt,
        'updated_at' => $createdAt,
    ]);

// Verify all records have the same timestamp
$this->assertCount(
    1,
    $submissions
        ->pluck('created_at')
        ->map(fn ($d) => $d->format('Y-m-d H:i:s'))
        ->unique()
);
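The boolean flip itself can be shown without a database. This is a plain-PHP sketch in which hasNewer() is a hypothetical helper mirroring the created_at > comparison, and the one-second offset assumes timestamps stored at second precision:

```php
<?php

// Plain-PHP version of the "does a newer record exist?" check.
// $current and $others stand in for created_at values.
function hasNewer(DateTimeImmutable $current, array $others): bool
{
    foreach ($others as $other) {
        if ($other > $current) {
            return true;
        }
    }

    return false;
}

$t = new DateTimeImmutable('2025-12-15 12:51:25');

// Back-to-back inserts that happen to straddle a second boundary...
var_dump(hasNewer($t, [$t->modify('+1 second')])); // bool(true)

// ...versus inserts that land in the same second. Which case you get
// depends on timing unless the timestamp is set explicitly.
var_dump(hasNewer($t, [$t])); // bool(false)
```

Passing an explicit shared created_at always puts the test in the second case.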

Real-World Example: Overdue Order

Before (Flaky - timestamp mismatch):

public function it_reports_overdue_order(): void
{
    $order = Order::factory()->create();

    $expectedPayload = OrderReportDTO::fromOrder($order);

    ProcessOrderJob::dispatchSync($order->id);

    // FAILS: payload timestamps differ by milliseconds
    $this->assertDatabaseHas('order_reports', [
        'payload' => $expectedPayload->toString(),
    ]);
}

After (Stable - time frozen):

public function it_reports_overdue_order(): void
{
    // Freeze time at the start of minute for consistent timestamps
    $this->travelTo(now()->startOfMinute());

    $order = Order::factory()->create();

    $expectedPayload = OrderReportDTO::fromOrder($order);

    ProcessOrderJob::dispatchSync($order->id);

    // PASSES: timestamps are identical
    $this->assertDatabaseHas('order_reports', [
        'payload' => $expectedPayload->toString(),
    ]);
}

The Date-Only Illusion

You need to compare a deadline. The API gives you "2025-01-15" as a string. Simple enough - you parse it into a Carbon object and compare. Test passes. Ship it.

Three weeks later, CI starts failing. But only sometimes. And only in the afternoon.

Here's what you didn't realize: when you create a date object with just a date string, the time component depends on how you create it:

// ✅ Safe: These default to 00:00:00
Carbon::parse('2025-01-15');                    // 2025-01-15 00:00:00
Carbon::create(2025, 1, 15);                    // 2025-01-15 00:00:00
new DateTime('2025-01-15');                     // 2025-01-15 00:00:00

// ⚠️ Dangerous: These use CURRENT time!
Carbon::createFromFormat('Y-m-d', '2025-01-15');   // 2025-01-15 14:32:17
DateTime::createFromFormat('Y-m-d', '2025-01-15'); // 2025-01-15 14:32:17

The createFromFormat() trap is particularly sneaky. Any field that is missing from your format string is filled in from the current wall clock time, so a date-only format yields today's hours, minutes, and seconds. And there it is - your afternoon failures explained.

The flaky scenario:

$deadline = Carbon::createFromFormat('Y-m-d', '2025-01-15');
$order = Order::factory()->create([
    'deadline' => '2025-01-15 10:00:00',
]);

// Is the order before the deadline?
$this->assertTrue($order->deadline <= $deadline);

// At 09:00 → FAILS (10:00 is NOT <= 09:00)
// At 11:00 → PASSES (10:00 <= 11:00)
// Same code, different results based on wall clock!

The fix: Know what each method does, and be explicit when needed:

// Option 1: Use parse() - it defaults to 00:00:00
$deadline = Carbon::parse('2025-01-15');

// Option 2: Chain startOfDay() to be explicit
$deadline = Carbon::createFromFormat('Y-m-d', '2025-01-15')
    ->startOfDay();

// Option 3: Include time in the format string
$deadline = Carbon::createFromFormat('Y-m-d H:i:s', '2025-01-15 00:00:00');

// Option 4: Chain endOfDay() for "until end of" comparisons
$deadline = Carbon::parse('2025-01-15')->endOfDay();  // 23:59:59
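PHP's own format parser offers one more way to be explicit: the ! and | modifiers. The sketch below uses only DateTimeImmutable from the standard library. Carbon's createFromFormat() builds on the same parser, so the modifiers are expected to behave the same way there, but treat that as an assumption and verify it against your Carbon version:

```php
<?php

$tz = new DateTimeZone('UTC');

// Without a modifier, fields missing from the format come from the current time.
$implicit = DateTimeImmutable::createFromFormat('Y-m-d', '2025-01-15', $tz);

// "!" resets every field to the Unix epoch before parsing,
// "|" resets only the fields that were not parsed. Both yield midnight here.
$bang = DateTimeImmutable::createFromFormat('!Y-m-d', '2025-01-15', $tz);
$pipe = DateTimeImmutable::createFromFormat('Y-m-d|', '2025-01-15', $tz);

echo $bang->format('Y-m-d H:i:s'), PHP_EOL; // 2025-01-15 00:00:00
echo $pipe->format('Y-m-d H:i:s'), PHP_EOL; // 2025-01-15 00:00:00
```

The time part of $implicit changes from run to run, while $bang and $pipe always come out as midnight.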

The Randomness Illusion

Random values in tests might seem like a good idea. “It’ll catch edge cases,” you think. But they actually cause more problems than they solve.

Why Random Values Hurt

// Random amount might cross approval threshold
$order = Order::factory()->create([
    'amount' => $this->faker->numberBetween(500, 1500),
]);

// Orders of $1000 or more require manager approval
// 500-999: no approval needed ✓
// 1000-1500: approval needed ✗
$this->assertFalse($order->requiresApproval());  // Flaky!

Problems:

  1. Flaky tests - Same test passes/fails depending on random values

  2. Hard to debug - Can't consistently reproduce failures

  3. Misleading coverage - Test might not actually cover all cases

  4. CI instability - Random failures erode trust in test suite
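How flaky is the example above? A quick count in plain PHP, assuming that approval kicks in at exactly 1000 (the closure below is a hypothetical stand-in for requiresApproval()) and that numberBetween() includes both bounds:

```php
<?php

// Assumption: approval is required from 1000 upward.
$requiresApproval = fn (int $amount): bool => $amount >= 1000;

// Every value numberBetween(500, 1500) can return.
$amounts = range(500, 1500);

// The amounts for which assertFalse($order->requiresApproval()) would fail.
$failing = array_filter($amounts, $requiresApproval);

printf(
    "%d of %d possible amounts break the assertion (%.1f%%)\n",
    count($failing),
    count($amounts),
    100 * count($failing) / count($amounts)
);
// 501 of 1001 possible amounts break the assertion (50.0%)
```

Under those assumptions the test is a coin flip, which is why it looks fine on some runs and broken on others.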

The Solution: Deterministic Values

Before:

// Amount randomly crosses approval threshold
$order = Order::factory()->create([
    'amount' => $this->faker->numberBetween(500, 1500),
]);

// Expiration randomly in past or future
$subscription = Subscription::factory()->create([
    'expires_at' => $this->faker->dateTimeBetween('-1 week', '+1 week'),
]);

After:

// Explicitly below $1000 threshold
$order = Order::factory()->create([
    'amount' => 800,
]);

// Explicitly in the future
$subscription = Subscription::factory()->create([
    'expires_at' => now()->addWeek(),
]);

Use DataProviders Instead

Instead of randomness, use DataProviders to test multiple cases deterministically:

#[DataProvider('paymentStatuses')]
#[Test]
public function it_handles_payment_status(PaymentStatus $status): void
{
    $order = Order::factory()->create([
        'payment_status' => $status,
    ]);

    // Test each status deterministically
}

public static function paymentStatuses(): iterable
{
    yield 'draft' => [PaymentStatus::DRAFT];
    yield 'partly paid' => [PaymentStatus::PARTLY_PAID];
    yield 'fully paid' => [PaymentStatus::FULLY_PAID];
}

For simpler cases, PHPUnit 10+ offers #[TestWith] - inline data without a separate method (the data set names passed as the second argument need PHPUnit 11+):

use PHPUnit\Framework\Attributes\TestWith;

#[TestWith([800, false], 'below threshold')]
#[TestWith([1000, true], 'at threshold')]
#[TestWith([1500, true], 'above threshold')]
#[Test]
public function it_requires_approval_based_on_amount(
    int $amount,
    bool $requiresApproval
): void {
    $order = Order::factory()->create(['amount' => $amount]);

    $this->assertEquals($requiresApproval, $order->requiresApproval());
}

When Random IS Acceptable

// OK: Random data that doesn't affect assertions
$user = User::factory()->create([
    'name' => $this->faker->name(),  // Doesn't matter for test
]);

// NOT OK: Random data that affects test logic
$order = Order::factory()->create([
    'amount' => $this->faker->numberBetween(1000, 5000),  // Affects assertions!
]);

DataProvider Deep Dive

PHPUnit's DataProvider feature is powerful but has subtleties that can lead to unexpected behavior.

The Boot Order Problem

DataProviders run BEFORE Laravel boots.

This means:

  • now() returns a regular Carbon instance, not the immutable version configured in your service provider

  • Application configuration isn't available

  • Dependency injection doesn't work

Before (Unexpected behavior):

public static function getLimitExpirationDate(): iterable
{
    // PROBLEM: now() is called before Laravel boots!
    // This doesn't return CarbonImmutable as expected
    yield 'With expiration' => [now()->addDays(30)];
    yield 'Without expiration' => [null];
}

#[DataProvider('getLimitExpirationDate')]
#[Test]
public function it_handles_expiration(?CarbonInterface $expirationDate): void
{
    // $expirationDate is not CarbonImmutable!
}

After (Use string modifiers):

public static function getLimitExpirationDate(): iterable
{
    // Use string modifiers instead of Carbon instances
    yield 'With expiration' => ['+30 days'];
    yield 'Without expiration' => [null];
}

#[DataProvider('getLimitExpirationDate')]
#[Test]
public function it_handles_expiration(?string $expirationDateModifier): void
{
    // Convert to date inside the test where Laravel is booted
    $expirationDate = $expirationDateModifier
        ? now()->modify($expirationDateModifier)
        : null;

    // Now $expirationDate is CarbonImmutable as expected
}
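The same idea in plain PHP: the provider yields only scalars, and the date is resolved later against whatever clock is in effect at that point. In this sketch $frozenNow is an assumed stand-in for the frozen now() a Laravel test would have:

```php
<?php

// A provider that yields only scalars: nothing here touches a clock.
function expirationModifiers(): iterable
{
    yield 'With expiration'    => ['+30 days'];
    yield 'Without expiration' => [null];
}

// Stand-in for the frozen "now" available once the test itself is running.
$frozenNow = new DateTimeImmutable('2025-01-15 00:00:00', new DateTimeZone('UTC'));

$resolved = [];
foreach (expirationModifiers() as $name => [$modifier]) {
    $expiresAt = $modifier !== null ? $frozenNow->modify($modifier) : null;
    $resolved[$name] = $expiresAt?->format('Y-m-d');

    echo $name, ': ', $resolved[$name] ?? 'null', PHP_EOL;
}
// With expiration: 2025-02-14
// Without expiration: null
```

Because the modifier is just a string, it does not matter when the provider runs; only the moment of resolution counts.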

Safe vs Unsafe in DataProviders

| Safe | Unsafe |
| --- | --- |
| Scalar values (strings, int, bool) | now(), today() |
| Enums | config() |
| Static data | app() |
| | Any Laravel facade |

PHPUnit 10+ Attribute Syntax

PHPUnit 10 introduced PHP 8 attributes as the replacement for the old @dataProvider annotation, and it requires data provider methods to be public and static:

Old Style (PHPUnit 9 and earlier):

/**
 * @dataProvider applicationStatuses
 */
public function test_something(ApplicationStatus $status): void
{
    // ...
}

public function applicationStatuses(): array
{
    return [
        [ApplicationStatus::PENDING],
        [ApplicationStatus::APPROVED],
    ];
}

New Style (PHPUnit 10+):

use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\Attributes\Test;

#[DataProvider('applicationStatuses')]
#[Test]
public function test_something(ApplicationStatus $status): void
{
    // ...
}

public static function applicationStatuses(): iterable
{
    yield 'pending' => [ApplicationStatus::PENDING];
    yield 'approved' => [ApplicationStatus::APPROVED];
}

Key Differences:

| Aspect | Old Style | New Style | Inline |
| --- | --- | --- | --- |
| Syntax | @dataProvider | #[DataProvider] | #[TestWith] |
| Method | Instance | Static | None |
| Return | array | iterable | N/A |
| IDE Support | Limited | Full | Full |

Named Data Sets

Use named data sets to make test output readable:

public static function getLatencies(): iterable
{
    yield '35 days overdue' => [35];
    yield '65 days overdue' => [65];
    yield '95 days overdue' => [95];
}

// Test output shows:
// ✓ it reports overdue order with data set "35 days overdue"
// ✓ it reports overdue order with data set "65 days overdue"
// ✓ it reports overdue order with data set "95 days overdue"
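The names come straight from the keys the generator yields, which is easy to confirm in plain PHP without involving PHPUnit at all:

```php
<?php

function getLatencies(): iterable
{
    yield '35 days overdue' => [35];
    yield '65 days overdue' => [65];
    yield '95 days overdue' => [95];
}

// The yielded string keys are what end up as data set names.
$dataSets = iterator_to_array(getLatencies());

print_r(array_keys($dataSets));
```

Each key maps to the argument list for one test invocation, so $dataSets['65 days overdue'] is [65].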

#[TestWith] also supports named data sets (PHPUnit 11+) with a second parameter:

#[TestWith([800, false], 'below threshold')]
#[TestWith([1000, true], 'at threshold')]
#[TestWith([1500, true], 'above threshold')]

// Output: ✓ it requires approval based on amount with data set "below threshold"

Factory Patterns for Test Data

Factories are the most powerful way to create test data. But when used incorrectly, they can lead to flaky tests.

factory()->create() vs firstOrCreate()

Use factory()->create() for:

  • Test-specific data (should be isolated)

  • Data that can differ between tests

  • Most test scenarios

Use firstOrCreate() for:

  • Shared reference data (workflow types, status codes)

  • Data seeded by migrations

  • Data that must be consistent across tests

// ShippingMethod is reference data - use firstOrCreate
$shipping = ShippingMethod::firstOrCreate(
    ['code' => 'express'],
    ['name' => 'Express Delivery', 'days' => 1]
);

// Order is test data - use factory
$order = Order::factory()
    ->for($shipping)
    ->create();
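What makes firstOrCreate() suitable for reference data is that calling it repeatedly is harmless. A toy in-memory sketch of that behaviour follows; InMemoryShippingMethods is hypothetical and only mimics the lookup-then-insert logic, not Eloquent itself:

```php
<?php

// Toy in-memory repository; the class and its methods are hypothetical and
// only mimic the lookup-then-insert behaviour of firstOrCreate().
final class InMemoryShippingMethods
{
    /** @var array<string, array<string, mixed>> rows keyed by code */
    private array $rows = [];

    public function firstOrCreate(array $match, array $values = []): array
    {
        $code = $match['code'];

        // Reuse the existing row if one matches; otherwise insert it once.
        return $this->rows[$code] ??= $match + $values + ['id' => count($this->rows) + 1];
    }

    public function count(): int
    {
        return count($this->rows);
    }
}

$methods = new InMemoryShippingMethods();
$first  = $methods->firstOrCreate(['code' => 'express'], ['name' => 'Express Delivery', 'days' => 1]);
$second = $methods->firstOrCreate(['code' => 'express'], ['name' => 'Express Delivery', 'days' => 1]);

var_dump($first['id'] === $second['id']); // bool(true)
var_dump($methods->count());              // int(1)
```

A factory()->create() call in the same position would insert a second row every time, which is exactly what you want for isolated test data and exactly what you don't want for shared reference rows.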

Use Correct Factory State Methods

Using the wrong factory state creates data that doesn't match the test scenario.

Example: Ready-to-Ship Order

A “ready-to-ship” order requires specific workflow states:

  • PaymentVerified: DONE

  • InventoryReserved: DONE

  • ShippingLabelGenerated: NOT done

Wrong (All workflows done):

// This marks ALL steps as done - including shipping!
$order = Order::factory()
    ->withAllStepsCompleted()
    ->create();

Correct (Specific workflow states):

// This creates the correct state for a ready-to-ship order
$order = Order::factory()
    ->readyToShip()  // Payment ✓, Inventory ✓, Shipping label ✗
    ->create();

The Determinism Checklist

Use this checklist when writing or reviewing tests:

  • Time is frozen with travelTo() where necessary

  • Random values don't affect test logic or counts

  • DataProviders don't use now(), config(), or facades

  • Back-to-back created records have explicit timestamps

  • Factory states match test scenario

  • Multiple cases use DataProviders (not randomness)

  • Timestamps are fixed values rather than parsed from natural-language strings ('Today, 17:00')


What's Next

Missed the previous part? Part 2: Mock Madness covers Mockery's alias and overload mocks - the patterns that break parallel testing.

Part 4: The Teardown Tango covers test infrastructure:

  • Why tearDown() method ordering is critical

  • The mysterious errors caused by Sushi models

  • Proper use of HTTP, Event, and Queue fakes

  • The ultimate testing checklist

See you there.


The Flaky Test Chronicles is a series documenting what we learned from 300+ commits of test suite cleanup. May your tests pass at midnight the same way they pass at noon.