
How Load Testing with K6 Unveils Performance Bottlenecks

Discover how K6 Load Testing identifies and resolves performance issues in web applications, ensuring smooth user experiences and optimal performance.

October 13, 2025
K6 load-testing performance bottlenecks web-applications user-experience testing-tools

Why performance bottlenecks matter more than ever

Users expect fast, fluid web experiences. They don’t care whether the slow-down comes from a misconfigured database pool, a saturated CPU, or a chatty microservice call. They just notice the lag—and leave. That’s why catching performance bottlenecks before they hit production is crucial for conversion, retention, and search rankings.

Load testing with K6 helps you find these bottlenecks early, validate your performance budgets, and build confidence in your architecture. K6’s developer-friendly scripting and rich metrics make it especially effective at uncovering the “knee” where your system shifts from smooth and scalable to overloaded and flaky. In this guide, you’ll learn how K6 reveals bottlenecks, how to design realistic tests, how to interpret K6 results, and what to do with what you discover.


What K6 brings to the table

K6 is an open-source load testing tool that lets you write tests in JavaScript and run them locally, in CI, or in the cloud. It’s efficient, scriptable, and designed to fit modern pipelines.

What you’ll use most in K6:

  • Scripting with JS: Define user flows and APIs to test.
  • Powerful scenarios: Model realistic traffic patterns and workloads.
  • Rich metrics: Latency percentiles, error rates, HTTP timings, and custom metrics.
  • Checks and thresholds: Set “stoplight” rules that pass or fail CI based on SLOs.
  • Outputs and integrations: Stream to Grafana/Prometheus or analyze command-line summaries.

K6’s value isn’t in synthetically measuring “fast or slow” once. It’s in consistently revealing where systems fail under load, how they degrade as traffic rises, and whether fixes actually work.


What a “bottleneck” looks like in K6

Bottlenecks show up in K6 through specific signals:

  • Rising percentiles (p95/p99) while RPS stays flat: Saturation—queues or CPU-limited.
  • Spiking error rates (http_req_failed): Timeouts, connection resets, 5xx/429.
  • Disproportionate HTTP timing components:
    • High connect or TLS handshaking times: Networking/TLS overhead or connection churn.
    • High waiting time (TTFB): Server-side processing delays—DB, locks, GC, thread pools.
    • High receiving time: Payload size, compression, network throughput.
  • Active VUs and iteration rate diverging: Your test is ready to push, but the system (or test rig) can’t keep up.

When those symptoms appear, K6 helps you tie them back to root causes through tagging, grouping, and targeted scenarios.


Plan your test like a pro

Don’t start by hammering production. A thoughtful plan reveals bottlenecks faster and more safely.

  1. Define SLOs and performance budgets:

    • Example: “p95 latency < 300 ms at 500 RPS; error rate < 1%; CPU < 70%.”
    • Turn these into thresholds that can fail builds automatically.
  2. Understand real traffic:

    • Peak RPS, concurrency patterns, arrival bursts, user flows, cache hit ratios.
    • Include think time and pacing to mirror actual user behavior.
  3. Prepare test data:

    • Seed realistic datasets, avoid hot-caching a single record, generate unique IDs.
    • Ensure idempotency or careful cleanup for write-heavy tests.
  4. Choose the right environment:

    • Production-like topology, realistic limits on thread/connection pools.
    • Observe with APM, tracing, and logs for correlation.
  5. Decide test types:

    • Smoke: Quick sanity.
    • Load: Validate SLOs at expected peak.
    • Stress: Find breaking point and recovery behavior.
    • Soak: Catch leaks, slow degradation, and queue backlogs.

A simple K6 smoke test to catch the obvious

Start by verifying your target responds correctly under tiny load.

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 5,
  duration: '1m',
  thresholds: {
    http_req_failed: ['rate<0.01'], // <1% errors
    http_req_duration: ['p(95)<500'], // <500ms p95
  },
};

export default function () {
  const res = http.get('https://api.example.com/health');
  check(res, {
    'status is 200': r => r.status === 200,
    'content-type is json': r => r.headers['Content-Type']?.includes('application/json'),
  });
  sleep(1);
}

Actionable tip:

  • Always run a smoke test before bigger tests. It catches DNS issues, bad endpoints, and misconfigurations early so you don’t waste time.

Modeling realistic traffic with scenarios

Don’t rely only on fixed VU ramps. Use arrival-rate executors to simulate users arriving at a realistic rate and keep concurrency truly representative.

import http from 'k6/http';
import { check, sleep, group } from 'k6';

export const options = {
  scenarios: {
    peak_shopping: {
      executor: 'ramping-arrival-rate',
      startRate: 50,            // RPS at start
      timeUnit: '1s',
      preAllocatedVUs: 100,     // pool of VUs K6 can use
      maxVUs: 500,
      stages: [
        { duration: '2m', target: 200 }, // ramp to 200 RPS
        { duration: '5m', target: 200 }, // hold
        { duration: '2m', target: 400 }, // ramp to 400 RPS
        { duration: '5m', target: 400 }, // hold
      ],
      tags: { test: 'load', flow: 'shopping' },
    },
  },
  thresholds: {
    'http_req_failed{test:load}': ['rate<0.01'],
    'http_req_duration{test:load}': ['p(95)<300', 'p(99)<500'],
    'http_req_waiting{group:::checkout}': ['p(95)<200'], // server time
  },
};

export default function () {
  group('browse', () => {
    const res = http.get('https://shop.example.com/products?category=shoes');
    check(res, { '200 on browse': r => r.status === 200 });
    sleep(1 + Math.random()); // think time
  });

  group('product', () => {
    const res = http.get(`https://shop.example.com/product/${Math.floor(Math.random()*10000)}`);
    check(res, { '200 on product': r => r.status === 200 });
    sleep(1);
  });

  group('checkout', () => {
    const addToCart = http.post('https://shop.example.com/cart', JSON.stringify({ id: 42, qty: 1 }), {
      headers: { 'Content-Type': 'application/json' },
    });
    check(addToCart, { 'cart add ok': r => r.status === 200 });

    const checkout = http.post('https://shop.example.com/checkout', JSON.stringify({ token: 'fake' }), {
      headers: { 'Content-Type': 'application/json' },
    });
    check(checkout, { 'checkout ok': r => r.status === 200 || r.status === 201 });

    sleep(2);
  });
}

Why it helps:

  • Group-based thresholds show exactly which flow slows down first.
  • Arrival rate exposes how your system handles constant traffic pressure.
  • Think time prevents synthetic “infinite loop” pressure unrepresentative of real users.

Parameterize test data to avoid cache illusions

Hitting the same record can exaggerate cache hit rates. Use SharedArray and external data files to diversify inputs.

import http from 'k6/http';
import { SharedArray } from 'k6/data';

const products = new SharedArray('products', () => JSON.parse(open('./products.json')));

export default function () {
  const item = products[Math.floor(Math.random() * products.length)];
  http.get(`https://shop.example.com/product/${item.id}`);
}

Actionable tips:

  • Include cache-warming stages at the start of a scenario.
  • Test both cold and warm cache behaviors to understand real-world patterns.
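
One way to structure this in K6 is to run a short warm-up scenario first and start the measured load afterwards via startTime. A minimal sketch, reusing the hypothetical shop.example.com endpoint from above; the durations, rates, and budgets are illustrative:

import http from 'k6/http';

export const options = {
  scenarios: {
    warmup: {
      executor: 'constant-vus',
      vus: 5,
      duration: '2m',                 // populate caches gently before measuring
      tags: { phase: 'warmup' },
    },
    main_load: {
      executor: 'ramping-arrival-rate',
      startTime: '2m',                // begin only after the warm-up window
      startRate: 50,
      timeUnit: '1s',
      preAllocatedVUs: 100,
      maxVUs: 300,
      stages: [{ duration: '5m', target: 200 }],
      tags: { phase: 'main' },
    },
  },
  thresholds: {
    'http_req_duration{phase:main}': ['p(95)<300'], // judge SLOs on warm-cache traffic only
  },
};

export default function () {
  http.get('https://shop.example.com/products?category=shoes');
}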

Turn SLOs into hard gates with thresholds

Thresholds fail the test when performance budgets are violated. This is how you keep regressions out of main.

export const options = {
  thresholds: {
    http_req_failed: ['rate<0.01'],
    http_req_duration: ['p(95)<300', 'p(99)<500'],
    'http_req_duration{group:::checkout}': ['p(95)<350'],
  },
};

Checks are assertions within an iteration; thresholds are global pass/fail rules. Use both.
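
Because K6 also exposes a built-in checks rate metric, you can make failing checks gate the run as well. A minimal sketch; the budgets are illustrative:

export const options = {
  thresholds: {
    checks: ['rate>0.99'],            // fail the run if more than 1% of checks fail
    http_req_duration: ['p(95)<300'], // and still enforce the latency budget
  },
};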


Understand K6’s HTTP timings to pinpoint where time is spent

K6 publishes granular metrics that reveal specific bottlenecks:

  • http_req_duration: Total time.
  • http_req_waiting: Server processing (TTFB).
  • http_req_connecting: TCP connection establishment.
  • http_req_tls_handshaking: TLS negotiation time.
  • http_req_blocked: Waiting for a connection (client-side queuing).
  • http_req_sending: Upload time.
  • http_req_receiving: Download time.

Common patterns:

  • High waiting: Server-side slowness (DB, app CPU, locks, I/O).
  • High connecting/TLS: Connection churn, lack of keep-alives, TLS overhead.
  • High blocked: Not enough sockets or connection reuse; client/test rig limits.
  • High receiving: Large payloads; compression misconfigured; CDN opportunities.

Actionable tip:

  • Enable HTTP/2 and keep-alive on servers. Watch http_req_connecting and http_req_tls_handshaking drop dramatically.

Custom metrics: capture what the server is telling you

You can parse response headers or bodies and record domain-specific metrics.

import http from 'k6/http';
import { Trend, Counter, Rate } from 'k6/metrics';

const queueWait = new Trend('queue_wait_ms', true);
const dbTime = new Trend('db_time_ms', true);
const cacheHit = new Rate('cache_hit');
const appErrors = new Counter('app_errors');

export default function () {
  const res = http.get('https://api.example.com/orders/123', {
    headers: { 'Accept': 'application/json' },
  });

  // Suppose your API exposes internal timings for observability
  queueWait.add(Number(res.headers['X-Queue-Wait'] || 0));
  dbTime.add(Number(res.headers['X-DB-Time'] || 0));
  cacheHit.add((res.headers['X-Cache'] || '') === 'HIT');
  if (res.status >= 500) appErrors.add(1);
}

This lets you separate “network vs. server vs. database” in your graphs.


Stress and soak tests to find the real breaking points

Use stress tests to find the knee in the throughput/latency curve—where additional load causes latency to explode.

export const options = {
  scenarios: {
    stress: {
      executor: 'ramping-arrival-rate',
      startRate: 100,
      timeUnit: '1s',
      preAllocatedVUs: 400,
      maxVUs: 2000,
      stages: [
        { duration: '2m', target: 300 },
        { duration: '2m', target: 600 },
        { duration: '2m', target: 900 },
        { duration: '2m', target: 1200 },
        { duration: '5m', target: 1200 },
      ],
      tags: { test: 'stress' },
    },
  },
  thresholds: {
    'http_req_failed{test:stress}': ['rate<0.02'], // allow a bit more under extreme
    'http_req_duration{test:stress}': ['p(95)<800'],
  },
};

Use soak tests to catch memory leaks, connection leaks, and slow drifts.

export const options = {
  vus: 50,
  duration: '2h',
  discardResponseBodies: true,
  thresholds: { http_req_failed: ['rate<0.01'] },
  summaryTrendStats: ['avg', 'min', 'med', 'p(90)', 'p(95)', 'p(99)', 'max'], // more insight
};

What to look for:

  • Soak test p95 gradually increasing: GC pressure, queue backlog, or a leak.
  • Stress test latency spike at a specific RPS: That’s your knee; investigate the first subsystem saturating.

Interpreting results to uncover specific bottlenecks

  1. Database bottlenecks:

    • Symptoms: High http_req_waiting; p95 grows sharply as RPS rises; DB CPU/IO high; slow queries appear.
    • Fixes:
      • Add missing indexes; denormalize hotspots; avoid N+1 queries.
      • Use read replicas and caching (Redis).
      • Batch writes; use pagination/limits on heavy reads.
  2. Connection pool exhaustion:

    • Symptoms: Increased timeouts; elevated 5xx/429; wait times spike; app logs show pool timeouts.
    • Fixes:
      • Right-size pools per service; tune Hikari/pgBouncer configs.
      • Reuse connections; keep-alive; HTTP/2; reduce per-request DB connections.
  3. CPU saturation:

    • Symptoms: p95 tracks CPU utilization; throughput flattens while latency rises.
    • Fixes:
      • Optimize code paths; reduce JSON serialization; enable gzip/br; cache templates.
      • Scale horizontally; tune thread pools; use async I/O where appropriate.
  4. Cache inefficiency:

    • Symptoms: Cold vs. warm test delta is large; high DB time on repeated requests; cache miss headers.
    • Fixes:
      • Improve TTL/keys; pre-warm caches; implement request coalescing.
      • Introduce CDN for static content and cacheable API responses.
  5. TLS/connection overhead:

    • Symptoms: High http_req_tls_handshaking and http_req_connecting.
    • Fixes:
      • Enable keep-alives; HTTP/2; tune idle timeouts; reuse DNS/TCP.
      • Consider session resumption, OCSP stapling.
  6. Network throughput/payload size:

    • Symptoms: High http_req_receiving; large responses; slow clients.
    • Fixes:
      • Compress responses; reduce payload via fields filtering; support range requests/CDN.
  7. Thread/queue backpressure:

    • Symptoms: Rising queue_wait header trend; increasing http_req_waiting without corresponding RPS jump.
    • Fixes:
      • Increase worker concurrency carefully; shorten critical sections; shed load; add circuit breakers.
  8. Memory leaks and GC:

    • Symptoms: Soak test shows increasing latency and heap; occasional pauses.
    • Fixes:
      • Profile memory; pool objects; tune GC; fix reference leaks.

Actionable tip:

  • Correlate K6 metrics with APM/tracing. Start a test, capture a trace snapshot, and identify the longest span or the most contended resource.

Add tags and groups to isolate problem areas

Use tags to slice metrics by endpoint, version, or feature flag.

import http from 'k6/http';
import { group } from 'k6';

export default function () {
  group('search', () => {
    http.get('https://api.example.com/search?q=boots', { tags: { endpoint: 'search', version: 'v2' } });
  });
  group('cart', () => {
    http.post('https://api.example.com/cart', JSON.stringify({ id: 101, qty: 1 }), {
      headers: { 'Content-Type': 'application/json' },
      tags: { endpoint: 'cart', version: 'v1' },
    });
  });
}

Then set thresholds per tag:

  • 'http_req_duration{endpoint:search}': ['p(95)<250']

This pinpoints which area first violates budgets under load.
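
Collected into an options block, those per-tag budgets might look like this (the endpoint tags are the illustrative ones from the script above):

export const options = {
  thresholds: {
    'http_req_duration{endpoint:search}': ['p(95)<250'],
    'http_req_duration{endpoint:cart}': ['p(95)<350'],
    'http_req_failed{endpoint:cart}': ['rate<0.01'],
  },
};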


Shift left with CI: fail fast on performance regressions

Make performance part of your pipeline. A minimal GitHub Actions example:

name: k6-load-test
on:
  pull_request:
jobs:
  k6:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install k6
        run: |
          curl -sSL https://github.com/grafana/k6/releases/download/v0.49.0/k6-v0.49.0-linux-amd64.tar.gz -o k6.tgz
          tar xzf k6.tgz
          sudo mv k6-v*/k6 /usr/local/bin/k6
      - name: Run test
        run: k6 run tests/load.js

Best practices:

  • Start with a low-RPS sanity test in PRs; run heavier tests nightly.
  • Use thresholds as gates so regressions block merges.
  • Store test artifacts: summaries, logs, and key metric snapshots.
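
For the artifacts point, K6’s handleSummary() hook can write the end-of-test summary to a file your CI job uploads. A minimal sketch; the file name is arbitrary, and note that defining handleSummary replaces the default terminal summary unless you also return a stdout entry:

export function handleSummary(data) {
  return {
    'k6-summary.json': JSON.stringify(data, null, 2), // upload this file as a CI artifact
  };
}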

Stream K6 metrics to your observability stack

Local summaries are helpful, but real insight comes from correlating K6 with system metrics:

  • Export to Prometheus or InfluxDB and visualize in Grafana.
  • Compare K6’s http_req_waiting with DB CPU, thread pool usage, GC, and container limits.
  • Tag K6 requests with a unique header (e.g., X-Test-Run-ID) to filter logs and traces.

Actionable tip:

  • Emit a unique ID per iteration for sampling traces:
    • const runId = __ENV.RUN_ID || Date.now().toString();
    • Pass headers: { 'X-Test-Run-ID': runId, 'X-Request-ID': `${runId}-${__VU}-${__ITER}` }
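
Put together, a minimal sketch of that pattern (the header names are just a convention your backend and log pipeline would need to agree on):

import http from 'k6/http';

const runId = __ENV.RUN_ID || Date.now().toString(); // one ID per test run

export default function () {
  http.get('https://api.example.com/orders/123', {
    headers: {
      'X-Test-Run-ID': runId,                        // filter logs and traces for this run
      'X-Request-ID': `${runId}-${__VU}-${__ITER}`,  // unique per VU iteration
    },
  });
}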

Practical example: discovering a DB bottleneck step by step

  1. Baseline:

    • Load test at 200 RPS: p95 = 220 ms, errors < 0.2%. All good.
  2. Increase to 400 RPS:

    • p95 rises to 380 ms, p99 at 650 ms. http_req_waiting accounts for the increase.
    • App CPU ~65%, DB CPU ~85%. Slow query logs show repeated full scans.
  3. Fix:

    • Add index to the filter column. Tune connection pool size from 50 to 80, enable prepared statements.
  4. Retest:

    • At 400 RPS, p95 = 250 ms, p99 = 400 ms. Errors < 0.5%. DB CPU drops to 60%.
    • The bottleneck shifts to application serialization (observed via CPU profiles).
  5. Next iteration:

    • Optimize JSON serialization, enable gzip. p95 improves again by ~30 ms.

This iterative loop—test, observe, fix, retest—is where K6 shines.


Another example: solving connection churn and TLS cost

Symptom:

  • http_req_tls_handshaking ~50 ms per request, http_req_connecting ~20 ms.
  • RPS plateaus even as VUs increase. Many short-lived connections in server logs.

Causes:

  • Clients not reusing connections; server keep-alive too short; HTTP/1.1 only.

Fixes:

  • Enable HTTP/2; set keep-alive and idle timeouts appropriately; validate connection reuse.
  • K6 should reuse connections by default; ensure you’re not setting “Connection: close”.

Result:

  • TLS and connect times collapse to near-zero after the first connection; total latency drops by ~70 ms at p95.
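
To keep that win from regressing, you might pin the connection-level metrics with thresholds; a sketch with illustrative budgets:

export const options = {
  thresholds: {
    // with healthy keep-alive reuse, most requests skip connect and TLS entirely
    http_req_connecting: ['p(95)<5'],
    http_req_tls_handshaking: ['p(95)<5'],
  },
};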

Realistic pacing and arrivals prevent misleading conclusions

Avoid the trap of “while(true) hit endpoint.” Incorporate:

  • Sleep/think time based on real user interaction intervals.
  • Jitter: Random sleep to avoid thundering herd in sync.
  • Arrival rate modeling: Users arrive randomly; a Poisson process is more realistic than fixed VU loops.

If you don’t pace realistically, you might declare the system “broken” when in fact your test simply created an unnatural burst pattern.
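
A minimal sketch of the combination: a constant-arrival-rate executor approximates random user arrivals, while jittered sleep models think time (the rate and sleep values are illustrative):

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    steady_arrivals: {
      executor: 'constant-arrival-rate', // new iterations start at a fixed rate,
      rate: 100,                         // independent of how long each one takes
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 200,
      maxVUs: 500,
    },
  },
};

export default function () {
  http.get('https://api.example.com/search?q=boots');
  sleep(1 + Math.random() * 2); // 1-3 s of jittered think time
}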


Don’t forget front-end performance

Although K6 is primarily API-focused, you can also use its browser module (imported as k6/browser in recent releases, previously k6/experimental/browser) for headless browser testing. This can catch bottlenecks such as:

  • Render-blocking resources.
  • Third-party script bloat.
  • Large images and layout thrashing.
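
A minimal sketch, assuming a recent K6 build where the browser module is importable as k6/browser and Chromium is available:

import { browser } from 'k6/browser';

export const options = {
  scenarios: {
    ui: {
      executor: 'shared-iterations',
      options: { browser: { type: 'chromium' } }, // required for browser scenarios
    },
  },
};

export default async function () {
  const page = await browser.newPage();
  try {
    // browser_web_vital_* metrics (LCP, CLS, ...) are collected automatically
    await page.goto('https://shop.example.com/');
  } finally {
    await page.close();
  }
}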

Even if you focus on APIs, remember that reducing payload sizes and using preload hints in place of HTTP/2 Server Push (which major browsers have deprecated) can dramatically improve real user metrics.


Anti-patterns to avoid

  • Ignoring p95/p99: Averages won’t reveal tail latency that frustrates users.
  • Over-synthetic tests: No think time, no realistic data → misleading results.
  • Testing on dev laptops: The test rig becomes the bottleneck; use proper infrastructure.
  • No correlation: K6 metrics without APM, logs, or traces leave root causes hidden.
  • One-off testing: Load testing should be repeatable, versioned, and tied to deployments.
  • Overlooking warm-ups: Cold caches and JITs skew early results—allow warm-up stages.

Quick playbooks for common K6-detected bottlenecks

  • If http_req_waiting spikes first:

    • Add DB indexes, reduce synchronous I/O, profile hot code paths, enable caching.
  • If http_req_connecting or http_req_tls_handshaking is high:

    • Turn on keep-alives/HTTP/2, lengthen idle timeouts, reuse connections, reduce DNS latency.
  • If http_req_receiving is high:

    • Compress responses, paginate, cache static content via CDN, strip unused fields.
  • If http_req_blocked is high:

    • Increase max open files/sockets on the client; ensure K6 machine is not the bottleneck.
  • If error rate rises under load:

    • Inspect 5xx/429 breakdown; check upstream timeouts, rate limits, circuit breakers; add fallback caching.
  • If p95 grows slowly but steadily during soak:

    • Audit for memory leaks and queue buildup; inspect GC logs; check connection leaks.

Actionable checklist to get started

  1. Define SLOs: p95, error rate, throughput targets.
  2. Write a smoke test and set thresholds.
  3. Model realistic traffic with ramping-arrival-rate and think time.
  4. Parameterize data; warm caches.
  5. Add group-based thresholds to isolate flows.
  6. Stream K6 metrics to Grafana and correlate with APM/traces.
  7. Run a stress test to find the knee; document the capacity limit.
  8. Run a soak test to detect slow degradation.
  9. Fix the first bottleneck; rerun the same test for validation.
  10. Automate in CI with thresholds as gates.

Conclusion: make performance a habit, not a hero project

Bottlenecks aren’t bugs you fix once—they’re the natural result of growth, new features, and changing user behavior. K6 gives you a repeatable, developer-friendly way to see where your application slows, how it fails under pressure, and whether your fixes truly work. By modeling realistic traffic, enforcing thresholds, and correlating K6 metrics with your observability stack, you’ll continuously unveil performance bottlenecks before users feel them.

Build the loop: plan, test, observe, fix, and retest. Keep your SLOs tight, your tests versioned, and your graphs honest. That’s how you deliver smooth experiences at scale—and keep them that way.
