
How to Properly Handle M-Pesa STK Push Timeouts and Disconnections Without Losing Transaction State


In our previous guide, we walked through a complete Daraja 3.0 STK Push integration from scratch: authentication, initiating the payment prompt, and handling the callback that Safaricom sends with the result. If you followed that guide, you now have a working integration that handles the happy path well.

But here is the thing: the happy path is not where integrations break down. The happy path is when a customer picks up their phone immediately, enters their PIN within seconds, has full signal, and your server is reachable. That works. The real problems show up in every other scenario.

What happens when a customer enters their PIN but their phone loses signal before the callback reaches your server? What happens when Safaricom's callback takes 45 seconds and your server responds too slowly? What happens when your server was briefly down during a deployment, and the callback hit your endpoint while it was restarting? What happens when a customer panics and tries to pay twice because the first attempt looked like it failed?

These edge cases are not rare. They happen daily in production, and they cause real business problems: orders stuck in "pending" forever, customers getting charged but not receiving their goods, duplicate payments, and support tickets that are hard to resolve without a clear audit trail.

This guide is about fixing all of that. We will cover the complete defense strategy for STK Push reliability: understanding what can actually go wrong, building a state machine that survives network failures, using the STK Query API as your safety net, making your callback handler truly idempotent, and setting up a background reconciliation job that catches anything the real-time flow misses.

All code examples continue from the Node.js project structure we set up in the previous article.

Understanding What Can Actually Go Wrong

Before writing any code, it is worth being precise about the failure modes you are defending against. They are not all the same problem.

Scenario 1: The customer paid but the callback never arrived. The customer entered their PIN, M-Pesa debited their account and credited yours, but your server never received the callback. This happens when your server was temporarily unreachable, when the callback request timed out, or when a brief network hiccup caused the POST to your callback URL to fail entirely. The money moved. Your system just does not know about it.

Scenario 2: The callback arrived but your server crashed before processing it. Your callback endpoint received the POST from Safaricom and responded with a 200, but your server crashed or restarted before the database write completed. The money moved, Safaricom thinks you acknowledged it, but your database is still showing the order as pending.

Scenario 3: The STK prompt timed out. The customer had 60 seconds to enter their PIN and did not respond in time. Safaricom sends a callback with ResultCode 1037. Your system should mark the order as "expired" and offer the customer a retry. If you do not handle this properly, the order just sits in pending.

Scenario 4: The customer cancelled. The customer dismissed the STK prompt. ResultCode 1032 arrives. Same handling requirement as the timeout case.

Scenario 5: The callback arrived twice. Safaricom's documentation and real-world behavior confirm that callbacks can be delivered more than once, especially if your server's initial 200 response was slow or ambiguous. If your callback handler is not idempotent, this results in double-processing: orders marked as paid twice, duplicate fulfilment, or worse.

Scenario 6: The customer's network dropped between PIN entry and callback delivery. The most frustrating edge case. The customer is certain they paid (they saw their phone briefly show "processing"), but the callback has not arrived. The transaction may or may not have completed on M-Pesa's side. You genuinely do not know.

Each of these requires a specific response. The unifying principle across all of them is this: never trust only the callback, and always be able to answer the question "what is the current state of this transaction?" at any point in time.

Step 1: Design Your Transaction State Machine

The foundation of resilient STK Push handling is a well-designed state machine. Every transaction should move through clearly defined states, and your code should only be able to move a transaction forward, never backward, and never out of a terminal state once it has reached one.

Here is the state model to implement:

code

INITIATED --> PENDING --> COMPLETED
                     \--> FAILED
                     \--> EXPIRED
                     \--> QUERIED (intermediate, from STK Query)

INITIATED means you have sent the STK Push request to Daraja and received a CheckoutRequestID. You have not yet received any callback.

PENDING means the STK prompt has been sent to the customer's phone. For most purposes this is the same as INITIATED, but it lets you distinguish "we sent the request to Daraja" from "Daraja confirmed the prompt reached the customer." The code in this guide only ever writes INITIATED, so wherever you check for an unresolved transaction, treat INITIATED and PENDING the same way.

COMPLETED means you received a callback with ResultCode 0 and have confirmed the payment. This is the only terminal success state. You should never move out of this state.

FAILED means you received a callback with a non-zero ResultCode (user cancelled, wrong PIN, insufficient funds, etc.), or you queried the transaction status and got a definitive failure response. This is a terminal failure state.

EXPIRED is a special variant of FAILED for timeout cases (ResultCode 1037) or when your own reconciliation job determines a transaction has been pending too long without resolution. Making this a separate state is useful because an expired transaction may be retried, while a FAILED transaction (wrong PIN, cancelled) typically should not be.

QUERIED is an optional intermediate state you can use to indicate that you have proactively queried this transaction's status via the STK Query API and are awaiting a definitive result.
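
The "forward only" rule is easier to enforce when the allowed moves live in one place. The following is a minimal sketch, not part of the original project: the names TRANSITIONS, isTerminal, and canTransition are hypothetical, and it assumes INITIATED may jump straight to a terminal state, since a callback can arrive before anything marks the transaction PENDING.

```javascript
// Allowed moves between transaction states. Terminal states have no exits.
// Assumption: INITIATED can go directly to a terminal state.
const TRANSITIONS = {
  INITIATED: ["PENDING", "QUERIED", "COMPLETED", "FAILED", "EXPIRED"],
  PENDING: ["QUERIED", "COMPLETED", "FAILED", "EXPIRED"],
  QUERIED: ["COMPLETED", "FAILED", "EXPIRED"],
  COMPLETED: [],
  FAILED: [],
  EXPIRED: [],
};

// A state is terminal when it is known and has no outgoing transitions.
const isTerminal = (status) => TRANSITIONS[status]?.length === 0;

// True only when moving from `from` to `to` is an allowed forward step.
const canTransition = (from, to) => (TRANSITIONS[from] || []).includes(to);
```

Calling canTransition before every status update gives you a single guard against a late or duplicate message dragging a COMPLETED transaction back to FAILED.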

Here is the database schema to implement this:

javascript

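// migrations/create_mpesa_transactions.js
// Add this to whatever database migration tool you use.
// Pseudocode for the schema, adapt to your ORM. Column names are snake_case here;
// the route code in this guide uses the camelCase attribute names an ORM would map them to.

const mpesaTransactionSchema = {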
  id: "uuid primary key",
  order_id: "string, references your orders table",
  checkout_request_id: "string unique not null",  // From STK Push response
  merchant_request_id: "string not null",          // From STK Push response
  phone_number: "string not null",
  amount: "integer not null",                      // In KES, whole numbers only
  status: "enum: INITIATED, PENDING, COMPLETED, FAILED, EXPIRED, QUERIED",
  result_code: "integer nullable",                 // From callback or query
  result_description: "string nullable",
  mpesa_receipt_number: "string nullable",         // Only on COMPLETED
  callback_received_at: "timestamp nullable",
  queried_at: "timestamp nullable",
  raw_callback: "json nullable",                   // Store the full payload
  raw_query_response: "json nullable",
  created_at: "timestamp",
  updated_at: "timestamp"
}

The most important fields here are checkout_request_id (the unique key you use to match callbacks to transactions), status (your state machine), mpesa_receipt_number (the M-Pesa transaction identifier for reconciliation), and raw_callback (your audit trail). Never skip storing the raw callback payload. When something goes wrong at 2am, that stored payload is the only way to reconstruct what actually happened.

Step 2: Store Transaction State Immediately After Initiating

This is where most developers make their first mistake. They initiate the STK Push, wait for the callback, and only then write anything to the database. The problem with this approach is that if anything goes wrong between initiation and callback, you have no record of the transaction in your system.

The correct approach is to write the transaction record to your database immediately after receiving the STK Push response from Daraja, before the customer has even touched their phone.

Update utils/mpesa.js to return the full Daraja response, and update your /stkpush route:

javascript

// routes/stkpush.js
app.post("/stkpush", generateToken, async (req, res) => {
  const { phone, amount, orderId } = req.body;

  // Validate inputs first (as before)
  if (!phone || !amount || !orderId) {
    return res.status(400).json({ error: "phone, amount, and orderId are required" });
  }

  try {
    const result = await initiateSTKPush(
      phone, amount, orderId,
      `Payment for order ${orderId}`,
      req.access_token
    );

    // *** CRITICAL: Write to DB immediately, before responding to the client ***
    await db.mpesaTransactions.create({
      orderId,
      checkoutRequestId: result.CheckoutRequestID,
      merchantRequestId: result.MerchantRequestID,
      phoneNumber: formatPhoneNumber(phone),
      amount: Math.round(amount),
      status: "INITIATED",
    });

    // Now respond to the client
    res.status(200).json({
      message: "STK Push sent. Customer should receive a prompt shortly.",
      checkoutRequestId: result.CheckoutRequestID,
    });

  } catch (error) {
    console.error("STK Push failed:", error.response?.data || error.message);
    res.status(500).json({ error: "Failed to initiate payment" });
  }
});

The CheckoutRequestID is now the key that connects your internal transaction record to everything that comes back from Daraja. Every callback lookup, every query, and every reconciliation run goes through this ID.

Step 3: Build an Idempotent Callback Handler

Your callback handler needs to do three things correctly: respond to Safaricom immediately, process the callback exactly once regardless of how many times it is delivered, and handle all ResultCode scenarios explicitly.
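
Handling every ResultCode explicitly is simpler if the raw payload is first reduced to one flat object. The sketch below is illustrative only: parseStkCallback is a hypothetical helper, and it assumes the payload shape used throughout this guide (Body.stkCallback with an optional CallbackMetadata.Item list) and the 1037-means-EXPIRED mapping described earlier.

```javascript
// Normalise an STK callback body into one flat, predictable object.
// Returns null when the payload does not have the expected shape.
const parseStkCallback = (body) => {
  const cb = body?.Body?.stkCallback;
  if (!cb || typeof cb.CheckoutRequestID !== "string") return null;

  // Accept a number or a numeric string; reject anything else.
  const resultCode =
    typeof cb.ResultCode === "number" ? cb.ResultCode : parseInt(cb.ResultCode, 10);
  if (!Number.isInteger(resultCode)) return null;

  // CallbackMetadata is absent on failures, so default to an empty list.
  const items = Array.isArray(cb.CallbackMetadata?.Item) ? cb.CallbackMetadata.Item : [];
  const metadata = {};
  for (const item of items) {
    if (item.Value !== undefined) metadata[item.Name] = item.Value;
  }

  let status = "FAILED";
  if (resultCode === 0) status = "COMPLETED";
  else if (resultCode === 1037) status = "EXPIRED";

  return {
    checkoutRequestId: cb.CheckoutRequestID,
    resultCode,
    resultDescription: cb.ResultDesc ?? null,
    status,
    mpesaReceiptNumber: metadata.MpesaReceiptNumber ?? null,
  };
};
```

Because the function is pure, you can unit test it against saved raw_callback payloads without touching the database or Daraja.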

Here is a production-ready callback handler:

javascript

// routes/callback.js
app.post("/callback", async (req, res) => {
  // *** Respond to Safaricom IMMEDIATELY with HTTP 200 ***
  // Do this before any database operations. Safaricom expects a fast response.
  // If your handler is slow, Safaricom may retry the callback, causing duplicates.
  res.status(200).json({ ResultCode: 0, ResultDesc: "Accepted" });

  // Everything below runs after we have already responded, so Express can no longer
  // report errors for us. Catch them here rather than leaving a rejected promise unhandled.
  try {
    const callbackData = req.body;
    const stkCallback = callbackData?.Body?.stkCallback;

    if (!stkCallback) {
      console.error("Malformed callback payload:", JSON.stringify(callbackData));
      return;
    }

    const { ResultCode, ResultDesc, CheckoutRequestID, CallbackMetadata } = stkCallback;

    // Look up the pending transaction by CheckoutRequestID
    const transaction = await db.mpesaTransactions.findOne({
      where: { checkoutRequestId: CheckoutRequestID },
    });

    if (!transaction) {
      // This can happen if the DB write in /stkpush failed, or for test callbacks
      console.error("No transaction found for CheckoutRequestID:", CheckoutRequestID);
      return;
    }

    // *** IDEMPOTENCY CHECK ***
    // If the transaction is already in a terminal state, do nothing.
    // Safaricom can and does deliver the same callback more than once.
    if (["COMPLETED", "FAILED", "EXPIRED"].includes(transaction.status)) {
      console.log(`Duplicate callback ignored for ${CheckoutRequestID}. Status: ${transaction.status}`);
      return;
    }

    if (ResultCode === 0) {
      // Payment was successful. Flatten the metadata items into a plain object.
      const metadata = {};
      (CallbackMetadata?.Item ?? []).forEach((item) => {
        if (item.Value !== undefined) {
          metadata[item.Name] = item.Value;
        }
      });

      await db.mpesaTransactions.update(
        {
          status: "COMPLETED",
          resultCode: ResultCode,
          resultDescription: ResultDesc,
          mpesaReceiptNumber: metadata.MpesaReceiptNumber,
          callbackReceivedAt: new Date(),
          rawCallback: JSON.stringify(callbackData),
        },
        { where: { checkoutRequestId: CheckoutRequestID } }
      );

      // Trigger downstream fulfillment: mark order as paid, send confirmation email, etc.
      await fulfillOrder(transaction.orderId, metadata.MpesaReceiptNumber);

      console.log(`Payment COMPLETED: ${CheckoutRequestID} | Receipt: ${metadata.MpesaReceiptNumber}`);
    } else {
      // Payment failed, cancelled, or timed out
      // ResultCode 1032: User cancelled
      // ResultCode 1037: DS timeout (customer unreachable or did not respond in time)
      // ResultCode 2001: Wrong PIN (customer used incorrect PIN)
      const newStatus = ResultCode === 1037 ? "EXPIRED" : "FAILED";

      await db.mpesaTransactions.update(
        {
          status: newStatus,
          resultCode: ResultCode,
          resultDescription: ResultDesc,
          callbackReceivedAt: new Date(),
          rawCallback: JSON.stringify(callbackData),
        },
        { where: { checkoutRequestId: CheckoutRequestID } }
      );

      console.log(`Payment ${newStatus}: ${CheckoutRequestID} | Code: ${ResultCode} | Desc: ${ResultDesc}`);
    }
  } catch (error) {
    // The transaction keeps its current state; the reconciliation job will pick it up.
    console.error("Callback processing failed:", error.message);
  }
});

Two details in this handler are worth emphasising. First, the immediate 200 response before any database work. Safaricom's documented expectation is that your server responds within 30 seconds, but in practice, a slow response increases the likelihood of retries. Respond first, process second. The trade-off is Scenario 2: once you have acknowledged the callback, a crash during processing leaves the transaction unresolved until the STK Query API or the reconciliation job catches it. Second, the idempotency check at the start. Before doing anything, confirm that you have not already processed this CheckoutRequestID. This single check prevents an enormous class of bugs.
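
A read-then-write idempotency check still leaves a small window in which two deliveries of the same callback both see an unresolved status. One common way to close it is a conditional update that only succeeds while the transaction is still open. The sketch below shows the idea against an in-memory Map; claimTransition and OPEN_STATES are hypothetical names, and in a real database the same effect would come from an UPDATE whose WHERE clause includes the open statuses, followed by a check of the affected row count.

```javascript
// In-memory stand-in for the transactions table, keyed by CheckoutRequestID.
const transactions = new Map();
const OPEN_STATES = ["INITIATED", "PENDING", "QUERIED"];

// Move a transaction to a new status only if it is still open.
// Returns true for the caller that made the change and false for everyone else,
// so only one caller goes on to trigger fulfilment.
const claimTransition = (checkoutRequestId, newStatus, fields = {}) => {
  const tx = transactions.get(checkoutRequestId);
  if (!tx || !OPEN_STATES.includes(tx.status)) return false;
  transactions.set(checkoutRequestId, { ...tx, ...fields, status: newStatus });
  return true;
};
```

With this shape, the handler would call fulfilment only when claimTransition returns true, which makes a duplicate or concurrent delivery a harmless no-op.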

Step 4: Implement the STK Query API as Your Safety Net

The STK Query API is Daraja's answer to the question: "I initiated an STK Push and never got a callback. What actually happened?" It lets you ask Safaricom directly for the current status of any transaction, identified by CheckoutRequestID.

The endpoint is POST https://sandbox.safaricom.co.ke/mpesa/stkpushquery/v1/query (swap for the production URL when going live). The request body requires the same BusinessShortCode, Password, and Timestamp fields as the STK Push request itself, plus the CheckoutRequestID of the transaction you are asking about.
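
As a reminder of how those shared fields are built: the Timestamp has the form YYYYMMDDHHmmss, and the Password is the Base64 encoding of the shortcode, the passkey, and that timestamp joined together. The sketch below is a standalone illustration; buildTimestamp and buildPassword are hypothetical names with explicit parameters, not the getTimestamp and generatePassword helpers from the previous article, and it assumes the server clock is set to the time zone you want the timestamp expressed in.

```javascript
// Format a Date as YYYYMMDDHHmmss using the server's local time.
const buildTimestamp = (date = new Date()) => {
  const pad = (n) => String(n).padStart(2, "0");
  return (
    `${date.getFullYear()}` +
    pad(date.getMonth() + 1) +
    pad(date.getDate()) +
    pad(date.getHours()) +
    pad(date.getMinutes()) +
    pad(date.getSeconds())
  );
};

// Password = Base64(shortcode + passkey + timestamp).
// The timestamp passed here must be the same one sent in the request body.
const buildPassword = (shortcode, passkey, timestamp) =>
  Buffer.from(`${shortcode}${passkey}${timestamp}`).toString("base64");
```

Generate the timestamp once per request and pass the same value to both fields; a password built from a different timestamp than the one in the body will not match.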

Add this to your utils/mpesa.js file:

javascript

/**
 * Query the status of an STK Push transaction.
 * Use this when a callback has not arrived within a reasonable window,
 * or when you need to verify a transaction's status before fulfilling an order.
 *
 * @param {string} checkoutRequestId - The CheckoutRequestID from the STK Push response
 * @param {string} accessToken - OAuth token from the auth middleware
 */
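const querySTKStatus = async (checkoutRequestId, accessToken) => {
  const timestamp = getTimestamp();
  const password = generatePassword(timestamp);
  const shortcode = process.env.BUSINESS_SHORTCODE;

  const payload = {
    BusinessShortCode: shortcode,
    Password: password,
    Timestamp: timestamp,
    CheckoutRequestID: checkoutRequestId,
  };

  try {
    const response = await axios.post(
      "https://sandbox.safaricom.co.ke/mpesa/stkpushquery/v1/query",
      payload,
      {
        headers: {
          Authorization: `Bearer ${accessToken}`,
          "Content-Type": "application/json",
        },
      }
    );

    return response.data;
  } catch (error) {
    // axios rejects on any non-2xx status. If Daraja sends the "still being processed"
    // body (errorCode 500.001.1001) with an error status, hand that body back so
    // callers can treat it as "in flight" instead of a failed query.
    if (error.response?.data?.errorCode === "500.001.1001") {
      return error.response.data;
    }
    throw error;
  }
};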

module.exports = { initiateSTKPush, formatPhoneNumber, querySTKStatus };

The query response looks like this for a completed transaction:

json

{
  "ResponseCode": "0",
  "ResponseDescription": "The service request has been accepted successfully",
  "MerchantRequestID": "29115-34620561-1",
  "CheckoutRequestID": "ws_CO_191220191020363925",
  "ResultCode": "0",
  "ResultDesc": "The service request is processed successfully."
}

And for a transaction that is still being processed (the customer has not responded yet):

json

{
  "errorCode": "500.001.1001",
  "errorMessage": "The transaction is being processed"
}

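That 500.001.1001 error code is important. It is not a real error. It means the transaction is still in flight. When you hit this code, do not mark the transaction as failed. Wait and query again. Be aware that an HTTP client such as axios rejects on non-2xx responses, so if Daraja returns this body with an error status, you will find it on error.response.data in your catch block rather than in the resolved response.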
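
The two response shapes above can be reduced to a small set of outcomes before any database logic runs. This is a sketch under the assumptions already stated in this guide (ResultCode "0" means success, 1037 means timeout, 500.001.1001 means in flight); interpretQueryResponse and the UNKNOWN outcome are hypothetical additions for bodies that carry neither a known error code nor a usable ResultCode.

```javascript
const IN_FLIGHT_ERROR_CODE = "500.001.1001";

// Reduce an STK Query response body to a single outcome.
const interpretQueryResponse = (data) => {
  if (data?.errorCode === IN_FLIGHT_ERROR_CODE) {
    return { outcome: "IN_FLIGHT" };
  }

  // ResultCode arrives as a string in query responses, so parse it.
  const resultCode = parseInt(data?.ResultCode, 10);
  if (Number.isNaN(resultCode)) {
    // Nothing definitive: leave the transaction untouched and try again later.
    return { outcome: "UNKNOWN" };
  }

  if (resultCode === 0) return { outcome: "COMPLETED", resultCode };
  if (resultCode === 1037) return { outcome: "EXPIRED", resultCode };
  return { outcome: "FAILED", resultCode };
};
```

The UNKNOWN branch matters: without it, a body with no ResultCode parses to NaN and falls through to FAILED, which is exactly the kind of false negative that leads customers to pay twice.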

Here is a route you can call from your frontend or use in your background job:

javascript

// routes/query.js
app.post("/mpesa/query", generateToken, async (req, res) => {
  const { checkoutRequestId } = req.body;

  if (!checkoutRequestId) {
    return res.status(400).json({ error: "checkoutRequestId is required" });
  }

  try {
    const transaction = await db.mpesaTransactions.findOne({
      where: { checkoutRequestId },
    });

    if (!transaction) {
      return res.status(404).json({ error: "Transaction not found" });
    }

    // No need to query if we already have a definitive answer
    if (["COMPLETED", "FAILED", "EXPIRED"].includes(transaction.status)) {
      return res.status(200).json({
        status: transaction.status,
        mpesaReceiptNumber: transaction.mpesaReceiptNumber || null,
        reason: transaction.resultDescription || null,
      });
    }

    const queryResult = await querySTKStatus(checkoutRequestId, req.access_token);

    await db.mpesaTransactions.update(
      { queriedAt: new Date(), rawQueryResponse: JSON.stringify(queryResult) },
      { where: { checkoutRequestId } }
    );

    // Transaction is still in flight. Tell the client to check again later.
    if (queryResult.errorCode === "500.001.1001") {
      return res.status(200).json({ status: "PENDING", message: "Transaction is being processed." });
    }

    const resultCode = parseInt(queryResult.ResultCode, 10);

    // No usable ResultCode (for example a different Daraja error body).
    // Leave the state alone rather than recording a failure we cannot confirm.
    if (Number.isNaN(resultCode)) {
      return res.status(200).json({ status: "PENDING", message: "Status not yet confirmed." });
    }

    // We have a definitive result from the query
    if (resultCode === 0) {
      await db.mpesaTransactions.update(
        { status: "COMPLETED", resultCode, resultDescription: queryResult.ResultDesc, queriedAt: new Date() },
        { where: { checkoutRequestId } }
      );
      // A later callback will now be ignored as a duplicate, so fulfil the order here.
      // fulfillOrder should tolerate being called twice for the same order.
      await fulfillOrder(transaction.orderId, null); // No receipt number available via query
      return res.status(200).json({ status: "COMPLETED" });
    }

    const newStatus = resultCode === 1037 ? "EXPIRED" : "FAILED";
    await db.mpesaTransactions.update(
      { status: newStatus, resultCode, resultDescription: queryResult.ResultDesc },
      { where: { checkoutRequestId } }
    );
    return res.status(200).json({ status: newStatus, reason: queryResult.ResultDesc });
  } catch (error) {
    console.error("STK Query error:", error.response?.data || error.message);
    res.status(500).json({ error: "Query failed" });
  }
});

Step 5: Add a Frontend Polling Loop

Your frontend should not assume the callback will update the UI automatically. Implement a polling loop that calls your query endpoint at a reasonable interval until it gets a definitive answer. A 3-second interval for a maximum of 20 attempts (about a minute total) is a sensible starting point.

javascript

// Frontend: polling loop after STK Push initiation
const pollPaymentStatus = async (checkoutRequestId, onSuccess, onFailure) => {
  const MAX_ATTEMPTS = 20;
  const POLL_INTERVAL_MS = 3000;
  let attempts = 0;

  const poll = async () => {
    attempts++;

    if (attempts > MAX_ATTEMPTS) {
      onFailure("We could not confirm your payment yet. Please check your M-Pesa messages before retrying, and contact support if you were charged.");
      return;
    }

    try {
      const response = await fetch("/mpesa/query", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ checkoutRequestId }),
      });

      const data = await response.json();

      if (data.status === "COMPLETED") {
        onSuccess(data);
        return; // Stop polling
      }

      if (data.status === "FAILED" || data.status === "EXPIRED") {
        onFailure(data.reason || "Payment was not completed.");
        return; // Stop polling
      }

      // Status is PENDING or QUERIED — wait and try again
      setTimeout(poll, POLL_INTERVAL_MS);

    } catch (error) {
      // Network error on the polling request itself — try again
      setTimeout(poll, POLL_INTERVAL_MS);
    }
  };

  // Start polling after a short initial delay
  setTimeout(poll, 2000);
};

The error message for the client-side timeout deserves particular attention. If the polling loop times out, do not tell the customer their payment failed. You do not know that. Tell them you could not confirm the status, and give them a way to check. Many Kenyan users will have received an M-Pesa confirmation SMS even if your system has not yet caught up. Do not cause a double-payment by telling a customer who already paid that their payment failed.

Step 6: Build a Reconciliation Background Job

The final layer of defense is a background job that periodically sweeps for transactions that are stuck in the INITIATED or PENDING state past a reasonable deadline and queries their status proactively. This catches anything the real-time flow missed.

javascript

// jobs/reconcileTransactions.js
const reconcileStuckTransactions = async () => {
  const STUCK_THRESHOLD_MINUTES = 10;
  const cutoffTime = new Date(Date.now() - STUCK_THRESHOLD_MINUTES * 60 * 1000);

  // Find transactions that have been pending for more than 10 minutes
  const stuckTransactions = await db.mpesaTransactions.findAll({
    where: {
      status: ["INITIATED", "PENDING"],
      createdAt: { lessThan: cutoffTime },
    },
  });

  if (stuckTransactions.length === 0) return;

  console.log(`Reconciliation: Found ${stuckTransactions.length} stuck transactions.`);

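  // One token covers the whole sweep; Daraja access tokens are valid for about an hour.
  const accessToken = await generateFreshToken();

  for (const transaction of stuckTransactions) {
    try {
      const queryResult = await querySTKStatus(transaction.checkoutRequestId, accessToken);

      if (queryResult.errorCode === "500.001.1001") {
        // Still processing. Unusual this long after initiation, so log it.
        console.warn(`Transaction ${transaction.checkoutRequestId} still processing after ${STUCK_THRESHOLD_MINUTES} minutes.`);
        continue;
      }

      const resultCode = parseInt(queryResult.ResultCode, 10);
      if (Number.isNaN(resultCode)) {
        // No usable ResultCode in the response. Leave the state alone and retry on the next sweep.
        console.warn(`Unrecognised query response for ${transaction.checkoutRequestId}:`, JSON.stringify(queryResult));
        continue;
      }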

      if (resultCode === 0) {
        // Payment completed but we missed the callback — resolve it now
        await db.mpesaTransactions.update(
          { status: "COMPLETED", resultCode, resultDescription: queryResult.ResultDesc, queriedAt: new Date() },
          { where: { id: transaction.id } }
        );
        await fulfillOrder(transaction.orderId, null); // No receipt number available via query
        console.log(`Reconciliation resolved COMPLETED: ${transaction.checkoutRequestId}`);
      } else {
        const newStatus = resultCode === 1037 ? "EXPIRED" : "FAILED";
        await db.mpesaTransactions.update(
          { status: newStatus, resultCode, resultDescription: queryResult.ResultDesc },
          { where: { id: transaction.id } }
        );
        console.log(`Reconciliation resolved ${newStatus}: ${transaction.checkoutRequestId}`);
      }

    } catch (error) {
      console.error(`Reconciliation failed for ${transaction.checkoutRequestId}:`, error.message);
      // Do not throw — process remaining transactions even if one fails
    }
  }
};

// Run this job every 5 minutes using your preferred scheduler (node-cron, bull, etc.)
// cron.schedule("*/5 * * * *", reconcileStuckTransactions);
module.exports = { reconcileStuckTransactions };

One important note on the reconciliation job: when you resolve a COMPLETED transaction via query (rather than via callback), you will not have an MpesaReceiptNumber in the query response. The STK Query API confirms the transaction completed but does not return the M-Pesa receipt. You will need to use the Transaction Status API (a separate Daraja endpoint) if you need the receipt number for these edge-case transactions. For most use cases, confirming the payment completed and fulfilling the order is sufficient, and the customer's SMS confirmation from Safaricom serves as their receipt.
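
Because an order can now be resolved by the callback, by the query route, or by this job, fulfilment itself should tolerate being called more than once, and it should accept a receipt number that only turns up later. The sketch below illustrates that behaviour with an in-memory Map standing in for an orders table; fulfillOrderOnce and its return values are hypothetical and are not the fulfillOrder function used elsewhere in this guide.

```javascript
// In-memory stand-in for an orders table, keyed by order ID.
const orders = new Map();

// Mark an order as paid at most once, and backfill the receipt if it arrives later.
// Returns "FULFILLED" on the first call, "RECEIPT_ADDED" when a later call supplies
// a missing receipt, and "ALREADY_FULFILLED" otherwise.
const fulfillOrderOnce = (orderId, mpesaReceiptNumber = null) => {
  const order = orders.get(orderId);
  if (!order) throw new Error(`Unknown order: ${orderId}`);

  if (order.paid) {
    if (!order.mpesaReceiptNumber && mpesaReceiptNumber) {
      order.mpesaReceiptNumber = mpesaReceiptNumber;
      return "RECEIPT_ADDED";
    }
    return "ALREADY_FULFILLED";
  }

  order.paid = true;
  order.mpesaReceiptNumber = mpesaReceiptNumber;
  // Side effects such as confirmation emails or stock changes belong here,
  // so that they run only on the first call.
  return "FULFILLED";
};
```

In a real database the same guarantee would need a conditional update or a transaction rather than a plain read followed by a write.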

Putting It All Together: The Full Defense Strategy

To summarise the complete approach:

At initiation: Write the transaction to your database with status INITIATED immediately after receiving the CheckoutRequestID from Daraja. Never wait for the callback before creating a record.

Callback handling: Respond with HTTP 200 before processing. Always check idempotency before updating. Store the raw callback payload. Handle ResultCode 0 (success), 1032 (cancelled), 1037 (timeout), and all other non-zero codes explicitly.

Client-side: Poll the query endpoint every 3 seconds for up to a minute. Do not tell customers their payment failed unless you have a definitive failure ResultCode. For timeouts, instruct customers to check their M-Pesa messages before retrying.

Background job: Sweep for INITIATED or PENDING transactions older than 10 minutes and query their status. Run every 5 minutes. Log everything.

Database: Keep the raw_callback and raw_query_response fields. They are your lifeline when disputes arise.

A Note on Double Payments

The most common real-world fallout from missing the above safeguards is not failed orders, it is double payments. A customer pays, your system does not update, the customer tries again, and both payments go through. Now you have charged them twice.

The defence is straightforward: before initiating a new STK Push for an order, check if there is already a COMPLETED transaction for that order in your database. If there is, do not initiate again. If there is an INITIATED or PENDING transaction less than 2 minutes old, do not initiate again either, as the first attempt may still be in flight. This single check prevents the vast majority of double-payment incidents.
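
That check can be written as a small pure function that runs before the STK Push request is sent. This is a minimal sketch: canInitiatePayment, the reason strings, and the shape of the transaction objects (status and createdAt) are assumptions for illustration, and the 2-minute window is the one suggested above.

```javascript
const IN_FLIGHT_WINDOW_MS = 2 * 60 * 1000;

// Decide whether a new STK Push may be started for an order, given that
// order's existing transactions. `now` is injectable to make testing easy.
const canInitiatePayment = (orderTransactions, now = Date.now()) => {
  // A completed payment always blocks a new attempt.
  if (orderTransactions.some((tx) => tx.status === "COMPLETED")) {
    return { allowed: false, reason: "ALREADY_PAID" };
  }

  // A recent unresolved attempt may still be in flight on the customer's phone.
  const hasRecentOpenAttempt = orderTransactions.some((tx) => {
    const isOpen = tx.status === "INITIATED" || tx.status === "PENDING";
    const ageMs = now - new Date(tx.createdAt).getTime();
    return isOpen && ageMs < IN_FLIGHT_WINDOW_MS;
  });
  if (hasRecentOpenAttempt) {
    return { allowed: false, reason: "ATTEMPT_IN_FLIGHT" };
  }

  return { allowed: true, reason: null };
};
```

In the /stkpush route you would load the order's transactions, call this function, and return an error or the existing CheckoutRequestID instead of sending a second prompt when allowed is false.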

Building this kind of resilience into your M-Pesa integration is not optional for production applications. In Kenya's network environment, where connectivity can be inconsistent and Safaricom's own systems occasionally have latency spikes, treating the callback as unreliable is the only way to design your payment flow. The callback is the primary path. The query API and reconciliation job are your fallbacks. Together, they make sure you never permanently lose a transaction.

Have questions about production edge cases or a specific failure mode you have encountered? Drop a comment below or reach us by email. We are working on the next part of this series, which will cover securing your callback endpoint against spoofed requests and implementing IP allowlisting.

Software Staff Writer, Sandra Safari serves a unique dual role at TechInKenya as both a Software Engineer and a Tech Journalist.
