Automation that doesn’t break: principles and patterns

Automation is supposed to save time. But poorly designed automation often creates more work than it eliminates—just different work, happening at less predictable times.

After building and maintaining automations for BigCommerce stores for years, we’ve learned what separates the ones that run quietly in the background from the ones that wake you up at 3am. The difference isn’t complexity or sophistication. It’s discipline in design.

Why Automations Fail

Most automation failures aren’t dramatic. They’re subtle. An order syncs to the ERP but with the wrong shipping method. Inventory updates run, but a timing issue causes oversells during flash sales. A customer notification goes out twice because a webhook fired twice.

These failures share common causes:

Assumptions that don’t hold. The automation was built assuming data would always look a certain way, or that external systems would always respond promptly, or that certain conditions would never occur together. Then reality happened.

Missing error handling. When something goes wrong, the automation either fails silently (leaving you to discover problems later) or fails loudly in ways that cascade into other systems.

No visibility. Nobody knows what the automation is doing until something breaks. There’s no logging, no monitoring, no way to trace what happened after the fact.

Tight coupling. The automation depends on specific implementation details of other systems. When those systems change—and they always change—the automation breaks.

Understanding these causes points toward the principles that prevent them.

Principles for Reliable Automation

Idempotency

Running the same automation twice should produce the same result as running it once. This seems obvious, but it’s the principle most often violated.

Consider an automation that sends order confirmation emails. If the webhook fires twice (which happens more often than you’d think), will the customer receive two emails? If an order sync fails halfway through and retries, will you end up with duplicate line items?

Design automations so that re-running them is safe. Use unique identifiers to detect duplicates. Check whether work has already been done before doing it again. Make your operations idempotent by default, and explicitly handle cases where they can’t be.

In BigCommerce specifically, webhooks can fire multiple times for the same event. Your receiving code needs to handle this gracefully, typically by tracking which events you’ve already processed.
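The duplicate tracking described above can be sketched roughly as follows. This is a minimal illustration, not a production implementation: the `event_id` field is a hypothetical name for whatever stable identifier your deliveries carry, and the in-memory set stands in for a durable store such as a database table with a unique key.

```python
from typing import Callable


class ProcessedEvents:
    """Remembers which event IDs have been handled.

    In-memory for illustration only; a real deployment would need a
    durable store that survives restarts.
    """

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def claim(self, event_id: str) -> bool:
        """Return True the first time an ID is seen, False on repeats."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True

    def release(self, event_id: str) -> None:
        """Forget an ID so a failed event can be retried later."""
        self._seen.discard(event_id)


def handle_once(event: dict, store: ProcessedEvents,
                action: Callable[[dict], None]) -> bool:
    """Run `action` only if this event has not been processed before.

    `event["event_id"]` is a hypothetical field name for this sketch.
    """
    event_id = event["event_id"]
    if not store.claim(event_id):
        return False  # duplicate delivery: skip the side effect
    try:
        action(event)
    except Exception:
        store.release(event_id)  # allow a later retry to run the action
        raise
    return True


# The same event delivered twice triggers the side effect once.
sent: list[int] = []
store = ProcessedEvents()
event = {"event_id": "evt-1", "order_id": 1001}

first = handle_once(event, store, lambda e: sent.append(e["order_id"]))
second = handle_once(event, store, lambda e: sent.append(e["order_id"]))
```

Releasing the claim on failure keeps a retry possible, at the cost of assuming the action itself left nothing half-done. Where that does not hold, the action needs its own duplicate check.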

Observability

You need to know what your automations are doing. Logging, monitoring, and alerting aren’t optional—they’re the only way to catch problems before your customers do.

Every automation should log what it’s doing in enough detail to reconstruct what happened. Not just errors—successful operations too. When an order doesn’t appear in your ERP, you need to know whether the automation never ran, ran and failed, or ran and succeeded but something else went wrong.

Set up alerts for failure conditions, but also for anomalies. If your inventory sync usually processes 500 SKUs and today it processed 50, something might be wrong even if no errors occurred.

For BigCommerce integrations, this often means logging webhook receipts, API calls and responses, transformation steps, and final outcomes. When something goes wrong, you should be able to trace exactly what happened.
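As a rough sketch of both ideas, the snippet below logs each step as one JSON line and flags a run whose volume is far from normal. The logger name, field names, and the 50% tolerance are assumptions chosen for illustration.

```python
import json
import logging

# Hypothetical logger name; handlers and levels are configured elsewhere.
logger = logging.getLogger("inventory_sync")


def log_step(step: str, **fields) -> str:
    """Emit one JSON log line per step so a run can be reconstructed later."""
    line = json.dumps({"step": step, **fields}, sort_keys=True)
    logger.info(line)
    return line


def volume_is_anomalous(processed: int, typical: int,
                        tolerance: float = 0.5) -> bool:
    """Flag a run whose volume differs from the typical count by more than
    `tolerance` (a fraction of `typical`), even if no error was raised."""
    if typical <= 0:
        raise ValueError("typical must be positive")
    return abs(processed - typical) / typical > tolerance


# Successful steps are logged too, not just errors.
received = log_step("webhook_received", order_id=1001)

# 50 SKUs against a usual 500 is worth an alert; 480 is not.
low_run = volume_is_anomalous(processed=50, typical=500)
normal_run = volume_is_anomalous(processed=480, typical=500)
```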

Graceful Degradation

What happens when a dependency fails? Good automation handles this cleanly. Bad automation cascades the failure into worse problems.

If your ERP is down, what happens to orders? If an API rate limit is hit, what happens to the queue of pending operations? If a third-party service times out, does the entire process fail?

Design for failure. Implement retries with exponential backoff for transient errors. Queue operations that can’t complete immediately. Have fallback paths for critical functions. And make sure humans get notified when automated recovery isn’t possible.

The goal isn’t to prevent all failures—that’s impossible. The goal is to contain failures and recover gracefully.
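A minimal sketch of retries with exponential backoff is shown below. `TransientError` is a hypothetical exception class standing in for whatever your client raises on timeouts or temporary outages, and the delay values are placeholders.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts and similar)."""


def retry_with_backoff(operation: Callable[[], T], max_attempts: int = 5,
                       base_delay: float = 0.5, max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, doubling the wait after each transient failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error so a human is told
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))  # add a little jitter
    raise RuntimeError("unreachable")  # loop always returns or raises


# A dependency that fails twice, then recovers.
calls = {"n": 0}
delays: list[float] = []


def flaky_sync() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("ERP timeout")
    return "synced"


# Passing delays.append as `sleep` records the waits instead of pausing.
result = retry_with_backoff(flaky_sync, sleep=delays.append)
```

Only errors known to be transient are retried here. Anything else propagates immediately, since retrying a validation error or a bad credential only delays the alert.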

Loose Coupling

Automations should depend on interfaces, not implementations. When you integrate with an external system, depend on the documented API contract, not on undocumented behaviors or specific response formats that might change.

This applies within BigCommerce too. If you’re building automation that responds to order status changes, don’t depend on statuses changing in a specific sequence. Depend on the status you care about being reached, regardless of how it got there.

Loose coupling makes automations more resilient to change. When BigCommerce updates its API, or your ERP vendor releases a new version, loosely coupled automations are more likely to keep working.
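The order status point can be illustrated with a small contrast. The status labels below are used for illustration only and should be checked against your store, and both function names are hypothetical.

```python
TARGET_STATUS = "Shipped"  # illustrative status label


def fragile_should_notify(previous_status: str, new_status: str) -> bool:
    """Tightly coupled: fires only if the order arrived via one exact path."""
    return previous_status == "Awaiting Shipment" and new_status == TARGET_STATUS


def should_notify(new_status: str, already_notified: bool) -> bool:
    """Loosely coupled: fires when the target status is reached, however the
    order got there, and at most once."""
    return new_status == TARGET_STATUS and not already_notified


# An order that skips a step and jumps straight to the target status.
fragile = fragile_should_notify("Awaiting Fulfillment", "Shipped")
robust = should_notify("Shipped", already_notified=False)
```

The fragile version silently does nothing for the order that skipped a step, which is exactly the kind of subtle failure that is hard to notice later.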

Common Patterns

Event-Driven Over Scheduled

Respond to what actually happens rather than polling on a schedule. It’s more efficient and usually more correct.

BigCommerce webhooks let you react to events as they occur—orders placed, inventory changed, customers created. This is almost always better than scheduled jobs that periodically check for changes.

Scheduled jobs have their place for operations that don’t have triggering events, or for batch processes that are more efficient when aggregated. But for most real-time operations, event-driven architecture is superior.

The caveat: event-driven systems need good handling for missed events. Webhooks can fail to deliver. Your receiving endpoint can be down. Build in mechanisms to detect and recover from missed events—periodic reconciliation jobs that catch anything the event-driven flow missed.

Small, Focused Automations

One automation should do one thing. Composing simple pieces is more maintainable than building complex monoliths.

Instead of one massive automation that handles everything about an order—syncing to the ERP, updating inventory, sending notifications, triggering fulfillment—build separate automations for each concern. They can be triggered by the same event, but they operate independently.

This approach makes debugging easier (you can identify which piece failed), makes changes safer (you can modify one piece without risking others), and makes the system more resilient (one failure doesn’t cascade to everything).
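One way to sketch this is a dispatcher that hands the same event to several independent handlers and records each outcome separately. The handler names and the event shape are hypothetical.

```python
from typing import Callable

Handler = Callable[[dict], None]


def dispatch(event: dict, handlers: dict[str, Handler]) -> dict[str, str]:
    """Run every handler independently and report each outcome by name."""
    outcomes: dict[str, str] = {}
    for name, handler in handlers.items():
        try:
            handler(event)
            outcomes[name] = "ok"
        except Exception as exc:  # isolate: one failure must not stop the rest
            outcomes[name] = f"failed: {exc}"
    return outcomes


def sync_to_erp(event: dict) -> None:
    raise ConnectionError("ERP unavailable")  # simulated outage


notified: list[int] = []


def send_notification(event: dict) -> None:
    notified.append(event["order_id"])


outcomes = dispatch(
    {"order_id": 1001},
    {"sync_to_erp": sync_to_erp, "send_notification": send_notification},
)
```

In this sketch the ERP outage is recorded against its own handler while the notification still goes out. In practice each handler would more likely run as its own queued job, but the isolation principle is the same.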

Human-in-the-Loop for Edge Cases

Not everything should be automated. Some decisions genuinely need human judgment. Build that into your design rather than trying to automate edge cases that don’t have clear rules.

For BigCommerce operations, this might mean flagging orders that meet certain criteria for manual review rather than auto-processing them, routing customer service requests that can’t be automatically resolved, or holding inventory updates that would result in negative stock.

The goal of automation isn’t to eliminate human involvement—it’s to eliminate routine human involvement so people can focus on the cases that actually need attention.
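A review rule of this kind might look like the sketch below. The fields, the threshold, and the two criteria are invented for illustration; the real rules depend on your business.

```python
from dataclasses import dataclass

REVIEW_TOTAL_THRESHOLD = 2000.0  # hypothetical cutoff for a high-value order


@dataclass
class Order:
    order_id: int
    total: float
    shipping_country: str
    billing_country: str


def review_reasons(order: Order) -> list[str]:
    """Collect the reasons, if any, that an order needs a human decision."""
    reasons: list[str] = []
    if order.total >= REVIEW_TOTAL_THRESHOLD:
        reasons.append("high value")
    if order.shipping_country != order.billing_country:
        reasons.append("shipping and billing countries differ")
    return reasons


def route(order: Order) -> str:
    """Send clear-cut orders down the automated path, the rest to a person."""
    return "manual_review" if review_reasons(order) else "auto_process"


routine = route(Order(1, 80.0, "US", "US"))
flagged = route(Order(2, 2500.0, "US", "CA"))
```

Returning the reasons rather than a bare flag gives the reviewer the context for why the order was held.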

Reconciliation as a Safety Net

Even the best event-driven systems miss things. Build reconciliation processes that periodically verify consistency between systems and fix discrepancies.

Typical examples: a nightly job that compares orders in BigCommerce against orders in your ERP and flags any mismatches, a weekly inventory reconciliation that catches drift between systems, and a monthly customer data sync that confirms nothing has fallen out of alignment.

These reconciliation processes are your safety net. They catch the edge cases that your primary automation missed and prevent small discrepancies from compounding into large problems.
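At its core, a reconciliation job is a set comparison. The sketch below assumes the order IDs from each system have already been fetched, and the function and key names are hypothetical.

```python
def reconcile(source_ids: set[int], target_ids: set[int]) -> dict[str, list[int]]:
    """Compare the IDs known to each system and report the differences."""
    return {
        # Present in the source (the store) but never arrived in the target.
        "missing_in_target": sorted(source_ids - target_ids),
        # Present in the target with no matching source record.
        "unexpected_in_target": sorted(target_ids - source_ids),
    }


store_orders = {1001, 1002, 1003, 1004}
erp_orders = {1001, 1002, 1004, 9999}

report = reconcile(store_orders, erp_orders)
```

Here order 1003 never reached the ERP and 9999 has no counterpart in the store. Whether the job fixes such mismatches automatically or only flags them is a design decision, and flagging first is the safer default.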

BigCommerce-Specific Considerations

Working With Webhooks

BigCommerce webhooks are the foundation of most automations. Some practical considerations:

Webhook reliability. BigCommerce will retry failed webhook deliveries, but retries aren’t unlimited. Your receiving endpoint should respond quickly (ideally queueing work for async processing rather than doing everything synchronously) and should handle duplicate deliveries gracefully.

Webhook scope. Each subscription names a scope, which can be as broad as every event on a resource or as narrow as a single event type. Be intentional about what you subscribe to. More subscriptions mean more processing and more opportunities for issues.

Webhook payload limitations. Webhooks often contain limited data. You’ll frequently need to make follow-up API calls to get complete information. Design for this—don’t assume the webhook payload has everything you need.
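The quick acknowledgement and follow-up fetch can be sketched together. The payload shape used here (a `scope` plus a `data` object carrying an `id`) is an assumption to verify against the current webhook documentation, `fetch_order` is a hypothetical stand-in for a real API call, and the in-process queue stands in for a durable one.

```python
import queue
from typing import Callable

# In-process queue for illustration; production would use a durable queue.
work_queue = queue.Queue()


def receive_webhook(payload: dict) -> int:
    """Acknowledge quickly: enqueue the payload and return an HTTP status."""
    work_queue.put(payload)
    return 200


def process_next(fetch_order: Callable[[int], dict]) -> dict:
    """Worker step: take one queued payload and fetch the full record.

    `fetch_order` stands in for the follow-up API call, since the webhook
    payload is assumed to carry little more than an ID.
    """
    payload = work_queue.get_nowait()
    order = fetch_order(payload["data"]["id"])
    work_queue.task_done()
    return order


status = receive_webhook(
    {"scope": "store/order/created", "data": {"type": "order", "id": 250}}
)
order = process_next(lambda order_id: {"id": order_id, "status": "Pending"})
```

The receiving function does no real work, so it can respond well within any delivery timeout. Slow or failing downstream calls then affect only the worker.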

API Rate Limits

BigCommerce has API rate limits that vary by plan. Automations that make many API calls need to respect these limits.

Implement rate limit handling that backs off appropriately when limits are approached. Queue operations during high-volume periods rather than failing. And architect your integrations to minimize unnecessary API calls—batch operations where possible, cache data that doesn’t change frequently.
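A simple form of backing off reads the rate limit headers on each response and pauses before the quota runs out. The header names below are assumptions to confirm against the current API documentation, and the threshold of five remaining requests is arbitrary.

```python
def seconds_to_wait(headers: dict[str, str], min_requests_left: int = 5) -> float:
    """Return how long to pause before the next API call.

    Header names are assumed for this sketch. Real HTTP clients usually
    expose headers through a case-insensitive mapping.
    """
    # With no header present, assume there is still room in the quota.
    left = int(headers.get("X-Rate-Limit-Requests-Left", min_requests_left + 1))
    reset_ms = int(headers.get("X-Rate-Limit-Time-Reset-Ms", 0))
    if left > min_requests_left:
        return 0.0  # plenty of quota left: no need to pause
    return reset_ms / 1000.0  # wait until the window resets


plenty = seconds_to_wait(
    {"X-Rate-Limit-Requests-Left": "100", "X-Rate-Limit-Time-Reset-Ms": "1500"}
)
nearly_out = seconds_to_wait(
    {"X-Rate-Limit-Requests-Left": "2", "X-Rate-Limit-Time-Reset-Ms": "1500"}
)
```

Pausing before the limit is reached avoids the rejected request altogether, which is cheaper than retrying after a failure.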

Script Injection Points

For frontend automations, BigCommerce provides script injection points that let you add JavaScript to your storefront without modifying theme files. Use these for tracking, personalization, and other client-side automations.

Be mindful of performance impact. Every script adds load time. And be aware that scripts can be affected by browser extensions, ad blockers, and other client-side factors.

When to Build vs. Buy

Not every automation needs to be custom. Apps and integration platforms can handle many common use cases.

Build custom when:

Your requirements are genuinely unique to your business
You need deep control over behavior and error handling
Off-the-shelf solutions don’t fit your workflow
The automation is core to your competitive advantage

Buy or use platforms when:

The use case is common (email marketing, reviews, basic ERP sync)
Maintenance burden of custom code outweighs benefits
You need to move fast and requirements aren’t unusual
Someone else has already solved the hard problems

The honest assessment is that most stores over-build. Common integrations don’t need custom code. Standard workflows don’t need bespoke automation. Save your engineering effort for the things that actually differentiate your business.

The Maintenance Reality

Automations require ongoing maintenance. External APIs change. Business requirements evolve. Edge cases emerge that weren’t anticipated.

Budget for this maintenance. An automation isn’t done when it’s deployed—it’s done when it’s been running reliably for long enough that you trust it, and even then it needs periodic attention.

Documentation matters. The person maintaining the automation six months from now might not be the person who built it. Make sure they can understand what it does, why it exists, and how to troubleshoot it.

Reliable automation isn’t about being clever. It’s about being disciplined—following principles that make systems predictable, observable, and recoverable. The boring automations are usually the good ones.