Keeping Company Data Up to Date with Webhooks and Polling
KVKBase Team

How to keep company data current with polling, cache invalidation, and periodic revalidation. Practical patterns for your application.

Tags: webhooks, api, real-time

You have fetched company data from the KVK and stored it in your database. But how do you keep that data current? Companies relocate, change names, or get deregistered. Without a strategy for catching changes, your application will be working with stale data within a few months.

In this article we discuss the two main strategies, polling and webhooks, and provide practical implementation patterns for keeping company data up to date. We also cover cache invalidation and monitoring, which you need regardless of the strategy you pick.

The problem: data goes stale

Company data is not static. In any given year, the trade register sees:

  • Thousands of companies change address
  • Companies change their trade name
  • New companies are registered
  • Companies are deregistered or dissolved
  • Legal forms change (sole proprietorship becomes a BV)

If you have 10,000 company records in your database, you should expect a noticeable share of them, likely hundreds, to be inaccurate after six months unless you revalidate them.

Strategy 1: Polling

Polling is the simplest approach: you periodically check whether the data for your stored companies is still current.

Basic implementation

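const REFRESH_INTERVAL_DAYS = 7; // Revalidate every 7 days

// Small helper used below to pause between API calls
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));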

async function refreshStaleCompanies() {
  const cutoff = new Date();
  cutoff.setDate(cutoff.getDate() - REFRESH_INTERVAL_DAYS);

  // Fetch companies that haven't been checked in over 7 days
  const staleCompanies = await db.companies.findWhere(
    'last_verified_at < ? OR last_verified_at IS NULL',
    cutoff
  );

  console.log(`${staleCompanies.length} companies need revalidation`);

  for (const company of staleCompanies) {
    try {
      const fresh = await kvkbase.lookup(company.kvkNumber);
      const changes = detectChanges(company, fresh);

      if (changes.length > 0) {
        await db.companies.update(company.id, {
          tradeName: fresh.tradeName,
          address: fresh.address,
          isActive: fresh.isActive,
          lastVerifiedAt: new Date()
        });

        await logChanges(company.kvkNumber, changes);
      } else {
        // No changes, just update the verification date
        await db.companies.update(company.id, {
          lastVerifiedAt: new Date()
        });
      }
    } catch (error) {
      console.error(`Revalidation failed for ${company.kvkNumber}:`, error);
    }

    // Pause to respect rate limits
    await sleep(200);
  }
}

function detectChanges(old, fresh) {
  const changes = [];

  if (old.tradeName !== fresh.tradeName) {
    changes.push({ field: 'tradeName', old: old.tradeName, new: fresh.tradeName });
  }
  if (old.isActive !== fresh.isActive) {
    changes.push({ field: 'isActive', old: old.isActive, new: fresh.isActive });
  }
  // Compare address fields...

  return changes;
}
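
The address comparison that detectChanges leaves open could be sketched as follows. This is a minimal sketch, not the KVKBase response format: the address shape (street, houseNumber, postalCode, city) and the helper names normalize and detectAddressChanges are assumptions made for illustration. Values are normalized before comparison so that purely cosmetic differences are not logged as changes.

```javascript
// Hypothetical address shape: { street, houseNumber, postalCode, city }.
// Adjust the field list to match the fields you actually store.
const ADDRESS_FIELDS = ['street', 'houseNumber', 'postalCode', 'city'];

// Normalize a value so cosmetic differences (case, whitespace) are ignored:
// "1012 LG" and "1012lg" compare as equal.
function normalize(value) {
  return String(value ?? '').replace(/\s+/g, '').toLowerCase();
}

// Returns one change entry per address field that differs.
function detectAddressChanges(oldAddress, freshAddress) {
  const before = oldAddress ?? {};
  const after = freshAddress ?? {};
  const changes = [];

  for (const field of ADDRESS_FIELDS) {
    if (normalize(before[field]) !== normalize(after[field])) {
      changes.push({
        field: `address.${field}`,
        old: before[field],
        new: after[field]
      });
    }
  }

  return changes;
}
```

Inside detectChanges, the result could be merged with `changes.push(...detectAddressChanges(old.address, fresh.address))`.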

Smart scheduling

Not all companies need to be checked at the same frequency. Prioritize based on activity:

// Returns the refresh interval in days; the most urgent rule is checked first
function getRefreshPriority(company) {
  // Companies with open invoices: daily
  if (company.hasOpenInvoices) return 1;

  // Active customers: every 3 days
  if (company.hasRecentOrders) return 3;

  // Inactive customers: every 30 days
  if (!company.hasRecentActivity) return 30;

  // Default: every 7 days
  return 7;
}

async function refreshByPriority() {
  const companies = await db.companies.findAll();

  for (const company of companies) {
    const priority = getRefreshPriority(company);
    // Never-verified companies have no date to compare and are always due
    const daysSinceRefresh = company.lastVerifiedAt
      ? daysBetween(company.lastVerifiedAt, new Date())
      : Infinity;

    if (daysSinceRefresh >= priority) {
      await refreshCompany(company);
    }
  }
}
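
refreshByPriority relies on a daysBetween helper that the article does not show. One possible version is sketched below, assuming lastVerifiedAt is a Date or an ISO date string. Returning Infinity for a missing or invalid start date is a design choice made here, so that such records always count as due.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days elapsed between two dates. Returns Infinity when the start
// date is missing or either date is invalid, so such records are always due.
function daysBetween(from, to) {
  if (from == null) return Infinity;

  const start = new Date(from).getTime();
  const end = new Date(to).getTime();
  if (Number.isNaN(start) || Number.isNaN(end)) return Infinity;

  return Math.floor((end - start) / MS_PER_DAY);
}
```

The explicit null check matters because `new Date(null)` does not produce an invalid date; it yields the Unix epoch, which would silently report a very large but finite age.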

Setting up a cron job

Schedule the polling as a cron job that runs daily:

# crontab -e
# Run revalidation every night at 03:00
0 3 * * * node /app/scripts/refresh-companies.js

Or with a task scheduler in Node.js:

import cron from 'node-cron';

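cron.schedule('0 3 * * *', async () => {
  console.log('Starting company data revalidation...');
  try {
    await refreshStaleCompanies();
    console.log('Revalidation complete');
  } catch (error) {
    // Catch errors here so a failed run does not become an unhandled rejection
    console.error('Revalidation failed:', error);
  }
});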

Strategy 2: Webhooks

Webhooks reverse the direction of polling: rather than your application checking periodically, the data source sends a notification to your application when something changes.

Setting up a webhook endpoint

// Express endpoint for webhook notifications
app.post('/webhooks/company-updates', async (req, res) => {
  // Verify the webhook signature
  const signature = req.headers['x-webhook-signature'];
  if (!verifySignature(req.body, signature)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }

  const { kvkNumber, eventType, data } = req.body;

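  try {
    switch (eventType) {
      case 'company.updated':
        await handleCompanyUpdate(kvkNumber, data);
        break;
      case 'company.deregistered':
        await handleCompanyDeregistered(kvkNumber);
        break;
      case 'company.address_changed':
        await handleAddressChange(kvkNumber, data);
        break;
      default:
        console.log(`Unknown event type: ${eventType}`);
    }
  } catch (error) {
    // Without this catch, a failing handler would leave the request hanging.
    // Returning 500 assumes the provider retries on non-2xx responses.
    console.error(`Webhook processing failed for ${kvkNumber}:`, error);
    return res.status(500).json({ error: 'Processing failed' });
  }

  // Return 200 to acknowledge receipt once the event has been handled
  res.status(200).json({ received: true });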
});

Webhook security

Webhooks must always be verified. Never blindly trust incoming requests:

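const crypto = require('crypto');

function verifySignature(payload, signature) {
  // Reject requests where the signature header is missing
  if (typeof signature !== 'string') return false;

  const secret = process.env.WEBHOOK_SECRET;
  // Note: re-serializing the parsed body only matches if the sender signed
  // the same serialization; verifying the raw request bytes is more robust
  const expected = crypto
    .createHmac('sha256', secret)
    .update(JSON.stringify(payload))
    .digest('hex');

  const received = Buffer.from(signature);
  const expectedBuffer = Buffer.from(expected);

  // timingSafeEqual throws if the lengths differ, so compare lengths first
  return received.length === expectedBuffer.length &&
    crypto.timingSafeEqual(received, expectedBuffer);
}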
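
Signature schemes generally sign the exact bytes that were sent, so a signature computed over a re-serialized body can fail on harmless differences such as key order or whitespace. The sketch below verifies against the raw body instead. It assumes a hex-encoded HMAC-SHA256, as in the snippet above; signPayload and verifyRawSignature are hypothetical helper names, and your provider's actual scheme may differ. In Express, one way to obtain the raw bytes is the `verify` option of `express.json()`, which receives the unparsed buffer.

```javascript
const crypto = require('crypto');

// Compute the hex HMAC-SHA256 of the raw request body (string or Buffer).
function signPayload(rawBody, secret) {
  return crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
}

// Verify a received signature against the raw body bytes.
function verifyRawSignature(rawBody, signature, secret) {
  // A missing header must fail verification, not throw
  if (typeof signature !== 'string') return false;

  const expected = Buffer.from(signPayload(rawBody, secret), 'utf8');
  const received = Buffer.from(signature, 'utf8');

  // timingSafeEqual throws on a length mismatch, so check lengths first
  if (received.length !== expected.length) return false;

  return crypto.timingSafeEqual(received, expected);
}
```

Passing the secret as a parameter rather than reading it from the environment inside the function keeps the helper easy to test.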

Idempotency

Webhooks can be delivered more than once. Make sure your handler is idempotent:

async function handleCompanyUpdate(kvkNumber, data) {
  const eventId = data.eventId;

  // Check if we have already processed this event
  const processed = await db.webhookEvents.findByEventId(eventId);
  if (processed) {
    console.log(`Event ${eventId} already processed, skipping`);
    return;
  }

  // Process the update
  await db.companies.updateByKvk(kvkNumber, {
    tradeName: data.tradeName,
    address: data.address,
    isActive: data.isActive,
    lastVerifiedAt: new Date()
  });

  // Mark event as processed
  await db.webhookEvents.create({
    eventId,
    kvkNumber,
    processedAt: new Date()
  });
}
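
The check-then-insert sequence above can still process an event twice when two deliveries of the same event arrive at nearly the same time. A common alternative is to claim the event ID first, in one atomic step, and only then process it. The sketch below uses an in-memory Set as a stand-in for a database table with a unique constraint on the event ID; claimEvent and processOnce are hypothetical names.

```javascript
// In-memory stand-in for a table with a UNIQUE constraint on event_id.
const processedEvents = new Set();

// Returns true only for the first caller to claim a given event ID.
// The check and the add run synchronously, so no other handler can
// interleave between them.
function claimEvent(eventId) {
  if (processedEvents.has(eventId)) return false;
  processedEvents.add(eventId);
  return true;
}

// Runs the handler at most once per event ID. Resolves to true when the
// handler ran and false when the event was a duplicate.
async function processOnce(eventId, handler) {
  if (!claimEvent(eventId)) return false; // duplicate delivery, skip

  try {
    await handler();
    return true;
  } catch (error) {
    // Release the claim so a redelivery can retry the failed event
    processedEvents.delete(eventId);
    throw error;
  }
}
```

With a real database, the claim would typically be an insert that fails on the unique constraint, ideally in the same transaction as the company update.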

Cache invalidation

Apart from polling and webhooks, you need a cache invalidation strategy for real-time lookups.

TTL-based invalidation

The simplest approach: set a Time-To-Live on your cache entries:

const CACHE_TTL = {
  active: 24 * 60 * 60,        // 24 hours for active companies
  inactive: 7 * 24 * 60 * 60,  // 7 days for inactive companies
  notFound: 60 * 60,            // 1 hour for not-found numbers
  search: 60 * 60               // 1 hour for search results
};

async function lookupWithSmartCache(kvkNumber) {
  const key = `company:${kvkNumber}`;
  const cached = await cache.get(key);

  // Entries are wrapped in an object so that a cached "not found" result
  // (data: null) can be told apart from a cache miss
  if (cached) {
    return cached.data;
  }

  const fresh = await kvkbase.lookup(kvkNumber);
  const ttl = fresh
    ? (fresh.isActive ? CACHE_TTL.active : CACHE_TTL.inactive)
    : CACHE_TTL.notFound;

  await cache.set(key, { data: fresh }, ttl);
  return fresh;
}
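
The cache object used above is not defined in the article; in production it would typically be backed by a store such as Redis. For local development or tests, a minimal in-memory version with the same get and set(key, value, ttlSeconds) shape might look like the sketch below. createTtlCache is a hypothetical name, and the injectable clock exists only to make expiry testable.

```javascript
// Minimal in-memory TTL cache. `now` is injectable so tests can control time.
function createTtlCache(now = () => Date.now()) {
  const entries = new Map();

  return {
    // Resolves to the stored value, or undefined on a miss or expired entry
    async get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;

      if (entry.expiresAt <= now()) {
        entries.delete(key); // expired: evict lazily on read
        return undefined;
      }
      return entry.value;
    },

    // Stores a value for ttlSeconds; without a TTL the entry never expires
    async set(key, value, ttlSeconds) {
      const expiresAt = ttlSeconds == null
        ? Infinity
        : now() + ttlSeconds * 1000;
      entries.set(key, { value, expiresAt });
    }
  };
}
```

Expired entries are only removed when they are read, so a long-running process with many distinct keys would also need a periodic sweep or a size limit.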

Stale-While-Revalidate

A more advanced pattern: return the cached data immediately, but refresh in the background:

async function lookupStaleWhileRevalidate(kvkNumber) {
  const cached = await cache.get(`company:${kvkNumber}`);

  if (cached) {
    // Refresh in the background if data is older than 12 hours
    const age = Date.now() - cached.cachedAt;
    if (age > 12 * 60 * 60 * 1000) {
      refreshInBackground(kvkNumber); // fire-and-forget
    }

    return cached.data;
  }

  // No cache: wait for fresh data
  const fresh = await kvkbase.lookup(kvkNumber);
  await cache.set(`company:${kvkNumber}`, {
    data: fresh,
    cachedAt: Date.now()
  });

  return fresh;
}

function refreshInBackground(kvkNumber) {
  kvkbase.lookup(kvkNumber)
    .then(fresh => cache.set(`company:${kvkNumber}`, {
      data: fresh,
      cachedAt: Date.now()
    }))
    .catch(err => console.error(`Background refresh failed: ${err.message}`));
}
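
Under load, many requests for the same stale entry can each start their own background refresh. One way to avoid the duplicate API calls is to track refreshes that are already in flight, as sketched below. createBackgroundRefresher is a hypothetical name; its lookup and store parameters stand in for kvkbase.lookup and a cache.set wrapper.

```javascript
// Builds a refresh function that runs at most one refresh per KVK number
// at a time. `lookup` and `store` are injected so the sketch is testable.
function createBackgroundRefresher(lookup, store) {
  const inFlight = new Map(); // kvkNumber -> pending refresh promise

  return function refreshInBackground(kvkNumber) {
    // Reuse the pending refresh instead of starting a second one
    if (inFlight.has(kvkNumber)) return inFlight.get(kvkNumber);

    const pending = Promise.resolve()
      .then(() => lookup(kvkNumber))
      .then((fresh) => store(kvkNumber, { data: fresh, cachedAt: Date.now() }))
      .catch((err) => console.error(`Background refresh failed: ${err.message}`))
      .finally(() => inFlight.delete(kvkNumber));

    inFlight.set(kvkNumber, pending);
    return pending;
  };
}
```

Callers can still treat the call as fire-and-forget; the returned promise never rejects because failures are logged inside the chain.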

Monitoring

Whichever strategy you choose, set up monitoring to know whether your data is current:

async function reportDataFreshness() {
  const stats = await db.companies.aggregate([
    {
      group: 'freshness',
      buckets: [
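        // The first three buckets are cumulative: 'this_week' includes 'today'
        { label: 'today', where: 'last_verified_at >= NOW() - INTERVAL 1 DAY' },
        { label: 'this_week', where: 'last_verified_at >= NOW() - INTERVAL 7 DAY' },
        { label: 'this_month', where: 'last_verified_at >= NOW() - INTERVAL 30 DAY' },
        { label: 'older', where: 'last_verified_at < NOW() - INTERVAL 30 DAY' },
        // NULL never matches the comparisons above, so count it separately
        { label: 'never_verified', where: 'last_verified_at IS NULL' }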
      ]
    }
  ]);

  console.log('Data freshness report:', stats);
}
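
If your data layer has no aggregate helper like the one above, the same report can be computed in application code. The sketch below is one possible version, assuming each record carries a lastVerifiedAt value (Date, ISO string, or null). Unlike the query above, every record lands in exactly one bucket. summarizeFreshness is a hypothetical name.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Count records per freshness bucket. Each record is counted exactly once;
// records without a verification date are reported as "never".
function summarizeFreshness(companies, now = new Date()) {
  const stats = { today: 0, this_week: 0, this_month: 0, older: 0, never: 0 };

  for (const company of companies) {
    if (!company.lastVerifiedAt) {
      stats.never += 1;
      continue;
    }

    const ageDays = (now - new Date(company.lastVerifiedAt)) / DAY_MS;
    if (ageDays <= 1) stats.today += 1;
    else if (ageDays <= 7) stats.this_week += 1;
    else if (ageDays <= 30) stats.this_month += 1;
    else stats.older += 1;
  }

  return stats;
}
```

A growing "older" or "never" count is a useful alert signal: it usually means the polling job is failing or falling behind.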

Which strategy should you choose?

Factor           | Polling                      | Webhooks
Complexity       | Low                          | Medium
Real-time        | No (delayed)                 | Yes
Cost (API calls) | Higher                       | Lower
Reliability      | High (you are in control)    | Depends on provider
Best for         | Small datasets, nightly sync | Large datasets, real-time requirements

In practice, a combination works best: polling as the baseline, with webhooks as an accelerator when available.

Conclusion

Keeping company data up to date is just as important as fetching it in the first place. Without an update strategy, your application will be working with stale information within months.

Start with a simple polling strategy: a nightly cron job that revalidates companies based on priority. Add webhooks when your scaling needs grow. And always implement smart caching with TTL-based invalidation for your day-to-day lookups.

With KVKBase as your data source, you can implement both strategies: the API is fast enough for polling and supports the patterns you need for reliable company data.