Keeping Company Data Up to Date with Webhooks and Polling
How to keep company data current with polling, cache invalidation, and periodic revalidation. Practical patterns for your application.
You have fetched company data from the KVK and stored it in your database. But how do you keep that data current? Companies relocate, change names, or get deregistered. Without a strategy for catching changes, your application will be working with stale data within a few months.
In this article we discuss the two main strategies — polling and webhooks — and provide practical implementation patterns for keeping company data up to date.
The problem: data goes stale
Company data is not static. In any given year, the trade register sees:
- Thousands of companies change address
- Companies change their trade name
- New companies are registered
- Companies are deregistered or dissolved
- Legal forms change (sole proprietorship becomes a BV)
If you have 10,000 company records in your database, you should expect hundreds of them to be out of date after six months.
Strategy 1: Polling
Polling is the simplest approach: you periodically check whether the data for your stored companies is still current.
Basic implementation
const REFRESH_INTERVAL_DAYS = 7; // Revalidate every 7 days

// Small helper used to pause between API calls
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function refreshStaleCompanies() {
  const cutoff = new Date();
  cutoff.setDate(cutoff.getDate() - REFRESH_INTERVAL_DAYS);

  // Fetch companies that haven't been checked in over 7 days
  const staleCompanies = await db.companies.findWhere(
    'last_verified_at < ? OR last_verified_at IS NULL',
    cutoff
  );
  console.log(`${staleCompanies.length} companies need revalidation`);

  for (const company of staleCompanies) {
    try {
      const fresh = await kvkbase.lookup(company.kvkNumber);
      const changes = detectChanges(company, fresh);
      if (changes.length > 0) {
        await db.companies.update(company.id, {
          tradeName: fresh.tradeName,
          address: fresh.address,
          isActive: fresh.isActive,
          lastVerifiedAt: new Date()
        });
        await logChanges(company.kvkNumber, changes);
      } else {
        // No changes, just update the verification date
        await db.companies.update(company.id, {
          lastVerifiedAt: new Date()
        });
      }
    } catch (error) {
      // Leave lastVerifiedAt untouched so the company is retried on the next run
      console.error(`Revalidation failed for ${company.kvkNumber}:`, error);
    }
    // Pause to respect rate limits
    await sleep(200);
  }
}
function detectChanges(old, fresh) {
  const changes = [];
  if (old.tradeName !== fresh.tradeName) {
    changes.push({ field: 'tradeName', old: old.tradeName, new: fresh.tradeName });
  }
  if (old.isActive !== fresh.isActive) {
    changes.push({ field: 'isActive', old: old.isActive, new: fresh.isActive });
  }
  // Compare address fields...
  return changes;
}
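The address comparison that detectChanges leaves open could look roughly like the sketch below. The address shape (street, houseNumber, postalCode, city) and the helper names normalize and detectAddressChanges are assumptions made for illustration; adapt them to the fields you actually store.

```javascript
// Hypothetical address shape: { street, houseNumber, postalCode, city }.
// The real field names depend on the response you store.
const ADDRESS_FIELDS = ['street', 'houseNumber', 'postalCode', 'city'];

function normalize(value) {
  // Treat null/undefined as empty; ignore case and surrounding whitespace
  return String(value ?? '').trim().toLowerCase();
}

function detectAddressChanges(oldAddress, freshAddress) {
  const before = oldAddress ?? {};
  const after = freshAddress ?? {};
  const changes = [];
  for (const field of ADDRESS_FIELDS) {
    if (normalize(before[field]) !== normalize(after[field])) {
      changes.push({
        field: `address.${field}`,
        old: before[field],
        new: after[field]
      });
    }
  }
  return changes;
}
```

Normalizing before comparing avoids logging a "change" when only capitalization or whitespace differs between your stored copy and the fresh response.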
Smart scheduling
Not all companies need to be checked at the same frequency. Prioritize based on activity:
function getRefreshPriority(company) {
  // Returns the refresh interval in days. The shortest interval is checked
  // first so it wins when several conditions apply to the same company.
  // Companies with open invoices: daily
  if (company.hasOpenInvoices) return 1;
  // Active customers: every 3 days
  if (company.hasRecentOrders) return 3;
  // Inactive customers: every 30 days
  if (!company.hasRecentActivity) return 30;
  // Default: every 7 days
  return 7;
}
async function refreshByPriority() {
  const companies = await db.companies.findAll();
  for (const company of companies) {
    const priority = getRefreshPriority(company);
    const daysSinceRefresh = daysBetween(company.lastVerifiedAt, new Date());
    if (daysSinceRefresh >= priority) {
      await refreshCompany(company);
    }
  }
}
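The scheduling code relies on a daysBetween helper that is not shown. A minimal sketch follows, assuming lastVerifiedAt is a Date (or a date string) and may be null for records that were never verified; isDueForRefresh is a hypothetical convenience wrapper, not part of any library.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function daysBetween(from, to) {
  // A record that was never verified counts as infinitely old
  if (!from) return Infinity;
  return Math.floor((to.getTime() - new Date(from).getTime()) / MS_PER_DAY);
}

function isDueForRefresh(company, intervalDays, now = new Date()) {
  return daysBetween(company.lastVerifiedAt, now) >= intervalDays;
}
```

Returning Infinity for a missing date means never-verified companies are always picked up, whatever their interval.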
Setting up a cron job
Schedule the polling as a cron job that runs daily:
# crontab -e
# Run revalidation every night at 03:00
0 3 * * * node /app/scripts/refresh-companies.js
Or with a task scheduler in Node.js:
import cron from 'node-cron';

cron.schedule('0 3 * * *', async () => {
  console.log('Starting company data revalidation...');
  await refreshStaleCompanies();
  console.log('Revalidation complete');
});
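With an in-process scheduler, a slow run can still be active when the next one fires. One way to guard against that is a simple flag, sketched below. The name runExclusive is hypothetical, and the guard only works within a single Node.js process; a crontab setup that starts a new process each time would need a lock file or a database lock instead.

```javascript
let running = false;

// Runs the job unless a previous run is still in progress.
// Resolves to true if the job ran, false if it was skipped.
async function runExclusive(job) {
  if (running) {
    return false;
  }
  running = true;
  try {
    await job();
    return true;
  } finally {
    // Always release the flag, even when the job throws
    running = false;
  }
}
```

Inside the scheduled callback you could then call `await runExclusive(refreshStaleCompanies)` and log a warning when it resolves to false.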
Strategy 2: Webhooks
Webhooks are the reverse of polling: instead of you periodically checking, the data source sends a notification to your application when something changes.
Setting up a webhook endpoint
// Express endpoint for webhook notifications
app.post('/webhooks/company-updates', async (req, res) => {
  // Verify the webhook signature
  const signature = req.headers['x-webhook-signature'];
  if (!verifySignature(req.body, signature)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }

  const { kvkNumber, eventType, data } = req.body;
  try {
    switch (eventType) {
      case 'company.updated':
        await handleCompanyUpdate(kvkNumber, data);
        break;
      case 'company.deregistered':
        await handleCompanyDeregistered(kvkNumber);
        break;
      case 'company.address_changed':
        await handleAddressChange(kvkNumber, data);
        break;
      default:
        console.log(`Unknown event type: ${eventType}`);
    }
  } catch (error) {
    // Respond with an error instead of leaving the request hanging;
    // a sender that retries on non-2xx responses can then deliver again
    console.error(`Webhook processing failed for ${kvkNumber}:`, error);
    return res.status(500).json({ error: 'Processing failed' });
  }

  // Return 200 to acknowledge receipt
  res.status(200).json({ received: true });
});
Webhook security
Webhooks must always be verified. Never blindly trust incoming requests:
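const crypto = require('crypto');

function verifySignature(payload, signature) {
  // Reject requests that carry no signature header at all
  if (typeof signature !== 'string') return false;

  const secret = process.env.WEBHOOK_SECRET;
  // Note: re-serializing the parsed body only matches if the sender
  // signed byte-identical JSON
  const expected = crypto
    .createHmac('sha256', secret)
    .update(JSON.stringify(payload))
    .digest('hex');

  const received = Buffer.from(signature);
  const expectedBuffer = Buffer.from(expected);
  // timingSafeEqual throws when the lengths differ, so compare lengths first
  return (
    received.length === expectedBuffer.length &&
    crypto.timingSafeEqual(received, expectedBuffer)
  );
}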
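Signatures are normally computed over the exact bytes that were sent, so hashing a re-serialized JSON object can fail when whitespace or key order differs. The sketch below verifies against the raw body instead. It assumes a hex-encoded HMAC-SHA256 signature, as in the example above; signRawBody and verifyRawSignature are hypothetical helper names, and the actual signing scheme depends on your provider.

```javascript
const crypto = require('crypto');

// Computes the hex HMAC-SHA256 of the exact bytes that were received
function signRawBody(rawBody, secret) {
  return crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
}

function verifyRawSignature(rawBody, signature, secret) {
  // A missing header can never be valid
  if (typeof signature !== 'string') return false;
  const expected = Buffer.from(signRawBody(rawBody, secret));
  const received = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so check lengths first
  return (
    received.length === expected.length &&
    crypto.timingSafeEqual(received, expected)
  );
}
```

In Express, one way to keep the raw bytes around is the verify option of the JSON body parser, for example `express.json({ verify: (req, res, buf) => { req.rawBody = buf; } })`, where the rawBody property name is an arbitrary choice.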
Idempotency
Webhooks can be delivered more than once. Make sure your handler is idempotent:
async function handleCompanyUpdate(kvkNumber, data) {
  const eventId = data.eventId;

  // Check if we have already processed this event
  const processed = await db.webhookEvents.findByEventId(eventId);
  if (processed) {
    console.log(`Event ${eventId} already processed, skipping`);
    return;
  }

  // Process the update
  await db.companies.updateByKvk(kvkNumber, {
    tradeName: data.tradeName,
    address: data.address,
    isActive: data.isActive,
    lastVerifiedAt: new Date()
  });

  // Mark event as processed
  await db.webhookEvents.create({
    eventId,
    kvkNumber,
    processedAt: new Date()
  });
}
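A check followed by a later write leaves a small window in which two simultaneous deliveries of the same event can both pass the check. A common remedy is to claim the event first and only then process it. The sketch below illustrates the idea with an in-memory Set standing in for a table with a unique constraint on the event ID; claimEvent and handleOnce are hypothetical names.

```javascript
// In-memory stand-in for a table with a unique constraint on eventId
const processedEvents = new Set();

// Returns true only for the first caller that claims a given eventId
function claimEvent(eventId) {
  if (processedEvents.has(eventId)) return false;
  processedEvents.add(eventId);
  return true;
}

// Runs work() at most once per eventId; resolves to false for duplicates
async function handleOnce(eventId, work) {
  if (!claimEvent(eventId)) return false;
  try {
    await work();
    return true;
  } catch (error) {
    // Release the claim so a later retry can process the event
    processedEvents.delete(eventId);
    throw error;
  }
}
```

With a real database, the claim would typically be an insert that fails on a duplicate key, ideally in the same transaction as the company update.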
Cache invalidation
Apart from polling and webhooks, you need a cache invalidation strategy for real-time lookups.
TTL-based invalidation
The simplest approach: set a Time-To-Live on your cache entries:
const CACHE_TTL = {
  active: 24 * 60 * 60,       // 24 hours for active companies
  inactive: 7 * 24 * 60 * 60, // 7 days for inactive companies
  notFound: 60 * 60,          // 1 hour for not-found numbers
  search: 60 * 60             // 1 hour for search results
};

async function lookupWithSmartCache(kvkNumber) {
  const cached = await cache.get(`company:${kvkNumber}`);
  // Entries are stored in a wrapper object so that a cached "not found"
  // result still counts as a cache hit
  if (cached) {
    return cached.company;
  }

  const fresh = await kvkbase.lookup(kvkNumber);
  const ttl = fresh
    ? (fresh.isActive ? CACHE_TTL.active : CACHE_TTL.inactive)
    : CACHE_TTL.notFound;
  await cache.set(`company:${kvkNumber}`, { company: fresh }, ttl);
  return fresh;
}
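The cache object used here is not defined in the article. For local experiments, a minimal in-memory stand-in with the same get and set(key, value, ttlSeconds) shape might look like the sketch below. The class name TtlCache and the injectable clock are assumptions for illustration; in production you would more likely use Redis or a similar store.

```javascript
class TtlCache {
  // now is injectable so expiry can be tested without waiting
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map();
  }

  async get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired: drop the entry and report a miss
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  async set(key, value, ttlSeconds) {
    // Without a TTL the entry never expires
    const expiresAt =
      ttlSeconds === undefined ? Infinity : this.now() + ttlSeconds * 1000;
    this.entries.set(key, { value, expiresAt });
  }
}
```

Expired entries are only removed when they are read, so a long-running process with many distinct keys would also need periodic cleanup or a size limit.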
Stale-While-Revalidate
A more advanced pattern: return the cached data immediately, but refresh in the background:
async function lookupStaleWhileRevalidate(kvkNumber) {
  const cached = await cache.get(`company:${kvkNumber}`);
  if (cached) {
    // Refresh in the background if data is older than 12 hours
    const age = Date.now() - cached.cachedAt;
    if (age > 12 * 60 * 60 * 1000) {
      refreshInBackground(kvkNumber); // fire-and-forget
    }
    return cached.data;
  }

  // No cache: wait for fresh data
  const fresh = await kvkbase.lookup(kvkNumber);
  await cache.set(`company:${kvkNumber}`, {
    data: fresh,
    cachedAt: Date.now()
  });
  return fresh;
}

function refreshInBackground(kvkNumber) {
  kvkbase.lookup(kvkNumber)
    .then(fresh => cache.set(`company:${kvkNumber}`, {
      data: fresh,
      cachedAt: Date.now()
    }))
    .catch(err => console.error(`Background refresh failed: ${err.message}`));
}
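Under load, many requests for the same stale entry can each trigger their own background refresh. A small in-flight map can collapse these into one call, as sketched below. The names inFlight and refreshOnce are hypothetical, and fetchFresh stands for whatever function performs the lookup and cache write.

```javascript
// Tracks refreshes that are currently running, keyed by KVK number
const inFlight = new Map();

// Starts a refresh unless one is already running for this number.
// Concurrent callers receive the same promise.
function refreshOnce(kvkNumber, fetchFresh) {
  const existing = inFlight.get(kvkNumber);
  if (existing) return existing;

  const promise = Promise.resolve()
    .then(() => fetchFresh(kvkNumber))
    // Clear the slot on success and on failure so later refreshes can run
    .finally(() => inFlight.delete(kvkNumber));
  inFlight.set(kvkNumber, promise);
  return promise;
}
```

When used fire-and-forget, attach a `.catch` to the returned promise so a failed refresh does not surface as an unhandled rejection.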
Monitoring
Whichever strategy you choose, set up monitoring to know whether your data is current:
async function reportDataFreshness() {
  // Buckets are mutually exclusive so each company is counted exactly once
  const stats = await db.companies.aggregate([
    {
      group: 'freshness',
      buckets: [
        { label: 'today', where: 'last_verified_at >= NOW() - INTERVAL 1 DAY' },
        { label: 'this_week', where: 'last_verified_at < NOW() - INTERVAL 1 DAY AND last_verified_at >= NOW() - INTERVAL 7 DAY' },
        { label: 'this_month', where: 'last_verified_at < NOW() - INTERVAL 7 DAY AND last_verified_at >= NOW() - INTERVAL 30 DAY' },
        { label: 'older', where: 'last_verified_at < NOW() - INTERVAL 30 DAY' },
        { label: 'never', where: 'last_verified_at IS NULL' }
      ]
    }
  ]);
  console.log('Data freshness report:', stats);
}
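If your database layer has no aggregate helper like the one above, the same report can be computed in application code. The sketch below buckets records by age in plain JavaScript. The function names and the extra never bucket are assumptions for illustration, and for large tables a SQL query will usually be the better choice.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assigns a record to exactly one freshness bucket
function freshnessBucket(lastVerifiedAt, now = new Date()) {
  if (!lastVerifiedAt) return 'never';
  const ageDays =
    (now.getTime() - new Date(lastVerifiedAt).getTime()) / MS_PER_DAY;
  if (ageDays <= 1) return 'today';
  if (ageDays <= 7) return 'this_week';
  if (ageDays <= 30) return 'this_month';
  return 'older';
}

// Counts how many companies fall into each bucket
function summarizeFreshness(companies, now = new Date()) {
  const counts = { today: 0, this_week: 0, this_month: 0, older: 0, never: 0 };
  for (const company of companies) {
    counts[freshnessBucket(company.lastVerifiedAt, now)] += 1;
  }
  return counts;
}
```

A growing older or never count is a useful alert signal: it suggests the polling job is failing or falling behind.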
Which strategy should you choose?
| Factor | Polling | Webhooks |
|---|---|---|
| Complexity | Low | Medium |
| Real-time | No (delayed) | Yes |
| Cost (API calls) | Higher | Lower |
| Reliability | High (you are in control) | Depends on provider |
| Best for | Small datasets, nightly sync | Large datasets, real-time requirements |
In practice, a combination works best: polling as the baseline, with webhooks as an accelerator when available.
Conclusion
Keeping company data up to date is just as important as fetching it in the first place. Without an update strategy, your application will be working with stale information within months.
Start with a simple polling strategy: a nightly cron job that revalidates companies based on priority. Add webhooks when your scaling needs grow. And always implement smart caching with TTL-based invalidation for your day-to-day lookups.
With KVKBase as your data source, you can easily implement both strategies — the API is fast enough for polling and supports the patterns you need for reliable company data.