# Bulk Exports

> Full-dataset feeds in CSV, MMDB, and Parquet, refreshed roughly every hour.

For high-volume use cases (>1 M lookups/day) or warehouse loads, pull the
full dataset on a schedule instead of hammering the API.

## Format choice

| Format      | Best for                                                         | Tier required |
| ----------- | ---------------------------------------------------------------- | ------------- |
| **CSV**     | Spreadsheets, ad-hoc analysis, simple ETL                        | Pro           |
| **MMDB**    | Sub-ms local lookups in production code (Python, Go, Node, Java) | Business      |
| **Parquet** | Snowflake / BigQuery / Athena / Spark loads                      | Business      |

MMDB is what most production fraud pipelines use — load once into memory,
look up at the speed of a hash table, refresh hourly.

## Scheduling

Refresh once an hour at most. The dataset rolls over with new
observations roughly every 60 minutes. Pulling more often wastes bandwidth.

```bash theme={null}
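# crontab: hourly MMDB refresh.
# A crontab entry must stay on one line (cron does not honor backslash
# continuations), and WXA_API_KEY must be defined in the crontab itself
# or otherwise present in the job's environment.
0 * * * * curl -fsS -H "X-API-Key: $WXA_API_KEY" "https://wxaintel.wxapros.com/api/v1/vpn/export/mmdb" -o /var/lib/wxa-vpn/wxa-vpn.mmdb.tmp && mv /var/lib/wxa-vpn/wxa-vpn.mmdb.tmp /var/lib/wxa-vpn/wxa-vpn.mmdb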
```

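The job downloads to a `.tmp` file and then renames it over the final path.
Because `mv` within one filesystem is an atomic rename, and `&&` skips it
when `curl` fails, readers never see a partial file.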
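
If the refresh runs from application code rather than cron, the same temp-file-then-rename pattern can be sketched in Python. This is a minimal sketch using only the standard library; `atomic_write` and `download_export` are hypothetical helper names, and the URL and `X-API-Key` header are the ones from the crontab example.

```python
import contextlib
import os
import tempfile
import urllib.request
from typing import Iterable


def atomic_write(dest: str, chunks: Iterable[bytes]) -> None:
    """Write chunks to a temp file beside dest, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(dest))
    # The temp file lives in the destination directory so the final rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, dest)  # atomic when both paths share a filesystem
    except BaseException:
        # On any failure, drop the partial temp file and keep the old dest.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise


def download_export(url: str, api_key: str, dest: str,
                    chunk_size: int = 1 << 20) -> None:
    """Stream one export to dest; urlopen raises HTTPError on 4xx/5xx."""
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request) as response:
        atomic_write(dest, iter(lambda: response.read(chunk_size), b""))
```

Note that `tempfile.mkstemp` creates the file readable only by its owner, so add an `os.chmod` before the rename if the readers run as a different user.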

## MMDB lookup (Python)

```python theme={null}
import maxminddb

# Open the database once and reuse the reader for every lookup.
reader = maxminddb.open_database("/var/lib/wxa-vpn/wxa-vpn.mmdb")
data = reader.get("1.1.1.1")
# {"classification": "cdn", "provider": "Cloudflare", ...}
```

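Lookups work for both IPv4 and IPv6 addresses. Unknown IPs carry no data:
expect an empty `dict`, and also guard against `None`, which the `maxminddb`
reader returns when an address has no record at all.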
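
A long-running process also has to pick up each refreshed file. The following is a minimal single-threaded sketch, not part of the product: `ReloadingReader` is a hypothetical helper that reopens the database when the file on disk changes and maps a missing record to an empty `dict`. It takes the opener as a parameter, which in practice would be `maxminddb.open_database`.

```python
import os
from typing import Any, Callable, Optional, Tuple


class ReloadingReader:
    """Reopen the MMDB whenever the file on disk is replaced (hypothetical)."""

    def __init__(self, path: str, opener: Callable[[str], Any]) -> None:
        self._path = path
        self._opener = opener  # e.g. maxminddb.open_database
        self._reader: Any = None
        self._signature: Optional[Tuple[int, int]] = None

    def _refresh(self) -> None:
        st = os.stat(self._path)
        # An atomic rename swaps in a new inode; the mtime also changes.
        signature = (st.st_ino, st.st_mtime_ns)
        if signature != self._signature:
            old = self._reader
            self._reader = self._opener(self._path)
            self._signature = signature
            if old is not None:
                old.close()

    def lookup(self, ip: str) -> dict:
        """Return the record for ip, or an empty dict when there is none."""
        self._refresh()
        record = self._reader.get(ip)
        return record or {}
```

Usage would look like `ReloadingReader("/var/lib/wxa-vpn/wxa-vpn.mmdb", maxminddb.open_database).lookup("1.1.1.1")`. The sketch calls `os.stat` on every lookup and closes the old reader immediately, so a multi-threaded service would want a lock and a less frequent check.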

## Parquet to Snowflake

```sql theme={null}
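-- Assumes wxa_vpn_intel has columns named after the Parquet fields.
-- Drop MATCH_BY_COLUMN_NAME if the table is a single VARIANT column.
COPY INTO wxa_vpn_intel
FROM @my_s3_stage/wxa-vpn.parquet
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;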
```

We don't host the file in S3. Download it via the API, upload it to your
own stage, then run `COPY INTO`. The schema is documented at
[docs/data-dictionary.md](https://github.com/whois-api-llc/wxa_vpn/blob/main/docs/data-dictionary.md).

## Concurrency

Bulk downloads count against a separate `bulk concurrent` limit (1 for
Starter, 2 for Pro, 5 for Business, negotiated for Enterprise). Don't
parallelize a single export — start one, wait for completion, start
the next.
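
One way to stay inside the limit from client code is to cap a worker pool at the tier's value and request each export exactly once. This is a sketch under assumptions: it reads the limit as the maximum number of distinct exports in flight, `fetch_exports` is a hypothetical helper, and `download` stands for whatever function fetches one format as a single stream.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")

# Limits quoted above; Enterprise is negotiated, so it is not listed here.
BULK_CONCURRENT_LIMIT = {"starter": 1, "pro": 2, "business": 5}


def fetch_exports(formats: Iterable[str], tier: str,
                  download: Callable[[str], T]) -> Dict[str, T]:
    """Run download(fmt) once per format, never exceeding the tier limit."""
    # De-duplicate so a single export is never requested twice in parallel.
    unique = list(dict.fromkeys(formats))
    limit = BULK_CONCURRENT_LIMIT[tier]
    with ThreadPoolExecutor(max_workers=limit) as pool:
        results = list(pool.map(download, unique))
    return dict(zip(unique, results))
```

With `tier="starter"` the pool has one worker, which reduces to the strictly sequential pattern described above.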

## File size

| Format  | Approx size (full dataset)                     |
| ------- | ---------------------------------------------- |
| CSV     | \~3 GB compressed (gzip), \~15 GB uncompressed |
| MMDB    | \~600 MB                                       |
| Parquet | \~1.2 GB                                       |

Plan storage and bandwidth accordingly. To request a compressed transfer,
send `Accept-Encoding: gzip`; with `curl`, the `--compressed` flag sends the
header and decompresses the response for you.
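
A refresh job can check free disk space before it starts downloading. This is a rough sketch that treats the approximate sizes in the table as decimal gigabytes and megabytes. `required_bytes` and `has_room` are hypothetical helpers, and the default of two copies is an assumption that covers the old file and the new temp file coexisting during an atomic rename.

```python
import shutil

# Approximate sizes from the table above, in bytes (decimal units assumed).
APPROX_SIZE_BYTES = {
    "csv": 3 * 10**9,      # gzip-compressed; about 15 GB once uncompressed
    "mmdb": 600 * 10**6,
    "parquet": 12 * 10**8,
}


def required_bytes(fmt: str, copies: int = 2) -> int:
    """Space to reserve: the current file plus the incoming temp file."""
    return APPROX_SIZE_BYTES[fmt] * copies


def has_room(directory: str, fmt: str) -> bool:
    """True if the filesystem holding directory has enough free space."""
    return shutil.disk_usage(directory).free >= required_bytes(fmt)
```

A job would call `has_room("/var/lib/wxa-vpn", "mmdb")` and skip the refresh, keeping the existing file, when it returns `False`.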
