I'm about to build an application that stores up to 500 million domain name records. I'll store the extension ('.net', '.com', and so on) as a numeric ID and strip the leading 'www'. So I believe the table would look like this:
domain_id | domain_name | domain_ext
----------+--------------+-----------
1 | dropbox | 2
2 | digitalocean | 2
domain_ext = 2 means it's a '.com' domain.
The queries I plan to run:
- I need to be able to insert new domains easily.
- I also need to make sure I'm not inserting a duplicate (each domain should have only one record), so I'm thinking of putting a UNIQUE index on domain_name + domain_ext (with MySQL - InnoDB).
- Query domains in batches. For example:
SELECT * FROM tbl_domains LIMIT 300000, 600;
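To make the idea concrete, here is a minimal sketch of the table and the duplicate-safe insert I have in mind. Only the table and column names come from the description above. The column types, the key name uq_domain, and the use of INSERT IGNORE are assumptions on my part, and VARCHAR(63) assumes domain_name holds a single DNS label.

```sql
-- Sketch only: column types and the key name are assumptions.
CREATE TABLE tbl_domains (
    domain_id   INT UNSIGNED      NOT NULL AUTO_INCREMENT,
    domain_name VARCHAR(63)       NOT NULL,  -- a single DNS label is at most 63 characters
    domain_ext  SMALLINT UNSIGNED NOT NULL,  -- numeric ID of the extension, e.g. 2 = '.com'
    PRIMARY KEY (domain_id),
    -- Composite unique key: one row per (name, extension) pair.
    UNIQUE KEY uq_domain (domain_name, domain_ext)
) ENGINE=InnoDB;

-- INSERT IGNORE discards a row that would violate the unique key
-- (with a warning) instead of failing the statement.
INSERT IGNORE INTO tbl_domains (domain_name, domain_ext) VALUES ('dropbox', 2);
INSERT IGNORE INTO tbl_domains (domain_name, domain_ext) VALUES ('dropbox', 2);  -- duplicate, no second row
```

With this layout the same name can still exist under different extensions, since uniqueness is enforced on the pair rather than on domain_name alone.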
What do you think? Will that table hold hundreds of millions of records? How about partitioning by the first letter of the domain name? Would that be a good idea? Let me know your suggestions, I'm open-minded.