Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others


mysql - Fastest random selection WHERE column X is Y (NULL)

Currently I am using:

SELECT * 
FROM 
  table AS t1
  JOIN (
    SELECT (RAND() * (SELECT MAX(id) FROM table where column_x is null)) AS id
  ) AS t2 
WHERE 
  t1.id >= t2.id
  and column_x is null
ORDER BY t1.id ASC
LIMIT 1

This is normally extremely fast; however, when I include the column_x IS NULL condition, it gets slow.

What would be the fastest random querying solution where the records' column X is null?

ID is the PK, column X is int(4). The table contains about a million records and is over 1 GB in total size, currently doubling every 24 hours.

column_x is indexed.

Column ID may not be consecutive.

The DB engine used in this case is InnoDB.

Thank you.



1 Answer


Getting a genuinely random record can be slow. There's not really much getting around this fact; if you want it to be truly random, then the query has to load all the relevant data in order to know which records it has to choose from.

Fortunately, however, there are quicker ways of doing it. They're not properly random, but if you're happy to trade a bit of pure randomness for speed, then they should be good enough for most purposes.

With that in mind, the fastest way to get a "random" record is to add an extra column to your table and populate it with a random value. Perhaps a salted MD5 hash of the primary key, or just the output of RAND()? Whatever. Add an appropriate index on this column, then order by the column in your query, and you'll get your records back in a random order.
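As a rough illustration of the extra-column idea, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table name `items`, the column name `rand_key`, and the composite index on `(column_x, rand_key)` are all assumptions made for the example, not something from the question.

```python
import random
import sqlite3

# SQLite stands in for MySQL here; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY,"
    " column_x INTEGER,"   # NULL for the rows we want to sample from
    " rand_key REAL)"      # the extra column holding a random value
)

rng = random.Random(42)  # seeded only so the demo is repeatable
rows = [
    (i * 3, None if i % 2 == 0 else 1, rng.random())  # ids deliberately have gaps
    for i in range(1, 101)
]
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)

# Assumption: one index covering both the filter column and the random column,
# so the filter and the ordering can be served by the same index.
conn.execute("CREATE INDEX idx_x_rand ON items (column_x, rand_key)")

# Rows with column_x IS NULL, returned in the (fixed) random order.
shuffled_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM items WHERE column_x IS NULL ORDER BY rand_key"
    )
]
print(shuffled_ids[:5])
```

In MySQL the same shape would be an extra column filled with RAND() when the row is inserted. Whether the optimizer actually uses such a composite index for the IS NULL filter is worth confirming with EXPLAIN on the real table.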

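To get a single random record, keep the ORDER BY on the new column, specify LIMIT 1, and add WHERE random_field > $random_value, where $random_value is a value in the range of your new field (say an MD5 hash of a random number, for example). If $random_value happens to be higher than every stored value, no row matches, so in that case fall back to the first row in the ordering.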
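The single-record lookup could be sketched roughly as follows, again with sqlite3 standing in for MySQL. The names `items`, `rand_key` and `pick_random_null_row` are hypothetical, and the wrap-around query is one possible way to handle a pivot that lands above every stored value.

```python
import random
import sqlite3

# SQLite stands in for MySQL; table, column and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, column_x INTEGER, rand_key REAL)"
)
seed_rng = random.Random(7)  # seeded only so the demo data is repeatable
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(i * 3, None if i % 2 == 0 else 1, seed_rng.random()) for i in range(1, 101)],
)
conn.execute("CREATE INDEX idx_x_rand ON items (column_x, rand_key)")


def pick_random_null_row(conn, pivot):
    """Return (id, rand_key) for the first NULL-column_x row above pivot."""
    row = conn.execute(
        "SELECT id, rand_key FROM items "
        "WHERE column_x IS NULL AND rand_key > ? "
        "ORDER BY rand_key LIMIT 1",
        (pivot,),
    ).fetchone()
    if row is None:
        # Pivot was above every stored value: wrap around to the lowest one.
        row = conn.execute(
            "SELECT id, rand_key FROM items "
            "WHERE column_x IS NULL ORDER BY rand_key LIMIT 1"
        ).fetchone()
    return row  # still None if no row has column_x IS NULL


picked = pick_random_null_row(conn, random.random())
print(picked)
```

Note that this is where the "not properly random" part shows up: a row that sits just after a wide gap in the stored values is picked more often than a row sitting just after a narrow gap.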

Of course the downside here is that although your records will be in a random order, they'll be stuck in the same random order. I did say it was trading perfection for query speed. You can get around this by periodically updating the column with fresh values, but on a table that is large and growing quickly, that update could itself become a problem.
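A periodic refresh might look something like the sketch below, using sqlite3 as a stand-in for MySQL. The names `items`, `rand_key` and `refresh_rand_keys` are hypothetical. In MySQL itself, a single `UPDATE items SET rand_key = RAND()` should do the same job on the server, since RAND() is evaluated per row.

```python
import random
import sqlite3

# SQLite stands in for MySQL; table, column and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, column_x INTEGER, rand_key REAL)"
)
rng = random.Random(123)  # seeded only so the demo is repeatable
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(i * 3, None if i % 2 == 0 else 1, rng.random()) for i in range(1, 21)],
)


def refresh_rand_keys(conn, rng):
    """Give every row a fresh random value, which reshuffles the ordering."""
    ids = [row[0] for row in conn.execute("SELECT id FROM items")]
    conn.executemany(
        "UPDATE items SET rand_key = ? WHERE id = ?",
        [(rng.random(), row_id) for row_id in ids],
    )
    conn.commit()


before = dict(conn.execute("SELECT id, rand_key FROM items"))
refresh_rand_keys(conn, rng)
after = dict(conn.execute("SELECT id, rand_key FROM items"))
print(sum(before[k] != after[k] for k in before), "rows got a new value")
```

On a table of the size described in the question, rewriting every row at once may be too heavy, so refreshing in batches (by id range, say) could be a gentler variation.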

The other downside is that adding an extra column might be too much to ask if you have storage constraints and your DB is already massive, or if you have a strict DBA to get past before you can add columns. But again, you have to trade off something; if you want the query speed, you need this extra column.

Anyway, I hope that helped.

