MySQL Select item with JOIN statement?

June 3, 2011 - 9:29 pm

What I’m trying to do is display the 5 most purchased items on my website. The items are stored in a table called items. However, that table does not store how many times each item has been bought. That is sort of stored in my purchased table.

The way the purchases table is set up is that there is the id of the purchase (transaction number basically), the id of the user who did the purchase, the name of the transaction (acts more as a description than name so for this I will call that field description), the price then the date. There is no column that keeps a running count of each item’s purchase. Each purchase is just a new entry into the table. For our purposes I believe we are only concerned with the descriptions column of the purchases table and the name column of the items table.

I could easily display the name/description of the purchase that occurs most often. However, I want to display the ITEM name. The item name is INCLUDED in the description so a REGEXP will be involved. I don’t know how exact to form the statement and how to JOIN them. This is what I’ve tried and they haven’t work:

SELECT name FROM items JOIN name ON (purchases.description REGEXP item.name) GROUP BY purchases.description ORDER BY count( * ) DESC LIMIT 5
^^I can tell that GROUP BY is not correct. Perhaps the JOIN name part is not correct either

SELECT item.name FROM items JOIN name ON (item.name = purchases.description WHERE purchases.description REGEXP item.name) GROUP BY purchases.description ORDER BY count( * ) DESC LIMIT 5

SELECT purchases.description FROM purchases WHERE purchases.description REGEXP item.name GROUP BY purchases.description ORDER BY count( * ) DESC LIMIT 5

SELECT description FROM purchases WHERE description REGEXP name FROM items GROUP BY description ORDER BY count( * ) DESC LIMIT 5
Yes my item table has a primary integer key. But no my purchases table does not have an item_id column. If I make that column, how do I make it foreign. The options I see are primary, unique, index, and full text?

It all makes sense to me and I wish that was how the table was set up but since it wasn’t, am I able to add that stuff?

Any time you find yourself needing a REGEXP to do a join, take a step back and stop writing select statements and look at your database design.

What you are trying to do should be a very, very simple join statement. But it’s not, because your database design has some weaknesses. Fix that, then everything else will be easier.

Does your item table have a primary key? It needs one. And item name isn’t a good candidate. (Names change. Primary keys never should). The item table should have an integer primary key, preferably auto-increment if your DB supports that. (And this looks like MySQL, so it does).

Your purchases table should then have an item_id column which is a foreign key that relates back to the item’s id column.

Now you can do a clean join between the purchases table and the item name table. This is off the top of my head, may not be 100% right, but it should look about this simple:

SELECT items.name
FROM purchases
JOIN items ON items.id = purchases.item_id
GROUP BY purchases.item_id
ORDER BY count(*) DESC
LIMIT 5

Not only is that fairly simple to write, but it should also be blazingly fast. items.id and purchases.item_id can both be indexed. A REGEXP on purchases.description will never use an index. It will have to scan the entire table, and run the regex on each row. Just a wild arsed guess on my part, but I would think for a few million purchases (Always plan for success!!!), my query should take less than 50 milliseconds. Anything involving a regex will measure in the many seconds range.

One Response to “MySQL Select item with JOIN statement?”

  1. Ratchetr Says:

    Any time you find yourself needing a REGEXP to do a join, take a step back and stop writing select statements and look at your database design.

    What you are trying to do should be a very, very simple join statement. But it’s not, because your database design has some weaknesses. Fix that, then everything else will be easier.

    Does your item table have a primary key? It needs one. And item name isn’t a good candidate. (Names change. Primary keys never should). The item table should have an integer primary key, preferably auto-increment if your DB supports that. (And this looks like MySQL, so it does).

    Your purchases table should then have an item_id column which is a foreign key that relates back to the item’s id column.

    Now you can do a clean join between the purchases table and the item name table. This is off the top of my head, may not be 100% right, but it should look about this simple:

    SELECT items.name
    FROM purchases
    JOIN items ON items.id = purchases.item_id
    GROUP BY purchases.item_id
    ORDER BY count(*) DESC
    LIMIT 5

    Not only is that fairly simple to write, but it should also be blazingly fast. items.id and purchases.item_id can both be indexed. A REGEXP on purchases.description will never use an index. It will have to scan the entire table, and run the regex on each row. Just a wild arsed guess on my part, but I would think for a few million purchases (Always plan for success!!!), my query should take less than 50 milliseconds. Anything involving a regex will measure in the many seconds range.
    References :

Leave a Reply