Page MenuHomePhorge

MySQL edge table error log: 'INSERT... ON DUPLICATE KEY UPDATE on a table with more than one UNIQUE KEY is unsafe'
Open, Needs TriagePublic

Description

I see many of the following errors in my mysql error log:

[Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. INSERT... ON DUPLICATE KEY UPDATE  on a table with more than one UNIQUE KEY is unsafe Statement: INSERT INTO `edge` (src, type, dst, dateCreated, seq, dataID)
            VALUES ('PHID-MTAM-<TheId>, 57, 'PHID-USER-<TheId>', 1710754728, 0, NULL) ON DUPLICATE KEY UPDATE dataID = VALUES(dataID)

Looking in resources/sql/quickstart.sql we see many edge tables defined as:

CREATE TABLE `edge` (
  `src` varbinary(64) NOT NULL,
  `type` int(10) unsigned NOT NULL,
  `dst` varbinary(64) NOT NULL,
  `dateCreated` int(10) unsigned NOT NULL,
  `seq` int(10) unsigned NOT NULL,
  `dataID` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`src`,`type`,`dst`),
  UNIQUE KEY `key_dst` (`dst`,`type`,`src`),
  KEY `src` (`src`,`type`,`dateCreated`,`seq`)
) ENGINE=InnoDB DEFAULT CHARSET={$CHARSET} COLLATE={$COLLATE_TEXT};

I suggest that the (dst,type,src) key will automatically be unique by virtue of the primary key being (src,type,dst), and it would not hurt to remove the UNIQUE constraint.

Event Timeline

Maybe,

But in any case, it may have sense to keep a generic index on (dst, type, src) (or maybe just dst, type) in that specific order, since the UNIQUE KEY was probably a "micro-optimization" to achieve such index for queries operating on dst, or also dst and type, but all without src. This is just an opinion from an alien. Better to wait for more answers.

@valerio.bozzolan agreed - I'm not proposing to remove the (dst, type, src) index, just the uniqueness constraint on that index. Sorry for not making that clear.

Are the keys actually different? Looks like they have the same fields in different order. Does this makes a difference in the implementation?

@avivey - the order of the keys is irrelevant when it comes to determining uniqueness, but extremely important in other respects. In order for an index to be used when selecting rows, the database must be provided with values for the index fields, in the order specified. Ie you always need to specify a value for the first field, and can get away without successive ones.

So to make use of the (src,type,dst) index, the database must be provided with a src value, or src & type values, or src & type & dst values.

Hence if you might want to do a query filtering just on a dst value, you also need a (dst) index, or in this case (dst,type,src).

Interesting, and a little troubling if I understand it correctly; It means that the query ... WHERE dst = $a and type = $b and src = $c will be a full-table-read, but the equivalent query src = $c and type = $b and dst = $a will use the index to resolve quickly?

I tried to find where this 2nd key is introduced, and it looks like it was snuck in in https://secure.phabricator.com/D10577 with no explanation, while making the DB ready for full unicode.

BTW, epriestley wanted to remove the dataID column completely (https://secure.phabricator.com/T10574), which would also make this moot (but it would be a large job).

I don't have any objection to the suggestion, anyway.

Sorry, I'm not explaining it very well. For the database engine to use index (col1, col2, col3), col1 must be specified somewhere in the WHERE clause, but it's precise location is irrelevant - the database engine will work it out. As to why we need both indexes, there's a good explanation on https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys

The point I'm trying to make is whilst we (probably) need both indexes, because they both feature the same columns, only one of them has to be unique. And a primary key index is, by definition, unique.