[ts-gen] Debugging intermittent lost posts

Bill Pippin pippin at owlriver.net
Mon Sep 28 14:02:13 EDT 2009


Nils,

I was out the latter part of last week, and I'm just getting to your
post now.  Thanks much for your meticulous debugging efforts; your work
illuminates the problem completely.  I'll first list the key stages of
your detective work, and then give my proposed solution near the end of
this message.

> ... result was missing in the OrderStatus table. ... [after entering]
> the same statement into a mysql session ... following error message:
 
> ERROR 1062 (23000) at line 1: Duplicate entry ... for key 3

So, the lost inserts are due to duplicate keys, more precisely to
duplicate (client_id, order_tag) pairs, and the shim's past silence
thereof is due to my use of mysql's insert ignore feature, which
converts a handful of data-dependent sql errors to warnings.  [Users
who want to be notified when these duplicate keys occur should use the
new warn option.]
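
For concreteness, the following toy mysql session reproduces the
behavior; the two-column table is only a stand-in for the actual
OrderStatus schema, with its unique key on the (client_id, order_tag)
pair:

    create temporary table t
        (client_id int, order_tag int,
         unique key (client_id, order_tag));
    insert into t values (1, 100);
        -- Query OK, 1 row affected
    insert into t values (1, 100);
        -- ERROR 1062 (23000): Duplicate entry ... for key ...
    insert ignore into t values (1, 100);
        -- Query OK, 0 rows affected, 1 warning;
        -- the error becomes a warning, and the row is silently dropped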

I'd already discussed this issue of mysql warnings and lost inserts
as a hypothetical problem in an earlier post.  Your work not only
confirms the earlier hypothesis, it also narrows it from data-dependent
warnings in general, such as illegal nulls, to the specific problem of
dup keys.  Thanks much; this is a critical observation, and lets me put
aside all the other complications I'd considered.

> By the way, the standard error of shim remained empty and didn't
> complain about anything - might it be that this is a consequence 
> of the 'ignore' attribute? 

If the stderr was empty, I suspect you weren't using the warn option.
Please let us know if, while using the warn option, you still see lost
inserts without any stderr warning info; for now that info is just a
warning count with an echo of the related insert statement text, but
you should be getting it whenever this problem occurs.  I do plan to
improve the text feedback users get with the warn option; the current
implementation is incomplete.
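
For reference, that warning information is also visible from an
interactive mysql session, and is roughly what the warn option is
meant to echo from the shim's database connection (using the toy
table from the earlier example):

    insert ignore into t values (1, 100);
    show count(*) warnings;
        -- the warning count, here 1
    show warnings;
        -- Warning | 1062 | Duplicate entry ... for key ...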

> ... how to fix [the problem of duplicate keys?]  Might it be the case
> that order_tag is only unique for a (or a few) day(s) and then cycles
> back?

Indeed.  I had believed that the IB tws maintained the tick id numbering
for orders over a longer time span --- in fact, I'm certain they used to,
at least for paper account testing --- but it's now clear that we can't
depend on the IB tws to provide the degree of monotonicity you need.

So, the shim has to enforce that monotonicity itself, right after
session handshake negotiation, when the IB tws has provided the
candidate starting point value for the tick id sequence via the
unsolicited next id message.

My planned change to the startup code is as follows (a single-query
sketch of the steps appears just after the list):

    0.  Given a shim-provided handshake client_id of C, and the
        resulting next id message from the IB tws with a proposed
        tick id sequence starting value of X; then:

    1.  the shim will interpolate C into the following query template:

            select max(order_tag)
              from CreateEvent
             where client_id = %d;

    2.  use it to select from the database a max id value Y; and

    3.  use max(X, Y+1) as the tick id sequence starting point.
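
As a sketch only, and allowing for a CreateEvent table with no prior
rows for the given client_id, steps 1 through 3 could even fold into
a single statement, with coalesce() supplying the fallback when the
max is null; the first %d stands for X here, and the second for C:

    select greatest(%d, coalesce(max(order_tag), 0) + 1)
      from CreateEvent
     where client_id = %d;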

This would be the new default behavior, and would give a monotonically
increasing sequence of order_tag values in the database for each risk
mode client id; and, not incidentally, unique keys for the resulting
(client_id, order_tag) pairs occurring in OrderStatus records.  For
those users wanting to maintain the current behavior, there would be
a dups option to accept the proposed IB tws tick id reset when such
occurred.

Note that the user would be responsible for selecting a new risk mode
client_id via the config file once the order id had passed two billion
and was approaching the rollover point, which occurs at 2^31 - 1, that
is, at 2,147,483,647.

I plan to provide the feature fix some time this week, perhaps in the
next couple of days; I will be interleaving this task with ongoing work
on memory allocation, and I'll post to the list when I make the
release.

Note that I will continue to use the "insert ignore" feature.  Inserts
are ganged together into multi-row statements, so without ignore a
single duplicate key error would fail an entire statement, and I'd
rather lose some rows than all.  Although the new algorithm for tick
id start value computation should solve the particular problem you've
seen of lost inserts due to duplicate (client_id, order_tag) pairs,
truly superfluous duplicates do occur with journal-related messages,
and so there are valid reasons to drop duplicates.
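
To see the tradeoff concretely, take the toy table from above, which
already holds the (1, 100) row, and gang three inserts into one
statement; without ignore the single duplicate fails the statement
outright, while with ignore only the offending row is dropped:

    insert into t values (1, 100), (1, 101), (1, 102);
        -- ERROR 1062 (23000): Duplicate entry ...;
        -- the good rows (1, 101) and (1, 102) are lost as well
    insert ignore into t values (1, 100), (1, 101), (1, 102);
        -- Query OK, 2 rows affected, 1 warning;
        -- only the duplicate (1, 100) is dropped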

The key problem here was not the dropped insert --- that was just a
symptom --- but rather, first, the duplicate key, which the above
change should fix; and second, the silence thereof, for which users
should use the new warn option.

Thanks,

Bill


