[ts-gen] Debugging intermittent lost posts [Was: Do I really need to ...]

Bill Pippin pippin at owlriver.net
Mon Sep 14 19:34:50 EDT 2009


About losing order status posts:

> ... I am [intermittently losing posts for] ... records in the
> OrderStatus table which are present in the shim log. However, having
> 100% reliable order status information seems to be crucial as soon
> [as] one feeds it back to some kind of order logic.

I'd really like to track this problem down.  Could we first find out
what's wrong with value-based contract expressions for the symbol
you're working with?  I'm bothered by the fact that your install of
the shim can't read contract expressions for the newer GBP contracts,
and I wouldn't be surprised at all if this is connected to problems
with order status writes.

If this doesn't get you anywhere --- and I'm *very* suspicious of
symbol lookup issues given that you are seeing intermittent problems
with database writes --- and you instead wish to go on and debug the
problem yourself, I'm delighted to give you pointers below on where
you might look:

> ... is there a canonical break point in the sources from where ...
> one might be able to trace OrderStatus [message posts] ... ?

Feel free to look at post.c,  and add "fprintf(stderr, ... )"
statements with wild abandon.  After all, you can always chop
them out again, or restore from your original tarball, or download
yet another. :-)

Poster::post/3 provides a loop over all the messages of a particular
type collected during the current time slot, typically at most 20 

Poster::post/5 maps individual messages to additional lines of the
sql insert statement for a particular message-related journal table.  
Possible points of failure include message to record mapping, where
the term returned is true or not; Compound::poSt(), in form.c, where
the sql record term is formatted as text appended to a growing sql
insert statement, with a flag returned, ditto; and the nested
uniqueness check following, again with a boolean flag returned.

The chunk of code in post.c that you probably want to focus on
checking is the following:

60:  if ((term = m.rule.map2(t.aTom, m, a.slot)) && r.poSt(t, *term))
61:      if ((uniq = r.css.unique(vecs, *term)));
62:      else b.pop(b.size() - tail), b << seal;
63:  else                 param_err_msg_not_posted(fail, m),
64:      log_fail(e, fail.param_err_msg_not_posted);

By the way, although you might expect from the above that the
uniqueness check might be causing a silent failure, if you look at
name.c, you'll see that the return is clamped to true once given a
successful term mapping, in order to absolutely rule out just such
a problem, so possible problems here should in principle be limited
to the calls of PostMsg::map2() and SqlPost::poSt() .

The actual database write occurs back up in Poster::post/3, in the
conditional that follows the loop:

33:  if (t)
34:      dbms.send(b),
35:      tabs.read(e, *t),
36:      signal(e, *t, k), e.log.log_text(e, rule, note);

Note in the above that the database write, read, and log format
operations occur in sequence, and also that there should be
a trace for the mysql error text if the dbms send fails.  So,
in theory, another point of failure would be the conditional 
test above, although it should succeed if the Poster::post/5 call
ever succeeded.

The variable b is a buffer, and although your trace code shouldn't
touch it here, since reading a buffer terminates the appended text
with a null, you can instead add printfs at various points for the
database write in fill.c .  E.g., for the PerQuery ctor that is
indirectly called by Dbms::send(), you could print out the query
text variable "q" in the ctor body:

69:  env::PerQuery
70:     ::PerQuery(Msql_1 m, char_0 q)
71:         : dbms(m),
72:           conn(m.conn),
73:           text(m.text),
74:          query(q),
75:         status(       mysql_query(conn, query)    ),
76:         insert(self ? mysql_insert_id(   conn) : 0),
77:         result(self ? mysql_store_result(conn) : 0),
78:          width(self ? mysql_field_count( conn) : 0),
79:    store_fault(width && !result)
80:  {}
// insert fprintf(stderr, "PerQ: %s\n", q); in ctor body above

Please feel free to post more questions if this doesn't help.  I'd
like to clear this problem up.



More information about the ts-general mailing list