Bug #676
PipeliningModule::LoadBank calling TIBlobModule::LoadSlot with too short a length
Description
I don't know if this is an Analyzer decoder bug or if hcana's TIBlobModule needs to be written differently.
In non-multiblock mode, PipeliningModule::LoadBank uses FindIDWord to find the end of the bank by looking for a particular bit pattern. When it then calls TIBlobModule::LoadSlot (from hcana), the end-of-bank pointer it passes is sometimes too early. One of the words in the TI data is the TI time, which can take any value, so the TI time sometimes matches the pattern that FindIDWord is looking for.
A non-problematic bank looks like:
<uint32 data_type="0x1" tag="4" num="0">
0x85422301 0xff112001 0x4010002 0x223 0x8bee531b
0x8d400006 0xfd4f1110 0xfd4f1110
</uint32>
The 0x8bee531b word is the TI time and 0x8d400006 is the trailer word pattern that FindIDWord matches (kBlockTrailer and slot 21).
A problematic bank looks like:
<uint32 data_type="0x1" tag="4" num="0">
0x85422401 0xff112001 0x4010002 0x224 0x8d6d1fbf
0x8d400006 0xfd4f1110 0xfd4f1110
</uint32>
Here 0x8d6d1fbf is the TI time, but it matches the pattern because FindIDWord only looks at the top 10 bits.
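The collision can be reproduced with a minimal sketch of a top-10-bit matcher. This is not the Podd implementation: the names idPattern, matchesId and findFirstMatch are hypothetical, and the bit layout (bit 31 set, type in bits 30-27, slot in bits 26-22) and the value 1 for the block trailer type are assumptions inferred from the example words above. It needs C++14 or later.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed layout of an identifying word, inferred from 0x8d400006:
// bit 31 = 1, bits 30-27 = type, bits 26-22 = slot, bits 21-0 = payload.
constexpr std::uint32_t kTypeBlockTrailer = 1;  // assumed value

// Build the top-10-bit pattern for a given type and slot.
constexpr std::uint32_t idPattern(std::uint32_t type, std::uint32_t slot) {
  return (1U << 31) | ((type & 0xFU) << 27) | ((slot & 0x1FU) << 22);
}

// True if the top 10 bits of `word` equal the pattern for (type, slot).
constexpr bool matchesId(std::uint32_t word, std::uint32_t type,
                         std::uint32_t slot) {
  return (word & 0xFFC00000U) == idPattern(type, slot);
}

// Index of the first word in buf[0..len) that matches, or -1 if none does.
constexpr long findFirstMatch(const std::uint32_t* buf, std::size_t len,
                              std::uint32_t type, std::uint32_t slot) {
  for (std::size_t i = 0; i < len; ++i)
    if (matchesId(buf[i], type, slot)) return static_cast<long>(i);
  return -1;
}

// The two example banks quoted in this ticket.
constexpr std::uint32_t kGoodBank[] = {0x85422301, 0xff112001, 0x4010002,
                                       0x223,      0x8bee531b, 0x8d400006,
                                       0xfd4f1110, 0xfd4f1110};
constexpr std::uint32_t kBadBank[] = {0x85422401, 0xff112001, 0x4010002,
                                      0x224,      0x8d6d1fbf, 0x8d400006,
                                      0xfd4f1110, 0xfd4f1110};
```

With these definitions, findFirstMatch(kGoodBank, 8, kTypeBlockTrailer, 21) finds the real trailer at index 5, while the same call on kBadBank stops one word early at index 4, on the TI time.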
I can simply ignore the length that is passed to TIBlobModule::LoadSlot, but that seems unsatisfactory.
I suppose the same issue could arise in FindEventsInBlock for fMultiBlockMode. If so, some of the events in a block might not be found when a payload word matches the block trailer pattern.
Updated by Ole Hansen over 2 years ago
- Status changed from New to In Progress
After some investigation, I conclude that the TI trigger time word is incompatible with the data format expected by a PipeliningModule. To be safe, PipeliningModules must not generate arbitrary payload data words whose topmost bit is set, i.e. (word & (1U<<31)) != 0, because such words might be interpreted as block or event headers or trailers. In reality, the range of ambiguous data values is smaller: the range 0x80-0x97 for the top 8 bits is the danger zone. This range can be made even smaller under additional assumptions which may or may not hold. However, shrinking the range of disallowed values only reduces the likelihood of a conflict; it doesn't fix the underlying problem.
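As a rough illustration of the two rules in this comment, the following sketch tests a payload word against the strict condition (topmost bit set) and against the narrower 0x80-0x97 range for the top byte. The function names are hypothetical and are not part of Podd or hcana.

```cpp
#include <cstdint>

// Strict rule: any payload word with bit 31 set is potentially ambiguous.
constexpr bool hasTopBitSet(std::uint32_t word) {
  return (word & (1U << 31)) != 0;
}

// Narrower rule: the top 8 bits lie in the range 0x80-0x97.
constexpr bool inDangerZone(std::uint32_t word) {
  return (word >> 24) >= 0x80U && (word >> 24) <= 0x97U;
}
```

Both TI time words quoted in this ticket, 0x8bee531b and 0x8d6d1fbf, fall inside the narrower range, although only the second one collided with the slot 21 trailer pattern. An ordinary word such as 0x223 passes both tests.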
It is easy to fix the decoding of the TI module, as Steve already says, but the bigger question is how much these rogue data words affect the decoding of other modules. At the moment, it looks like things are fairly safe as long as the TI module is in a dedicated CODA bank. This seems to be the case: the crate map (https://github.com/JeffersonLab/hallc_replay/blob/master/MAPS/db_cratemap.dat) has all the TI modules (slot 21), and only those modules, in bank 4.
I'll have to think of a way to make the decoding of PipeliningModule more robust so the TI modules can be decoded reliably, even in multi-block mode. Better yet would be to fix the firmware that writes that ambiguous data word. There's certainly no loss in adding one extra word to these very small (5 or 6 word) TI events.
Updated by Ole Hansen over 2 years ago
- % Done changed from 0 to 30
I have added an experimental check to PipeliningModule::LoadBank that verifies that the number of data words reported in the block trailer agrees with the number of data words implied by the block trailer's position in the buffer, which should always be the case. It seems to work, but I still have to test it further to make sure it causes no unintended trouble. With this check and some extra logic in PipeliningModule, the buffer length given to the TI module decoder should then be correct in single-block mode. This is not much of a gain, though, since the TI data can also be decoded successfully in single-block mode with a change in the TI module's decoder (see Steve's hcana commit d42208a).
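The kind of consistency check described here might look like the sketch below. It is not the code that went into PipeliningModule::LoadBank. It assumes that the low 22 bits of a block trailer hold the number of words in the block, counted from the block header through the trailer inclusive, which is consistent with the example trailer 0x8d400006 (count 6, found at index 5 of a block starting at index 0). The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed: bits 21-0 of a block trailer hold the block's word count,
// counted from the block header through the trailer inclusive.
constexpr std::uint32_t trailerWordCount(std::uint32_t trailer) {
  return trailer & 0x3FFFFFU;
}

// headerPos and candidatePos are indices into the bank buffer.
// A candidate trailer is accepted only if the word count it reports
// equals the count implied by its distance from the block header.
constexpr bool trailerIsConsistent(std::uint32_t candidate,
                                   std::size_t headerPos,
                                   std::size_t candidatePos) {
  return candidatePos >= headerPos &&
         trailerWordCount(candidate) == candidatePos - headerPos + 1;
}
```

For the problematic bank in the description, the TI time 0x8d6d1fbf at index 4 reports a count of 0x2d1fbf rather than 5 and is rejected, while the real trailer at index 5 is accepted. A payload word could in principle still pass if its low bits happened to equal the expected count, so a check like this reduces the ambiguity but does not eliminate it.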
As for multiblock mode, I think there is no general way this problem can be handled in PipeliningModule. What we can do, however, is to override the LoadBank method in TIBlobModule and use the known data format of the TI events to decode the bank. However, this will still require the TI modules to be in a dedicated bank, which is of course easy to do.
It would help to know what a TI bank looks like in multiblock mode. Is the first word after the block header (0xff112001 in the example data above) an event header? If so, with time stamp data present, is a TI event 4 words long, including the headers? Then, if the block size is N, would a TI bank be 2 + 4*N words long (plus possible filler words), where the 2 accounts for the block header and block trailer?
Updated by Ole Hansen over 2 years ago
Also, for reference, Steve thinks the example data come from this run: /cache/hallc/c-pionlt/raw/shms_all_16915.dat.
Updated by Ole Hansen over 2 years ago
Another issue with decoding the TI module in multiblock mode may be the format of the event header. If 0xff112001 really is an example of this module's event headers, then it does not conform to the event header format that PipeliningModule assumes, which is that the upper 5 bits are 10010 (the lower 4 of these bits being data_type_def = 2), i.e. the topmost byte is between 0x90 and 0x97. Instead, the upper 5 bits are 11111, which looks like a filler word (data_type_def = 15) to the bank decoder.
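The bit arithmetic behind this observation can be sketched as follows. The helper names are hypothetical; only the bit positions and the data_type_def values 2 and 15 are taken from the comment above.

```cpp
#include <cstdint>

// data_type_def: bits 30-27 of a word whose bit 31 is set.
constexpr std::uint32_t dataTypeDef(std::uint32_t word) {
  return (word >> 27) & 0xFU;
}

// Event header as assumed by PipeliningModule: upper 5 bits are 10010.
constexpr bool looksLikeEventHeader(std::uint32_t word) {
  return (word >> 27) == 0x12U;
}

// Filler word: upper 5 bits are 11111 (data_type_def = 15).
constexpr bool looksLikeFiller(std::uint32_t word) {
  return (word >> 27) == 0x1FU;
}
```

Applied to 0xff112001, dataTypeDef gives 15, so looksLikeFiller is true and looksLikeEventHeader is false. Any word whose topmost byte lies between 0x90 and 0x97 satisfies looksLikeEventHeader.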
Updated by Ole Hansen over 2 years ago
I wrote a custom decoder for the TI data format. It tests OK with recent Hall C data. No errors about TI time mismatches. See hcana pull request #489. Someone should probably test this further with one or more Hall C production replays.
Updated by Ole Hansen over 2 years ago
- Status changed from In Progress to Resolved
- % Done changed from 30 to 100