Reading in TDT Files
====================

.. _Python: http://python.org
.. _NumPy: http://numpy.scipy.org
.. _pandas: http://pandas.pydata.org
.. _dtype: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
.. _dtypes: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
.. _DataFrame: http://pandas.pydata.org/pandas-docs/dev/dsintro.html#dataframe
.. _Cython: http://www.cython.org
.. _fused types: http://docs.cython.org/src/userguide/fusedtypes.html
.. _MultiIndex: http://pandas.pydata.org/pandas-docs/dev/dsintro.html#multindex

TDT File Structure
------------------

Two TDT files are needed to create an instance of
:class:`span.tdt.spikedataframe.SpikeDataFrame`: one whose name ends in
``.tev`` and one whose name ends in ``.tsq``. Note that this differs slightly
from TDT's definition of a tank.

TSQ Event Headers
-----------------

The TSQ file is a sequence of fixed-size C ``struct`` records, which makes it
trivial to work with in `NumPy`_ using a compound `dtype`_. According to
`Jaewon Hwang `_, the C struct is

.. code-block:: c

   struct TsqEventHeader {
       long size;
       long type;
       long name;
       unsigned short chan;
       unsigned short sortcode;
       double timestamp;
       union {
           __int64 fp_loc;
           double strobe;
       };
       long format;
       float frequency;
   };

*but* this code **will not work** on most modern 64-bit Unix-like systems,
because the width of ``long`` is implementation defined. The C standard only
requires ``long`` to be at least 32 bits, and compilers for 64-bit Linux and
macOS make it 64 bits (``sizeof(long) == 8``). (64-bit Windows compilers keep
``long`` at 32 bits, and ``__int64`` is a Microsoft-specific type.) Thus the
most portable version, and the one used in **span**, spells out every field
width with the fixed-width types from ``<stdint.h>``:

.. code-block:: c

   #include <stdint.h>

   struct TsqEventHeader {
       int32_t size;
       int32_t type;
       int32_t name;
       uint16_t chan;
       uint16_t sortcode;
       double timestamp;
       union {
           int64_t fp_loc;
           double strobe;
       };
       int32_t format;
       float frequency;
   };

.. warning::

   If you are using this code on data that were created on a Windows 7
   machine, you may have to change ``int32_t`` to ``int64_t``. I have not
   tested this code on such data, so **use it at your own risk**.
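
To see why the fixed-width version matters, the struct can be mirrored with
Python's standard ``ctypes`` module and its field offsets and total size
inspected. This is an illustrative check, not part of **span**; the names
``FpLocOrStrobe``, ``TsqEventHeader`` and ``field_offsets`` are made up for the
example, and the union is given the field name ``u`` because ``ctypes`` needs
one. The offsets it reports should match the ones hard-coded in the NumPy dtype
used to read the TSQ file, while the size of a plain ``long`` is printed rather
than assumed, since it differs between platforms.

.. code-block:: python

   import ctypes


   class FpLocOrStrobe(ctypes.Union):
       # The 8-byte union: a byte offset into the .tev file, or a strobe value.
       _fields_ = [("fp_loc", ctypes.c_int64), ("strobe", ctypes.c_double)]


   class TsqEventHeader(ctypes.Structure):
       # Mirrors the fixed-width C struct; ctypes lays fields out with the
       # platform's native alignment, like the platform's C compiler does.
       _fields_ = [
           ("size", ctypes.c_int32),
           ("type", ctypes.c_int32),
           ("name", ctypes.c_int32),
           ("chan", ctypes.c_uint16),
           ("sortcode", ctypes.c_uint16),
           ("timestamp", ctypes.c_double),
           ("u", FpLocOrStrobe),  # named "u" here; anonymous in the C version
           ("format", ctypes.c_int32),
           ("frequency", ctypes.c_float),
       ]


   def field_offsets(struct_type):
       """Return a {field name: byte offset} dict for a ctypes Structure."""
       return {name: getattr(struct_type, name).offset
               for name, _ in struct_type._fields_}


   header_offsets = field_offsets(TsqEventHeader)
   print(header_offsets)
   print("sizeof(TsqEventHeader) =", ctypes.sizeof(TsqEventHeader))
   # Platform dependent: usually 8 on 64-bit Linux/macOS and 4 on Windows.
   print("sizeof(long) =", ctypes.sizeof(ctypes.c_long))

The header comes out at 40 bytes, with ``fp_loc`` and ``strobe`` sharing the 8
bytes at offset 24. A struct declared with ``long`` would instead change size
whenever ``sizeof(long)`` does, which is exactly what breaks reading the file.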

Reading the TSQ file into `NumPy`_ is, fortunately, very easy now that we have
this ``struct``.

.. code-block:: python

   import numpy as np
   from numpy import int32, uint32, uint16, float64, int64, float32
   from pandas import DataFrame

   # Field names, types and byte offsets mirror the fixed-width C struct above.
   # fp_loc and strobe share offset 24 because they live in the same union.
   names = ('size', 'type', 'name', 'channel', 'sort_code', 'timestamp',
            'fp_loc', 'strobe', 'format', 'fs')
   formats = (int32, int32, uint32, uint16, uint16, float64, int64, float64,
              int32, float32)
   offsets = 0, 4, 8, 12, 14, 16, 24, 24, 32, 36

   tsq_dtype = np.dtype({'names': names, 'formats': formats,
                         'offsets': offsets}, align=True)

   tsq_name = 'name/of/file.tsq'
   tsq = np.fromfile(tsq_name, dtype=tsq_dtype)  # one record per header
   df = DataFrame(tsq)  # one column per field

The variable ``tsq`` in the above code snippet is a `NumPy record array `_
(strictly speaking, a structured array). I personally find these very
annoying. Luckily, `Wes McKinney `_ created the wonderful `pandas`_ library,
whose `DataFrame`_ constructor converts a `NumPy`_ structured array into a
`DataFrame`_ in which each field of the array becomes a column, here ``df``.

TEV Raw Data
------------

The raw data are contained in the file with the extension ``.tev``. A single
function does the heavy lifting in `Cython`_, and the rest is done in pure
`Python`_. The basic idea is that the :attr:`fp_loc` field of the header
`DataFrame`_ (built from the tsq file) holds the byte offset in the tev file of
each chunk of samples for a particular channel. What's nice about :mod:`span`
is that it hides this complexity from the user. If you like complexity, then
read on.

TL;DR (too long; don't read)
""""""""""""""""""""""""""""

Reading in the Raw Data
~~~~~~~~~~~~~~~~~~~~~~~

Now that we've got the header data, we can get what we're really interested
in: raw voltage traces. There are some indexing acrobatics here that require a
little detail about the tsq file and a little knowledge of "group by" style
operations.

First off, there is a Cython function that does all of the heavy lifting of
reading raw bytes into a NumPy array.
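
As a rough mental model, and *not* the actual Cython implementation, the
reader's job can be sketched in pure Python with NumPy: for each channel, seek
to the byte offset of each of its chunks and copy a fixed number of samples
into that channel's column of a preallocated output array. The function name
``read_chunks``, the ``(n_chunks, n_channels)`` layout of the offsets array,
the ``float32`` sample type, and measuring a chunk in samples rather than bytes
are all assumptions of this sketch. It also runs serially, whereas the Cython
version runs in parallel.

.. code-block:: python

   import os
   import tempfile

   import numpy as np


   def read_chunks(filename, chunk_offsets, samples_per_chunk, out):
       """Hypothetical pure-Python stand-in for span's Cython reader.

       chunk_offsets: shape (n_chunks, n_channels); column c holds the byte
           offsets of channel c's consecutive chunks in the file.
       out: shape (n_chunks * samples_per_chunk, n_channels); filled in place.
       """
       n_chunks, n_channels = chunk_offsets.shape
       nbytes = samples_per_chunk * out.dtype.itemsize
       with open(filename, "rb") as f:
           for c in range(n_channels):
               for i in range(n_chunks):
                   f.seek(int(chunk_offsets[i, c]))  # jump to chunk i of channel c
                   chunk = np.frombuffer(f.read(nbytes), dtype=out.dtype)
                   start = i * samples_per_chunk
                   out[start:start + samples_per_chunk, c] = chunk
       return out


   # Synthetic example: 2 channels, 3 chunks each, 4 float32 samples per chunk,
   # with the two channels' chunks interleaved in the file.
   n_chunks, n_channels, samples_per_chunk = 3, 2, 4
   truth = np.arange(n_chunks * samples_per_chunk * n_channels, dtype=np.float32)
   truth = truth.reshape(n_channels, -1).T  # shape (12, 2); column c is channel c

   chunk_offsets = np.empty((n_chunks, n_channels), dtype=np.int64)
   with tempfile.NamedTemporaryFile(suffix=".tev", delete=False) as tmp:
       for i in range(n_chunks):
           for c in range(n_channels):
               chunk_offsets[i, c] = tmp.tell()  # record where this chunk starts
               block = truth[i * samples_per_chunk:(i + 1) * samples_per_chunk, c]
               tmp.write(block.tobytes())

   out = read_chunks(tmp.name, chunk_offsets, samples_per_chunk,
                     np.empty_like(truth))
   os.remove(tmp.name)
   print(np.array_equal(out, truth))  # True

Because the synthetic file interleaves the two channels' chunks, each channel's
data are scattered through the file, which is why the reader seeks once per
chunk instead of doing one contiguous read per channel.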

The arguments passed to the Cython function matter:

#. The first argument is, of course, the filename; no surprise there.
#. The second argument is the important one: a NumPy array of file locations
   grouped by channel number. It contains the file pointer location of each
   consecutive chunk of data in the TEV file, one column per channel. For
   example, to read all of the data from channel 1 you would loop over the
   first column of this array; since each element is a file pointer location,
   you would seek to that location and read ``blocksize`` bytes. The Cython
   function does this automatically for every channel.
#. The third argument is ``blocksize``.
#. The fourth argument is the output array that receives the raw voltage data.

Here is the inner loop that does the work of reading in the raw data from the
tev file.

.. literalinclude:: ../../span/tdt/read_tev.pyx
   :language: cython
   :lines: 57-76

You can see that this part of the :func:`span.tdt._read_tev._read_tev_raw`
function seeks to the position in the file where the next chunk lies, reads
it, and places it in the array ``spikes``. This code works on any kind of
floating-point spike data (by using `fused types`_), and it also runs in
parallel for a slight speedup in I/O. As usual, the best way to understand
what's going on is to read the source code.

Organizing the Data
~~~~~~~~~~~~~~~~~~~

Whew! Reading in these data is tricky. Now we have a dataset. However, it's
not properly arranged: its dimensions are not the ones that make sense from
the point of view of analysis.

I'm not exactly sure how this works, but TDT stores its data in chunks, and
the chunk size is usually a power of 2. The number of chunks depends on the
length of the recording and equals the number of rows in the TSQ array, so
``tsq.shape[0]`` is the number of chunks in the recording. Each chunk has a
few properties, which you can explore on your own if you're interested.
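
To make the chunk bookkeeping concrete, here is a small, made-up header table
shaped like ``df`` from the TSQ example: one row per chunk, with a ``channel``
and an ``fp_loc`` column. Assuming every channel has the same number of chunks
and rows appear in recording order, a pandas ``groupby`` on ``channel`` turns
the flat list of offsets into a ``(chunks per channel, n_channels)`` array
whose first column holds channel 1's file locations. This is a sketch of the
idea, not the code **span** actually runs, and the 16-byte chunk size is
invented for the example.

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Made-up header table (not real TDT data): 3 channels x 2 chunks each, rows
   # in recording order, every chunk assumed to occupy 16 bytes in the .tev file.
   channel = np.tile([1, 2, 3], 2)            # 1, 2, 3, 1, 2, 3
   fp_loc = np.arange(channel.size) * 16      # byte offset of each chunk
   df = pd.DataFrame({"channel": channel, "fp_loc": fp_loc})

   n_chunks_total = df.shape[0]  # one row per chunk

   # Group the offsets by channel (groupby sorts the keys and keeps row order
   # within each group), then stack each channel's offsets as one column.
   chunk_offsets = np.column_stack(
       [group.to_numpy() for _, group in df.groupby("channel")["fp_loc"]]
   )

   print(n_chunks_total)  # 6
   print(chunk_offsets)   # column 0 is channel 1, column 1 is channel 2, ...

If channels had different numbers of chunks, ``np.column_stack`` would raise a
``ValueError``, which is why the equal-length assumption is stated up front.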

For now, we'll only concern ourselves with the ``channel`` column (``chan`` in
the C ``struct``). The ``channel`` column gives each chunk a ... you guessed
it ... channel, and thus provides a way to map sample chunks to channels.

Electrode Array Configuration
-----------------------------

See the :mod:`span.tdt.recording` module documentation.

``span.tdt.tank``
-----------------

.. automodule:: span.tdt.tank
   :members:

``span.tdt.spikedataframe``
---------------------------

.. automodule:: span.tdt.spikedataframe
   :show-inheritance:
   :members: