RE: No Of Samples
By: Sacha Krstulovic on 2007-01-31 11:18

> I have the feeling that the pos value is the same
> pos value as in the original sound, right? I mean,
> this center of this atom window corresponds
> exactly to the 4800 sample of the original sound.

The pos value is the position of the first sample of the window (not its center) with respect to the beginning of the signal (time/sample t=0).
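
To make the offset concrete, here is a minimal Python sketch (not MPTK code). It assumes the atom also carries its window length in samples; the names `window_center` and `length` are hypothetical, and the 1024-sample window is just an illustrative value.

```python
def window_center(pos, length):
    """Return the index of the center sample of an atom window.

    pos    -- index of the first sample of the window (sample 0 is t=0)
    length -- window length in samples (assumed to be known for the atom)
    """
    return pos + length // 2


# Hypothetical atom: window starts at sample 4800 and spans 1024 samples.
print(window_center(4800, 1024))  # 5312
```

So with such a window, pos = 4800 means the window covers samples 4800 to 5823, and its center falls around sample 5312, not 4800.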

> The second thing is that I'm going to implement
> the gammatone filters for the mptk. Is there a
> roadmap how to add a new filter? I can of course
> look at the mdct implementation and imitate
> the way, it was implemented. However it would
> be nice, if you had a written roadmap for me.

There was a roadmap, but it is very much out of date (file HOWTO_ADD_A_CLASS in subdir src/libmptk/). Your best bet is probably indeed to take inspiration from the MDCT or Gabor block/atom classes.
Roughly:
1) create your own subdirectory under src/libmptk/atom_classes/contrib/, something like the name of your lab;
2) implement your own block and atom classes. They interface with the rest of the code through the base classes declared in block.h/atom.h, so you can derive your classes directly from these base classes or, if you need a particular transform, from the Gabor or MDCT classes. Your classes should implement the virtual methods declared in the base block.h/atom.h.
3) now, the tricky part: adding your block to the dictionary XML format. The dictionary uses a parser written in flex (dict_scanner.lpp in src/libmptk/atom_classes/). The principle is that the flex scanner associates pieces of string-parsing C code with regular expressions, to get the parameter values from the XML text file. The pieces of C code fill a structure called MP_Scan_Info_c, which is defined in atom_classes/block_io_interface.h. Substeps are:
a) learn flex (the documentation on the GNU website is very good, or the O'Reilly book is nice as well);
b) add the regular expressions needed by your block to dict_scanner.lpp;
c) add the fields needed by your block to MP_Scan_Info_c, in conjunction with the parser's flex code;
d) add the management of your parameter's default values to the MP_Scan_Info_c::pop_block() method, and their writing to disk to the MP_Scan_Info_c::write_block() method.
That's it for the block I/O. This looks a bit hairy, but once you get acquainted with flex, the code should be straightforward to understand.
4) Add the I/O for your atom, by simply adding your atom class to the read_atom() factory function in atom_classes/atom_classes.cpp.
5) You will need to modify src/libmptk/CMakeLists.txt to insert your sources into the CMake build process. The automake makefiles can also be used, but the automake-based build system will be abandoned soon, so you should prefer CMake.
6) WRITE SOME DOCUMENTATION for your new classes, either as a separate document, or as an edit of doc/userman/latex_src/userman.tex.

Once your code is ready, tested and documented, if you wish we can bundle it with MPTK under the GNU General Public License and cite you/your lab as a contributor. Basically your lab remains the owner of your code but the license authorizes other people to see the sources and to derive their own code from them without owing you money. Please don't forget to cite the MPTK web page in the publications derived from its use:
@Misc{MPTK,
author = {R. Gribonval and S. Krstulovi{\'c} and B. Roy},
title = {{MPTK}, {The Matching Pursuit Toolkit}},
note = {see \url{http://mptk.gforge.inria.fr/}}
}

I hope this helps.

Good luck in your implementation, and best regards;

-*- Sacha -*-

RE: No Of Samples
By: Kamil Adiloglu on 2007-01-30 15:02

[forum:2670]

Hello Sacha and Rémi,
thanks a lot for your quick responses. They helped me a lot to understand the behaviour and to find out how to proceed.
My point using matching pursuit is to find a representation method for a group of sounds, which in turn will be classified by a machine learning algorithm. Therefore this representation method should have the same number of dimensions independent of the length of the original sounds. Besides, for the sake of the quality of the classification, the representation should reflect the characteristics of these sounds in an efficient way, so that similar sounds have similar characteristics.
I need the numSamples of the analyzed sounds, because I would like to normalize all of the sounds that I analyze. Then I would like to observe the plots and try to find similarities within those plots for the sounds, which are known in advance to be similar. I can also run a machine learning method to classify them. So, the normalization enables me to compare the relative positions of two atoms within the books of different sounds.
At this point, my questions to you would be: Are the position values of the atoms the original positions within the analyzed sounds, or do they have another meaning as well? For example, take an atom with the pos value 4800 in a sound which originally has 26081 samples, and whose analysis book gives the numSamples value 26048. I realized that the pos value never exceeds the numSamples value of the book. However I have the feeling that the pos value is the same pos value as in the original sound, right? I mean, this center of this atom window corresponds exactly to the 4800 sample of the original sound. Am I wrong?

The second thing is that I'm going to implement the gammatone filters for the mptk. Is there a roadmap how to add a new filter? I can of course look at the mdct implementation and imitate the way, it was implemented. However it would be nice, if you had a written roadmap for me.

Thanks in advance.
Best Regards
Kamil

RE: No Of Samples
By: Sacha Krstulovic on 2007-01-30 14:05

[forum:2667]

Hello Kamil and Rémi;

I think this question reduces to a choice between two ways of interpreting the definition of the atoms. After an illustration of the logic that currently prevails, I will present both views for the sake of the discussion.

The logic underlying the current implementation is: once the book is emitted, it does not know anything about the signal it comes from (besides the sampling rate), and it can have an existence of its own. In the book, the atoms are understood as short waveforms which are floating somewhere in a sea of zeros; hence, the book is able to infer the zeros coming before the earliest atom (because it knows the location of the earliest atom), and it also knows the position of the end of the latest atom, but it does not try to infer the possible trailing zeros.

The opposite way to see this, which would support the use of the same numSamples for the book and the analyzed signal, would be: all the individual atoms in a book should have, by definition, the same number of samples as the analyzed signal, to respect Matching Pursuit's inner product definition and orthogonalization principles; hence they should take into account the number of zeros preceding and trailing the location of the nonzero parts of the analysis windows.

While the latter is probably cleaner from a math and correlation analysis point of view, it sort of hampers the possible interpretation of the atoms as being short waveforms located anywhere along the time axis (like musical notes that would exist independently of the whole score). In a way, the current book format is more general than the specific Matching Pursuit analysis: Matching Pursuit uses a collection of atoms which all have the size of the analyzed signal, but these atoms have non-zero samples over a specific support only, and thus they can be interpreted as isolated waveforms; the current book implementation indeed registers a set of isolated non-zero waveforms positioned in time, and these waveforms can be related to a wider and more specific support, if needed, by zero padding.

I don't know which logic you prefer. My vote goes to the current one, because I think it is more general than one that would be tied to the specific MP analysis: the book format can already be used, in its current state, to store something more general than the result of a Matching Pursuit analysis, and then the whole mpf/mpr mechanism could be reused with the results of analysis algorithms which would extract isolated waveforms with a different analysis paradigm.

To complete the picture, just a few practical remarks about the current implementation:
- the numSamples parameter from the book informs the user about the contents of the book, not about the analyzed signal. As explained, it currently corresponds to the support of the sum of the contained atoms, without the possible trailing zeros. With this logic in mind, maybe this parameter should be named differently to avoid a confusion with the signal's numSamples, e.g., numSamplesInBook.
- The current implementation stemmed mainly from the practical concern that one may want to add books and residuals coming from different analyses, i.e., from signals which don't have exactly the same length, without having to worry too much about the compatibility of the signal sizes (and array boundaries).
The current behaviour of mpr is: if you give it a residual, the rebuilt signal will have the size of the residual, no matter the extent of the atom positions in the book, and in this instance the residual logically has the same length as the signal it comes from (notwithstanding possible zeros at the end). Conversely, if you give mpr a book only, it will create a signal made by adding up the atoms contained in the book, whose size is therefore the book's numSamples, because it does not know how many zeros it should add at the end to rebuild the signal which was originally analyzed.
Currently, you can modify the book's numSamples by hand (or in Matlab) and rewrite the book on disk; when you give it to mpr, the size of the rebuilt signal will correspond to the hand-tweaked numSamples. A cleaner way to do this would be to add a --signalsize=x switch to mpr, to enforce a particular signal size when rebuilding (that should be fairly easy to do, tell me if you're interested in that kind of modification).
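
The sizing rule described above can be sketched in a few lines of Python. This is not the mpr source code, only an illustration of the behaviour as described; the function names `rebuilt_length` and `pad_to_length` are hypothetical, and the sample counts are the ones quoted in this thread.

```python
def rebuilt_length(book_num_samples, residual_length=None):
    """Length of the rebuilt signal under the rule described above.

    If a residual is supplied, its length wins; otherwise the book's
    numSamples is used, since the trailing zeros are unknown.
    """
    if residual_length is not None:
        return residual_length
    return book_num_samples


def pad_to_length(signal, target_length):
    """Append trailing zeros so that the signal has target_length samples."""
    missing = target_length - len(signal)
    if missing < 0:
        raise ValueError("target_length is shorter than the signal")
    return list(signal) + [0.0] * missing


print(rebuilt_length(26048))         # book only: 26048
print(rebuilt_length(26048, 26081))  # book plus residual: 26081
```

With a book only, recovering the original 26081 samples amounts to appending 33 trailing zeros, which is what `pad_to_length` does on a rebuilt signal.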

I hope this helps and feeds the discussion.
Cheers;
-*- Sacha -*-

RE: No Of Samples
By: Rémi Gribonval on 2007-01-30 09:29

[forum:2666]

Dear Kamil,

We have had numerous discussions with Sacha on how the numSamples (and numChans) fields should behave, and we are not yet sure we opted for the best behavior. In a sense you are right when you expect numSamples to be the number of samples of the original analyzed signal, and numChans to be its number of channels. However, the current behavior is a bit less intuitive. At a given step of the mpd iterative process, numSamples indicates the 'effective' time support of the signal that will be reconstructed if mpr is used. That is to say, even if the reconstructed signal has 'm' samples, only the first 'numSamples' can be nonzero, because so far mpd has not selected any atom whose waveform has any sample beyond the numSamples-th one.

Practically, when the book is empty its numSamples is zero, and each time an atom is added to the book, numSamples is increased only if needed, that is to say if the time support of the added atom goes beyond the current range [0, numSamples-1].
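
As a rough illustration of this update rule, here is a minimal Python sketch (not the MPTK implementation). The function name `updated_num_samples` and the three atoms, given as (pos, length) pairs, are hypothetical.

```python
def updated_num_samples(num_samples, pos, length):
    """Return numSamples after adding an atom with support [pos, pos+length-1].

    The value grows only if the atom extends beyond [0, num_samples-1].
    """
    return max(num_samples, pos + length)


num_samples = 0  # empty book
for pos, length in [(4800, 1024), (0, 256), (11008, 1024)]:
    num_samples = updated_num_samples(num_samples, pos, length)
    print(num_samples)
# prints 5824, 5824, 12032
```

The second atom lies entirely inside the current range, so numSamples does not change; the third one extends past it, so numSamples grows to the end of that atom.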

If you run sufficiently many iterations of mpd, and your analyzed signal has nonzero samples at its extremities, then eventually numSamples should take the value you expect.

As I said, this behavior is still somewhat under discussion, and we are interested in your feedback to improve the usability of MPTK, so do not hesitate to comment further on this topic.

I hope this helps,

Best regards,

Rémi.

No Of Samples
By: Kamil Adiloglu on 2007-01-30 09:09

[forum:2665]

Hi,
I have another question concerning the number-of-samples variable in the set of atoms found by the algorithm. If I read the book generated by mpd using the bookread Matlab function, I obtain the fields numAtoms, numChans, numSamples, sampleRate and several other fields, and of course the array of atoms. My question is about the numSamples field. At the beginning, I thought that this variable holds the number of samples in the original sound, which is now decomposed with mpd. However, I then realized that the value of this variable changes according to the number of atoms that I ask for. If I choose for example 16 atoms, I get 12032 samples. If I choose 1600 atoms, the value of numSamples increases to 26048. However, I know that the original sound has 26081 samples. Why does the numSamples variable always have a different number of samples? What does this value indicate?
My aim was to normalize all sound files that I decompose using mpd, and then look for similarities between those decompositions. I expect to obtain similar decompositions for similar sounds; for example, door-opening sounds should have similar decompositions, at least in time, once I normalize them in time.
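
A minimal Python sketch of the kind of time normalization meant here (an illustration only, not part of MPTK). It assumes that atom positions are divided by the length of the original sound, taken from the sound file itself rather than from the book's numSamples, so that positions from sounds of different lengths fall on a common [0, 1) axis. The function name and the atom positions are hypothetical.

```python
def normalize_positions(positions, signal_num_samples):
    """Map atom start positions (in samples) to relative positions in [0, 1).

    signal_num_samples -- length of the original sound, in samples
    """
    if signal_num_samples <= 0:
        raise ValueError("signal_num_samples must be positive")
    return [pos / signal_num_samples for pos in positions]


# Hypothetical atom positions in a sound of 26081 samples.
relative = normalize_positions([0, 13040, 26080], 26081)
print(relative)
```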

Thanks in advance.
Regards
Kamil