How To Develop Projects Using LHOTSE
------------------------------------

A LHOTSE developers manual is planned, but not yet available. In the meantime, this file contains some hints which will get you started developing projects using LHOTSE. You should also have a look at the project sources, which can be found in src/ and subdirectories, or at the LHOTSE library sources in lhotse/ and subdirectories. All class interfaces are documented in the corresponding header files.

0) LHOTSE Standard Make System

Most complete LHOTSE projects can be built with the GNU configure-make system. However, this is tedious and slow during day-to-day development. The LHOTSE developers use a much simpler make system, which has to be configured by hand. The standard make system is in /old. The documentation file /doc/simple-make-system.txt gives details about how to configure it.

1) Structure of LHOTSE

LHOTSE has a simple structure. Code is generally organized in modules. A module is something like a C++ namespace, although at present, LHOTSE does not make use of namespaces (all public names are in the global namespace; note that LHOTSE uses the STL, which does define a std:: namespace).

LHOTSE library modules are compiled into LHOTSE libraries. These modules provide the core of LHOTSE and all classes which are deemed generally useful, have clean, completely documented interfaces, are encapsulated, and have been tested reasonably well. At present, all library modules are compiled into a single LHOTSE main library: liblhotse.a. The main library source tree is rooted at lhotse/. The sources of the global module are at lhotse/, the sources of module are at lhotse//. At present, module sources do not have further subdirectories.

The LHOTSE developers keep their own projects in src/. Each project consists of some main code file in src/, and possibly one or several project modules.
Each project module occupies a subdirectory of src/, just as main library modules occupy subdirectories of lhotse/. Projects which are officially shipped with the current LHOTSE version lie in /src. Project code may use any of the library modules, while library code must not use any project modules.

2) Setting up your project sources

We strongly recommend that you keep your project sources separate from the LHOTSE source tree, unless you are a LHOTSE developer yourself. The LHOTSE source tree is, however, required at the time you compile your project using LHOTSE. If the LHOTSE source tree is rooted at , you have to pass -iquote or -I to your compiler. Also make sure that the LHOTSE libraries are found by the linker. You may also use LHOTSE project code in /src in this way.

You may also use the LHOTSE build system, which is used to install LHOTSE source packages. This is useful if your system essentially has the same system requirements as LHOTSE. Modifying the LHOTSE build system is not hard, but requires some familiarity with the GNU autotools, and is not described here.

3) Including LHOTSE interfaces

Interfaces to LHOTSE public classes are kept in header files in the source tree. The most commonly used and elementary LHOTSE classes are in the global module of the main library, in lhotse/. All code files using LHOTSE have to include

    #include "lhotse/global.h"

This includes a set of global interfaces, but not all of them. For example, in order to include class NumberFormats of the global module:

    #include "lhotse/NumberFormats.h"

To include all classes of the global module:

    #include "lhotse/package.h"

All other modules reside in subdirectories of lhotse/. For example, to include class StMatrix of main library module matrix, use

    #include "lhotse/matrix/StMatrix.h"

To include all exported classes of module matrix, use

    #include "lhotse/matrix/package.h"

Including package.h is not recommended, since it increases compile time.
Some modules define exception classes in addition to lhotse/exceptions.h (all LHOTSE exceptions are derived from the class StandardException). For example, to include additional exceptions of the matrix module, use

    #include "lhotse/matrix/exceptions.h"

4) Writing your main code file

This is mostly up to you, but LHOTSE requires some installation methods to be called for certain functionalities, and it also exports classes which can help you with writing command line applications (e.g., CommandParser). Please check the code files for existing projects (e.g., src/main_ivmsing.cc or src/main_klr.cc) for typical LHOTSE main code files.

5) Writing a project in the LHOTSE style

As noted above, you can use LHOTSE within your project without conforming to the LHOTSE style at all. Conforming to the style may be simpler, as you can use a simple modification of the LHOTSE build system for your own project (see 2)).

A LHOTSE project typically consists of a main code file and one or more modules. A simple project may consist of a main code file only. Stable projects shipped with LHOTSE are in src/ (e.g., ivmsing, klr). See 4) for how to write a main code file, and check the code files for existing projects.

A module must contain the files given in src/template-new. Copy them and replace the placeholders accordingly. The template files assume that the new project module lies in src/, where is the module name.

NOTE: Placing your own project sources into the LHOTSE source tree is not recommended (see file INSTALL); rather, you should keep your sources separate. This facilitates installing new LHOTSE versions. If you use LHOTSE extensively, and especially if you intend to use the LHOTSE build system for your project as well (see 2)), our recommendation is to use symbolic links in the LHOTSE source tree to your sources.
For example, if sources for your project module lie in , and the LHOTSE source tree is rooted at , go to /src and issue

    ln -s

Now, if you install a new LHOTSE version, you merely need to re-create these symbolic links to your sources.

6) Packaging a LHOTSE project

This will be described in more detail in the forthcoming developers manual. At the moment, we recommend looking at an existing LHOTSE project package, for example ivmsing or klr. Note that a project package does not include the LHOTSE library sources (see also file INSTALL).

It is easiest if you conform to the LHOTSE style, putting your main code file in src/, and your modules in src/<...>. You need to place Makefile.am files in each subdirectory; just adapt the ones in a LHOTSE project package. You need to edit configure.ac, listing all Makefiles in AC_CONFIG_FILES. You also need to make some changes in the preamble, notably to AC_INIT and AC_CONFIG_SRCDIR. We refer to the GNU autotools manuals.

7) Programming with LHOTSE: Central concepts

In the absence of a more detailed manual, we just note some basic concepts here. Details can be found in the LHOTSE class header files.

7.1) Handle

One of the main sources of bugs in C (and C++) programs has to do with memory allocation. You allocate some memory with malloc, then forget to deallocate it later. In C++, you dynamically create an object using new, then forget to destroy it using delete. Languages like Java or Lisp allow you to be sloppy with allocations. Objects are created when first needed, and removed automatically once not needed anymore. The drawback is that expensive, complicated, and opaque mechanisms like garbage collectors are required.

Handles in LHOTSE are an extension of what is known as smart pointers in C++. The idea is to wrap a pointer to a dynamically allocated object in a handle object. Upon destruction of the handle object, the referred object (called "representation") is destroyed.
We use a central property of C++, namely that objects which go out of scope are destroyed automatically, in order to implement smart pointers.

This is not enough if many handles can refer to the same underlying representation, a central property of normal pointers. In a LHOTSE handle, each representation is associated with a reference counter, which counts the number of handles which refer to it. Associations and deassociations of handles increase or decrease the counter. The destruction of a handle counts as a deassociation. Once the counter drops to zero, the representation is destroyed.

Handles are implemented in Handle (global module). Important features of normal pointers, such as dereferencing (*), application (->), assignment (=), or dynamic casting (cast), are supported for handles. Array features of normal pointers (+, +=, []) are not supported in Handle, but in ArrayHandle (see 7.2)) (it is a big design flaw of C to conflate pointers and arrays!). There is a zero handle, which is the same as a zero pointer. A handle is zero iff it is not associated with a representation.

All of LHOTSE (libraries and projects) uses handles consistently (see rules below). We strongly recommend that you do so as well. There is no significant overhead in doing so, neither in development time nor in run-time performance, and we assume that it is not your favourite activity to track down memory leaks or segmentation faults.

7.1.1) Using handles:

- Creation:
  Create a handle by wrapping an object just created dynamically:
      Handle<T> hand(new T(...));
  NEVER wrap an object which has not been created dynamically:
      Handle<T> hand(&myobj); // NEVER!
  The default constructor creates a zero handle (without representation). The copy constructor acts like '='.
  You can create a handle which does not own its object:
      Handle<T> hand(&myobj,false);
  This variant is deprecated, see 7.1.2).
- Application (->):
  '->' has the same semantics as for a pointer.
  Applying '->' to a zero handle leads to an exception.
- Assignment (=):
  '=' has the same semantics as for a pointer, in that the underlying representation is not copied.
      h2=h1;
  means that 'h2' is deassociated from its representation (if any), then associated with the representation of 'h1'. If 'h2' is Handle<T2>, 'h1' Handle<T1>, the assignment works if T1 == T2, or T1 is a child of T2. See also "Dynamic Casting".
- Comparison (==):
  Two handles are considered identical if they have the same representation, or if they are both zero. 'h==0' can be used for a handle 'h', testing whether 'h' is a zero handle. '!=' is the negation of '=='.
- Dereferencing (*):
  '*' has the same semantics as for a pointer. If 'h' is Handle<T>, then '*h' is a T& to the representation. For a zero handle, an exception is thrown.
- (Automatic) Conversion to T*:
  A Handle<T> which appears as r-value where 'const T*' or 'T*' is expected is converted automatically into a pointer to the representation, or to the zero pointer if the handle is zero. The conversion can be enforced by using the 'p' method, in that 'h.p()' is the representation pointer for a handle 'h'. Use of 'p' should be avoided in general, but it is necessary in situations where automatic conversion does not apply. For example,
      dynamic_cast<T*>(h)
  does not work, you need
      dynamic_cast<T*>(h.p())
  NOTE: Use 'DYNCAST(T,h.p())' instead of 'dynamic_cast<T*>(h.p())' in LHOTSE (see global.h, global module, for details).
- Dynamic Casting (cast):
  Has the same semantics as 'dynamic_cast' for a pointer.
      Handle<T2> h2=Handle<T2>::cast(h1);
  for a Handle<T1> 'h1' checks whether a dynamic cast T1* -> T2* is possible. If so, 'h2' is associated with the representation of 'h1'. Otherwise, 'h2' becomes a zero handle.
  NOTE: The type of the underlying representation is not modified. The representation is always destroyed using the destructor of its real type, independent of the types of handles associated with it.
- Reassociation (changeRep):
  A handle's association can be changed using 'changeRep', which works in the same way as the constructor, but on a handle which already exists. A handle can be deassociated by turning it into a zero handle:
      h.changeRep(0);

7.1.2) Rules for using handles (see also Handle header comments):

- Wrap any dynamically allocated object into a handle immediately. Do not store a pointer to the object. Do not destroy the object using 'delete'. Instead of
      T* ptr=new T(...);
      ...
      delete ptr;
  use
      Handle<T> hand(new T(...));
      ...
      // Object destroyed once 'hand' goes out of scope
- Do not pass pointers as arguments. Pass handles instead. Instead of
      // 'ptr' points to object
      // 'func' declared: ... func (...,T* ptr,...)
      func(..,ptr,..)
  use
      // 'hand' refers to object (as handle)
      // 'func' declared: ... func (...,const Handle<T>& hand,...)
      func(..,hand,..)
- Do not maintain pointers to objects as class members. Maintain handles instead. This has the following advantages:
  - If you store a pointer as class member, then destroy the underlying object accidentally, the class will not know about it, and will still try to refer to the object.
  - Upon destruction of an object, all its members are destroyed. For a handle member, the reference counter mechanism applies, destroying the underlying object iff no other handles refer to it. For a pointer member, nothing happens at all (pointer types do not have destructors).
  - If a handle is stored in a class, other handles referring to the same representation can be destroyed, without the representation being destroyed. For example:
        Handle<T> hand(new T(...));
        MyClass a(...,hand,...);
    If 'MyClass' contains a handle member which is set to 'hand' upon construction, 'hand' can afterwards be destroyed (e.g., by going out of scope). Since the class member still refers to the representation that 'hand' referred to, the latter is not affected.
  NOTE: It is OK to pass pointers to methods if it is clear that these pointers are not stored in class members. You will find this in older LHOTSE code, but it is discouraged. In general, handles should always be preferred to pointers, unless there is a clear reason not to use them (such as calling foreign code). Note that the performance overhead of handles is minimal; do not avoid them for efficiency reasons.
- NEVER do the following things:
  - Do NOT create an object non-dynamically, then wrap it into a handle. If you need to generate a handle referring to an object which has not been generated dynamically, you can do two things:
    - Go back in your code and do not create the object non-dynamically, but as a handle. Instead of
          T myobj(...);
      use
          Handle<T> myobj(new T(...));
      In both variants, 'myobj' exists in the same scope. This is the preferred variant.
    - Pass a handle which does not own its representation.
          T myobj(...);
          Handle<T> hmyobj(&myobj,false);
          func(...,hmyobj,...);
      This variant is deprecated, and in general you should avoid using handles which do not own their representation. They are nothing like usual pointers, and using them is unsafe.
  - Do NOT wrap a dynamically allocated object in more than one handle.
        T* ptr=new T(...);
        Handle<T> h1(ptr);
        Handle<T> h2(ptr);
    or
        Handle<T> h1(new T(...));
        ...
        Handle<T> h2(h1.p());
    The different handles cannot know that they refer to the same object, and each of them will destroy the representation upon its own destruction. If you want several handles to refer to the same representation, do the natural thing just as with normal pointers:
        Handle<T> h1(new T(...));
        Handle<T> h2(h1);
    or
        Handle<T> h1(new T(...));
        Handle<T> h2;
        h2=h1;
    You can avoid this problem by wrapping a dynamically allocated object in a handle immediately, not keeping the pointer around. Furthermore, use the 'p' method of Handle only if you really need to (see 7.1.1)).
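To make the reference counting mechanism of 7.1) concrete, here is a minimal sketch of a counted handle in plain C++11. It is not the LHOTSE implementation: 'SketchHandle', 'Tracked', 'demoHandle', 'isZero' and 'useCount' are names made up for this illustration, and the real Handle class offers more (dynamic casting, 'changeRep', conversion to T*). The sketch assumes only what is stated above: copying or assigning a handle shares the representation, and the representation is destroyed once the last handle referring to it is gone.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a counted handle (NOT the LHOTSE class).
template<class T>
class SketchHandle {
public:
  // Default constructor: zero handle, no representation.
  SketchHandle() : rep_(nullptr), count_(nullptr) {}
  // Wrap a freshly allocated object; the handle takes ownership.
  explicit SketchHandle(T* rep)
    : rep_(rep), count_(rep ? new std::size_t(1) : nullptr) {}
  // Copy: associate with the same representation, increase the counter.
  SketchHandle(const SketchHandle& other)
    : rep_(other.rep_), count_(other.count_) {
    if (count_) ++*count_;
  }
  // Assignment: deassociate from the old representation, then share the new one.
  SketchHandle& operator=(const SketchHandle& other) {
    if (this != &other) {
      release();
      rep_ = other.rep_;
      count_ = other.count_;
      if (count_) ++*count_;
    }
    return *this;
  }
  // Destruction counts as a deassociation.
  ~SketchHandle() { release(); }

  bool isZero() const { return rep_ == nullptr; }
  std::size_t useCount() const { return count_ ? *count_ : 0; }
  T& operator*() const { check(); return *rep_; }
  T* operator->() const { check(); return rep_; }
  bool operator==(const SketchHandle& o) const { return rep_ == o.rep_; }

private:
  // Zero handles must not be dereferenced.
  void check() const { if (!rep_) throw std::runtime_error("zero handle"); }
  // Decrease the counter; destroy the representation when it reaches zero.
  void release() {
    if (count_ && --*count_ == 0) { delete rep_; delete count_; }
    rep_ = nullptr;
    count_ = nullptr;
  }
  T* rep_;
  std::size_t* count_;
};

// Test type which counts how many of its instances are alive.
struct Tracked {
  static int alive;
  int value;
  explicit Tracked(int v) : value(v) { ++alive; }
  ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Returns the number of live Tracked objects after all handles are gone.
int demoHandle() {
  {
    SketchHandle<Tracked> h1(new Tracked(7));
    SketchHandle<Tracked> h2(h1);   // second handle, same representation
    assert(h1 == h2 && h1.useCount() == 2);
    SketchHandle<Tracked> h3;       // zero handle
    h3 = h2;                        // three handles share the object
    h2 = SketchHandle<Tracked>();   // deassociate 'h2'
    assert(h3->value == 7 && Tracked::alive == 1);
  }                                 // 'h1', 'h3' go out of scope here
  return Tracked::alive;
}
```

Note how the "more than one handle per raw pointer" mistake shows up in this sketch: two SketchHandle objects built from the same raw pointer would each get their own counter, and both would try to delete the object.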
7.2) ArrayHandle

ArrayHandle is derived from Handle, but exports array capabilities of normal pointers, in a controlled way. Another one of the main bug sources in C is running beyond the boundaries of an array, resulting in a segmentation fault or something much worse.

The mechanics behind ArrayHandle (MemWatcher, ...) are used throughout LHOTSE to control memory usage and access. We cannot go into details here, but refer to MatTimeStamp, BaseVector (module matrix).

We strongly recommend that you use this way of managing memory regions as well. There is no significant overhead in doing so, neither in development time nor in run-time performance, and we assume that it is not your favourite activity to track down bugs caused by violating memory boundaries, or memory leaks. We also recommend that you use the controlled forms of accessing memory through 'ArrayHandle' in all but your absolute bottleneck code (use a profiler to find out).

'ArrayHandle' is for pure simple contiguous arrays. If you need matrices and vectors with changing size, arithmetic operations, etc., use the classes in module matrix; you will find many high-performance routines there already. Use vectorization through function objects (see below) to avoid error-prone and hard-to-understand loops.

7.2.1) Rules for using ArrayHandle:

The same rules as for Handle (see 7.1.2)) apply here as well. The single most important rule is: Do not use memory allocation through 'new[]' (or even 'malloc') anymore. Use ArrayHandle instead. Instead of

    T* array=new T[100];

or even worse (fails if T has constructor, no default construction, no element destruction)

    T* array=(T*) malloc(100*sizeof(T));

always use

    ArrayHandle<T> array(100);

Do not think of an array or memory region as something you store as a pointer, allocate with 'new[]', and destroy with 'delete[]', but as an ArrayHandle. There is no overhead in doing so.
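As a rough picture of what "controlled" array access means, the following C++11 sketch keeps the size together with a shared buffer and checks every index. It is only an illustration under our own assumptions: 'SketchArrayHandle', 'demoArraySum', and the member names used here are hypothetical, and the sketch is built on std::shared_ptr and std::vector rather than on the MemWatcher mechanics used by the real ArrayHandle.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a size-aware, shared array handle
// (NOT the LHOTSE class).
template<class T>
class SketchArrayHandle {
public:
  // Zero handle: no buffer.
  SketchArrayHandle() {}
  // Allocate a buffer of n default-constructed elements.
  explicit SketchArrayHandle(std::size_t n)
    : rep_(std::make_shared<std::vector<T> >(n)) {}

  // The size always travels with the buffer.
  std::size_t size() const { return rep_ ? rep_->size() : 0; }

  // Controlled element access: out-of-range indices throw.
  T& operator[](std::size_t i) const {
    if (!rep_ || i >= rep_->size())
      throw std::out_of_range("SketchArrayHandle: index out of range");
    return (*rep_)[i];
  }

private:
  // Copying the handle copies this pointer, not the elements.
  std::shared_ptr<std::vector<T> > rep_;
};

// Fills an array with 0..n-1 and sums it through a second handle
// which shares the same buffer.
int demoArraySum(std::size_t n) {
  SketchArrayHandle<int> a(n);
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = static_cast<int>(i);
  SketchArrayHandle<int> b = a;   // shares the buffer, no element copy
  int sum = 0;
  for (std::size_t i = 0; i < b.size(); ++i) sum += b[i];
  return sum;
}
```

In this sketch, an access such as 'a[n]' on a handle of size n raises std::out_of_range instead of silently reading past the end of the buffer, and the buffer is freed when the last handle sharing it is destroyed.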
You can reallocate an ArrayHandle with a different size, using 'changeRep':

    array.changeRep(200);

You may think of an ArrayHandle as a handle for a dynamically allocated memory region (using 'new[]' instead of 'new'). Importantly, the size is always maintained together with a representation. '=' copies handles, without copying the representation.

If 'array' is an ArrayHandle, 'array.size()' is the size of the representation. 'array[i]' refers to the i-th element of the representation. Importantly, this access is controlled. For out-of-range 'i', an exception is thrown.

NOTE: Element access through the controlled '[]' operator is recommended for parts of code which are not bottlenecks. For bottleneck code, the pointer to the underlying representation is obtained by the 'p' method or through implicit conversion to T*. Note that the classes in the matrix module implement many high-performance operations very efficiently, calling external Fortran code which has been optimized by experts and possibly tuned to the computer architecture in use.

In general, ArrayHandle is for low-level arrays and low-level memory allocation. If you need vectors, matrices, or other specialized data structures, use specialized LHOTSE classes, or the C++ Standard Template Library, or try to find implementations by specialists. In the latter case, if you have wrapped such code in LHOTSE, please consider sharing your code with others and contact the LHOTSE developers.

7.3) Function Objects

Function objects are an important concept of C++, and are heavily used in the context of the Standard Template Library. In essence, they allow you to apply functions to objects such as containers (vectors, matrices, lists, ...) or other data structures. Syntactically, a function object is instantiated from a class which implements the '()' operator.

LHOTSE supports function objects with several of its core data structures (vectors, matrices, ArrayHandle, ...).
It also contains some extensions to what C++ and the STL allow you to do. The relevant LHOTSE classes are FuncObjects, AccumulFunc, and NullaryFunc. We cannot go into details here, please look at the relevant class headers.

For example, you can apply a function object 'f' to a vector 'v', by

    w.apply1(v,f);

This means that 'f' is applied to each entry of 'v', and the result is written into the corresponding entry of 'w'. Use

    x.apply2(v,w,f);

for a binary function 'f'. The same works for matrices. MATLAB(R) programmers will recognize this vectorization concept. LHOTSE also provides accumulators, but they are not discussed here (see AccumulFunc, and 'accumulate' methods of vector, matrix classes). The same works for ArrayHandle as well, although in this case, the l-value must have the correct size (see ArrayHandle's 'apply1', 'apply2').

Note that all function object code is written as templates, and will usually be inlined by your compiler, so using function objects usually does not sacrifice efficiency.

An adapter is a template function which creates a function object, given some parameters, which can be function objects as well. This concept is very important in order to build function objects from atomic ones. You should understand that you can always code a function object by hand, by defining a class which exports 'operator()'. Using adapters will often save you such error-prone "trivial" coding, and make your code more readable.

The STL defines many elementary function objects ('plus', 'multiplies', ... for arithmetic, 'equal_to', 'less', ... as predicates), please consult C++ (or STL) documentation.

Examples:
- 'bind1st', 'bind2nd' are STL adapters which bind the 1st / 2nd argument to a given value. 'std::plus', 'std::divides' are addition, division, so that 'bind2nd(std::plus<int>(),5)' is x -> x+5 with int, and 'bind1st(std::divides<double>(),1.0)' is x -> 1/x with double.
- STL 'ptr_fun' converts a pointer to a (unary or binary) function to a function object.
  'ptr_fun(log)' is x -> log(x) (with double) as a function object.

LHOTSE extends the range of adapters beyond what is offered by the STL, most notably by allowing you to compose function objects into new ones. See FuncObjects for details, and the kernel classes in module gp (GaussianCF, MLPerceptCF, SimpleMaternCF) for real-world examples.

Function objects can be used to do small arithmetic computations on vectors or matrices of type T != double, since many such arithmetic methods are defined only for the double classes. For example, use

    x.apply2(v,w,std::plus<int>())

to add int vectors v,w (BaseVector<int>).

7.4) Interval, Range

Interval, Range are low-level classes in the global module. An Interval is a special case of a predicate function object (it exports '()': T -> bool, which is true iff the argument is within the interval). It is used in situations where elements of a data structure have to lie within a certain interval. Examples: CommandParser (global), 'checkBounds' of BaseVector (matrix). An interval can be open, closed, or unbounded at either side. It is a template class which works with types which export the '>', '==' operators.

Range represents a range, i.e. an array of non-negative int values. A Range object can be of different types:
- flat: Of the form s,s+1,s+2,...,t
  A flat range can be open: s,s+1,s+2,... (this is not possible for the other types). Its size is then determined by what it is applied to.
- linear: Of the form s,s+step,s+2*step,...,t, where (t-s) mod step == 0. If step==1, the range is flat. step can be negative, but != 0. Both s, t must be >= 0.
- indexed: Arbitrary set of non-negative integers.

Range objects are important for working with matrices and vectors. They allow you to do most of the things you can do in MATLAB(R), but (often) without creating temporary copies.

The full range is the special open range 0,1,... It is often the default argument to subselection operators, such as 'operator()' in the matrix/vector classes.
The global function 'full()' returns a const reference to this full range; it represents what is ":" in MATLAB(R).

7.5) FileUtils, NumberFormats

FileUtils, NumberFormats are in the global module. Both consist of static methods. Together, they make handling of files a bit simpler than in native C++. NumberFormats tries to be somewhat system-independent, by catering for different byte sizes of types, or little-endian / big-endian conventions. However, this is not completely portable, and if you need portable binary data formats, you should consider using a serialization library.

7.6) CommandParser

This global class offers a simple parser for command files, by which the executables of some LHOTSE projects are controlled. Again, this is fairly simple, and you should use LEX, YACC for more advanced stuff. If you do so in a generic way, please contact the LHOTSE developers.

7.7) BaseVector, BaseMatrix, StVector, StMatrix (matrix/vector classes): The Module 'matrix'

These are among the most useful features of LHOTSE. We do not describe them here in any detail, but refer (at the moment) to the class header files, and to project sources. We just give a few hints.

The matrix/vector classes are more advanced than ArrayHandle, not only in that they implement a large set of methods, but also for the following reasons:
- They allow for size changes. An object can change its size many times during its lifetime. It is kept in an underlying buffer, which may be larger than the current object. When more space is needed, a buffer extension strategy is used; the default is doubling the current size. This allows for maximum efficiency with reasonable overhead, avoiding frequent copying. Compare this to MATLAB(R)! As a current rule, buffer size does not shrink, unless requested by the user.
  If you know what size a matrix / vector will attain during its lifetime, it is of course best to allocate it with this size at the beginning.
  Subsequent size changes which do not grow beyond this initial size do not cost anything in terms of copying or reallocation.
- They allow for masking (see 'operator()' and 'mask' methods), in that you can access a part of a matrix/vector as a matrix/vector directly, without copying and copying back. Access every third element of a vector, access a vector read backwards, access row, column, or diagonal of a matrix as a vector, and do all of this without copying. To high-level methods, there is no difference between a true and a mask matrix/vector object, and if the requested operation cannot be done on a mask, copying and writing back is done automatically. Moreover, masking with flat or linear ranges (see 7.4)) is supported directly by LHOTSE and the underlying BLAS library, making copying and reallocation unnecessary. Compare this to MATLAB(R)!
  A mask object behaves in exactly the same way as a normal object (but see hints on temporary masks below), except that its size cannot be modified. Any size-changing operation results in an exception. Furthermore, a size-changing operation on a normal object leads to all masks referring to this object becoming invalid (this is a bit like an STL iterator becoming invalid if the underlying data structure is modified). This is the case even if the size of the normal object is increased (a more selective invalidation would be too complicated and expensive). This mechanism is realized in MatTimeStamp; it does not require an object to know about masks referring to it.
- Mask objects are light-weight, they are similar to handles, and they can be used as temporary objects without loss of efficiency. If 'v' is a vector, then
      v(Range(2,6))->func(...);
  applies vector method 'func' directly on the subvector v(2..6). Assignments work as
      v1(rng1)=v2(rng2);
  where only one copying process is used (it would be three or four in a system where subvectors are created as temporary copies).
  This allows you to write code almost as readable as MATLAB(R), but without all the unnecessary copying. Columns/rows of matrices can be accessed as (mask) vectors. For example, if 'mat' is a matrix,
      mat(full(),3)=v;
  assigns 'v' to column 3 of 'mat'. 'full()' returns a reference to the full range.
- Their memory regions are controlled. With masking, several matrices and/or vectors may access the same memory region at the same time. The user need not worry about deallocating a region once nobody is using it anymore, or about notifying masks that their underlying true object was destroyed. This is done automatically, eliminating a source of nasty bugs.
- The numerical types StVector, StMatrix (where the element type is double) offer arithmetic methods which wrap the complete BLAS interface, which is the backbone for all serious numerical linear algebra done today (on dense objects). The methods are easy and natural to use, in contrast to calling BLAS directly. The higher-level LAPACK library is wrapped partially, and further LAPACK functionality may be added on demand. In general, calling BLAS, LAPACK, and other Fortran code is quite simple in LHOTSE by adhering to the Fortran (or BLAS) matrix and vector format, and by the provision of some helper classes (WriteBackMat, WriteBackVec in module matrix). See FastUtils (module matrix) for examples.
- Classes derived from BaseVector or BaseMatrix may require additional constraints on elements. For example, IndexVector <- BaseVector allows for non-negative entries only. Element conditions, and also global conditions, are supported in the generic template classes, and they can be realized in derived classes fairly easily (see IndexVector in matrix module).
- There is some support for matrices of different structure in StMatrix, namely symmetric (stored in upper/lower triangle) and upper/lower triangular.
  These structures are fully supported by BLAS and LAPACK, and LHOTSE uses this support (something else you cannot do with MATLAB(R)). Non-rectangular structures can only be used with square matrices. They are controlled by setting a StMatrix attribute, called structure pattern. The possible values of this attribute are defined (as constants) in MatStrct.
  NOTE: The structure pattern is a volatile attribute, and it is not in fact supported by the base class BaseMatrix which implements many generic methods of StMatrix. Always check (header comments!) whether a method supports structure patterns, and what this means. It is safest to set the structure pattern just before calling such a method.
  NOTE: Banded matrices supported by BLAS are not at present supported in LHOTSE. Furthermore, the packed storage format for triangular matrices supported by BLAS is not used in LHOTSE. Note that by using structure patterns, one can maintain two triangular (or symmetric) matrices in a single StMatrix object.
  For example, if 'mat' is a square StMatrix matrix, then
      mat.setStrctPatt(MatStrct::lower);
  sets the structure pattern to lower triangular, which (depending on what is done with 'mat') can also mean that 'mat' is symmetric, with its entries being stored in the lower triangle. In this case, entries in the upper triangle are ignored, and one can actually store a second symmetric matrix there. Since BLAS/LAPACK support this storage directly, there is no performance deficit.
- LHOTSE comes with its own file formats for matrix/vector classes. These are reasonably portable between different systems. For MATLAB(R) users, the LHOTSE distribution contains MATLAB(R) functions (in matlab/) for reading/writing files in the LHOTSE formats.

Here are some hints for working with the matrix/vector classes:
- Do not try to be too clever with masks. If an underlying object changes size, all masks referring to it become invalid.
  It is safest to create masks whenever needed and drop them afterwards.
- Avoid having l-value and r-value in some expression being mask objects referring to the same underlying object, unless you know exactly what you are doing. Certainly never do this if the referred regions overlap! This is an important difference from systems which do create temporary physical copies in expressions, such as MATLAB(R). Something like
      vec(Range(2,6))=vec(Range(1,5));
  is not allowed in LHOTSE. It may lead to an exception, or (worse) to a wrong (i.e., non-anticipated) result. Tricks like
      vec(Range(5,0,-1))=vec(Range(0,5));
  (to reverse a vector) may work in MATLAB(R), but they fail in LHOTSE. To do any of these in LHOTSE, a safe option is to create a copy explicitly. In general,
      vec(rng1)=vec(rng2);
  is allowed iff the ranges 'rng1', 'rng2' are distinct, not open, and of the same size, and if 'vec' does not have to be extended.
- 'operator()' and 'mask' return temporary mask objects, which can in principle be used like a normal object or a non-temporary mask object. There are a few differences, however, which make their handling a bit awkward, but are necessary due to limited operator overloading or automatic conversion support of C++.
  First, since the '.' operator cannot be overloaded, it is replaced with '->' for temporary masks. You cannot write
      vec(Range(2,6)).prod(...)
  but have to write
      vec(Range(2,6))->prod(...)
  instead, or (more awkward)
      ((StVector&) vec(Range(2,6))).prod(...)
  This can be a bit confusing, since a non-temporary mask vector can be treated just as a normal one:
      mvec.reassign(vec,Range(2,6));
      mvec.prod(...)
  The problem is caused by the fact that a temporary mask object like 'vec(Range(2,6))' is an instance not of the matrix/vector class, but of another specific one (in the example: TempStVector, instead of StVector), which is a way of ensuring that the temporary mask object can be used as l-value, while not being copied.
  Although there is a 'conversion to StVector&' operator present in
  TempStVector, the compiler does not always invoke it implicitly. If you
  obtain compiler errors for expressions including terms such as
  'vec(Range(...))', try replacing them with '((StVector&) vec(Range(...)))'
  (explicit cast to reference type). Some commonly used operators can be
  applied to temp. masks without explicit casting (they are "mirrored" in
  TempStVector, etc.):

    vec(Range(2,6))=vec2;

  or

    vec1(rng1)=vec2(rng2);

  or

    vec(Range(2,6))=1.0;

  It is important to note that the variant using explicit casts to ref. type
  always works in the same way, and you will find it in older LHOTSE code.
  Mirroring does not exist at present for other operators such as '[]', '==',
  '()'. In the future, more operators may be mirrored for convenience, but
  old code with explicit casts will continue to work.

- Assignments with l-values being masks:
  There are some differences from what is familiar from a system which makes
  temporary physical copies (like MATLAB(R)). First, if a range is applied to
  an l-value vector which is too short, the vector is extended to the
  smallest size for which the range can be accommodated. For example, if 'a',
  'b' have size 5, then

    a(Range(1,6))=b;

  leads to 'a' being extended to size 6 (the new entries are filled with a
  default fill value, which is 0 by default for the numerical types), and
  then the assignment is done.

  NOTE: This can behave unexpectedly. For example,

    a(Range(1))=b;   [*]

  leads to an exception. This is because the open Range(1) (standing for
  1,2,...) is first applied to 'a' of size 5, resulting in a mask vector of
  size 4. The subsequent assignment would require a vector of size 5, but
  'a(Range(1))' cannot be extended (it is a mask).

  NOTE: Open ranges can be used in this situation if there is room for the
  r-value, so that extension is not required. If 'a' is of size >= 6, [*]
  works fine. This is due to a special property of 'operator='.
  If the l-value is a normal (non-mask) object, its size (and content) is
  changed to that of the r-value. If the l-value is a mask, a size change is
  not permitted, but if the l-value mask is at least as large as the r-value,
  the latter is copied to the first positions of the l-value, and the size of
  the l-value is not changed.

  ATTENTION: In the present implementation, automatic extension is used even
  if the temp. mask to be created is an r-value in an expression. For
  example, if 'a' has size 5, then

    b=a(Range(4,5));

  would extend 'a' to size 6, instead of complaining (which would be more
  appropriate). This is a problem with C++, which may be fixed in the future.

- Assignments where l-value and r-value refer to the same underlying object:
  In the simplest case, this is about expressions

    a(rng1)=a(rng2);   [*]

  In general, we mean expressions where l-value and r-value terms refer,
  directly or through masks, to the same underlying object (these need not be
  assignments).

  RULE: Avoiding any such expression keeps you out of trouble. Remember that
  LHOTSE allows you to save unnecessary temp. copies, but this means that you
  sometimes have to make temp. copies explicitly.

  In general, [*] is safe only if:
  - 'rng1', 'rng2' are disjoint, and
  - 'a' is not extended by the application of 'rng1' or 'rng2'
  Any other usage of [*] is NOT ALLOWED in LHOTSE. Unfortunately, such usage
  does not lead to a compiler error. It usually leads to an exception. Worse,
  it could go through, but not do what was intended. Worst, it may work for
  some C++ compilers, and not for others, or it may work for the present
  LHOTSE version, but not in future ones.

  In the rest of this bullet point, we discuss why some of these cases go
  wrong. Read this if you are interested, but obeying the rule will never
  lead you there.

  Suppose that 'a' has length 5, and we want to duplicate it to obtain a
  vector of size 10.
  The following is not allowed:

    a(Range(5,9))=a(Range(0,4));   [**]

  Here is what happens:
  - The r-value expression leads to creation of a temporary mask, referring
    to 'a'
  - The l-value expression leads to 'a' being expanded to size 10.
  This means that a mask to 'a' exists at a point when 'a' changes size. This
  leads to the mask becoming invalid, and the assignment fails with an
  exception. The correct way is to first expand 'a' (using 'expand'), then to
  do [**].

  NOTE: The problem is that code like [**] does work for some compilers,
  because they create the temp. mask for the l-value before creating the mask
  for the r-value, in which case the extension of 'a' happens before any
  masks are created. DO NOT rely on this!

  BTW, something like

    a(Range(5,9))=a;

  is not doing the job either. Here, 'a' is first expanded to size 10, then
  it is assigned to itself (which does nothing). The problem is that C++
  first creates all terms in an expression before computing the expression
  itself.

  Something like

    a=a(Range(1,2));

  fails as well. The assignment operator will first bring the l-value 'a' to
  size 2. This is a size change, invalidating the temp. mask r-value.

There are (at present) a few other useful classes in the matrix module:

- Incomplete Cholesky decomposition (IncompleteCholesky, IncompCholMatrix,
  ICholKernelMatrix): The ICF is a popular method in Machine Learning for
  speeding up large kernel method computations, using a low-rank
  approximation of one or more kernel matrices.
- Methods for stable low-rank Cholesky decomposition updates (methods in
  StMatrix)
- Sparse Matrices (SimpSparseMatrix): A limited set of sparse matrix methods
  is implemented in SimpSparseMatrix.

If you have wrapped sparse matrix code in a generic way in LHOTSE, please do
contact the LHOTSE developers.


7.8) The Module 'optimize'

This module contains code related to optimization.
At present, the following public optimization codes are wrapped:

- L-BFGS-B (Zhu, Byrd, Lu-Chen, Nocedal): BFGS limited memory Quasi-Newton,
  allows for box constraints on the coefficients. See class LimMemBFGSB. The
  Fortran source code is part of the LHOTSE package.

If you have wrapped other (public or proprietary) optimization code in
LHOTSE, please do contact the LHOTSE developers.

There are also implementations of several non-linear optimizers:

- Conjugate Gradients (ConjGrad)
- BFGS Quasi-Newton (QuasiNewton)

There are double loop extensions of those. These implementations are much
more configurable than professional code (you can "hack them" to make them do
what you want to do).

All non-linear optimizers communicate with their objective function through
the same interface CritFunc (or CritFuncInnerState for double loop variants).
If you have coded your objective implementing CritFunc, you can use any of
the LHOTSE optimizers without any change. CritFunc also supports optional
features (protocolling, event triggers, pair optimization, ...), which are
supported by specific optimizers.

LinearCG implements the linear conjugate gradients algorithm for solving
large symmetric positive definite linear systems. The interface towards the
system matrix is given by MatVectMult, allowing complete freedom as to how
this matrix is represented.

Lanczos (and subclasses) implement the Lanczos algorithm, with complete as
well as selective orthogonalization. This is useful if you need this
algorithm in non-standard situations. For standard problems, professional
code such as ARPACK should be used.

OneDimSolver contains methods for root finding of scalar functions.


7.9) The Module 'data'

Datasets in Machine Learning and Statistics come in many formats, contain
data items of very different kinds, and are represented at running time in
very different ways, possibly using auxiliary files if the complete set
cannot be kept in memory.
The design of the data module is an attempt to create a generic interface
which meets most of these needs. We do not provide any details here. The
interested developer should look at the main function code for project
ivmsing (src/main_ivmsing.cc), at src/loadDataSet.h, and at the class header
files of the data module. For an example of how to provide a generic
interface towards algorithms, look at the covariance function hierarchy
(CovarFunc children) in the gp module.


7.10) The Module 'gp'

gp stands for Gaussian process, and in this module, general classes useful
for GP methods are collected.

The most important hierarchy here is given by the children of CovarFunc,
representing covariance functions (aka kernels). Children of CovarFunc are
general covariance functions. Children of OptimCovarFunc are covariance
functions for which derivatives w.r.t. some of their hyperparameters can be
computed. The latter follow a convention involving a precomputation matrix,
which for the cases currently given in the hierarchy can save a lot of
running time. The setup should be clear from the OptimCovarFunc header
comments, and from the classes already implemented (GaussianCF, MLPerceptCF,
SimpleMaternCF).

In order to implement a new isotropic or geometrically anisotropic covariance
function, the InnerLinearCF class is very helpful. See GaussianCF,
MLPerceptCF, SimpleMaternCF for how to use it.

Note that some of the implemented kernels accept input vectors in either a
dense or a sparse format. The latter may save time, and certainly does save
memory. The implementation uses datatype support from the 'data' module.

If you have implemented kernels not present in LHOTSE, please do contact the
LHOTSE developers.


7.11) Other Modules

'argblocks' contains some ArgBlock implementations useful in the context of
control files and CommandParser.

'exper' contains some classes for doing generic experiments.

'quad' contains some code for quadrature (univariate and multivariate).
Furthermore, a hierarchy of likelihood functions derives from
SingleLikehoodFactor (these are used in code for variational approximations,
EP, and the ivmsing project).

'rando' contains a simple pseudo-random number generator (GeneratorKISS) and
some methods to generate from common non-uniform distributions (Random).
Wrapping a larger free package is planned. Whatever you do, make sure to
NEVER use the built-in PRN generator of C!

'specfun' contains code to evaluate some special functions (Specfun), mainly
by calling GSL functions.

Both 'rando' and especially 'specfun' are fairly incomplete at the moment.
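
To give an idea of what a KISS-type generator (as mentioned for 'rando'
above) looks like, here is a minimal standalone sketch. It is NOT the LHOTSE
GeneratorKISS class and does not use any LHOTSE interface: the class name
KissSketch and its methods next() and uniform() are hypothetical, and the
multipliers and default state values are those commonly quoted for
Marsaglia's KISS construction, taken here as assumptions. The point is only
that the whole state is explicit and seedable, which is what the built-in
rand() of C does not give you in a portable way.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative KISS-style generator (hypothetical; not LHOTSE code).
// It combines two multiply-with-carry streams, a linear congruential
// stream and a 32-bit xorshift stream into one 32-bit output.
class KissSketch {
public:
  // 'seed' initializes the xorshift component; the other components
  // start from fixed values (assumed defaults).
  explicit KissSketch(std::uint32_t seed = 123456789u)
    : z_(362436069u), w_(521288629u), jsr_(seed), jcong_(380116160u) {
    assert(seed != 0u); // a xorshift stream started at 0 stays at 0
  }

  // Returns the next 32-bit pseudo-random integer.
  std::uint32_t next() {
    // Two 16-bit multiply-with-carry generators, concatenated
    z_ = 36969u * (z_ & 65535u) + (z_ >> 16);
    w_ = 18000u * (w_ & 65535u) + (w_ >> 16);
    const std::uint32_t mwc = (z_ << 16) + w_;
    // Linear congruential generator (arithmetic wraps modulo 2^32)
    jcong_ = 69069u * jcong_ + 1234567u;
    // 32-bit xorshift generator
    jsr_ ^= jsr_ << 13;
    jsr_ ^= jsr_ >> 17;
    jsr_ ^= jsr_ << 5;
    return (mwc ^ jcong_) + jsr_;
  }

  // Returns a double in [0,1), obtained by scaling next() by 2^-32.
  double uniform() {
    return next() * (1.0 / 4294967296.0);
  }

private:
  std::uint32_t z_, w_, jsr_, jcong_;
};
```

Two KissSketch objects constructed with the same seed produce the same
sequence, which makes experiments reproducible. For real LHOTSE code, use
GeneratorKISS and Random from the 'rando' module and consult their header
comments for the actual interface.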