src/simple_tools/DEBIAN/changelog

2014-12-06 Ronan Keryell <Ronan.Keryell@silkan.com>

 * Par4All 1.4.6

   * Integrate latest PIPS version for recent Linux version

   * There are stills issues with headers of modern GNU libc


2014-02-03 Ronan Keryell <Ronan.Keryell@silkan.com>

 * Par4All 1.4.5

   * Integrate ASTRAD support for the SIMILAN collaborative research project

   * Add support to build and publish Par4All for Ubuntu in a
	 VirtualBox virtual machine

   * Integrate WWW site of the project in the repository itself

   * Move the WWW site from WordPress to GitHub Pages based on ReST format
	 via Sphinx


2014-01-14 Ronan Keryell <Ronan.Keryell@silkan.com>

 * Par4All 1.4.4

   * Update the include recovery for gcc-4.8 and recent Debian & Ubuntu

   * Improve documentation

   * Improve compilation documentation for OpenSuse

   * Fix const-ness issues in Par4All Accel back-end

   * Update to latest PIPS version


2012-11-29 Ronan Keryell <Ronan.Keryell@wild-systems.com>

 * Par4All 1.4.3

   * XML output of some memory access information for OpenGPU Project.

     New --spear-xml option taking as input an XML description of
     functions to transform and generate XML descriptors

   * Improve outlining

   * Fix documentation

   * Improve compilation documentation for OpenSuse

   * Fix post-processing in the OpenCL back-end for Thales TRT in SMECY
     project


2012-06-22 Ronan Keryell <Ronan.Keryell@wild-systems.com>

 * Par4All 1.4.2

   * Change Stars-PM demo Makefile to avoid nvcc choking on recent FFTW3 headers

   * Fix normalization for Chirp code from SMECY project

   * Make different privatization phases easier to select

   * Improved complexity analysis for SMECY project (cluster 3)

   * Added a phase limit_parallelism_using_complexity to
     automatically skip the parallelization of non enough
     compute-intensive loops

   * Add a git-svn-switch script to ease upstream SVN URL changes

   * Fix scalarization to avoid scalarizing a local variable used in an
     array subscript

   * New taskify phase to begin producing StarPU tasks for MediaGPU project

   * Take into account CUDA 3.0 and 3.5 architecture

   * Take into account the new name of HPC Project: SILKAN

   * Add an OpenCL target to the benchmarks for time measurements in Mehdi
     Amini's PhD thesis

   * Choose OpenCL workgroup allocation instead of the vendors


2012-05-17 Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.4.1

   * Update package dependencies for Ubuntu 12.04 and Debian/testing
     & /unstable.

   * Make Par4All relocatable after compile-time (.deb package and .tar.gz)

   * Improve the coding rules guide

   * Update dependent package list for Ubuntu 12.04 and to be able to run
     Stars-PM demo

   * Do not add "register" qualifier in scalarization phase

   * The license of the PIPS C3 Linear library changes from GPLv3 to LGPL


2012-05-04 Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.4

   * Outline kernel callees with a prefix to avoid conflicts

   * New kernel generation algorithm for p4a: no longer inline callees in
     kernels but call device functions instead

   * Apply some callees unfolding after array linearization for non-C99
     VLA devices

   * nvcc compilation chain can use C++ compiler specified by user

   * Deal with OpenCL pointer in kernel callees

   * Add another loop fusion algorithm based on array regions

   * Fix --select-module and other module filtering to cope new kernel
     generation algorithm

   * Improve documentation

   * Liveness analysis

   * Generated variables are now prefixed p4a_ instead of P4A_ reserved
     for the runtime itself

   * More pointer values, effects and regions interaction

   * New PHPiPS interface to ease cloud interfacing

   * Loop nests with some variable declaration with initializations are
     not perfect loop nests

   * New --kernel-unroll option to unroll loops inside kernels

   * Display OpenCL kernel compilation messages in the case of errors
     or in debug mode

   * Improve memory effects and regions analysis in the case of casts

   * New privatization phase dealing even with global variables based on
     liveness analysis

   * New pointer values

   * Improvement of the coding rules guide

   * Improve packaging scripts

   * Improve integration script

   * Add OS distribution in error report email


2012-02-09 Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.3.2

   * Improved redundancy elimination in Linear

   * Add some missing intrinsics for Fortran

   * Do not generate loops for degenerated iteration domains

   * Fix effects computation

   * Improve for- to do-loop translation

   * Do not privatize global variable in simple privatization

   * New experiment -pointer-analysis option to take into account pointer
     analysis

   * Fix interface with PoCC

   * Add a --pocc-options to p4a

   * Improve transformers computation

   * Improve semantics analysis in the case of unsigned loop index

   * Fix bug in timing method in Accel runtime

   * Check for Accel runtime non-initialization

   * Test if a CUDA device is available in CUDA mode for Accel runtime

   * Add a lot of information in CUDA mode for Accel runtime

   * Improve documentation

   * Control simplification deals with do-while loops

   * Fix bugs in loop-fusion

   * Now clean_up_sequence can deal more gracefully with variable scoping

   * Improve sorting in Linear to have more repetitive behaviour

   * Add detection of 1-trip while-loops

   * Fix bugs in isolate_statement

   * Improve outliner in the case of a new compilation unit

   * Add isnan C intrinsic

   * Manage by overloading already existing OpenMP-pragma with the new
     generated ones

   * Fix sharing in the case of cast in a function call

   * --cuda-cc now accepts 2.1 version

   * Add support for AMD fp64 OpenCL extension in Accel runtime, if available

   * Add debug kernel call information in OpenCL Accel runtime

   * p4a can accept now Fortran files with .F extension


2012-01-18  Serge Guelton <serge.guelton@hpc-project.com>

 * Par4All 1.3.1

   * Check for CUDA memory allocation errors

   * Restore PIPS properties after a PyPS exception

   * Manage better pointers on 1D array on GPU

   * Fix loop bound generation in the presence of unsigned index types

   * Fix p4a --report

   * Fix regressions in examples because of PIPS validation
     compatibility.

   * Stars-PM example can work in OpenCL and OpenGL

   * New --no-pointer-aliasing option in p4a

   * --fine option is now --fine-grain. Coarse grain is again the
     default parallelization.

   * Remove parasitic kernel launchers from GPU code with static
     functions

   * Put libraries at the end of link options since it chokes some
     linkers.

   * Fix bugs in loop fusion

   * Fix bugs in PIPS linear around overflow handling

   * Improve array linearization to deal with pointers on array of
     structs, VLA, arrays with static size and others, skip local arrays...

   * Fix bugs in statement isolation, accept structs, deal better
     with partial arrays...

   * Fix bugs in outliner

   * Fix bugs in GPU-ify

   * Describe better the versions in scripts and validation

   * Update package list for recent Ubuntu

   * Update installation guide

   * p4a_scpp is now more user friendly

   * Better linear constraints, improve normalization

   * Improve redundancy reduction in Linear

   * Improve ipyps portability

   * Improve OpenMP reduction pragma when several reductions at the
     same time

   * Fix bug in localize_declaration

   * Improve simplify control

   * Improve p4a_git integration script

   * Improve organization of the Par4All download server directories

   * Fix typo in stub broker

   * Fix P4A Accel runtime to deal with constants in OpenCL kernel
     invocation

   * Differentiate more OpenMP and GPU compilation flows

   * Restructure Par4All Accel Runtime

   * Avoid launching kernels with no iteration

   * Accept multiple source files with same base name

   * More resilient with spaces in path names

   * Improve effects for structures

   * Keep more comments

   * More tolerant with __asm(...)

   * Improve transformers

   * Improve conflict testing in dependence graph


2011-11-11  Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.3

   * Huge improvements in compilation speed of big linear source codes
     with a lot of variables to cope with the output of the Scilab-to-C
     compiler from HPC Project. Use aggressive caching of internal
     structures and remove many memory leaks. Mainly in def-use analyses,
     effects...

   * Can now generate OpenCL code.

   * Generate up to 3D kernel for CUDA 2.0+ architectures (Fermi...)

   * Now compile for Fermi architecture with --cuda. Ask for more cache
     than shared memory

   * Improved debug information on GPU target about kernels. Now P4A_DEBUG
     is an environment variable and no longer a compilation flag

   * New block of threads layout for CUDA

   * Use loop fusion both for OpenMP and Accel

   * Use fine grain parallelization before coarse grain for Accel

   * Many many bug fixes

   * Generate better types for variables with aggregated type in kernels

   * Added a phase to promote sequential code to GPU kernel code

   * Improved loop fusion

   * Do not parallelize loops with control side effects (return,
     abort()...)

   * Better handling of const

   * Cope with in-lining of functions with lacking return

   * New quick scalarization phase

   * Improved variable localization/privatization

   * Improved #include recovery

   * New PyPS-made unfolding of functions

   * Subtler linearization of arrays in GPU kernel that cannot be C99

   * Fix bugs in the GPU communication optimization

   * Accept trivial (idempotent) cast in C code

   * Function calls can now happen in declarations

   * Improved PIPS infrastructure

   * Added automatic benchmarking and graphics generation to speed-up
     scientific article generation (see article at LCPC this year). :-)
     Can also run with PGI and HMPP compilers

   * Register variables cannot conflict

   * Improved C switch() desugaring

   * Can now also accept less standard complex double instead of double
     complex

   * Par4All building infrastructure improved to be able to easily skip
     some wrong commit from upstream projects (PIPS...)

   * Can compile on Fedora 15

   * Improved documentation

   * Improved examples


2011-07-15  Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.2.1

  * Unstable version, unreleased. Look at version 1.3 for the real changes.


2011-07-07  Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.2

  * This version targets mainly the Wild Cruncher, a parallelizing
    environment from HPC Project for Scilab programs. Par4All is used
    to parallelize the output of the Scilab-to-C compiler from HPC
    Project

  * Added in examples/Benchmarks some benchmark examples we use in
    our publications so that anybody can verify Par4All performance on
    them with own hardware

  * Improved support for CUDA atomic update for reductions

  * Better deal with scalars in GPU parallelization

  * Improved memory effect analysis

  * Fixed outlining for kernel generation with scalar parameters

  * Improved loop fusion, deal with local variable declarations

  * Improved array scalarization

  * Make package publication more resilient to network failures

  * Fixed GPU code generation for non rectangular iteration spaces

  * Fixed communication optimization between GPU and CPU

  * Added support for CEA SCMP embedded system

  * Installation directory can now be changed also after a first
    installation

  * Use the broker concept to deal with stubs to manage with non or
    already parallelized libraries

  * Now install LICENSE.txt

  * Updated to new PyPS interface

  * GPU kernel can be outlined in separated source files on demand,
    for OpenCL or use a separate non C99 compiler (CUDA nvcc), at
    kernel, launcher, wrapper grain...

  * Fixed compilation flags in PIPS/linear to avoid recompilation to
    fail when an API changes too much


2011-04-20  Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.1.2

  * Improved support for Scilab compiler output

  * Some work on effects in PIPS

  * p4a -g now put device code in debug too.

  * Improved Stars-PM example

    Added nVidia SM11 GPU, FFT timing...

    Added a PGI Accelerator version for comparison

  * Fixed wrong Makefile targets for Jacobi demo

    Bug reported by Richard Membarth from Universität Erlangen.


2011-04-12  Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.1.1

  * Added support for CEA SCMP task dataflow machine (European
    project ARTEMIS SCALOPES)

  * Improved GPU kernel generation for loop nests with complex
    declarations.

    Bug reported by Richard Membarth from Universität Erlangen.

  * Added new options to apply PIPS transformations in the Par4All
    compilation transit (--apply-before-parallelization...)

  * Added a programming guide describing best practices to get
    better performance with Par4All


2011-03-01  Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.1

  * C99 declarations anywhere in a block and in C99 for-loops are now
    supported.

  * Fixed code generation for C99 declarations.

  * New --apply-before-parallelization option to apply phases before
    parallelization.

  * Improved compilation speed.

  * No longer rely on Python 3.x since there where some issues on
    some systems to cope with both 2.y and 3.x versions.

  * Fixed encoding issues.


2011-02-03  Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.0.5

  * Par4All 1.0.5 fixes a bug when a code to be kernelized uses some
    global variables.

    Thanks to Sarnath Kannan for this bug report. It should work now
    on common cases.

  * Prototype on lazy CUDA communication optimizations to remove
    redundant host-accelerator communications.

  * Fixed a space iteration transposition bug in accelerator mode
    that was killing performances. But right now, better results are
    obtained with 2D kernels.

  * C99 for(int i;...;...) are now accepted.

  * Can generate kernels with less perfectly nested loops.

  * Updated examples directory to new options and communication
    optimizations

  * Better error and warning messages.

  * Script cleaning by using Python module names everywhere.


2010-11-22  Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.0.4

  * Par4All 1.0.4 introduces a new P4A Accel runtime for OpenMP and CUDA.

  * In previous months, PIPS and PyPS has evolved a lot, specially
    in the code generation for various accelerators. This version
    try to cope with these evolutions.

  * Added the Stars-pm cosmological N-body simulation program as an
    example.

  * Now the runtime can deal with subarray transfers between the
    host and the accelerator, up to 4D arrays. Well right now the
    phases chosen in PIPS do not use them yet.

  * The code generation for non-C99 CUDA is more robust.


2010-11-05  Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.0.3

  * Par4All 1.0.3 is a base version to be used from Windows in a
    Wild Cruncher..

  * NVIDIA_GPU_Computing_SDK is no longer needed to produce CUDA
    code for nVidia GPU.

  * Can now compile on Fedora.

  * New option to use simpler #include recovery.

  * Can move produced files to a given directory.

  * Improved documentation.

  * New options for Python code injection.

  * Better error messages.


2010-09-16  Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.0.2

  * Clean up examples and their README files.

      Added an option to run Hyantes example in single precision
      (for demos on laptops with small GPU).

  * Updated to current PIPS and PyPS version.

  * --accel without --cuda works again for GPU emulation in OpenMP.

  * Do not use new PyPS #include recovery.

  * Recover any #include, not only standard ones (useful for Par4All
    Scilab).

  * Do not use PIPS capply by default when running phases.

  * Added path normalization in p4a_setup so that configure can
    take relative path.

  * p4a_validate post-processing utility updated to cope with new
    PIPS validation output.


2010-07-23  Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.0.1

  * Corrected library name issue for libcutil on 32 bit x86.

  * Fixed bugs into array_to_pointer.

  * Corrected behaviour of NULL for nvcc.

  * Fixed usage of limit_nested_parallelism in p4a_process.py.

  * Better error message display.

  * Fixed OpenMP prettyprint of C parallel loops with label.

  * Deals with clock() intrinsic function.


2010-07-16  Ronan Keryell <Ronan.Keryell@hpc-project.com>

 * Par4All 1.0

  * Initial version of Par4All released.


%%% Local Variables:
%%% ispell-local-dictionary: "american"
%%% End: