Friday, November 27, 2009

Working with big data

One of the main goals of the project is the ability to process large datasets (mainly raster and vector layers, but also database tables). While pursuing this goal, we found several problems in the base libraries (mainly GeoTools, Java Image I/O Ext and Sextante) when accessing big raster data. Part of our work has therefore focused on reinforcing these libraries.

Some of these problems have already been solved:
- The ASCII GRID driver of Java Image I/O Ext was not able to read large files
- Sextante-GeoTools bindings loaded the whole raster layer in memory before processing it
- Sextante-GeoTools bindings loaded the whole dbf table in memory before processing it
- Sextante-GeoTools bindings created the whole dbf table in memory before writing it to disk
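
The common idea behind these fixes is to stream the data instead of loading it all at once. The sketch below is not the code that went into Java Image I/O Ext or the Sextante-GeoTools bindings. It only illustrates the principle with the standard library, assuming a plain ESRI ASCII grid as input. The class `AsciiGridStats` and its method `scan` are hypothetical names, and the default NODATA value of -9999 is an assumption used when the header does not declare one.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Locale;
import java.util.Scanner;

/** Hypothetical helper for illustration; not part of GeoTools, Image I/O Ext or Sextante. */
final class AsciiGridStats {
    final long count;  // cells that hold data (NODATA cells are skipped)
    final double min;
    final double max;
    final double sum;

    private AsciiGridStats(long count, double min, double max, double sum) {
        this.count = count;
        this.min = min;
        this.max = max;
        this.sum = sum;
    }

    double mean() {
        return count == 0 ? Double.NaN : sum / count;
    }

    /**
     * Scans an ESRI ASCII grid token by token. Only running totals are kept,
     * so memory use does not grow with the size of the grid.
     */
    static AsciiGridStats scan(Reader in) {
        Scanner tokens = new Scanner(in);
        long ncols = 0;
        long nrows = 0;
        double noData = -9999; // assumed default when the header omits NODATA_value

        // Header: "key value" pairs. Keys are alphabetic, data values are numeric.
        while (tokens.hasNext("[A-Za-z_]+")) {
            String key = tokens.next().toLowerCase(Locale.ROOT);
            String value = tokens.next();
            switch (key) {
                case "ncols": ncols = Long.parseLong(value); break;
                case "nrows": nrows = Long.parseLong(value); break;
                case "nodata_value": noData = Double.parseDouble(value); break;
                default: break; // xllcorner, yllcorner, cellsize: not needed here
            }
        }

        long count = 0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0;
        long cells = ncols * nrows;
        for (long i = 0; i < cells; i++) {
            double v = Double.parseDouble(tokens.next());
            if (v == noData) {
                continue; // skip cells without data
            }
            count++;
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        return new AsciiGridStats(count, min, max, sum);
    }

    public static void main(String[] args) {
        // Tiny in-line grid so the example runs without any file on disk.
        String sample = "ncols 3\nnrows 2\nxllcorner 0\nyllcorner 0\ncellsize 1\n"
                + "NODATA_value -9999\n1 2 3\n4 -9999 6\n";
        AsciiGridStats s = scan(new StringReader(sample));
        System.out.println(s.count + " cells, min " + s.min + ", max " + s.max
                + ", mean " + s.mean());
    }
}
```

For a real file, the same method could be fed a buffered `java.io.FileReader`. The point is only that the statistics are accumulated while reading, so the grid never has to fit in memory.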

However, some tasks are still pending, all of them due to GeoTools and Image I/O limitations:
- Raster layers must be created completely in memory before being written to disk
- No BigTIFF support (TIFF files are limited to 4 GB)
- No binary grid support
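
The 4 GB limit comes from classic TIFF storing file offsets as 32-bit unsigned integers, while BigTIFF uses 64-bit offsets. As a rough aid, the sketch below estimates whether the uncompressed pixels of a raster would already exceed that limit. The class `TiffSizeCheck` and its methods are hypothetical names, the dimensions in `main` are made up, and the estimate ignores compression, tags and overviews.

```java
/** Hypothetical helper for illustration; not part of GeoTools or Image I/O. */
final class TiffSizeCheck {
    /** Classic TIFF stores offsets as unsigned 32-bit integers: 2^32 bytes = 4 GiB. */
    static final long CLASSIC_TIFF_LIMIT = 1L << 32;

    /** Raw pixel payload in bytes, ignoring compression, tags and overviews. */
    static long rawBytes(long width, long height, int bands, int bytesPerSample) {
        long pixels = Math.multiplyExact(width, height);
        return Math.multiplyExact(pixels, (long) bands * bytesPerSample);
    }

    /** True when the uncompressed pixels alone would not fit in a classic TIFF. */
    static boolean exceedsClassicTiff(long width, long height, int bands, int bytesPerSample) {
        return rawBytes(width, height, bands, bytesPerSample) > CLASSIC_TIFF_LIMIT;
    }

    public static void main(String[] args) {
        // Made-up dimensions, chosen only to land on either side of the limit.
        System.out.println(exceedsClassicTiff(30_000, 30_000, 1, 4)); // 3,600,000,000 bytes
        System.out.println(exceedsClassicTiff(50_000, 50_000, 1, 2)); // 5,000,000,000 bytes
    }
}
```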

These remaining tasks are currently considered low priority, but we hope to solve them in future phases of the project. Collaborations are welcome!

Tuesday, November 24, 2009


Most of the data that ETC-LUSI processes or creates in its daily work are TIFF files in the EPSG:3035 projection (Lambert Azimuthal Equal Area projection using the ETRS89 datum).

However, we have discovered that ArcGIS (at least in version 9.3) is not able to properly encode EPSG:3035 using standard GeoTIFF tags, so it stores the spatial reference information in an external auxiliary file in a proprietary format.

GeoTools is not able to read these proprietary files, and therefore it refuses to read these (false Geo-)TIFF files. Fortunately, the library can still be persuaded to read them by passing the DEFAULT_COORDINATE_REFERENCE_SYSTEM hint when the GeoTiff reader is created.

For the moment, we can live with this workaround, but we would be really pleased to see ArcGIS generating correct, standard GeoTIFF files for the EPSG:3035 projection, as it is the reference projection at the European Environment Agency.

Wednesday, October 28, 2009

Project Milestones

The project has quite ambitious goals, but we prefer to reach them step by step. For the moment we have defined two milestones:

MILESTONE 1: Sextante as GeoKettle Job Entries.
This milestone will be available by the end of 2009, and its main goals include:
  • The full set of Sextante algorithms available in GeoKettle as Job Entries.
  • New Sextante algorithms: Tabulate Area and Zonal Statistics.
  • Geospatial Input and Output Job Entries based on GeoTools.
  • Improved support for huge raster datasets in the GeoTools ArcGrid driver.
  • Improved support for huge raster and table data in Sextante-GeoTools bindings.

MILESTONE 2: Sextante as GeoKettle Transformation Steps.
This milestone will be developed during 2010. Main goals include:
  • A selected set of Sextante algorithms available as Kettle Transformation steps.
  • Raster support for GeoKettle: Raster input and output steps based on GeoTools.
  • The full power of GeoKettle transformations applied to Sextante: distributed execution using computer grids.

The result of each milestone will be a fully functional product, matching the planned goals.
You can find more detailed technical documentation in the documentation section of the BeETLe project at OSOR.

Tuesday, October 20, 2009

BeETLe Project Introduction

The BeETLe project aims to extend GeoKettle with advanced spatial analysis tools. The main idea involves integrating Sextante into GeoKettle, making GeoKettle a more powerful and versatile application.

This project has been pushed by ETC-LUSI, in coordination with other institutions: Extremadura University (Sextante developers), Laval University (GeoKettle developers), and Junta de Andalucía (which shares vision and needs with ETC-LUSI).