NAME

       r.in.xyz  - Create a raster map from an assemblage of many coordinates using univariate statistics.

KEYWORDS

       raster, import, LIDAR

SYNOPSIS

       r.in.xyz
       r.in.xyz help
       r.in.xyz  [-sgi]  input=name  output=name  [method=string]   [type=string]   [fs=character]   [x=integer]
       [y=integer]    [z=integer]    [zrange=min,max]     [zscale=float]     [percent=integer]     [pth=integer]
       [trim=float]   [--overwrite]  [--verbose]  [--quiet]

   Flags:
       -s
           Scan data file for extent then exit

       -g
           In scan mode, print using shell script style

       -i
           Ignore broken lines

       --overwrite
           Allow output files to overwrite existing files

       --verbose
           Verbose module output

       --quiet
           Quiet module output

   Parameters:
       input=name
           ASCII file containing input data (or "-" to read from stdin)

       output=name
           Name for output raster map

       method=string
           Statistic to use for raster values
           Options: n,min,max,range,sum,mean,stddev,variance,coeff_var,median,percentile,skewness,trimmean
           Default: mean

       type=string
           Storage type for resultant raster map
           Options: CELL,FCELL,DCELL
           Default: FCELL

       fs=character
           Field separator
           Special characters: newline, space, comma, tab
           Default: |

       x=integer
           Column number of x coordinates in input file (first column is 1)
           Default: 1

       y=integer
           Column number of y coordinates in input file
           Default: 2

       z=integer
           Column number of data values in input file
           Default: 3

       zrange=min,max
           Filter range for z data (min,max)

       zscale=float
           Scale to apply to z data
           Default: 1.0

       percent=integer
           Percent of map to keep in memory
           Options: 1-100
           Default: 100

       pth=integer
           pth percentile of the values
           Options: 1-100

       trim=float
           Discard <trim> percent of the smallest and <trim> percent of the largest observations
           Options: 0-50

DESCRIPTION

       The  r.in.xyz  module  will  load  and bin ungridded x,y,z ASCII data into a new raster map. The user may
       choose from a variety of statistical methods in creating the new  raster.  Gridded  data  provided  as  a
       stream of x,y,z points may also be imported.

       r.in.xyz is designed for processing massive point cloud datasets, for example raw LIDAR or sidescan sonar
       swath data. It has been tested with datasets as large as tens of billions of points (705GB in a single
       file).

       Available statistics for populating the raster are:
                   | n            | number of points in cell
                   | min          | minimum value of points in cell
                   | max          | maximum value of points in cell
                   | range        | range of points in cell
                   | sum          | sum of points in cell
                   | mean         | average value of points in cell
                   | stddev       | standard deviation of points in cell
                   | variance     | variance of points in cell
                   | coeff_var    | coefficient of variation of points in cell
                   | median       | median value of points in cell
                   | percentile   | pth percentile of points in cell
                   | skewness     | skewness of points in cell
                   | trimmean     | trimmed mean of points in cell

                     Variance and the statistics derived from it (stddev, coeff_var) use the biased
                     estimator (n). [subject to change]

                     The coefficient of variation (coeff_var) is given as a percentage and defined as
                     (stddev/mean)*100.
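
As a rough illustration of the statistics listed above, the per-cell values can be sketched in plain
Python. This is not the module's code: the helper name cell_stats and the sample values are
hypothetical, and percentile, skewness and trimmean are left out. The variance divides by n, matching
the biased estimator noted above.

```python
import math
import statistics


def cell_stats(z_values):
    """Summarise the z values that fell into one raster cell (hypothetical helper)."""
    n = len(z_values)
    total = sum(z_values)
    mean = total / n
    # Biased estimator: divide by n, not by n - 1.
    variance = sum((z - mean) ** 2 for z in z_values) / n
    stddev = math.sqrt(variance)
    return {
        "n": n,
        "min": min(z_values),
        "max": max(z_values),
        "range": max(z_values) - min(z_values),
        "sum": total,
        "mean": mean,
        "variance": variance,
        "stddev": stddev,
        # coeff_var as a percentage: (stddev/mean)*100; undefined when the mean is 0.
        "coeff_var": (stddev / mean) * 100 if mean != 0 else float("nan"),
        "median": statistics.median(z_values),
    }


stats = cell_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(stats["mean"], stats["stddev"], stats["coeff_var"])
```

A mean of 0 makes coeff_var undefined, which is one plausible way for "nan" values to appear in a
coeff_var map.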

NOTES

Gridded data
If the data is known to be on a regular grid, r.in.xyz can reconstruct the map perfectly, provided that
the region is set up carefully and the data's native map projection is used. A typical approach is to
first determine the grid resolution, either from the data's associated documentation or by studying the
text file. Next, scan the data with r.in.xyz's -s flag (add -g for shell script style output) to find
the input data's bounds. GRASS uses the cell-center raster convention, where each data point falls at
the center of a cell, as opposed to the grid-node convention. You will therefore need to grow the region
out by half a cell in all directions beyond what the scan found in the file. After the region bounds and
resolution are set correctly with g.region, run r.in.xyz using the n method and verify that n=1
everywhere; r.univar can help. Once you are confident that the region exactly matches the data, run
r.in.xyz using one of the mean, min, max, or median methods. With n=1 throughout, the result should be
identical regardless of which of those methods is used.
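
The half-cell adjustment can be sketched as follows. This is a minimal illustration, not part of
GRASS: the function name region_for_grid and the sample extent are hypothetical, and the scanned
extent is assumed to be the min/max of the grid-node coordinates.

```python
def region_for_grid(x_min, x_max, y_min, y_max, res):
    """Grow a scanned extent by half a cell so each grid node sits at a cell center."""
    half = res / 2.0
    north, south = y_max + half, y_min - half
    east, west = x_max + half, x_min - half
    # After growing, the extent should divide evenly into whole cells.
    rows = round((north - south) / res)
    cols = round((east - west) / res)
    return {"n": north, "s": south, "e": east, "w": west, "rows": rows, "cols": cols}


# Hypothetical 10 m grid with nodes at x = 1000..1090 and y = 5000..5040.
region = region_for_grid(x_min=1000.0, x_max=1090.0, y_min=5000.0, y_max=5040.0, res=10.0)
print(region)
```

The resulting rows*cols (here 5*10) should equal the number of grid nodes in the file, which is what
the n=1 check confirms.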

Memory use
While the input file can be arbitrarily large, r.in.xyz will use a large amount of system memory for
large raster regions (e.g. 10000x10000). If the module refuses to start, complaining that there is not
enough memory, use the percent parameter to run the module in several passes. In addition, using a less
precise map format (CELL [integer] or FCELL [floating point]) will use less memory than a DCELL [double
precision floating point] output map. Methods such as n, min, max, and sum will also use less memory,
while stddev, variance, and coeff_var will use more. The aggregate functions median, percentile,
skewness, and trimmed mean will use even more memory and may not be appropriate for use with arbitrarily
large input files.

The default map type=FCELL is intended as a compromise between preserving data precision and limiting
system resource consumption. If reading data from a stdin stream, the program can only run using a
single pass.
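
For a back-of-envelope feel for these trade-offs, the sketch below estimates the memory held per pass.
It rests on loud assumptions: the number of per-cell accumulator arrays per method (ASSUMED_ARRAYS) is
illustrative and not taken from the module's source, every array is assumed to use the output storage
type, and the function names are hypothetical. The median, percentile, skewness and trimmean methods
are omitted because their memory use grows with the number of input points.

```python
# Bytes per cell for each GRASS raster storage type.
BYTES_PER_CELL = {"CELL": 4, "FCELL": 4, "DCELL": 8}

# ASSUMPTION: number of per-cell accumulator arrays kept in memory per method.
ASSUMED_ARRAYS = {
    "n": 1, "min": 1, "max": 1, "sum": 1,
    "range": 2, "mean": 2,
    "stddev": 3, "variance": 3, "coeff_var": 3,
}


def passes_needed(percent):
    """Number of passes over the input when 'percent' of the map is kept in memory."""
    return (100 + percent - 1) // percent


def estimate_bytes_per_pass(rows, cols, map_type="FCELL", method="mean", percent=100):
    """Rough memory estimate for one pass, under the assumptions stated above."""
    rows_in_pass = (rows * percent + 99) // 100  # integer ceiling of rows * percent / 100
    return rows_in_pass * cols * BYTES_PER_CELL[map_type] * ASSUMED_ARRAYS[method]


full = estimate_bytes_per_pass(10000, 10000)
quarter = estimate_bytes_per_pass(10000, 10000, percent=25)
print(full / 1e6, "MB in one pass;", quarter / 1e6, "MB per pass over", passes_needed(25), "passes")
```

Under these assumptions, lowering percent trades memory for extra passes over the input file.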

Setting region bounds and resolution
You can use the -s scan flag to find the extent of the input data (and thus the point density) before
performing the full import. Use g.region to adjust the region bounds to match. The -g shell style flag
prints the extent in a form suitable as parameters for g.region. A suitable resolution can be estimated
from the point density, found by dividing the number of input points by the area covered, e.g.
    wc -l inputfile.txt
    g.region -p
    # points_per_cell = n_points / (rows * cols)
    g.region -e
    # UTM location:
    # points_per_sq_m = n_points / (ns_extent * ew_extent)
    # Lat/Lon location:
    # points_per_sq_m = n_points / (ns_extent * ew_extent*cos(lat) * (1852*60)^2)
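
The density formulas above can be evaluated with a short Python sketch. The function names, the
target_points_per_cell parameter and the sample numbers are hypothetical. For a Lat/Lon location the
extents are taken in decimal degrees and lat is converted to radians before taking the cosine.

```python
import math

# One arc-minute of latitude is about one nautical mile (1852 m), so one degree is 1852*60 m.
METERS_PER_DEGREE = 1852 * 60


def points_per_sq_m(n_points, ns_extent, ew_extent, lat_deg=None):
    """Point density; extents in metres, or in degrees when lat_deg is given."""
    if lat_deg is None:
        area = ns_extent * ew_extent
    else:
        area = (ns_extent * ew_extent * math.cos(math.radians(lat_deg))
                * METERS_PER_DEGREE ** 2)
    return n_points / area


def resolution_for(density, target_points_per_cell=1.0):
    """Cell size (m) at which a square cell holds the target number of points on average."""
    return math.sqrt(target_points_per_cell / density)


density = points_per_sq_m(1_000_000, ns_extent=500.0, ew_extent=500.0)
print(density, resolution_for(density), resolution_for(density, target_points_per_cell=16))
```

Choosing a target of several points per cell, rather than one, leaves the binning statistics
something to work with.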

If you only intend to interpolate the data with r.to.vect and v.surf.rst, then there is little point to
setting the region resolution so fine that you only catch one data point per cell -- you might as well
use "v.in.ascii -zbt" directly.

Filtering
Points falling outside the current region will be skipped. This includes points falling exactly on the
southern region bound. To capture those, adjust the region with "g.region s=s-0.000001" (see g.region).
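
Why the southern bound is excluded can be seen from a minimal sketch of mapping a point to a cell.
This is an illustration rather than the module's code: the function name cell_index is hypothetical,
and the handling of the northern, eastern and western edges is an assumption (only the southern edge
is described above).

```python
def cell_index(x, y, north, south, east, west, rows, cols):
    """Return (row, col) for a point, or None if the point is skipped."""
    # Southern edge excluded (as described above); northern edge included (assumption).
    if y <= south or y > north:
        return None
    # ASSUMPTION: western edge included, eastern edge excluded.
    if x < west or x >= east:
        return None
    ns_res = (north - south) / rows
    ew_res = (east - west) / cols
    # Row 0 is the northernmost row; a point exactly on 'south' would land in row 'rows'.
    row = int((north - y) / ns_res)
    col = int((x - west) / ew_res)
    return row, col


print(cell_index(5.0, 0.0, north=50.0, south=0.0, east=100.0, west=0.0, rows=5, cols=10))
print(cell_index(5.0, 0.0, north=50.0, south=-0.000001, east=100.0, west=0.0, rows=5, cols=10))
```

In this sketch the first call skips the point lying exactly on the southern bound, while the second,
with the bound lowered by 0.000001, places it in the bottom row.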

Blank lines and comment lines starting with the hash symbol (#) will be skipped.

The zrange parameter may be used for filtering the input data by vertical extent. Example uses might
include preparing multiple raster sections to be combined into a 3D raster array with r.to.rast3, or for
filtering outliers on relatively flat terrain.

In varied terrain the user may find that min maps make a good noise filter, as most LIDAR noise is
from premature hits. The min map may also be useful for finding the underlying topography in a forested
or urban environment if the cells are oversampled.

The user can use a combination of r.in.xyz output maps to create custom filters. e.g. use r.mapcalc to
create a mean-(2*stddev) map. [In this example the user may want to include a lower bound filter in
r.mapcalc to remove highly variable points (small n) or run r.neighbors to smooth the stddev map before
further use.]
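
The logic of such a filter can be sketched on small in-memory grids. This only illustrates the
per-cell arithmetic, not r.mapcalc syntax: the function name lower_envelope, the min_n threshold and
the sample grids are hypothetical, and None stands in for NULL cells.

```python
def lower_envelope(mean, stddev, n, min_n=3, k=2.0):
    """Per-cell mean - k*stddev; cells with fewer than min_n points become None (NULL)."""
    out = []
    for mean_row, sd_row, n_row in zip(mean, stddev, n):
        out.append([
            m - k * sd
            if (c is not None and c >= min_n and m is not None and sd is not None)
            else None
            for m, sd, c in zip(mean_row, sd_row, n_row)
        ])
    return out


# Hypothetical 2x2 maps as they might come from method=mean, method=stddev and method=n.
mean_map = [[10.0, 12.0], [11.0, None]]
stddev_map = [[1.0, 0.5], [2.0, None]]
n_map = [[5, 2], [4, 0]]
filtered = lower_envelope(mean_map, stddev_map, n_map)
print(filtered)
```

Cells with too few points are dropped here because a standard deviation computed from two or three
points is a poor estimate of the local spread.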

Reprojection
If the raster map is to be reprojected, it may be more appropriate to reproject the input points with
m.proj or cs2cs before running r.in.xyz.

Interpolation into a DEM
The vector engine's topological abilities introduce a finite memory overhead per vector point, which will
typically limit a vector map to approximately 3 million points (~ 1750^2 cells). If you want more, use
the r.to.vect -b flag to skip building topology. Without topology, however, all you will be able to do
with the vector map is display it with d.vect and interpolate with v.surf.rst. Run r.univar on your
raster map to check the number of non-NULL cells and adjust the bounds and/or resolution as needed
before proceeding.

Typical commands to create a DEM using a regularized spline fit:
    r.univar lidar_min
    r.to.vect -z feature=point in=lidar_min out=lidar_min_pt
    v.surf.rst layer=0 in=lidar_min_pt elev=lidar_min.rst

EXAMPLE

       Import the Jockey's Ridge, NC, LIDAR dataset, and process into a clean DEM:
           # scan and set region bounds
           r.in.xyz -s fs=, in=lidaratm2.txt out=test
           g.region n=35.969493 s=35.949693 e=-75.620999 w=-75.639999
           g.region res=0:00:00.075 -a
           # create "n" map containing count of points per cell for checking density
           r.in.xyz in=lidaratm2.txt out=lidar_n fs=, method=n zrange=-2,50
           # check point density [rho = n_sum / (rows*cols)]
           r.univar lidar_n | grep sum
           # create "min" map (elevation filtered for premature hits)
           r.in.xyz in=lidaratm2.txt out=lidar_min fs=, method=min zrange=-2,50
           # zoom to area of interest
           g.region n=35:57:56.25N s=35:57:13.575N w=75:38:23.7W e=75:37:15.675W
           # check number of non-null cells (try and keep under a few million)
           r.univar lidar_min | grep '^n:'
           # convert to points
           r.to.vect -z feature=point in=lidar_min out=lidar_min_pt
           # interpolate using a regularized spline fit
           v.surf.rst layer=0 in=lidar_min_pt elev=lidar_min.rst
           # set color scale to something interesting
           r.colors lidar_min.rst rule=bcyr -n -e
           # prepare a 1:1:1 scaled version for NVIZ visualization (for lat/lon input)
           r.mapcalc "lidar_min.rst_scaled = lidar_min.rst / (1852*60)"
           r.colors lidar_min.rst_scaled rule=bcyr -n -e
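
The unit conversions used in this example can be checked with a small Python sketch. The helper name
dms_to_degrees is hypothetical and handles only the unsigned D:M:S form with an optional N/S/E/W
suffix seen above. The factor 1852*60 (metres per degree) is the one used in the r.mapcalc step.

```python
import math

METERS_PER_DEGREE = 1852 * 60  # same factor as in the r.mapcalc expression above


def dms_to_degrees(text):
    """Convert 'D:M:S' with an optional trailing N/S/E/W to signed decimal degrees."""
    sign = 1.0
    if text[-1] in "NSEWnsew":
        if text[-1] in "SWsw":
            sign = -1.0
        text = text[:-1]
    parts = [float(p) for p in text.split(":")]
    parts += [0.0] * (3 - len(parts))  # allow 'D' or 'D:M' as well
    d, m, s = parts
    return sign * (d + m / 60 + s / 3600)


north = dms_to_degrees("35:57:56.25N")
west = dms_to_degrees("75:38:23.7W")
res_m = dms_to_degrees("0:00:00.075") * METERS_PER_DEGREE  # north-south cell size in metres
z_scaled = 10.0 / METERS_PER_DEGREE  # a 10 m elevation expressed in degrees
print(north, west, res_m, z_scaled)
```

Under this conversion the res=0:00:00.075 setting corresponds to cells roughly 2.3 m tall in the
north-south direction.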

TODO

                     Support for multiple map output from a single run.
                     method=string[,string,...] output=name[,name,...]

BUGS

                     n map sum can be ever-so-slightly more than `wc -l` with e.g. percent=10 or less.
                     Cause unknown.

                     n map percent=100 and percent=xx maps differ slightly (points will fall above/below
                     the segmentation line).
                     Investigate with "r.mapcalc diff=bin_n.100 - bin_n.33" etc.
                     Cause unknown.

                     "nan" can leak into coeff_var maps.
                     Cause unknown. Possible work-around: "r.null setnull=nan"

       If you encounter any problems (or solutions!) please contact the GRASS Development Team.

AUTHORS

       Hamish Bowman

       Department of Marine Science
       University of Otago
       New Zealand

       Extended by Volker Wichmann to support the aggregate functions median, percentile, skewness and trimmed
       mean.

       Last changed: $Date: 2012-06-20 02:33:07 -0700 (Wed, 20 Jun 2012) $

       © 2003-2013 GRASS Development Team

GRASS 6.4.3                                                                                     r.in.xyz(1grass)