xenial (1) r.in.xyz.1grass.gz

Provided by: grass-doc_7.0.3-1build1_all bug

NAME

       r.in.xyz  - Creates a raster map from an assemblage of many coordinates using univariate statistics.

KEYWORDS

       raster, import, conversion, aggregation, ASCII, LIDAR

SYNOPSIS

       r.in.xyz
       r.in.xyz --help
       r.in.xyz   [-sgi]   input=name   output=name    [method=string]    [type=string]    [separator=character]
       [x=integer]     [y=integer]     [z=integer]      [skip=integer]      [zrange=min,max]      [zscale=float]
       [value_column=integer]      [vrange=min,max]      [vscale=float]      [percent=integer]     [pth=integer]
       [trim=float]   [--overwrite]  [--help]  [--verbose]  [--quiet]  [--ui]

   Flags:
       -s
           Scan data file for extent then exit

       -g
           In scan mode, print using shell script style

       -i
           Ignore broken lines

       --overwrite
           Allow output files to overwrite existing files

       --help
           Print usage summary

       --verbose
           Verbose module output

       --quiet
           Quiet module output

       --ui
           Force launching GUI dialog

   Parameters:
       input=name [required]
           ASCII file containing input data (or "-" to read from stdin)

       output=name [required]
           Name for output raster map

       method=string
           Statistic to use for raster values
           Options: n, min, max, range, sum, mean, stddev, variance, coeff_var,  median,  percentile,  skewness,
           trimmean
           Default: mean

       type=string
           Storage type for resultant raster map
           Options: CELL, FCELL, DCELL
           Default: FCELL

       separator=character
           Field separator
           Special characters: pipe, comma, space, tab, newline
           Default: pipe

       x=integer
           Column number of x coordinates in input file (first column is 1)
           Default: 1

       y=integer
           Column number of y coordinates in input file
           Default: 2

       z=integer
           Column number of data values in input file
           If  a separate value column is given, this option refers to the z-coordinate column to be filtered by
           the zrange option
           Default: 3

       skip=integer
           Number of header lines to skip at top of input file
           Default: 0

       zrange=min,max
           Filter range for z data (min,max)

       zscale=float
           Scale to apply to z data
           Default: 1.0

       value_column=integer
           Alternate column number of data values in input file
           If not given (or set to 0) the z-column data is used
           Default: 0

       vrange=min,max
           Filter range for alternate value column data (min,max)

       vscale=float
           Scale to apply to alternate value column data
           Default: 1.0

       percent=integer
           Percent of map to keep in memory
           Options: 1-100
           Default: 100

       pth=integer
           pth percentile of the values
           Options: 1-100

       trim=float
           Discard <trim> percent of the smallest and <trim> percent of the largest observations
           Options: 0-50

DESCRIPTION

       The r.in.xyz module will load and bin ungridded x,y,z ASCII data into a new  raster  map.  The  user  may
       choose  from  a  variety  of  statistical  methods in creating the new raster. Gridded data provided as a
       stream of x,y,z points may also be imported.

       Please note that the current region extents and resolution are used  for  the  import.  It  is  therefore
       recommended  to  first use the -s flag to get the extents of the input points to be imported, then adjust
       the current region accordingly, and only then proceed with the actual import.

       r.in.xyz is designed for processing massive point cloud datasets, for example raw LIDAR or sidescan sonar
       swath  data.  It  has  been tested with datasets as large as tens of billion of points (705GB in a single
       file).

       Available statistics for populating the raster are:

       n                                                            number of points in cell

       min                                                          minimum value of points in cell

       max                                                          maximum value of points in cell

       range                                                        range of points in cell

       sum                                                          sum of points in cell

       mean                                                         average value of points in cell

       stddev                                                       standard deviation of points in cell

       variance                                                     variance of points in cell

       coeff_var                                                    coefficient of variance of points in cell

       median                                                       median value of points in cell

       percentile                                                   pth percentile of points in cell

       skewness                                                     skewness of points in cell

       trimmean                                                     trimmed mean of points in cell

           •   Variance and derivatives use the biased estimator (n). [subject to change]

           •   Coefficient of variance is given in percentage and defined as (stddev/mean)*100.

       It is also possible to bin  and  store  another  data  column  (e.g.  backscatter)  while  simultaneously
       filtering and scaling both the data column values and the z range.

NOTES

   Gridded data
       If  data is known to be on a regular grid r.in.xyz can reconstruct the map perfectly as long as some care
       is taken to set up the region correctly and that the data’s native map  projection  is  used.  A  typical
       method  would  involve  determining  the  grid  resolution  either  by  examining  the  data’s associated
       documentation or by studying the text file. Next scan the data with r.in.xyz’s -s (or -g)  flag  to  find
       the  input  data’s bounds. GRASS uses the cell-center raster convention where data points fall within the
       center of a cell, as opposed to the grid-node convention. Therefore you will need to grow the region  out
       by  half  a  cell  in  all directions beyond what the scan found in the file. After the region bounds and
       resolution are set correctly with g.region, run r.in.xyz using the n method and verify that  n=1  at  all
       places.   r.univar  can  help. Once you are confident that the region exactly matches the data proceed to
       run r.in.xyz using one of the mean, min, max, or median methods. With n=1 throughout, the  result  should
       be identical regardless of which of those methods are used.

   Memory use
       While  the  input  file  can  be arbitrarily large, r.in.xyz will use a large amount of system memory for
       large raster regions (10000x10000).  If the module refuses to start complaining that there  isn’t  enough
       memory,  use the percent parameter to run the module in several passes.  In addition using a less precise
       map format (CELL [integer] or FCELL [floating point]) will use less memory than a DCELL [double precision
       floating  point]  output  map.  Methods such as n, min, max, sum will also use less memory, while stddev,
       variance, and coeff_var will use more.  The aggregate functions median, percentile, skewness and  trimmed
       mean will use even more memory and may not be appropriate for use with arbitrarily large input files.

       The  default  map  type=FCELL  is  intended  as compromise between preserving data precision and limiting
       system resource consumption.  If reading data from a stdin stream, the  program  can  only  run  using  a
       single pass.

   Setting region bounds and resolution
       You  can  use  the  -s  scan  flag  to  find the extent of the input data (and thus point density) before
       performing the full import. Use g.region to adjust the region bounds to match. The -g  shell  style  flag
       prints  the  extent  suitable as parameters for g.region.  A suitable resolution can be found by dividing
       the number of input points by the area covered. e.g.
       wc -l inputfile.txt
       g.region -p
       # points_per_cell = n_points / (rows * cols)
       g.region -e
       # UTM location:
       # points_per_sq_m = n_points / (ns_extent * ew_extent)
       # Lat/Lon location:
       # points_per_sq_m = n_points / (ns_extent * ew_extent*cos(lat) * (1852*60)^2)

       If you only intend to interpolate the data with r.to.vect and v.surf.rst, then there is little  point  to
       setting  the  region  resolution so fine that you only catch one data point per cell -- you might as well
       use "v.in.ascii -zbt" directly.

   Filtering
       Points falling outside the current region will be skipped. This includes points falling  exactly  on  the
       southern region bound.  (to capture those adjust the region with "g.region s=s-0.000001"; see g.region)

       Blank lines and comment lines starting with the hash symbol (#) will be skipped.

       The  zrange  parameter  may  be  used for filtering the input data by vertical extent. Example uses might
       include preparing multiple raster sections to be combined into a 3D raster array with r.to.rast3, or  for
       filtering outliers on relatively flat terrain.

       In  varied  terrain  the  user may find that min maps make for a good noise filter as most LIDAR noise is
       from premature hits. The min map may also be useful to find the underlying topography in  a  forested  or
       urban environment if the cells are over sampled.

       The  user  can  use a combination of r.in.xyz output maps to create custom filters. e.g. use r.mapcalc to
       create a mean-(2*stddev) map. [In this example the user may want to  include  a  lower  bound  filter  in
       r.mapcalc  to  remove highly variable points (small n) or run r.neighbors to smooth the stddev map before
       further use.]

   Alternate value column
       The value_column parameter can be used in specialized cases when you want to filter by  z-range  but  bin
       and store another column’s data. For example if you wanted to look at backscatter values between 1000 and
       1500 meters elevation. This is particularly useful when using r.in.xyz to prepare depth slices for  a  3D
       raster  &#8212;  the  zrange  option  defines  the  depth  slice but the data values stored in the voxels
       describe an additional dimension. As with the z column, a filtering  range  and  scaling  factor  may  be
       applied.

   Reprojection
       If  the  raster  map  is to be reprojected, it may be more appropriate to reproject the input points with
       m.proj or cs2cs before running r.in.xyz.

   Interpolation into a DEM
       The vector engine’s topographic abilities introduce a finite memory overhead per vector point which  will
       typically  limit  a  vector map to approximately 3 million points (~ 1750^2 cells). If you want more, use
       the r.to.vect -b flag to skip building topology. Without topology, however, all you’ll be able to do with
       the  vector  map is display with d.vect and interpolate with v.surf.rst.  Run r.univar on your raster map
       to check the number of non-NULL cells and adjust bounds and/or resolution as needed before proceeding.

       Typical commands to create a DEM using a regularized spline fit:
       r.univar lidar_min
       r.to.vect -z type=point in=lidar_min out=lidar_min_pt
       v.surf.rst in=lidar_min_pt elev=lidar_min.rst

   Import of x,y,string data
       r.in.xyz is expecting numeric values as z column. In order to perform a occurrence count  operation  even
       on x,y data with non-numeric attribute(s), the data can be imported using either the x or y coordinate as
       a fake z column for method=n (count number of points per grid cell), the z values are ignored anyway.

EXAMPLE

       Import the Jockey’s Ridge, NC, LIDAR dataset (compressed file "lidaratm2.txt.gz"), and process it into  a
       clean DEM:
       # scan and set region bounds
       r.in.xyz -s separator="," in=lidaratm2.txt out=test
       g.region n=35.969493 s=35.949693 e=-75.620999 w=-75.639999
       g.region res=0:00:00.075 -a
       # create "n" map containing count of points per cell for checking density
       r.in.xyz in=lidaratm2.txt out=lidar_n separator="," method=n zrange=-2,50
       # check point density [rho = n_sum / (rows*cols)]
       r.univar lidar_n | grep sum
       # create "min" map (elevation filtered for premature hits)
       r.in.xyz in=lidaratm2.txt out=lidar_min separator="," method=min zrange=-2,50
       # set computational region to area of interest
       g.region n=35:57:56.25N s=35:57:13.575N w=75:38:23.7W e=75:37:15.675W
       # check number of non-null cells (try and keep under a few million)
       r.univar lidar_min | grep ’^n:’
       # convert to points
       r.to.vect -z type=point in=lidar_min out=lidar_min_pt
       # interpolate using a regularized spline fit
       v.surf.rst in=lidar_min_pt elev=lidar_min.rst
       # set color scale to something interesting
       r.colors lidar_min.rst rule=bcyr -n -e
       # prepare a 1:1:1 scaled version for NVIZ visualization (for lat/lon input)
       r.mapcalc "lidar_min.rst_scaled = lidar_min.rst / (1852*60)"
       r.colors lidar_min.rst_scaled rule=bcyr -n -e

TODO

           •   Support for multiple map output from a single run.
               method=string[,string,...] output=name[,name,...]
               This can be easily handled by a wrapper script, with the added benefit of it being very simple to
               parallelize that way.

           •   Add two new flags for support for direct binary input from libLAS for LIDAR data and  MB-System’s
               mbio for multi-beam bathymetry data.
               note: See the new r.in.lidar module for this.

KNOWN ISSUES

n map sum can be ever-so-slightly more than `wc -l` with e.g. percent=10 or less.
               Cause unknown.

           •   n  map  percent=100  and  percent=xx  maps  differ  slightly  (point  will  fall  above/below the
               segmentation line)
               Investigate with "r.mapcalc diff = bin_n.100 - bin_n.33" etc.
               Cause unknown.

           •   "nan" can leak into coeff_var maps.
               Cause unknown. Possible work-around: "r.null setnull=nan"
       If you encounter any problems (or solutions!) please contact the GRASS Development Team.

SEE ALSO

        g.region, m.proj, r.fillnulls, r.in.ascii, r.in.lidar,  r3.in.xyz,  r.mapcalc,  r.neighbors,  r.out.xyz,
       r.to.rast3, r.to.vect, r.univar, v.in.ascii, v.surf.rst

        v.lidar.correction, v.lidar.edgedetection, v.lidar.growing, v.outlier, v.surf.bspline

       pv - The UNIX pipe viewer utility

AUTHORS

       Hamish Bowman
        Department of Marine Science
       University of Otago
       New Zealand
       Extended  by  Volker Wichmann to support the aggregate functions median, percentile, skewness and trimmed
       mean.

       Last changed: $Date: 2015-11-20 12:34:12 +0100 (Fri, 20 Nov 2015) $

       Main index | Raster index | Topics index | Keywords index | Full index

       © 2003-2016 GRASS Development Team, GRASS GIS 7.0.3 Reference Manual