Big data file share manifest—ArcGIS GeoAnalytics Server | Documentation for ArcGIS Enterprise

Big data file share manifest

Note:

At ArcGIS Enterprise 10.9.1 or later, it is recommended that you add or edit big data file shares through your portal contents page instead of ArcGIS Server Manager.

Learn more about adding big data file shares in the portal

Big data file shares are registered as a data store through your portal contents page. Big data file shares require a manifest to outline the schema of the data, as well as the fields that represent geometry and time in the dataset. The manifest is automatically generated when you register a big data file share, but you may need to make modifications if there are any changes to your data, or if the manifest generation was unable to determine all the information needed (for example, if the automatically generated manifest did not select the correct field for the geometry or time).

Note:

Editing your big data file share through the manifest is an advanced option. To learn more about applying changes to individual datasets in your big data file share, see Manage big data file shares in portal. To learn about applying a hints file for delimited files, see Hints file.

The manifest is composed of datasets. The number of datasets depends on the number of folders your big data file share contains. In the following example, there are five datasets:

"datasets":[
  {.. dataset1 ..},
  {.. dataset2 ..},
  {.. dataset3 ..},
  {.. dataset4 ..},
  {.. dataset5 ..}
]

Within each dataset, there are five top-level objects that may be applicable. Of these objects, name, format, and schema are required.

{
 "name": "dataset1",
 "format": {},
 "schema": {},
 "geometry": {},
 "time": {}
}

Name

The name object is required and defines the name of the dataset. This must be unique within the manifest.
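
The required objects and the uniqueness rule for name can be checked before you apply an edited manifest. The following is a minimal Python sketch, not part of ArcGIS; the helper name check_datasets and the sample manifest text are hypothetical and only illustrate the rules stated above.

```python
import json

# Objects the text above lists as required for every dataset.
REQUIRED_KEYS = ("name", "format", "schema")


def check_datasets(manifest):
    """Return a list of problems found in a manifest dict (hypothetical helper)."""
    problems = []
    seen = set()
    for index, dataset in enumerate(manifest.get("datasets", [])):
        for key in REQUIRED_KEYS:
            if key not in dataset:
                problems.append(f"dataset {index}: missing required object '{key}'")
        name = dataset.get("name")
        if name in seen:
            problems.append(f"dataset {index}: duplicate name '{name}'")
        elif name is not None:
            seen.add(name)
    return problems


# Illustrative manifest text; the dataset names are made up.
manifest_text = """
{
  "datasets": [
    {"name": "dataset1", "format": {}, "schema": {}},
    {"name": "dataset1", "format": {}}
  ]
}
"""
problems = check_datasets(json.loads(manifest_text))
for problem in problems:
    print(problem)
```

Here the second dataset is reported twice: once for the missing schema object and once for reusing the name dataset1.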

Format

The format object is required and defines the dataset type and its format.

Syntax
"format" : {
 "type" : "< delimited | shapefile | orc | parquet >",
 "extension" : "< csv | tsv | shp | orc | parquet >",
 "fieldDelimiter" : "< delimiter >",
 "recordTerminator" : "< terminator >",
 "quoteChar" : "< character for quotes >",
 "hasHeaderRow" : < true | false >,
 "encoding" : "< encoding format >"
}

Example using a shapefile:

"format" : {
 "type": "shapefile",
 "extension": "shp"
}

Example using a delimited file:

"format" : {
 "type": "delimited",
 "extension": "csv",
 "fieldDelimiter": ",",
 "recordTerminator": "\n",
 "quoteChar": "\"",
 "hasHeaderRow": true,
 "encoding" : "UTF-8"
}

Description

  • type—A required property that defines the source data. This can be delimited, shapefile, parquet, or orc.
  • extension—A required property denoting the file extension. For shapefiles, this is shp; delimited files use the file extension of the data (for example, csv or tsv); ORC files use orc; and parquet files use parquet.
  • fieldDelimiter—This is required when type is delimited. This field represents what separates fields in the delimited file.
  • recordTerminator—This is only required when type is delimited. This field specifies what terminates features in the delimited file.
  • quoteChar—This is only required when type is delimited. The character denotes how quotes are specified in the delimited file.
  • hasHeaderRow—This is only required when type is delimited. This property specifies whether the first row in a delimited file should be treated as a header or as the first feature.
  • encoding—This is only required when type is delimited. This property specifies the type of encoding used.
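
The conditional requirements above (five properties that apply only when type is delimited) can be expressed as a small check. This is a sketch under the rules listed in this section only; check_format is a hypothetical helper, and the tab-delimited sample values are illustrative assumptions rather than values taken from a generated manifest.

```python
# Properties the description lists as required only for delimited data.
DELIMITED_ONLY = ("fieldDelimiter", "recordTerminator", "quoteChar",
                  "hasHeaderRow", "encoding")
VALID_TYPES = ("delimited", "shapefile", "orc", "parquet")


def check_format(fmt):
    """Return a list of problems for one format object (hypothetical helper)."""
    problems = []
    if fmt.get("type") not in VALID_TYPES:
        problems.append(f"type must be one of: {', '.join(VALID_TYPES)}")
    if "extension" not in fmt:
        problems.append("extension is required")
    if fmt.get("type") == "delimited":
        for key in DELIMITED_ONLY:
            if key not in fmt:
                problems.append(f"{key} is required when type is delimited")
    return problems


# Assumed example of a tab-separated dataset.
tsv_format = {
    "type": "delimited",
    "extension": "tsv",
    "fieldDelimiter": "\t",
    "recordTerminator": "\n",
    "quoteChar": "\"",
    "hasHeaderRow": True,
    "encoding": "UTF-8",
}
print(check_format(tsv_format))            # no problems expected
print(check_format({"type": "shapefile"}))  # extension is missing
```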

Schema

The schema object is required; it defines the dataset fields and field type.

Syntax
"schema" : {
 "fields" : [
  {
   "name": "< fieldName >",
   "type" : "< esriFieldTypeString | esriFieldTypeBigInteger | esriFieldTypeDouble >"
  }
 ]
}

Example:

"schema" : {
 "fields":[
  {
   "name": "trackid",
   "type": "esriFieldTypeString"
  },
  {
   "name": "x",
   "type": "esriFieldTypeDouble"
  },
  {
   "name": "y",
   "type": "esriFieldTypeDouble"
  },
  {
   "name": "time",
   "type": "esriFieldTypeBigInteger"
  },
  {
   "name": "value",
   "type": "esriFieldTypeBigInteger"
  }
 ]
}

Description

  • fields—A required property that defines the fields in the schema.
  • name—A required property denoting the field name. The field name must be unique to the dataset, and it can only contain alphanumeric characters and underscores.
  • type—This is a required property that defines the type of the field. Options include the following:
    • esriFieldTypeInteger—For integers.
    • esriFieldTypeSmallInteger—For integers.
    • esriFieldTypeBigInteger—For integers. Big integer fields will be stored as double fields in a feature service.
    • esriFieldTypeString—For strings.
    • esriFieldTypeDouble—For doubles or floats.
    • esriFieldTypeDate—For shapefiles with date fields. Delimited, ORC, and parquet datasets with fields representing a date must have dates represented by an esriFieldTypeString field.
    • esriFieldTypeSingle—For singles.
    • esriFieldTypeBlob—For binary values. Blob fields will be stored as string fields in a feature service.

Note:

When big data file shares are analyzed through GeoAnalytics Tools and saved as a feature service, the types might change. For example, an esriFieldTypeBigInteger in a big data file share will become an esriFieldTypeDouble field in a feature service.
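
The naming and type rules for schema fields can be applied while building the schema object programmatically. The sketch below is illustrative only: build_schema is a hypothetical helper, and it assumes that "alphanumeric" means ASCII letters and digits, which the text above does not state explicitly.

```python
import json
import re

# Field types listed in the description above.
FIELD_TYPES = {
    "esriFieldTypeInteger", "esriFieldTypeSmallInteger",
    "esriFieldTypeBigInteger", "esriFieldTypeString",
    "esriFieldTypeDouble", "esriFieldTypeDate",
    "esriFieldTypeSingle", "esriFieldTypeBlob",
}
# Assumption: alphanumeric is read as ASCII letters and digits.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def build_schema(fields):
    """Build a schema object from (name, type) pairs; raise ValueError on bad input."""
    seen = set()
    entries = []
    for name, field_type in fields:
        if not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"invalid field name: {name!r}")
        if name in seen:
            raise ValueError(f"duplicate field name: {name!r}")
        if field_type not in FIELD_TYPES:
            raise ValueError(f"unknown field type: {field_type!r}")
        seen.add(name)
        entries.append({"name": name, "type": field_type})
    return {"fields": entries}


schema = build_schema([
    ("trackid", "esriFieldTypeString"),
    ("x", "esriFieldTypeDouble"),
    ("y", "esriFieldTypeDouble"),
])
print(json.dumps({"schema": schema}, indent=1))
```

A name such as "track id" (with a space) or a repeated name raises ValueError instead of producing a schema that breaks the rules above.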

Geometry

The geometry object is optional in general, but it is required if a dataset has a spatial representation, such as a point, polyline, or polygon.

Syntax
"geometry" : {
 "geometryType" : "< esriGeometryType >",
 "spatialReference" : {
  "wkid": <wkidNum>,
  "latestWkid" : <latestWkidNum>
 },
 "fields": [
 {
  "name": "<fieldName1>",
  "formats": ["<fieldFormat1>"]
 },
 {
  "name": "<fieldName2>",
  "formats": ["<fieldFormat2>"]
 }
 ]
}

Example using a delimited file with x- and y-values:

"geometry" : {
 "geometryType" : "esriGeometryPoint",
 "spatialReference" : {
  "wkid": 3857
 },
 "fields": [
 {
  "name": "XValue",
  "formats": ["x"]
 },
 {
  "name": "YValue",
  "formats": ["y"]
 }
 ]
}

Example using a delimited file with x-, y-, and z-values:

"geometry" : {
 "geometryType" : "esriGeometryPoint",
 "spatialReference" : {
  "wkid": 4326
 },
 "fields": [
 {
  "name": "Longitude",
  "formats": ["x"]
 },
 {
  "name": "Latitude",
  "formats": ["y"]
 },
 {
  "name": "Height",
  "formats": ["z"]
 }
 ]
}

Example using a .tsv file:

"geometry" : {
 "geometryType" : "esriGeometryPolygon",
 "dropSourceFields": true,
 "spatialReference" : {
  "wkid": 3857
 },
 "fields": [
 {
  "name": "Shapelocation",
  "formats": ["WKT"]
 }
 ]
}

Description

Note:

Since the geometry object is optional, the following properties are listed as required or optional, assuming that a geometry is used:

  • geometryType—This is required. Options include the following:
    • esriGeometryPoint
    • esriGeometryPolyline
    • esriGeometryPolygon
  • spatialReference—A required property denoting the spatial reference of the dataset.
    • wkid—A field that denotes the spatial reference, where wkid or latestWkid is required for a dataset with a geometry.
    • latestWkid—A field that denotes the spatial reference at a given software release, where wkid or latestWkid is required for a dataset with geometry.
  • fields—A required property for delimited datasets with a spatial representation. This denotes the field name or names and formats of the geometry.
    • name—A required property for delimited datasets with a spatial representation. This denotes the name of the field used to represent the geometry. There can be multiple instances of this.
    • formats—A required property for delimited datasets with a spatial representation. This denotes the format of the field used to represent the geometry. There can be multiple instances of this.
  • dropSourceFields—An optional property for datasets with fields representing the geometry. This denotes whether the fields used to specify the geometry will be used as fields in analysis. If set to true, the fields used for geometry will not be visible as analysis fields (for example, for summary statistics) and will be dropped when running tools. The default is false. This property cannot be set on shapefile datasets.
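
The geometry rules above can be summarized as a check that takes the geometry object together with the dataset's format type. This is a hedged sketch, not an ArcGIS API; check_geometry is a hypothetical helper that encodes only the requirements listed in this section.

```python
GEOMETRY_TYPES = ("esriGeometryPoint", "esriGeometryPolyline",
                  "esriGeometryPolygon")


def check_geometry(geometry, format_type):
    """Return a list of problems for one geometry object (hypothetical helper)."""
    problems = []
    if geometry.get("geometryType") not in GEOMETRY_TYPES:
        problems.append("geometryType must be point, polyline, or polygon")
    spatial_ref = geometry.get("spatialReference", {})
    if "wkid" not in spatial_ref and "latestWkid" not in spatial_ref:
        problems.append("spatialReference needs wkid or latestWkid")
    if format_type == "delimited":
        fields = geometry.get("fields", [])
        if not fields:
            problems.append("fields is required for delimited datasets")
        for field in fields:
            if "name" not in field or "formats" not in field:
                problems.append("each geometry field needs name and formats")
    if format_type == "shapefile" and "dropSourceFields" in geometry:
        problems.append("dropSourceFields cannot be set on shapefile datasets")
    return problems


# The .tsv example from this section, written as a Python dict.
tsv_geometry = {
    "geometryType": "esriGeometryPolygon",
    "dropSourceFields": True,
    "spatialReference": {"wkid": 3857},
    "fields": [{"name": "Shapelocation", "formats": ["WKT"]}],
}
geometry_problems = check_geometry(tsv_geometry, "delimited")
print(geometry_problems)
```

Passing the same object with a format type of shapefile reports the dropSourceFields restriction.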

Time

The time object is optional in general, but it is required if a dataset has a temporal representation.

Syntax
"time" : {
 "timeType" : "< instant | interval >",
 "timeReference" : {
  "timeZone" : "< timeZone >"
 },
 "fields": [
  {
   "name": "<fieldName1>",
   "formats": ["<fieldFormat1>"],
   "role": "< start | end >"
  }
 ]
}

Example using an instant, with multiple formats in the time fields:

"time": {
 "timeType": "instant",
 "timeReference": {"timeZone": "UTC"},
 "fields": [
 {
  "name": "iso_time",
  "formats": [
   "yyyy-MM-dd HH:mm:ss",
   "MM/dd/yyyy HH:mm"
   ]
  }
 ]
}

Example using an interval, with multiple fields used for startTime:

"time": {
 "timeType": "interval",
 "timeReference": {"timeZone": "-0900"},
 "dropSourceFields" : true,
 "fields": [
 {
  "name": "time_start",
  "formats": ["HH:mm:ss"],
  "role" : "start"
  },
 {
  "name": "date_start",
  "formats": ["yyyy-MM-dd"],
  "role" : "start"
  },
 {
  "name": "datetime_ending",
  "formats": ["yyyy-MM-dd HH:mm:ss"],
  "role" : "end"
  }
 ]
}

Description

Note:

Since the time object is optional, the following properties are listed as required or optional, assuming that time is used:

  • timeType—The time type is required if there is time included in the dataset. Options include the following:
    • instant—For a single moment in time
    • interval—For a time interval represented by a start and end time
  • timeReference—A required field if the dataset is time-enabled, denoting the time zone (timeZone).
    • timeZone—A required field of timeReference that denotes the time zone format of the data. Time zones are based on Joda-Time. To learn about Joda-Time formats, see Joda-Time Available Time Zones. timeZone can be formatted as follows:
      • Using the full name of the time zone: Pacific Standard Time.
      • Using the time zone offset expressed in hours: -0100 or -01:00.
      • You may use time zone abbreviations for UTC or GMT only; otherwise, use the full name or the hours offset.
  • fields—A required field to denote the field names and formats of the time. Required properties of fields are as follows:
    • name—A required field that denotes the name of the field used to represent time. There may be multiple instances of this object.
    • formats—A required field that denotes the format of the field used to represent the time. There may be multiple formats for a single field (as shown above). There may be multiple instances of this object. To learn how fields may be formatted, see Time formats.
    • role—A required field when timeType is interval. Set it to start or end to indicate whether the field represents the start time or the end time of the interval.
  • dropSourceFields—An optional property for datasets with fields representing the time. This denotes whether the fields used to specify the time will be used as fields in analysis. If set to true, the fields used for time will not be visible as analysis fields (for example, for summary statistics) and will be dropped when running tools. The default is false.
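
The time rules above, including the requirement that every field carry a role when timeType is interval, can be sketched as follows. check_time is a hypothetical helper written for illustration; it does not validate time zone names or format strings.

```python
def check_time(time):
    """Return a list of problems for one time object (hypothetical helper)."""
    problems = []
    time_type = time.get("timeType")
    if time_type not in ("instant", "interval"):
        problems.append("timeType must be instant or interval")
    if "timeZone" not in time.get("timeReference", {}):
        problems.append("timeReference.timeZone is required")
    fields = time.get("fields", [])
    if not fields:
        problems.append("fields is required")
    for field in fields:
        name = field.get("name", "<unnamed>")
        if "name" not in field:
            problems.append("a time field is missing name")
        if not field.get("formats"):
            problems.append(f"field '{name}': formats is required")
        if time_type == "interval" and field.get("role") not in ("start", "end"):
            problems.append(f"field '{name}': role must be start or end")
    return problems


# Interval with one field missing its role (illustrative values).
interval_time = {
    "timeType": "interval",
    "timeReference": {"timeZone": "UTC"},
    "fields": [
        {"name": "date_start", "formats": ["yyyy-MM-dd"], "role": "start"},
        {"name": "datetime_ending", "formats": ["yyyy-MM-dd HH:mm:ss"]},
    ],
}
time_problems = check_time(interval_time)
print(time_problems)
```

In this sample only datetime_ending is flagged, because an interval needs each field marked as start or end.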