Java SparkSql 2.4.0 ArrayIndexOutOfBoundsException error

I am new to Spark and am trying to read a CSV file in a Java Maven project, but I am getting an ArrayIndexOutOfBoundsException.



Dependencies:



<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>2.4.0</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>2.4.0</version>
    </dependency>
</dependencies>


Code:



SparkSession spark = SparkSession
        .builder()
        .appName("Java Spark SQL Example")
        .config("spark.master", "local")
        .getOrCreate();

// Read file
Dataset<Row> df = spark.read()
        .format("csv")
        .load("test.csv");


CSV file:



name,code
A,1
B,3
C,5


Here is the stack trace:



 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/11/12 11:16:25 INFO SparkContext: Running Spark version 2.4.0
18/11/12 11:16:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/12 11:16:25 INFO SparkContext: Submitted application: Java Spark SQL Example
18/11/12 11:16:25 INFO SecurityManager: Changing view acls to: vkumar
18/11/12 11:16:25 INFO SecurityManager: Changing modify acls to: vkumar
18/11/12 11:16:25 INFO SecurityManager: Changing view acls groups to:
18/11/12 11:16:25 INFO SecurityManager: Changing modify acls groups to:
18/11/12 11:16:25 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vkumar); groups with view permissions: Set(); users with modify permissions: Set(vkumar); groups with modify permissions: Set()
18/11/12 11:16:26 INFO Utils: Successfully started service 'sparkDriver' on port 54382.
18/11/12 11:16:26 INFO SparkEnv: Registering MapOutputTracker
18/11/12 11:16:26 INFO SparkEnv: Registering BlockManagerMaster
18/11/12 11:16:26 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/11/12 11:16:26 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/11/12 11:16:26 INFO DiskBlockManager: Created local directory at /private/var/folders/32/7klbv5_94wbddn9kgzvbwnkr0000gn/T/blockmgr-ddd2a79d-7fae-48e4-9658-1a8e2a8bb734
18/11/12 11:16:26 INFO MemoryStore: MemoryStore started with capacity 4.1 GB
18/11/12 11:16:26 INFO SparkEnv: Registering OutputCommitCoordinator
18/11/12 11:16:26 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/11/12 11:16:26 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://kirmac133.domain.com:4040
18/11/12 11:16:26 INFO Executor: Starting executor ID driver on host localhost
18/11/12 11:16:26 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 54383.
18/11/12 11:16:26 INFO NettyBlockTransferService: Server created on kirmac133.domain.com:54383
18/11/12 11:16:26 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/11/12 11:16:26 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, kirmac133.domain.com, 54383, None)
18/11/12 11:16:26 INFO BlockManagerMasterEndpoint: Registering block manager kirmac133.domain.com:54383 with 4.1 GB RAM, BlockManagerId(driver, kirmac133.domain.com, 54383, None)
18/11/12 11:16:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, kirmac133.domain.com, 54383, None)
18/11/12 11:16:26 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, kirmac133.domain.com, 54383, None)
18/11/12 11:16:26 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/Users/vkumar/Documents/intellij/tmvalidator/spark-warehouse').
18/11/12 11:16:26 INFO SharedState: Warehouse path is 'file:/Users/vkumar/Documents/intellij/tmvalidator/spark-warehouse'.
18/11/12 11:16:27 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
18/11/12 11:16:28 INFO FileSourceStrategy: Pruning directories with:
18/11/12 11:16:28 INFO FileSourceStrategy: Post-Scan Filters: (length(trim(value#0, None)) > 0)
18/11/12 11:16:28 INFO FileSourceStrategy: Output Data Schema: struct<value: string>
18/11/12 11:16:28 INFO FileSourceScanExec: Pushed Filters:
18/11/12 11:16:29 INFO CodeGenerator: Code generated in 130.688148 ms
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10582
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:563)
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:338)
at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:103)
at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:90)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:44)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1(BeanIntrospector.scala:58)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1$adapted(BeanIntrospector.scala:58)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)
at scala.collection.Iterator.foreach(Iterator.scala:937)
at scala.collection.Iterator.foreach$(Iterator.scala:937)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
at scala.collection.IterableLike.foreach(IterableLike.scala:70)
at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:240)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:237)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.findConstructorParam$1(BeanIntrospector.scala:58)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$19(BeanIntrospector.scala:176)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:194)
at scala.collection.TraversableLike.map(TraversableLike.scala:233)
at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:194)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14(BeanIntrospector.scala:170)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14$adapted(BeanIntrospector.scala:169)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)
at scala.collection.immutable.List.foreach(List.scala:388)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:240)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:237)
at scala.collection.immutable.List.flatMap(List.scala:351)
at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.apply(BeanIntrospector.scala:169)
at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$._descriptorFor(ScalaAnnotationIntrospectorModule.scala:22)
at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$.fieldName(ScalaAnnotationIntrospectorModule.scala:30)
at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$.findImplicitPropertyName(ScalaAnnotationIntrospectorModule.scala:78)
at com.fasterxml.jackson.databind.introspect.AnnotationIntrospectorPair.findImplicitPropertyName(AnnotationIntrospectorPair.java:467)
at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector._addFields(POJOPropertiesCollector.java:351)
at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collectAll(POJOPropertiesCollector.java:283)
at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.getJsonValueMethod(POJOPropertiesCollector.java:169)
at com.fasterxml.jackson.databind.introspect.BasicBeanDescription.findJsonValueMethod(BasicBeanDescription.java:223)
at com.fasterxml.jackson.databind.ser.BasicSerializerFactory.findSerializerByAnnotations(BasicSerializerFactory.java:348)
at com.fasterxml.jackson.databind.ser.BeanSerializerFactory._createSerializer2(BeanSerializerFactory.java:210)
at com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:153)
at com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1203)
at com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1157)
at com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:481)
at com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:679)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:107)
at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:52)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:142)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:339)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3384)
at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2545)
at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3365)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3365)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2545)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2759)
at org.apache.spark.sql.execution.datasources.csv.TextInputCSVDataSource$.infer(CSVDataSource.scala:232)
at org.apache.spark.sql.execution.datasources.csv.CSVDataSource.inferSchema(CSVDataSource.scala:68)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:63)
at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$12(DataSource.scala:183)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:180)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at kir.com.tmvalidator.Validator.init(Validator.java:30)
at kir.com.tmvalidator.Home.main(Home.java:7)
18/11/12 11:16:29 INFO SparkContext: Invoking stop() from shutdown hook
18/11/12 11:16:29 INFO SparkUI: Stopped Spark web UI at http://kirmac133.domain.com:4040
18/11/12 11:16:29 INFO ContextCleaner: Cleaned accumulator 4
18/11/12 11:16:29 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/11/12 11:16:29 INFO MemoryStore: MemoryStore cleared
18/11/12 11:16:29 INFO BlockManager: BlockManager stopped
18/11/12 11:16:29 INFO BlockManagerMaster: BlockManagerMaster stopped
18/11/12 11:16:29 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/11/12 11:16:29 INFO SparkContext: Successfully stopped SparkContext
18/11/12 11:16:29 INFO ShutdownHookManager: Shutdown hook called
18/11/12 11:16:29 INFO ShutdownHookManager: Deleting directory /private/var/folders/32/7klbv5_94wbddn9kgzvbwnkr0000gn/T/spark-1bde2bb5-603d-4fae-9128-7d92f259077b









java apache-spark hive apache-spark-sql






asked Nov 12 at 11:14, edited Nov 12 at 11:20
VK321
  • Can you include the stack trace? – shriyog, Nov 12 at 11:16
  • @shriyog - Added – VK321, Nov 12 at 11:21
  • Are you getting this error for the above-mentioned input in the csv? – shriyog, Nov 12 at 11:28
  • @shriyog - Yes :( – VK321, Nov 12 at 11:28
  • I tested the same for 2.3.0 versions, it works without issues. Checking with 2.4.0 – shriyog, Nov 12 at 11:32












2 Answers






The paranamer version referenced by Spark 2.4.0 is 2.7, which is causing the issue. Adding the following dependency before the spark-core/spark-sql dependencies solved the issue for me.



<dependency>
    <groupId>com.thoughtworks.paranamer</groupId>
    <artifactId>paranamer</artifactId>
    <version>2.8</version>
</dependency>
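As a sketch of what "before spark-core/spark-sql" means in practice, the dependencies section could look like the following, with the paranamer override declared first so Maven resolves 2.8 instead of the 2.7 pulled in transitively by Spark 2.4.0. The Spark coordinates are copied from the question; this layout is illustrative and not part of the original answer.

<dependencies>
    <!-- Declared first so this version wins over the 2.7 brought in by Spark -->
    <dependency>
        <groupId>com.thoughtworks.paranamer</groupId>
        <artifactId>paranamer</artifactId>
        <version>2.8</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>2.4.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>2.4.0</version>
    </dependency>
</dependencies>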





answered Nov 20 at 4:24
Shasankar
I met the same problem, but when I tried it in Scala with an explicit schema, the problem was solved:



import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

val tp = StructType(Seq(StructField("name", DataTypes.StringType), StructField("code", DataTypes.IntegerType)))
val df = spark.read.format("csv").option("header", "true").schema(tp).load("test.csv")
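For completeness, since the question is written in Java, here is a rough Java sketch of the same explicit-schema idea (the class name and the final df.show() call are illustrative additions, not from the original answer). Supplying the schema together with the header option means Spark does not run the CSV schema-inference step that appears in the stack trace, though whether that is enough to avoid the underlying paranamer issue may depend on the environment.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class CsvWithExplicitSchema {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark SQL Example")
                .config("spark.master", "local")
                .getOrCreate();

        // Declare the schema up front so Spark skips CSV schema inference.
        StructType schema = DataTypes.createStructType(new StructField[]{
                DataTypes.createStructField("name", DataTypes.StringType, true),
                DataTypes.createStructField("code", DataTypes.IntegerType, true)
        });

        Dataset<Row> df = spark.read()
                .format("csv")
                .option("header", "true")
                .schema(schema)
                .load("test.csv");

        df.show();
    }
}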






answered Nov 13 at 6:14
LiJianing
  • yes, for now it's working for me on 2.3.0 – VK321, Nov 13 at 6:15









