Using Spark how to add column at the end
up vote
-1
down vote
favorite
Below is my dataset using spark I want to add one more column at the end with name Level. Based on the salary the Level will be decided
`sal >= 1000 && sal <=2000 = Level 1
sal > 2000 && sal <= 3000 = Level 2
sal >3000 && sal <=4000 = Level 3
+-----+-------+----+----+
|empid|empName| sal|dept|
+-----+-------+----+----+
| 100| EMP1 |1000|IT |
| 101| EMP2 |2500|ITES|
| 102| EMP3 |3000|BPO |
| 104| EMP4 |4000|ENGG|
+-----+-------+----+----+`
Output
+-----+-------+----+----+-----+
|empid|empName| sal|dept|Level|
+-----+-------+----+----+-----+
| 100| EMP1 |1000|IT |Level 1|
| 101| EMP2 |2500|ITES|Level 2|
| 102| EMP3 |3000|BPO |Level 3|
| 104| EMP4 |4000|ENGG|Level 3|
+-----+-------+----+----+-----+
I have written below code -
case class mySchema(empid: Int, empName: String, sal: Int, post: String)
import spark.implicits._
val rdd1 = spark.read.csv("file:///E:/dev/tools/SampleData/emp.csv").select($"_c0".cast("integer").as("empid"),$"_c1".cast("string").as("empName"),$"_c2".cast("integer").as("sal"),$"_c3".cast("string").as("post"))
val df1 = rdd1.toDF()
val dfTods = df1.as[mySchema]
dfTods.createTempView("Employee")
val resDS = spark.sql("""select *
case when (sal === 1000) then 'ASE'
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee""")
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'when' expecting (line 2, pos 70)
== SQL ==
select * case when (sal === 1000) then 'ASE'
----------------------------------------------------------------------^^^
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee
apache-spark apache-spark-sql
add a comment |
up vote
-1
down vote
favorite
Below is my dataset using spark I want to add one more column at the end with name Level. Based on the salary the Level will be decided
`sal >= 1000 && sal <=2000 = Level 1
sal > 2000 && sal <= 3000 = Level 2
sal >3000 && sal <=4000 = Level 3
+-----+-------+----+----+
|empid|empName| sal|dept|
+-----+-------+----+----+
| 100| EMP1 |1000|IT |
| 101| EMP2 |2500|ITES|
| 102| EMP3 |3000|BPO |
| 104| EMP4 |4000|ENGG|
+-----+-------+----+----+`
Output
+-----+-------+----+----+-----+
|empid|empName| sal|dept|Level|
+-----+-------+----+----+-----+
| 100| EMP1 |1000|IT |Level 1|
| 101| EMP2 |2500|ITES|Level 2|
| 102| EMP3 |3000|BPO |Level 3|
| 104| EMP4 |4000|ENGG|Level 3|
+-----+-------+----+----+-----+
I have written below code -
case class mySchema(empid: Int, empName: String, sal: Int, post: String)
import spark.implicits._
val rdd1 = spark.read.csv("file:///E:/dev/tools/SampleData/emp.csv").select($"_c0".cast("integer").as("empid"),$"_c1".cast("string").as("empName"),$"_c2".cast("integer").as("sal"),$"_c3".cast("string").as("post"))
val df1 = rdd1.toDF()
val dfTods = df1.as[mySchema]
dfTods.createTempView("Employee")
val resDS = spark.sql("""select *
case when (sal === 1000) then 'ASE'
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee""")
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'when' expecting (line 2, pos 70)
== SQL ==
select * case when (sal === 1000) then 'ASE'
----------------------------------------------------------------------^^^
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee
apache-spark apache-spark-sql
What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
– quant
Nov 11 at 6:31
1
Missing,
between*
andcase
.
– user10465355
Nov 11 at 21:44
@RahulWagh could you unaccept my answer? I want to delete it because it's wrong
– mangusta
Nov 12 at 0:51
Unaccepted the answer
– Rahul Wagh
Nov 16 at 9:43
add a comment |
up vote
-1
down vote
favorite
up vote
-1
down vote
favorite
Below is my dataset using spark I want to add one more column at the end with name Level. Based on the salary the Level will be decided
`sal >= 1000 && sal <=2000 = Level 1
sal > 2000 && sal <= 3000 = Level 2
sal >3000 && sal <=4000 = Level 3
+-----+-------+----+----+
|empid|empName| sal|dept|
+-----+-------+----+----+
| 100| EMP1 |1000|IT |
| 101| EMP2 |2500|ITES|
| 102| EMP3 |3000|BPO |
| 104| EMP4 |4000|ENGG|
+-----+-------+----+----+`
Output
+-----+-------+----+----+-----+
|empid|empName| sal|dept|Level|
+-----+-------+----+----+-----+
| 100| EMP1 |1000|IT |Level 1|
| 101| EMP2 |2500|ITES|Level 2|
| 102| EMP3 |3000|BPO |Level 3|
| 104| EMP4 |4000|ENGG|Level 3|
+-----+-------+----+----+-----+
I have written below code -
case class mySchema(empid: Int, empName: String, sal: Int, post: String)
import spark.implicits._
val rdd1 = spark.read.csv("file:///E:/dev/tools/SampleData/emp.csv").select($"_c0".cast("integer").as("empid"),$"_c1".cast("string").as("empName"),$"_c2".cast("integer").as("sal"),$"_c3".cast("string").as("post"))
val df1 = rdd1.toDF()
val dfTods = df1.as[mySchema]
dfTods.createTempView("Employee")
val resDS = spark.sql("""select *
case when (sal === 1000) then 'ASE'
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee""")
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'when' expecting (line 2, pos 70)
== SQL ==
select * case when (sal === 1000) then 'ASE'
----------------------------------------------------------------------^^^
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee
apache-spark apache-spark-sql
Below is my dataset using spark I want to add one more column at the end with name Level. Based on the salary the Level will be decided
`sal >= 1000 && sal <=2000 = Level 1
sal > 2000 && sal <= 3000 = Level 2
sal >3000 && sal <=4000 = Level 3
+-----+-------+----+----+
|empid|empName| sal|dept|
+-----+-------+----+----+
| 100| EMP1 |1000|IT |
| 101| EMP2 |2500|ITES|
| 102| EMP3 |3000|BPO |
| 104| EMP4 |4000|ENGG|
+-----+-------+----+----+`
Output
+-----+-------+----+----+-----+
|empid|empName| sal|dept|Level|
+-----+-------+----+----+-----+
| 100| EMP1 |1000|IT |Level 1|
| 101| EMP2 |2500|ITES|Level 2|
| 102| EMP3 |3000|BPO |Level 3|
| 104| EMP4 |4000|ENGG|Level 3|
+-----+-------+----+----+-----+
I have written below code -
case class mySchema(empid: Int, empName: String, sal: Int, post: String)
import spark.implicits._
val rdd1 = spark.read.csv("file:///E:/dev/tools/SampleData/emp.csv").select($"_c0".cast("integer").as("empid"),$"_c1".cast("string").as("empName"),$"_c2".cast("integer").as("sal"),$"_c3".cast("string").as("post"))
val df1 = rdd1.toDF()
val dfTods = df1.as[mySchema]
dfTods.createTempView("Employee")
val resDS = spark.sql("""select *
case when (sal === 1000) then 'ASE'
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee""")
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'when' expecting (line 2, pos 70)
== SQL ==
select * case when (sal === 1000) then 'ASE'
----------------------------------------------------------------------^^^
when (sal === 2000) then 'SE'
else 'SSE'
end as level from Employee
apache-spark apache-spark-sql
apache-spark apache-spark-sql
edited Nov 11 at 9:33
asked Nov 11 at 6:27
Rahul Wagh
227
227
What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
– quant
Nov 11 at 6:31
1
Missing,
between*
andcase
.
– user10465355
Nov 11 at 21:44
@RahulWagh could you unaccept my answer? I want to delete it because it's wrong
– mangusta
Nov 12 at 0:51
Unaccepted the answer
– Rahul Wagh
Nov 16 at 9:43
add a comment |
What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
– quant
Nov 11 at 6:31
1
Missing,
between*
andcase
.
– user10465355
Nov 11 at 21:44
@RahulWagh could you unaccept my answer? I want to delete it because it's wrong
– mangusta
Nov 12 at 0:51
Unaccepted the answer
– Rahul Wagh
Nov 16 at 9:43
What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
– quant
Nov 11 at 6:31
What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
– quant
Nov 11 at 6:31
1
1
Missing
,
between *
and case
.– user10465355
Nov 11 at 21:44
Missing
,
between *
and case
.– user10465355
Nov 11 at 21:44
@RahulWagh could you unaccept my answer? I want to delete it because it's wrong
– mangusta
Nov 12 at 0:51
@RahulWagh could you unaccept my answer? I want to delete it because it's wrong
– mangusta
Nov 12 at 0:51
Unaccepted the answer
– Rahul Wagh
Nov 16 at 9:43
Unaccepted the answer
– Rahul Wagh
Nov 16 at 9:43
add a comment |
1 Answer
1
active
oldest
votes
up vote
0
down vote
select
*,
case
when (sal >=1000 and sal <= 2000) then 'Level 1'
when (sal > 2000 and sal <= 3000) then 'Level 2'
when (sal > 3000 and sal <= 4000) then 'Level 3'
end
as level
from Employee
instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13
Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19
if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21
mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24
1
@mangusta Saying that you cannot use*
along with a case/when statement is simply wrong. He simply forgot to put a comma,
after the*
!
– cheseaux
Nov 11 at 20:46
|
show 5 more comments
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
select
*,
case
when (sal >=1000 and sal <= 2000) then 'Level 1'
when (sal > 2000 and sal <= 3000) then 'Level 2'
when (sal > 3000 and sal <= 4000) then 'Level 3'
end
as level
from Employee
instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13
Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19
if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21
mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24
1
@mangusta Saying that you cannot use*
along with a case/when statement is simply wrong. He simply forgot to put a comma,
after the*
!
– cheseaux
Nov 11 at 20:46
|
show 5 more comments
up vote
0
down vote
select
*,
case
when (sal >=1000 and sal <= 2000) then 'Level 1'
when (sal > 2000 and sal <= 3000) then 'Level 2'
when (sal > 3000 and sal <= 4000) then 'Level 3'
end
as level
from Employee
instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13
Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19
if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21
mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24
1
@mangusta Saying that you cannot use*
along with a case/when statement is simply wrong. He simply forgot to put a comma,
after the*
!
– cheseaux
Nov 11 at 20:46
|
show 5 more comments
up vote
0
down vote
up vote
0
down vote
select
*,
case
when (sal >=1000 and sal <= 2000) then 'Level 1'
when (sal > 2000 and sal <= 3000) then 'Level 2'
when (sal > 3000 and sal <= 4000) then 'Level 3'
end
as level
from Employee
select
*,
case
when (sal >=1000 and sal <= 2000) then 'Level 1'
when (sal > 2000 and sal <= 3000) then 'Level 2'
when (sal > 3000 and sal <= 4000) then 'Level 3'
end
as level
from Employee
edited Nov 16 at 17:01
answered Nov 11 at 10:05
mangusta
1,46421326
1,46421326
instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13
Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19
if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21
mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24
1
@mangusta Saying that you cannot use*
along with a case/when statement is simply wrong. He simply forgot to put a comma,
after the*
!
– cheseaux
Nov 11 at 20:46
|
show 5 more comments
instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13
Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19
if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21
mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24
1
@mangusta Saying that you cannot use*
along with a case/when statement is simply wrong. He simply forgot to put a comma,
after the*
!
– cheseaux
Nov 11 at 20:46
instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13
instead of putting all column names in select can't I use * for all columns rather than typing all the columns
– Rahul Wagh
Nov 11 at 10:13
Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19
Okay so it means if you want all the columns of the table in the output then you have to put all the column names...the asterisk(*) does not work
– Rahul Wagh
Nov 11 at 10:19
if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21
if you want to add a new column to your select clause in addition to already existing columns, then yes, you have to specify all columns and you can't use asterisk
– mangusta
Nov 11 at 10:21
mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24
mangusta thanks for your answer it helps. I had one more question I am trying to do same thing using select but it is not working - import org.apache.spark.sql.functions._` val resDF = df1.select($"empid, empName,sal,post", (when($"sal" >= 1000 and $"sal" <= 2000,"ASE")) (when($"sal" === 20000 and $"sal" <= 3000,"SE")) (when($"sal" > 20000 and $"sal" <= 4000,"SSE")).as("Level"))`
– Rahul Wagh
Nov 11 at 10:24
1
1
@mangusta Saying that you cannot use
*
along with a case/when statement is simply wrong. He simply forgot to put a comma ,
after the *
!– cheseaux
Nov 11 at 20:46
@mangusta Saying that you cannot use
*
along with a case/when statement is simply wrong. He simply forgot to put a comma ,
after the *
!– cheseaux
Nov 11 at 20:46
|
show 5 more comments
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53246387%2fusing-spark-how-to-add-column-at-the-end%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
What exactly are you asking? How to get to the desired output when you have a table like the one shown above? Where is what you tried so far? Please edit the question to show what you've tried, so as to illustrate a specific problem you're having in a Minimal, Complete, and Verifiable example <stackoverflow.com/help/mcve>. For more information, please see How to Ask<stackoverflow.com/help/how-to-ask> and take the tour<stackoverflow.com/tour>.
– quant
Nov 11 at 6:31
1
Missing
,
between*
andcase
.– user10465355
Nov 11 at 21:44
@RahulWagh could you unaccept my answer? I want to delete it because it's wrong
– mangusta
Nov 12 at 0:51
Unaccepted the answer
– Rahul Wagh
Nov 16 at 9:43