How to write the equivalent SQL for code written in Python / Spark
I am trying to clean data in PySpark with a library called Optimus:
https://github.com/ironmussa/Optimus
It uses the following lines of code to transform the data:
# Optimus chains row/column transformations on the DataFrame; wrapping the
# chain in parentheses makes it valid Python across multiple lines.
(df
    .rows.sort("product", "desc")
    .cols.lower(["firstName", "lastName"])
    .cols.date_transform("birth", "new_date", "yyyy/MM/dd", "dd-MM-YYYY")
    .cols.years_between("birth", "years_between", "yyyy/MM/dd")
    .cols.remove_accents("lastName")
    .cols.remove_special_chars("lastName")
    .cols.replace("product", "taaaccoo", "taco")
    .cols.replace("product", ["piza", "pizzza"], "pizza")
    .rows.drop(df["id"] < 7)
    .cols.drop("dummyCol")
    .cols.rename(str.lower)
    .cols.apply_by_dtypes("product", func, "string", data_type="integer")  # func: a user-defined function
    .cols.trim("*")
    .show())
Can someone let me know what the equivalent commands would be in SQL?
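For reference, here is a rough sketch of how some of these steps might be approximated with Spark SQL run from PySpark. The temp-view name "people" is only for illustration, and steps without a simple built-in SQL equivalent (remove_accents, apply_by_dtypes with a custom func) are left out, so this is an approximation rather than a full equivalent:

# Assumes an active SparkSession named `spark` and the same df as above.
# The view name "people" is illustrative, not from the Optimus example.
df.createOrReplaceTempView("people")

cleaned = spark.sql("""
    SELECT
        LOWER(TRIM(firstName))                                   AS firstname,
        REGEXP_REPLACE(LOWER(TRIM(lastName)), '[^a-z0-9 ]', '')  AS lastname,  -- strip special chars
        DATE_FORMAT(TO_DATE(birth, 'yyyy/MM/dd'), 'dd-MM-yyyy')  AS new_date,
        FLOOR(MONTHS_BETWEEN(CURRENT_DATE(),
                             TO_DATE(birth, 'yyyy/MM/dd')) / 12) AS years_between,
        CASE
            WHEN product = 'taaaccoo'          THEN 'taco'
            WHEN product IN ('piza', 'pizzza') THEN 'pizza'
            ELSE TRIM(product)
        END                                                      AS product
    FROM people
    WHERE id >= 7             -- equivalent of .rows.drop(df["id"] < 7)
    ORDER BY product DESC     -- dummyCol is dropped simply by not selecting it
""")
cleaned.show()

The column rename to lowercase (.cols.rename(str.lower)) is handled here by lower-casing the SELECT aliases, and trimming is applied per column since SQL has no wildcard trim.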
sql python-3.x apache-spark
asked Nov 10 at 22:15 by user485868 (32)
How far have you gotten before posting this question?
– cricket_007
Nov 10 at 23:36