How to use pandas? (https://pandas.pydata.org/)
Simple: download and install Anaconda, which comes with pandas and its dependencies bundled.
If you wish to install it on a server, install it with pip instead (pip install pandas).
To use pandas in any Python script, import it as follows:
import numpy as np
import pandas as pd
You are all set to use Pandas.
What is Pandas? Why use Pandas?
Pandas is useful for data manipulation, much like the work you would do in Excel.
You can read data from, and write data to, Excel, CSV, and JSON files, or even database tables using SQL queries.
Some people also use it for ETL (extract, transform, load).
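
As a rough illustration of the retrieve-and-store idea above, the sketch below writes a tiny DataFrame into an in-memory SQLite database and reads part of it back with a SQL query. The table name `employees` and the two sample rows are made up for this example.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data, invented for this sketch.
people = pd.DataFrame(
    {"empid": ["emp01", "emp02"], "fname": ["Asish", "Bharat"], "grade": [4, 5]}
)

# An in-memory SQLite database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
people.to_sql("employees", conn, index=False)

# Retrieve rows with an ordinary SQL query.
seniors = pd.read_sql_query("SELECT * FROM employees WHERE grade > 4", conn)
conn.close()

print(seniors)
```

The same pattern applies to files: `pd.read_csv` / `to_csv`, `pd.read_json` / `to_json`, and `pd.read_excel` / `to_excel` are the matching pairs.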
What can Pandas do?
Pandas is good at manipulating tabular data held in DataFrames (labelled 2D arrays). It offers SQL-like functionality for carrying out similar operations on those arrays.
What functionality can you expect out of the box?
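- How to load the data into a 2D array?
Load the data with the following command, using the empid column as the index (the raw string keeps the backslash in the Windows path literal):
mydata = pd.read_csv(r'Documents\mydata.csv', index_col='empid')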
Output:
| empid | fname | lname | grade | department | birthdate | joindate |
|---|---|---|---|---|---|---|
| emp01 | Asish | Kulkarni | 4 | IT | 1977-01-02 | 2014-02-04 |
| emp02 | Bharat | Joshi | 5 | Finance | 1967-05-05 | 2014-02-04 |
| emp03 | Ashish | Goenka | 3 | Sales | 1999-07-08 | 2015-02-07 |
| emp04 | Himesh | Shah | 2 | Pre-Sales | 1977-01-02 | 2019-02-04 |
| emp05 | Asish | Deshpande | 4 | Accounts | 1988-01-02 | 2014-02-04 |
| emp06 | Amit | Kulkarni | 4 | IT | 1977-01-02 | 2014-02-04 |
| emp07 | Asish | Lele | 1 | IT | 2000-01-02 | 2014-02-04 |
| emp08 | Manish | Bhate | 1 | IT | 2000-01-09 | 2014-02-04 |
| emp08 | Nilima | Bhat | 1 | IT | 2000-01-10 | 2014-02-04 |
| emp09 | Aparna | Jamatani | 1 | IT | 2000-07-02 | 2014-02-04 |
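
To try the load step without a file on disk, here is a minimal sketch that reads the first three rows of the table above from an in-memory string. The column order inside the CSV is an assumption, and `parse_dates` is an optional extra that converts the two date columns to datetime values.

```python
import io

import pandas as pd

# Stand-in for mydata.csv; the column order in the real file is assumed.
csv_text = """empid,fname,lname,grade,department,birthdate,joindate
emp01,Asish,Kulkarni,4,IT,1977-01-02,2014-02-04
emp02,Bharat,Joshi,5,Finance,1967-05-05,2014-02-04
emp03,Ashish,Goenka,3,Sales,1999-07-08,2015-02-07
"""

# index_col names the column to use as row labels;
# parse_dates converts the listed columns to datetime64.
sample = pd.read_csv(
    io.StringIO(csv_text),
    index_col="empid",
    parse_dates=["birthdate", "joindate"],
)

print(sample.dtypes)
print(sample.loc["emp02", "department"])
```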
One more example:
twoD = pd.DataFrame({
    'lname': 'Downey',
    'fname': pd.Categorical(["Robert", "Pepper", "Maguna", "Bob"]),
    'age': pd.Categorical(["43", "40", "12", "4"]),
    'bdate': pd.Categorical([pd.Timestamp('19700101'), pd.Timestamp('19760101'), pd.Timestamp('20080702'), pd.Timestamp('20080702')]),
})
Output:
| | lname | fname | age | bdate |
|---|---|---|---|---|
| 0 | Downey | Robert | 43 | 1970-01-01 |
| 1 | Downey | Pepper | 40 | 1976-01-01 |
| 2 | Downey | Maguna | 12 | 2008-07-02 |
| 3 | Downey | Bob | 4 | 2008-07-02 |
- How to save data to a desired destination?
df.to_csv('output.csv', mode='w')
mydata2.to_json(r'newDataFrame.json',orient='table')
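
A minimal round-trip sketch of the two save calls above, assuming a small made-up frame and a temporary directory so nothing is left on disk. It writes both files and reads them back to confirm the data survives.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row frame, invented for this sketch.
frame = pd.DataFrame(
    {"fname": ["Asish", "Bharat"], "grade": [4, 5]},
    index=pd.Index(["emp01", "emp02"], name="empid"),
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "output.csv")
    json_path = os.path.join(tmp, "newDataFrame.json")

    # mode='w' (the default) overwrites any existing file.
    frame.to_csv(csv_path, mode="w")
    # orient='table' stores a schema alongside the data.
    frame.to_json(json_path, orient="table")

    # Read both files back while the temporary directory still exists.
    csv_back = pd.read_csv(csv_path, index_col="empid")
    json_back = pd.read_json(json_path, orient="table")

print(csv_back)
print(json_back)
```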
- How to display data with desired columns or rows?
Similar to the Unix head and tail commands:
df.head(5)
df.tail(5)
Select the row whose index key is 'emp09', and only the fname and lname columns:
df.loc['emp09',['fname','lname']]
Select rows by position with the slice 3:5 (positions 3 and 4; the end of the slice is excluded) and columns 0 and 1:
df.iloc[3:5,[0,1]]
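
The difference between label-based and position-based selection is easy to trip over, so here is a small sketch on a made-up six-row frame. Label slices with `loc` include both endpoints, while position slices with `iloc` exclude the end.

```python
import pandas as pd

# Hypothetical frame; the labels mimic the empid index used above.
staff = pd.DataFrame(
    {
        "fname": ["A", "B", "C", "D", "E", "F"],
        "lname": ["u", "v", "w", "x", "y", "z"],
    },
    index=["emp01", "emp02", "emp03", "emp04", "emp05", "emp06"],
)

# Label slice: both 'emp04' and 'emp05' are included.
by_label = staff.loc["emp04":"emp05", ["fname", "lname"]]

# Position slice: 3:5 means positions 3 and 4 only.
by_position = staff.iloc[3:5, [0, 1]]

print(by_label.equals(by_position))
```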
- How to sort and list the data?
mydata2.sort_values(by="grade")
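
A short sketch of sorting on more than one column, assuming a made-up three-row frame. `sort_values` returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical sample rows, invented for this sketch.
staff = pd.DataFrame(
    {"fname": ["Asish", "Bharat", "Amit"], "grade": [4, 5, 4]},
    index=["emp01", "emp02", "emp06"],
)

# Sort by grade descending, then by fname ascending to break ties.
ranked = staff.sort_values(by=["grade", "fname"], ascending=[False, True])

print(ranked)
```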
- How to filter data (similar to SQL WHERE and LIKE)?
officer = mydata2[mydata2["grade"] > 1]
senior = mydata2[mydata2["birthdate"] < "1999-01-01"]
employee = mydata2[mydata2["fname"].str.contains("sis")]
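
Filters can also be combined. The sketch below uses a made-up four-row frame with birthdate already converted to datetime; each condition produces a boolean Series, and the Series are joined with `&` (and), `|` (or), or `~` (not).

```python
import pandas as pd

# Hypothetical sample rows, invented for this sketch.
staff = pd.DataFrame(
    {
        "fname": ["Asish", "Bharat", "Ashish", "Manish"],
        "grade": [4, 5, 3, 1],
        "birthdate": pd.to_datetime(
            ["1977-01-02", "1967-05-05", "1999-07-08", "2000-01-09"]
        ),
    },
    index=["emp01", "emp02", "emp03", "emp08"],
)

# Parentheses are required: & binds more tightly than > and <.
senior_officers = staff[
    (staff["grade"] > 1) & (staff["birthdate"] < "1999-01-01")
]

# str.contains is the closest analogue of SQL LIKE '%sis%';
# it is case-sensitive by default.
sis_names = staff[staff["fname"].str.contains("sis")]

print(senior_officers)
print(sis_names)
```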
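- How to add columns & rows?
Add a column named City with the default value "Mumbai":
mydata2.insert(5, "City","Mumbai", True)
Add a row with the given values (columns that are not supplied are left as NaN):
newrec2 = pd.Series({'fname': 'Manoj', 'lname': 'Pethe'},name='emp10')
# DataFrame.append was removed in pandas 2.0; use pd.concat and assign the result
mydata2 = pd.concat([mydata2, newrec2.to_frame().T])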
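
A self-contained sketch of both steps on a made-up one-row frame, using plain column assignment and `pd.concat`. It shows that columns missing from the new row come back as NaN.

```python
import pandas as pd

# Hypothetical starting frame, invented for this sketch.
staff = pd.DataFrame(
    {"fname": ["Asish"], "lname": ["Kulkarni"], "grade": [4]},
    index=["emp01"],
)

# Add a column with a constant value (broadcast to every row).
staff["City"] = "Mumbai"

# Build the new row as a one-row DataFrame and concatenate it.
new_row = pd.DataFrame({"fname": ["Manoj"], "lname": ["Pethe"]}, index=["emp10"])
staff = pd.concat([staff, new_row])

print(staff)
```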
- How to produce statistics (average / max / min / aggregation / cumsum)?
Run the method on a column; each of the following returns a single value:
mydata2['grade'].sum()
mydata2['grade'].max()
mydata2['grade'].min()
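
The average and cumulative sum mentioned in the heading work the same way. A minimal sketch on a made-up grade column:

```python
import pandas as pd

# Hypothetical grade column, invented for this sketch.
grades = pd.Series(
    [4, 5, 3, 2], index=["emp01", "emp02", "emp03", "emp04"], name="grade"
)

average = grades.mean()    # arithmetic mean, a single value
running = grades.cumsum()  # running total, one value per row
summary = grades.agg(["min", "max", "sum"])  # several aggregates at once

print(average)
print(running)
print(summary)
```

Note that `cumsum` returns a Series of the same length as its input rather than a single value.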
- Group by
mydata2.groupby(['department']).count()
mydata2.groupby(['department','grade']).agg('sum')
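
When different columns need different aggregates, named aggregation keeps the output columns readable. A sketch on made-up data; the output column names `headcount` and `avg_grade` are chosen for this example.

```python
import pandas as pd

# Hypothetical sample rows, invented for this sketch.
staff = pd.DataFrame(
    {
        "department": ["IT", "Finance", "IT", "Sales"],
        "grade": [4, 5, 1, 3],
    }
)

# Each keyword becomes an output column: (source column, aggregate).
per_dept = staff.groupby("department").agg(
    headcount=("grade", "count"),
    avg_grade=("grade", "mean"),
)

print(per_dept)
```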
- Joins
- Merge
Merge DataFrames df1 and df2 with specified left and right suffixes appended to any overlapping columns.
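
A minimal sketch of the merge described above, assuming two made-up frames that share an `empid` key and both have a `grade` column. The suffixes `_old` and `_new` are arbitrary choices for the example; the `how` argument selects the join type (inner, left, right, outer).

```python
import pandas as pd

# Hypothetical frames, invented for this sketch.
df1 = pd.DataFrame({"empid": ["emp01", "emp02", "emp03"], "grade": [4, 5, 3]})
df2 = pd.DataFrame({"empid": ["emp01", "emp02", "emp04"], "grade": [5, 5, 2]})

# Inner join: only keys present in both frames survive.
merged = pd.merge(df1, df2, on="empid", how="inner", suffixes=("_old", "_new"))

# Left join: every row of df1 is kept; unmatched rows get NaN.
left_joined = pd.merge(df1, df2, on="empid", how="left", suffixes=("_old", "_new"))

print(merged)
print(left_joined)
```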
- Delete column
del df['column_name']
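
`del` changes the DataFrame in place. If you would rather keep the original, `drop` returns a new frame instead. A small sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical frame, invented for this sketch.
staff = pd.DataFrame({"fname": ["Asish"], "lname": ["Kulkarni"], "grade": [4]})

# drop returns a copy without the column; staff itself is unchanged.
trimmed = staff.drop(columns=["lname"])

# del removes the column from staff directly.
del staff["grade"]

print(list(trimmed.columns))
print(list(staff.columns))
```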
- Date & Time functions
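
A minimal sketch of a few common date operations, assuming the birthdate and joindate columns were loaded as text. The derived column names `birth_year` and `days_before_joining` are made up for the example.

```python
import pandas as pd

# Hypothetical rows using dates from the employee table above.
dates = pd.DataFrame(
    {
        "birthdate": ["1977-01-02", "1999-07-08"],
        "joindate": ["2014-02-04", "2015-02-07"],
    },
    index=["emp01", "emp03"],
)

# Convert text to datetime64 so date arithmetic works.
dates["birthdate"] = pd.to_datetime(dates["birthdate"])
dates["joindate"] = pd.to_datetime(dates["joindate"])

# The .dt accessor exposes date parts such as year, month, and day.
dates["birth_year"] = dates["birthdate"].dt.year

# Subtracting two datetime columns gives a timedelta column.
dates["days_before_joining"] = (dates["joindate"] - dates["birthdate"]).dt.days

print(dates)
```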
How to schedule a batch to reduce I/O?
The simple answer is to reduce the size of the dataset given to Pandas before you begin the transformation.
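
One way to do that, sketched here on made-up data, is to read only the columns you need (`usecols`) and process the file in pieces (`chunksize`) rather than loading it all at once. The in-memory CSV and the chunk size of 4 are assumptions for the example.

```python
import io

import pandas as pd

# Hypothetical 10-row CSV built in memory, invented for this sketch.
csv_text = "empid,fname,grade,department\n" + "".join(
    f"emp{i:02d},name{i},{i % 5 + 1},IT\n" for i in range(1, 11)
)

total = 0
rows_seen = 0
# usecols skips unneeded columns at parse time; chunksize makes
# read_csv yield DataFrames of at most that many rows.
for chunk in pd.read_csv(
    io.StringIO(csv_text), usecols=["empid", "grade"], chunksize=4
):
    total += chunk["grade"].sum()
    rows_seen += len(chunk)

print(total, rows_seen)
```

When the data comes from a database, filtering in the SQL query itself (a WHERE clause and an explicit column list) has the same effect.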
How to generate a graph?
Since this is getting too lengthy, I will cover it in another article.