
Pandas: Python Data Analysis Library

Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool built on top of the Python programming language. Data analysis is one of the first tasks in a machine learning project and a large part of a data scientist's daily work. The pandas library was designed and developed by Wes McKinney and was first released in January 2008. Its main features are:

  • Creating DataFrames, including ones filled with random data, and manipulating them through their index.
  • Built-in tools to read and write data between in-memory data structures and different file formats.
  • Data preprocessing, such as handling missing data, aligning data, and dropping columns.
  • Merging and splitting datasets.
  • Hierarchical axis indexing to work with high-dimensional data in a lower-dimensional data structure.
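
Merging and hierarchical indexing are the two listed features that benefit most from a concrete case. The sketch below is only an illustration: the `people` and `scores` tables and their column names are invented for the example.

```python
import pandas as pd

# Hypothetical tables, invented for illustration
people = pd.DataFrame({'id': [1, 2, 3], 'name': ['Ann', 'Bob', 'Cy']})
scores = pd.DataFrame({'id': [1, 2, 2],
                       'year': [2020, 2020, 2021],
                       'score': [7.5, 6.0, 8.0]})

# Merging: an inner join keeps only the ids present in both tables
merged = people.merge(scores, on='id', how='inner')

# Hierarchical indexing: two index levels (name, year) on a flat table
by_name_year = merged.set_index(['name', 'year'])['score']
bob_2021 = by_name_year.loc[('Bob', 2021)]
```

Here `merged` has no row for Cy, because id 3 never appears in `scores`, and `bob_2021` is looked up with a (name, year) tuple.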

These are a few of the library's most important features. The numbered steps below show some of them in a few lines of code.

1. To use the pandas library, import it first.

import pandas as pd

2. Creating a DataFrame with 2 columns labelled “numbers” and “chars”. The data for “numbers” is [0,1,2,3,4,5,6,7] and the data for “chars” is 'g' in every row.

df = pd.DataFrame({'numbers': range(8), 'chars':['g']*8})
Fig 1: DataFrame

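3. Releasing the memory occupied by the DataFrame. Deleting the name removes the reference, and Python frees the memory once nothing else refers to the DataFrame.

del df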

4. Creating a DataFrame by reading data from a file. pandas has readers for many formats, such as CSV (read_csv), Excel (read_excel), and JSON (read_json).

Data = pd.read_csv('adult_data_mini.csv', header=0)
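
To try `read_csv` without a file on disk, the sketch below reads from an in-memory string instead. The CSV content and its column names are made up for the example and are not the contents of `adult_data_mini.csv`.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "age,workclass,hours\n39,State-gov,40\n50,Self-emp,13\n38,Private,40\n"

# read_csv accepts a path or any file-like object; header=0 means
# the first line holds the column names
sample = pd.read_csv(io.StringIO(csv_text), header=0)

# Writing goes the other way: to_csv returns a string when no path is given
round_trip = sample.to_csv(index=False)
```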

5. Finding the size of the DataFrame as (rows, columns).

Data.shape

6. Quick view of the DataFrame structure (first 5 rows).

Data.head()

7. Quick view of the DataFrame structure (last 5 rows).

Data.tail()
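
The three inspection calls above can be tried on the small DataFrame from step 2. This is a minimal sketch; the name `demo` is chosen here only to avoid clashing with `Data`.

```python
import pandas as pd

demo = pd.DataFrame({'numbers': range(8), 'chars': ['g'] * 8})

size = demo.shape            # (rows, columns)
first_three = demo.head(3)   # head and tail accept an optional row count
last_two = demo.tail(2)
```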

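8. Add a new row with all of its column values. Older pandas versions used the DataFrame.append method for this; it was deprecated in pandas 1.4 and removed in pandas 2.0, so pd.concat is used here.

# Template
new_line = {'column_name1': 'data', 'column_name2': 'data'}
# Example (df is the DataFrame from step 2; recreate it if you deleted it in step 3)
new_line = {'numbers': 8, 'chars': 'g'}
df = pd.concat([df, pd.DataFrame([new_line])], ignore_index=True)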
Fig 2: Appending data to existing DataFrame
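
In pandas 2.0 and later `DataFrame.append` no longer exists, and `pd.concat` does the same job. When several rows need to be added, it is usually cheaper to collect them in a list and concatenate once than to grow the DataFrame row by row. A sketch, with the extra rows invented for the example:

```python
import pandas as pd

df = pd.DataFrame({'numbers': range(8), 'chars': ['g'] * 8})

# Hypothetical new rows, one dict per row
rows = [{'numbers': n, 'chars': 'g'} for n in range(8, 11)]

# Build one DataFrame from all the rows, then concatenate a single time;
# ignore_index=True renumbers the index from 0
df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)
```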

9. Delete rows or columns with the drop method. Use axis=0 for rows and axis=1 for columns. inplace=True applies the change to the DataFrame itself instead of returning a modified copy.

# Dropping rows: pass a list of index labels
Data.drop([index], axis=0, inplace=True)
# Example: dropping the two rows with index 3 and 5
Data.drop([3, 5], axis=0, inplace=True)

# Dropping columns: pass a list of column names
Data.drop([column_name], axis=1, inplace=True)
# Example: dropping a whole column (assumes a column named 'numbers' exists)
Data.drop(['numbers'], axis=1, inplace=True)
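
Without `inplace=True`, `drop` leaves the original untouched and returns a new DataFrame, which is often easier to reason about. A small sketch on the step 2 data (`demo` is a name chosen for this example):

```python
import pandas as pd

demo = pd.DataFrame({'numbers': range(8), 'chars': ['g'] * 8})

# drop returns a new DataFrame; demo keeps all of its rows and columns
fewer_rows = demo.drop([3, 5], axis=0)
no_numbers = demo.drop(columns=['numbers'])   # same as drop(['numbers'], axis=1)
```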

10. Add a column by assigning data to a new column label.

Data['new column label'] = data_elements
# Example: a new column "condition" that is False for the first 3 rows and True for all the rest
Data['condition'] = [False]*3 + [True]*(len(Data) - 3)
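
A list assigned to a new column needs exactly one entry per row, otherwise pandas raises a ValueError. The sketch below shows that case next to two alternatives, a scalar and a computed column. The column names `source` and `doubled` are invented for the example.

```python
import pandas as pd

demo = pd.DataFrame({'numbers': range(8), 'chars': ['g'] * 8})

# A list must have exactly one entry per row
demo['condition'] = [False] * 3 + [True] * (len(demo) - 3)

# A scalar is repeated for every row
demo['source'] = 'demo'

# An expression on an existing column is computed row by row
demo['doubled'] = demo['numbers'] * 2
```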

11. Information about the columns (the data type of each column).

Data.dtypes

12. More DataFrame information using info() function.

Data.info()

13. NaN (Not a Number) marks a missing value in the data. Missing data is inconvenient to work with, and the fillna() function fills it in. You can pass a fixed value to use for every missing entry, the column's mean() for int and float columns, or its mode() for object columns.

# fixed value
Data['column_label'] = Data['column_label'].fillna(value)
# mean()
Data['column_label'] = Data['column_label'].fillna(Data['column_label'].mean())
# mode() returns a Series, so take its first entry
Data['column_label'] = Data['column_label'].fillna(Data['column_label'].mode()[0])
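
The mean and mode strategies can be checked on a tiny table. This is a sketch with made-up `age` and `city` columns; `isna().sum()` is used to count the missing values per column before filling them.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column
demo = pd.DataFrame({
    'age': [20.0, np.nan, 40.0, 30.0],
    'city': ['Oslo', 'Rome', None, 'Rome'],
})

missing_before = demo.isna().sum()   # missing values per column

# Numeric column: the mean of 20, 40 and 30 is 30
demo['age'] = demo['age'].fillna(demo['age'].mean())

# Object column: the most frequent value is 'Rome'
demo['city'] = demo['city'].fillna(demo['city'].mode()[0])
```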

14. Selecting more than one column.

Data[['column1_label', 'column2_label']]

15. Display a given number of rows from the start or the end. For example, displaying the first and last 4 rows.

#First 4 Rows.
Data[:4]

#Last 4 Rows.
Data[-4:]

16. Display the specified rows and columns.

Data.loc[[index_number],["column1_label","column2_label"]]
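
Note that `loc` selects by index label, not by position; the two only coincide when the index is the default 0, 1, 2, and so on. The sketch below uses a deliberately non-default index (invented for the example) to show the difference from the position-based `iloc`.

```python
import pandas as pd

# Hypothetical DataFrame whose index labels are not 0..n-1
demo = pd.DataFrame(
    {'numbers': range(4), 'chars': list('abcd')},
    index=[10, 20, 30, 40],
)

by_label = demo.loc[[20, 40], ['numbers', 'chars']]   # rows labelled 20 and 40
by_position = demo.iloc[[1, 3], [0, 1]]               # second and fourth rows
```

Both selections return the same two rows here.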

17. Condition-based row selection. Comparison operators: <, >, ==, !=, <=, >=. Logical operators: & (and), | (or). Wrap each comparison in its own parentheses.

Data[(Data['column_label'] <comparison_operator> value) <logical_operator> (Data['column_label'] <comparison_operator> value)]
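
A concrete instance of the template, as a sketch on made-up data:

```python
import pandas as pd

demo = pd.DataFrame({'numbers': range(8), 'chars': list('ggghhhgg')})

# Each comparison sits in its own parentheses because & and |
# bind more tightly than <, > and ==
both = demo[(demo['numbers'] > 2) & (demo['chars'] == 'g')]
either = demo[(demo['numbers'] < 2) | (demo['numbers'] >= 6)]
```

Python's `and` and `or` keywords do not work on whole columns, which is why the element-wise `&` and `|` are used.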

18. Finding the unique elements in a particular column using the “.unique()” function.

Data['column_label'].unique()

19. Column analysis: the “.value_counts()” function counts how many times each unique value occurs in the column.

Data['column_label'].value_counts()
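
The difference between `unique()` and `value_counts()` is easiest to see side by side. A sketch on an invented Series, with `nunique()` added for the plain count of distinct values:

```python
import pandas as pd

chars = pd.Series(['g', 'h', 'g', 'g', 'k', 'h'], name='chars')

distinct = chars.unique()        # distinct values, in order of first appearance
counts = chars.value_counts()    # occurrences per value, most frequent first
how_many = chars.nunique()       # number of distinct values
```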

20. The “sort_values()” function sorts the dataset by one or more columns. inplace=False is the default and returns a sorted copy; with inplace=True the DataFrame itself is reordered.

Data.sort_values(by='column_label')
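
The effect of `inplace` can be sketched on a three-row table (data invented for the example). The `ascending=False` argument reverses the sort order.

```python
import pandas as pd

demo = pd.DataFrame({'numbers': [3, 1, 2], 'chars': ['c', 'a', 'b']})

# Default: returns a sorted copy, demo keeps its original order
sorted_copy = demo.sort_values(by='numbers')

# inplace=True reorders demo itself and returns None
demo.sort_values(by='numbers', ascending=False, inplace=True)
```

The rows keep their original index labels after sorting; call `reset_index(drop=True)` if a fresh 0, 1, 2 index is wanted.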

For more information, please visit the official pandas documentation: https://pandas.pydata.org/docs
