What are data files?
In data science, the term “data files” refers to the raw data that we need to analyze. These files can contain structured or unstructured data, including text, numbers, images, or video. Depending on the type of data, we can use different file formats such as CSV, Excel, JSON, XML, or HDF5.
Reading CSV files
CSV (comma-separated values) is one of the most popular file formats for storing tabular data. Python’s pandas library provides the ‘read_csv’ function, which reads a CSV file and stores it in a pandas DataFrame. Here is an example:
```python
import pandas as pd

# Read the CSV file into a DataFrame
data = pd.read_csv('filename.csv')
```
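Real CSV files often need a few extra options. As a minimal sketch, the example below passes some commonly used ‘read_csv’ parameters (‘sep’, ‘usecols’, ‘dtype’). The sample text, the column names, and the use of ‘io.StringIO’ in place of a file on disk are assumptions made only to keep the example self-contained.

```python
import io

import pandas as pd

# Hypothetical sample data; StringIO stands in for a file on disk
csv_text = "id;name;score\n1;Ana;9.5\n2;Ben;7.0\n"

data = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                    # field delimiter (the default is ",")
    usecols=["id", "score"],    # load only these columns
    dtype={"id": "int64"},      # force a column type instead of inferring it
)

print(data.shape)
print(data.dtypes)
```

Passing a file path such as 'filename.csv' in place of the StringIO object works the same way.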
After the ‘read_csv’ call runs, ‘data’ is a DataFrame containing the data from the CSV file.
Reading Excel files
Excel files are another common file format used to store data. We can read Excel files in Python with the help of the ‘openpyxl’ and ‘xlrd’ libraries. ‘openpyxl’ reads and writes .xlsx files, while ‘xlrd’ only reads files, and since version 2.0 it supports only the legacy .xls format. Here is an example using the ‘xlrd’ library on an .xls file:
```python
import xlrd

# xlrd 2.0 and later reads only .xls files; use openpyxl for .xlsx
workbook = xlrd.open_workbook('filename.xls')
worksheet = workbook.sheet_by_index(0)  # first sheet

# Print every cell, row by row
for row in range(worksheet.nrows):
    for col in range(worksheet.ncols):
        cell_value = worksheet.cell(row, col).value
        print(cell_value)
```
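For .xlsx files, ‘openpyxl’ can be used instead. The sketch below is an illustration under assumptions: it first writes a small workbook to a temporary directory (the file name and cell values are made up) so that it can run on its own, then reads the first sheet back and prints each row.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.xlsx")  # hypothetical file name

    # Create a small workbook so the example has something to read
    wb = Workbook()
    ws = wb.active
    ws.append(["name", "score"])
    ws.append(["Ana", 9.5])
    wb.save(path)

    # Read the first sheet back as plain Python values
    workbook = load_workbook(path, read_only=True)
    worksheet = workbook.worksheets[0]
    rows = list(worksheet.iter_rows(values_only=True))
    workbook.close()  # read-only mode keeps the file open until closed

for row in rows:
    print(row)
```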
This code reads the data from the first sheet of the Excel file and prints it to the console.
Reading JSON files
JSON (JavaScript Object Notation) is a text-based file format that stores data as key-value pairs (objects), ordered lists (arrays), and simple values such as strings and numbers. Python’s built-in ‘json’ module provides functions to read and write JSON files. Here is an example that reads a JSON file:
```python
import json

# Parse the file's contents into Python objects
with open('filename.json', 'r') as f:
    data = json.load(f)
```
After executing this code, ‘data’ holds the parsed contents of the JSON file: a Python dictionary if the top-level JSON value is an object, or a list if it is an array.
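Because the type of the result depends on the file’s top-level value, it can help to check it before use. The following sketch assumes a made-up record and a temporary file name; it writes the record with ‘json.dump’, reads it back with ‘json.load’, and inspects the result.

```python
import json
import os
import tempfile

# Hypothetical record used only for illustration
record = {"name": "Ana", "scores": [9.5, 7.0], "active": True}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.json")

    # Write the record to disk as JSON text
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2)

    # Read it back; JSON objects become dicts, arrays become lists
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)

# Branch on the top-level type before using the data
if isinstance(data, dict):
    print(sorted(data.keys()))
elif isinstance(data, list):
    print(len(data), "items")
```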