After reading a data in R, the 1st thing we should do before processing/enriching/ preparing the data in require format is to retain the required columns that can be used further and remove rest of the columns. This will improve the performance in the subsequent steps.
There could be 2 scenarios
- Dropping list of columns from a data frame
- Keeping required columns
Here is an example of data frame (Testdata)
Scenario 1 – (Dropping/deleting) list of columns from a data frame)
Method 1: Delete column by name
We are going to delete/drop Vendor Type and Country
df= subset(Testdata, select = -c ( Vendor Type, Country))
Note: “-“ sign indicates dropping variables
Make sure the variable/column names should not specify in a quote when using () function
Method 2: Delete column by column index number
We are going to delete/drop Payment ID, Country and Sales
df = subset (Testdata[-c(2,4:5)]
Note: 2,4,5 are position of the variable in the data frame
Method 3: Delete columns by index number using “dplyr” package (install package “dplyr” )
We are going to delete/drop Payment ID, Country and Sales
df = select (Testdata,-2,-4:-5]
Note: 2,4,5 are position of the variable in the data frame
Method 4: Delete columns by name using “dplyr” package (install package “dplyr”)
We are going to delete/drop Payment ID, Country and Sales
There are 2 ways:
df = select (Testdata, -Payment ID, - Country, - Sales] (or) df = select (Testdata, -C (Payment ID, Country, Sales)]
Method 5: Delete columns whose name starts with e.g. “Vendor”
df = Testdata[,!grepl(“^Vendor”, names(Testdata))]
Note: “!” sign indicates negation, It will retain column Payment ID, country, sales and invoice number
Method 6: Delete columns whose name ends with letter “e”
df = Testdata[,!grepl(“e$”, names(Testdata))]
Note: This will remove column that end with “e” that is “Vendor Name” and “Vendor Type”
Method 7: Delete columns whose name contains “Type”
df = Testdata[,!grepl(“*Type”, names(Testdata))]
Note: This will remove column that end with “Type” that is “Vendor Type”
Scenario 2 – (Keeping required columns in a data frame)
Method 1: Keep column by name
We are going to keep Vendor Type and Country
df= subset(Testdata, select = c ( Vendor Type, Country))
Note: Make sure the variable/column names should not specify in a quote when using () function
Method 2: Keep column by column index number
We are going to keep Payment ID, Country and Sales
df = subset (Testdata[c(2,4:5)]
Note: 2,4,5 are position of the variable in the data frame
Method 3: Keep columns by index number using “dplyr” package (install package “dplyr” )
We are going to keep Payment ID, Country and Sales
df = select (Testdata,2,4:5]
Note: 2,4,5 are position of the variable in the data frame
Method 4: Keep columns by name using “dplyr” package (install package “dplyr”)
We are going to keep Payment ID, Country and Sales
There are 2 ways:
df = select (Testdata, Payment ID, Country, Sales] (or) df = select (Testdata, C (Payment ID, Country, Sales)]
Method 5: Keep columns whose name starts with e.g. “Vendor”
df = Testdata[,grepl(“^Vendor”, names(Testdata))]
Method 6: Keep columns whose name ends with letter “e”
df = Testdata[,grepl(“e$”, names(Testdata))]
Note: This will retains columns that end with “e” that is “Vendor Name” and “Vendor Type”
Method 7: keep columns whose name contains “Type”
df = Testdata[,grepl(“*Type”, names(Testdata))]
Note: This will retain column that end with “Type” that is “Vendor Type”