eCommerce data export/import – Clean Up Customer Address

When migrating from one eCommerce platform to another, one of the most tedious task is data migration. While orders usually reside in your CRM so there is not need for migration, customer data however needs to be migrated such as their name, email, phone, address and marketing email/SMS subscription status. Among which, address is often filled with typos and other illegal characters making this almost an impossible task if you have a large database. This is largely a criticise to Magento 2 as even at 2.4 there is not proper in build method for address validation!

Customer data are valuable to eCommerce businesses so I’m going to explain how I tackle this step by step:

In your customer export file (I actually prefer to use order export file, then get the address from there but it does require a custom order export module), you can try filter the problematic address by one of the illegal characters then cycle through each one.

Joint the address fields and request ChatGPT to correct the address for you. It will sometimes unhappy about the data processing, you just need to be creative when dealing with AI tools such as using reverse psychology!

Once you get the returned address, paste it back to Excel and you can remove the ordered list number you using below Excel function:

=SUBSTITUTE(A1, LEFT(A1, FIND(".", A1)), "")

This formula removes the first number and period from the address while keeping the rest of the text.

Your rectified address will look like this:

Address 1, Address 2, City, State, Postcode, Country.

The problem is they are still in one cell, so you will need below Excel functions to split the whole address into multiple fields. The default text to column is limited because you don’t always have the address field 2 available.

=IF(ISNUMBER(FIND(",", I2)), TRIM(RIGHT(SUBSTITUTE(I2, ",", REPT(" ", LEN(I2))), LEN(I2))), "") 

This formula will extract the text after the last comma and remove any leading or trailing spaces. If a comma is not found, the formula returns an empty string (“”).

Once the address field is split into a new column, you need to remove it from the existing cell:

=IF(ISNUMBER(FIND(",", I2)), LEFT(I2, FIND("@", SUBSTITUTE(I2, ",", "@", LEN(I2) - LEN(SUBSTITUTE(I2, ",", ""))))-1), I2) 

This removes everything after the last comma in a cell. If a comma is not found, the formula simply returns the original cell value (I2) without any modification. You can then repeat above two steps until all the address fields are in separate fields.

So basically here is what we need to do in the address clean up process:

  1. filter problematic addresses
  2. combine all address fields into one address
  3. correct the problematic addresses by using AI tools such as ChatGPT, you can also use Python to do this by query against Google Map API.
  4. ChatGPT ususally return a list of data with trailing ordered list number, you need to remove that first.
  5. Use the last two functions to split the address into all the required address fields.

Now after many hours of work, you have a cleaned address data ready to be imported to your new eCommerce platform and that’s a job well done!

By Ethan

To many, a business is a lifetime commitment. It's easy to start one yet difficult to make it successful. Attitude, skills, experiences and dedication help hone the craft along the way, but it's often the great vision and resilience to remain focused wins the game. Read more about me here