Managing Test Data All The Way To Production

From CitconWiki

Basic notes -> needs to be edited for readability:

very common to write tests against sample data, but they won't be deployed until run against more realistic data

the big concern is that dev/CI has a certain type of data vs. the prod-like data, which we do not have access to

dream is to have an oracle-type thing which can go get results from prod-like data

another idea would be to have prod like data in dev

for every test, you can calculate/create your data and therefore expected results from real data

- expense of doing this is too high

how to handle steps that have different implementations in prod-like vs. dev?

if the app logic is used for both data setup and testing, "false == false is true"
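
A minimal sketch of the "false == false is true" trap, assuming a made-up `apply_discount` function standing in for the app logic (the function, its bug, and the numbers are hypothetical, not from the session). When the expected value comes from the same logic under test, the bug sits on both sides of the comparison and the check passes anyway:

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical app logic with a deliberate bug: it adds the discount."""
    return price_cents + price_cents * percent // 100  # bug: should subtract


def tautological_check() -> bool:
    # Expected value is produced by the same app logic that is being tested,
    # so the bug appears on both sides and the comparison still holds.
    expected = apply_discount(1000, 10)
    return apply_discount(1000, 10) == expected


def independent_check() -> bool:
    # Expected value is worked out by hand: 10% off 1000 cents is 900 cents.
    expected = 900
    return apply_discount(1000, 10) == expected


if __name__ == "__main__":
    print("tautological check passes:", tautological_check())
    print("independent check passes:", independent_check())
```

The tautological check stays green while the hand-calculated one goes red, which is the argument for setting up data or expectations through something other than the app itself.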

some things only need to be run in dev/CI, not in prod-like

flag individual tests as "can be run in production" and then bring in new steps for those that can't be

- identify the highest-value thing to test in prod and implement that test there

- tests continue to run against prod, so if different types of data come into prod, a test can turn red
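
One way the flagging idea could look, as a rough sketch only: a small registry where each check declares whether it may run in production. The `register` decorator, the `select_checks` helper, and the two example checks are all hypothetical names invented for illustration; a real test framework's own tagging or marker feature would normally do this job.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: check name -> (function, may it run against prod?)
_REGISTRY: Dict[str, Tuple[Callable[[], None], bool]] = {}


def register(prod_safe: bool = False) -> Callable:
    """Record a check and whether it is flagged "can be run in production"."""
    def decorator(fn: Callable[[], None]) -> Callable[[], None]:
        _REGISTRY[fn.__name__] = (fn, prod_safe)
        return fn
    return decorator


@register(prod_safe=True)
def check_homepage_responds() -> None:
    # Read-only check; the body is a placeholder for a real request.
    assert True


@register()
def check_delete_all_orders() -> None:
    # Destructive check; left unflagged so it stays in dev/CI.
    assert True


def select_checks(environment: str) -> List[str]:
    """Return the names of checks allowed in the given environment."""
    if environment == "prod":
        return sorted(name for name, (_, safe) in _REGISTRY.items() if safe)
    return sorted(_REGISTRY)


if __name__ == "__main__":
    print("prod:", select_checks("prod"))
    print("ci:", select_checks("ci"))
```

Defaulting to "not prod-safe" means a new check has to be flagged deliberately before it can touch production.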

can do random generation of data or pre-set data

- random can be slow so not worth it

- most people do pre-set data
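
A small sketch contrasting the two options, assuming a made-up user record with `name` and `age` fields (hypothetical, not from the session). Seeding the generator is one possible way to keep random data reproducible between runs; it does not address the speed concern noted above.

```python
import random
from typing import Dict, List, Union

Record = Dict[str, Union[str, int]]

# Pre-set data: fixed records chosen by hand.
PRESET_USERS: List[Record] = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 45},
]


def random_users(count: int, seed: int) -> List[Record]:
    """Generate random users; the same seed always yields the same records."""
    rng = random.Random(seed)  # private generator, does not touch global state
    return [
        {"name": f"user{i}", "age": rng.randint(18, 90)}
        for i in range(count)
    ]


if __name__ == "__main__":
    print(PRESET_USERS)
    print(random_users(3, seed=42))
```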

question: why do we care about sandbox data vs "real" prod data?

- volume

- diversity

are tests for anything other than holding regression?

- this is the problem and why the deploy guys do not want to accept your sandbox-data-based tests

- need to build trust with prod by proving you are using prod-like data

is an imaginative QA scenario useful if it has never happened in prod? do we care about that bug then?

- does it require that someone manually defines what data is required to be passing before going to prod?

is it valuable to run automated tests against prod?

- NO: more worthwhile to monitor prod, because prod should find things naturally, not through automation

- NO: if something is caught in prod it will be pushed back to the CI/Local stage as a test

- YES: it is worth us finding it first

how to make effective sandbox data

- straight snapshots

-- problem is that it needs to stay in sync

-- must anonymize the data

- have a tool which can identify boundaries in a prod database to create the sandbox

- completely man-made data

-- possibly with QuickCheck(?), which can create your data based on rules

- create the relationships through the actual app
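
For the snapshot option, a minimal sketch of the anonymizing step, assuming rows arrive as plain dictionaries and that a salted SHA-256 hash is an acceptable replacement for sensitive values. The field names, the salt, and the helper names are hypothetical. Strictly speaking this is pseudonymization rather than full anonymization: equal inputs map to equal tokens, which keeps relationships between rows intact but can also leak information.

```python
import hashlib
from typing import Any, Dict, List, Set


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a short, stable token derived from a salted hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def anonymize_rows(
    rows: List[Dict[str, Any]], sensitive_fields: Set[str], salt: str
) -> List[Dict[str, Any]]:
    """Return copies of rows with sensitive fields replaced by tokens."""
    return [
        {
            key: pseudonymize(str(value), salt) if key in sensitive_fields else value
            for key, value in row.items()
        }
        for row in rows
    ]


if __name__ == "__main__":
    snapshot = [
        {"order_id": 1, "email": "a@example.com", "total": 20},
        {"order_id": 2, "email": "a@example.com", "total": 35},
    ]
    for row in anonymize_rows(snapshot, {"email"}, salt="sandbox-salt"):
        print(row)
```

Both orders keep the same token in `email`, so "two orders from one customer" survives in the sandbox even though the address itself does not.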

load testing can be handled against either prod-like or sandbox data

two different types of load testing needed

- a bottleneck in the backend code

- traffic concerns with parallel computing