Judging From Experience: Things You NEVER Do To Big Data
by QArea Expert on May 15, 2015
As a company that provides outsourcing solutions, we have been through many projects. In the worst cases we had to fix what other teams before us had failed at. It seems odd that so many businesses choose price over quality and outsource to the cheapest Indian service providers without even a glance at their portfolio or qualifications.
No, I do understand that we are not the only fish in the sea, but seriously, why choose a partner that will almost certainly fail the project? You can always tell unqualified businesses apart from the rest by their experience, website, pricing policy, ways of communicating, and so on. However, that’s not our point right now. What are the worst things we have seen people do to something as vital as Big Data?
- Using MongoDB as the platform of choice is just wrong in many ways. It’s not that Mongo is terrible across the board, no. It is really good at numerous things if it’s your operational database. Still, it’s a terrible analytical system. In simple words: you don’t analyze with Mongo, but you can collect data with it for further analysis elsewhere.
- Data ponds are a bad decision. Divide and conquer does not work with Big Data the way you might expect. If every business group creates its own data pond on the way to the data lake, you will end up with results that are not as good as you want them to be. Data gets changed, shifted and manipulated in each pond, so once everything is collected together you have multiple answers to the same question. You see, dividing data is not bad in itself, but making too many separate ponds is terrible. Plan ahead, but don’t try to structure every single detail. Go for the most general queries.
- SQL is not the only possible solution, and not everything related to Big Data can be achieved with SQL alone. Hive, MapReduce, Pig, Oozie: all of them were created for a purpose, so refusing to use them is plain stupidity and thickheadedness.
- Did you know that HDFS is not a file system in the everyday sense? It’s not a place where you dump some files and you are done. Sure, there are tools that assist with many things, like the Hive and Pig mentioned earlier, but that really is no excuse to mindlessly dump everything in without a second thought. Big Data simply does not work that way. You have to plan where you are putting your data and why you are doing so. There are also security measures to think about, which means you must know what you need to protect.
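To make the data pond point from the list above a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the `RAW_ORDERS` records, the two teams, and the function names `sales_pond`, `finance_pond` and `total_revenue` are assumptions, not anything from a real project. It only shows how two private copies of the same raw data, each cleaned in its own way, answer the same question differently.

```python
# Hypothetical raw records that every team starts from (illustrative data only).
RAW_ORDERS = [
    {"customer": "a", "amount": 120.0, "status": "paid"},
    {"customer": "b", "amount": 80.0, "status": "refunded"},
    {"customer": "c", "amount": 50.0, "status": "paid"},
    {"customer": "a", "amount": 30.0, "status": "pending"},
]


def sales_pond(orders):
    """Sales keeps a private copy in which every order counts."""
    return [dict(order) for order in orders]


def finance_pond(orders):
    """Finance keeps a private copy with everything unpaid filtered out."""
    return [dict(order) for order in orders if order["status"] == "paid"]


def total_revenue(orders):
    """The same question, asked of whichever pond you happen to query."""
    return sum(order["amount"] for order in orders)


sales_answer = total_revenue(sales_pond(RAW_ORDERS))
finance_answer = total_revenue(finance_pond(RAW_ORDERS))

# Same raw data, same question, two different answers.
print("sales pond:  ", sales_answer)
print("finance pond:", finance_answer)
```

Neither team is wrong by its own rules. The trouble starts when both numbers end up in the same lake and somebody has to decide which one is "revenue".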
Surely this is in no way a complete list of the horrible things that can be done to Big Data, but these are the ones I personally hate the most, and I do advise you: don’t allow them to happen. They will never lead to any good.