Global Knowledge

Doug Cutting is the creator and founder of Apache Hadoop. He also created other successful open-source projects, including Nutch and Lucene. Doug joined Cloudera as an architect in 2009, coming from Yahoo. He is also chairman of the Apache Software Foundation, which supports open-source projects that provide free software to a large user base.
We spoke with him about his role at Cloudera as well as the training options for IT professionals interested in Big Data. We also discussed how Cloudera compares to Red Hat.
Global Knowledge: When did Apache Hadoop become a breakthrough technology for you? Was there an “Aha!” moment?
Doug Cutting (Doug): It was obvious to me immediately. I realized that it would allow us to do things we could not do before. But I was mainly interested in search engines, and I didn’t consider other applications. I was just trying to achieve what I wanted. If someone had asked, I would have said it could be useful for other purposes, but I wasn’t thinking about challenging database technology.
Global Knowledge: Big Data has a lot to offer. What are the biggest myths and limitations of this technology?
Doug: Some believe Big Data is Big Brother, a way of collecting and analyzing everything and then using it. It is also often confused with the cloud. Although Hadoop technology can be used to collect and analyze more data, it does not actually perform the analysis. It is only the plumbing that gives people access to that data.
Global Knowledge: Is usage split between self-hosted and hosted deployments?
Doug: The vast majority of Cloudera’s customers are self-hosting. Many people begin by hosting their clusters in the cloud. As they use them more, they may decide to move them in-house. It can be very costly to host your cluster in the cloud if you use it all day. The cloud is not a one-size-fits-all approach to Big Data. It can be difficult to transfer data from your data center to the cloud, and you may have some of your data there but not the rest.
Global Knowledge: Cloudera is often compared to an early Red Hat for Big Data. Do you think this is fair? Or do you believe there are real business differences between the two?
Doug: That’s a great analogy. We are starting to see Hadoop become the kernel of a distributed OS, and Cloudera packages a distribution built around that kernel. It is a different area, though, because we are talking about an operating platform that runs on top of Linux and across many machines. We don’t offer a solution; we offer a platform that can be used for general purposes. We work with other vendors to make sure the platform works and to promote it.
Global Knowledge: How do you maintain a technology that has been embraced by traditional software vendors such as IBM and HP? What does this relationship look like, and what effect does it have on Apache Hadoop?
Doug: It’s been amazing, especially seeing Hadoop become a part of the Microsoft and Oracle platforms so quickly. I had expected that they would develop their own technology and compete. Instead, they chose to join the open-source community and help build Hadoop. It’s been a great experience for the Hadoop community.