The citrus industry as a whole collects huge amounts of data. In addition to the data gathered by numerous government programs, growers collect soil, variety, rootstock, yield and financial data to optimize production and returns. Different organizations focus on different kinds of data, as dictated by their needs. And now, organizations must share more knowledge among themselves if field solutions to problems such as HLB (huanglongbing, or citrus greening) are to be found and implemented rapidly.
Newsletters and reports are a common way to share knowledge among industry stakeholders. But what if we want to examine relationships between sets of data housed at multiple organizations? All farms are part of a similar ecosystem, and information collected across many farms can improve decision-making. Consider the following questions:
- What are some specific relationships among soil, variety, rootstock, disease and yield data?
- Which nutrient management programs provide the best results for most growers?
- Does citrus grown on certain soil types have a higher Brix, acid or Brix/acid ratio than citrus grown on other soil types?
- Are unusual responses in a particular variety/rootstock/soil combination occurring around the state that I need to know about before planting?
Research certainly provides valuable guidance. But short of a lot of copying and pasting, using precise data to answer basic questions like these, and dozens of other daily production questions, seems to be getting more difficult. Many people refer to this precise data-collection and analysis activity as “precision agriculture”. The idea of precision agriculture has been around for more than 20 years, but it has only recently gained a lot of traction. Two reasons are that the citrus landscape is changing more rapidly and that the amount of field data is exploding. There often seems to be too little time to collect, organize and analyze data about a production problem before the situation changes. Is there a faster way to obtain the data and knowledge we need? Computers, drones, aerial imaging and other technologies can help collect quantitative data. Another answer lies in being able to work with unstructured data. To do this, we need to learn more about “Big Data”.
What is Big Data?
You’ve probably heard the term “Big Data” used to refer to massive sets of data, such as those collected by Google, Amazon, Facebook or even the government. Forbes provides what I think is one of the best definitions of Big Data because it gets to the core of what we need to do (emphasis added).
“Big data is a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.” (www.forbes.com/sites/lisaarthur/2013/08/15/what-is-big-data/).
There are privacy considerations with large-scale data sharing, and we’ll return to those shortly. For now, though, the main advantage of Big Data I’d like to highlight is that it provides powerful tools to rapidly collect and analyze both structured and unstructured information in real time. Let’s take a closer look at the difference between the two and how each can help solve HLB management problems.
Structured data
In the citrus industry, data is typically “structured”: collected and then organized to facilitate analysis. Structured data is most easily thought of as the kind you find in spreadsheets, tables and databases, whether it is collected by drones, scouts or similar methods. Smaller growers tend to use spreadsheets, while larger companies may also use relational databases, custom software and GIS applications. Forward-thinking firms probably rely to a large extent on cloud-based software-as-a-service. For business reasons, different organizations structure data differently. The accounting department at Grower X is not likely to use the same data structure as the Florida Department of Agriculture, for example, nor would we expect it to. Locally managed structured data is very useful, but it has limitations when it comes to uncovering new knowledge in rapidly changing environments. What the citrus industry needs to get better at is collecting and analyzing unstructured data, and doing so across many farms. Information collected across different kinds of farms or ecosystems can even be designed to work together to improve decision-making.
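To make the "different structures" problem concrete, here is a minimal Python sketch, assuming two hypothetical sources (“Grower X” and a packer labelled “Packer Y”) whose records use different field names. Each source gets a small mapping into one shared schema, after which the pooled records can answer a question like the soil-type and Brix one posed earlier. All field names, soil labels and Brix values are made up for illustration; they are not measurements.

```python
from collections import defaultdict
from statistics import mean

# Made-up example rows (not real measurements). Two hypothetical sources
# record the same kind of information under different field names.
GROWER_X_ROWS = [
    {"Block": "A1", "Soil": "Soil A", "Brix_deg": 11.2},
    {"Block": "A2", "Soil": "Soil B", "Brix_deg": 10.4},
]
PACKER_Y_ROWS = [
    {"grove_id": "G-17", "soil_series": " soil a", "brix": 11.8},
    {"grove_id": "G-22", "soil_series": "SOIL C", "brix": 12.1},
]


def from_grower_x(row):
    """Map one Grower X row into the shared (hypothetical) schema."""
    return {"site": "X/" + row["Block"], "soil": row["Soil"], "brix": row["Brix_deg"]}


def from_packer_y(row):
    """Map one Packer Y row into the same shared schema."""
    return {"site": "Y/" + row["grove_id"], "soil": row["soil_series"], "brix": row["brix"]}


def harmonize():
    """Pool both sources and normalize soil labels so they group together."""
    records = [from_grower_x(r) for r in GROWER_X_ROWS]
    records += [from_packer_y(r) for r in PACKER_Y_ROWS]
    for r in records:
        r["soil"] = r["soil"].strip().lower()
    return records


def mean_brix_by_soil(records):
    """Average Brix for each soil label across all pooled sites."""
    groups = defaultdict(list)
    for r in records:
        groups[r["soil"]].append(r["brix"])
    return {soil: round(mean(vals), 2) for soil, vals in groups.items()}


if __name__ == "__main__":
    for soil, brix in sorted(mean_brix_by_soil(harmonize()).items()):
        print(f"{soil}: {brix}")
```

Keeping each source's mapping in its own small function means a new participant needs only one new mapping, while the pooled analysis stays unchanged.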
Unstructured data
Unstructured data is rich content that does not follow a specified format. In the citrus industry, it is used more for general communications than for data analysis per se. Examples of unstructured data include:
- Text files
- Audio files
- Video files
- Panel discussions
- Phone conferences
- Weekly email reports
Data from sources like these is fragmented and dispersed, but it is also growing rapidly. Data can also be “multi-structured”: it comes in a variety of formats derived from interactions between people and machines, such as a blog. All organizations have this kind of information, but it is rarely treated as data that can be explored. Collectively, structured and unstructured data are often what people mean when they mention Big Data, but there’s really much more to it than that.
Intel tells us that Big Data has three defining characteristics, the so-called “Three V’s”: Volume, Variety and Velocity (intel.ly/1g5ZPsn). The rapid growth and scale of unstructured data is outpacing traditional sources and storage capabilities (Volume). The information comes from a diverse range of materials not typically relied on as data sources in the past (Variety). And it needs to be processed in real time to obtain the greatest benefit (Velocity). The good news is that new technologies can harvest and parse information in these forms and apply artificial intelligence methods to analyze it. Formal experiments are not needed to gain value from Big Data. The confidence that statistical analysis lends to scientific experiments could also be addressed by including a huge variety of locations and organizations in a Big Data operation, and knowledge that might be undetectable in other ways can be revealed. The effort thus moves away from figuring out how to organize massive amounts of information; instead, automation puts the data to work for us.
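The idea that breadth can supply some of the statistical confidence of formal trials can be made concrete with a textbook formula. Assuming each participating location contributes one independent observation with standard deviation σ, the standard error of the pooled mean shrinks with the square root of the number of locations n:

$$\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$

Going from 25 to 400 locations, for example, cuts the standard error by a factor of √(400/25) = 4. More locations do not, however, remove systematic bias, such as growers being more likely to report successes than failures.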
Importance of visual data
Some organizations are breaking new ground in data analytics and are incorporating some principles of Big Data into their programs. One such firm is Plant Food Systems, Inc. (PFS). I believe PFS sets itself apart through its attention to detail in the design of its data systems and the accuracy of its photographic records. Drew Dyess, PFS Field Specialist, explains that photo-documentation is an important part of their citrus tree health screening program. (PFS provides once-a-month field inspections to its customers at no cost.) Drew is working to develop a reporting protocol that combines accurately exposed photos of trees with visual inspections and data about responses to nutrition programs (Fig. 1). Parameters observed include vegetation, soil, roots, leaf and fruit drop, and more. PFS hopes to develop animations and other innovative techniques to show changes in tree condition over time. It can be difficult to describe these changes after the fact, but PFS’ visual approach can help growers more clearly understand season-to-season and year-to-year changes in disease symptoms and tree health. Currently, the PFS inspection program serves both large and small growers from Apopka to Immokalee, and it can be used along with tissue, soil and yield data to identify the most profitable management programs.
Bringing Big Data to Citrus
Two other types of unstructured data that growers rely on for decision-making are word-of-mouth and email. What if we could somehow harvest some of that communication traffic, collect it in a computer system and analyze it for solutions to HLB? One way to do so would be to create a software application that could take, say, weekly call-ins from growers in which they describe conversationally (much as you would leave a voice message) what they did recently that gave good (or bad) results. Growers who didn’t want to call might prefer to forward an email to the system instead. (Many companies already produce weekly reports that could be forwarded here.)
Our app would transcribe the calls and then use “Natural Language Processing” to parse the transcripts and emails so that pertinent data is harvested. An artificial intelligence system such as a neural network (similar to those used to detect credit card fraud) could analyze the text, looking for key words and trends (“text analytics”). Eventually, what’s working best for most growers would bubble to the top. When growers call in or forward email, they could mention the varieties, rootstocks, insecticides and nutritional product lines giving them the best results. They simply describe what they are doing, just as they would to a colleague, and the software parses the information into a form suitable for analysis. Web reporting, of course, would be part of the ultimate presentation of what is discovered. Growers would have an interface to query, mine, search and report on the information developed by the neural network.
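The parsing and “text analytics” steps described here could be prototyped far more simply than with a neural network. This Python sketch swaps in plain keyword matching: it assumes the calls have already been transcribed to text, splits each report into sentences, and tallies how often a hypothetical list of rootstocks and nutrition programs appears in sentences containing positive or negative outcome words. The vocabularies, the scoring rule and the sample reports are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical vocabularies. A production system would learn terms and
# sentiment from data; these short lists only illustrate the idea.
TERMS = ["rootstock x", "rootstock y", "nutrition program a", "nutrition program b"]
POSITIVE = {"good", "better", "improved", "strong"}
NEGATIVE = {"bad", "worse", "poor", "declined"}

# Made-up transcripts standing in for call-ins and forwarded emails.
SAMPLE_REPORTS = [
    "Trees on rootstock X looked good this month. Nutrition program A gave better fruit set.",
    "Nutrition program B results were poor. Rootstock Y blocks improved after we adjusted irrigation.",
    "Nutrition program A held up well and leaf drop improved.",
]


def sentences(text):
    """Split a report into lower-cased, non-empty sentences."""
    return [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]


def mentions(sentence):
    """Return the vocabulary terms that appear as whole words in a sentence."""
    return [t for t in TERMS if re.search(r"\b" + re.escape(t) + r"\b", sentence)]


def rank_terms(reports):
    """Net score per term: positive-sentence mentions minus negative ones."""
    pos, neg = Counter(), Counter()
    for report in reports:
        for sentence in sentences(report):
            words = set(re.findall(r"[a-z']+", sentence))
            found = mentions(sentence)
            if words & POSITIVE:
                pos.update(found)
            if words & NEGATIVE:
                neg.update(found)
    net = {t: pos[t] - neg[t] for t in set(pos) | set(neg)}
    # Highest net score first; ties broken alphabetically.
    return sorted(net.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for term, score in rank_terms(SAMPLE_REPORTS):
        print(f"{score:+d}  {term}")
```

The terms with the highest net scores are a crude version of what’s working best “bubbling to the top”. A real system would also need to handle negation (“not good”), synonyms and misspellings, which is where trained language models become worthwhile.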
Technology similar to this is already used in many areas, including healthcare, where it helps identify treatment solutions (“prescriptive analytics”) (www.ibm.com/smarterplanet/us/en/ibmwatson/). In citrus, our system would involve growers making a single five-minute phone call per week or simply taking a moment to forward an email they had already written anyway. Participation would be easy and would involve minimal computer work on the growers’ part, and the system would gather information from many people and organizations into one collection so that data can be shared without sacrificing privacy. The app could be refined further during development (registration, obfuscation, call quotas, other data sources). I’ve described it in relation to field management, but it can also work with mid- and long-term solutions.
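The “obfuscation” refinement could start with pseudonymization. This minimal Python sketch, assuming hypothetical field names such as grower_id and grower_name, strips direct identifiers from a submission and replaces the grower ID with a keyed hash (HMAC-SHA256), so the same grower always maps to the same code while anyone without the operator’s key cannot link that code back to a grower. This alone does not guarantee anonymity: the free text of a report can still identify a grove.

```python
import hashlib
import hmac

# Hypothetical key held only by the system operator. In practice it would be
# loaded from secure storage, never written into source code.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Hypothetical fields that identify a grower directly and are never pooled.
DIRECT_IDENTIFIERS = {"grower_id", "grower_name", "phone", "email"}


def pseudonym(grower_id: str) -> str:
    """Stable keyed code for a grower; linking it back requires the key."""
    mac = hmac.new(SECRET_KEY, grower_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:12]


def obfuscate(submission: dict) -> dict:
    """Drop direct identifiers and attach a pseudonymous participant code."""
    cleaned = {k: v for k, v in submission.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant"] = pseudonym(submission["grower_id"])
    return cleaned


if __name__ == "__main__":
    example = {
        "grower_id": "G-001",
        "grower_name": "Example Grove Co.",
        "phone": "555-0100",
        "county": "Polk",
        "transcript": "Nutrition program A gave better fruit set this month.",
    }
    print(obfuscate(example))
```

A keyed hash is used rather than a plain one because an unkeyed SHA-256 of a short ID could be reversed simply by hashing every plausible ID and comparing.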
Privacy and Big Data
Some organizations may not wish to share in-house corporate data in a public forum. There are valid privacy reasons why a grower would not want to make financial or insect survey data from his groves available to others, for example. In fact, several weeks ago farmers from around the nation met in Washington, and a main topic was farmer involvement in Big Data programs. A number of companies providing products and services to the agricultural industry expressed concern about the amount and type of data that might become public. Methods for protecting privacy and business interests must be part of any discussion centering on data sharing and exploration. Whatever program is developed, participation should be voluntary. This way, a balance can be found between data that can accelerate HLB solutions and data that must be kept confidential.
Getting serious about Big Data
According to IBM, there are three steps an organization or industry should follow to take advantage of what Big Data has to offer:
- Build a culture that infuses Big Data analytics into daily operations;
- Be proactive about privacy, security and governance;
- Get serious about Big Data and invest in a platform suitable for the task.
Tom Turpen, Program Manager for the Citrus Research and Development Foundation, tells us, “The only way to speed new solutions for HLB is for both researchers and growers to get better at sharing information. There is not enough time for traditional development cycles with extensive greenhouse testing and trials that may not extrapolate to commercial production practices.” Growers need to understand Big Data and what it can do for them. One of its most important features is that it reduces the time it takes to generate usable knowledge from large amounts of data. Big Data can reveal what you don’t know. The industry needs to adopt new ways of working with multi-structured data as a valuable information asset. Growers need to become part of the data economy and start participating in Big Data programs to collect and share what they find with others more efficiently.
Author disclaimer
The information in this article is provided “as is”. The author and publisher of this article disclaim any loss or liability, either directly or indirectly, as a consequence of reading or applying any information presented herein, or in regard to the dissemination, use or application of said information for any purpose whatsoever. No guarantee is given, either expressed or implied, in regard to the accuracy, merchantability or acceptability of this information.