The edge cloud then computes a prediction using the last CPU value it received.

However, it requires that the device and temperature sensor be co-located so that we can continuously update the regression coefficients. We therefore consider a second configuration that does not continuously update the coefficients using the most recent temperature data. We refer to this configuration as a “practical application” of our approach. For this configuration, we co-locate a temperature sensor with each device for a short, fixed period of time, which we refer to as the calibration period. We then remove the temperature sensor and apply the regression coefficients from the calibration period to CPU measurements reported by the device to predict the outdoor temperature at the device.

For the calibrated results, we use the algorithm above with minor modifications. The remote device transmits only its CPU measurement values via low-power radio to the edge cloud every 5 minutes. The edge cloud keeps a CPU history from the device for the same duration as the calibration period. It smooths these values if necessary and chooses the best-performing smoothing parameterization using R². For the results in this paper, we compare this prediction against that from a co-located temperature sensor. However, we only use data from this co-located sensor to compute the prediction error after the devices have been “separated”.

The locations include a residential backyard in Goleta, CA, an experimental citrus farm at the Lindcove Research and Extension Center (LREC) in Exeter, CA, and an experimental almond farm on the campus of California State University in Fresno, CA. There are multiple Pi Zero devices at the Goleta location, a Pi Zero and a Pi 3 at LREC, and a NUC at Fresno State. All devices are in shaded, weatherproof enclosures outdoors; the NUC is in a tin shed housing a powered irrigation pump next to the almond orchard.

Each location is very different in terms of its vegetation and topography. LREC is located in the foothills of the Sierra mountains; the Fresno State farm is flat and in the central valley of California; and the Goleta residence is near the ocean. We measure atmospheric temperature using device-attached temperature sensors, which we refer to as DHT, for the Goleta devices; a high-end weather station at LREC called the Flux tower; and the nearest Weather Underground station in Fresno. We also consider a Weather Underground station for the Goleta devices.

We begin by examining the effect of smoothing on each regression as part of a “limit study”. To do so, we compare SSA and no smoothing over a number of different training window sizes. As described previously, we use the regression coefficients for the number of lags for SSA that results in the highest R² value. We detail the effect of smoothing and training window size on the accuracy of regression-based temperature prediction. At time step t we predict the outdoor temperature at time step t + 1. Since an application may need the temperature at an arbitrary moment in time, this prediction error serves as an upper bound on the error that an application which is not time-synchronized with the measurement system might experience. We then compare different sources for predictions, and we conclude with results showing the application of our approach in a practical IoT setting.

In Figure 4.3 we show the Mean Absolute Error (MAE) for the one-step-ahead prediction as a function of history size. Each graph compares the effect on prediction accuracy of different smoothing methods for the different locations and devices for a prediction period of 3 days. The x-axis is the training window (TW) size; the y-axis shows errors in °F. From the graphs in Figure 4.3, we see that SSA improves prediction accuracy compared to the absence of smoothing. In this study, 15 minutes corresponds to 3 measurements.
When the temperature changes slowly, the regression becomes numerically unstable. Compared to Fresno or Goleta, for example, the CPU temperatures at LREC are more stable since the devices are sited near a large irrigation reservoir. SSA smooths the previous 3 measurements more than the other methods, occasionally generating regression coefficients that are very large or numerically infinite as a result of trying to invert the covariance matrix.
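The one-step-ahead evaluation and the instability on flat data can be illustrated with a minimal sketch. It assumes an unsmoothed, univariate least-squares fit that is recomputed over the most recent training window at every step and applied to the last CPU value received. The names `fit_line` and `one_step_ahead_mae`, the variance threshold `min_var`, and the fallback to the last observed outdoor reading are hypothetical choices made for illustration; they are not taken from the system described here, and SSA smoothing is omitted.

```python
from statistics import fmean

def fit_line(x, y, min_var=1e-9):
    """Least-squares fit of y ~ a + b*x. Returns (a, b), or None when x is
    too flat for a stable fit (min_var is a hypothetical threshold)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx / len(x) < min_var:
        return None  # near-zero variance: the slope would blow up
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def one_step_ahead_mae(cpu, outdoor, window):
    """Refit on the last `window` pairs at each step t and predict
    outdoor[t + 1] from cpu[t]; return the mean absolute error."""
    if len(cpu) != len(outdoor) or len(cpu) <= window:
        raise ValueError("need equal-length series longer than the window")
    errors = []
    for t in range(window - 1, len(cpu) - 1):
        xs = cpu[t - window + 1 : t + 1]
        ys = outdoor[t - window + 1 : t + 1]
        coeffs = fit_line(xs, ys)
        if coeffs is None:
            pred = ys[-1]  # assumed fallback: reuse the last observation
        else:
            a, b = coeffs
            pred = a + b * cpu[t]
        errors.append(abs(pred - outdoor[t + 1]))
    return fmean(errors)
```

With 5-minute samples, a 15-minute training window corresponds to `window=3` and a 1-hour window to `window=12`.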

Our system detects this condition and disables smoothing when it leads to a failed regression. Also, note that the errors are relatively small. All of the locations we have tested are in California, and during the prediction periods the temperature varied from the mid 40s to the mid 80s °F. In each case, the MAE is under 1 °F for a TW of 1 hour or less. The Arduino Uno produces the lowest error, and the error does not grow with window size. We believe this is due to the very consistent and slowly changing temperature of the location during the prediction period. The accuracy of our approach is similar regardless of location (for example, Fresno) and of the source of ground truth temperature measurement (for example, WU) for a TW of 1 hour or less.

The data and analysis presented in the previous subsection show the minimum error that is possible. That is, they verify that it is possible to predict the outdoor temperature from the internal CPU temperature sensor with a high degree of accuracy in a variety of meteorological settings. To be practically useful, however, the technique must be able to predict outdoor temperature without the presence of an outdoor thermometer. That is, our goal is to investigate whether we can use the CPU temperature sensor as a replacement for a localized outdoor thermometer. Specifically, in a practical application of this technique, with no outdoor thermometer, it is not possible to perform a regression at each time step using the current outdoor temperature reading. Instead, our approach is to generate regression coefficients from a calibration period that we then use over a later prediction period. We site single-board computers in each location with an attached DHT outdoor temperature sensor for a fixed, continuous calibration period. Then we remove the DHT sensor and estimate outdoor temperature from the computer’s CPU temperature using the regression coefficients we computed at the end of the calibration period.
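The calibrate-then-predict procedure can be sketched as follows, assuming an unsmoothed univariate least-squares fit. The function names `calibrate`, `predict`, and `mean_absolute_error` are hypothetical, and the numbers in the usage note are made-up illustrative values rather than measurements from this study.

```python
def calibrate(cpu_calib, dht_calib):
    """Fit dht ~ a + b*cpu once, using only calibration-period data."""
    n = len(cpu_calib)
    mx = sum(cpu_calib) / n
    my = sum(dht_calib) / n
    sxx = sum((x - mx) ** 2 for x in cpu_calib)
    if sxx == 0:
        raise ValueError("CPU temperature did not vary during calibration")
    sxy = sum((x - mx) * (y - my) for x, y in zip(cpu_calib, dht_calib))
    b = sxy / sxx
    return my - b * mx, b

def predict(coeffs, cpu_values):
    """Apply the frozen calibration coefficients to later CPU readings."""
    a, b = coeffs
    return [a + b * x for x in cpu_values]

def mean_absolute_error(predicted, truth):
    """MAE between predictions and held-out ground-truth readings."""
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)
```

For example, calibrating on CPU readings `[100, 104, 108, 112]` and DHT readings `[60, 62, 64, 66]` yields an intercept of 10 and a slope of 0.5, so a later CPU reading of 110 maps to a predicted outdoor temperature of 65. The ground-truth readings passed to `mean_absolute_error` are used only for scoring, never for refitting.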
Table 4.1 shows the Mean Absolute Errors for different calibration periods and three Pi Zero devices. Pi1, Pi2, and Pi4 are all Raspberry Pi Zero single-board computers with externally attached DHT temperature sensors. All three were located in the same outdoor setting in Goleta, California. We chose a random date between January 1st, 2018 and May 15th, 2018 in each case to use as the start of the test/prediction period. We installed the Arduino too late to include in this study, but we plan to include it once we collect sufficient data.

In each experiment, we use a trace of the DHT external measurements and the corresponding CPU measurements over a fixed calibration period to compute a set of regression coefficients. We then use the coefficients to predict the DHT measurements from the CPU measurements for the two weeks following the calibration period. Columns 2 through 7 show the Mean Absolute Error during the measurement period immediately following calibration, without smoothing and with SSA for the calibration regression. Thus, this table shows the errors when one set of regression coefficients is used to predict the next two weeks of outdoor temperature. While SSA improves the errors in the piecewise regression case, it is less effective when one set of coefficients must be used over a long period of time and a sufficiently long calibration period is available. Note that for some short calibration periods, SSA can improve accuracy, but only when there is sufficient variation to maintain numerical stability in the regression. Further, the calibration period should include at least one full diurnal cycle to be effective. Finally, the minimum error is consistently 1.3 °F or 1.4 °F.

Not all time periods during a diurnal cycle may be needed for certain applications. As part of SmartFarm, for example, we are developing a new algorithm for computing localized evapotranspiration (ET) (Penman). ET is an often-used metric for computing crop water stress or water requirements, and it is typically based on meteorological measurements that cover large areas. ET computations rely, in part, on the outdoor temperature measured during “solar max”, typically between noon and 3 PM in North America. Similarly, frost prevention using wind machines mixes warm air aloft with colder air that has settled near the ground during the nighttime hours. Thus, it may be possible to obtain more accurate estimates by including only those hours of the diurnal cycle that are of interest.
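Restricting calibration data to the hours of interest can be sketched as a simple filter. The sketch assumes samples are stored as `(timestamp, cpu_temp, outdoor_temp)` tuples and that each daily window is half-open, so a reading at exactly the end hour is excluded. The function names and the tuple layout are hypothetical.

```python
from datetime import datetime

def in_daily_window(ts, start_hour, end_hour):
    """True if ts falls in [start_hour, end_hour) local time; windows
    with start_hour > end_hour wrap past midnight."""
    if start_hour < end_hour:
        return start_hour <= ts.hour < end_hour
    return ts.hour >= start_hour or ts.hour < end_hour

def select_hours(samples, start_hour, end_hour):
    """Keep only the (timestamp, cpu, outdoor) samples inside the window."""
    return [s for s in samples if in_daily_window(s[0], start_hour, end_hour)]
```

Under these assumptions, a solar-max calibration set would be `select_hours(samples, 12, 15)` and a nighttime set would be `select_hours(samples, 22, 7)`; the filtered samples can then be passed to whatever regression is used for calibration.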
Tables 4.2 and 4.3 show the MAE for non-smoothed and SSA calibration using only data gathered from noon to 3 PM and from 10 PM to 7 AM, respectively. We show only results for calibration periods of at least 24 hours since the calibration period must span at least one diurnal cycle. In most cases the best prediction improves when we use only the periods of interest for the regression. However, the improvements are small in absolute terms. We have yet to determine whether the additional accuracy is necessary for either localized ET calculation or frost prevention. Doing so is the subject of the ongoing SmartFarm work that is leveraging this technique.

In the prior sections, we use univariate linear regression to estimate the outdoor temperature based on the CPU temperature of a single co-located SBC.

In two of the experiments, the model does not perform as well when the training window is smaller than 6 hours or larger than one week. This is because the technique uses computed estimates rather than actual measurements, which introduces additional error beyond measurement error. We next investigate novel ways of reducing this error, explore the efficacy of alternative smoothing techniques, and evaluate the impact of processor load on prediction. To reduce this error, we consider processor temperature measurements from multiple SBCs, and outdoor temperature from a remote weather station, as possible predictors. We use the terms processor and CPU interchangeably throughout.

We deploy four Raspberry Pi Zero (RPi) devices equipped with temperature sensors at different locations in an agricultural setting. We place a pair of RPis, within 3 feet of each other, in each of two trees spaced 10 feet apart. Pi1 and Pi2 monitor tree #1, and Pi3 and Pi4 monitor tree #2. Each device is housed in an inexpensive plastic enclosure and has an on-board processor temperature sensor that is part of its hardware/software interface. The devices read their processor temperature sensor value every 5 minutes and can process, store, or wirelessly transmit their measurements. We label the measurements CPU-1, CPU-2, CPU-3, and CPU-4, for the CPUs of Pi1 through Pi4, respectively. The RPi devices then transmit the measurements to an on-farm computer for aggregation and analysis.

Each RPi is additionally equipped with an AM2302 DHT22 digital temperature and humidity sensor (Ada), which we use to measure ground truth. The devices read and transmit these values every 5 minutes along with their CPU temperature readings to a remote analysis system. We use this DHT22 data only as ground truth, i.e., it is not used as part of modeling or prediction.
Finally, we also consider the use of freely available, high-end weather station data from the Internet weather service Weather Underground (Weather Underground). The closest weather station is 2640 feet away from our field deployment. We collect the temperature reported by the Weather Underground station closest to the deployment site every five minutes. We align the measurements using the nearest timestamp. If there is data dropout, i.e., if one of the three temperature values is missing, we skip all measurements for that five-minute interval.

To evaluate this approach, we analyze models with testing windows of size one hour to two weeks, which correspond to 12 and 4032 data points, respectively. To measure error, we compute the mean absolute error (MAE) because of its direct utility in our IoT agriculture applications. In particular, we are interested in using the models to make predictions and not in their explanatory power.
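The alignment and dropout rule can be sketched as follows. The sketch assumes each series is a dictionary from epoch seconds to temperature, that “nearest timestamp” means snapping each reading to the nearest five-minute slot, and that the three values are the CPU, DHT22, and Weather Underground temperatures. These are assumptions made for illustration, and the names `snap`, `align`, and `points_in_window` are hypothetical.

```python
INTERVAL_S = 300  # five-minute sampling interval, in seconds

def snap(ts):
    """Snap an epoch timestamp (seconds) to the nearest 5-minute slot."""
    return (int(ts) + INTERVAL_S // 2) // INTERVAL_S * INTERVAL_S

def align(cpu, dht, wu):
    """Return sorted (slot, cpu, dht, wu) rows for slots in which all
    three series have a reading; incomplete slots are skipped."""
    # If two readings of one series snap to the same slot, the later
    # dictionary entry wins.
    snapped = [{snap(ts): v for ts, v in series.items()}
               for series in (cpu, dht, wu)]
    common = set(snapped[0]) & set(snapped[1]) & set(snapped[2])
    return [(slot, *(s[slot] for s in snapped)) for slot in sorted(common)]

def points_in_window(hours):
    """Number of 5-minute samples in a window of the given length."""
    return hours * 3600 // INTERVAL_S
```

With this sampling interval, `points_in_window(1)` is 12 and `points_in_window(14 * 24)` is 4032, matching the one-hour and two-week window sizes stated above.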