Our warehouse is the size of two Red Squares and five floors high. It works all year round and never sleeps: 24/7, 364 days a year (the only day off is January 1). We store and service more than 8,000,000 products, with more than 300 operators on duty every day. They handle goods coming from all over the world and pick orders for customers in four countries: Russia, Ukraine, Belarus and Kazakhstan. At this scale, the business requires flawless automation.
Here I, Pasha Finkelstein, team lead of the warehouse automation and development team, will tell you what an open source solution can become if you add a good development team and a very concrete business goal.
Basic operation logic
The three main processes in any warehouse are receiving, storage and shipping. Simplified, the cycle of our warehouse looks like this: initial identification, quality control, placement, selection and reservation for an order, search, sorting, packaging, and transfer to the delivery service. When a customer returns goods, the cycle repeats. Each physical entity involved in these processes has its own information representation: a truck, an item, a cabinet, a parcel, packing material, a container, and so on. All significant movements and status changes of goods are passed on to the accounting systems, and absolutely every action on goods within the warehouse is logged.
The WMS (Warehouse Management System) controls the life cycle of each product in the warehouse, from the arrival of the supplier's truck at the warehouse until the shipment of the goods to the customer.
Specifics of fashion automation
Our company works in fashion and lifestyle, which poses specific problems for warehouse operations: goods may be fragile (glasses, watches), of non-standard size (winter boots or jewelry), premium (in special packaging), or have other characteristics that the warehouse must take into account. Therefore, manual labor in the storage areas cannot be abandoned completely.
All other processes are automated: receiving goods, moving them to the shipping area, sorting, packing and preparation for shipment. Each of these processes requires special equipment and its own operational process. The magic happens when all these processes are "glued" together and start working as one, thanks to our systems.
Any misstep in warehouse automation, whether it’s an interface that invites operator errors or a suboptimal process, means shipment delays, downtime of the entire complex, and huge losses. In addition, every mistake creates a negative customer experience. That’s why it’s important for us to make sure the warehouse runs like clockwork.
Open source and the path to in-house development
In the early days, we used an external warehouse. As our volumes grew, we realized that we needed full control over the operational processes and the ability to change them quickly, so we decided to move to our own warehouse and our own development.
The main task at that point was to work out the operational processes in full detail, right down to where and how employees walk and how many scans they make. On top of these processes we had to deploy a WMS that would manage the operations and automate the routine ones.
We started with an open source solution in Java and then decided to build our own development team, especially since we already had a suitable framework in place. We added more functionality, then started working on the core of the system: we got rid of legacy code and the thick client, refactored, and developed new services to support the operational processes.
Stages of automation
The major changes were made in "waves," along with the restructuring of the processes themselves.
To date, the warehouse has gone through nine stages of modernization, and we don’t plan to stop there.
- In the first and second stages, we automated the order shipping processes: we added conveyors, product sorting logic, and automated pallet sorting of orders.
- The third and fourth stages focused on the receiving processes: we learned to separate the flows of incoming goods by type and by storage area.
- The fifth stage added automated elevators between floors, which is how work in the storage area began.
- The sixth stage was the most critical: we connected the receiving and shipping areas, closing the loop of the whole automation.
- In the seventh and eighth stages, we changed the processes in the receiving area and added new areas, elevators and conveyors, scaling up the automation we already had.
- The ninth stage attached a new building to the warehouse and integrated it with the existing automation system.
Our main technologies: Java, PostgreSQL, WildFly, Redis, ActiveMQ.
The WMS is written in Java 8. Not long ago we fixed the last module that was blocking the move to Java 11, so we will upgrade in the near future.
The WMS runs on a server rack right in the warehouse. This gives us much more confidence that the WMS will keep working even if the power and/or the internet connection is cut. The only thing that will suffer is that messages to the merchandise accounting system will arrive with a delay. We use WildFly as the application server, though not the latest version yet. Migration to the latest version is also planned: everything is already written for the move, but we have not had time to run functional and load testing, and before the New Year the load is relatively high. We also use the tried-and-tested ActiveMQ.
We store the data in PostgreSQL. The main entity in our system is, obviously, the product. Warehouse employees sometimes come up with workarounds to simplify their work, such as scanning the same barcode 50 times and then throwing the items in by hand without scanning each one, regardless of whether they are jeans or T-shirts. So we introduced labels that identify each specific item and built support for them into the infrastructure. Information about these items is exactly what is stored in our 2-terabyte PostgreSQL database.
It’s not even the goods that take up most of the space there, but the audit trail of warehouse employees’ actions. As a business-critical system, the warehouse needs to know why something appeared in the system or disappeared from it; we can’t have untraceable changes. Right now we are thinking about moving this part of the database into a separate store in MongoDB.
Warehouse employee workstations are thin web clients. In the early days of automation everything ran on a thick client, which created some challenges, particularly with major releases involving interface changes: about 150 workstations had to be updated manually. This, and the fact that we could not release without downtime, constrained us: we could deploy no more than twice a week, in the early morning when the night shift ends, which is by no means a convenient schedule. Now we have moved the WMS to the web, and by the end of the year we will do away with thick clients for good, which will make it very easy for us to change the user interface. The web client and the clustering we added at one of the stages remove the restrictions on the frequency and timing of releases: users already find out about a release only if something has gone wrong.
There are some interesting "exotics" in our stack. One example is Haskell, mentioned in the Technoradar article, in which the backend of the item sorter visualization is written (the sorter is a machine that gathers the items of one parcel and hands them to the operator to assemble). That is a purely computational problem, which is conveniently solved in a functional style. Naturally, no one is going to use Haskell for any large-scale projects.
Another element of the warehouse that we mentioned in the Technoradar article is a self-written state machine that "keeps track" of the correct sequence of actions on each item. Like the whole system, it evolved iteratively, starting with a simple set of constraints. Now it’s a very handy tool, deeply integrated into our system. We hope to release it as open source in the near future; maybe it will be useful to more than just us.
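To illustrate the idea of a state machine that guards the sequence of actions on an item, here is a minimal sketch. It is not our actual implementation: the state names in `ItemState`, the transition table and the `ItemLifecycle` class are all hypothetical, loosely modeled on the warehouse cycle described earlier in this article.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical item states, loosely following the cycle described in the article.
enum ItemState { RECEIVED, QUALITY_CHECKED, PLACED, RESERVED, PICKED, SORTED, PACKED, SHIPPED }

final class ItemLifecycle {
    // Transition table: each state maps to the set of states that may follow it.
    private static final Map<ItemState, Set<ItemState>> ALLOWED = new EnumMap<>(ItemState.class);
    static {
        ALLOWED.put(ItemState.RECEIVED, EnumSet.of(ItemState.QUALITY_CHECKED));
        ALLOWED.put(ItemState.QUALITY_CHECKED, EnumSet.of(ItemState.PLACED));
        ALLOWED.put(ItemState.PLACED, EnumSet.of(ItemState.RESERVED));
        ALLOWED.put(ItemState.RESERVED, EnumSet.of(ItemState.PICKED));
        ALLOWED.put(ItemState.PICKED, EnumSet.of(ItemState.SORTED));
        ALLOWED.put(ItemState.SORTED, EnumSet.of(ItemState.PACKED));
        ALLOWED.put(ItemState.PACKED, EnumSet.of(ItemState.SHIPPED));
        // Assumption: a customer return sends the item through the cycle again.
        ALLOWED.put(ItemState.SHIPPED, EnumSet.of(ItemState.RECEIVED));
    }

    private ItemState state = ItemState.RECEIVED;

    ItemState state() {
        return state;
    }

    // True if the requested step is a legal continuation of the current state.
    boolean canMoveTo(ItemState next) {
        return ALLOWED.getOrDefault(state, Collections.<ItemState>emptySet()).contains(next);
    }

    // Applies the step or rejects it, so an out-of-order action never changes the item.
    void moveTo(ItemState next) {
        if (!canMoveTo(next)) {
            throw new IllegalStateException("Illegal transition: " + state + " -> " + next);
        }
        state = next;
    }
}
```

Starting from "a simple set of constraints" could mean starting with a table like `ALLOWED` and extending it as new processes appear; keeping the rules in data rather than in scattered `if` statements makes that kind of growth cheap.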
What is automation without equipment? The entire warehouse is crisscrossed by a network of conveyors.
The above-mentioned item sorter works at the shipping stage, making it possible to sort tens of thousands of items collected from the stock for specific orders. The sorter has eliminated the need for our operators to travel all over the warehouse with a cart to collect the right items. Orders are split, each operator only collects goods from his floor (saving time on travel), and the sorter makes sure that goods from different floors get into the right orders automatically. The change in the operational process has made order picking four times faster and significantly reduced errors.
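The split-and-merge idea behind this process can be sketched in a few lines. This is only an illustration of the logic, not the real picking code: `OrderLine`, `PickPlanner` and both method names are hypothetical, and the real sorter is a physical machine controlled by our partner's system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical order line: which order needs which item, and on which floor it is stored.
final class OrderLine {
    final String orderId;
    final String itemLabel;
    final int floor;

    OrderLine(String orderId, String itemLabel, int floor) {
        this.orderId = orderId;
        this.itemLabel = itemLabel;
        this.floor = floor;
    }
}

final class PickPlanner {
    // Split step: group the lines of all orders by floor, so that each operator
    // only collects items from their own floor.
    static Map<Integer, List<OrderLine>> splitByFloor(List<OrderLine> lines) {
        Map<Integer, List<OrderLine>> byFloor = new TreeMap<>();
        for (OrderLine line : lines) {
            byFloor.computeIfAbsent(line.floor, f -> new ArrayList<>()).add(line);
        }
        return byFloor;
    }

    // Sorter step (logical view only): regroup the picked items by order,
    // regardless of the floor they came from.
    static Map<String, List<String>> regroupByOrder(Map<Integer, List<OrderLine>> picked) {
        Map<String, List<String>> byOrder = new TreeMap<>();
        for (List<OrderLine> floorBatch : picked.values()) {
            for (OrderLine line : floorBatch) {
                byOrder.computeIfAbsent(line.orderId, o -> new ArrayList<>()).add(line.itemLabel);
            }
        }
        return byOrder;
    }
}
```

The point of the sketch is that the operator's task no longer depends on the order at all, only on the floor; putting orders back together becomes a separate step that a machine can do.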
All the automated equipment is provided by our partner. They have their own system for controlling specific units, which sits in the server rack next to our WMS. The integration between the systems is fairly high-level: we communicate via SOAP. From the operational processes within our WMS we call their system when we need, for example, to move a container with goods from point A to point B. So from our system’s point of view all this automation looks pretty simple, despite its real internal complexity.
Of course, this apparent simplicity did not come right away. In the first stages of automation, the technologies had to be "ground in" against each other. Once a conveyor literally burned our product: the belt speed was too high, it "chewed up" the product and it burned, which blocked the assembly of other orders. Perhaps the toughest story happened at the very start of automation, when we were launching stage one. Just the day before, the warehouse had been completely manual, and the next day, at the flip of a switch, it was supposed to become automatic. But it didn’t work: due to an integration error the systems misinterpreted each other’s messages, which resulted in several days of warehouse downtime and multimillion losses for us.
The partner is now present in our warehouse, planning equipment placement with us when it comes to a new round of automation, helping to test new units.
Team and scrumban
The development of this entire system is now handled by a team of 12 people. At one of the last stages, at the peak of modernization, when separately automated processes had to merge into a single whole, up to 20 developers were involved (that stage took 132 man-months and more than 1,500 commits). But as the massive transformation ended, some people decided to learn Go or Python and moved on to other development teams.
In our team we have "classic" project managers, who combine the roles of product and project manager on the IT side (on average, one PM per 5-6 people). The PM’s task is to communicate with our main customer: the warehouse, represented by its director and the department that develops operational processes. For our part, we mostly take care of the technical side (choosing the right stack, upgrades and so on), while the guys on the warehouse side think about process optimization.
Sometimes we take time to do "R&D in the field" ourselves. We literally come to the warehouse, talk to shift supervisors and operators, and find out what problems they have and what is convenient or inconvenient to work with. In other words, we do user experience research.
Thanks to this approach we have, for example, transformed the interface of the workstation used by the employee who receives goods. Initially it was a complicated, enterprise-style interface with lots of fields, buttons, and abbreviations instead of text explanations. We tried to optimize the process as well as the design, making it more like the Google search home page: not as pretty, but very functional. The simpler the interface and the fewer choices the operator has about where to click and what to scan, the fewer errors there are (and the less time it takes to fix them).
This accumulated habit of optimizing details now catches up with us at the most unexpected moments. Once our team was sitting somewhere together, and at one point almost everyone found themselves staring at the sequence of the cashier’s actions. After about 40 seconds, a colleague voiced the common thought: "Not very optimal, can be simplified."
Although the relationship between the roles in our team is quite classical, we have chosen the scrumban development methodology.
We experimented a lot with methodologies, because our "input" conditions were non-standard. For example, our releases were rather rare. The aforementioned limit of two releases per week came from the process side, but in fact we deployed much less often, on average once every two weeks. In addition, the hardware part of the warehouse automation is developed by an external company using pure waterfall, where all changes are scheduled two years ahead with all the necessary documentation. We could not follow their example: we needed to change the system regularly, and forcing the customer to write a detailed specification for each change made no sense.
So scrumban is a compromise that suits everyone. We use an iterative process, but for us the sprint is the release. Once a month we meet with the customer and do release planning: we discuss what we roll out and in which week. Inside the sprint we run kanban, with a task backlog, work in progress and so on. Admittedly, this process is also gradually changing: for example, we do not have a kanban board. When a developer finishes a task, they are simply given the next one from the pool, according to the plans for the next release and the developer’s own competence.
We like this approach. It provides the necessary flexibility within iterations, and it gives the business customer predictable dates by which particular changes will be delivered. And we don’t really care what this methodology is called. The main thing is that it works.
Not like everyone else: the examples of inventory and monitoring
In developing our operational processes, we started with the needs of our industry, so we have quite a few individual peculiarities.
A good example is inventory. The law requires it to be done in the warehouse once a year, but our business requirements dictate that we monitor the stock more closely. First, we want to show up-to-date availability on the website; second, the same up-to-date information is required by our B2B partners, the fashion brands. That’s why we take inventory every day, 364 days a year, shelf by shelf, across the entire five-story complex of several buildings. This process is fully supported by our WMS; it would be difficult to implement with an off-the-shelf solution.
The inventory process is currently being updated once again to make it more efficient.
Another example of in-house development is monitoring. It is implemented as a web client and lets us display and track warehouse metrics, and their visual representation matters to us. In essence, monitoring is a simple graphical representation of the warehouse, in which we can clearly see where everything is working well and where there are problems (down to a specific operator). Most importantly, this view helps us understand why these problems occur.
KPI for warehouse workers and Redis
Implementing new technology, updates, refactoring: that’s all great. But our WMS serves a real business, so the challenges don’t end there. Part of our job is protecting against internal "hackers": resourceful warehouse employees who invent new ways to hit their KPIs while circumventing the task at hand.
For example, not so long ago we had to add Redis to the stack to prevent users from logging in from multiple workstations simultaneously and to implement a session timeout. The warehouse workers had figured out that it is much more profitable to work under one login and get a bonus for exceeding KPIs than to increase their own productivity.
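The rule itself (one login, one workstation, with a timeout) can be sketched as follows. This is an assumption-laden illustration, not our production code: to keep the example self-contained, a `ConcurrentHashMap` stands in for Redis, and `SessionRegistry`, `tryLogin` and `logout` are hypothetical names. With Redis, a similar effect could be achieved with one key per login that carries an expiry, for instance via the `SET` command with its `NX` and `EX` options.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for a shared session store (an assumption made for this sketch).
final class SessionRegistry {
    private static final class Session {
        final String workstation;
        final long expiresAtMillis;

        Session(String workstation, long expiresAtMillis) {
            this.workstation = workstation;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    SessionRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Grants the login if nobody holds it, if the previous session has timed out,
    // or if the same workstation is simply refreshing its own session.
    // The current time is passed in so that the behavior is easy to test.
    boolean tryLogin(String login, String workstation, long nowMillis) {
        Session fresh = new Session(workstation, nowMillis + timeoutMillis);
        Session stored = sessions.compute(login, (key, current) -> {
            boolean available = current == null
                    || current.expiresAtMillis <= nowMillis
                    || current.workstation.equals(workstation);
            return available ? fresh : current;
        });
        return stored == fresh;
    }

    // Releases the login, but only if it is held by the given workstation.
    void logout(String login, String workstation) {
        sessions.computeIfPresent(login, (key, current) ->
                current.workstation.equals(workstation) ? null : current);
    }
}
```

The check and the update happen atomically inside `compute`, which matters here: two workstations racing for the same login must not both succeed.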
Since the business problem required changes in many different places in the system, it was a very interesting challenge from a technical point of view.
The surprises from the warehouse staff did not end there. Almost immediately after the session release, PostgreSQL started crashing. We spent a few days looking for the cause of the unexpected database degradation, until we discovered that it was, once again, a matter of ingenuity. One employee often went out for a smoke. When she left her workstation, she was kicked out of her session, and to log back in she had to find the shift supervisor and scan his badge. To avoid wandering around the warehouse, she simply ripped a barcode off one of the carts and taped down the scanner button, so that the scanner read that barcode continuously. It might have gone unnoticed for a long time if the barcode had not been from a cart that contained 800 items. Each scan generated a huge SQL query to validate the items, and this "internal DDoS" killed the database. I had to introduce limits on the number of scans per unit of time and on the number of items in a cart.
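A limit on scans per unit of time can be implemented in many ways; the sketch below uses a sliding window and is only an illustration under stated assumptions. `ScanRateLimiter`, `CartLimits` and the concrete thresholds are hypothetical, and one limiter instance is assumed per scanner.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window limiter: at most maxScans scans within any windowMillis interval.
final class ScanRateLimiter {
    private final int maxScans;
    private final long windowMillis;
    // Timestamps of the accepted scans that are still inside the window.
    private final Deque<Long> recentScans = new ArrayDeque<>();

    ScanRateLimiter(int maxScans, long windowMillis) {
        this.maxScans = maxScans;
        this.windowMillis = windowMillis;
    }

    // Returns true if the scan may be processed; rejected scans never reach the database.
    synchronized boolean allow(long nowMillis) {
        // Drop timestamps that have left the window.
        while (!recentScans.isEmpty() && nowMillis - recentScans.peekFirst() >= windowMillis) {
            recentScans.pollFirst();
        }
        if (recentScans.size() >= maxScans) {
            return false;
        }
        recentScans.addLast(nowMillis);
        return true;
    }
}

// Companion check for the second limit: the number of items allowed in one cart.
final class CartLimits {
    static boolean withinLimit(int itemsInCart, int maxItems) {
        return itemsInCart <= maxItems;
    }
}
```

With such a guard, a taped-down scanner button produces a stream of cheap rejections instead of a stream of heavy validation queries.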
There are quite a few such stories, and we are constantly faced with new ones. At the same time, the system must adapt to new conditions each time. In such situations, you can’t just limit yourself to administrative methods – what happened once may well happen again.
Where do we go from here?
Process optimization and warehouse automation never seem to be finished. They have been going on in the company for five years now and, as I said above, even after stage 9 we are not going to stop. The company continues to expand in both B2C and B2B, so in the near future we are planning another big project: opening another warehouse, which will require either a major rewrite of the existing system or building a similar one from scratch at the new location. This is an interesting new challenge at the junction of business, physical facilities, operational processes and technical solutions.